
Retrospect crashes (and reboots the server)



I'm going quietly batty:

 

We are running Retrospect Server with five backup sets - four that rotate quarterly, one that rotates monthly, backing up a mixture of Win and Mac clients of varying ages.

 

- One (the newest) backup set will back up anything.

- One backup set will back up anything except Windows XP or OS X 10.4.7

- One backup set will back up anything except Windows XP

- One backup set will back up anything except Mac OS 10.3 or later
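For anyone trying to spot a pattern, the list above can be tabulated with a few lines of Python. This is only a rough sketch: the set labels A to D are placeholders I made up (the real set names aren't given here), and the OS strings are compared literally, so "OS X 10.4.7" is not treated as falling under "Mac OS 10.3 or later".

```python
# Failure pattern as described in the list above.
# The labels A-D are hypothetical placeholders, not the real backup set names.
FAILS_ON = {
    "A (newest)": set(),
    "B": {"Windows XP", "OS X 10.4.7"},
    "C": {"Windows XP"},
    "D": {"Mac OS 10.3 or later"},
}


def sets_failing_for(client_os):
    """Return the sorted labels of backup sets that fail on this client OS."""
    return sorted(name for name, bad in FAILS_ON.items() if client_os in bad)


def failure_matrix():
    """Map every problem OS mentioned above to the sets that fail on it."""
    all_os = sorted(set().union(*FAILS_ON.values()))
    return {os_name: sets_failing_for(os_name) for os_name in all_os}


if __name__ == "__main__":
    for os_name, failing in failure_matrix().items():
        print(f"{os_name}: fails in {', '.join(failing)}")
```

Laid out this way, no single client OS fails in every set, and the newest set fails on none.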

 

The errors only occur when running Retrospect, and only when running some but not all of the backup sets.

 

When a backup set encounters whatever flavor of client OS it doesn't like, Retrospect starts the backup, connects to the client, runs through the matching process, then reboots the server just as it would be about to start backing up.

 

Rebuilding a catalog sometimes produces a partial improvement but does not "fix" a flaky set. Drive and permissions verification has not turned up anything suspicious.

 

We can temporarily get around the problem by starting a new backup set, but the problem is too frequent to make this a practical or desirable long-term workaround.

 

The problem is not sensitive to client software version so far as we can tell.

 

The problem does not _appear_ to be linked to the number of files on a client or the size of the backup set, though there is a partial correlation to the age of the set.

 

The logs, Retrospect or system, show nothing except a PMU -212 system error. So I spent a lot of time poking at the system hardware (none of which makes sense given the selective failure pattern anyway), and then the system software.

 

It's not the line supply (we've moved it from one conditioning UPS to another conditioning UPS on a different circuit)

It's not the port on the power strip (we've moved it and put a power flicker sensitive box server on the original point)

It's not the power cord (replaced)

It's not the PMU settings (reset)

It's not the PRAM (reset)

It's not the NVRAM (reset)

It's not the SCSI card (removed)

It's not the FC card (we have two - swapped them around, ran with one, ran with the other)

It's not the motherboard or power supply (problem has persisted across two server boxes)

It's not the hard drive (replaced several times)

It's not the system as such (rebuilt from scratch repeatedly)

It's probably not the OS version (persists across both 10.3 and 10.4)

It's not the Retrospect installation (see system rebuilds above, also a couple of clean installs onto an existing system)

It's probably not the Retrospect version (having persisted across the last two updates)

It's probably not the FC switch(es) (pattern of failure isn't right)

It's probably not the Tape library (pattern of failure isn't right)

 

However, since replacing almost everything at least once has not helped, and since the problem is reproducibly tied to specific operations in Retrospect on specific backup sets, I'm still looking at Retrospect.

 

Turning off matching does not seem to solve the problem.

 

Any ideas as to what might be going on? Am I overlooking something?

 

TIA!


waltr: watchdog/launchd might be involved, now that you mention it. At one time, the system used to hang under similar conditions, then it switched to rebooting, which would be consistent with watchdog deciding the system was frozen... I will poke at this a bit, thanks!

 

rhwalker:

 

Currently it's running on a Dual Powermac G5 (2.3 GHz), 1.5GB RAM, workstation, 10.4.7, connected over FC (Apple) through a QLogic switch to a drive partition (1 drive) on an ADIC Scalar i2000 library.

Retrospect server version 6.1.126.

Clients on the Windows side are currently 7.0.107 almost exclusively (though the problem has definitely persisted across several client version upgrades), and on the Mac mostly 6.1.107 with a few 6.1.130 from our latest client version tests. The problem also occurs when backing up the local drive.

Client OS versions include Win2K, WinXPsp2, OS 9, OS X 10.1, 10.2.8, 10.3, 10.4.

 

The problem appears to date back to Retrospect 5.x on a PowerMac G3 running 10.2, then 10.3, with a SCSI connection to an ADIC [200] library. I _think_ we switched from hangs to reboots after indexing sometime before we upgraded the server hardware, but it may have been shortly afterwards (I need to do some note checking). Every time we rebuilt the server, moved the server, reinstalled or upgraded, we thought we'd solved it for a little while, and then found it was the same old story.

 

We have monitored memory usage, since we thought that might be a problem, but there is not a clear correlation between memory use and the reboot symptom (though some individual backup jobs do use up a significant proportion of the available memory).

 

Thank you both for such quick responses!


Quote:

The problem also occurs when backing up the local drive.

 


 

It would probably be helpful to avoid red herring concerns such as client versions; this sounds like an issue with the hardware/software of the Retrospect machine, not with any outside backup clients.

 

> The problem appears to date back to retrospect 5.x on a PowerMac G3 running

>10.2, then 10.3, with a SCSI connection to an ADIC [200] library

 

It wasn't unusual back in the 10.2 days to see hangs on OS X using SCSI host adapters. And there have been reports of Watchdog restarting OS X servers since Retrospect 5.0 shipped (and before).

 

But this is some serious modern iron you're using. I'd suggest that you open a support incident with EMC Insignia about this; depending on community support for enterprise-class hardware seems a bit inefficient.

