
Retrospect Server Crashes



I'm getting crashes/hangs on one of my Retrospect servers (Mac v 6.1.126) - it seems to be having problems backing up another one of my servers, and the log doesn't show anything helpful. The system drive on the server in question does back up successfully, but the data drive will get to 'copying' then hang the program. I have had to force-quit Retrospect three days this week, which is not good at all.

 

Does anyone have any ideas where I should start?

 

Any help appreciated.

 

thanks

 

BC


Quote:

Does anyone have any ideas where I should start?

 


Some information about your configuration and specifics about the issues would be helpful.

 

What Mac OS version (and whether server or non-server Mac OS) on the two machines?

What Retrospect client on the client machine?

What Mac models and hardware configuration on the two machines?

 

Does this happen every time you try to back up this data drive on the client computer?

Does this happen in all backup sets, or just some?

What type of backup set?

 

Russ


Russ,

 

-OS X 10.4.8 on both Retrospect server and client machines

-Retrospect client v 6.0.109

-Both servers are XServes - Dual G5 2GHz, 2GB RAM - the Retrospect client has one 80GB system drive, and two 250GB data drives as a mirrored RAID; the Retrospect server has the same system drive, and is connected to an XRAID with 1TB of striped+mirrored drives

 

-This seems to have started a couple weeks or so back; it seemed to be with certain backup sets at first, but now seems to be ALL the sets (we do a 4-week tape rotation) are hanging on copying the client's data drive; and it only seems to happen on Normal backups, as it can get through a Recycle backup without crashing (if it has time to run all weekend).

 

Because this wasn't happening a while ago, I get the sense that it may be related to the file structure on the client's data drive - we are a school, so the teachers' and students' data grows throughout the school-year, then we wipe the machines and they have to start over 'fresh' at the beginning of the next year... as such, we're just trying to get through another week or so of active use of the client server (and backups, if needed).


Quote:

Retrospect client v 6.0.109

 


Ok, that's a very old client and is not appropriate for Retrospect 6.1.x. I'd suggest updating to Client 6.1.130, which is here:

Retrospect Mac updates

 

Quote:

Both servers are XServes - Dual G5 2GHz, 2GB RAM

 


Good. At least we aren't dealing with Intel / Rosetta issues.

I assume that this is Mac OS X 10.4.8 server PPC, not the Universal Binary version (which is a very different animal than the PPC version, and which requires a "wipe-and-install" rather than the normal install/upgrade process). Correct me if my assumptions are not correct.

 

Quote:

This seems to have started a couple weeks or so back; it seemed to be with certain backup sets at first, but now seems to be ALL the sets (we do a 4-week tape rotation) are hanging on copying the client's data drive; and it only seems to happen on Normal backups, as it can get through a Recycle backup without crashing (if it has time to run all weekend).

 


This seems to me to indicate that it might be related to catalog corruption. You don't mention which RDU (Retrospect Driver Update) version that you are running - some people have reported problems with RDU 6.1.10.100, and you may also have an old RDU whose problems have been fixed by a later RDU. I'd suggest that you try RDU 6.1.9.102 if you aren't using that version already. The various RDU versions are here:

RDU version history

 

Another cause could be catalog or preference corruption caused by the several forced quits that you mention you have done. If the crashing is because of catalog corruption, whether because of an RDU issue or because of a force quit, the solution is the painful and time-consuming catalog rebuild from scratch.

 

It could also be a catalog volume (size) issue because you have 2 GB RAM and dual G5 processors in your Xserve. You don't say how much data / how many files we are talking about here. If you've got a lot of data, 2 GB RAM is a bit marginal for Retrospect on a dual G5 system.

 

Quote:

I get the sense that it may be related to the file structure on the client's data drive

 


Has the client Xserve (not the Retrospect client program) ever crashed during this period? Have you tried to take the client Xserve down for maintenance and run disk utility on the drive in question? Are there network home directories on this problematic drive? Is the entire drive shared, or just certain folders on the volume? Is the problematic volume an HFS+ volume? Just out of curiosity, are ACLs enabled on the problematic drive? (I'm only aware of ACL issues with the Universal Binary 10.4.8 version of Mac OS and Retrospect, not the PPC version, but this might be an important piece of information).

 

From your original post:

Quote:

but the data drive will get to 'copying' then hang the program.

 


It's not clear from this whether any files have been copied yet, or whether only the scanning phase has completed without any copied files.

 

Have you tried splitting the problematic drive into Retrospect "subvolumes" for separate backup to try to narrow down where the problem area is?

 

Just some thoughts to toss out.

 

Russ


Quote:

I get the sense that it may be related to the file structure on the client's data drive

 


 

As opposed to the Tape hardware or client software?

 

Easy test would be to take a Source that always fails when you're writing to a Tape Backup Set and use it as the Source of a fresh File Backup Set. If it also fails, it points to the data on the Source (your sense).

 

Dave


Yeah, I agree, Dave. That's why I noted:

 

Quote:

It's not clear from this whether any files have been copied yet, or whether only the scanning phase has completed without any copied files.

 


It could point to the tape subsystem (SCSI? Firewire?) if it happened on the first command to the drive. No clue is given as to the tape subsystem hardware or drive type, etc.

 

If the tape drive is attached to an Apple BTO dual-channel SCSI card (really an LSI Logic 22320 rebranded by Apple), that's a piece of junk. I have the one that came in our Xserve sitting on a shelf, gathering dust. It never worked right, and it caused problems that immediately went away when it was replaced by an ATTO UL4D.

 

Russ


Russ, I updated my client version - I don't know why it had such an old version on there actually. Your assumption about my version of 10.4.8 is correct: it is the PPC version.

 

Quote:

This seems to me to indicate that it might be related to catalog corruption. You don't mention which RDU (Retrospect Driver Update) version that you are running - some people have reported problems with RDU 6.1.10.100, and you may also have an old RDU whose problems have been fixed by a later RDU. I'd suggest that you try RDU 6.1.9.102 if you aren't using that version already. The various RDU versions are here:

RDU version history

 


 

I don't know what RDU version I'm running and have spent too much time trying to find out where I can find the info - can you help me there?

 

Quote:

Another cause could be catalog or preference corruption caused by the several forced quits that you mention you have done. If the crashing is because of catalog corruption, whether because of an RDU issue or because of a force quit, the solution is the painful and time-consuming catalog rebuild from scratch.

 

It could also be a catalog volume (size) issue because you have 2 GB RAM and dual G5 processors in your Xserve. You don't say how much data / how many files we are talking about here. If you've got a lot of data, 2 GB RAM is a bit marginal for Retrospect on a dual G5 system.

 


 

I may indeed need to rebuild the catalog file... I might do that tomorrow when I load new tapes into my changer. The thing about RAM is that I have another G5 server running Retrospect Server with 2GB of RAM, and it is actually doing MUCH more backing up, like 5+ times the volume (though it has a VXA 3 drive attached to it via SCSI, and this server has a VXA 2 via FireWire 800), and it never crashes like this.

 

Quote:

Has the client Xserve (not the Retrospect client program) ever crashed during this period? Have you tried to take the client Xserve down for maintenance and run disk utility on the drive in question? Are there network home directories on this problematic drive? Is the entire drive shared, or just certain folders on the volume? Is the problematic volume an HFS+ volume? Just out of curiosity, are ACLs enabled on the problematic drive? (I'm only aware of ACL issues with the Universal Binary 10.4.8 version of Mac OS and Retrospect, not the PPC version, but this might be an important piece of information).

 


 

Data-volume wise, the server that Retrospect is hanging on is backing up TWO servers (the one that hangs it, and itself): the client server has a system drive and about 90GB of data on the data drive (the Retrospect server is backing up its own system drive and about 200GB of data on the data drive). The data drives include network home directories and group shares - so it's just certain folders on the volume that are shared - the drives are HFS formatted, and ACLs are on.

 

Quote:

Have you tried splitting the problematic drive into Retrospect "subvolumes" for separate backup to try to narrow down where the problem area is?

 


 

I hadn't thought about trying this, but it is a good way to try to isolate the problem, if it is in fact related to particular folders (users/shares)... I may try this tomorrow also, but I may wait until next week, so that I can try to fix the particular problem (and maybe keep it from continuing).

 

If any of the above information helps you narrow down what I should try FIRST, let me know - thanks for the suggestions!

 

BC


Quote:

I don't know what RDU version I'm running and have spent too much time trying to find out where I can find the info - can you help me there?

 


Yes. That information appears in two places:

 

(1) Retrospect menu, "About Retrospect..."; and

(2) Retrospect log - version information is printed into the log each time Retrospect launches.

 

The RDU file is put into Retrospect's folder in the Applications folder. It provides driver updates/bug fixes and can also provide some Retrospect bug fixes.

 

Quote:

Data-volume wise, the server that Retrospect is hanging on is backing up TWO servers (the one that hangs it, and itself): the client server has a system drive and about 90GB of data on the data drive (the Retrospect server is backing up its own system drive and about 200GB of data on the data drive). The data drives include network home directories and group shares - so it's just certain folders on the volume that are shared - the drives are HFS formatted, and ACLs are on.

 


Sorry if my comment wasn't clear. It's the catalog size (number of files) that is the issue, which Retrospect has to chew on, not the volume of data being backed up. That's what the 2 GB comment was discussing. You might have gobs of small files on this volume rather than a much smaller number of large files on another volume. If you are towards the end of the school year, and these are home directories, then I bet you have lots of small files. Some size threshold might have been crossed for the catalog that is causing issues.
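One rough way to test the "lots of small files" theory is to count files per top-level folder on the data volume. What follows is only a minimal Python sketch, not anything Retrospect itself provides; the function name `count_files_by_top_folder` and the example path in the comment are hypothetical, and it would need to be run on the client machine against the real mount point.

```python
import os
from collections import Counter


def count_files_by_top_folder(root):
    """Return a Counter mapping each top-level folder under root to the
    number of files found anywhere beneath it. Files sitting directly in
    root are counted under the key "."."""
    counts = Counter()
    # os.walk does not follow symbolic links to directories by default,
    # so a symlink loop will not make this run forever.
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        top = "." if rel == "." else rel.split(os.sep)[0]
        counts[top] += len(filenames)
    return counts


# Example use (the path is an assumption; substitute the real volume):
#   for folder, n in count_files_by_top_folder("/Volumes/Data").most_common():
#       print(n, folder)
```

Folders with the highest counts would be the natural first candidates for their own subvolumes.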

 

Ok, the fact that ACLs are on for this volume may indicate that updating the client might fix things. The fact that only certain folders are shared, not the whole drive, will simplify things. HFS (not HFS+ ?) tells us that the format won't be foreign to the client, so there aren't issues there.

 

If I had to prioritize the things to try, it would be:

 

(1) update the Retrospect client - seems you have done that.

(2) update the RDU, if needed. I suggest RDU 6.1.9.102.

(3) subvolumes to try to track down the problem.

 

Note that subvolumes addresses two areas simultaneously - it reduces the number of files being sorted/analyzed, and it also isolates possible "long file pathnames" / "problematic files" / "problematic filenames" / "tangled filesystem hierarchies being navigated".

 

Consider, for example, cross-linked entries where ".." links to itself, or other similar filesystem issues.
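To hunt for the "long file pathnames" / "problematic filenames" cases, something like the sketch below could list suspicious entries before you carve up the volume. This is an illustration under assumptions: `find_long_paths` is a hypothetical helper, and the default thresholds are placeholders, not documented Retrospect limits. Lower them until the list is short enough to inspect by hand.

```python
import os


def find_long_paths(root, max_path_len=1024, max_name_len=255):
    """Yield (reason, path) for every file or folder under root whose
    name or full path is longer than the given limits. The limits are
    assumed placeholders, counted in characters."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if len(name) > max_name_len:
                yield ("name", full)
            elif len(full) > max_path_len:
                yield ("path", full)


# Example use (the path is an assumption; substitute the real volume):
#   for reason, path in find_long_paths("/Volumes/Data"):
#       print(reason, path)
```

This only flags length problems. It will not detect directory-structure damage such as the cross-linked entries mentioned above; that is a job for Disk Utility.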

 

Russ


Quote:

Thanks Russ - I checked both locations for RDU info, but as I said, there's really nothing to be found in either the log or the 'about retrospect' window... FWIW: my other XServe running Retrospect Server does show an RDU version, which is at 6.1.3.101

 


 

Ok, then it's clear that the "other" Xserve is running a very out of date RDU (again, I suggest RDU 6.1.9.102) and that this Xserve has no RDU installed (because no RDU version is reported), and one should be installed. Just put the applicable RDU (it will be titled "Retrospect 6.1 Driver Update") in the Retrospect folder in the Applications folder. The RDU updates are here:

RDU version history

 

Quit Retrospect, put in the RDU, launch Retrospect, check the log for RDU version.

 

Russ


Just thought I'd give an update here - I updated both my Retrospect Server machines to the RDU you suggested, Russ, updated my outdated client to the current version, trashed the catalog file for the week's rotation, and divided my client into subvolumes.

 

After a successful Recycle backup (which was what I was getting before), I am now three days into Normal backups w/o any hangups - of course, because I took multiple steps, I am not yet sure what the problem was exactly, but so far so good.

 

If I get through the entire week's cycle without the program crashing, I will probably trash next week's catalog file as well.

 

Thanks for the help!

 

BC


