brising Posted July 8, 2003

The last two times I've tried backing up my system, Retrospect has backed up a non-system volume without trouble, backed up the system volume, and then hung when it went to compare the system volume, just sitting there with "Comparing Malcom..." (Malcom is my OS X boot volume) and 0 files completed. The first time was over the last long weekend, when it sat in that state for three and a half days. Now it has been doing the same for the last 3 hours and 45 minutes.

During the hang, the indicator lights on the VXA-1 FireWire drive are in a state that is not described in the user manual. After I abort the backup, the tape drive gives the signal for 'format recovery'. After recovery, Retrospect claimed the first time that the media were unrecognizable, but had no trouble the second time.

The first time it occurred, I thought it was a hardware problem (bad tape), since the tape was seemingly unusable after I quit Retrospect. The fact that it has now stopped at exactly the same point during both backups, on two different tapes, has me thinking that Retrospect is linked to the problem. What is curious is that the problem is new, and existed both before and after I updated the drivers. My suspicion is that there is a file on the system volume which is causing Retrospect to hang.

A quick peek at top shows that Retrospect is indeed thinking quite hard: (note that the above output didn't show on preview... perhaps it shows on your machine, perhaps it does not.)

The %CPU stays high even after the backup has been aborted. Looking at the devices, Retrospect claims that the tape drive is busy (even though it isn't really doing anything).

System:
Mac OS X 10.2.6 on a dual 800 G4 Quicksilver (the 49% above is from one of the two processors)
Retrospect version 5.0.238
Driver Update 3.6.106 (had the problem with 3.5.104, too)
VXA-1 FireWire tape drive with firmware 2EAE
Tape drive cleaned before the first of the two hangs

Any and all help would be appreciated. I would also be sending this message to the folks at Ecrix (Exabyte), but it appears they are relocating and inaccessible.

Thanks,
Bill
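Since the top output did not survive in the post, here is a minimal sketch of one way to capture the same information as text that can be pasted into a message. It assumes a Python interpreter is available on the machine and uses the POSIX `ps` options `-A` and `-o`; the function names and the sample process path are hypothetical, and only the 49% figure comes from the post.

```python
import subprocess


def parse_ps(output, name):
    """Return (pid, cpu_percent) pairs for commands containing `name`.

    `output` is expected to hold one process per line in the form
    "<pid> <%cpu> <command>", with no header line.
    """
    matches = []
    for line in output.splitlines():
        parts = line.split(None, 2)  # the command may itself contain spaces
        if len(parts) < 3:
            continue
        pid, cpu, command = parts
        if name.lower() in command.lower():
            matches.append((int(pid), float(cpu)))
    return matches


def sample_cpu(name):
    """Run ps once and report CPU use for processes matching `name`."""
    # Separate -o options with a trailing "=" suppress the header portably.
    result = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "pcpu=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout, name)


# Hypothetical sample text, used here instead of a live ps call.
sample = "  412  49.0 /Applications/Retrospect 5.0/Retrospect\n    1   0.0 /sbin/init\n"
print(parse_ps(sample, "retrospect"))
```

Calling `sample_cpu("Retrospect")` every minute or so during a suspected hang would show whether the CPU use stays pinned while the file counter does not move.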
shadowspawn Posted July 9, 2003

On the corrupt-file theory, you could split the backup of the boot volume in half using subvolumes and see whether 0, 1, or 2 of the resulting backups hang. This is a direct way of narrowing down or eliminating that as a cause.

I don't have any suggestions on the hardware side, other than using that cleaning tape again. I have seen a few similar (but rare) failures, and I associate the CPU use with a hardware/communication failure. (Retrospect seems to poll the hardware aggressively, and burn the CPU waiting for a reply.)

A few more details that might be useful:
How much RAM does your computer have?
Total number of files and size on the good (non-system) volume?
Total number of files and size on the bad (system) volume?
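The split-in-half idea above is a bisection, and can be sketched roughly as follows. This is only an illustration of the reasoning: `hangs` is a hypothetical callback standing in for "run a backup of just these subvolumes by hand and note whether it hung", and the folder names in the example are placeholders.

```python
def narrow_down(subvolumes, hangs):
    """Bisect a list of subvolumes to find the one whose backup hangs.

    `hangs(group)` is a hypothetical callback that returns True if a
    backup of only `group` hung. Returns the single suspect subvolume,
    or None when the hang does not follow one half of the files
    (neither half hangs, or both do), which points away from the
    corrupt-file theory. A one-item list is returned without testing.
    """
    candidates = list(subvolumes)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        halves = [candidates[:mid], candidates[mid:]]
        hanging = [half for half in halves if hangs(half)]
        if len(hanging) != 1:
            return None  # 0 or 2 halves hang: not tied to one set of files
        candidates = hanging[0]
    return candidates[0] if candidates else None


# Example with a simulated bad subvolume (placeholder names).
folders = ["Applications", "Library", "System", "Users"]
print(narrow_down(folders, lambda group: "Library" in group))
```

Each round costs two test backups, so four subvolumes need at most four runs; if both halves hang at any stage, the cause is more likely hardware or communication than a particular file.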
brising Posted July 9, 2003 (Author)

Splitting the boot volume is a good idea (which brings back bad memories of extension conflicts in the pre-Conflict Catcher world...). I'll have to try that when I have the time to fiddle with the backups.

As for other hardware details:
RAM: 1 GB (2 x 512 MB)
Files on successful 'drive' (really a subvolume): just 9 (it was some Virtual PC disks)
Files on boot drive: 206,196

Now, the incremental backup ran just fine on the problem boot volume last night, and had no trouble starting the compare with the user partition (with 128,193 files).

Other items of possible interest: both tapes with which Retrospect had trouble were newly erased members of old sets. Of course, this is probably meaningless, since there has never been trouble with such newly erased members in the past.

Bill
ghoffman Posted July 9, 2003

What WAS the state of the indicator lights on the VXA drive when the backup hung? Was the fourth light red?

What size tapes are you using when the hang occurs? V6, V10, V17? Are the tapes near full when the hang occurs? How many gigabytes of information are you backing up from the various volumes?

You said "just sitting there with a "Comparing Malcom...". Then you said "After aborting the backup, the tape drive gives the signal for 'format recovery'." My understanding is that format recovery will occur on a VXA drive only if 1) the drive is in write mode and 2) power is removed from the drive. Did you turn off the tape drive as part of your "aborting the backup"? If so, why? What else did you have to do to abort the backup and recover normal operation of your backup machine?

In any case, the format recovery suggests that Retrospect was not in the compare phase of the backup, but was still wrapping up the copy phase.

Have you tried running Disk First Aid on the troublesome volume?

Glenn
brising Posted July 9, 2003 (Author)

State of indicator lights: off off amber green
Usual state of lights when writing to tape: off off amber blinking green
So... perhaps Retrospect was trying to write a close block (or whatever it would be called) the whole time. What I don't understand is that the dialog box said "comparing", which comes after it closes off the write.

Size of tape: V17
Space taken: hardly any (no, the tapes were not very full).

Size of backups:
Failed system volume: 4.5 GB
Successful small number of files: 4.0 GB
Successful user volume: roughly 20 GB (and spanned tapes, due to intervening server backups).

Turn off tape drive? First time - yes, second time - no. I turned it off the first time because the drive had been sitting for 3 days doing nothing. When the previous VXA-1 drive failed, this was the first action suggested by tech support. It is also what the user manual suggests to clear a failed self-test.

Phase of Retrospect: it does appear that the write was not complete even though the Comparing dialog was up. This is troublesome.

Disk First Aid: Oops. I have not done that. Good suggestion.

-- Bill
brising Posted July 9, 2003 (Author)

Addendum: The startup volume is checked on boot. What with OS X being Unix, I'm not used to restarting. So... I'll have to restart the machine before I start experimenting with backups of the boot drive.
ghoffman Posted July 10, 2003

I interpret the indicator lights showing off off amber green (4th flashing or not) as the VXA drive being in "write mode". "Write mode" is just my own term from experience: I don't know that Exabyte calls it that. The 4th light gets brighter during the moments when data is actually being transferred to or from the drive, causing a flash. When a hang occurs, no data is being transferred and the light is dim.

Given that the drive was in write mode the first time the backup hung, cycling power on the drive was your only recourse. I am puzzled that you did not have to turn the drive off when the backup hung the second time. Did the drive show a different set of indicator lights with the second hang?

Seeing the "Comparing ..." dialog up while the drive was still in write mode suggests that Retrospect has gotten ahead of the drive, i.e., a communication problem. I cannot begin to say whose fault that might be. It could be Retrospect, the Retrospect driver, Mac OS X, cables or other hardware, the VXA drive, or the VXA firmware.

I went through about a year of hangs almost like yours (I saw an "Updating catalog..." dialog rather than "Comparing..."), but the rest is essentially the same. Yet, in the last month, I have been free of the hangs (knock on wood). My backups now proceed much like they did in Mac OS 9, before I converted to Retro 5 / Mac OS X. I don't know what has changed that "suddenly" eliminated the hangs, but here are some theories:

1) Continuing updates to software: Retrospect, Mac OS X, VXA, and SCSI (yeah, SCSI is my dark little secret, as most VXA users have FireWire). I have all the latest, including the 5.1 client software.

2) I recently redid my tape drive hardware setup. That is, I "loaned out" the drive for some experiments on another host, then restored the drive and cable to the backup host computer. Perhaps the physical disturbance, or the communication with different software and hardware, altered some aspect of my backup configuration.

Besides any ideas inspired by the above discussion, I think your best bet is to configure your backup script to back up by subvolumes, as suggested in earlier posts. This may point to a trouble spot in a subvolume or, as in my case, you may find that the backup completes without error. In my case, after backing up all the subvolumes, I would back up the volume as a whole to get anything not in the subvolumes. I ran like that for many months, with only a rare hang, until my recent discovery that this workaround was no longer needed.
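The observation that a hang shows up as "no data being transferred" while the dialog stays up suggests a simple stall check. The sketch below assumes some way of recording a progress counter over time (for instance, noting the files-completed figure by hand every so often); nothing in the thread says Retrospect exposes such a counter programmatically, so the sample data and the function name are hypothetical.

```python
def first_stall(samples, timeout):
    """Find the first stall in a series of progress readings.

    `samples` is an iterable of (timestamp_seconds, files_completed)
    pairs in time order. A stall is reported once the counter has not
    changed for at least `timeout` seconds. Returns the timestamp of
    the last observed change before the stall, or None if progress
    never stalled for that long.
    """
    last_value = None
    last_change = None
    for timestamp, value in samples:
        if last_value is None or value != last_value:
            # First reading, or the counter moved: reset the clock.
            last_value, last_change = value, timestamp
        elif timestamp - last_change >= timeout:
            return last_change
    return None


# Hypothetical readings: a compare stuck at 0 files versus a healthy one.
stuck = [(0, 0), (600, 0), (1200, 0), (1800, 0)]
healthy = [(0, 0), (600, 5200), (1200, 11800), (1800, 19950)]
print(first_stall(stuck, timeout=1500))
print(first_stall(healthy, timeout=1500))
```

A threshold of tens of minutes would have flagged the hangs described here long before a whole night or weekend went by; the right value depends on how long the drive legitimately pauses, which the thread does not establish.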
brising Posted July 11, 2003 (Author)

Quote: I interpret the indicator lights showing off off amber green (4th flashing or not) as the VXA drive being in "write mode". Write mode is just from my experience: I don't know that Exabyte calls it that. The 4th light gets brighter during the moments when data is actually being transferred to/from the drive, causing a flash. When a hang occurs no data is being transferred and the light is dim.

Yes, this is what I've seen too, but somehow I never connected the not-blinking to hanging. Thanks for making the connection for me.

Quote: I am puzzled that you did not have to turn the drive off when the backup hung the second time. Did the drive show a different set of indicator lights with the second hang?

No, it was still the same set of lights. In both cases, I ended up shutting off the drive for 10 seconds. The only difference was in the timing of when I aborted the backup. When I simply aborted Retrospect without cycling the power to the drive, Retrospect seemed to think there was no tape in the drive. I don't remember what happened when I first turned off the power to the drive.

Thanks again for all the input. The subvolumes-and-full-volume trick is pretty tricky, but it could really be the secret to getting things to work. I'll first see if the backup croaks on the next recycle media backup.

-- Bill
brising Posted July 29, 2003 (Author)

Hmm... another recycle backup, another hang on the boot drive. Just so folks can see what Retrospect was displaying: Notice that the drive sat the entire night without doing anything.

I guess I'll have to play the subvolume game, though it seems to be a colossal waste of time. Does anyone have any other hints?
brising Posted March 14, 2005 (Author)

The hang-on-compare problem returned and was vanquished by the firmware upgrade for my VXA-1 tape drive from earlier last year. Still, I figured I'd post the details, in case anyone else runs into the problem.

Here were the details of the hang-on-compare this time:

The hang happened with a non-boot drive, so repairing permissions couldn't have helped. When I canceled the hung backup, Retrospect sat for a good while, say >10 minutes, then canceled. If I tried to look at the devices through Retrospect, the correct tape member showed up in the correct tape drive, BUT Retrospect sat. Clearly, there must have been some sort of communication problem. Shutting off the tape drive caused Retrospect to go back to normal behavior (which is better than force-quitting Retrospect).

Here's what I did this time:
+ Cleaned the tape (again) (result: nothing good)
+ Verified the drives on the machine using Disk Utility (nothing)
+ Zapped the PRAM (nothing)
+ Unplugged FireWire and turned off the tape drive, rebooted (nothing)
+ Installed the latest firmware (11100), since this apparently can fix communication hangs (fixed!)

Also: I held in the eject button to see what the last error codes were, but this was likely immaterial.

Problem fixed, for another year and a half. It seems that the common denominator between these two incidents was that they both happened after cleaning the tape heads. Whatever.

My system is slightly different now, in case it means anything:
Mac OS X 10.3.8 on a dual 800 G4 Quicksilver
Retrospect version 5.1.177
Driver Update 4.3.103
VXA-1 FireWire tape drive with firmware 2EAE (before the hangs), 1110 (after)

-- Bill