Retrospect won't verify archives or backups

greggerman · October 13, 2009

I have a Quantum tape library and I'm having problems archiving with Retrospect. It gets through the writing portion fine, but it fails in the verification stage. I have tried re-verifying several times and it always fails. Basically, the verify will go up to a certain point, but never the same point, and then just stall. If I go look at the tape library, I can see that the activity light has stopped blinking, meaning the drive is no longer reading data. If I wait long enough, Retrospect will eventually error out and say that the verification failed and show which files were not verified. At this point, Retrospect can no longer communicate with the tape library. So I have to force quit Retrospect and reboot both the server and the tape library before they will communicate again.

Yes, more configuration details of course...

CONFIGURATION

Server: Xserve 2 x 2.0 GHz Dual-Core Intel Xeon, 4 GB RAM

OS: OS X Server 10.5.8

SCSI Card: ATTO ExpressPCI UL5D, Driver Version 4.40, Flash Bundle Version 2009_04_10

Retrospect version 6.1.230, Driver Update Version 6.1.16.100

Autoloader: Quantum SuperLoader 3 tape library (LTO-3 Half-Height SCSI Tape Drive), Library Firmware V61, Drive Firmware V2181

Note: All above versions are the latest available.

SPECIFICS OF ARCHIVE

I was attempting to archive about 800 GB of data from a fibre-attached Xsan 1.4.2 volume. The data definitely finished writing, but verification fails. I should also note that this is not an isolated incident. I noticed a little while back that my nightly backups were always failing in the verification stage. I didn't have time to deal with it at the time, so I just turned off verification in my backup script. But clearly, an archive cannot be considered complete without verification. This setup worked fine for a couple of years (with earlier versions of all of the drivers and software), but started behaving badly while I was addressing a hardware issue with the Quantum tape drive (which has since been replaced). In assessing the problem, I began upgrading software and drivers, and it has never worked properly since. I would like to revert to the last known working state, but I neglected to write down the firmware version of the tape drive, so I wouldn't know which version to revert to, even if Quantum made older versions available.

RETROSPECT LOG

+ Executing Verify at 10/8/2009 5:05 PM

To backup set Summer 09 Archive…

Trouble reading: “1-Summer 09 Archive” (2160271), error 205 (lost access to storage medium).

1140 files were not verified.

10/8/2009 7:21:35 PM: Execution incomplete.

Remaining: 1140 files, 687.0 GB

Completed: 2284 files, 131.9 GB

Performance: 992.8 MB/minute

Duration: 02:16:05 (00:00:07 idle/loading/preparing)

Quit at 10/8/2009 8:42 PM
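
As a rough sanity check on the figures in this log, the short Python sketch below recomputes the throughput from the completed size and the duration, and estimates how far the verify got before it stalled. It assumes Retrospect counts 1 GB as 1024 MB and that the reported rate covers the full duration; neither is confirmed here, and the variable names are only illustrative.

```python
# Figures copied from the Retrospect log above.
COMPLETED_GB = 131.9
REMAINING_GB = 687.0
DURATION = "02:16:05"
REPORTED_MB_PER_MIN = 992.8


def duration_minutes(hms: str) -> float:
    """Convert an HH:MM:SS string to minutes."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 60 + minutes + seconds / 60


# Assumption: Retrospect counts 1 GB as 1024 MB.
rate_mb_per_min = COMPLETED_GB * 1024 / duration_minutes(DURATION)

# Share of the archive (by size) that was verified before the stall.
fraction_verified = COMPLETED_GB / (COMPLETED_GB + REMAINING_GB)

# Time the remainder would need if the drive kept up the reported rate.
remaining_hours = REMAINING_GB * 1024 / REPORTED_MB_PER_MIN / 60

print(f"computed rate: {rate_mb_per_min:.1f} MB/minute")
print(f"verified before stall: {fraction_verified:.1%}")
print(f"estimated time for remainder: {remaining_hours:.1f} hours")
```

Under those assumptions the computed rate lands within about 1 MB/minute of the reported 992.8 MB/minute, so the log looks internally consistent, and the stall came roughly 16% of the way through the archive by size.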

SYSTEM LOG

Whenever I want to look at the tape library through Retrospect (Configure > Devices), I start to get a steady stream of the following message:

10/9/09 11:54:20 AM kernel SCSITaskUserClient - Invalid arguments: scatterGatherEntries = 1, requestedTransferCount = 0, transferDirection is 0

This continues until I leave that screen. The same message reappears about 5 or 6 times when I start the verify process in Retrospect. I haven't seen anything in the system logs that seems to correspond to verification failure.
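
One way to check whether these messages line up with the verify runs is to count them per minute and compare the timestamps against the Retrospect log. The Python sketch below is only an illustration: it assumes each log line starts with a timestamp in the same layout as the message quoted above, and every sample line except the first is made up for the example.

```python
from collections import Counter
from datetime import datetime

MARKER = "SCSITaskUserClient - Invalid arguments"
ARGS = "scatterGatherEntries = 1, requestedTransferCount = 0, transferDirection is 0"

# The first line is the message quoted above; the rest are hypothetical.
SAMPLE_LINES = [
    f"10/9/09 11:54:20 AM kernel {MARKER}: {ARGS}",
    f"10/9/09 11:54:21 AM kernel {MARKER}: {ARGS}",
    f"10/9/09 11:55:02 AM kernel {MARKER}: {ARGS}",
    "10/9/09 11:55:10 AM kernel some unrelated message",
]


def count_by_minute(lines):
    """Count SCSITaskUserClient messages, grouped by the minute they were logged."""
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        # Assumed layout: date, time, and AM/PM are the first three tokens.
        stamp = " ".join(line.split()[:3])
        when = datetime.strptime(stamp, "%m/%d/%y %I:%M:%S %p")
        counts[when.replace(second=0)] += 1
    return counts


for minute, total in sorted(count_by_minute(SAMPLE_LINES).items()):
    print(minute.strftime("%Y-%m-%d %H:%M"), total)
```

A burst of messages at the minute the verify started, followed by none near the time of the stall, would fit the observation that the messages do not seem to track the failure itself.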

WRAPPING IT UP

I have read on this forum that error 205 points to a SCSI issue. I have replaced both the SCSI cable and terminator, and even tried it without a terminator, all to no avail. I have also seen some things in the Retrospect 8 forum that mention the above SCSITaskUserClient error and that it had something to do with the Mac OS X 10.5.7 update. But I don't know if this also applies to Retrospect 6, or if it even has anything to do with my issue. I'm in quite a bind here, as my SAN is filling up and I need to remove some data, but I can't until I can verify that it has been archived. Any ideas or tips would be appreciated. Thanks!

rhwalker · October 13, 2009

A comment and then some possible avenues to investigate.

Because your issue doesn't happen at the same place, it's probably a race condition that is timing dependent and hard to reproduce with a repeatable test condition. Those are the worst. I'm assuming that you have good cables and active termination.

However, the error message that you are seeing doesn't seem to indicate such an issue, but instead seems to indicate that Retrospect is passing in bad parameters ("requestedTransferCount = 0") to the SCSI manager.
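
To confirm that every occurrence carries the same suspicious arguments, the fields can be pulled out of the message text. This is a minimal Python sketch based only on the single message quoted in the original post; the function name and the assumption that all occurrences follow the same wording are mine.

```python
import re

# Matches "name = 1" as well as the "name is 0" form used for transferDirection.
FIELD = re.compile(r"(\w+)\s*(?:=|is)\s*(\d+)")


def parse_arguments(message: str) -> dict[str, int]:
    """Extract the numeric fields that follow 'Invalid arguments:' in a log line."""
    _, _, tail = message.partition("Invalid arguments:")
    return {name: int(value) for name, value in FIELD.findall(tail)}


line = (
    "10/9/09 11:54:20 AM kernel SCSITaskUserClient - Invalid arguments: "
    "scatterGatherEntries = 1, requestedTransferCount = 0, transferDirection is 0"
)
fields = parse_arguments(line)
print(fields)

# Flag requests that describe a buffer but ask for zero bytes.
if fields.get("scatterGatherEntries", 0) > 0 and fields.get("requestedTransferCount") == 0:
    print("zero-length transfer requested with a buffer attached")
```

If every logged message parses to the same values, that would support the reading that a single code path is issuing these requests rather than a cabling fault producing random errors.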

In that case, the error is probably happening in some rarely-used Retrospect error recovery routine that hasn't had the opportunity to be fully tested. At this point, with programming effort focused on Retrospect 8, it's unlikely that Retrospect 6 can (or will) see any further fixes.

(1) Is the tape drive / autoloader on their own SCSI channel of the ATTO UL5D, away from disks?

We saw some strangeness with Retrospect 6 and our ATTO UL4D, not quite the same as yours (ours happened at the SCSI device scan when Retrospect gets ready to write): Retrospect seemed to cause a dropped disk transfer completion interrupt for a RAID 1 mirror of our boot volume that was on the same SCSI channel. The only solution was to put the tape drive and autoloader on their own SCSI channel, away from disk drives, and we have had no problem since. But our problem, which happened about once a month and was also hard to reproduce, caused the system to hang because of the dropped transfer completion on the boot volume, and you aren't seeing a hang of the disk subsystem.

(2) You might want to work with ATTO to adjust some parameters for the UL5D to change the timing enough so that the error recovery routine isn't triggered.

(3) If the suspicion is that a SCSI manager update somehow introduced or triggered this, a good test might be to get a spare boot volume, regress to an older Mac OS Server 10.5.x booted on that spare drive, see if the problem goes away.

(4) If the Quantum drive has some diagnostics, might be interesting to do some extended full-tape read/write tests over a weekend to see whether it's some sort of a thermal-triggered failure. That would take Retrospect out of the loop and test the SCSI and tape subsystem.

Russ

greggerman · October 13, 2009

(1) Is the tape drive / autoloader on their own SCSI channel of the ATTO UL5D, away from disks?

Our autoloader is the only SCSI device connected to the ATTO UL5D, so it is on its own channel.

(2) You might want to work with ATTO to adjust some parameters for the UL5D to change the timing enough so that the error recovery routine isn't triggered.

That sounds like a good idea. I'll try that soon.

(3) If the suspicion is that a SCSI manager update somehow introduced or triggered this, a good test might be to get a spare boot volume, regress to an older Mac OS Server 10.5.x booted on that spare drive, see if the problem goes away.

Now that I think about it, we started having this issue when the server was running OS X 10.4.11, so it's doubtful that 10.5.7 has anything to do with my problem.

(4) If the Quantum drive has some diagnostics, might be interesting to do some extended full-tape read/write tests over a weekend to see whether it's some sort of a thermal-triggered failure. That would take Retrospect out of the loop and test the SCSI and tape subsystem.

I would guess that this is not an issue with the Quantum drive. As I mentioned, we had the autoloader replaced with a new one after the issue surfaced, and it hasn't made any difference. Also, Quantum's diagnostic tools are Windows only (boo hiss!), so I can't run them anyhow.

Thanks for your suggestions, Russ!

rhwalker · October 13, 2009

(3) If the suspicion is that a SCSI manager update somehow introduced or triggered this, a good test might be to get a spare boot volume, regress to an older Mac OS Server 10.5.x booted on that spare drive, see if the problem goes away.

Now that I think about it, we started having this issue when the server was running OS X 10.4.11, so it's doubtful that 10.5.7 has anything to do with my problem.

OK. Simply FYI, I have never seen your specific issue (or the SCSI manager error) with Retrospect 6.x. We've got an Exabyte VXA-2 1x10 1U (SCSI) PacketLoader attached to an ATTO UL4D in our Xserve.

Russ

greggerman · October 14, 2009

In that case, the error is probably happening in some rarely-used Retrospect error recovery routine that hasn't had the opportunity to be fully tested. At this point, with programming effort focused on Retrospect 8, it's unlikely that Retrospect 6 can (or will) see any further fixes.

With this in mind, I decided to install a trial version of Retrospect 8 on my server and redo the archive. It worked without a hitch! This indicates to me that Retrospect 6 is the problem. And since development on v6 is dead, it's probably time to upgrade to v8. There may be a way to get v6 to work again, but I'm tired of fighting with it...

rhwalker · October 14, 2009

Congrats. Life might be better when R8 stabilizes and matures. Would be nice if there were a manual...
