
Chunk checksum errors



Hi

 

I have a problem that I haven't been able to find a satisfactory answer to in the forums although almost the same question has been asked before.

 

I am running Retrospect Server 5.0.238 in a mixed environment (Windows and Mac, all versions). I have about 20 clients on a 100 Mbit switched network. The server is a Power Mac G4 1 GHz with 768 MB RAM and Mac OS X 10.2.6 with all the latest patches. The tape drive is a new LaCie FireWire AIT-2. The server is dedicated to Retrospect and no other third-party software has ever been installed on it.

 

I am rotating sets each week, backing up every day. There are 3 sets (A, B and C) of 3 tapes all in all.

 

The problem turns up when the tapes are rotated for the second or third time. The server scans the client, and during matching it gives a chunk checksum error (-24042) and skips along to the next client, which gets the same error.

 

I have tried searching the catalogs for files and I get the same error. Assuming the catalog was corrupt, I rebuilt the catalogs, but this only buys me another week or so. I have also checked the HD (the standard built-in ATA drive) with Disk Utility, and it found nothing.

 

So according to Technote 307 I have a chronic chunk problem. Does anyone have any idea about what could reasonably be the cause of this? I can't figure out what could be wrong with my hardware setup.

 

Any suggestions would be much appreciated!

 

 

 

 


Hi

 

Pardon the remedial question, but how often does this machine get rebooted?

Any chance you could save the catalog files on a USB or FireWire disk as a test? That would rule out your hard drive as a possible point of failure.

 

Nate


  • 2 weeks later...

Thanks for your reply Nate,

 

The machine isn't rebooted regularly (maybe twice a month on average). Of course, I could schedule regular (weekly) reboots just in case.

 

I don't have access to an external drive right now, but I can have a look around. It just seems so unlikely that there would be a problem with a standard built-in new HD.

 

Also, I tried verifying 2 sets. One verified OK, and that set doesn't show any chunk errors during backup either. The other one stopped verifying halfway through the first tape (1 of 3) with a chunk checksum error. So I guess this means that something isn't being written correctly to the tape.

 

So, is there a possibility that my (also brand new) tape drive is faulty, or my HD, or both?

 

Grateful for any suggestions anyone might have

 

Eric


Hi Eric,

 

I had the same problem a while back, using OS X and an 11-tape LTO system with a B&W G3 running it.

 

The checksum error seemed to move around and I couldn't narrow it down. I rebuilt catalogs and tried several other things without luck. Finally I installed Retrospect on a different machine and the problem went away. In my case it was the hard drive.

 

Perhaps if I had formatted and started again with the G3 it would have worked, but I had a better machine available so I did not go back and test it out.

 

If you have another machine to save your catalogs to, you may want to give that a try, even though a disk fault seems unlikely because your machine is new.

 

You may have already seen this Knowledge Base article, but if not, it has some good info in it:

Article #26641

 

'.... If only with one, rebuild the catalog as noted above in "Matching During a Backup", or simply start a new backup set. If the problems affect more than one backup set, you may have a hardware or system configuration problem that is causing repeated corruption of the backup set catalogs stored on your hard disk. We have seen these problems caused by specific failing hard disk. '

 

 

good luck

janet


Hi

 

 

 

I suspect the problem is more likely the IDE bus itself than the drive. That is why I suggested the USB drive. Sometimes a bus reset will throw things out of whack.
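One way to test the disk or bus theory without extra hardware is to check whether a catalog file changes on disk while Retrospect is idle. The sketch below is only an illustration under that assumption and is not a Retrospect feature: `file_digest` and `changed_while_idle` are hypothetical helper names, and the catalog path is whatever file you choose to watch. Record a digest right after a backup finishes, then recompute it just before the next run. Note that a recently read file may be served from the OS cache, so a clean result does not fully clear the hardware.

```python
import hashlib


def file_digest(path, block_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size blocks so large catalog files are not loaded at once.
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def changed_while_idle(path, recorded_digest):
    """True if the file no longer matches the digest recorded earlier."""
    return file_digest(path) != recorded_digest
```

If the digest differs although nothing has written to the catalog in between, the storage path (drive, bus or cable) is the more likely suspect than the backup software.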

 

 

 

You may also want to try a different FireWire cable and another FireWire port on the machine. If you haven't already, make sure you have the latest driver update too.

 

 

 

Nate


Thanks for the help so far...

 

I couldn't get my hands on an external drive of any kind, so I put a new ATA disk on the ATA 66 bus and am saving the catalogs to it now. The original drive is on the back bay ATA 100 bus, so if there is anything wrong with that bus I will have avoided it.

 

Also, I updated to the latest driver for 5.0. I was 1 version behind.

 

It will take a few days for the error to show up, but I will let you know what happens. Hopefully it will all work.

 

/eric


  • 2 weeks later...

Hi!

 

The chunk errors came back over the weekend.

 

According to the log, it looks like the error popped up during a normal backup. The server finished a client OK and went on to the next one on the list, which happened to be the backup server itself. Every backup since then has finished with a chunk checksum error.

 

I am attempting to rebuild the catalog for the particular set to determine if the tape is corrupted. I will return with the results of this soon.

 

Also, the server had crashed due to an elem.c -817 error, so I gather that I have to fix the HD on the offending client?

 

 

BTW: Is there a clever way to cycle the log files so they don't get so big?

 

/eric


Hi, thanks again for help so far.

 

The catalog rebuild seemed to do the trick. The question is for how long. I did notice 3 clients (out of 20) that got chunk checksum errors but backed up OK the second time Retrospect polled around. Is this normal?

 

The client with the elem.c error backed up OK today, without any intervention from me. I haven't had time to run any utilities on that machine. I will keep an eye on it though.

 

About the logs: what I was really looking for was a way to roll the log files and archive them in case I want to look at them later. But that isn't a big deal. I am more concerned about the checksum errors.
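On rolling the logs: one possible approach is a small script run from a scheduler while Retrospect is not running. The sketch below is an assumption and not a built-in Retrospect feature; `rotate_log`, its parameters and any paths passed to it are hypothetical. It copies the log into a timestamped gzip archive, empties the original, and keeps only the newest few archives.

```python
import gzip
import shutil
import time
from pathlib import Path


def rotate_log(log_path, archive_dir, keep=8):
    """Archive log_path as gzip in archive_dir, then truncate the original.

    Returns the archive path, or None if there was nothing to rotate.
    keep is the number of newest archives to retain (must be >= 1).
    """
    log_path = Path(log_path)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    if not log_path.exists() or log_path.stat().st_size == 0:
        return None

    # Timestamped names sort chronologically when sorted as text.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = archive_dir / f"{log_path.name}.{stamp}.gz"
    with open(log_path, "rb") as src, gzip.open(archive, "wb") as dst:
        shutil.copyfileobj(src, dst)

    # Empty the live log only after the archive has been written.
    with open(log_path, "wb"):
        pass

    # Delete all but the newest `keep` archives.
    archives = sorted(archive_dir.glob(f"{log_path.name}.*.gz"))
    for old in archives[:-keep]:
        old.unlink()
    return archive
```

A call would look like `rotate_log("/path/to/log", "/path/to/archive")`, where both paths are placeholders for wherever the log actually lives on the server.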

 

So right now things seem to be OK, but I have the feeling that these checksum errors will be back, in which case I'll write about it here.

 

/eric


Hi, and sorry to return with problems again....

 

The chunk checksum errors are back. I had a set running OK doing daily incremental backups, and the second time I switched to it, about 2 days into that rotation (i.e. it had been in use for a total of 9 days), all the attempted backups returned chunk checksum errors as before.

 

This backup set is on 2 tapes. I first did a catalog update, which completed OK but didn't clear the errors.

 

Then I did a rebuild from tape, which has solved the problem (for today).

 

As far as I understand, the data on the tapes is OK, since catalog files can be rebuilt from it. So if the problem is in the catalog file, what should I do now? I have already tried a different disk on a separate ATA bus, so I don't believe that the fault is with the disk either.

 

Any ideas would be greatly appreciated...

 

 

 

 


