
Error: chunk checksum didn't match


I have been wrestling with this error for weeks now and finally seem to have found a root cause that may have uncovered a bug.


The original error I was receiving was:


'Trouble matching DRIVE-VOLUME on SYSTEM-NAME to BACKUP-SET, error -641 ( chunk checksum didn't match)'


This error began appearing on every system and volume on our network. I was unable to perform *any* backups, even after recycling my backup sets and eventually completely wiping my ProgramData\Retrospect config directory and starting from scratch; same end result. At that point I concluded that something was wrong with my clients or the data on them. By backing up each system manually and watching each volume's status carefully, I found that the backup stops and reports the above error while backing up a 16GB contiguous data file. I have backed up VMDKs and other large files before, but never one 16GB in total size. This particular file, once added to the catalog index, completely broke my backups. After I removed it from the system and recycled my backup set and catalog, I was able to resume our network backups successfully.


I'm not sure how or why the size or contents of this particular file corrupted the catalog to the point that matching could no longer be performed, but there it is.
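One purely speculative way a single very large file could break checksum matching: any fixed-width offset or length field silently wraps once a file crosses 4 GiB. The sketch below is a hypothetical illustration only; the field width and the wrap behavior are assumptions, and nothing here reflects Retrospect's actual catalog format.

```python
# Hypothetical: a catalog that records chunk offsets in a fixed 32-bit
# field would wrap past 4 GiB, so a checksum lookup for a late chunk of
# a 16 GB file would land on the wrong data and mismatch.

def stored_offset(true_offset: int, width_bits: int = 32) -> int:
    """Offset as a fixed-width field would record it (wraps at 2**width_bits)."""
    return true_offset % (1 << width_bits)

# A chunk that really begins 5 GiB into the file...
true_offset = 5 * 2**30
wrapped = stored_offset(true_offset)
print(wrapped == 2**30)  # True: recorded as if it began at 1 GiB
```

Files under 4 GiB would never trigger the wrap, which would fit the observation that VMDKs and other large-but-smaller files backed up fine.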


Mayoff, have you or the team seen any catalog matching issues when backing up large files like this?






Our Retrospect Server Config:

Windows Server 2008 R2 SP0 x64 running Retrospect v7.7.325 (64-bit edition), SAS LTO-3 Loader


(2 weeks later...)

I have since had this problem resurface. With no changes to the config, all my backups are now failing with the same error as above.


(BTW, good feedback, thanks Robert.)


I'm reluctant to rebuild my entire config and client DB *again*. Is this possibly an application bug? What more information is available about "error -641 (chunk checksum didn't match)"?







Yes, we set all Debug Logging (Foundations, User Interface, Application Logic, Devices, Engine, Trees/Volumes, Backup Sets/Catalog Files, Networking, Retrospect API) to the max: 7

as well as 'Execution: Disable Fast Matching'


Same result:

enotProgEvent: 'MaiS'= 0, msg = Trouble matching System (C:) on HOST to Tape Set, error -641 ( chunk checksum didn't match)


Our support case has not produced any resolution either. We have just rebuilt a new server from scratch with a fresh Retrospect config. The first backup run on all clients completes successfully, but every client fails on subsequent operations. It appears that once the catalog contains a single copy of the files, file/catalog 'matching' is unable to function; disabling 'matching' allows the operation to complete successfully.
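For what it's worth, that pattern (first run succeeds, every later run fails) is consistent with the failure living in the compare path rather than the copy path. A rough sketch of how file-level matching behaves; the catalog key fields, function name, and error text here are entirely made up for illustration, not Retrospect's actual schema:

```python
# Hypothetical sketch of incremental "matching": a file already in the
# catalog is skipped, but only after its recorded checksum is verified.
catalog = {}  # (name, size, mtime) -> checksum captured at first backup

def needs_backup(name, size, mtime, checksum):
    key = (name, size, mtime)
    recorded = catalog.get(key)
    if recorded is None:
        catalog[key] = checksum   # first run: nothing to match against
        return True
    if recorded != checksum:      # later runs exercise the compare path
        raise RuntimeError("error -641 (chunk checksum didn't match)")
    return False                  # matched: skip copying this file
```

On the first run `needs_backup` never compares anything, which would explain why a fresh catalog always works exactly once.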


We are extremely disappointed that de-duplication in this product may not work for us on 2008 R2 until a major new rev.


Thanks again~


Just a few suggestions for you to test so as to eliminate the possibility of something other than Retrospect:


(1) You haven't provided any specifics about where the catalog and backup set are stored (local to the Retrospect machine? on a NAS? etc.). Even though you did indicate in your configuration data that you have a SAS LTO-3 autoloader, it's unclear to me whether the backup set is stored on that LTO-3 or instead is a disk-based backup set (but your error message indicates that a Tape backup set is involved). Perhaps the storage medium itself has issues with large transfers. Have you investigated firmware and driver updates for the LTO-3 and SAS infrastructure?


(2) It's possible, if the backup set or catalog is on a NAS or some network share, that you might be seeing network errors not caught by packet checksums if the packets are huge, as might happen if you have "jumbo packets" enabled, or if some piece of network infrastructure is problematic at your site.


(3) It's also possible, if the backup set or catalog is on a RAID 5 device, that you've got some obscure corruption issues with the RAID 5. I have seen hard-to-troubleshoot instances of RAID-5 corruption caused by RAID controller bugs. Have you investigated firmware or driver updates, etc., if this might be the case?


(4) And, of course, there's always the remote possibility of memory errors. Do you have ECC on your server? If so, have you checked the logs to see if there might be ECC errors occurring?
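Point (2) can be tested fairly cheaply: compute an end-to-end digest of a large test file on both sides of the wire and compare. TCP's 16-bit checksum can miss multi-bit corruption that a strong digest will catch. A minimal sketch (path and buffer size are arbitrary choices):

```python
import hashlib

def sha256_file(path: str, bufsize: int = 1 << 20) -> str:
    """End-to-end digest; compare the result on source and destination."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            h.update(block)
    return h.hexdigest()
```

If the digests of a multi-gigabyte file differ after a copy across the suspect network path, the problem is below Retrospect.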


Agreed, it looks like it's a Retrospect issue from your thorough troubleshooting reports above (and complete, intelligent troubleshooting reports like yours are rare but always appreciated), but it might be a case of Retrospect being the "canary in the coal mine". Just some thoughts for you to consider in your testing.




Good points Russ, answers below:


The backup set is a Tape Backup Set (LTO-3). The catalog is stored on direct-attached storage on the Retrospect server, same as it has been since v7.5. It is currently set with hardware data compression, catalog file compression (tried both compressed and "don't compress"), and fast catalog file rebuild. We have 13 members in use, with 117 sessions and 34 snapshots. The catalog file is 1.44GB.


I have the most current firmware on both the SAS controller and the tape drive; I have not tried a new SAS controller.


No jumbo-packets enabled on our gigabit infrastructure.


The backup catalog file is local to the server, on direct-attached RAID 5 storage. There are no other problems with our disk-based backup sets or other data stored on the DAS. Yes, the server memory is ECC; there are no memory errors, relevant log events, or other indications that we are experiencing any hardware issues at all.




Thanks again for the feedback~


We are extremely disappointed that de-duplication in this product may not work for us on 2008 R2 until a major new rev.


What does the problem you're describing have to do with deduplication? That term is typically used to mean block-level deduplication, which is not a feature of Retrospect.


By "de-duplication" I mean Retrospect's implementation of file "Matching".


When the file "Matching" feature is disabled, the backups complete successfully, but that unfortunately causes a massive increase in backup media usage.
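The media cost is easy to estimate: with matching off, every run writes a full copy, while with matching on, only the first run is full and later runs write just the changes. The numbers below are purely illustrative (assumed dataset and per-run change sizes, not ours):

```python
dataset_gb = 500   # assumed total client data
delta_gb = 10      # assumed change per run
runs = 30

with_matching = dataset_gb + (runs - 1) * delta_gb  # one full + increments
without_matching = runs * dataset_gb                # a full copy every run
print(with_matching, without_matching)  # 790 15000
```

Even with modest assumptions, a month of unmatched backups consumes roughly twenty times the media.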


Obviously we want to use this feature, as we did successfully before upgrading to 2008 R2.


