
Error: chunk checksum didn't match


I have been wrestling with this error for weeks now and finally seem to have found a root cause that may have uncovered a bug.


The original error I was receiving was:


'Trouble matching DRIVE-VOLUME on SYSTEM-NAME to BACKUP-SET, error -641 ( chunk checksum didn't match)'


This error began appearing on every system and volume on our network. I was unable to perform *any* backups, even after recycling my backup sets and eventually completely wiping my ProgramData\Retrospect config directory and starting from scratch; same end result. At that point I concluded that something was wrong with my clients or the data on them. By backing up each system manually and watching each volume's status carefully, I found that the backup stops and reports the above error while backing up a 16GB contiguous data file. I have backed up VMDKs and other large files before, but never one 16GB in total size. This particular file, once added to the catalog index, completely broke my backups. After I removed it from the system and recycled my backup set and catalog, I was able to resume our network backups successfully.


I'm not sure how or why the size or contents of this particular file corrupted the catalog to the point that matching could no longer be performed, but there it is.
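One purely speculative way a single very large file could break checksum matching: any fixed-width offset or length field silently wraps once a file crosses 4 GiB. The sketch below is a hypothetical illustration only; the field width and the wrap behavior are assumptions, and nothing here reflects Retrospect's actual catalog format.

```python
# Hypothetical: a catalog that records chunk offsets in a fixed 32-bit
# field would wrap past 4 GiB, so a checksum lookup for a late chunk of
# a 16 GB file would land on the wrong data and mismatch.

def stored_offset(true_offset: int, width_bits: int = 32) -> int:
    """Offset as a fixed-width field would record it (wraps at 2**width_bits)."""
    return true_offset % (1 << width_bits)

# A chunk that really begins 5 GiB into the file...
true_offset = 5 * 2**30
wrapped = stored_offset(true_offset)
print(wrapped == 2**30)  # True: recorded as if it began at 1 GiB
```

Files under 4 GiB would never trigger the wrap, which would fit the observation that VMDKs and other large-but-smaller files backed up fine.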


Mayoff, have you or the team seen any catalog matching issues when backing up large files like this?






Our Retrospect Server Config:

Windows Server 2008 R2 SP0 x64 running Retrospect v7.7.325 (64-bit edition), SAS LTO-3 Loader


(2 weeks later...)

I have since had this problem resurface. With no changes to the config, all my backups are now failing with the same error as above.


(BTW, good feedback, thanks Robert.)


I'm reluctant to rebuild my entire config and client DB *again*. Is this possibly an application bug? What more information is available about "error -641 (chunk checksum didn't match)"?







Yes, we set all Debug Logging (Foundations, User Interface, Application Logic, Devices, Engine, Trees/Volumes, Backup Sets/Catalog Files, Networking, Retrospect API) to the max: 7

as well as 'Execution: Disable Fast Matching'


Same result:

enotProgEvent: 'MaiS'= 0, msg = Trouble matching System (C:) on HOST to Tape Set, error -641 ( chunk checksum didn't match)


Our support case has not produced any resolution either. We have just rebuilt a new server from scratch with a fresh Retrospect config. The first backup run on all clients completes successfully, but every client fails on subsequent operations. It appears that once the catalog contains a single copy of the files, file/catalog 'matching' is unable to function; disabling 'matching' allows the operation to complete successfully.
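For what it's worth, that pattern (first run succeeds, every later run fails) is consistent with the failure living in the compare path rather than the copy path. A rough sketch of how file-level matching behaves; the catalog key fields, function name, and error text here are entirely made up for illustration, not Retrospect's actual schema:

```python
# Hypothetical sketch of incremental "matching": a file already in the
# catalog is skipped, but only after its recorded checksum is verified.
catalog = {}  # (name, size, mtime) -> checksum captured at first backup

def needs_backup(name, size, mtime, checksum):
    key = (name, size, mtime)
    recorded = catalog.get(key)
    if recorded is None:
        catalog[key] = checksum   # first run: nothing to match against
        return True
    if recorded != checksum:      # later runs exercise the compare path
        raise RuntimeError("error -641 (chunk checksum didn't match)")
    return False                  # matched: skip copying this file
```

On the first run `needs_backup` never compares anything, which would explain why a fresh catalog always works exactly once.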


We are extremely disappointed that de-duplication in this product may not work for us on 2008 R2 until a major new rev.


Thanks again~


Just a few suggestions for you to test so as to eliminate the possibility of something other than Retrospect:


(1) You haven't provided any specifics about where the catalog and backup set are stored (local to the Retrospect machine? on a NAS? etc.). Even though you did indicate in your configuration data that you have a SAS LTO-3 autoloader, it's unclear to me whether the backup set is stored on that LTO-3 or instead is a disk-based backup set (but your error message indicates that a Tape backup set is involved). Perhaps the storage medium itself has issues with large transfers. Have you investigated firmware and driver updates for the LTO-3 and SAS infrastructure?


(2) It's possible, if the backup set or catalog is on a NAS or some network share, that you might be seeing network errors not caught by packet checksums if the packets are huge, as might happen if you have "jumbo packets" enabled, or if some piece of network infrastructure is problematic at your site.


(3) It's also possible, if the backup set or catalog is on a RAID 5 device, that you've got some obscure corruption issues with the RAID 5. I have seen hard-to-troubleshoot instances of RAID-5 corruption caused by RAID controller bugs. Have you investigated firmware or driver updates, etc., if this might be the case?


(4) And, of course, there's always the remote possibility of memory errors. Do you have ECC on your server? If so, have you checked the logs to see if there might be ECC errors occurring?
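Point (2) can be tested fairly cheaply: compute an end-to-end digest of a large test file on both sides of the wire and compare. TCP's 16-bit checksum can miss multi-bit corruption that a strong digest will catch. A minimal sketch (path and buffer size are arbitrary choices):

```python
import hashlib

def sha256_file(path: str, bufsize: int = 1 << 20) -> str:
    """End-to-end digest; compare the result on source and destination."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            h.update(block)
    return h.hexdigest()
```

If the digests of a multi-gigabyte file differ after a copy across the suspect network path, the problem is below Retrospect.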


Agreed, it looks like it's a Retrospect issue from your thorough troubleshooting reports above (and complete, intelligent troubleshooting reports like yours are rare but always appreciated), but it might be a case of Retrospect being the "canary in the coal mine". Just some thoughts for you to consider in your testing.




Good points Russ, answers below:


The backup set is a Tape Backup Set (LTO-3). The catalog is stored on direct-attached storage on the Retrospect server, same as it has been since v7.5. It is currently set with hardware data compression, catalog file compression (tried both compressed and "don't compress"), and fast catalog file rebuild. We have 13 members in use, with 117 sessions and 34 snapshots. The catalog file is 1.44GB.


I have the most current firmware on both the SAS controller and the tape drive; I have not tried a new SAS controller.


No jumbo-packets enabled on our gigabit infrastructure.


The backup catalog file is local to the server, on direct-attached RAID 5 storage. There are no other problems with our disk-based backup sets or other data stored on the DAS. Yes, the server memory is ECC; there are no memory errors, relevant log events, or other indications that we are experiencing any hardware issues at all.




Thanks again for the feedback~


We are extremely disappointed that de-duplication in this product may not work for us on 2008 R2 until a major new rev.


What does the problem you're describing have to do with deduplication? That term is typically used to mean block-level deduplication, which is not a feature of Retrospect.


By "de-duplication" I mean Retrospect's implementation of file "Matching".


When the file "Matching" feature is disabled, the backups complete successfully, but that unfortunately causes a massive increase in backup media usage.
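The media cost is easy to estimate: with matching off, every run writes a full copy, while with matching on, only the first run is full and later runs write just the changes. The numbers below are purely illustrative (assumed dataset and per-run change sizes, not ours):

```python
dataset_gb = 500   # assumed total client data
delta_gb = 10      # assumed change per run
runs = 30

with_matching = dataset_gb + (runs - 1) * delta_gb  # one full + increments
without_matching = runs * dataset_gb                # a full copy every run
print(with_matching, without_matching)  # 790 15000
```

Even with modest assumptions, a month of unmatched backups consumes roughly twenty times the media.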


Obviously we want to use this feature, as we did successfully before upgrading to 2008 R2.


