
Better duplicate file detection


Right now, Retrospect's duplicate file detection mechanism is really limited: files have to have the same name (and, more troublesome, the same timestamp) to be considered the same.


It seems to me that, since you are already calculating MD5 checksums of files, you could implement a much more accurate (and tolerant) duplicate file detection mechanism.


Certainly keep the existing approach, but add a check for duplicate MD5 checksums in the catalog after completing the backup of each file. If there's a checksum match, catalog the file as a duplicate and throw away the redundant data. To reduce the catalog query overhead, you might want to include a preference setting that disables this kind of matching for files below a configurable size threshold.
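
Something along these lines, as a rough sketch (the function and variable names here are invented for illustration, not anything from Retrospect itself):

```python
import hashlib

# Configurable threshold: files smaller than this skip the duplicate
# lookup to keep per-file catalog query overhead down (assumed value).
MIN_DEDUP_SIZE = 4096

def catalog_file(catalog: dict, path: str, data: bytes) -> str:
    """Back up one file; return 'stored' or 'duplicate'.

    catalog maps path -> (status, md5 hex digest).
    """
    digest = hashlib.md5(data).hexdigest()
    if len(data) >= MIN_DEDUP_SIZE:
        # Check the catalog for an earlier file with the same checksum.
        for other_path, (_, other_digest) in catalog.items():
            if other_digest == digest:
                # Match: keep the name in the catalog, drop the data.
                catalog[path] = ("duplicate", digest)
                return "duplicate"
    catalog[path] = ("stored", digest)
    return "stored"
```

A real implementation would index the catalog by checksum rather than scanning it, but the flow is the same: hash, look up, and only store new content.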


+ 1 on this feature.


Performing MD5 comparisons is actually very efficient. I once designed a duplicate file detection system using nothing but MS Access and Microsoft's FCIV tool: http://support.microsoft.com/kb/841290


Although the FCIV scan took a few minutes, the actual queries to detect duplicate files in MS Access only took seconds.
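
The query involved is trivial once the checksums are in a table. A sketch of the same idea using SQLite instead of MS Access (the table and column names are invented):

```python
import sqlite3

# Build a small in-memory table of (path, md5) rows, as FCIV would produce.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT, md5 TEXT)")
conn.executemany("INSERT INTO files VALUES (?, ?)", [
    ("a/report.doc", "d41d8cd9"),
    ("b/copy.doc",   "d41d8cd9"),   # same content as report.doc
    ("c/photo.jpg",  "9e107d9d"),
])

# Group by checksum; any group with more than one row is a set of duplicates.
dupes = conn.execute(
    "SELECT md5, COUNT(*) AS n FROM files GROUP BY md5 HAVING n > 1"
).fetchall()
```

Even over hundreds of thousands of rows, a grouped query like this runs in seconds, which matches my experience with the Access version.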


The beauty of this feature is that files with identical content but different file names can be backed up in such a way that the content is stored only once while all of the different file names are kept.
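
One way that layout could work, sketched with invented names (not Retrospect's actual on-disk format): store each unique piece of content once, keyed by its MD5, and have every file name map to a digest.

```python
import hashlib

store = {}  # md5 hex digest -> bytes: each unique content stored once
names = {}  # path -> md5 hex digest: every file name is kept

def backup(path: str, data: bytes) -> None:
    digest = hashlib.md5(data).hexdigest()
    store.setdefault(digest, data)  # a second copy of the same content is free
    names[path] = digest

def restore(path: str) -> bytes:
    return store[names[path]]

backup("report_v1.doc", b"quarterly numbers")
backup("report_final.doc", b"quarterly numbers")  # same content, different name
```

Restoring either name yields the same bytes, but the media only holds one copy of the content.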

