
TO ALL USERS: New feature proposal: Avoid redundant backup of renamed files.

If you think the situation has changed with respect to client-side deduplication that detects block-level duplicates across multiple clients (Retrospect already does this for file-level duplicates), here is why and how to file a Support Case requesting the enhancement. In your Support Case, please be sure to mention the backup software you previously used that did this.

You should be aware that Retrospect Inc. no longer sells a VMware Add-On for Retrospect Windows. Instead it sells a separate product, which the head of Retrospect Technical Support only allows me to refer to as R. V. That product, which runs at the VM Manager level, does not even have the concept of a Client, as he has informed all of us through me.


  • 4 months later...

(At long last, I have filed a support case with extensive info on how/why I recommend this change...)

Here's what I wrote:


In 2014 KB article https://www.retrospect.com/en/support/kb/file_vs_block_deduplication your team discusses why you've chosen File-based dedupe, based on name/size/attributes.

I strongly recommend revisiting this decision, due to changes in technology, typical usage, and data volume. And there's at least one good implementation in a competitor that demonstrates the advantages of a move to block-based dedupe.

Tech change: hash calcs are *extremely* fast on modern CPUs, for both traditional hashes and newer ones (cf. CityHash). Multiple GB/sec! Storage technology now easily exceeds 100MB/sec, often reaching 500MB/sec.

Data volume: with multi-TB drives, large SSDs, 4k video and large photo sensors, it's quite common to toss around many, many GB of *noncompressible* files. Many TB in fact. Compression has little value as we move toward photo/video media.

Typical usage: File-metadata dedupe is woefully inadequate today. The 2014 article doesn't cover many common scenarios:

  • the exact same 20MB photo is often renamed as it gets copied to dropbox, shared with others, etc.
  • Multi-MB audio files can have header info significantly changed, while the modify timestamp is retained. (E.g. every play causes the play count to update in the file! Yes, the access timestamp is updated.)
  • Photo and video collections are typically duplicated and renamed, with minor metadata header info updated, while the bulk of the data blocks are not touched at all.
  • A simple partition copy can cause Retrospect to make a new backup of every file in the partition.
  • Many modern apps use the same DLL. Unfortunately, filenames and timestamps *often* vary (for identical content.) (A great, paid, tool I use for content-dupe-finding is SpaceMan99. Run it against a ? drive on a well-used computer!)
  • C:\Windows\Installer contains duplicates of installed executables, with different names. Many hundred MB.

Right now, all of the above scenarios cause anything from waste to havoc in Retrospect.
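
To make the renamed-photo scenario concrete, here is a minimal sketch of how the choice of dedupe key changes the amount of data that gets backed up. It is illustrative only: `FileEntry`, `metadata_key`, `content_key` and `bytes_to_back_up` are hypothetical names, the metadata key (name, size, modify time) is just a stand-in for name/size/attribute matching, and none of this is Retrospect's actual code.

```python
import hashlib
from typing import Callable, Hashable, Iterable, NamedTuple


class FileEntry(NamedTuple):
    name: str
    mtime: int
    data: bytes


def metadata_key(f: FileEntry) -> Hashable:
    # Assumed stand-in for file-level matching: name, size, modify time.
    return (f.name, len(f.data), f.mtime)


def content_key(f: FileEntry) -> Hashable:
    # Content-based matching: identical bytes give an identical key.
    return hashlib.sha256(f.data).hexdigest()


def bytes_to_back_up(files: Iterable[FileEntry],
                     key: Callable[[FileEntry], Hashable]) -> int:
    """Sum the sizes of files whose key has not been seen before."""
    seen = set()
    total = 0
    for f in files:
        k = key(f)
        if k not in seen:
            seen.add(k)
            total += len(f.data)
    return total


# The same "photo" under three names, one of them with a new timestamp.
photo = bytes(range(256)) * 4  # 1024 bytes of stand-in content
files = [
    FileEntry("IMG_0001.jpg", 1500000000, photo),
    FileEntry("beach.jpg", 1500000000, photo),
    FileEntry("copy of beach.jpg", 1600000000, photo),
]

by_metadata = bytes_to_back_up(files, metadata_key)
by_content = bytes_to_back_up(files, content_key)
```

With the metadata key, every rename counts as a new file and all three copies are stored; with the content key, only the first copy is.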

One commercial example that handles all of this quite well: CrashPlan. (They left the home and SMB market last year, so not really competing anymore...)

  • The client maintains a local database of file metadata and block hashes.
  • Across the board, a "file" is a [metadata packet] plus N [content blocks].
  • Metadata can be quickly scanned for grooming, storage, and recovery purposes.
  • Content block references can be efficiently rebuilt as needed via a scan.
  • Deterioration of both source and backed-up blocks is easily detected via a scan, and re-backup solves it.
  • Up to N copies of any given content block can be retained for redundancy.
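
The "metadata packet plus N content blocks" model above can be sketched in a few lines. This is a rough illustration under stated assumptions, not CrashPlan's or Retrospect's actual design: the `BlockStore` class, the fixed 4096-byte `BLOCK_SIZE`, and the use of SHA-256 are all my own choices for the example. A real product might well use variable-size chunking so that inserted bytes don't shift every later block.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


class BlockStore:
    """Toy backup set: per-file metadata plus content-addressed blocks."""

    def __init__(self):
        self.blocks = {}  # block hash -> block bytes
        self.files = {}   # path -> metadata packet with block references

    def backup(self, path, mtime, data):
        """Record the file; return how many new bytes had to be stored."""
        new_bytes = 0
        refs = []
        for i in range(0, len(data), BLOCK_SIZE):
            chunk = data[i:i + BLOCK_SIZE]
            h = hashlib.sha256(chunk).hexdigest()
            if h not in self.blocks:  # only unseen content is stored
                self.blocks[h] = chunk
                new_bytes += len(chunk)
            refs.append(h)
        self.files[path] = {"mtime": mtime, "size": len(data), "blocks": refs}
        return new_bytes

    def restore(self, path):
        """Reassemble a file from its block references."""
        return b"".join(self.blocks[h] for h in self.files[path]["blocks"])

    def damaged_blocks(self):
        """Scan the store; return hashes whose content no longer matches."""
        return [h for h, chunk in self.blocks.items()
                if hashlib.sha256(chunk).hexdigest() != h]


store = BlockStore()
song = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))  # 8 distinct blocks

first = store.backup("music/song.mp3", 1500000000, song)
renamed = store.backup("dropbox/song (1).mp3", 1600000000, song)

# Same file with only its first byte ("header") changed.
retagged = b"X" + song[1:]
header_edit = store.backup("music/song.mp3", 1500000000, retagged)
```

In this sketch the first backup stores all eight blocks, the renamed copy stores nothing new, and the header edit stores a single block even though the modify timestamp did not change. `damaged_blocks()` is the kind of scan that would flag deteriorated blocks for re-backup.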
