"Don't add duplicate files to the Media Set" is ignored



Retrospect 11 Version 11.5.3 (103)

OS X 10.9.5 (13F34)

Collection of OS X and Linux clients.

 

 

I have "Don't add duplicate files to the Media Set" checked but duplicate files are being added to media sets.

 

Two examples:

 

- a folder of distribution images: .iso and .dmg files that are often 4 GB DVD images. They are backed up on the host system by the "home" script, and clients are also backed up using the "home" script. If I copy one of these images to a client for installation, it is backed up again by the "home" script to the same media set that holds the host system backups.

 

- a folder of binary content that is rarely modified, though new content is added. I moved this folder to a larger filesystem and added the new location to the script, and the next backup re-copied all of the binary content in the folder. It behaved as if I had started a fresh backup to a new media set.


That is a problem since the OS reports different file metadata to Retrospect for different versions of the OS. 

Also, you get different file metadata for files on a mounted file server disk compared to a local disk (on a Retro client computer or on the Retro server computer).

 

That's why Retrospect sees the files as different and backs them up again.

 

http://en.wikipedia.org/wiki/File_system

http://en.wikipedia.org/wiki/Metadata


You're suggesting this option will never work, that it's impossible to recognize a file as a duplicate using a SHA or MD5 hash of the file's contents. I've worked with *nix filesystems for quite some time and have had few problems using hashes to recognize identical files. Storing the related metadata for a file separately, say the last access time, is not rocket science.

 

This also happens when the files are on the same machine. In my second example above: if I have /Volumes/foo/data and back it up, then make a copy as /Volumes/bar/data, why would /Volumes/bar/data get a full backup? Simple hash checks would recognize these as duplicate files.
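A minimal sketch of the kind of content hashing being described, grouping files by a SHA-256 digest of their bytes so that byte-identical copies on different volumes land in the same group (the function names here are illustrative, not anything Retrospect exposes):

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Group paths by content digest; groups with more than one entry
    are byte-identical duplicates, wherever they live on disk."""
    by_digest = {}
    for p in paths:
        by_digest.setdefault(file_digest(p), []).append(p)
    return {d: ps for d, ps in by_digest.items() if len(ps) > 1}
```

With this approach, /Volumes/foo/data and /Volumes/bar/data hash to the same digest regardless of which filesystem each volume uses.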

 

An even simpler case: I download a 4 GB .iso and it shows up in ~/Downloads. I copy that file to ~/Public so I can easily download it from other machines on our local network. There have been zero changes to the file, it's still the same 4 GB .iso, but the copy in ~/Public will get a full backup as if it were a unique file.


I'm pretty sure I didn't make any predictions/suggestions about the future. Let me check.... Nope, I did not.

 

The two volumes foo and bar might have different file systems.

 

The folders ~/Downloads and ~/Public have different permissions, which might cause the copied file to be backed up again.

 

I made a screenshot from the Windows version that might be of interest. I'm pretty sure you can uncheck the corresponding option in the Mac version:

post-8868-0-56308400-1422432023_thumb.png


Metadata is more than a hash of the data and the access time. Proper restoration of a filesystem includes the owner and group, the mode bits, the ACLs, and, as Lennart points out, a few other things that change depending on the filesystem/mount point (something I was not aware of).
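To illustrate how much sits alongside the contents, here is a sketch of the per-file metadata a backup tool has to capture, using Python's `os.stat`. ACLs and extended attributes need platform-specific APIs (e.g. `xattr` on macOS) and are only hinted at in the comments; the field selection here is an illustration, not Retrospect's actual comparison list:

```python
import os
import stat


def metadata_snapshot(path):
    """Collect the basic POSIX metadata a restore would need to reproduce.
    ACLs, extended attributes, and Finder info are omitted here; they
    require platform-specific calls."""
    st = os.stat(path, follow_symlinks=False)
    return {
        "mode": stat.filemode(st.st_mode),  # file type + permission bits
        "uid": st.st_uid,                   # owner
        "gid": st.st_gid,                   # group
        "size": st.st_size,
        "mtime": st.st_mtime,               # modification time
        # st_birthtime (creation time) exists on macOS/BSD but not Linux
    }
```

Any one of these fields differing between two byte-identical files can make them look "changed" to a matcher that compares metadata rather than contents.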

 

Retro is pretty good at this, and tries very hard to restore ALL the metadata. The downside is that the files appear to have changed, and get re-copied when some of this more obscure metadata changes.


I've written *nix fs code, most recently for dealing with 40 GB of log data imported every 24 hours from 300K servers. I agree that it's quite likely the metadata about a file will change while the contents of the file do not. Someone reads the file or makes a copy of it, and that changes the access time, the location, and the creation time. However, the *contents* of the file don't change. (If the contents did change, I would be in big trouble with the people reading the stats data I cranked out!) If the MD5 of the file doesn't change but the metadata does, then only back up the metadata changes with a pointer to the original file.
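The scheme being proposed here, one stored copy of the contents plus a lightweight metadata record per path pointing at it, might be sketched like this. `DedupStore` and its fields are hypothetical names for illustration, not Retrospect's actual design:

```python
import hashlib
import os


class DedupStore:
    """Toy content-addressed backup store: blobs are keyed by content
    hash and stored once; each backed-up path gets its own metadata
    record pointing at a blob."""

    def __init__(self):
        self.blobs = {}    # digest -> contents (on disk/tape in real life)
        self.catalog = []  # one small metadata record per backed-up path

    def backup(self, path):
        """Back up one file; return True if its contents were new."""
        with open(path, "rb") as f:
            data = f.read()
        digest = hashlib.sha256(data).hexdigest()
        new_blob = digest not in self.blobs
        if new_blob:
            self.blobs[digest] = data  # store the contents only once
        st = os.stat(path)
        self.catalog.append({"path": path, "digest": digest,
                             "mtime": st.st_mtime, "mode": st.st_mode})
        return new_blob
```

Backing up two identical files costs one blob plus two small catalog records, which is exactly the "one copy of the kernel, 300 copies of the metadata" trade-off described below.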

 

Let's say I distribute a UNIX kernel to 300 machines. It's the same file, but the creation dates and modification dates will all differ by seconds, if not minutes, on each of those machines. Do I really need to back up 300 copies of the kernel file? Or do I need 300 copies of the metadata and one copy of the kernel? (I'm happy to work with Retrospect on a fix for this; I'm a consultant/contract *nix person who started when Bush I was POTUS.)

