
Does Retrospect's data deduplication work that way?



Hi,

 

We are interested in Retrospect, but I don't know if it is the right application. What we need:

 

 

6am:

Full backup is created.

 

7am:

User created a new 10GB file "foo.mp4" in /currentWork.

 

9am:

Backup will run. It will back up "/currentWork/foo.mp4" for the first time.

 

10am:

User will rename "foo.mp4" to "bar.mp4".

 

11am:

Backup will run again. What will happen now? Will Retrospect copy "bar.mp4" again, or will it recognize that it has already backed up the file and that it was just renamed?

 

12am:

User will move /currentWork/bar.mp4 to its final destination /projects/clientsX/bar.mp4.

 

13am:

Backup will run again. What will happen now? Will Retrospect copy /projects/clientsX/bar.mp4, or will it recognize that it has already backed up the file?

 

So the final question is whether I will end up with 20GB on the backup media (10GB of it wasted) or whether only 10GB will be used.

 

If deduplication works as expected (only 10GB is used), will a new full backup add the 10GB file again or not?

 

What will happen if I copy the file over the network to another system that is also backed up by Retrospect (the two systems use the same backup set)? Will it recognize that the file is already in the set, or will Retrospect add a new copy?

 

 

 

Last thing - Virtual Machines:

For example we have the folder /vms which contains our VMs :)

So there are many large files (the virtual disks). When you just boot one VM, its file will change. What will happen? Will Retrospect treat the modified file as a new file and copy the entire file, or will it copy only the changed blocks?

 

Many questions, I hope someone is able to answer.

 

Thanks.

Link to comment
Share on other sites

Retrospect uses several matching criteria to find new or changed files. If any one of the criteria has changed, Retrospect will back up the file again. On Windows, Retrospect looks at creation date and time, modified date and time, size and name. If the "Match only in same location" option is set, Retrospect also matches on the path, volume name and drive letter.

By default, the archive attribute is not used as a matching criterion on Windows, allowing for true and reliable backups to multiple backup sets.

So you will get a backup at 9am and at 11am, but not at 1pm.
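
The matching rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Retrospect's actual code: the names `FileMeta`, `match_key` and `run_backup` are made up for the example, the timestamps are plain integers, and it assumes that a rename or a move on the same volume leaves the file's size and dates untouched.

```python
from dataclasses import dataclass, replace

GB = 10**9


@dataclass(frozen=True)
class FileMeta:
    path: str      # containing folder
    name: str
    size: int      # bytes
    created: int   # timestamps as plain integers, for illustration only
    modified: int


def match_key(f, same_location=False):
    """Build the tuple that decides whether a file counts as 'already backed up'."""
    key = (f.name, f.size, f.created, f.modified)
    # Hypothetical stand-in for the "match only in same location" option.
    return key + (f.path,) if same_location else key


def run_backup(catalog, files, same_location=False):
    """Copy every file whose key is not in the catalog; return bytes copied."""
    copied = 0
    for f in files:
        key = match_key(f, same_location)
        if key not in catalog:
            catalog.add(key)
            copied += f.size
    return copied


catalog = set()

foo = FileMeta("/currentWork", "foo.mp4", 10 * GB, created=7, modified=7)
at_9am = run_backup(catalog, [foo])        # new file: copied

bar = replace(foo, name="bar.mp4")         # renamed: the name criterion changed
at_11am = run_backup(catalog, [bar])       # copied again

moved = replace(bar, path="/projects/clientsX")  # moved: name, size, dates unchanged
at_1pm = run_backup(catalog, [moved])      # matches, not copied

total_gb = (at_9am + at_11am + at_1pm) // GB
print(at_9am // GB, at_11am // GB, at_1pm // GB, total_gb)
```

Under these assumptions the rename costs a second 10GB copy and the move costs nothing, so the backup set holds 20GB for this file. With `same_location=True` the move would be copied as well.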

As for the VM files, Retrospect will back up the whole file again. Install and run a Retrospect Client in the virtual machine instead. Never back up VM files "as is".

Link to comment
Share on other sites

So you will get a backup at 9am and at 11am, but not at 1pm

 

Just to clear up the original post, one hour after "11:00 am" is "12:00 pm" and one hour after that is "1:00 pm".

 

User will move /currentWork/bar.mp4 to its final destination /projects/clientsX/bar.mp4

 

If you are using "/" and "move" accurately (which would be an odd folder structure for a modern OS X install but not impossible) then Lennart's information is (as usual) correct; the file would match and not be copied again.

Same if you moved the file from <~/currentWork/bar.mp4> to <~/projects/clientsX/bar.mp4>.

 

 

What will happen if I copy the file over the network to another system that is also backed up by Retrospect (the two systems use the same backup set)? Will it recognize that the file is already in the set, or will Retrospect add a new copy?

 

Is the question: Computers "A" and "B" are both running the Retrospect Client software, and each is being backed up to the same Media Set by the Retrospect Engine running on computer "X". Using a file sharing protocol (such as AFP or CIFS) I copy the file from a shared volume on "A" onto "B" (or vice versa), will the file match?

 

I suppose it would depend on the operating system behavior, configuration and status of computers "A" and "B". I assume modern hard drive formatting is consistent in block allocation size nowadays, but my knowledge of that has lapsed over time. Maybe a ginormous RAID volume could use a different block size than a "small" 2 TB (LOL!) drive. But as long as the file size ends up being the same (to the byte), the copy operation didn't change the modification date, and you didn't change Retrospect's default settings with regard to location matching, then yes, it would match and _not_ be copied.

 

Copying the file could easily result in a change of the POSIX ownership of the file. I do know that Retrospect "Classic" on OS X (versions 5 & 6) maintained ownership in the Catalog to prevent recopying a file whose only modification was a permission change, but frankly I don't know for sure, and haven't tested, whether this behavior is the same in 8/9. You might need to disable the "Use attribute modification date when matching" option to get the results you want.
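
Whether a copy keeps the modification date depends on the tool doing the copying. As a small illustration outside Retrospect, here is how Python's standard library behaves: `shutil.copy2` tries to preserve file metadata such as the modification time, while `shutil.copy` copies only data and permission bits. The file names and the fixed timestamp are made up for the example; a network file sharing protocol may behave differently.

```python
import os
import shutil
import tempfile


def size_and_mtime(path):
    """Return the two values the match above depends on: size and modification time."""
    st = os.stat(path)
    return st.st_size, st.st_mtime_ns


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "bar.mp4")
    with open(src, "wb") as fh:
        fh.write(b"x" * 1024)

    # Give the source an old, whole-second timestamp so it clearly differs from "now".
    old_ns = 1_000_000_000 * 10**9
    os.utime(src, ns=(old_ns, old_ns))

    kept = shutil.copy2(src, os.path.join(tmp, "copy_with_metadata.mp4"))
    lost = shutil.copy(src, os.path.join(tmp, "copy_data_only.mp4"))

    same_with_copy2 = size_and_mtime(src) == size_and_mtime(kept)
    same_with_copy = size_and_mtime(src) == size_and_mtime(lost)

print(same_with_copy2, same_with_copy)
```

The first copy has the same size and modification time as the source, so a size-and-date match would treat it as the same file. The second gets a fresh modification time and would look like a new file.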

 

Dave

Link to comment
Share on other sites

  • 4 weeks later...

Does the number of saved 'snap shots' have any bearing on the amount of data saved?

 

I have backup sets that save the last 4 backups.

 

As an example, client data is 160GB. The client's proactive backup gets as far as 90GB backed up, then the client shuts off the computer. The next day the job runs and one would think only 70GB would need to be backed up... but instead it reports that 130GB (or even the full 160GB) needs to be backed up.

 

If this is not a similar thread, I'll gladly start a new post. Thanks.

Link to comment
Share on other sites

Does the number of saved 'snap shots' have any bearing on the amount of data saved?

No, it should not.

 

 

 

As an example, client data is 160GB. The client's proactive backup gets as far as 90GB backed up, then the client shuts off the computer. The next day the job runs and one would think only 70GB would need to be backed up... but instead it reports that 130GB (or even the full 160GB) needs to be backed up.

Well, which is it? 130GB or 160GB the second time?

 

How much of the original 160GB has changed during the time the computer was off site?

Link to comment
Share on other sites

The numbers I included are random...as is the variation in the amount actually backed up - so, the 130 could be 160.

 

One would not think that 100+GB of a user's original data would change within a week or two.

It just seems that, more often than not, the 2nd, 3rd and/or 4th snapshots are rather large.

 

I guess matching and deduplication could be a little better.

Link to comment
Share on other sites

The numbers I included are random...as is the variation in the amount actually backed up - so, the 130 could be 160.

It would be easier to understand if you provided actual figures, instead of making them up.

 

 

 

One would not think that 100+GB of a user's original data would change within a week or two.

Right.

 

One thing that might be worth looking into is virtual disks. Say you run Parallels Desktop (or VMware Fusion) and have a 50GB virtual disk for that. As soon as you run that VM, the file changes and needs to be backed up again. All 50GB!

So I always make sure I exclude such files from my backups.
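
To show why a single changed byte costs the whole virtual disk, here is a toy comparison of file-level and block-level change detection. It is only a sketch: the 4096-byte block size, the function names and the tiny stand-in "disk" are assumptions for the example, and the block-level variant is what the original question asked about, not something this thread says Retrospect does.

```python
import hashlib

BLOCK = 4096  # assumed block size, for illustration only


def block_hashes(data, block=BLOCK):
    """Hash each fixed-size block of the data."""
    return [hashlib.sha256(data[i:i + block]).digest()
            for i in range(0, len(data), block)]


def file_level_cost(old, new):
    """File-level backup: any change means the whole file is copied again."""
    # Content comparison stands in for the metadata match (a real tool would
    # notice the changed modification date instead of reading the file).
    return len(new) if old != new else 0


def block_level_cost(old, new, block=BLOCK):
    """Hypothetical block-level backup: only blocks whose hash changed are copied."""
    old_h = block_hashes(old, block)
    new_h = block_hashes(new, block)
    cost = 0
    for i, h in enumerate(new_h):
        if i >= len(old_h) or old_h[i] != h:
            cost += len(new[i * block:(i + 1) * block])
    return cost


disk_before = bytes(64 * BLOCK)      # stand-in for a virtual disk file
changed = bytearray(disk_before)
changed[5 * BLOCK] = 1               # "booting the VM" touches one byte
disk_after = bytes(changed)

whole_file = file_level_cost(disk_before, disk_after)
one_block = block_level_cost(disk_before, disk_after)
print(whole_file, one_block)
```

In this toy case the file-level approach copies all 64 blocks, while the block-level approach copies one. Scaled up, that is the difference between 50GB and a few megabytes per run, which is why excluding the virtual disk files (or backing up from inside the VM) is the practical answer here.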

 

 

 

It just seems that, more often than not, the 2nd, 3rd and/or 4th snapshots are rather large.

That is not my experience. It works for me. But I mostly run Retrospect on Windows. On Mac, I upgraded to version 9 this weekend and have not interrupted any version 9 backups yet. (Interrupted version 8 backups were just fine.)

 

 

 

I guess matching and deduplication could be a little better.

Well, it is fine here. So it is probably something in your setup that causes the problem.
Link to comment
Share on other sites
