
Is Retrospect deduplication still crippled -- only one execution unit at a time when enabled?


baxsie

Finally side-graded from 16.6 to 18.2

I remember being super excited when Retrospect announced it had enabled deduplication -- only to be dismayed when I found out that it could only use one execution unit going to one backup set to make deduplication work.

Some of our backups take multiple days (using multiple execution units), and backing up all our clients simply would not complete within the 1-week window if we had to use a single execution unit. 

Is this still the case in 18.2 or am I missing something?

(Backup server is a dual Xeon with 12 cores/24 threads, 144 GB RAM, backing up to one of two 6x4TB removable RAID 5 arrays, recycled and swapped weekly. Clients are a mix of Windows PCs.)

When it first starts up, it is pretty CPU heavy:

[screenshot: CPU usage during startup]

Once it settles down, it is disk write (and possibly ethernet) bound:

[screenshot: disk and network activity once the backup settles]


5 hours ago, baxsie said:

I remember being super excited when Retrospect announced it had enabled deduplication -- only to be dismayed when I found out that it could only use one execution unit going to one backup set to make deduplication work.

As far as I remember, deduplication has always been one of Retrospect's features.

https://en.wikipedia.org/wiki/Data_deduplication

I fail to see what that has to do with the number of execution units per backup set. For disk backup sets there has always been one execution unit per backup set.

So perhaps you could elaborate on what you expected to happen?


Thank you for your replies.

I am using a disk storage group.

Many of my clients are nearly identical Windows 10 desktops. I would gain a lot by being able to deduplicate across machines.

Is it possible to configure Retrospect 18.2 to deduplicate across clients?

For instance, if 9 clients all have an identical "command.com" file, can Retrospect be set to back that file up only once?


14 minutes ago, baxsie said:

Is it possible to configure Retrospect 18.2 to deduplicate across clients?

For instance, if 9 clients all have an identical "command.com" file, can Retrospect be set to back that file up only once?

Yes, that is the default (and has always been). 

However, Retrospect also looks at the file metadata, so a file with identical content but a different modification date (for instance) will be backed up twice.

That means that a lot of Windows files will be backed up multiple times, even if seemingly identical, because their metadata differs. Windows Update, for instance, does not run at exactly the same time on identical computers, so the updated files will differ.
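That behavior can be pictured with a short sketch (Python; this is a hypothetical illustration of metadata-sensitive matching, not Retrospect's actual algorithm): a dedup key built from content alone matches across machines, while a key that also folds in the modification time treats otherwise-identical files as distinct.

```python
import hashlib
from datetime import datetime

def dedup_key(content: bytes, mtime: datetime, use_metadata: bool = True) -> str:
    """Build a deduplication key for a file.

    With use_metadata=True (mimicking matching on metadata as well as
    content), two byte-identical files with different modification
    times get different keys, so both copies are backed up.
    """
    h = hashlib.sha256(content)
    if use_metadata:
        h.update(mtime.isoformat().encode())
    return h.hexdigest()

# Two byte-identical copies of a file on different clients, written at
# slightly different times (e.g. by separate Windows Update runs):
data = b"...identical file contents..."
t1 = datetime(2021, 11, 1, 3, 0, 0)
t2 = datetime(2021, 11, 1, 3, 0, 7)

# Content-only matching deduplicates them:
assert dedup_key(data, t1, use_metadata=False) == dedup_key(data, t2, use_metadata=False)

# Content + metadata matching does not -- two copies get stored:
assert dedup_key(data, t1) != dedup_key(data, t2)
```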


13 minutes ago, Lennart_T said:

. . . Yes, that is the default (and has always been).  . . .

Do you remember when Retrospect did not have multiple execution units? Basically it was single threaded? For sure deduplication was added some time after multiple execution units were introduced.

I think it only deduplicates within a given backup set. There is a clue in the check box:

[screenshot: the deduplication check box in Retrospect's options]

Since I am using a disk storage group, there is a unique backup set for each drive (automatically generated, thank you). So I do not think it deduplicates across machines in this case.

When I looked at it after deduplication was first introduced, the only way to make deduplication work across machines was to have all the machines back up to one (huge) backup set -- and since multiple execution units cannot back up to a single backup set, you are limited to one execution unit.

I sincerely hope I am wrong . . . deduplication would save a ton of time and space if it could be enabled across machines while keeping multiple execution units enabled.

Drat:

https://www.retrospect.com/en/support/kb/storage_groups#_data_deduplication

Quote

Data Deduplication

The architecture for Storage Groups allows simultaneous operations to the same destination because each volume is a different backup set under the hood. However, this workflow also prevents data deduplication across volumes.
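The limitation in that passage can be modeled as each backup set keeping its own private catalog of seen files (a hypothetical sketch, not Retrospect's actual data structures): a dedup lookup only consults the catalog of the set being written to, so identical files landing in different sets of a storage group are stored once per set.

```python
class BackupSet:
    """Hypothetical model: each backup set owns a private catalog."""

    def __init__(self, name: str):
        self.name = name
        self.catalog = set()   # keys already stored in THIS set only
        self.stored_bytes = 0

    def backup(self, key: str, size: int) -> bool:
        """Store a file; return True if bytes were actually written."""
        if key in self.catalog:   # dedup hit: scoped to this set
            return False
        self.catalog.add(key)
        self.stored_bytes += size
        return True

# A storage group: one set per client volume "under the hood", which is
# what allows simultaneous executions to the same destination...
group = {c: BackupSet(c) for c in ["client_a", "client_b", "client_c"]}

# ...but the same file (same key) on all three clients is stored three
# times, once per set, because no set consults another set's catalog:
stored = sum(group[c].backup("sha256:abcd", 50_000) for c in group)
assert stored == 3
```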

 


25 minutes ago, baxsie said:

Do you remember when Retrospect did not have multiple execution units? Basically it was single threaded? For sure deduplication was added some time after multiple execution units were introduced.

Yes, Yes and No* 

*) I'm almost 100% sure that data deduplication was part of Retrospect 3, which was the first version I used.

25 minutes ago, baxsie said:

I think it only deduplicates within a given backup set.

Yes, of course. A backup set doesn't "know" about any other backup set and its contents.


OK, so Retrospect only deduplicates within a given backup set, and only one execution unit can access a given backup set.

So if you want to deduplicate across machines, you are limited to backing all the machines up to one backup set, using only one execution unit, which will be very slow.

To me, that is crippled 😞

I do not care that everything has to go to one backup set, but being limited to one execution unit would make the job impossibly slow.

Hopefully one day they can deduplicate across clients without the impossible performance penalty.


For my former employer, I managed Retrospect as a small part of my job.

We had about 70 computers, divided into 4 disk backup sets. The clients were divided by platform (Mac/Windows) and then by department (development vs. others). Each group of clients had its own backup set, so we had four execution units running at the same time.

Then we had a tape backup set for off-site storage. Every weekend we transferred from each of the four disk backup sets to the tape backup set. The disk backup sets were then groomed.

Perhaps you could do something similar: have four small backup sets, groomed to keep the last (say) three backups, plus one large disk backup set, and transfer the last backup/snapshot for each client from the small sets to the large one every night.

