Shrink snapshots of existing and new backups


jez


Hi,

 

I have been using Retrospect for about 8 years now and have a variety of disk backups. The problem is they are huge, and, it turns out, unnecessarily so, because of how Retrospect works.

 

Here is what is happening, and suggestions for how it could easily be improved and also give users a reason to pay for an upgrade.

 

I (and probably most users backing up to disk) will back up from one or two volumes (e.g. C: and D: drives) to a disk set on an external drive. I know that historically backup was mostly to tape, but these days it isn't, and Retrospect has not really been adapted to this world, even though it is bundled with endless disk drive solutions.

 

Snapshots are currently 400 MB on my Windows machine. That is huge. But without a snapshot, Retrospect is essentially useless right now. I have analysed what is happening and:

 

- There are 450,000 files on the computer. 250,000 of these are part of Windows and Program Files, so I cannot get rid of them.

- When I turn off all the snapshot 'additions', such as system state and folder security info, the snapshot drops to 125 MB - still huge. (Sadly, I only just discovered this and turned it off, so I have been unnecessarily chewing up even more storage space.)

- My daily backup is about 60 MB of changed stuff, so the snapshots are still about 2/3 of my backup storage requirement.

 

However, a snapshot contains ALL the files on the computer, even though I am only backing up a small number of folders, representing about a third of the files on the computer, and I have no use for the bulk of the snapshot. That suggests the snapshots only need to be about 40 MB for the backups I am doing, which would nearly HALVE the size of the backups and also make them considerably quicker.

 

The reason the snapshots are there is so you can restore the whole system - but restoring to any given day in the last umpteen years is not something I need to do. What I DO want is to access my own files at fine granularity (I currently have the last 8 years of them, daily, and pruning out the snapshots is something I do not want to do).

 

There is a very simple solution - instead of storing a snapshot of ALL the files, simply store a snapshot of what TOP LEVEL FOLDERS are backed up in each session, and all the tree under those folders. That would reduce the size of snapshots drastically (assuming users also turned off the System state and other unneeded options).
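To make the idea concrete, here is a minimal sketch in Python (purely illustrative - this is not Retrospect's actual code or snapshot format): derive the top level folders from the files actually selected for backup, then keep only snapshot entries that fall under them.

```python
from pathlib import PureWindowsPath

def top_level_folders(backed_up_files):
    """Collect the distinct top-level folders (e.g. 'C:\\Users') that
    contain at least one file selected for backup."""
    roots = set()
    for f in backed_up_files:
        parts = PureWindowsPath(f).parts
        if len(parts) >= 2:                  # drive plus at least one folder
            roots.add(parts[0] + parts[1])   # e.g. 'C:\\' + 'Users'
    return roots

def prune_snapshot(all_files, backed_up_files):
    """Keep only the snapshot entries under a backed-up top level folder;
    everything else (Windows, Program Files, ...) is dropped."""
    roots = top_level_folders(backed_up_files)
    return [f for f in all_files
            if any(str(PureWindowsPath(f)).startswith(r + "\\") for r in roots)]
```

Because the roots are recomputed from each session's actual file list, a newly created top level folder is picked up automatically - the fail-safe property that fixed subvolume definitions lack.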

 

Clearly this is already technically possible, because it is essentially what happens when you create subvolumes. But the subvolume approach is NOT what I want, because I want to back up EVERYTHING except the things I explicitly exclude. That way it is fail-safe: if I make a new top level directory, it is automatically included. The subvolume approach is also very unmanageable - with 10 top level directories you end up with tons of 'snapshots', you would have to make subvolumes of things like Users, and who knows where Windows will put things in the future. The only practical way to operate is to back up everything except things you definitely do not want.

 

There is a secondary thing, which is that the snapshots (with just the file listings, not the system state etc.) compress extremely well - my 125 MB .rdb file containing the snapshot compresses down to 18 MB, roughly a 7x reduction.
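That file listings deflate heavily is easy to sanity-check: paths share long common prefixes, which is exactly what dictionary compressors exploit. A quick experiment with Python's zlib, on synthetic paths (not Retrospect's actual .rdb format):

```python
import zlib

# Synthetic file listing with the long shared prefixes typical of a
# real snapshot's file table (illustrative data only)
listing = "\n".join(
    f"C:\\Users\\jez\\Documents\\project{i % 20}\\file{i}.txt"
    for i in range(100_000)
).encode()

packed = zlib.compress(listing, level=9)
print(f"{len(listing) / len(packed):.0f}x compression")
```

Even this naive whole-buffer deflate comfortably beats the 5-7x ratios quoted above, so building compression into the snapshot writer is clearly feasible.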

 

End result is that my snapshots could be 18 MB and my data 60 MB, compared to the current minimum of 125 MB and 60 MB. So overall less than half the storage space would be required. I suspect that for most users the gains would be larger, as most of them will not be backing up over 200,000 files...
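The arithmetic behind that claim, using the figures above (a sketch; the 18 MB snapshot assumes both the pruning and the compression are applied):

```python
data = 60               # MB of changed files per daily backup
current_snapshot = 125  # MB, with the snapshot 'additions' already off
proposed_snapshot = 18  # MB, pruned to backed-up folders and compressed

current = current_snapshot + data     # MB written per day today
proposed = proposed_snapshot + data   # MB written per day with the change

print(f"{proposed} MB vs {current} MB per day "
      f"-> {1 - proposed / current:.0%} saved")
```

That works out to 78 MB versus 185 MB per day - roughly 58% less storage, comfortably "less than half".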

 

The huge snapshots created by the default settings are disastrous these days. Even with the system state off, they are still hugely bigger than they need to be. The solutions are simple and would improve the product drastically. Please consider them.

 

To add - this sort of improvement would be a major reason to upgrade. I finally upgraded from v7 to v11 and found no advantages at all - very sad, as I had hoped things would be smaller and quicker.

 

Also, to give an additional reason to upgrade, you could provide a 'Shrink Snapshots' option. This would:

- Have options to remove system state and file/folder security information from existing snapshots

- Have options just to compress existing snapshots (my 400 MB snapshots zip down to 70 MB, even with system state etc. in there - note that I am already using software compression in Retrospect)

- Convert the current file listings in the snapshots into much smaller ones, per my suggestion earlier

 

That would give someone an easy way to recover huge amounts of disk space - more than half, in my case, and judging by comments on various forums, lots of other people would also benefit from these improvements.

 

Final thing, as an aside - having a way to select multiple snapshots, either to 'forget' or to 'retrieve' is such a simple thing to do, but we have to deal with them one by one - exceedingly tedious.

 

I tried using NTFS file compression, but that would only work for my USB-attached drive, not for my NAS. Also the performance takes a hit, especially as it would be trying to compress other files that are largely incompressible - it needs to be built into Retrospect to do it optimally.

 

Regards,

Jeremy Kenyon


However, a snapshot contains ALL the files on the computer, even though I am only backing up a small number of folders, 

 

The reason for containing all files is that you may have moved them around to different folders. You may also have changed system state and file permissions on the files. None of these changes causes the files to be backed up again.

When restoring, you want the files in the correct position (folder) and with the correct permissions (etc). You do not want them in the folder they were when they were backed up several years ago and you want the current permissions to be restored, not the ones used years ago.

 

Those are the reasons for a snapshot to contain all files and those are very good reasons, in my opinion.


Snapshots are currently 400 MB on my Windows machine. That is huge.

 

 

- My daily backup is about 60 MB of changed stuff, 

Let's round your 460 MB upward to 500 MB, or 0.5 GB.

Let's say you buy a hard drive of 4 TB, or 4000 GB.

 

That means you can fit 8000 of your backups on that drive. Since you have two source drives (C and D), you can use that drive for 4000 days, or more than 10 years.

 

So I'm sorry, but I don't really see your problem.


The reason for containing all files is that you may have moved them around to different folders. You may also have changed system state and file permissions on the files. None of these changes causes the files to be backed up again.

When restoring, you want the files in the correct position (folder) and with the correct permissions (etc). You do not want them in the folder they were when they were backed up several years ago and you want the current permissions to be restored, not the ones used years ago.

 

Those are the reasons for a snapshot to contain all files and those are very good reasons, in my opinion.

I second what Lennart said.  We live in both the Retrospect and Rapid Recovery worlds.  Particularly when the system is capable of bare metal restores, one of the best things that ever happened to us was the fact that Rapid Recovery does not allow the admin to pick and choose any files to include or omit; that is just too big an opening for human error.  You must back up all files on a volume, or none.  Yes, it will require more backup disk capacity, but that is the trade-off for the extra insurance against human error that this affords.  And as most of us who have been in this business a while know, human error needs no invitation; it walks in and tries to wreak havoc whenever it gets the chance.


The reason for containing all files is that you may have moved them around to different folders. You may also have changed system state and file permissions on the files. None of these changes causes the files to be backed up again.

When restoring, you want the files in the correct position (folder) and with the correct permissions (etc). You do not want them in the folder they were when they were backed up several years ago and you want the current permissions to be restored, not the ones used years ago.

 

Those are the reasons for a snapshot to contain all files and those are very good reasons, in my opinion.

Your logic is incorrect. I want to back up particular folders at daily granularity. I know those folders do not and never will include Windows or Program Files etc. So I do not want to waste what amounts to a third of my total backup space storing snapshot listings for folders I am not backing up.

 

Moving files around makes no difference to what I need in a snapshot. I want to back up what I want to back up, and nothing else. If I move files to a different top level folder, then the next snapshot would include that new top level folder and its tree. At no point do I need a snapshot of a tree I do NOT want to back up.

 

If you could give a realistic scenario where my suggestion would not work, please do so, because I cannot see one for my needs.


I second what Lennart said.  We live in both the Restrospect and Rapid Recovery worlds.  Particularly when the system is capable of bare metal restores, one of the best things that ever happened to us was the fact that Rapid Recovery does not allow the admin to pick and choose any files to include or omit; that is just too big an opening for human error.  You must back up all files on a volume, or none.  Yes, it will require more backup disk capacity but that is the trade-off for the extra insurance against human error that this affords.  And as most of us who have been in this business awhile know, human error needs no invitation, it walks in and tries to wreak havoc whenever it gets the chance.

There are two different needs with backups.

 

The first is to restore a system to a previous state. This is actually not the most important thing for a lot of people. It is also hugely space consuming if you keep a long history (I currently have 8 years of daily backups).

 

The second is to back up your 'work' - and this is both to prevent loss, and allow you to see the previous versions of things. That is where I want fine granularity for a long time.

 

What you are both talking about is the former, not the latter. I run two daily backups to disk - a system one with one set of rules, and a work one with other rules. In particular the work one filters out a lot of things, including Windows, programs and so on, but it does it by exclusion, so any new folders are automatically included.

 

You also have to remember there are two very different markets for Retrospect - the home user (or indeed the small company) has different requirements from people backing up dozens of systems.


Let's round your 460 MB upward to 500MB, or 0.5 GB.

Let's say you buy a hard drive of 4 TB, or 4000 GB

 

That means you can perform 8000 of your backups onto that drive. Since you have two source drives (C and D), you can use that drive for 4000 days, or more than 10 years.

 

So I'm sorry, but I don't really see your problem.

 

The backup I was describing is my 'work files' backup. In addition I have a system backup, which I do not keep as long a history for, and which is much larger per day (over 1 GB).

 

Here are 7 reasons the problem is very real.

 

1. You cannot rely on a single external disk for backups. You need at least two, which doubles the cost. The bigger the disks, the higher the cost.

 

2. You are assuming the software is perfect. It is not. It has had a variety of problems over the years that can result in a corrupt backup. To counter this I run two independent backups, identical settings, to two different disks, on alternate days. I have learnt over the years how to recover from failed backups, by deleting the most recent rdb files, but it is not a satisfactory solution.

 

3. You are also assuming you can incrementally back up forever. However, Retrospect has had bugs over the years that result in it not noticing some files have changed. I was astonished at this, but I have restored from backups only to find that some of the files are different in the backup, yet Retrospect has not noticed, so the backup has become incorrect (and does not self-correct). It is quite specific what can be different, and it is not due to disk faults in the backup media. The end result is that I restart my backup chain every couple of years or so, which means another huge initial backup. This also protects against the problem of an rdb file going corrupt, invalidating all subsequent backups.

 

4. You are assuming 10 years is a long time. I currently have 8 years of backups, and for a variety of reasons I want to keep that history going. It could be for IP protection, to have appropriate evidence, or just because I may have accidentally deleted a folder of photos and didn't notice - who knows - the point is we don't know, which is why we back up.

 

5. You are ignoring the performance aspect - backing up to a network drive not only hammers the network, it is slow. Backing up 100 MB instead of 500 MB would make a substantial difference. Even to a local disk, writing less is a good idea.

 

6. Your vulnerability to a media error is largely proportional to the amount of media you use. If you halve the media used, you halve the chance of losing data due to a bad block (crudely speaking of course).
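That proportionality can be made precise: with independent block failures at some small per-block rate p, the chance of at least one bad block among n blocks is 1 - (1 - p)^n, which is approximately n*p when n*p is small - so halving the media roughly halves the risk. A small sketch (the failure rate here is an illustrative assumption, not a drive specification):

```python
def p_any_bad(blocks, p_block=1e-9):
    """Chance that at least one of `blocks` media blocks is bad,
    assuming independent failures at rate p_block per block."""
    return 1 - (1 - p_block) ** blocks

n = 10_000_000  # blocks of media consumed by the backups
ratio = p_any_bad(n) / p_any_bad(n // 2)
print(f"using half the media cuts the risk by ~{ratio:.2f}x")
```

The ratio is just under 2, confirming the "crudely speaking" halving for realistic (small) failure rates.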

 

7. A typical home user who invests in a large external disk will use it for more than just backups. I am sure most people use them for temporary space, or other storage they may not need backing up.

 

Retrospect 11.5 has introduced a small change to compression, but it has left my system snapshots the same size. It has shrunk the file listing component significantly, to maybe a third of what it was (40 MB instead of 125 MB) - not as good as zipping it, but I guess it is a performance vs size trade-off - though CPU speeds these days probably mean it should put more effort into compression.


The versions released this week now include automatic compression of the backup metadata saved to the backup media. We have found that this metadata gets about 80% compression.

 

Hi Robin, I have been testing the new 11.5 version today. However, while it has reduced the size of the snapshot without any system data or metadata, it has not made a significant difference to the snapshot including all the system data (the various snapshot options turned on). The last snapshot was 441 MB with 11.5, and the one before (with 11.0.1.106) was 437 MB, so it has in fact grown slightly. Both can be zipped by a factor of 6. It is as if the new compression does not kick in under some circumstances?

Jeremy


 I want to back up particular folders on a daily granularity. 

 

Then I suggest you take a look at subvolumes. Using a subvolume, your snapshot would contain only the files contained in that subvolume. Creating a subvolume of your "work" folder would do the trick.

 

Of course you do want to back up the entire C: for disaster recovery, but you don't need to do that as often.


Then I suggest you take a look at subvolumes. Using a subvolume, your snapshot would contain only the files contained in that subvolume. Creating a subvolume of your "work" folder would do the trick.

 

Of course you do want to back up the entire C: for disaster recovery, but you don't need to do that as often.

 

I explored this before, and as I said in the original post, subvolumes are not practical, for two reasons:
- You need one for each top level folder, which very quickly becomes unmanageable (I would need about 10), and you end up with a sea of clutter. It is also a pain to do restores, as you have to work one snapshot at a time.
- If you add a new top level folder, it won't be backed up unless you remember to add a subvolume, so it is not fail-safe.
 
What subvolumes do is prove how easy it would be for Retrospect to implement a proper snapshot that contains only the top level folders you are backing up. The list of files and folders to back up is already calculated early in the backup process; just extract the top level folders from it. There is already a mechanism to snapshot a folder, as used by subvolumes. Just put the two together... Then apply decent compression to every part of the snapshot.
 
These are pretty basic things for a backup system really - compressing data as much as possible and not backing up unnecessary things. Of course I am not saying the alternative products are better - there are a lot of rubbish ones - but Retrospect could dominate if it improved these things.

I could also write my own backup package...

 

1. There are several folders, no matter what. For example a users folder, a settings folder, an htdocs folder, an inetpub folder as well as my own half dozen top level folders.

2. I am not going to restructure my entire world into one folder and rebuild everything around the new locations, to work around a limitation in a backup suite - I would switch backup system first.

3. As I said, I also want a system backup, and that is huge because of the poor compression of snapshots.

 

This part of the forum is for suggestions for improvements to Retrospect. I have made several very good suggestions that would benefit lots of users - indeed just about every user. They are not difficult improvements to make either, and would increase the robustness of people's backups (primarily by using less media).

 

Maybe Retrospect will ignore my suggestions, maybe not. I see they have been working on the compression of metadata, so hopefully they will consider the suggestion to compress all parts of snapshots.

 

There have been other people asking why the snapshots are so big, and indeed making a similar suggestion to mine: that you only snapshot the needed folders.

 

I have managed a huge variety of software, successfully outcompeting other companies - and a part of that is listening to customers and the problems they face. I know what happens when engineers stop listening to the customers that have a clue - those companies eventually die, because other people will one day make better products.

