NoelC

Grooming Policy Too Simplistic


As a relatively new user, the way I expected Retrospect to work is to make just enough room for the latest backup as each new incremental backup runs.

What I'm getting, now that my 8 TB backup drive has filled, is backup failure after backup failure because the backup disk is simply filled up.  It does groom something every time, but apparently my oldest backup wasn't as big as the newest one I need to store.

I understand that this is adjustable via the Grooming policy of the Backup Set.

I have it set to "Groom to Retrospect defined policy" set to 1 month, and Storage-optimized grooming.  Apparently my drive isn't big enough to do that, given the amount of data my system crunches through.  Fair enough.

I know I can switch to "Groom to remove backups older than N", but...

Wouldn't it make more sense if Retrospect would just "Groom the oldest backups off as needed to make just enough room for the new backup"?  Is this what you get if you don't check either box?  If so, it's not at all obvious.
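To be concrete, here's a minimal Python sketch of the policy I'm imagining (all names are hypothetical; a real implementation would also have to respect whatever retention rules are configured):

```python
def groom_to_fit(backups, free_bytes, needed_bytes):
    """Pick the oldest backups to groom out until there is just enough
    room for the incoming backup.

    backups: list of (backup_id, size_bytes), ordered oldest-first.
    Returns the ids to groom, or None if grooming everything still
    would not free enough space.
    """
    to_groom = []
    for backup_id, size in backups:
        if free_bytes >= needed_bytes:
            break
        free_bytes += size          # space reclaimed by grooming this one
        to_groom.append(backup_id)
    return to_groom if free_bytes >= needed_bytes else None
```

Nothing exotic: walk the backups oldest-first, stop as soon as there's room.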

-Noel

That would be complex, fraught with error, and have huge potential for unexpected data loss.

It sounds like you've either under-specced your target drive, have too long a retention period, or have a huge amount of data churn. First two are easy enough to sort and, for the last, do you really need to back up all that data?

We have a policy here that if data is transient or can easily be regenerated it should not be backed up. Maybe you could do the same, either by storing it all in a directory that doesn't get backed up or by using rules that match your workflows and requirements to automatically reduce the amount of data you're scraping.

Whilst it would be nice to back up everything and keep it forever, resources often dictate otherwise. So you'll have to find a balance that you (and/or any Compliance Officer you may answer to!) can live with.

Being able to accomplish complex things is a reason companies like Retrospect can demand the big bucks.  How many software packages have you paid more than $100 for?  I don't know about you but I want quality and functionality, not ongoing headaches from the professional software I pay for.

And how would this be different than choosing a fixed horizon of N backups, or following even the complex grooming policy that's default?

I'm a software engineer.  I can envision ways to do it.  Microsoft (and before them Digital Equipment Corporation) figured out how to do it with their Volume Snapshot Service going back what, 40 years?  Complex data management is not a new thing, and computers nowadays have more than enough CPU power and I/O speed, and RAM storage.

I suggest that perhaps the one biggest thing people would want in a "set it and forget it" backup is to make reliable backups in such a way as to give the longest possible backup horizon for any given available backup space.

Regarding what to back up...  8 TB of backup space is not a small amount for a single workstation that I'd like to be able to restore should even a catastrophic failure occur.  And in my case at least, the system has many spare hours to do it.

I don't want to have to futz around to find out just how much data I can keep.  That's what I'd like the software to do for me.  Unfortunately, that's not what Retrospect is delivering.  I may have made a mistake in purchasing this software - it's not delivering anywhere near the professional experience I expected based on  recommendations.

-Noel

NoelC,

Here is why and how to submit a Support Case for a new feature.  In writing that Support Case for "make reliable backups in such a way as to give the longest possible backup horizon for any given available backup space" , here are some factors you should consider:

  • This type of grooming would have to be done at the end of the Matching phase of a Backup, since it would require pre-determining the amount of space that needs to be groomed out.  Presently Retrospect does grooming during the Backing Up phase, whenever space has already run out on the Backup Set disk member being used.
  • This type of grooming would have to also comply with the other factors in the Grooming Policy for the Backup Set.  For many installations these are determined by legal requirements—non-existent 40 years ago, which is why Nigel Smith said above "you (and/or any Compliance Officer you may answer to!) can live with."  Does VSS "grooming" now comply with such legal requirements; does DEC (double-merged into HP) "grooming" do so?
  • The factor directly above would mean that the Catalog File would have to contain a list of each backed-up version of each file (which it may already do) plus the size of that version on disk (which it probably now doesn't).  The size would be needed to determine how much space would be saved by grooming out that version, while ensuring  that the Grooming Policy would not be violated by so doing.  An alternative would be to iteratively try a Groom-out of one version of each file, remaining in compliance with the Grooming Policy, while calculating if enough space had now been saved to accommodate the file versions that would be added to the member during the  Backing Up phase; no doubt you can see iteration would be kludgy!  With either approach, the Backing Up phase would have to be skipped entirely if Matching-phase grooming fails; would administrators prefer that?
  • Retrospect "Inc." would surely, because of the development cost considering the above factors, charge for an Add-On license for the feature.  For most installations that would be potential customers for this feature, the cost of the Add-On license would be more than the cost of an additional overflow disk—which for some reason you are dead set against purchasing.
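To illustrate the third point above, here is a rough Python sketch of the Matching-phase calculation a per-version-size Catalog would enable.  The data layout is entirely hypothetical—it is not Retrospect's actual Catalog File format—and the only policy constraint modeled is a simple retention window plus "never groom a file's newest version":

```python
from datetime import datetime, timedelta

def matching_phase_groom_plan(catalog, keep_days, now, space_needed):
    """Plan which versions to groom before the Backing Up phase starts.

    catalog: {path: [(backup_time, size_bytes), ...]} with each list
    sorted newest-first.  Versions older than the retention window are
    groomable, but a file's newest version is always retained.
    Returns (versions_to_groom, enough), where `enough` says whether
    the plan frees at least space_needed bytes.
    """
    cutoff = now - timedelta(days=keep_days)
    plan, freed = [], 0
    for path, versions in catalog.items():
        for when, size in versions[1:]:      # never touch the newest
            if when < cutoff:
                plan.append((path, when))
                freed += size
    return plan, freed >= space_needed
```

If `enough` comes back False, the Backing Up phase would have to be skipped—which is exactly the administrator-preference question raised above.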

Thanks for the info.  I will consider creating a feature request, though having paid for the software I feel I really shouldn't have to work too hard for Retrospect to ask them to make it work reasonably.  Any reasonable software company should be hungry for new ideas.

Regarding what they would charge extra for...  I'm imagining they always should be looking for useful new features to add to the next version and which would add value to their next version and bring in more customers and get paid prior customers to purchase an upgrade.  There is nothing wrong with continuing to add value to a mature product, except that it may become more and more expensive to do as the implementation technology ages.  We can't know whether this feature will be easy or hard to add, but evidence hints that if it can do the grooming to one of several policies now, another policy should not be beyond hope. If policy compliance is an issue, it can be spelled out how the different policies provide such compliance.  I can't believe that customers wouldn't want to make more use of their existing hardware and get more reliable protection from the product for a given set of hardware.

FWIW, I have still not gotten my setup back to where I have complete successes during my nightly incremental backup.  My loss of ability to weather a data loss in this time of futzing around with the settings to try to get it to work means this sophisticated product has fallen on its face in my opinion.  I'm not even sure it IS my fault any more, since I had a period of time where it was successfully grooming and completing backups, which transitioned into a string of failures.  I can only continue fiddling with settings, reducing the horizon manually rather than the useful work I had planned to be doing.

-Noel

Y'know what, forget it.  I'm sorry I bothered you helpful guys on the forum.  I'm tired of failure eMail after failure eMail from Retrospect in the middle of the night.  A paid product ought not to just fail after 2 months.

I'm requesting a refund (fat chance since the product took a couple of months to begin to fail, but I'm asking anyway) and going back to using Windows VSS-based backup.  It does actually work, even on the newest Windows OS; I exercised it quite effectively to do a restore on my even bigger, more powerful office workstation just this past week.

-Noel

NoelC,

If Retrospect "took a couple of months to begin to fail" on your home machine (you mention an "even bigger, more powerful office machine"), I assume that you don't have any legal constraints on data deletion on your home machine.  You haven't answered my question as to whether VSS-based backup can automatically comply with such constraints, but my added P.S. implies it can't.  Assuming you don't have any legal constraints, and that after 2 months you developed the "data churn" that Nigel Smith mentioned in this post, here's another simpler suggestion for the feature you could have requested:

Add a checkbox under "Media" at the bottom of the Backup Set Options dialog, named something like "Override with Recycle to avoid Member overflow".  Again this type of "grooming" would have to be done at the end of the Matching phase of a Backup, since it would require pre-determining the amount of space that needs to be groomed out.  If—with the checkbox checked—that space is more than the remaining space on the mounted Backup Set disk Member, convert this Normal backup to a Recycle backup—which erases everything that is already stored on the mounted Backup Set disk Member and then replaces it with the full current contents of the Source.   That's rather extreme "grooming", but checking the checkbox means the administrator doesn't care—because not having to pay for an overflow Member overrides the constraints in the Grooming policy indicated above that checkbox in the dialog.
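The end-of-Matching decision for that checkbox is simple enough to state in a few lines.  A Python sketch (hypothetical names; this is the proposed behavior, not anything Retrospect currently does):

```python
def choose_backup_mode(space_needed, space_remaining, override_recycle):
    """Decide, at the end of the Matching phase, whether to convert a
    Normal backup to a Recycle.  A Recycle erases everything on the
    mounted Backup Set disk Member and then writes the full current
    contents of the Source; otherwise the backup stays Normal."""
    if space_needed > space_remaining and override_recycle:
        return "recycle"
    return "normal"
```

With the checkbox unchecked, an overflowing backup would fall through to today's behavior (groom during Backing Up, or ask for another Member).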

I assumed that the paragraph immediately above—without the checkbox—was a description of what VSS-based backup does automatically, but the P.S. says otherwise.  Good luck, NoelC, and please don't come back complaining that you had to manually delete your previously-backed-up data from VSS.🤣

P.S.: This Web page, which is really a commercial for AOMEI Backupper, describes under "The Reasons" why VSS issues an "Insufficient storage available to create either the shadow copy storage file or other shadow copy data" message.  That's the after-two-months problem you've described for your home installation.  Under "The Solutions", the article describes manual steps to take to cope with this message; starting with the second sentence under item 1., it describes a manual process using the command:

vssadmin delete shadows /for=<ForVolumeSpec> [/oldest | /all | /shadow=<ShadowID>] [/quiet]

You want Retrospect to do the equivalent of automatically choosing the parameter for one or more executions of that command—excluding /shadow=<ShadowID>, because automating that choice would be impossible.  I've described here an involved way it could iteratively choose /oldest.  I've described in the second paragraph of this very post a way it could instead choose /all.  In either case, VSS obviously doesn't have any information about legal constraints.  Maybe the non-free Professional editions of AOMEI store such information and use it; I'll let you explore that question starting with this Web page about upgrades.🤣
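For the record, the iterative /oldest choice could be wrapped like this.  This is a Python sketch, not a tested tool: it would have to run elevated on a real Windows box, and the `get_free` and `run` callables are injectable precisely so the loop's logic can be exercised without a live VSS environment:

```python
import subprocess

def groom_oldest_shadows(volume, min_free_bytes, get_free,
                         run=subprocess.run):
    """Repeatedly delete the oldest VSS shadow copy of `volume` until
    get_free(volume) reports at least min_free_bytes.  Returns how many
    shadow copies were deleted.  Stops early if vssadmin reports there
    is nothing left to delete (nonzero return code)."""
    deleted = 0
    while get_free(volume) < min_free_bytes:
        result = run(["vssadmin", "delete", "shadows",
                      f"/for={volume}", "/oldest", "/quiet"],
                     capture_output=True, text=True)
        if result.returncode != 0:   # no shadow copies left to delete
            break
        deleted += 1
    return deleted
```

Note that this wrapper, like VSS itself, knows nothing about retention policies or legal constraints—it just frees space.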

Edited by DavidHertzberg
P.S. reveals that NoelC would have to _manually_ run command(s) to make VSS do what he wants Retrospect to do _automatically_; change 3rd and 1st paragraphs because VSS doesn't _automatically_ delete previously-backed-up data

On 11/1/2019 at 5:20 PM, NoelC said:

And how would this be different than choosing a fixed horizon of N backups, or following even the complex grooming policy that's default?

Simple example -- you've a system with enough space to account for your expected 5% churn daily, so you set up a grooming policy that keeps things for 14 days to give you some wiggle room. You expect to always be able to restore a file version from 2 weeks ago.

You find out about this whizzy new grooming feature which clears enough space for your latest backups every session, and enable it.

Couple of nights later a client's process (or a typical user!) runs amok and unexpectedly dumps a shedload of data to disk. RS does exactly as asked and, to make space for that data, grooms out 50% of your backups. And suddenly, unexpectedly, that file version from 2 weeks ago is no longer restorable...
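Putting illustrative numbers on that (all figures made up to match the scenario above):

```python
data_gb = 4_000        # protected data (illustrative)
daily_churn = 0.05     # 5% of the data changes each day
retention_days = 14

# steady state: one full copy plus 14 days of incrementals
steady_gb = data_gb + data_gb * daily_churn * retention_days
print(steady_gb)       # 6800.0 -- fits an 8 TB target with headroom

# then a runaway process dumps 2 TB of new data one night
after_dump_gb = steady_gb + 2_000
print(after_dump_gb)   # 8800.0 -- over capacity; a groom-to-fit policy
                       # would quietly discard days of history to absorb
                       # the spike
```

The sizing looked fine right up until the one night it wasn't—and a groom-to-fit policy hides that from you instead of failing loudly.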

But I agree with you -- backups need to be reliable, dependable, and behave as expected. Which brings us to...

21 hours ago, NoelC said:

Y'know what, forget it.

To be honest, I don't blame you! If you can't get software to reliably work how you want it to -- particularly, perhaps, backup software -- you should cut your losses and look elsewhere. While I'd love you to continue using RS, your situation isn't mine, your requirements aren't mine, so your best solution may not be mine.

On 11/1/2019 at 5:22 AM, Nigel Smith said:

We have a policy here that if data is transient or can easily be regenerated it should not be backed up. Maybe you could do the same, either by storing it all in a directory that doesn't get backed up or by using rules that match your workflows and requirements to automatically reduce the amount of data you're scraping.

 

True enough, but there is no report that identifies those files that have been backed up N times in the last N days/weeks, as in a list that includes file name, directory, file size and dates of recent backups. And it would be even nicer if I could just check a box and Retrospect would automagically groom out these files and even add an exclude item to my backup script.  Yeah, I'm smoking the good stuff now.

I know that when I back up my PROGRAMS drive, which is Windows and installed programs, I'm backing up some daily or weekly updates for my security software and probably other utilities, and also a HUGE number of Windows files with really cryptic names.  But it's too much work to track down these files to determine which are transient.  It's "easier" to spend a few bucks on a larger backup drive.

 

x509,

Actually you may be smoking more-powerful stuff than you think you are.😄 

How about having a one-bit flag in the Catalog for a Backup Set that marks a file as "transient" if it has been backed up N times in the last N days/weeks—which it would have been because its file size or contents kept changing while its name and directory stayed the same?  It would be safe (but see the next-to-last paragraph for when it wouldn't be) to keep only the latest backup of such "transient" files—regardless of legal requirements—so long as they aren't in certain directories known to possibly contain business-critical files.  It would probably be safest to have the Windows/Mac variant of Retrospect automatically avoid doing "transient" flagging in such directories.   There would no doubt have to be an additional checkbox for each Backup Set's Grooming Options , with a subsidiary checkbox specifying whether "transient" flagging is to be done on a daily or weekly basis.

There could then be a Built-In Selector (see page 437 of the Retrospect Windows 16 User's Guide; the Retrospect Mac term is Rule), usable only in a Groom script (a Retrospect 15 feature)—as opposed to a Backup or Proactive script, that would be used to Exclude all files marked as "transient" unless they have the date of the latest backup per the Backup Set Catalog.  Such a Groom script could be run after the last of the daily/weekly backups to a Backup Set.

On second thought, for Backup Sets whose Grooming Options have the additional box for "transient" flagging and which have been Groomed for "transients", a Restore would have to use the Catalog File—rather than a Snapshot—for any files flagged as "transient" regardless of whether a previous Snapshot was chosen, in order to restore an entire source volume.  This would not be good for situations in which the source volume has been backed up and "transient"-Groomed since undetected ransomware encrypted it, or in which the latest versions of some application files turn out to have been erroneously updated by the user—a situation which has happened to me.  That makes this enhancement sound considerably less attractive.
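The "transient" flagging itself is cheap to compute.  A Python sketch of the detection rule (hypothetical data layout—session contents as sets of paths—and the protected-directory exclusion is the directory safeguard suggested above):

```python
from collections import Counter

def flag_transients(sessions, n, protected_dirs=()):
    """Flag a path 'transient' if it was (re-)backed up in every one of
    the last n sessions -- i.e. it changed between each backup -- and it
    does not live under a directory treated as business-critical.

    sessions: list of sets of backed-up paths, newest session first.
    """
    counts = Counter(p for s in sessions[:n] for p in s)
    return {p for p, c in counts.items()
            if c >= n and not p.startswith(tuple(protected_dirs))}
```

The hard part, per the paragraphs above, isn't detecting these files—it's what grooming them does to Snapshot-based Restores.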

Here is why and how to submit a Support Case for an enhancement.

Edited by DavidHertzberg
Insert next-to-last paragraph about Restore of an entire source volume from a Backup Set that has been Groomed for "transients"—makes enhancement considerably less attractive

On 11/17/2019 at 3:47 AM, x509 said:

True enough, but there is no report that identifies those files that have been backed up N times in the last N days/weeks

True enough 😉, but is one really necessary?

"Transient data", in amounts that matter to disk sizing/grooming policies, is usually pretty obvious and a result of your workflow (rather than some background OS thing). Think video capture which you then upload to a server -- no need to back that up on the client too. Or data that you download, process, then throw away -- back up the result of the processing, not the data itself. Home-wise, RS already has a "caches" filter amongst others, and why back up your downloads folder when you can just re-download, etc, etc.

OP ran out of space on an 8TB drive with only a 1 month retention policy. That's either a woefully under-specced drive or a huge amount of churn -- and it's probably the latter:

On 10/31/2019 at 12:54 PM, NoelC said:

given the amount of data my system crunches through

...rather than "given the amount of data on my machine".

Like David, I'd be reluctant to let RS do this choosing for me -- "transient" is very much in the eye of the beholder and, ultimately, requires a value to be placed on that data on that machine before a reasoned decision can be made.

On 11/18/2019 at 9:10 AM, Nigel Smith said:

True enough 😉, but is one really necessary?

"Transient data", in amounts that matter to disk sizing/grooming policies, is usually pretty obvious and a result of your workflow (rather than some background OS thing). Think video capture which you then upload to a server -- no need to back that up on the client too. Or data that you download, process, then throw away -- back up the result of the processing, not the data itself. Home-wise, RS already has a "caches" filter amongst others, and why back up your downloads folder when you can just re-download, etc, etc.

Like David, I'd be reluctant to let RS do this choosing for me -- "transient" is very much in the eye of the beholder and, ultimately, requires a value to be placed on that data on that machine before a reasoned decision can be made.

When I back up my Windows 10 partition, which also contains all program installs, I get an enormous number of files in the various Windows directories and subdirectories that get backed up into my PROGRAMS dataset.  I back up daily, yet sometimes there are several GB of such changed files.  These aren't files I downloaded, and they have really long, obscure names.  I'm reluctant to simply exclude the directories that contain these files, since they may be critical to restoring a functional Windows installation.  (Been there, done that with a restore approach that didn't include all critical files.  That restore failed.)

So up to now, my "solution"  has been to shrug my shoulders and buy a bigger backup drive.  As a single-LAN user, I back up my system and all clients to a separate backup drive installed in my system.  I get a new drive each year.  For years, I was fine with a 4 TB drive.  For 2018, with increasing photo file volumes, I got a 6 TB drive.  Even with pruning and then transferring older "Monthly Transaction Data" into my regular DATA dataset, that 6 TB drive is barely enough.  The Monthly Transaction Data is files I create and modify on a daily or near-daily basis.  That's not what I'm concerned about with this "transient data" issue.

So to deal with this issue for now, for 2020, I just bought an 8 TB drive.  With likely photo file growth, I will probably need a 10 TB drive by 2021 or 2022.

If I could manage all this transient data, I would certainly have a smaller PROGRAMS dataset.  Perhaps enough to get by with one size smaller backup drive.

EDIT:  Spell checker made "LAN" into "land."  Fixed with edit. 11-24

x509

Edited by x509
fixed spell checker edit


It's about 12 hours after I wrote the last post, just above, and I think I left out an important point.

I don't keep any user data on my PROGRAMS drive.  There is the Windows ProgramData and the \users\phil set of subdirectories, which keep program config and status data. And some of that data does change frequently.  However, these data files are pretty small, usually well under 1 MB.

Right now, as I write this post, a script is backing up the PROGRAMS drive.  

On one system, there are 13,262 files and 4.6 GB of backup.  On another system, 10,533 files and 6.6 GB.  Again, this isn't real user data.  I think all these files represent true "transient data," where I would need only the latest version or maybe 2 versions for restore purposes.  This is the kind of data that needs better grooming capabilities.

On 11/16/2019 at 10:31 PM, DavidHertzberg said:

x509,

Actually you may be smoking more-powerful stuff than you think you are.😄 

How about having a one-bit flag in the Catalog for a Backup Set that marks a file as "transient" if it has been backed up N times in the last N days/weeks—which it would have been because its file size or contents kept changing while its name and directory stayed the same?  It would be safe (but see the next-to-last paragraph for when it wouldn't be) to keep only the latest backup of such "transient" files—regardless of legal requirements—so long as they aren't in certain directories known to possibly contain business-critical files.  It would probably be safest to have the Windows/Mac variant of Retrospect automatically avoid doing "transient" flagging in such directories.   There would no doubt have to be an additional checkbox for each Backup Set's Grooming Options , with a subsidiary checkbox specifying whether "transient" flagging is to be done on a daily or weekly basis.

There could then be a Built-In Selector (see page 437 of the Retrospect Windows 16 User's Guide; the Retrospect Mac term is Rule), usable only in a Groom script (a Retrospect 15 feature)—as opposed to a Backup or Proactive script, that would be used to Exclude all files marked as "transient" unless they have the date of the latest backup per the Backup Set Catalog.  Such a Groom script could be run after the last of the daily/weekly backups to a Backup Set.

On second thought, for Backup Sets whose Grooming Options have the additional box for "transient" flagging and which have been Groomed for "transients", a Restore would have to use the Catalog File—rather than a Snapshot—for any files flagged as "transient" regardless of whether a previous Snapshot was chosen, in order to restore an entire source volume.  This would not be good for situations in which the source volume has been backed up and "transient"-Groomed since undetected ransomware encrypted it, or in which the latest versions of some application files turn out to have been erroneously updated by the user—a situation which has happened to me.  That makes this enhancement sound considerably less attractive.

Here is why and how to submit a Support Case for an enhancement.

So I just submitted that feature request.  I'll post updates if/when Retrospect responds.

x509

On 11/24/2019 at 12:36 AM, x509 said:

It's about 12 hours after I wrote the last post, just above, and I think I left out an important point.

I don't keep any user data on my PROGRAMS drive.  There is the Windows ProgramData and the \users\phil set of subdirectories, which keep program config and status data. And some of that data does change frequently.  However, these data files are pretty small, usually well under 1 MB.

Right now, as I write this post, a script is backing up the PROGRAMS drive.  


On one system, there are 13,262 files and 4.6 GB of backup.  On another system, 10,533 files and 6.6 GB.  Again, this isn't real user data.  I think all these files represent true "transient data," where I would need only the latest version or maybe 2 versions for restore purposes.  This is the kind of data that needs better grooming capabilities.

x509,

Due to your file hygiene—which might serve as a shining example to the rest of us—IMHO you don't need a new feature. 😀  All you need is a separate Backup Set for the PROGRAMS drive.  You would define the "Grooming Options" for that Backup Set, per pages 380-381 of the Retrospect Windows 16 User's Guide, as Keep Only the Last 2 Backups.  Then, after every one of those Backups, you'd schedule a Groom script per pages 221-223 of the UG.  If you're worried about disasters or ransomware, you could schedule a Transfer Backup Sets script per pages 209-213 of the UG—whose destination Backup Set would go off-site ASAP—before or after that Groom script.  Alternatively you could make every Backup of the PROGRAMS drive a Recycle, but that wouldn't be as good protection against disasters or ransomware—even with a Transfer Backup Sets script—unless you scheduled a Recycle of the PROGRAMS drive periodically.  After using your Disaster Recovery Disk, you'd restore from your PROGRAMS backup first before restoring from your DATA backup.

Edited by DavidHertzberg
Add sentence to 1st paragraph describing sequence for disaster recovery; add "unless" clause to next-to-last sentence


DavidHertzberg,

Thanks for the compliment.  I never thought that anything I do could be a "shining example" for the stalwarts of this group.

I already have a separate backup script for the PROGRAMS drive.  (I have standard partition/naming assignments for all systems in my LAN.  Using the Volumes option, all the various systems' C drives are in the PROGRAMS group, and this group is the target for the PROGRAMS script.)  It never occurred to me to take the approach you describe, but it does make sense.  I like your suggestions and will implement them as soon as I get a chance.  Being cautious, I might retain 3 or 4 backups of each file.

I keep a copy of all my current year datasets on a USB drive that I plug in periodically to back up those datasets.  I didn't think to do a dataset transfer.  Instead I use the Goodsync utility to sync the datasets between my G backup drive and the USB drive.  That way, I avoid having to groom the USB drive.

x509

On 11/24/2019 at 5:36 AM, x509 said:

I think all these files represent true "transient data," where I would need only the latest version or maybe 2 versions for restore purposes.

If you need to restore it (ie can't just copy it from elsewhere or easily regenerate it from other sources) then it isn't transient data -- and you've already assigned it a "value" of "I need the last two versions". IMO, David's nailed it -- separate script with its own grooming strategy.

Because of the way we back up, I've never used grooming beyond a quick play. Is it possible to specify one or more strategies that apply to the same backup set to, for example, keep x versions of files in directory /foo, y versions of everything in /bar, and z versions of everything else?

As for your overall backup/management strategy, I can only echo David -- awesome! Would that I were as conscientious...


Retrospect response to my feature request:

You can view a listing of what files are backed up on specific days by going to Reports>Session contents.  You can browse the session for a specific backup to view a listing of every file copied from that disk on that day.

 

Please let us know if you have any additional questions.

 


David and everyone else,

I followed David's suggestion and set the Grooming policy for my PROGRAMS dataset to 6, wanting to be very, very conservative about keeping the ability to do restores from not just the last few days.  And I selected Storage-Optimized because I'm almost out of space on my backup drive.  Anyway, before the backup, PROGRAMS was 1156 GB, over 20% of my 5.45 (net) TB backup drive.  The grooming operation removed 675 GB from this backup set, so now I'm down to a net 481 GB for this backup set and I have lots more free space on the backup drive.

Part of my "backup hygiene" is that I use a new backup drive each year, starting Jan. 1.  So this grooming operation means I won't be running out of space before year-end.

Amazing.

