
Grooming Questions


I'm experimenting with grooming. I haven't gotten a solid grip on exactly how it works or how trustworthy it is. The "Getting Started Guide" doesn't mention Grooming - I stumbled across the Grooming options on my own.



I'd like those of you who are incorporating Grooming into your backup routines to please chime in and discuss all things related to Grooming - the good, the bad, and the ugly.



Anyone care to comment on your Grooming experiences? I'm curious to hear how often you run Grooming, how long the process takes, and how much data you are backing up.



How do EMC's "Smart Incremental" technology and Grooming compare to competitors' technologies such as Synthetic Backups? Just curious if the paradigms are similar.



My backup routine: I have one script (with Matching enabled) that runs 5 nights a week and backs up a specific Mac production server (an Xserve) over the LAN. The server (client) has 200 GB of data. I have a single media set, stored on a remote NAS that mounts on the R8 backup server. The server (client) data doesn't change much, so my incrementals are fairly small and the window is fast, relatively speaking. I work in a 9-5 shop and backups run at night. I have been considering using Grooming to 'prune' data that is 30 days old or older, so I may groom after 20 backups have run (~1 month of backups). I'm trying to keep a 4-week retention on data and Grooming might do the trick, but I'm not confident yet about how it works. Does this sound like a reasonable plan?


I'm using Grooming on a regular basis. I have all my sets using the custom "groom after X" backups setting -- not the Retrospect default grooming.


My overall comments about it:


1) The more *files* you have in your media set -- the longer grooming will take. It's not the *size* of the set -- it's the number of files.


2) If you are running other activities while grooming is going on -- grooming will take longer. Grooming seems to want to use as much RAM as possible for its work.


3) If something crashes during a groom -- immediately rebuild your catalog file. Do not attempt to backup to that media set (or attempt a regroom) until you've done this.



My current setup:


I have 6 sets I am grooming weekly. I groom one a day for four days and then the last two on the fifth day (I do two at once as those two sets have the smallest number of *files* in them)


I start these at 6 p.m. At the *moment* -- the one-a-day grooms take about 3 hours to run (but, based on what is *in* the media sets, I'm currently not grooming very much data) The two-at-once takes slightly over an hour.



My media sets (the one-a-day) basically look like this:


A -- 1.0M files (181G)

B -- 1.1M files (183G)

C -- 0.9M files (185G)

D -- 1.0M files (181G)


All of the above are set to keep "60" backups per client. Not all clients are at 60 backups yet (laptop/vacation users)


My two-a-day sets are:


E -- 16K files (191G)

F -- 82K files (580G)


E is set to retain *90* backups (not there yet). F is set for "60" and most clients are at that range.



I've been grooming since the beginning. The only problems I had of note (in the *release* builds) were when my media sets were damaged.


This caused the groom to fail and report errors. The only way I really found this out, though, was when I recataloged the media sets -- which reported problems.


I've since cycled those media sets out.



I've put a number of suggestions in the forum about ways *I'd* like to see grooming made better. For what it does, though, it's working.



I know others who have media sets that have over 4M files -- those grooms take 24 hours or so...


Oh, and I've also *manually* groomed sets by deleting "Past Backups" and then running a groom.


(Usually, I've only done that when I've noticed a client's incremental daily backup was exceptionally large -- because they downloaded something they shouldn't have).


If you *manually delete* Past Backups, it will take a while (a long while) for the catalog file to update accordingly before you can do the groom.


It depends on where the Past Backup is, though. If it's the *last backup* -- it's fairly quick.


If you decide to manually delete the 45th Past Backup out of 75 backups -- be prepared for a *long* wait while the catalog file updates accordingly. And *then* you can groom.






Thanks for the details on your routines. Much appreciated.



Do you backup to tape or disk?



You mentioned that your scripts are set to keep "60" or "90" backups per client. Does this basically equal a 60-day or 90-day retention? I'm curious if you have an SLA or policy at your company for how long you are expected to archive data.



When does the Grooming process occur? Before a backup runs? After a backup runs? Or on its own via some "intelligent" logic? I'm trying to understand whether Grooming is executed via backup preflight/postflight scripts specifically or whether R8 just "knows" what to do with the Media Sets. I don't see any functions in the backup Scripts to do the Grooming. It looks like Grooming is a "hard-coded" attribute of the Media Sets themselves. How does that work?



What exactly does "Groom to Retrospect defined policy" mean? Is there a global Groom policy that can be applied to ALL Media Sets?



Based on the wording of the Grooming options in the R8 console, I assume that you can enable Grooming based on how many TIMES a backup has been performed on a given Media Set, but can you Groom based on the AGE of the data (i.e., based on timestamp metadata of when it was backed up)?



I will have a simple daily backup routine with more or less a 1:1 ratio of days to backups (5 days = 5 backups - easy to keep track of). So if I Groom every 20 backups, that would equate to roughly once a month (assuming I back up 5 days a week). Does that sound right? This would be trickier to keep track of if I were backing up every 2 weeks or alternating Media Sets, etc. My point is that I could see situations where someone may be more interested in the backup dates than in how many times the data has been backed up. I'm just thinking of different situations I might run into in the future. Just curious about your thoughts on looking at the Grooming process from the perspective of the number of backups as opposed to actual backup dates.
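
To make the count-versus-calendar arithmetic above concrete, here is a minimal Python sketch. It is not anything Retrospect provides: the function name `retention_span_days` and its parameters are made up for illustration, and it assumes exactly one backup on every scheduled weekday with none skipped.

```python
from __future__ import annotations

from datetime import date, timedelta


def retention_span_days(keep_backups: int, backup_weekdays: set[int], today: date) -> int:
    """Return how many calendar days ago the oldest of the last
    `keep_backups` backups ran, counting a backup on `today` if it is
    a scheduled day. Weekdays use Python's numbering (Monday = 0).
    """
    if keep_backups < 1 or not backup_weekdays:
        raise ValueError("need at least one backup and one scheduled weekday")
    found = 0
    day = today
    while True:
        if day.weekday() in backup_weekdays:
            found += 1
            if found == keep_backups:
                return (today - day).days
        day -= timedelta(days=1)  # walk backwards one calendar day at a time


weeknights = {0, 1, 2, 3, 4}  # Monday through Friday
friday = date(2024, 3, 29)
print(retention_span_days(20, weeknights, friday))
print(retention_span_days(60, weeknights, friday))
```

Counting back from that Friday, 20 weeknight backups reach back 25 calendar days and 60 reach back 81, so a "keep N backups" setting only approximates a date-based retention, and the gap widens for machines that miss scheduled backups.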



Your comments about Catalog corruption make me nervous. I ran Retro 6 (and earlier versions) for many years back in the Dantz days and very rarely had any catalog problems. Of course, back then I didn't do disk-to-disk backups. Everything went to tape in a very linear way. Files were smaller, etc.



On a related note regarding the Backup Options tab: What's the difference between the option to "Match source files against the Media Set" and the option to "Don't add duplicate files to the Media Set"? They seem like they would do the same thing, based on the way they are worded. Can you clarify please?





1) I back up to disk. I have no experience with grooming tape media sets.


2) Here, setting grooming to "60" usually means a 90-day retention. Our official policy is actually to keep 90 days or 60 backups -- whichever comes first -- for our client machines. We keep our *server* data longer.


3) The grooming process either occurs when you script it to run *or* when you run out of space on your media set (or the disk housing the media sets). There's a formula somewhere that indicates when this kicks in.


To script a groom -- you create a groom *script* -- it's one of the "new script" options.


4) The defined policy is something like: keep the last 7 days, then a backup from one day for the last 4 weeks, then a backup from every previous month -- space permitting. It's around somewhere in one of the forum postings and I think if you mouse-over the groom options, the yellow pop-up box explains this.


You have to turn grooming *on* for a media set. It's off by default.


5) There is currently no "age of backup" grooming option. It's one of my many requests for future enhancement. You can only groom either to the "defined policy" or by keeping an explicit number of backups per client.


You ask about the perspective of "number" vs. "dates". My retention policy is really only to keep 60 backups of client data (2 months of daily backup).



Remember, if you groom an older backup, that does not remove *all* the files from that backup. It only grooms out the files that are *unique* to that specific backup. So you can still have files that are decades old on the machine in the media set -- as long as they exist on the source in later backups.


My reasoning for wanting a "date" option in addition to the "number of backups" option is so I could actually say "90 days" instead of having to *manually* delete a past backup that is 91 days old -- but where the client hasn't had 60 backups yet.



6) Corruption -- if your engine is not crashing, you are unlikely to have catalog corruption. If you are backing up to *tape*, you are probably (?) even more unlikely to have *media set* corruption. You can always occasionally rebuild your catalog files if you are worrying about it.



7) You know, I'm not sure. I *think* the "don't add duplicates" would mean if you had the exact same file on your source in multiple locations, it will only back up the file once (but keep track of where the other files are).


Which is different from "match source" which means if the file is *already in the media set* -- then don't back it up again.


(Somebody else might want to clarify that one. I've honestly never thought about it...)
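
For readers trying to picture the "defined policy" from item 4, here is a rough Python sketch of a tiered retention rule as described in that item: everything from the last 7 days, one backup per week for roughly the following 4 weeks, and one backup per month before that. This is only an illustration of the description in this thread, not Retrospect's actual algorithm. The function name `backups_to_keep`, the exact cutoffs, and the choice to keep the newest backup in each week or month are all assumptions.

```python
from __future__ import annotations

from datetime import date, timedelta


def backups_to_keep(backup_dates: list[date], today: date) -> list[date]:
    """Pick which backup dates survive a tiered daily/weekly/monthly rule."""
    daily: set[date] = set()
    weekly: dict[tuple[int, int], date] = {}
    monthly: dict[tuple[int, int], date] = {}
    for d in sorted(backup_dates):
        age = (today - d).days
        if age < 7:
            daily.add(d)  # keep every backup from the last 7 days
        elif age < 7 + 28:
            # Sorted input means later dates overwrite earlier ones,
            # so the newest backup in each ISO week wins.
            weekly[tuple(d.isocalendar())[:2]] = d
        else:
            monthly[(d.year, d.month)] = d  # newest backup per calendar month
    return sorted(daily | set(weekly.values()) | set(monthly.values()))


today = date(2024, 3, 29)
every_day = [date(2024, 1, 1) + timedelta(days=i) for i in range(89)]
for kept in backups_to_keep(every_day, today):
    print(kept)
```

Note that the rule only decides which *backups* stay listed. As item 5 points out, files shared with a surviving backup remain in the media set either way.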



Thanks Maser


"7) You know, I'm not sure. I *think* the "don't add duplicates" would mean if you had the exact same file on your source in multiple locations, it will only back up the file once (but keep track of where the other files are).


Which is different from "match source" which means if the file is *already in the media set* -- then don't back it up again."




That's what I was thinking too, but there is already an option to do this in the R8 console. It's called:


"Match only files in the same location/path"



So I guess this means that


A) If the exact same file is somewhere else on the *Media Set*, then don't back it up; instead, just refer to the original file, regardless of which CLIENT the file was backed up from (e.g., 5 client computers are all working on the same Photoshop file and they all have the same version on their Desktop at some point in time during a given backup).


Or perhaps it has a meaning more specific to a client and not a Media Set? Like this:


B) If the exact same file is on the SAME CLIENT but in 2 or more locations on the client, then don't back it up again; instead, just refer to the original file. An example of this may be 2 users on a given client, Sue and Joe, who have the same file located in /Users/sue/Desktop/file1.psd and /Users/joe/Desktop/file1.psd
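
One way to think about the options being compared here is as a question of what goes into the "is this the same file?" key. The Python sketch below is purely illustrative and is not a statement of how R8 behaves: the `FileInfo` fields, the `files_to_copy` function, and the idea that a file matches on name, size, and modification time are all assumptions made for the example.

```python
from __future__ import annotations

from typing import NamedTuple


class FileInfo(NamedTuple):
    client: str
    path: str
    size: int
    modified: int  # modification time as a Unix timestamp


def files_to_copy(source: list[FileInfo], already_in_set: list[FileInfo],
                  same_path_only: bool) -> list[FileInfo]:
    """Return the source files that would still need copying."""

    def key(f: FileInfo) -> tuple:
        name = f.path.rsplit("/", 1)[-1]
        base = (name, f.size, f.modified)
        # With same_path_only, a file only matches a copy from the same
        # client and the same path; otherwise location is ignored.
        return base + (f.client, f.path) if same_path_only else base

    seen = {key(f) for f in already_in_set}  # matching against the media set
    to_copy = []
    for f in source:
        k = key(f)
        if k not in seen:
            seen.add(k)  # also skips duplicates within this same run
            to_copy.append(f)
    return to_copy


sue = FileInfo("xserve", "/Users/sue/Desktop/file1.psd", 1024, 1700000000)
joe = FileInfo("xserve", "/Users/joe/Desktop/file1.psd", 1024, 1700000000)
print(len(files_to_copy([sue, joe], [], same_path_only=False)))
print(len(files_to_copy([sue, joe], [], same_path_only=True)))
```

Under these assumptions, Sue's and Joe's identical files are stored once when location is ignored and twice when the path is part of the key, which covers scenario B. Dropping `client` from the comparison entirely is what scenario A would look like.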






Hypothetically, if I have a Media Set with the option enabled to explicitly Groom to 30 backups, it won't groom on its own after 30 backups; instead it will keep growing and then groom back to the most recent 30 backups ONLY when the Media Set gets full (unless I specify otherwise via a script)?



Let me change-up the scenario a little:


Let's say that I have a Media Set that DOES NOT have Grooming explicitly enabled. Can I run a Grooming script on that Media Set anyway?



Speaking of Grooming scripts: Would you suggest setting the Grooming script to run on a schedule (e.g., every 30 or 60 days, based on data retention policies, etc.) or running it manually? I'm curious what happens if a backup job that uses a particular Media Set happens to be running when the Groom script starts up and tries to Groom that same Media Set (or vice versa).



By the way - there is a related option called "Lock for Grooming" which is located under the "Past Backups" section of the R8 Admin Console. If you click on a past backup from the list and then click the "Options" tab below it you will see a checkbox called "Lock for Grooming". What exactly does this do and how would it be incorporated into a Grooming strategy?




Not in front of R8 at the moment, but from memory:


1) Your hypothetical scenario is correct.


2) If you don't have grooming turned on, then you can't run a groom script on the set. You don't see all the "Past Backups" without Grooming on, for instance.


3) I can't say which method (scripted or manual) grooming would work for you. It would depend on how you are backing up and how much time it would take to groom your media set, etc. Some people never manually run/script grooms and just let it wait until the disk is full and it runs automatically.


4) If the media set is "in use" by some other activity (backup/restore, etc), then a grooming script would *not* run until that is finished. Vice-versa: When a set is in the process of grooming, you can't backup/restore from the set.


5) I *believe* "Lock From Grooming" will keep that Past Backup from ever being groomed. That said -- I've never tried it personally, so I can't say whether it works as advertised.


As to how you would incorporate that -- I assume if you knew you only had data backed up on a specific date which was then deleted from the *source* -- this would be a way to keep that data in that set in perpetuity. Whether you would use it depends on your data.
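
To tie together two points from this thread (grooming only removes files *unique* to the groomed backups, and a locked Past Backup is exempt), here is a small Python sketch of count-based grooming. It is a toy model under stated assumptions, not Retrospect's implementation: the `PastBackup` class, the `groom_to_count` function, and the treatment of a lock as "never drop this backup" are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PastBackup:
    label: str
    files: frozenset[str]  # IDs of the file versions this backup references
    locked: bool = False


def groom_to_count(backups: list[PastBackup], keep: int) -> tuple[list[PastBackup], set[str]]:
    """Keep the newest `keep` backups plus any locked ones (oldest first in
    `backups`), and return the file versions that nothing kept refers to."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    recent = backups[-keep:]
    older = backups[:-keep]
    kept = [b for b in older if b.locked] + recent
    dropped = [b for b in older if not b.locked]
    still_referenced = set().union(*(b.files for b in kept))
    # Only versions unique to the dropped backups can be reclaimed.
    removable = set().union(*(b.files for b in dropped)) - still_referenced
    return kept, removable


history = [
    PastBackup("mon", frozenset({"a.v1", "old.v1"}), locked=True),
    PastBackup("tue", frozenset({"a.v1", "b.v1"})),
    PastBackup("wed", frozenset({"a.v1", "b.v2"})),
    PastBackup("thu", frozenset({"a.v1", "b.v2"})),
]
kept, removable = groom_to_count(history, keep=2)
print([b.label for b in kept], removable)
```

In this toy run only `b.v1` is reclaimed: `a.v1` survives because later backups still reference it, and `old.v1` survives because the only backup holding it is locked, which matches the "deleted from the source but kept in perpetuity" use case.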


