
Grooming Still Dead Slow In V9?


hvar


While grooming is a nice feature, it was so slow in version 8 that we could never use it. I just installed version 9 (demo) and tried to turn on grooming for one backup set made with Retrospect 8.

 

It has now run for over 50 hours at 100-120% CPU, but I do not know what it is actually doing, nor when it will ever finish.

 

Old 4-core Mac Pro with 7 GB RAM. The backup set is 8.5 TB of data on a decent SATA RAID. The media set is 415 MB.

 

Anyone getting faster ("working") grooming in version 9?


Guest Steve Maser

For me, Grooming works exactly the same in 9 as it did in 8.2 -- no faster, no slower (and I groom some media sets weekly and others once a month.)

 

Grooming speed is proportional to the number of *files* in the media set -- not the *size* of the backup set.

 

How many *files* are in this media set you are grooming?


The backup set is 8.5 TB of data on a decent SATA RAID. The media set is 415 MB.

 

This is unclear.

 

A "Backup Set" is the _set_ made up of the Catalog file and the Members -- the media on which the backed-up data is written.

 

So in your case, is it a Disk Media Set with the first (and presumably only) Member stored on the RAID, with a 415 MB Catalog file?

 

Steve is right that the number of files is the critical factor, and 8.5 TB of data could be a lot of files and could take a long time to groom.


Guest Steve Maser

Here's a real-world example of grooming from my server (2 GHz Core 2 Duo Mac mini with 4 GB RAM):

 

I have one script that grooms two media sets. Set A has 23K files (catalog file 235 MB compressed), Set B has 222K files (catalog file 4.6 GB compressed). When I run this groom script, it takes about 2:20 to groom both sets (I know Set A goes fairly quickly, though -- last time I watched, it took about 30 minutes). Usually, no other activities are running during these grooms.

 

The other groom I ran with Retro 9 was one of my larger sets, with 1.46M files (catalog file is about 6 GB compressed) -- this groom run took 15:47. There are usually a couple of backups to other media sets that happen during the grooming of my larger sets.

 

 

90+% of the grooming time is spent "preparing" the groom. The actual *grooming of the files* is usually only around 10% of the total time for me. You can check the status of the activity to see where things are.

 

 

Retrospect 9 is no better or worse than any numbers I saw when using Retrospect 8.2. One of these days I'll replace my Mac mini with a new one, and I'm hopeful that the faster CPU will make the "preparing" phase go faster.


After one week (150 hours!) it is still grooming. I think. One CPU core is still at 100%, and it does update its file count now and then.

 

I do not know how many files there are in this backup -- probably some millions. I guess grooming a backup with this many files is out of reach for Retrospect, and I should approach this differently?

 

Before I cancel it: Is every round of grooming a start from scratch, or is it only the "first time" that will take a long time?

Edited by hvar

Guest Steve Maser

Define "millions"? I did a groom of another one of my media sets over the weekend and it took 21 hours. This media set -- while not my largest in *size* -- has one of the highest file counts of any of my sets.

 

It also depends (somewhat) on what your groom settings are. Are you grooming to keep a fixed number of backups (which is what I do) or to use the "defined policy"?

 

 

As for canceling a groom -- you *can* cancel a groom *unless* it's at the point where it's actually grooming files. If it's still preparing the groom, you can stop it without any problems. That will give you the opportunity to see how many files are in your media set.

 

So, just for my last two media sets, this may give you an idea for comparison:

 

set with 1.47M files (post groom) -- took 15:47 to groom (and my sets are set to keep 60 backups and I groom them once a month)

 

set with 1.57M files (post groom) -- took 21 hours to groom.

 

(My other larger sets have 1.6M and 2.1M files in them, but I won't have Retro 9 grooming data on them for a couple of weeks.)


Guest Steve Maser

I tried the defined policy.

 

 

 

I've never actually done any tests comparing the defined policy against my method on a large media set. I do know, however, that the defined policy can remove a *lot more files* (depending on how old your media set is) than the method I use (keeping X backups per client). So it's entirely possible that a defined-policy groom would take longer to prepare the set for grooming.

 

Since Retrospect often appears to be just sitting around doing nothing, I find the lsof command in the Terminal most useful.

For example:

 

lsof -p <pid> | grep -i .rdb     (where <pid> is the process ID of the Retrospect engine)

 

gives you a list of the .rdb files of a Disk Media Set that the engine is touching at that moment.

 

If you use this command in situations where Retrospect seems to stall, you can see that it actually is doing something.

For example, if you repair a Media Set, Retrospect seems to stall for minutes, depending on the size of the Media Set. Using lsof, you can see that Retrospect is actually scanning through all the .rdb files in the Media Set. If your Media Set is small, you have to be quick.

 

 

lsof has a hell of a lot of options, so you can adjust the command to your needs. Depending on the access rights, you might have to run lsof as root.
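
If you want a ready-made snippet to watch this, here is a minimal sketch. The engine's process name is an assumption on my part (I believe it is RetroEngine -- verify with ps ax | grep -i retro and adjust):

   # Find the Retrospect engine's process ID (the process name is assumed -- adjust if yours differs)
   PID=$(ps ax | grep -i '[r]etroengine' | awk '{print $1}' | head -n 1)

   # List the .rdb member files the engine has open right now (may require root)
   sudo lsof -p "$PID" | grep -i '\.rdb'

Run the second command a few times in a row and you can watch the engine walk through the members.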


Guest Steve Maser

The act of grooming does (basically) two things: first, it prepares the list of files that will be groomed (the longest part of the process, but also the part that can be stopped without issue), and *then* it actually goes through the steps of removing those files from the .rdb files in the media set (the shorter part of the process, but the part that would require you to rebuild the catalog if you stop the groom there).

 

Without a lot of different hardware to try this out on, I've always wondered how much the first part of the process is RAM-dependent vs. CPU-speed-dependent (or even disk-speed-dependent) in terms of what can be done to speed it up. At some point, when I replace my Core 2 Duo Mac mini with a more current model of mini, I'll probably be able to see how much the CPU change matters.



My grooming finished! The set now has 22 million files. Grooming took 9 full days of scanning at 100% CPU (one core only) and another 2 days of actually removing the files.

 

What should I do?

back up fewer files?

better CPU?

move database to SSD?

drop grooming?


Guest Steve Maser

22 million files is a factor of 10 higher than anything I've ever groomed, so it's hard to say what the best option would be here (and I'd be curious if others who have done similarly sized media sets can chime in on this). However, that seems about in line with what I see: if I multiply my times by 10, I get about the same number of days for that number of files. It works out to roughly 100K files per hour of grooming on my system (2 GHz Core 2 Duo), which I think is consistent with my smaller media sets.
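
As a rough sanity check of that figure against the numbers above (assuming groom time really does scale linearly with file count): 22,000,000 files / 100,000 files per hour = 220 hours, or a bit over 9 days of continuous grooming -- in the same ballpark as the 9 days of scanning plus 2 days of removal reported for the 22-million-file set.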

 

I'd be curious what -- if anything -- can be done to speed up grooming and what factors (RAM? CPU? disk speed?) would make a difference. You said you have an "old" Mac Pro, but not what the exact CPU is. You have more RAM than I do, so it doesn't appear to be a function of how much RAM is available.

 

But, bottom line, if you can't afford a week to groom your media sets, your options would be to:

 

1) Start a new media set after X days/weeks? You could then groom the old media set as a separate process whenever you wanted (running a groom on one media set will not affect the operations with *other* media sets...)

 

2) Break your clients into new, smaller media sets (IIRC, you never said how many clients are backed up to that media set, nor how long it had been in use before you groomed it). This is what I do -- I have my clients broken into 5 different media sets specifically so I can groom each of them in less than a day as necessary... If you wanted to retain the client backups from the large media set...

 

 

Backing up fewer files is a possibility -- it all depends on what you feel you need to be able to restore. If you are backing up "cache" files, for example, you probably don't need to (that's a large number of small files that Retrospect has to keep track of).



 

 

to OP:

 

One strategy I have found helpful (and it seems counterintuitive) is to not back up the local email stores. These change by several hundred files per user each day and can lead to a lot of data churn. Since the server is backed up (and everyone is using IMAP anyway), it really is just a local cache for us. (We do manual archiving of messages, so I catch the stuff pulled off the server that needs to be backed up through that, though I could probably make my rules more specific to skip just the IMAP folders and still back up the archives.) But definitely try to exclude the browser cache folders -- tremendous churn there.


Guest Steve Maser


But be sure you want to do that. I had a user accidentally filter out all of his IMAP mail once and the mail admins were unable to restore it -- so restoring the "cached" copies was the only way to get that back.

 

Some of the things I filter out (besides the caches) are the "PubSub" directory -- there's lots of useless stuff there -- and the users' /Application Support/MobileSync folder.


Some of the things I filter out (besides the caches) are ... the users' /Application Support/MobileSync folder

 

How do you do that? In your Rule do you have:

 

Folder Mac Path is ~/Library/Application Support/MobileSync

 

Does it know what '~' means? Is it OK with the space between 'Application' and 'Support'?

 

Thanks,

 

James.



We used to have a delicate and intricate Rules setup in version 6, so I tried to mimic that manually with the (in my opinion) more cumbersome version 8. After a while Retrospect just wiped its settings and everything was lost. I did not have a proper backup of the settings -- stupid me. I have never had the motivation/time to recreate the Rules, and am probably backing up a lot of unnecessary files.

 

I did try to look at Activity Monitor while grooming: the CPU was at full blast, but there was not much disk activity. Then again, reads and writes of database entries would probably not show up in Activity Monitor anyway. I might pop a small SSD in there to see if that helps.
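
One way to check that directly -- a sketch, on the assumption that the catalog I/O is done by the Retrospect engine process -- is Apple's fs_usage tool, which shows per-process file system calls that Activity Monitor's summary graphs can hide:

   # Show live file system activity for one process; <pid> is the Retrospect engine's process ID
   sudo fs_usage -w -f filesys <pid>

If the "preparing" phase is mostly CPU-bound, this should show relatively little disk traffic while one core sits at 100%.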

 

At one point we backed up to two separate RAIDs, starting afresh every 6 months or so and skipping grooming altogether.

 

Later, our approach has been to manually delete the oldest media sets and mark them as "missing", and then recreate the catalog every month or so. The recreation finishes in around one day.


Guest Steve Maser


My Rule is:

 

Folder Mac Path contains /Application Support/MobileSync

 

 

As this path only exists in a user's Library folder, I don't worry about having to specify an exact "is" path.

 

 

Space is fine.

 

 

The ~ is not (at least I don't think so). There still needs to be a way to do wildcards in Retro 9 -- I think the Windows version has a "pattern" match that the Mac version does not have (so I've been told).
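
For what it's worth, the reason "contains" works here without ~ or wildcards is that the folder sits at a fixed spot inside every user's home Library. A quick check in Terminal (the user name below is just an example):

   # The MobileSync folder lives inside each user's home Library, e.g.:
   ls -d ~/Library/Application\ Support/MobileSync
   # -> /Users/james/Library/Application Support/MobileSync

so matching on the trailing "/Application Support/MobileSync" portion of the path catches it for every user.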


Another data point: I recently groomed (to "defined policy") a media set that contains about 9.7 million files after grooming, and it took just 16 minutes short of 4 days. This, along with the other data points, seems to imply that grooming time is more or less linear in the number of files in the media set. This media set hadn't been groomed in quite a while; I don't know if it would have been faster had it been groomed more recently.
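
(That is also roughly consistent with the 100K files per hour figure mentioned earlier: 9,700,000 files / 100,000 files per hour = 97 hours, which is almost exactly 4 days.)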


Guest Steve Maser

In a conversation I had with Robin, he mentioned that grooming speed is primarily CPU speed dependent at this point. He did not think that more RAM or having the catalog files on an SSD would improve the grooming speed.

 

In 6 months or so when I upgrade my backup server to new hardware, I'll be able to check that out...


Guest Steve Maser

Upthread posts note that the activity is maxing out a single core. Does that mean this CPU-speed-dependent process can't be dispatched to multiple processor cores?

 

 

I didn't ask that question directly. But my guess is "no".


Guest Steve Maser

Just another data point: my 2.0 GHz Core 2 Duo mini groomed a media set with 2.1M files in it in 18 hours, which seems to validate my approximate 1 hour per 100K files experience with this speed of CPU.

