Grooming misery

Fulco · January 11, 2010

For some years now, grooming “Disk Backup Sets” has been damaging the Backup Sets.

First a new “Disk Backup Set” is created (on a newly formatted NTFS (NOT compressed) hard disk). Once the Backup Set reaches its limit, errors start to appear.

These range from “.. you must recreate the Backup Set’s Catalog” to Assert errors.

Sometimes a “Catalog repair” solves the issue.

One Backup Set keeps on failing:

+ Executing Server Groom (HD2) at 1/9/2010 9:50 PM (Execution unit 1)

Grooming Backup Set Server (HD2)...

Backup Set format inconsistency (10 at 1105958796)

Grooming Backup Set Server (HD2) failed, error -2242 (Catalog File duplicated or ambiguous)

You must recreate the Backup Set's Catalog File.

See the Retrospect User's Guide or online help for details on recreating Catalog Files.

Can't compress Catalog File for Backup Set Server (HD2), error -1 (unknown)

1/9/2010 10:12:13 PM: 3 execution errors

Duration: 00:20:51 (00:00:35 idle/loading/preparing)

The line “Can't compress Catalog File for Backup Set Server (HD2), error -1 (unknown)” is really ‘stupid’, since the catalog isn’t compressed at all.

Every Recreate Catalog, followed by a Groom, gives the same result.
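To compare failures across Backup Sets, the error codes in a log like the one above can be tallied with a short script. This is only a sketch: it assumes the operations log has been saved as plain text, and the function names and the regular expression are my own, derived from the wording of the lines quoted here rather than from any documented Retrospect log format.

```python
import re
from collections import Counter

# Matches the "error <code> (<description>)" tail seen in lines such as:
#   Grooming Backup Set Server (HD2) failed, error -2242 (Catalog File duplicated or ambiguous)
#   Can't compress Catalog File for Backup Set Server (HD2), error -1 (unknown)
# The pattern is an assumption based on the log excerpt, not an official format.
ERROR_RE = re.compile(r"error (-\d+) \(([^)]*)\)")


def extract_errors(log_text):
    """Return a list of (code, description) tuples found in the log text."""
    return [(int(code), desc) for code, desc in ERROR_RE.findall(log_text)]


def summarize(log_text):
    """Count how often each error code occurs in the log text."""
    return Counter(code for code, _ in extract_errors(log_text))


# Sample taken from the groom log quoted in this post.
SAMPLE_LOG = """\
+ Executing Server Groom (HD2) at 1/9/2010 9:50 PM (Execution unit 1)
Grooming Backup Set Server (HD2)...
Backup Set format inconsistency (10 at 1105958796)
Grooming Backup Set Server (HD2) failed, error -2242 (Catalog File duplicated or ambiguous)
You must recreate the Backup Set's Catalog File.
Can't compress Catalog File for Backup Set Server (HD2), error -1 (unknown)
1/9/2010 10:12:13 PM: 3 execution errors
"""

for code, count in sorted(summarize(SAMPLE_LOG).items()):
    print(code, count)
```

Run over the logs of several weeks, a tally like this would show whether -2242 always appears together with the "format inconsistency" line or also on its own.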

There is also a ‘Server (HD2).rbc.log’ file.

I followed the advice in “Grooming Tips and Troubleshooting (9629)”:

(1) >30 GB free on C: (system)

(2) Catalog files are stored on a separate disk (D:, not the disks containing the .rdb files)

(3) C: (system) and D: (Catalog files) are continually defragmented. I personally (??) suspect the defragmenting of the disks containing the .rdb files as a cause of the damage to “Disk Backup Sets”.

(4) ??“Don’t groom too often”?? Guys, we are talking about computers, not about eating ice cream.

(5) 3, 4 or “Defined Policy”

(7) Done often

(8) 32 GB RAM

(9) 10% of disk space free on disks containing .rdb files; C: and D: more than 50% free space

(10,11) all disks are in server
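The free-space figures in the list above can be checked automatically before a groom runs. What follows is a minimal sketch using Python's standard `shutil.disk_usage`; the helper names `meets_threshold` and `check_volume` are hypothetical, and the thresholds are simply the ones mentioned in this post (more than 30 GB on C:, 10% on the disks holding the .rdb files).

```python
import shutil

GIB = 1024 ** 3  # bytes per gibibyte


def meets_threshold(total, free, min_free_fraction=0.0, min_free_bytes=0):
    """True if free space satisfies both the fractional and the absolute minimum."""
    if total <= 0:
        return False
    return free >= min_free_bytes and (free / total) >= min_free_fraction


def check_volume(path, min_free_fraction=0.0, min_free_bytes=0):
    """Look up a volume's usage and test it against the given minimums."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return meets_threshold(usage.total, usage.free,
                           min_free_fraction, min_free_bytes)
```

For example, `check_volume("C:\\", min_free_bytes=30 * GIB)` would cover point (1), and `check_volume(path, min_free_fraction=0.10)` point (9), where `path` is whichever drive holds the .rdb files.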

These errors make Retrospect’s Disk2Disk backup unreliable.

Are these errors corrected in Retrospect 7.7?

Any similar findings?

Any tips?

Fulco

rcohen · January 11, 2010

I have found grooming to be very intolerant of read/write failures. More so than backups.

In the past, I have had problems due to simultaneous FTPs or copies of backup results to another drive, and iSCSI on a non-segregated network. Defragging and anti-virus could also potentially interfere with read/write operations.

Also, before the 64-bit version, running out of memory could corrupt a groom file.

Once I segregated iSCSI and scheduled copy & FTP jobs so they do not happen during grooming, grooms have been very reliable. I run them once a week.

When a groom does fail, you need to rebuild a catalog file.

Also, starting with 7.7, I have had occasional errors when running simultaneous jobs, so I had to drop to one execution unit.

If things are still acting up, try recycling the backup set and starting from scratch. Of course, you lose your backed up data that way, unless you have another copy.

rhwalker · January 11, 2010

These errors make Retrospect’s Disk2Disk backup unreliable.

Are these errors corrected in Retrospect 7.7?

Here is the list of bug fixes for Retrospect 7.7:

Retrospect 7.7 Release Notes

Fulco · January 11, 2010

Retrospect 7.7:

19582: Retrospect Defined Grooming Policy does not keep most recent backup for each week or month

So no other Grooming bugs fixed.

Fulco

Ramon88 · January 12, 2010

Fulco,

'Strangely' we do groom our sets but it almost always works correctly. Nowadays most of our backup sets are located on iSCSI storage, but some are on local storage.

The only problem that sometimes surfaces is that Retrospect can't always groom out everything it needs to. In effect the storage seems to grow until it can't groom out enough to make a difference. In such a case we recycle the backup. We always have an A and B set for this kind of backup, so it is not a problem to recycle.

Your problem seems to be something different though. Is it possible for you to test with another server altogether? Actually I'm leaning towards a problem with your hardware. In the past we have had systems that operated correctly, but with Retrospect they were less stable due to the massive I/O generated. Swapping memory solved that problem.

At this date we haven't switched to 7.7 due to reliability issues with that version. So I can't really say if EMC improved grooming. But I agree they probably didn't do much with it.

Ultimately you shouldn't see those grooming errors. So it must be something else. Either hardware or indeed things like antivirus. To troubleshoot that effectively, you will need extra hardware I'm afraid...

Fulco · January 12, 2010

One ‘hardware’-related problem I found is the interaction of Grooming and Adaptec’s Power Management (APM).

APM is a feature of Adaptec’s RAID controllers that powers down hard disks (Volumes) that have not been accessed for a set amount of time. Normally this doesn’t interfere with the Backup. But I found Grooming can spend a long time (Matching) without accessing the disk containing the .rdb files. Possibly this error is due to that:

+ Executing Hamlet Groom (HD1) at 11/18/2009 10:00 PM (Execution unit 1)

Grooming Backup Set Server (HD1)...

Groomed zero KB from Backup Set Server (HD1).

Grooming Backup Set Server (HD1) failed, error -1101 (file/directory not found)

You must recreate the Backup Set's Catalog File.

See the Retrospect User's Guide or online help for details on recreating Catalog Files.

Can't compress Catalog File for Backup Set Server (HD1), error -1 (unknown)

11/18/2009 10:01:13 PM: 2 execution errors

Duration: 00:01:06

Fulco

Ramon88 · January 14, 2010

Fulco, I presume you can switch APM off and try again?

Fulco · January 14, 2010

Yes, I tested Grooming with APM switched off.

The errors (-1101) I mentioned in my last post disappeared.

Still appearing are:

Backup Set format inconsistency (10 at 1105958796)

Grooming Backup Set Server (HD2) failed, error -2242 (Catalog File duplicated or ambiguous)

Can't compress Catalog File for Backup Set Server (HD2), error -1 (unknown)

Fulco

Ramon88 · January 14, 2010

Okay, maybe you have/had two simultaneous problems...

Or your backup set was corrupted due to the 'APM problem'. Did you rebuild the catalog beforehand?

Fulco · January 14, 2010

I rebuilt the catalog several times.

Recycling the backup set will be the only solution (I think).

Fulco

robvil · January 14, 2010

I believe this is a bug in Retrospect. I have backup set / catalog issues periodically... sometimes a catalog rebuild solves it and other times I have to recycle the backup set.

Over the years I have replaced hardware/software/drivers and nothing solves the problem. Even if the disk subsystem has heavy I/O, Retrospect should not trash the data.

And why blame other stuff when it’s only Retrospect that has issues on our hardware?

Regards

Robert

Ramon88 · January 14, 2010

Robert, are you sure this is a bug? We do not see this error on our Retrospect servers. We very rarely have backup set/catalogue issues.

I'm not saying you are wrong, but on the other hand you might not be right either. Besides that, I believe some extra testing might resolve this problem for Fulco.

If you have the same problem as Fulco it might be interesting to check what you have in common (hard & software wise).

In the end it is a fact that Retrospect can tax your system's I/O pretty heavily. So you might even see hardware-related errors you otherwise wouldn't notice.

robvil · January 14, 2010

I am 100% sure it’s not a hardware issue.

On the same hardware I previously had a SQL DB running, doing way more I/O and using more memory than Retrospect does. Never had issues there.

And again, even if the underlying disk system is doing heavy I/O, it should not trash data. To me it looks like a timing issue when Retrospect grooms data, which Retrospect does not handle correctly.

Regards

Robert

Ramon88 · January 14, 2010

It is perfectly possible MS SQL isn't taxing I/O as much as Retrospect does. Not all data is written or read in the same fashion. In my experience Retrospect grooming can use more system resources than SQL does.

I agree this should not happen, but it can, and does. Remember in the end it is the OS that does the writing and reading. Not Retrospect. Taxing a system to the max for a sustained interval can lead to errors, regardless if Retrospect is involved.

But it's also perfectly possible this problem is Retrospect related. However, at this time I'm not convinced that Fulco's particular problem is 100% a Retrospect problem, or that it is the same problem you have. But I'm not shooting you down; after all, you might be right.

I still think you and Fulco should compare notes. What do your setups have in common?

robvil · January 14, 2010

You might be right that it’s not a Retrospect problem, but I suspect it to be a Retrospect issue... I have had unexpected end of data during backup where the only solution is to recycle the backup set, and I cannot figure out why this happens. This happens approx. 2 times a year, and I’m not running out of disk space or backup set space.

Btw, my SQL does generate way more I/O than Retrospect does. I run multiple Firebird SQL servers, with 200 concurrent users on the largest DB doing heavy inserts, updates, reads, joins, etc., 12 hours straight each day. I have approx. 400 users using 40 DBs each day, 365 days a year, and never had issues like that on the same hardware.

Regards

Robert

Ramon88 · January 15, 2010

The only problem we have with grooming is some backup sets fill up and can't be groomed out. There are probably limits to what can be groomed out and thus the data space in the storage set grows until it gets to the set limit.

We have at least seven MS SQL servers running (not counting development machines). Some have thousands of concurrent users. I'm not familiar with Firebird though.

I'm sure SQL can tax machines a lot. But there is also a lot of caching involved. Retrospect, while grooming, has a tremendous amount of disk I/O. And the way it does that is probably different from the way SQL works. In other words, not all is equal. I've had 16-core machines with 48 GB RAM become temporarily unresponsive during a groom. MS SQL doesn't behave like that. It's quite 'intelligent' compared to Retrospect, I think.

I do remember a couple of years back, when grooming was introduced, there were many grooming errors resulting in backup set corruption. However this is pretty much solved by patches and updates nowadays. But indeed it illustrates the problem can be Retrospect related.

Due to the fact there are so many variables involved (kind of hardware, drivers, software installed, etc) it is not easy to troubleshoot this kind of problem. But if you are seeing the same errors, you and Fulco might have something in common, which might be very useful information for EMC.

Fulco · January 28, 2010

Here it is again, with another Disk Backup Set:

+ Executing Server Groom (HD3) at 1/28/2010 4:14 PM (Execution unit 1)

Grooming Backup Set Server (HD3)...

Backup Set format inconsistency (10 at 690087248)

Grooming Backup Set Server (HD3) failed, error -2242 (Catalog File duplicated or ambiguous)

You must recreate the Backup Set's Catalog File.

See the Retrospect User's Guide or online help for details on recreating Catalog Files.

Can't compress Catalog File for Backup Set Server (HD3), error -1 (unknown)

1/28/2010 4:16:38 PM: 3 execution errors

Duration: 00:01:25 (00:00:31 idle/loading/preparing)

A few weeks ago Disk Backup Set HD2 gave the same errors.

How do I get rid of these errors?

Recatalog followed by Groom, gave the same result (just like the last time with HD2).

Could the Disk Defragmentation software have something to do with this?

Fulco

Ramon88 · January 28, 2010

Could the Disk Defragmentation software have something to do with this?

Hmm, nasty...

I presume, by your remark, you have some disk defrag tool running? It might be the culprit. Can't you switch it off and try again? Maybe start with a new backup set?

For Retrospect storage I personally think defragmentation is not really needed.

Fulco · January 28, 2010

No, the Defragment program is not running during Groom and ReCatalog.

The disk containing the Backup Set is defragmented once a week.

This normally does NOT happen during a Backup. However, I can’t rule out the defragmenting taking place during a Groom action started automatically by a Backup Set reaching its capacity. But defragmenting stops automatically when the disk has a lot of I/O.

No, I suspect moving files around (defragmenting) breaks the Backup Set.

Our 3 Disk Backup Sets started showing these errors after a defragmentation program was installed.

This could be just a coincidence.
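One way to rule the overlap in or out would be to compare the groom times from the log with the defragmenter's own run times. A minimal Python sketch follows; the groom start and end are taken from the HD3 log above, while the defragmenter window is a made-up value for illustration, since the actual schedule is not recorded here. The timestamp formats are inferred from the log lines in this thread.

```python
from datetime import datetime

# Timestamp formats inferred from the log lines quoted in this thread.
LOG_START_FMT = "%m/%d/%Y %I:%M %p"     # e.g. "1/28/2010 4:14 PM"
LOG_END_FMT = "%m/%d/%Y %I:%M:%S %p"    # e.g. "1/28/2010 4:16:38 PM"


def windows_overlap(a_start, a_end, b_start, b_end):
    """True if the intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


# Groom window from the HD3 log.
groom_start = datetime.strptime("1/28/2010 4:14 PM", LOG_START_FMT)
groom_end = datetime.strptime("1/28/2010 4:16:38 PM", LOG_END_FMT)

# Hypothetical defragmenter window; replace with times from the defrag tool's log.
defrag_start = datetime(2010, 1, 28, 16, 0)
defrag_end = datetime(2010, 1, 28, 17, 0)

if windows_overlap(groom_start, groom_end, defrag_start, defrag_end):
    print("Groom and defragmentation ran at the same time")
else:
    print("No overlap")
```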

Fulco

Ramon88 · January 28, 2010

It depends a bit. Some defrag tools can dig deep into the system and it's not always known if they are 100% reliable in every working condition.

Grooming in the past didn't always work when they introduced it. But nowadays I find it very reliable (7.6.123). I can't really imagine not having it anymore!

However, to remove disk defragmentation from the equation, would it be a real problem to remove that software from your setup?

It's a real nasty problem to troubleshoot...

Fulco · February 10, 2011

Grooming is still failing.

Every time a disk backup set reaches its capacity, Retrospect stops or crashes.

When will Roxio fix this?

Lennart_T · February 10, 2011

Grooming is still failing.

Every time a disk backup set reaches its capacity, Retrospect stops or crashes.

When will Roxio fix this?

Do you have the catalog file on the same volume? That is not recommended.

We run groom scripts every weekend, to make sure the backup set never reaches maximum capacity.

How would we, your fellow users, know what Roxio will do?

Fulco · February 10, 2011

I follow kb Article # 9629 to the letter:

(1) C drive has 50% free space (>100 GB)

(2) Catalog files are stored on D:, also >50% free space (100 GB)

(3) Disk-based backup sets (.rdb files) are stored on a separate hard disk, always with 10% free disk space

All disks are regularly defragmented, but only when Retrospect is NOT running

(5) 2 snapshots used

(7) Rebuild takes hours!

This must be done manually. When grooming fails, Retrospect hangs, waiting for new space (backup set)

(8) Server has 32 GB memory

Lennart_T · February 11, 2011

I follow kb Article # 9629 to the letter:

Well, in tip 4 it says: "If you want to make sure the disk never fills, create a grooming script (Automate>Manage Scripts>New) to run once a week."

Since you are having problems when the disk fills, I would schedule a groom script once a week.

Fulco · February 11, 2011

I will give it a try.

However: some of the Backup Sets reside on disks that have 50% free space (900 GB). These sets also suffer from the same issue!

And it also says: don’t groom too often.
