
Grooming misery


Fulco


 

For some years now, grooming “Disk Backup Sets” has been damaging the Backup Set.

First a new “Disk Backup Set” is created (on a newly formatted, NTFS (NOT compressed) hard disk). After the Backup Set reaches its limit, errors start to appear.

These range from “.. you must recreate the Backup Set’s Catalog” to Assert errors.

Sometimes a “Catalog repair†solves the issue.

One Backup Set keeps on failing:

 

+ Executing Server Groom (HD2) at 1/9/2010 9:50 PM (Execution unit 1)

Grooming Backup Set Server (HD2)...

Backup Set format inconsistency (10 at 1105958796)

Grooming Backup Set Server (HD2) failed, error -2242 (Catalog File duplicated or ambiguous)

You must recreate the Backup Set's Catalog File.

See the Retrospect User's Guide or online help for details on recreating Catalog Files.

Can't compress Catalog File for Backup Set Server (HD2), error -1 (unknown)

1/9/2010 10:12:13 PM: 3 execution errors

Duration: 00:20:51 (00:00:35 idle/loading/preparing)
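
Failure lines like the one above have a regular shape, so they could be pulled out of an exported log mechanically. A minimal sketch, assuming the log is available as plain text in exactly the format pasted here; the function name `find_groom_failures` is made up for illustration and is not part of Retrospect:

```python
import re

# Matches lines such as:
#   Grooming Backup Set Server (HD2) failed, error -2242 (Catalog File duplicated or ambiguous)
# The pattern is inferred from the log excerpt in this post, not from Retrospect documentation.
GROOM_FAILURE = re.compile(
    r"^[ \t]*Grooming Backup Set (?P<name>.+?) failed, "
    r"error (?P<code>-?\d+) \((?P<text>.+)\)[ \t]*$",
    re.MULTILINE,
)


def find_groom_failures(log_text):
    """Return (backup set name, error code, error text) for each failed groom."""
    return [
        (m["name"], int(m["code"]), m["text"])
        for m in GROOM_FAILURE.finditer(log_text)
    ]
```

Run over several exported logs, this would show at a glance which Backup Sets fail and with which error codes.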

 

The line “Can't compress Catalog File for Backup Set Server (HD2), error -1 (unknown)” is really ‘stupid’, because the catalog isn’t compressed at all.

Every Recreate Catalog, followed by a Groom, gives the same result.

Also there is a ‘Server (HD2).rbc.log’ file

 

I followed the advice in “Grooming Tips and Troubleshooting (9629)”

(1) >30 GB on C: (system)

(2) Catalog files are stored on a separate disk (D:, not a disk containing .rdb files)

(3) C: (system) and D: (Catalog files) are continually defragmented. I personally (??) suspect the defragmenting of the disks containing the .rdb files as a cause of the damage to “Disk Backup Sets”.

(4) ??“Don’t groom too often”?? Guys, we are talking about computers, not about eating ice cream.

(5) 3, 4 or “Defined Policy”

(7) Done often

(8) 32 GB RAM

(9) 10% of disk-space for disks containing .rdb files, C and D more than 50% free space

(10,11) all disks are in server
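
The free-space items in this checklist could be checked by a small script before a groom runs. A minimal sketch, assuming the thresholds quoted in this post (more than 30 GB on the system disk, 10% free on disks holding .rdb files); the function names are hypothetical, and the drive letters in the usage note are only examples:

```python
import shutil

GIB = 1024 ** 3  # bytes per gibibyte


def has_enough_free(total_bytes, free_bytes, min_fraction=0.0, min_bytes=0):
    """True when free space meets both the fractional and the absolute minimum."""
    return free_bytes >= min_bytes and free_bytes >= total_bytes * min_fraction


def check_volume(path, min_fraction=0.0, min_bytes=0):
    """Apply has_enough_free to the volume that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return has_enough_free(usage.total, usage.free, min_fraction, min_bytes)
```

For example, `check_volume("C:\\", min_bytes=30 * GIB)` would cover item (1), and `check_volume("E:\\", min_fraction=0.10)` would cover item (9) for a backup disk mounted as E: (an assumed drive letter).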

 

These errors make Retrospect’s Disk2Disk backup unreliable.

 

Are these errors corrected in Retrospect 7.7?

Any similar findings?

Any tips?

 

 

Fulco

 


I have found grooming to be very intolerant of read/write failures, more so than backups.

 

In the past, I have had problems due to simultaneous FTPs or copies of backup results to another drive, and iSCSI on a non-segregated network. Defragging and anti-virus could also potentially interfere with read/write operations.

 

Also, before the 64-bit version, running out of memory could corrupt a groom file.

 

Once I segregated iSCSI and scheduled copy & FTP jobs so that they do not run during grooming, grooms have been very reliable. I run them once a week.

 

When a groom does fail, you need to rebuild the catalog file.

 

Also, starting with 7.7, I have had occasional errors when running simultaneous jobs, so I had to drop to one execution unit.

 

If things are still acting up, try recycling the backup set and starting from scratch. Of course, you lose your backed up data that way, unless you have another copy.

 


Fulco,

 

'Strangely' we do groom our sets but it almost always works correctly. Nowadays most of our backup sets are located on iSCSI storage, but some are on local storage.

The only problem that sometimes surfaces is that Retrospect can't always groom out everything it needs to. In effect the storage seems to grow until it can't groom out enough to make a difference. In such a case we recycle the backup. We always have an A and B set for this kind of backup, so it is not a problem to recycle.

 

Your problem seems to be something different though. Is it possible for you to test with another server altogether? Actually I'm leaning towards a problem with your hardware. In the past we have had systems that operated correctly, but with Retrospect they were less stable due to the massive I/O generated. Swapping memory solved that problem.

 

At this date we haven't switched to 7.7 due to reliability issues with that version. So I can't really say if EMC improved grooming. But I agree they probably didn't do much with it.

 

Ultimately you shouldn't see those grooming errors. So it must be something else. Either hardware or indeed things like antivirus. To troubleshoot that effectively, you will need extra hardware I'm afraid...


One ‘hardware’ related problem I found is the interaction of Grooming and Adaptec’s Power Management (APM).

APM is a feature of Adaptec’s RAID controllers that powers down hard disks (Volumes) which have not been used/accessed for a set amount of time. Normally this doesn’t interfere with the Backup. But I found Grooming can spend a long time (Matching) without accessing the disk containing the .rdb files. Possibly this error is due to that:

 

+ Executing Hamlet Groom (HD1) at 11/18/2009 10:00 PM (Execution unit 1)

Grooming Backup Set Server (HD1)...

Groomed zero KB from Backup Set Server (HD1).

Grooming Backup Set Server (HD1) failed, error -1101 (file/directory not found)

You must recreate the Backup Set's Catalog File.

See the Retrospect User's Guide or online help for details on recreating Catalog Files.

Can't compress Catalog File for Backup Set Server (HD1), error -1 (unknown)

11/18/2009 10:01:13 PM: 2 execution errors

Duration: 00:01:06

 

Fulco

 


Yes, I tested Grooming with APM switched off.

The errors (-1101) I mentioned in my last post disappeared.

 

Still appearing are:

Backup Set format inconsistency (10 at 1105958796)

Grooming Backup Set Server (HD2) failed, error -2242 (Catalog File duplicated or ambiguous)

 

Can't compress Catalog File for Backup Set Server (HD2), error -1 (unknown)

 

Fulco


I believe this is a bug in Retrospect. I have backup set / catalog issues periodically... sometimes a catalog rebuild solves it and other times I have to recycle the backup set.

 

Over the years I have replaced hardware/software/drivers and nothing solves the problem. Even if the disk subsystem has heavy I/O, Retrospect should not corrupt the data.

 

And why blame other stuff when it's only Retrospect that has issues on our hardware?

 

Regards

Robert

 


Robert, are you sure this is a bug? We do not see this error on our Retrospect servers. We very rarely have backup set/catalogue issues.

 

I'm not saying you are wrong, but on the other hand you might not be right as well. Besides that, I believe some extra testing might resolve this problem for Fulco.

 

If you have the same problem as Fulco it might be interesting to check what you have in common (hard & software wise).

 

In the end it is a fact that Retrospect can tax your system's I/O pretty heavily. So you might even see hardware-related errors you otherwise wouldn't notice.


I am 100% sure it's not a hardware issue.

 

On the same hardware I previously had a SQL DB running, doing way more I/O and using more memory than Retrospect does. Never had issues there.

 

And again, even if the underlying disk system is doing heavy I/O, it should not trash data. To me it looks like a timing issue when Retrospect grooms data, which Retrospect does not handle correctly.

 

Regards

Robert

 


It is perfectly possible MS SQL isn't taxing I/O as much as Retrospect does. Not all data is written or read in the same fashion. In my experience Retrospect grooming can use more system resources than SQL does.

 

I agree this should not happen, but it can, and does. Remember in the end it is the OS that does the writing and reading. Not Retrospect. Taxing a system to the max for a sustained interval can lead to errors, regardless of whether Retrospect is involved.

 

But it's also perfectly possible this problem is Retrospect related. At this time, however, I'm not convinced that Fulco's particular problem is 100% a Retrospect problem, or that it is the same problem you have. But I'm not shooting you down, after all you might be right. :D

 

I still think you and Fulco should compare notes. What do your setups have in common?


You might be right that it's not a Retrospect problem, but I suspect it is a Retrospect issue... I have had "unexpected end of data" during backup, where the only solution is to recycle the backup set, and I cannot figure out why this happens. This happens approx. 2 times a year and I'm not running out of disk space or backup set space.

 

Btw. my SQL does generate way more I/O than Retrospect does. I am running multiple Firebird SQL servers, with 200 concurrent users on the largest DB doing heavy inserts, updates, reads, joins etc. for 12 hours straight each day. I have approx. 400 users using 40 DBs each day, 365 days a year, and have never had issues like that on the same hardware.

 

Regards

Robert

 

 

 


The only problem we have with grooming is some backup sets fill up and can't be groomed out. There are probably limits to what can be groomed out and thus the data space in the storage set grows until it gets to the set limit.

 

We have at least seven MS SQL servers running (not counting development machines). Some have thousands of concurrent users. I'm not familiar with Firebird though.

 

I'm sure SQL can tax machines a lot. But there is also a lot of caching involved. Retrospect, while grooming, has a tremendous amount of disk I/O. And the way they do that is probably different from the way SQL works. In other words not all is equal. I've had 16 core machines with 48GB RAM become temporarily unresponsive during a groom. MS SQL doesn't behave like that. It's quite 'intelligent' compared to Retrospect I think. :)

 

I do remember a couple of years back, when grooming was introduced, there were many grooming errors resulting in backup set corruption. However this is pretty much solved by patches and updates nowadays. But indeed it illustrates the problem can be Retrospect related.

 

Due to the fact there are so many variables involved (kind of hardware, drivers, software installed, etc) it is not easy to troubleshoot this kind of problem. But if you are seeing the same errors, you and Fulco might have something in common, which might be very useful information for EMC.


  • 2 weeks later...

Here it is again, with another Disk Backup Set:

 

+ Executing Server Groom (HD3) at 1/28/2010 4:14 PM (Execution unit 1)

Grooming Backup Set Server (HD3)...

Backup Set format inconsistency (10 at 690087248)

Grooming Backup Set Server (HD3) failed, error -2242 (Catalog File duplicated or ambiguous)

You must recreate the Backup Set's Catalog File.

See the Retrospect User's Guide or online help for details on recreating Catalog Files.

Can't compress Catalog File for Backup Set Server (HD3), error -1 (unknown)

1/28/2010 4:16:38 PM: 3 execution errors

Duration: 00:01:25 (00:00:31 idle/loading/preparing)

 

A few weeks ago Disk Backup Set HD2 gave the same errors.

 

How do I get rid of these errors?

Recatalog followed by Groom gave the same result (just like the last time with HD2).

 

Could the Disk Defragmentation software have something to do with this?

 

Fulco

 


Could the Disk Defragmentation software have something to do with this?

 

Hmm, nasty...

 

I presume, by your remark, you have some disk defrag tool running? It might be the culprit. Can't you switch it off and try again? Maybe start with a new backup set?

 

For Retrospect storage I personally think defragmentation is not really needed.


No, the Defragment program is not running during Groom and ReCatalog.

The disk containing the Backup Set is defragmented once a week.

This normally does NOT happen during a Backup. However I can’t rule out the defragmenting taking place during a Groom action automatically started by a Backup Set reaching its capacity. But defragmenting stops automatically when the disk has a lot of IO.

No, I suspect moving files around (defragmenting) breaks the Backup Set.

Our 3 Disk Backup Sets started showing these errors after a Defragmentation program was installed.

This could be just a coincidence.

 

 

Fulco


It depends a bit. Some defrag tools can dig deep into the system and it's not always known if they are 100% reliable in every working condition.

 

Grooming in the past didn't always work when they introduced it. But nowadays I find it very reliable (7.6.123). I can't really imagine not having it anymore!

 

However, to remove disk defragmentation from the equation, would it be a real problem to remove that software from your setup?

 

It's a real nasty problem to troubleshoot...


  • 1 year later...
Grooming is still failing.

 

Every time a disk backup set reaches its capacity, Retrospect stops or crashes.

 

 

When will Roxio fix this?

Do you have the catalog file on the same volume? That is not recommended.

 

We run groom scripts every weekend, to make sure the backup set never reaches maximum capacity.

 

How would we, your fellow users, know what Roxio will do?


I follow kb Article # 9629 to the letter:

 

(1) C drive has 50% free space (>100Gb)

(2) Catalog files are stored on D, also > 50% free space (100Gb)

(3) Disk based backup set (.rdb files), are stored on separate hard disk, with always 10% free disk space

All disks are regularly defragmented, but only when Retrospect is NOT running

(5) 2 snapshots used

(7) rebuild takes hours!

This must be done manually. When grooming fails, Retrospect hangs: waiting for new space (backup set)

(8) Server has 32 Gb memory


I follow kb Article # 9629 to the letter:

Well, in tip 4 it says: "If you want to make sure the disk never fills, create a grooming script (Automate>Manage Scripts>New) to run once a week."

 

Since you are having problems when the disk fills, I would schedule a groom script once a week.
