
Slow matching under 7.7?



Is anyone else seeing really slow/stalled matching since upgrading to RS 7.7? I've been looking into it today a bit more.

 

CPU - 20-25%

Memory - 288MB (Over 1GB free)

Disk utilisation - 0.2% (Catalogue disk)

Ethernet - 0% (Backup set on NAS)

 

I have two jobs sitting at 'matching' which don't seem to be making much progress at all and I keep seeing this since upgrading.

 

One backup is Client > Laptop NAS box... (Proactive)

One is duplicating from NAS > SMB share on another machine... (Scheduled backup)

 

Since stopping the 'stalled' Proactive backup, the NAS > SMB script suddenly started making progress!

 

A few blips of the C: (catalogue) disk later:

CPU - 20-26%

Memory - 340MB (Over 1GB free)

Disk utilisation - 3.4% (Had blipped up to 100%)

Ethernet - 0% (Backup set on NAS)

 

Richard


Ugh, this is driving me nuts.. I have 4 backups running, all 'matching'. None of them are making any progress, no disk activity, no network activity yet a pretty steady 25-30% CPU usage.

 

RS is using 553MB right now... memory usage isn't changing whatsoever (in Task Manager).

 

One of the backups is our CVS server, whose snapshot I'm duplicating to a removable USB drive. This has bucket loads of files on it, so maybe it's running out of memory or table space, which stalls it and blocks the other backups from matching...?

 

Odd...

 

After the CVS backup was cancelled, the others still remained stalled on 'matching'.

 

Rich


You aren't the only one.

 

I am having slow matching issues too. It is magnified, at least appears so, when I access clients with a large number of files. Memory usage "peaks" in mine too. It is very odd. It hits a max and stays there the whole time.

 

Fortunately, I am only doing daily backups of 20 machines, otherwise this issue would be causing problems for us.

 

Jeff


We had this problem in early January. It was impossible, and we downgraded to 7.6 to fix it.

 

Some data: we had ~10 backup sets with about 1TB of data each, 500 sessions, and about the same number of snapshots. 7.6 groomed each in an hour or two; we gave up after 3 days under 7.7.

 

There has been a patch release since then, but I'm going to wait until someone else tests it...

 

Actual backups (matching) had what seemed to be the same fundamental problem. There also seemed to be much more multi-thread overhead than in 7.6: one groom used 25% user + 3% sys on a 4-core box; two grooms used 40% user and 20% sys.

 

All this on Windows Server 2008 64-bit. 7.6 works much better.

 


Thanks guys, at least I'm not alone on this. Any idea where the patch is? I can't go on with these huge backup and grooming delays.

 

For example, this is a proactive backup for one of our client machines:

 

+ Normal backup using Laptop Backups at 15/02/2010 08:24 (Execution unit 3) To Backup Set NAS Laptop...

 

- 15/02/2010 08:24:40: Copying SW_Preload (C:) on Tony ******* (N200)

15/02/2010 13:18:43: Snapshot stored, 349.5 MB

15/02/2010 13:19:53: Execution completed successfully

 

Completed: 5410 files, 370.0 MB, with 61% compression

Performance: 60.3 MB/minute

Duration: 04:55:12 (04:49:03 idle/loading/preparing)

 

FIVE HOURS!!

 

Rich


Any news on progress with this, Mayoff?

 

My clients are starting to complain that they haven't been backed up for a week now as each proactive backup is taking so long (not to mention the proactive list isn't working right).

 

[Image: b7787ce3.png]

 

^^ 6 client backups have been able to run all day! (note, one of the above failed and restarted)

 

.. check out the time to complete.. 4-5 HOURS!

 

Rich


I've been experiencing very long matching times for the past few months when doing Normal backups. I am backing up a data set that contains over 3 million files across 2 volumes. I split my backups into 2 scripts; each script backs up one volume, roughly 1.5 million files each (using the same catalog). One script takes about 9 hours to complete, the other takes about 5 hours, with the primary time consumer being the matching. You can see this at the end of the log, because RS tells you how long the script was running in idle mode, which includes the "matching" time. Usually the time actually writing to tape is below an hour for the script that takes 9 hours to complete.
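To put a number on that, the idle share can be read straight off the Duration line at the end of each log entry. The following is a minimal Python sketch, assuming the line keeps the format seen in the logs pasted in this thread (`Duration: HH:MM:SS (HH:MM:SS idle/loading/preparing)`). The function names are made up for illustration and are not part of Retrospect.

```python
import re

# Matches lines such as:
#   Duration: 04:55:12 (04:49:03 idle/loading/preparing)
DURATION_RE = re.compile(
    r"Duration:\s*(\d+):(\d{2}):(\d{2})\s*"
    r"\((\d+):(\d{2}):(\d{2})\s+idle/loading/preparing\)"
)


def to_seconds(hours: str, minutes: str, seconds: str) -> int:
    """Convert HH, MM, SS strings to a number of seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds)


def idle_fraction(log_line: str) -> float:
    """Return the share of the run spent idle/loading/preparing (0.0 to 1.0)."""
    match = DURATION_RE.search(log_line)
    if match is None:
        raise ValueError(f"not a Duration line: {log_line!r}")
    total = to_seconds(*match.groups()[:3])
    idle = to_seconds(*match.groups()[3:])
    if total == 0:
        raise ValueError("total duration is zero")
    return idle / total


if __name__ == "__main__":
    # Duration line taken from the laptop backup log posted earlier in this thread.
    line = "Duration: 04:55:12 (04:49:03 idle/loading/preparing)"
    print(f"{idle_fraction(line):.1%} of the run was idle/loading/preparing")
```

For the laptop backup quoted earlier in the thread, that is 17,343 of 17,712 seconds, so nearly the whole run was spent outside of actually copying data.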

 

Not sure of the reason for the discrepancy. These volumes house large graphic files and tiny font files because we are in the printing business. Maybe one volume has more fonts. Who knows.

 

My assumption for the long matching time is that Retrospect has to build a table that has 3 million files on one side and 1.5 million on the other and then compare them. That's a database function, and either it does not do it well or there's a physical limitation I am hitting with such large data sets.
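For what it's worth, comparing two file lists does not have to mean checking every file against every other file. The sketch below is not how Retrospect implements matching (nothing in this thread documents that). It only illustrates, with made-up names (`FileEntry`, `files_needing_backup`) and an assumed match key of path, size and modification time, how an index keeps the comparison roughly linear in the number of files.

```python
from typing import Iterable, List, NamedTuple


class FileEntry(NamedTuple):
    """Hypothetical match key: a file counts as already backed up
    only if all three fields are identical."""
    path: str
    size: int
    mtime: int  # modification time, seconds since the epoch


def files_needing_backup(
    source: Iterable[FileEntry], catalog: Iterable[FileEntry]
) -> List[FileEntry]:
    """Return the source files that have no identical entry in the catalog."""
    # Build the index once: O(len(catalog)) time and memory.
    known = set(catalog)
    # Each membership test is O(1) on average, so the whole pass is
    # O(len(source)) instead of O(len(source) * len(catalog)).
    return [entry for entry in source if entry not in known]


if __name__ == "__main__":
    catalog = [
        FileEntry("fonts/a.ttf", 40_000, 1_262_304_000),
        FileEntry("art/poster.tif", 900_000_000, 1_262_304_000),
    ]
    source = [
        FileEntry("fonts/a.ttf", 40_000, 1_262_304_000),          # unchanged
        FileEntry("art/poster.tif", 900_000_000, 1_265_000_000),  # modified
        FileEntry("fonts/b.ttf", 38_000, 1_265_000_000),          # new
    ]
    for entry in files_needing_backup(source, catalog):
        print(entry.path)
```

The trade-off is memory: the index holds one entry per catalogued file, so with millions of files the set itself gets large.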

 

I was hoping that combining the new RS 64 bit program on a Windows 2003 64 bit server would speed up my matching time, but it hasn't. It even seems to have made it worse.

 

I'm thinking I have to move on to a more robust backup program that can handle this large number of files.


I feel your pain, Rich. I have 40 servers and many of them have files numbering in the millions. Backups are taking 2-3 days, and that's "if" they complete before I get the TMemory error and assert crash. I have broken up source groups, recreated catalog files, and tried everything else in my power, and RS is still not getting it done. I can't believe they can't fix this issue. They have the code for version 7.6 (where this didn't happen). I am being forced to get quotes for BackupExec, Data Protector, and CommVault because we can't go on much longer this way. My company stands to lose money and I stand to lose my job if this keeps up.

 

-Matt


Came in this morning to find this...

 

+ Normal backup using General Server -> NAS at 01/03/2010 21:00 (Execution unit 4)

To Backup Set NAS GeneralS...

 

01/03/2010 21:00:19: Connected to Application Server

* Resolved container Application Server to 2 volumes:

Local Disk (C:) on Application Server

Virtual Machines (E:) on Application Server

 

- 01/03/2010 21:00:01: Copying Local Disk (C:) on Application Server

01/03/2010 21:32:23: Snapshot stored, 63.4 MB

01/03/2010 21:33:06: Execution completed successfully

Completed: 4 files, 130.2 MB, with 67% compression

Performance: 75.8 MB/minute

Duration: 00:33:05 (00:31:22 idle/loading/preparing)

 

 

- 01/03/2010 21:33:07: Copying Virtual Machines (E:) on Application Server

01/03/2010 21:33:07: No files need to be copied

01/03/2010 21:42:49: Snapshot stored, 8,779 KB

01/03/2010 21:43:24: Execution completed successfully

Duration: 00:10:17 (00:09:24 idle/loading/preparing)

 

01/03/2010 21:43:25: Connected to Backup Server

* Resolved container Backup Server to 2 volumes:

Local Disk (C:) on Backup Server

Retrospect on Backup Server

 

- 01/03/2010 21:43:25: Copying Local Disk (C:) on Backup Server

01/03/2010 22:00:23: Snapshot stored, 51.3 MB

01/03/2010 22:03:22: Execution completed successfully

Completed: 15 files, 1.1 GB, with 71% compression

Performance: 189.4 MB/minute

Duration: 00:19:57 (00:14:03 idle/loading/preparing)

 

 

- 01/03/2010 22:03:23: Copying Retrospect on Backup Server

01/03/2010 22:06:23: Snapshot stored, 28 KB

01/03/2010 22:07:00: Execution completed successfully

Completed: 6 files, 8 KB, with 0% compression

Performance: 0.1 MB/minute

Duration: 00:03:37 (00:02:55 idle/loading/preparing)

 

01/03/2010 22:07:01: Connected to Brighter ******** Server

* Resolved container Brighter ******** Server to 3 volumes:

/ on Brighter ******** Server

/boot on Brighter ******** Server

/dev on Brighter ******** Server

 

- 01/03/2010 22:07:01: Copying / on Brighter ******** Server

01/03/2010 22:07:01: No files need to be copied

01/03/2010 22:32:30: Snapshot stored, 29.2 MB

01/03/2010 22:33:08: Execution completed successfully

Duration: 00:26:07 (00:24:45 idle/loading/preparing)

 

 

- 01/03/2010 22:33:09: Copying /boot on Brighter ******** Server

01/03/2010 22:33:09: No files need to be copied

01/03/2010 22:33:30: Snapshot stored, 75 KB

01/03/2010 22:33:53: Execution completed successfully

Duration: 00:00:44 (00:00:22 idle/loading/preparing)

 

 

- 01/03/2010 22:33:54: Copying /dev on Brighter ******** Server

01/03/2010 22:33:54: No files need to be copied

01/03/2010 22:34:16: Snapshot stored, 100 KB

01/03/2010 22:34:40: Execution completed successfully

Duration: 00:00:46 (00:00:24 idle/loading/preparing)

 

01/03/2010 22:34:41: Connected to CVS Server

* Resolved container CVS Server to 2 volumes:

Local Disk (C:) on CVS Server

Data (E:) on CVS Server

 

- 01/03/2010 22:34:41: Copying Local Disk (C:) on CVS Server

01/03/2010 23:03:32: Snapshot stored, 126.7 MB

01/03/2010 23:04:14: Execution completed successfully

Completed: 1 files, 1 KB, with 0% compression

Performance: 0.1 MB/minute

Duration: 00:29:33 (00:28:27 idle/loading/preparing)

 

 

- 01/03/2010 23:04:15: Copying Data (E:) on CVS Server

 

** WHAM ** it's still going and started at 10:30pm last night (it's 10am now).


I'm in the same boat as all of you. I had one job that was matching for over 2 days after upgrading to 7.7 so I decided to place a call into tech support. Basically, I was told that I had to get my backup sets to have under 2-3 million files or this would happen. I mean, really? 7.6 didn't have this problem and I have several servers and a few workstations that have over 3 million files on them. Oh, and to make things worse, I decided to go back to 7.6 and realized that 7.6 won't see anything that was done in 7.7.

 

At this point, I have a very broken backup server that's making me very nervous. Tomorrow, I'm going to call EMC back and give them hell on this because 7.7 is really, really broken.


Thanks for posting your experiences Jeremy. Any 'normal' company would have turned a fix around in a matter of days due to the impact, but not Retrospect, oh no. 3-4 MONTHS is an acceptable amount of time to wait, apparently.

 

I must remember to budget for a proper backup solution next year..

 

Richard


I just got off the phone with both tech support and our sales rep. Tech support was useless and didn't really seem to care. After questioning their QA process and telling them their product was inherently broken because of the matching issue, I was told that there are only about 20 customers out of millions with the matching issue. Somehow I find that hard to believe. I mean, a product with Mac and Linux support that can't handle more than 2 million files in a backup set?

 

After getting off the phone with tech support, I decided to call our sales rep and had a much better conversation. She listened to everything, seemed very concerned and said she'd find out what was going on. If you're in the same boat as I am, call your sales rep at EMC.

 

I'll let you all know where I get with this.


Hi all, I am seeing the problem and it's taking all the fun out of using Retrospect. It seems I am about the last hold-out on this forum for rolling back to 7.6. It seems they know about all the bug problems but are hiding it. My sales rep (for a hardware vendor that shall remain nameless) did some research posing as a potential customer. The Retrospect sales rep would not say anything about the buggy latest release and, when pressed, said he hadn't heard of any problems. My rep also confirmed with their Symantec contact that BE quotes and sales have risen sharply over the last few months.

 

So, to EMC/Iomega...please keep working on fixing this. I am holding out but can't for much longer. And yes, I have an ASM, and no, it hasn't done me any good.


After reading the problems posted about 7.7, I opted to give the freebie trial a whirl. Using one of our test servers (2008 R2 x64, 12GB) I see exactly the behavior you boys talk about. Matching averages less than ten minutes per backup set on 7.6 Multiserver; after 16 hours on 7.7, the first set still is not matched.

 

Our Symantec rep is offering a hefty discount for Retrospect users switching to BE. We may take him up on it at this rate!


Guys, I may also speak to Symantec if they're offering good discounts in the UK. I have rolled back to 7.6 now and backups are flying once again, although it has reminded me just how useful Delta backups would be.

 

It took me a whole day to install RS 7.6 and set up my scripts and backup sets again. Then of course I had to add all the clients and servers, which I'm still in the process of doing, as we have a high number of laptop users that haven't been in the office.

 

My poor staff are having their machines battered this week to build up a backup set again, and I have had to bring the remote storage server into the office so it can synchronise, as we don't have enough bandwidth to duplicate anything but nightly changes to our network.

 

All in all, not a fun experience, especially now I don't have the ability to recover Win7 machines under 7.6.

 

I'm concerned how 7.7 got through testing without this being picked up...

 

Rich


Heh, you must be one of the "20" customers out of millions tech support was talking about. :-)

 

It's such a shame that 7.7 is so broken. We also thought about moving to Backup Exec, but it is such a different piece of software in the way that it works that I think our backup server would need significantly more disk space for the GFS approach. It's just not as friendly with backup to disk.


Quote: Heh, you must be one of the "20" customers out of millions tech support was talking about. :-)

 

Make that 21. Our department has not upgraded to 7.7 because of all the problem reports. So far, we have managed to work around the 7.6 bugs and limitations even with a growing Win 7 install base to support. The response I got from EMC support was that this is an "isolated problem". To what or whom it is isolated, there was no answer. The first suggestion was that there must be a conflict with other software on the machine. We first tried 7.7 on a vSphere virtual system that *only* has 2008 R2 installed with all current updates. The next trial was on a non-virtual machine. Impossibly slow matching in both cases.

 

Quote: It's such a shame that 7.7 is so broken. We also thought about moving to Backup Exec, but it is such a different piece of software in the way that it works that I think our backup server would need significantly more disk space for the GFS approach. It's just not as friendly with backup to disk.

 

Our overall corporate backup structure is not as well integrated as one would like - from an IT manager's perspective. Most departments use either Backup Exec or CommVault. We went with Retrospect many years ago as it allowed (at the time) the easiest self-support for recovery of accidental deletions or corrupted files.

 

Disk space is (comparatively) cheap these days. The most significant problem we see is migrating a useful number of Retrospect snapshots into a new backup package. We're looking into this to see how much can be scripted vs. needing manual intervention. I'll be happy to post our findings if you folks are interested. My colleagues' SWAG at disk space usage came as somewhat of a surprise to me. They find deduplication in both CV and BE produces smaller average client archives than we see with Retrospect 7.6. If the clients and appliances in your network do not share many files in common, you may not see as much of a size reduction.

 


About 10 days ago I ran a test to determine if the matching performance in Retrospect 7.7 was tied to the number of files in the backup set. Normally I run a Recycle backup to tape on the weekend of my production data, which has roughly 3 million files in it covering 2 separate production volumes. Then one day a week I run a Normal backup off of the previous weekend's backup set.

 

This time I split my backup set into 2, one for each volume, essentially cutting the number of files in each backup set in half.

 

I've checked my operations log and I see that in the past, this Normal backup using RS 7.6, with a single backup set holding about 3 million files, took about 4 hours to run for each production volume.

 

In the backup I'm referring to now, using RS 7.7 on a 64-bit server with 8 gigs of RAM, one backup script took 7 hours to run and the other took 4.75 hours. Most of that time was spent matching. So this RS 7.7 backup session, with each backup set holding only about 1.5 million files, took almost 4 more hours to run than an RS 7.6 backup on the same data set that used one backup set housing about 3 million files. Something is wrong with RS 7.7.
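Spelled out, using only the approximate figures quoted above (the variable names are just for illustration):

```python
# Approximate run times in hours, as quoted in the post above.
rs76_hours_per_volume = 4.0   # RS 7.6, one backup set of ~3 million files
rs77_hours = [7.0, 4.75]      # RS 7.7, two backup sets of ~1.5 million files

rs76_total = rs76_hours_per_volume * 2   # two production volumes
rs77_total = sum(rs77_hours)
extra = rs77_total - rs76_total

print(f"RS 7.6 total: {rs76_total:.2f} h")
print(f"RS 7.7 total: {rs77_total:.2f} h")
print(f"Extra time:   {extra:.2f} h ({extra / rs76_total:.0%} longer)")
```

That is 11.75 hours against roughly 8, even though each 7.7 backup set holds only half as many files.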

