
UI hangs 4+ hours per mouse click during restore.



Hello. System stats: Retrospect Multiserver version 7.6.123, DU 7.6.2.101, dual Xeon 3065 2.33 GHz, 4 GB RAM, Windows Server 2003 x64 SP2. Backups go to two 1.5 TB disks.

 

The backup set I'm recovering from is 170.5 GB across 5,272,372 files; available space is 1,176.6 GB. Catalog file compression is enabled. The source is a quad Xeon 2.3 GHz CentOS 4 Linux system with 4 GB RAM and a 1 TB RAID 5 array formatted with ext3.

 

I've been using this configuration for several months and getting very good performance. Today I went to recover a single file from a snapshot dated April 27th. I selected "Restore files and folders from a point in time", switched to Advanced Mode, and began selecting my source, destination, and files. Picking the specific snapshot I wanted took a bit of time, but not too long.

 

When I went to select the file to restore, the application seemed to lock up -- hourglass flipping for 4-5 *hours* for each key press or click in the window. I had to recover the file and this was my only source, so I decided to see it through.

 

Click on the first-level directory in the selector window, watch the hourglass for four hours. Click the + next to it, wait four hours. Click the scrollbar to move down a page, wait four more hours. No lie: it took me 48 hours to recover the file. Task Manager showed Retrospect using 550 MB of RAM during this time (at idle it uses 30.4 MB). No other applications were running on the system, and Retrospect was doing nothing else the whole time. If I moved the cursor outside the Retrospect window I could do other things, like launch Task Manager, so the system wasn't deadlocked; it was the application that was stuck.

 

It seems obvious that the problem must be the number of files in the file system. My question is, what can I do to back up this file system and still make timely restores when requested?


How much disk space is free on the drive where the catalog is? Could that drive be fragmented?

 

I wonder if a catalog rebuild would speed it up.

 

Also, the total size of the backed-up computer was 170 GB? I wonder if you could have restored the whole thing to another location. It would be ironic if that took less time than locating this one file.


  • 6 months later...

I'm just taking a moment to update this post with what I've learned in the interim, in case it proves useful to someone.

 

1. I've learned that Retrospect (at least up to and including the 7.6.x Windows Multiserver version) doesn't handle millions of small files per client well at all. We were able to restore some performance by having the Linux machine with all the little files make a special once-a-day tar file of the directories with many files, and then having Retrospect back up the tar file, while masking out the same directories in the selector we're using. (Thereby bypassing most of the many files in the backup.) Of course this means we have to restore the large tar file and then search through it for the file(s) we want, but it has made backup and restore possible in human time.
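In case it helps anyone, here is roughly what that nightly tar job looks like. The paths and names below are simplified placeholders rather than our exact script; the idea is just that the many-small-files directories get rolled into one archive that Retrospect then picks up.

```python
#!/usr/bin/env python3
# Rough sketch of the nightly "roll up the small files" job.
# Paths below are placeholders, not our real layout.
import tarfile
from pathlib import Path

SMALL_FILE_DIRS = [Path("/home/manyfiles")]      # directories holding millions of tiny files
TARBALL_DIR = Path("/home/backup-tarballs")      # what Retrospect actually backs up

def build_nightly_tarball() -> Path:
    TARBALL_DIR.mkdir(parents=True, exist_ok=True)
    out = TARBALL_DIR / "manyfiles-nightly.tar.gz"
    tmp = out.with_name(out.name + ".partial")
    with tarfile.open(tmp, "w:gz") as tar:
        for d in SMALL_FILE_DIRS:
            tar.add(d, arcname=d.name)           # one archived tree per source directory
    tmp.replace(out)                             # swap into place so a half-written file is never backed up
    return out

if __name__ == "__main__":
    print(f"Wrote {build_nightly_tarball()}")
```

Something like this can run from cron once a day, shortly before the backup window, so the backup only ever sees one large file instead of millions of small ones.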

 

2. I had set up a 50 GB RAID 1 partition for the system software (the c: drive) across the two 1.5 TB drives, and created two backup sets per client across the remainder of the 3 TB of disk. I created a 1.3 TB partition named "One" and a second 1.3 TB partition named "Two", then had "client-red" back up to "One" and "client-blue" back up to "Two".

 

I had left the backup set catalogs on the c: drive (in that 50 GB). I didn't realize how critical the c: partition is to Retrospect operations -- I'm guessing part of my problem was that it was filling up and not leaving enough room for things to work. I moved the catalogs for the "-blue" sets from the c: partition to "One" (where the "-red" backup sets are stored) and the "-red" catalogs to "Two". (If I lose disk "One", I'll lose half of the backups and the catalogs for the other half. Not ideal, but recoverable in a system with only two disks.)

 

3. As the "One" and "Two" partitions fill up, I'm trying to adopt intelligent grooming practices, but grooming is also taking forever to run. The grooming operation for one client's "-blue" backup set has been running for almost 24 hours and counting, with very little sign of progress, even though the system is doing nothing else (I've temporarily stopped the other scripts from running). There's 40 GB of the 50 GB c: partition available, so that's not the problem. Task Manager says Retrospect is using about 2 GB of reserved RAM while grooming, and the disk being groomed ("One") still has 61 GB of its 1.3 TB free.

 

The upshot is that I'm still learning where the bottlenecks are in this system of backup server and 4 other clients, two with large numbers of files, and it's been verrrrrrry slooooow going.

Edited to add: I updated to 7.7 when it came out, and I can't tell yet whether the 64-bit version is helping. Grooming still takes forever, though.

while masking out the same directories in the selector we're using. (Thereby bypassing most of the many files in the backup.)

If you did this with a selector "exclude", then it's not doing what you think it is. Retrospect does the filesystem tree walk first and then applies the selector to the resulting files. So, with the approach you have chosen, the entire filesystem and all of the gazillion tiny files still have to be scanned just to get the tarball file(s).

 

A better way might be to dump the tarball files into a single directory and designate that directory as a Retrospect "subvolume". Then only that "subvolume", and filesystem branches below that subvolume, will be scanned. That's where you will see the big win.
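To make the scanning difference concrete, here is a toy Python sketch of the two orders of operations. It's purely an illustration of the behavior described above, not Retrospect's actual code:

```python
# Toy illustration of "scan everything, then filter" vs. "scan only the subvolume".
import os

def scan_then_filter(volume_root, excluded_dirs):
    """Whole-volume source plus a selector exclude: the tree walk still
    visits every file; the exclude is only applied afterwards."""
    kept = []
    for dirpath, _dirnames, filenames in os.walk(volume_root):   # walks ALL the tiny files
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not any(path.startswith(ex) for ex in excluded_dirs):
                kept.append(path)                                # "selector" applied after the walk
    return kept

def scan_subvolume(subvolume_root):
    """Subvolume source: the walk simply starts lower in the tree, so the
    branches outside it are never visited at all."""
    kept = []
    for dirpath, _dirnames, filenames in os.walk(subvolume_root):
        kept.extend(os.path.join(dirpath, name) for name in filenames)
    return kept
```

In the first case the walk still has to visit every one of the tiny files before they get dropped; in the second, it never enters those branches at all.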

 

Russ


Thanks for the reply, and I'm frankly bowled over by this. We did the tarball business to avoid the "treewalk", and I did exactly what you're warning about -- I thought I was masking off this set of files in the selector. It would explain a lot about the backup times and catalog file sizes we've seen.

 

To be sure -- I have a system with a subdirectory structure full of many small files. I'll refer to it as /home/manyfiles. My regular backup set for this machine includes /, /boot, /var and /home.

 

I can either prevent the directory from being scanned, or set it into its own backup set, by creating /home/manyfiles as a subvolume? Once defined, would the subvolume be included in the current backup set, or do I still need to do something to mark it to be avoided when /home is backed up? (That is, if the directory is a defined subvolume, it only gets scanned and backed up if I include the subvolume in the selector?) I'm not sure I'm clear on the way to use it. I went back to the manual and it said if both volume and subvolume are defined, the volume will still scan the subvolume. Or did I read it wrong...


Thanks for the reply, and I'm frankly bowled over by this. We did the tarball business to avoid the "treewalk", and I did exactly what you're warning about -- I thought I was masking off this set of files in the selector. It would explain a lot about the backup times and catalog file sizes we've seen.

I'm just a user like you, and I am just reporting the behavior. It's the way the algorithm has always been, and it's because of some of the complexities of what is possible in the selector settings. Not to say that a more complex algorithm couldn't do better, but it is what it is.

 

I can either prevent the directory from being scanned, or set it into its own backup set, by creating /home/manyfiles as a subvolume?

Hmmm... I've never tried performance metrics on the "prevent from being scanned" part, so I can't comment. I do, however, exclude subvolumes from our volume backup (we have special reasons, different from yours, for using subvolumes, and that's outside the scope of this thread).

 

But, as for the second part, this has nothing to do with making the subvolume "its own backup set". It simply presents another source (or destination) to choose.

 

So, you could have a bunch of subvolumes dumping into a common backup set, etc., or into different ones. It's all in how you set up the script. It just supplies a prefix (starting point) for the treewalk's path.

 

Once defined, would the subvolume be included in the current backup set, or do I still need to do something to mark it to be avoided when /home is backed up?

Yes, if it's in the tree down from the current source's root it will still be included; and yes, you'd still need to mark it to be avoided. It's just a shortcut for the prefix on the path.

 

(That is, if the directory is a defined subvolume, it only gets scanned and backed up if I include the subvolume in the selector?)

No. See above.

 

I'm not sure I'm clear on the way to use it. I went back to the manual and it said if both volume and subvolume are defined, the volume will still scan the subvolume. Or did I read it wrong...

The manual is correct. See above. Subvolume definition has no effect on using the entire volume as a source (unless you explicitly put the subvolume as an exclude condition in the selector).

 

Russ


Getting there. I defined seven subvolumes of /home and then selected /home, but not the subvolumes, in the script's source chooser. The selector used by the script is still defined as all files except the seven subdirectories (I used Unix path-matching statements). When the script runs in the middle of the night, one of three things should happen:

 

1. The entire /home directory, including the subvolumes, will be both scanned and backed up. This would mean both the subvolumes and the path selector were ignored, and I don't expect this.

 

2. The entire /home directory will be scanned but then the subvolumes will be avoided because of the selector pattern. Since there are so many files in the subvolumes, this would make the catalog large and scanning long. I'm trying to avoid this.

 

3. Everything but the subvolumes will be scanned and backed up. This is what I hope will happen.

 

I'll let you know tomorrow. -Gary


You should be able to preview without doing the backup. That would answer your question.

 

I STRONGLY suggest you preview what will happen when you test selectors with path specifications. I've found that Retrospect, because of its history and cross-platform support, often doesn't do what you expect with path specifications (issues with leading and trailing "/" or "\", etc.).

 

Set up a non-scheduled script, take it all the way to the point of execution, and before you cut it loose, look at the list of files it will back up.

 

Just a suggestion. Backups take a long time and (for us) eat up a lot of tape.

 

russ


You were right, of course. The system scanned the whole disk first, but then de-selected the subvolumes I had created to be avoided. I've let that stand for now, and I've opened a ticket with EMC for further help.

 

I think I'm also going to restructure the backup sets so I have four weekly sets per client (I have two per client now) and set the first day of each to be a recycle backup. That way I should get one image backup and six incremental backups per week, and the recycle will reset the backup set so it won't have multiple sets of 10+ million files cataloged, while I'll still have three other sets for redundancy and file versions.

 

Thank you for helping me understand things.

