digimark

Members
  • Content count: 27
  • Community Reputation: 0 Neutral
  • Rank: Occasional Forum Poster

  1. Hi. I'm preparing to replace an existing linux (CentOS 3) server with a new CentOS 5 server. After installing the machine in the rack and the latest Retro client on the OS, I edited the script that backs up the original machine to add the new client, expecting it to back up both machines to the same backup sets until I switch the server tasking, at which point I'll remove the old machine from the backup set and the data center. Unfortunately Retrospect 7.7 Multi for Windows 2003 Server X64 doesn't seem to agree. The backup job starts (the script starts up at 1:30AM), nothing is recorded in the operation log, the script log is empty, and then I get a second startup message at 1:45AM when the next backup set for a different server kicks off. See this:

    + Retrospect version 7.7.203 Automatically launched at 1/7/2010 1:30 AM in user account BACKUPSERVER\User
    + Driver Update and Hot Fix, version 7.7.1.102 (64-bit)
    + Retrospect version 7.7.203 Automatically launched at 1/7/2010 1:45 AM in user account BACKUPSERVER\User
    + Driver Update and Hot Fix, version 7.7.1.102 (64-bit)
    + Normal backup using anothermachine at 1/7/2010 1:45 AM (Execution unit 2)
    To Backup Set anothermachine-blue...

    The script passes validation, the catalogs appear OK (no corruption), and there is plenty of disk space. The client can be reached from the server and seems to be set up OK. There are no pertinent messages in the event viewer, and the OS is fully patched. Any ideas? Thanks. -Gary
  2. You were right, of course. The system scanned the whole disk first, but then de-selected the subvolumes I had created, so they were avoided. I let that stand for now, and I opened a ticket with EMC to help me further. I think I'm also going to restructure the backup sets so I have four weekly sets per client (I have two per client now), setting the first day of each to be a recycle backup. That way I should get one image backup and six incremental backups per week, and the recycle will reset the backup set so it won't have multiple sessions of 10+ million files cataloged, while I still have three other sets for redundancy and file versions. Thank you for helping me understand things.
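    To sketch the rotation I have in mind (the set names and the Monday recycle day are just examples, and this is plain shell to illustrate the idea, not anything Retrospect itself runs):

      #!/bin/bash
      # Sketch of a four-week rotation: one backup set per ISO week, with the
      # first scheduled day of each week treated as the recycle (full) run.
      # Set names are placeholders.
      SETS=(client-red client-blue client-green client-yellow)
      week=$(date +%V)                     # ISO week number, 01-53
      idx=$(( (10#$week - 1) % 4 ))        # which of the four sets is active
      dow=$(date +%u)                      # 1 = Monday ... 7 = Sunday
      if [ "$dow" -eq 1 ]; then
          echo "Week $week: recycle backup to ${SETS[$idx]} (resets the set)"
      else
          echo "Week $week: incremental backup to ${SETS[$idx]}"
      fi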
  3. I hope I didn't complicate your topic unduly, but I think the catalog grew so large because it was rebuilding from a huge number of sessions from a filesystem with 10+ million files... I've learned a lot in the last few days, as you might see from some of my other postings.
  4. Getting there. I defined seven subvolumes of /home and then selected /home, but not the subvolumes, in the script's source chooser. The selector used by the script is still defined as all files except the seven subdirectories (I used unix path matching statements). When the script runs in the middle of the night, one of three things should happen:
    1. The entire /home directory, including the subvolumes, is both scanned and backed up. This would mean both the subvolumes and the path selector were ignored, and I don't expect this.
    2. The entire /home directory is scanned, but the subvolumes are then skipped because of the selector pattern. Since there are so many files in the subvolumes, this would still make the catalog large and the scan long. This is what I'm trying to avoid.
    3. Everything but the subvolumes is scanned and backed up. This is what I hope will happen.
    I'll let you know tomorrow. -Gary
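    For reference, the exclusion I'm describing looks roughly like this in plain unix terms (the subdirectory names are made up and I'm only showing three of the seven; this illustrates the path-matching idea, not actual Retrospect selector syntax):

      # Walk /home but never descend into the excluded subvolumes
      # (hypothetical names); this mirrors the "outcome 3" behaviour I'm hoping for.
      find /home \
           \( -path /home/maildirs -o -path /home/webcache -o -path /home/sessions \) \
           -prune -o -type f -print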
  5. Thanks for the reply, and I'm frankly bowled over by this. We did the tarball business to avoid the "treewalk", and I actually did what you're warning about -- I thought I was masking off this set of files in the selector. It would explain a lot about the times and backup catalog file sizes we've seen. To be sure: I have a system with a subdirectory structure full of many small files; I'll refer to it as /home/manyfiles. My regular backup set for this machine includes /, /boot, /var and /home. Can I either prevent the directory from being scanned, or put it into its own backup set, by creating /home/manyfiles as a subvolume? Once defined, would the subvolume still be included in the current backup set, or do I need to do something to mark it to be avoided when /home is backed up? (That is, if the directory is a defined subvolume, does it only get scanned and backed up if I include the subvolume in the selector?) I'm not sure I'm clear on the way to use it. I went back to the manual, and it said that if both volume and subvolume are defined, the volume will still scan the subvolume. Or did I read it wrong...
  6. I've been reading forum postings and support documents and I need to bounce a few questions off of anyone willing to answer.

    First, my setup: a backup server and four other clients, all CentOS linux webservers. The backup server is a dual Xeon 2.33GHz 3065, 4GB RAM, Win2K3 X64 Server, with two 1.5TB disks partitioned into a 50GB RAID1 C: plus a 1.3TB E: labeled "One" and a 1.3TB F: labeled "Two". I also just upgraded from RS 7.6 MultiServer to 7.7. The backup server runs only Retrospect and is dedicated to the task. The backup server and two of the linux machines are relatively normal, and backups/restores go fast and fine with them. But the other two linux machines have a directory structure under /home with 4-5 million small files each, and those two have given me fits to manage backups and restores. I originally had all the RBC catalog files on the 50GB C: drive; I've learned this isn't enough room, so I moved the four big client catalog files to the F: drive, where there were a few hundred spare GB of space, leaving the six normal-sized catalogs where they were.

    I need rotating backups, so originally I set all five machines into two backup sets, a "-red" and a "-blue", with one saved to the first big disk "One" and the other to the second, "Two". This was just too much and took too long to run, so I then set each machine into its own "-red" and "-blue" backup sets (for a total of ten backup sets), half running every other week. Flash forward eight months, and I'm beginning to run out of room on the two disks holding the backups. I have not run any grooming or recycle scripts, and the catalog files are getting quite large and hard to manage. Any time there is a bobble or a program crash, there's a good chance I have to recreate one or more catalog files, and for the two linux servers with many files, that takes a good six hours and currently about 120GB of spare catalog disk space each.

    Not really understanding anything about how grooming and recycling work, I chose one of the clients' "-red" backup sets and started a manual grooming run, thinking this is the right way to reduce the disk space used. I also held up the running of any other scripts to maximize performance. After almost exactly 24 hours, this job switched from "Matching..." to "Grooming segment 60 of 2,157" and is still processing. Also, Retrospect has reserved 2.2GB of memory according to Task Manager. (EDIT: After 26 hours, the job crashed with error -2241, Catalog file invalid or damaged. Note that just before the grooming session, I had rebuilt the catalog...) Clearly this is not the way to handle space needs on an ongoing basis -- even if the currently running job finished now, I'd still have three other huge backup sets to clean up.

    On to the questions:
    1. Is it surprising to see grooming of one backup set containing 2,157 segments, hundreds of sessions and millions of files take 24+ hours on a relatively fast machine doing nothing else?
    2. The machine is running a 64-bit OS (Win 2K3 Server) with 4GB RAM and a paging file set to vary between 2GB and 4GB. How much would 4GB more RAM help? What would the best paging file size be (in the System control panel) -- system managed, or a flat 6GB or 8GB?
    3. Unless I want to let the disk backup sets grow without bound by accumulating incremental snapshots (which is clearly not sustainable), at some point I need to schedule Recycle scripts. But a recycle script in this setup will dump all the contents of the backup set and start fresh. If the other disk failed during this, I'd lose all my backups. (I don't know why, but I had thought Retrospect would automatically delete the oldest snapshots to keep the disk space used within a certain range, based on the "Use at most... or Percentage" setting in the device properties. This was set to 99% originally, but I've just set it to 80%.) I don't want to dump so many versions of backups all at once. So it would seem that I need to break up my backup sets further -- with more color iterations ("-green", "-yellow") -- and change each from the current cycle of every other week to something like every fourth week, let this run for 2-3 months, and begin each cycle with a recycle. This would give me multiple sets of image and incremental backups of each client for several months, without accumulating them forever, true?
    4. Are there any EMC/Insignia/Dantz white papers on suggested disk-based backup-set schemes that would address this? The manual wasn't helpful on this subject.
    5. If I recycle backup sets reasonably often, where does grooming come in? Does the number of snapshots Retrospect keeps while grooming have any meaning if you never groom? And what about the device properties "Use at most" setting -- how does that work?

    I realize this was a lot to read through -- I really appreciate your input. -Gary
  7. I'm just taking a moment to update this post with what I've learned in the interim, in case it proves useful to someone.

    1. I've learned that Retrospect (at least up to and including the 7.6.x Windows Multiserver version) doesn't handle millions of small files per client very well at all. We were able to restore some performance by having the linux machine with all the little files make a special, once-a-day tar file of the directories with many files, and then having Retrospect back up the tar file while masking out those same directories in the selector we're using (thereby bypassing most of the many files in the backup). Of course this means we have to restore the large tar file and then search through it for the file(s) we want, but it has made backup and restore possible in human time.

    2. I had set up a 50GB RAID1 partition for the system software (C: drive) across the two 1.5TB drives, and created two backup sets per client across the remainder of the 3TB of disk. I had created a 1.3TB "One" file system and a second 1.3TB "Two" partition, and then had "client-red" back up to "One" and "client-blue" back up to "Two". I had left the backup set catalogs on the C: drive (in the 50GB). I didn't realize how critical the C: drive partition is to Retrospect operations -- I'm guessing part of my problem was that this partition was filling up and not leaving enough space for things to work. I moved the catalogs for "-blue" from the C: partition to "One" (where the "-red" backup sets are stored) and the "-red" catalogs to "Two". (If I lose disk "One", I'll lose half of the backups and the catalogs from the other half of the backups. Not ideal, but recoverable in a system with only two disks.)

    3. As the "One" and "Two" partitions fill up, I'm trying to adopt intelligent grooming practices, but these are also taking forever to run. The grooming operation for one client's "-blue" backup set has been running for almost 24 hours and counting, with very little sign of progress, even though the system is doing nothing else (I've stopped other scripts from running temporarily). There's 40GB of the 50GB C: available, so that's not the problem. Task Manager says Retrospect is using about 2GB of reserved RAM while grooming, and the disk being groomed ("One") still has 61GB of its 1.3TB free.

    The upshot is that I'm still learning where the bottlenecks are in this system of a backup server and 4 other clients, two with large numbers of files, and it's been verrrrrrry slooooow going.
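    In case it helps anyone, the once-a-day tar job mentioned above is nothing fancy on the linux client -- roughly this, run from cron (the paths and file names below are placeholder examples, not our real layout):

      #!/bin/bash
      # Illustrative /etc/cron.daily job: roll the many-small-files tree into one
      # archive that Retrospect backs up instead; the original directory is then
      # excluded in the backup selector.  Paths are placeholders.
      SRC_PARENT=/home               # parent of the problem directory
      SRC_DIR=manyfiles              # hypothetical directory with millions of files
      DEST=/home/backup-staging      # hypothetical staging area Retrospect does back up
      mkdir -p "$DEST"
      tar -czf "$DEST/${SRC_DIR}-$(date +%F).tar.gz" -C "$SRC_PARENT" "$SRC_DIR"
      # Keep only the two most recent archives so the staging area stays small.
      ls -1t "$DEST/${SRC_DIR}"-*.tar.gz | tail -n +3 | xargs -r rm -f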
  8. I'm also having similar problems. Win2K3/X64 Standard Server, Retrospect 7.7, dual core Dell R200, 4GB RAM, 50GB C:, 2x 1TB backup disks. Five backup sets with catalogs stored on C:; three are about 200MB each, and the remaining two are catalogs for machines that have several million files each -- those catalogs are usually about 2-4GB. One of the catalogs has grown to the size of the available disk space, currently about 39GB, and the backup then fails, with lots of Disk Full and heap memory failure errors. Based on these forum messages and others, I moved the catalog file to another disk with 300GB free and started a catalog rebuild. The recatalog job has been running for seven hours; the completed line says 3,681,103 files, 138.3GB, and the rebuilding catalog file is at 72.2GB and continuing to grow. Should the rebuilding catalog file really be half the size of the entire backup set? And how long should this process run? I'm afraid it's stuck in a loop where it's not reading any more of the backup set but the catalog file is growing without bound...
  9. Hello. System stats: Retrospect Multiserver version 7.6.123, DU 7.6.2.101, dual Xeon 2.33GHz 3065, 4GB RAM, Win 2K3 X64 SP2, backing up to two 1.5TB disks. The backup set I'm recovering from is 170.5GB holding 5,272,372 files; available space is 1,176.6GB. Catalog file compression is enabled. The source is a quad Xeon 2.3GHz CentOS 4 Linux system with 4GB RAM and a 1TB RAID 5 array formatted with ext3. I've been using this configuration for several months and getting very good performance.

    Today I went to recover a single file from a snapshot from April 27th. I selected "Restore files and folders from a point in time", selected Advanced Mode, and began to choose my source, destination and files. I selected the specific snapshot I wanted, and that took a bit of time, not too long. But when I went to select the file to restore, the application seemed to lock up -- the hourglass flipped for 4-5 *hours* for each key press or click in the window. I had to recover the file and this was my only source, so I decided to see it through. Click on the first-level directory in the selector window, watch the hourglass for 4 hours. Click the + next to it, wait four hours. Click in the scroll area to move a page down, wait four more hours. No lie, it took me 48 hours to recover the file.

    Task Manager showed Retrospect using 550MB of RAM during this time (Retrospect at idle uses 30.4MB). No other applications were running on the system, and Retrospect was doing nothing else the whole time. If I moved the cursor outside the Retrospect window, I could do other things, like launch Task Manager, so the system wasn't deadlocked; it was the application that was stuck. It seems obvious that the problem must be the number of files in the file system. My question is: what can I do to back up this file system and still make timely restores when requested?
  10. Just a note to close out this topic -- I was able to source a Windows 2K3 Server license, so that's what we went with. So far, it's worked well.
  11. Slooow snapshot time vs. backup

    To close out the topic, I eventually learned that I could dramatically reduce the time by breaking up the backup into multiple backup sets. Instead of one job with five machines and 15 file systems, I have five backup sets, one per machine. Backups are much, much faster now and, as a plus, run concurrently. I still struggle, though, with two file systems that have 5+ million files each -- it seems Retrospect *cannot* handle that many files in a file system very well.
  12. Not enough application memory?

    More information -- now I'm getting "not enough disk space for the Catalog file (short by about 1,717,986,893.8 G)". The C:\ drive on the backup host is a 50GB partition with half of it free. There are two catalog files holding the every-other-week backup info; each is about 8GB. It's not clear to me whether it's the client or the backup host that's choking on this. Any thoughts? Please?
  13. Just had a new error pop up. It stopped the running backup and the rest of the backup set. Here's the log output:

    - 1/5/2009 8:48:23 AM: Copying /home on oboe
    TMemory: heap 39 1,457 K  virtual 6 59.0 M  commit 59.0 M  purgeable 0 zero K
    Pool: pools, users 2 252  max allowed mem 614.0 M  max block size 8,192 K
      total mem blocks 2 16.0 M  used mem blocks 1 8,192 K  file count, size 1 8,192 K
      requested 45 570.9 M  purgeable 0 zero K  avail vm size 1,769,930,752 B
    TMemory::mhalloc: VirtualAlloc(267.0 M, MEM_RESERVE) failed, error 8
    TMemory: heap 40 1,459 K  virtual 6 59.0 M  commit 59.0 M  purgeable 0 zero K
    Pool: pools, users 2 252  max allowed mem 614.0 M  max block size 8,192 K
      total mem blocks 2 16.0 M  used mem blocks 1 8,192 K  file count, size 1 8,192 K
      requested 46 570.9 M  purgeable 0 zero K  avail vm size 1,769,930,752 B
    TMemory::mhalloc: VirtualAlloc(267.0 M, MEM_RESERVE) failed, error 8
    Not enough application memory
    1/5/2009 2:03:09 PM: Execution incomplete
    Duration: 05:14:46 (05:09:21 idle/loading/preparing)
    1/5/2009 2:08:35 PM: Execution incomplete
    Total performance: 99.0 MB/minute with 49% compression
    Total duration: 12:33:10 (11:42:52 idle/loading/preparing)

    There was another backup client waiting for processing after this client in the set, so I'm guessing the problem is on the backup host and not the client being backed up? Any thoughts? (Retrospect 7.6.123, 4GB RAM, dual Xeon 2.33GHz Dell PowerEdge R200 server, Win2K3 X64 OS.) Thanks. -Gary

    Edited to add that there is nothing else running on this machine besides ClamWin anti-virus. Also, this was in the Event Log for the system: From Retrospect: Script "Backup To Set A" failed during automatic execution, error -625 (not enough memory). Please launch Retrospect and check the log for details.
  14. Slooow snapshot time vs. backup

    My apologies for not knowing, but what's the proper way to tell what state it's in at any given time? On the client or the backup host? The log file for the running backup job just says "Copying /home on bongo".
  15. Slooow snapshot time vs. backup

    Thanks for the reply. I checked on the CentOS machine (bongo) while it's being backed up -- here's the activity from the ps command:

      root   596     1  0 Nov16 ?  03:21:44 /usr/local/dantz/client/retroclient -daemon
      root 18811   596  5 02:05 ?  00:18:25 retropds.23

    This was at about 7:17AM, after about 5 hours of the backup of /home running. If this is what we can expect of snapshot performance for so many files, then I should ask:
    1. Is there a way to run the backups concurrently instead of consecutively?
    2. What is the network and CPU load on the client during the snapshot build? If it spends 95% of the 5-hour backup window preparing a snapshot, is the network only active during the 5% of the time when it is actually transferring files to the backup server?

    The backup server has two 1.5TB drives. They are partitioned into a 50GB partition and "the rest"; the two 50GB partitions are mirrored, and the backups go to the 2.6TB of space on the remainder. The only things on the C: mirror are the OS and the Retrospect software, plus the catalogs. The machine has 4GB of RAM.
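    For my own notes, here is how I've been watching the client while a backup runs -- plain linux commands, nothing Retrospect-specific, and the eth0 interface name is just an assumption:

      # CPU/memory of the Retrospect client processes, refreshed every 5 seconds:
      watch -n 5 'ps -eo pid,pcpu,pmem,etime,args | grep [r]etro'

      # Rough network throughput: rx/tx byte counters on eth0, sampled 10s apart.
      # If the counters barely move, the client is still scanning, not transferring.
      awk '/eth0:/ {print "rx", $2, "tx", $10}' /proc/net/dev
      sleep 10
      awk '/eth0:/ {print "rx", $2, "tx", $10}' /proc/net/dev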