
Retrospect taking 33 hours to back up 2.28TB



Recently I updated our Intel Xserve to Leopard Server 10.5.4. Previous backup durations under Tiger were around 18 hours but occasionally exhibited the same issue. At the start of the backup it indicates approx. 2900 MB/min but slowly degrades to 1300 MB/min. Once that happens, Retrospect starts occupying 60 to 95% of the processors. RAM never peaks; typically 40% is available at all times. The server is not hammered by user activity; typically fewer than 20 people connect to it.

 

Log snippet:

- 9/17/2008 8:37:13 AM: Copying Bank2…

9/18/2008 6:04:52 PM: Execution completed successfully.

Completed: 342180 files, 2.5 TB

Performance: 1300.2 MB/minute

Duration: 33:27:39 (00:13:38 idle/loading/preparing)
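
As a sanity check on the figures in this log, the reported rate and duration can be cross-checked against the reported size. The following is a minimal Python 3 sketch, not anything Retrospect itself provides. The function names `parse_hms` and `summarize` are hypothetical, and the calculation assumes that the rate applies to copy time only (total duration minus idle time) and that 1 TB means 1024 x 1024 MB. The log confirms neither assumption.

```python
import re


def parse_hms(text):
    """Convert an 'HH:MM:SS' string (hours may exceed 24) to minutes."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 60 + minutes + seconds / 60


def summarize(log_text):
    """Extract the rate and durations from one execution summary."""
    rate = float(
        re.search(r"Performance:\s*([\d.]+)\s*MB/minute", log_text).group(1)
    )
    total, idle = re.search(
        r"Duration:\s*([\d:]+)\s*\(([\d:]+)\s*idle", log_text
    ).groups()
    total_min = parse_hms(total)
    copy_min = total_min - parse_hms(idle)
    return {
        "rate_mb_per_min": rate,
        "total_min": total_min,
        "copy_min": copy_min,
        # Assumption: rate covers copy time only, and 1 TB = 1024 * 1024 MB.
        "implied_tb": rate * copy_min / (1024 * 1024),
    }


# Summary lines copied from the log above.
LOG = """\
Completed: 342180 files, 2.5 TB
Performance: 1300.2 MB/minute
Duration: 33:27:39 (00:13:38 idle/loading/preparing)
"""

if __name__ == "__main__":
    summary = summarize(LOG)
    print(f"copy time: {summary['copy_min'] / 60:.2f} hours")
    print(f"implied volume: {summary['implied_tb']:.2f} TB")
```

Under those assumptions the implied volume rounds to the 2.5 TB shown in the log, which suggests the 1300.2 MB/minute figure is an average over the whole run.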

 

 

Details of the system are as follows...

 

Hardware:


  • Intel Xserve 2GHz Dual-Core (4GB RAM, an 8GB upgrade is awaiting installation)
  • Xserve Raid (1.51, formatted as 2 x 2.28TB RAID 5 volumes)
  • Apple 4GB Fibre Channel Card (Firmware 1.3.14.0)
  • ATTO Celerity FC-41XS (3.25 driver installed, connected to tape drive)
  • Dell PowerVault 132T w/ Fibre Channel

 

 

Software involved:


  • Leopard Server (10.5.4)
  • Retrospect (6.1.230)
  • Retrospect 6.1 Driver Update (6.1.15.101)
  • SuperDuper! (2.5 v84, for drive synchronizing)
  • Dell PowerVault RMU (210F.00002)
  • Dell PowerVault Library (310D.GY004)

 

I am anxiously awaiting the Intel-optimized version; I have terabytes of archive files that would need to be transcribed if I were forced to change backup applications. Any assistance would be greatly appreciated.

 

 


> Recently I updated our Intel Xserve to Leopard Server 10.5.4. Previous backup ... under Tiger ... occasionally exhibited the same issue

 

If what you are reporting also occurred under 10.4, then the upgrade to 10.5 is a red herring.

 

> ... backup durations ... were around 18 hours

 

The total time to back up is only relevant if you are talking about the same amount of data. Are you describing observations of new (or Recycle or New Media) backups of the same Source? Obviously incremental backups will take less time.

 

> At the start of the backup it indicates approx. 2900 MB/min but slowly

> degrades to 1300 MB/min

 

The tape drive will not be able to write any faster than the speed it sees files. Do you see this slowdown on all Source volumes? Large and small?

 

> Xserve Raid (1.51, formatted as 2 x 2.28TB RAID 5 volumes

> Completed: 342180 files, 2.5 TB

So "Bank2" is a volume that is larger than either of the RAID volumes? Please describe what your Source for this backup is.

 

> Next upgrade is a XServe RAID card

How is the Xserve Raid currently providing RAID 5 volumes without such a card?

 

 

David

(who is reaching, not having such big iron of his own to play with)


Dave, it's a little confusing. The Xserve RAID is a storage box (no longer sold because it was way obsolete and never updated) that attaches to the Xserve via Fibre Channel.

 

The Xserve RAID "card" replaces the Xserve's internal controller card, and provides a RAID controller for the Xserve's internal drives. With the Xserve G5, it was done by a PCI card (Apple rebranded LSI Logic Megaraid card) and recabling from the drives to the PCI card. With the Intel Xserve, it is done by a controller board swap.

 

By the way, the Xserve G5 Apple Hardware RAID card has a firmware bug that was never fixed, whereby the RAID card doesn't fully flush its write cache on graceful power down before disconnecting the drives from the bus. This was a very difficult bug for me to find and to develop a repeatable test case. RADAR ID 4350243. Only known workaround is to disable the write cache on the RAID card, which greatly reduces performance. Don't know if the Intel card has the same bug. Be sure to test, GMR, before deployment into production. It's a hard bug to find because it causes mystery garbage blocks from space at random places in the RAID 5 on only some drives. Causes bad data but the RAID does not think that it is degraded.

 

Not a happy camper that Apple never fixed this bug, which LSI Logic fixed in its firmware long ago (for its own card released to the PC market).

 

Russ


Thanks for the responses, gentlemen. I'll go into the hardware configuration in a little more depth.

 

The Intel Xserve currently has 3 internal SATA drives, an 80GB system drive and two 750GB drives formatted as a RAID 1 software mirror. Connected via 4GB Apple Fibre Channel card is an Xserve RAID with 14 x 500GB drives, formatted as two 2.8TB RAID 5 volumes (Bank1 and Bank2) because each controller can access only 7 drives at a time. Another 750GB drive is ordered and an Apple Xserve RAID card is on hand for the Xserve itself to convert the volumes from a software raid to a hardware enabled RAID 5 architecture.

 

At 9PM every weeknight an application called SuperDuper! performs a "Smart Update" between Bank1 and Bank2. This replicates changes performed throughout the day on Bank1 to Bank2. The Smart Update typically takes about 20 minutes to do the replication between volumes.

 

Here is an example of the Retrospect activity...

 

Backups.jpg

 

The tape libraries are duplicated to another volume for recovery in case the primary hard drive goes down. This typically takes about 2 minutes.

 

With OSX Tiger Server the backups were typically 18 hours, the only time it ran over 24 hours was if I left another application running accidentally like Safari or one of the Server maintenance applications. Now that the backup system is routinely running over 24 hours it requires me to interrupt the backup cycle to perform recoveries from archive tapes when required.

 

To answer CallMeDave's queries...


  • The total amount of data is typically the same; it varies from 2TB to 2.7TB over a monthly cycle.
  • Bank1 and Bank2 are essentially identical volumes replicated prior to backup operations. Bank1 is an online volume and Bank2 is an offline volume specifically maintained for backup operations.
  • The write speed of the tape drive has not changed during this time frame. If it was able to back up 2.7TB in 18 hours previously, I would expect it to handle it now.
  • The Xserve RAID has two hardware controllers for the 14 drives within its enclosure. The Xserve RAID card is an upgrade for the Intel Xserve itself to provide hardware RAID connectivity to the 3 drives it can contain.

 

I suspect there are new interactions between the Leopard OS and Retrospect that are creating the extended backup times. Maybe Apple has changed the way it is emulating the Rosetta routines in Leopard. Whatever it is, it has gotten worse since the upgrade, but I cannot roll back to the previous version of the server software as the Xserve is being used as a System Update server for our connected clients.

 

 


Just for comparison's sake, here are a few snippets from the backup log, with the same configuration and backup scripts as listed in the initial post. These are full backups, not incremental.

 

 

+ Recycle backup using Weekend at 8/23/2008 9:16 AM

To backup set Weekend-E…

8/23/2008 9:16:39 AM: Recycle backup: The backup set was reset

 

- 8/23/2008 9:16:39 AM: Copying Bank2…

8/24/2008 2:16:17 AM: Execution completed successfully.

Completed: 371430 files, 2.7 TB

Performance: 2734.6 MB/minute

Duration: 16:59:38 (00:13:39 idle/loading/preparing)

 

 

 

+ Recycle backup using Weekday at 8/25/2008 10:00 PM

To backup set Weekday-A…

8/25/2008 10:00:12 PM: Recycle backup: The backup set was reset

 

- 8/25/2008 10:00:12 PM: Copying Bank2…

8/26/2008 3:10:59 PM: Execution completed successfully.

Completed: 372567 files, 2.7 TB

Performance: 2730.0 MB/minute

Duration: 17:10:47 (00:13:49 idle/loading/preparing)

 

 

 

+ Recycle backup using Weekday at 8/26/2008 10:00 PM

To backup set Weekday-A…

8/26/2008 10:00:19 PM: Recycle backup: The backup set was reset

 

- 8/26/2008 10:00:19 PM: Copying Bank2…

8/27/2008 3:19:14 PM: Execution completed successfully.

Completed: 372677 files, 2.7 TB

Performance: 2712.8 MB/minute

Duration: 17:18:55 (00:14:47 idle/loading/preparing)
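
To put these three runs next to the slow run from the initial post, the rates can be compared directly. The following is a minimal Python 3 sketch using only the rates quoted in the logs. The helper names `slowdown` and `projected_hours` are hypothetical, and the projection assumes 1 TB = 1024 x 1024 MB, which the logs do not state.

```python
from statistics import mean

# MB/minute figures copied from the log excerpts in this thread.
AUGUST_RATES = [2734.6, 2730.0, 2712.8]  # 8/23, 8/25 and 8/26 runs
SEPTEMBER_RATE = 1300.2                  # 9/17 run from the initial post

MB_PER_TB = 1024 * 1024  # assumption: binary units


def slowdown(baseline_rates, current_rate):
    """How many times slower the current rate is than the baseline mean."""
    return mean(baseline_rates) / current_rate


def projected_hours(megabytes, rate_mb_per_min):
    """Hours needed to copy `megabytes` at a steady rate."""
    return megabytes / rate_mb_per_min / 60


if __name__ == "__main__":
    factor = slowdown(AUGUST_RATES, SEPTEMBER_RATE)
    print(f"slowdown factor: {factor:.2f}")
    for label, rate in (("August mean", mean(AUGUST_RATES)),
                        ("September", SEPTEMBER_RATE)):
        hours = projected_hours(2.7 * MB_PER_TB, rate)
        print(f"{label}: about {hours:.1f} hours for 2.7 TB")
```

The September rate is a little under half of the August average, which lines up with the jump from roughly 17 hours to over 33 hours for a similar amount of data.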

 

 

 


> Next upgrade is a XServe RAID card in hopes it will improve throughput

 

Throughput of what? If the XServe RAID card is not going to be used to control any Source volumes then it's probably not going to make any difference in this issue.

 

> I suspect there are new interactions between the Leopard OS and Retrospect

> that are creating the extended backup times

 

Your original post specifically stated:

 

"Previous backup ... under Tiger ... occasionally exhibited the same issue"

 

It's unlikely to be a Leopard specific issue if you were seeing it, even if only intermittently, under Tiger.

 

> The tape libraries are duplicated to another volume

 

The correct Retrospect terminology is Catalog. Tape libraries are robotic tape handling hardware devices.

 

> Copying Bank2

 

Have you tried defining a subvolume on the Bank2 volume, and using that as a Source? It would be interesting to know if the hardware is consistently this speed, or only when it's trying to backup the entire volume.
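
One way to separate the Source from the tape side of the problem would be to time plain reads from the volume without Retrospect involved. The following is a minimal sketch under several assumptions: it needs Python 3, which would have to be installed separately on a Leopard server. The function names are hypothetical, and the path `/Volumes/Bank2` is only a guess at where the volume is mounted. Files already held in the OS cache will read faster than the disks can deliver, so the result is at best a rough upper bound.

```python
import os
import time


def measure_read_rate(root, max_bytes=1024 ** 3, chunk_size=1024 * 1024):
    """Read files under `root` until about `max_bytes` have been read.

    Returns (bytes_read, elapsed_seconds). The last chunk may push the
    total slightly past `max_bytes`.
    """
    bytes_read = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip sockets, FIFOs and broken links
            try:
                with open(path, "rb") as handle:
                    while bytes_read < max_bytes:
                        chunk = handle.read(chunk_size)
                        if not chunk:
                            break
                        bytes_read += len(chunk)
            except OSError:
                continue  # skip files that cannot be read
            if bytes_read >= max_bytes:
                return bytes_read, time.perf_counter() - start
    return bytes_read, time.perf_counter() - start


def mb_per_minute(bytes_read, seconds):
    """Express a read result in the MB/minute unit Retrospect reports."""
    if seconds <= 0:
        return 0.0
    return bytes_read / (1024 * 1024) / (seconds / 60)


if __name__ == "__main__":
    # Hypothetical mount point; adjust to the real Source or subvolume.
    total, elapsed = measure_read_rate("/Volumes/Bank2")
    print(f"{mb_per_minute(total, elapsed):.1f} MB/minute")
```

If the raw read rate stays well above 2700 MB/minute while Retrospect reports 1300, the Source volume is less likely to be the bottleneck. Running it against both the whole volume and a single folder would mirror the subvolume test suggested above.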

 

> These are full backups, not incremental

 

While not a solution to why you're only getting 1300.2 MB/minute now instead of the 2730.0 MB/minute you were getting last month (and Russ' suggestion of contacting ATTO about the FC controller is still on the table), I have to wonder why you're _not_ performing Normal backups? All that extra wear on the tapes, all the extra backup time, etc. Are your files so large that incremental backups during the week would consume too much media?


For CallMeDave:

 

Currently our Archive volume is a pair of software mirrored 750GB drives, the system drive is 80GB and is not mirrored. Shortly after the Leopard update Retrospect locked up the GUI and we were forced to reboot, after which the OS (10.5.3 at the time) was corrupted and required re-installation. It was decided that the hardware RAID card would allow us to mirror the boot drive as well as provide additional storage. Offloading the RAID duties to a dedicated card instead of forcing the computer to deal with data replication was my basis for the 'throughput' comment.

 

I still suspect it is an issue with Leopard Rosetta emulation. Our Tiger configuration exhibited the slow write problem once or twice a month, not enough of an issue to be concerned about. The current configuration has made the slow writes the rule instead of the exception. The examples of the high speed transfers posted earlier were immediately after a server restart on August 23rd, right after the rebuild of the server mentioned in the previous paragraph. Shortly after that, backups began to slow down, which prompted the initial post on September 18th.

 

Pardon me for my incorrect usage of the term 'libraries', I'll endeavor to be more accurate in the future.

 

I will change the scripts to look at the folder currently containing the data as a subvolume to see if anything changes. As it is, the Bank2 volume contains one folder, which in turn is a 'Share Point' within Leopard Server. It will take just a few minutes to do and will not affect other backup operations.

 

As to your question about using Normal backups, maybe it is time to switch to incremental backups instead of relying solely on full recycle backups. Personally I like the security of having a complete set of tapes every time the backup is run. Tape wear is not a real consideration since 1 of the 2 weekday rotations gets pulled and replaced at the end of the month and is stored offsite as a monthly backup. At the end of the year one of the 6 week rotations is pulled, replaced, and held indefinitely.

 

 

For rhwalker:

 

Thank you for your suggestion for contacting ATTO about configuring the card for the tape drive. I have filled out their support form and hope they can provide me a suggestion or two. I pulled the ATTO log and noticed an occasional SCSI error so I want to follow up on that.


> For rhwalker:

> Thank you for your suggestion for contacting ATTO about configuring the card for the tape drive. I have filled out their support form and hope they can provide me a suggestion or two. I pulled the ATTO log and noticed an occasional SCSI error so I want to follow up on that.

Why not just give them a call? I've always been able to get through to them, and the support people are very knowledgeable.

 

Russ

 
