
Retrospect Backup Strategy



I'm a consultant who has been using Retrospect Single Server for a while now, so I'm pretty familiar with how it works, but up to this point, most of my clients have only wanted to keep a rolling backup of maybe 2 months or so. When a client has wanted to retain some data indefinitely, it has been a pretty small amount, and we've backed it up manually.

 

I just got a new client who has about 200GB of data on their server and another 200GB of data spread across several workstations. So that is 400GB of constantly changing data, live on the drive.

 

They want to retain everything. Forever.

 

I was figuring on running a disk-to-disk backup that would alternate between two backup sets every month. This covers me for the short-term recovery of files that might accidentally be deleted.
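The monthly alternation Matt describes can be sketched as follows. This is only an illustration of the rotation logic, not how Retrospect is configured (that would normally be done with its own scheduled scripts), and the set names are hypothetical.

```python
from datetime import date

# Hypothetical names for the two disk-to-disk backup sets.
BACKUP_SETS = ("Backup Set A", "Backup Set B")


def backup_set_for(day: date) -> str:
    """Pick the backup set for the calendar month containing `day`.

    Months are numbered continuously (year * 12 + month), so the
    alternation carries on across year boundaries: December 2009 and
    January 2010 land in different sets.
    """
    month_number = day.year * 12 + (day.month - 1)
    return BACKUP_SETS[month_number % 2]


for d in (date(2009, 11, 15), date(2009, 12, 1), date(2010, 1, 10)):
    print(d.isoformat(), "->", backup_set_for(d))
```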

 

What are your suggestions for archiving data on a long-term (forever) basis? To me, tape drives are like dinosaurs. I haven't seen one in a long time, but is that my only option? Online services seem pretty expensive too.

 

 

--Matt

 

 


I'm looking at at least double this amount of data daily; however, most of it only needs to be retained for a short time, so automatic grooming is probably my favourite Retrospect feature. Whatever needs to be archived is archived in three locations, on different media, in a regular file system, so it's relatively future-proof.

 

We used tape extensively in the past, and for us the best solution was to switch to hard disk backup for its speed advantage.

 

Recently one of our clients wanted their online data, in addition to the two offsite copies, kept securely offline as well. So at the moment we are testing an RDX cartridge system. This looks like the future tape replacement technology. However an RDX cartridge costs about four times as much as an LTO-4 cartridge. Regrettably, this is pure marketing, as the technology itself should make it at most twice as expensive.

 

The problem with LTO-4 is that, while it is fast for a tape drive, it is slower than technology like RDX.

 

On LTO-4, a 400 GB backup and compare would take about 7.5 hours, and verification another 7.5 hours. RDX would do the backup in 6 hours and the verification in 4.5 hours. So that is 15 hours compared to 10.5.
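Converting those durations into average throughput makes them easier to compare with drive specifications. This sketch assumes 1 GB = 1000 MB and that each pass processes the full 400 GB; the RDX figures it produces line up with the 18-20 MB/sec write and 24-25 MB/sec verify rates measured with the RDX drive later in the thread.

```python
DATA_GB = 400

# Durations quoted above for one 400 GB job, in hours.
quoted_hours = {
    "LTO-4 backup": 7.5,
    "LTO-4 verify": 7.5,
    "RDX backup": 6.0,
    "RDX verify": 4.5,
}


def implied_mb_per_s(gigabytes: float, hours: float) -> float:
    """Average throughput in MB/s, assuming 1 GB = 1000 MB."""
    return gigabytes * 1000 / (hours * 3600)


for job, hours in quoted_hours.items():
    print(f"{job}: {implied_mb_per_s(DATA_GB, hours):.1f} MB/s")

print("LTO-4 total:", quoted_hours["LTO-4 backup"] + quoted_hours["LTO-4 verify"], "h")
print("RDX total:", quoted_hours["RDX backup"] + quoted_hours["RDX verify"], "h")
```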

 

For us that difference is important. But for your client the costs would be a lot higher.

 

The only cheap option would be a removable eSATA-based 3.5" hard disk drop-in bay. A 500GB hard disk arguably costs about the same as an LTO-4 cartridge while offering slightly more storage capacity. It's also faster than an RDX system, so a backup takes even less time. The only drawbacks are the physical storage space that 3.5" disks take up and the fact that you insert a physically unprotected disk into the drop-in bay. That last reason is why we chose RDX (currently testing).

 

So I guess I'm saying hard disks might still be a feasible option, with advantages and drawbacks compared to tape.


This looks like the future tape replacement technology.

 

To me, this is another rehash of very old technology; see http://en.wikipedia.org/wiki/Syquest and http://en.wikipedia.org/wiki/Iomega_Bernoulli_Box

 

Iomega currently has a competing product, REV. See http://go.iomega.com/en-us/products/removable-storage-rev/?partner=4760

 

However an RDX cartridge costs about four times as much as an LTO-4 cartridge.

 

At $334.40 for a 500GB RDX cartridge (see http://www.iunitek.com/iunitek/index.cfm?fuseaction=shop.dspSpecs&part=2782502 ), the cost works out to 67 cents a gigabyte.

 

At $41.64 per 800 GB LTO4 cartridge (see http://www.issidata.com/shopexd.asp?id=27738&source=PriceGrabber&zmam=90031077&zmas=25&zmac=99&zmap=183906 ), the cost works out to 5 cents a gigabyte.

 

67/5 = 13.4 times more.
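Redoing this arithmetic with the quoted 2009 prices: rounding each per-gigabyte cost to whole cents first gives 67/5 = 13.4, while the unrounded ratio comes out nearer 12.8. Either way RDX cartridges cost more than twelve times as much per gigabyte. The prices are the ones from the links above and will not match today's.

```python
# Street prices quoted in this post (2009, USD).
rdx_price, rdx_gb = 334.40, 500
lto4_price, lto4_gb = 41.64, 800

rdx_per_gb = rdx_price / rdx_gb      # about $0.669 per GB
lto4_per_gb = lto4_price / lto4_gb   # about $0.052 per GB

print(f"RDX:   ${rdx_per_gb:.3f}/GB")
print(f"LTO-4: ${lto4_per_gb:.3f}/GB")
print(f"Ratio, unrounded: {rdx_per_gb / lto4_per_gb:.1f}x")

# Rounding each figure to whole cents before dividing reproduces the 67/5 result.
rounded_ratio = round(rdx_per_gb * 100) / round(lto4_per_gb * 100)
print(f"Ratio, rounded to cents first: {rounded_ratio:.1f}x")
```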

 

Another point to consider is that RDX says its unrecoverable error rate is 1 bit in 10^14, whereas LTO4's is 1 bit in 10^17 (see http://www.qualstar.com/146250.html ). That's a thousandfold difference.

 

A 500GB hard disk arguably costs about the same as an LTO-4 cartridge while offering slightly more storage capacity.

 

300 GB is 'slightly'? There's also the error rate difference, and there is also the chance of an electronic failure in an RDX/hard drive.


Maurice, no offence, but you sound like a tape-evangelist. ;)

 

One thing's for sure: I didn't realise LTO4 had 800GB native-capacity drives; I was under the impression that 400GB was the current maximum. LTO4 tapes can also be found a lot cheaper nowadays, which is a good thing when storage needs grow so much.

 

To me, this is another rehash of very old technology; see http://en.wikipedia.org/wiki/Syquest and http://en.wikipedia.org/wiki/Iomega_Bernoulli_Box

 

Iomega currently has a competing product, REV. See http://go.iomega.com/en-us/products/removable-storage-rev/?partner=4760

 

Arguably, LTO tape is, like any current magnetic storage medium (including hard-disk-based media), a "rehash of very old technology". Tape is actually older: 1928 versus 1956 for hard disk. But I don't see how this matters for the argument. The price per gigabyte of hard disk has come down so considerably that it will eclipse tape in the near future. And while tape is a linear medium, hard disk has the enormous advantage of random access.

 

Besides that, you compare RDX with Bernoulli and SyQuest. While they share basic disk technology, the difference is quite big. REV, on the other hand, is very similar to SyQuest, the only difference being that the spindle motor is inside the cartridge. The heads, however, are not.

 

RDX, on the other hand, is just a 2.5" SATA hard disk inside a protective shell. Therefore the major flaws of SyQuest/Bernoulli/REV and the like have been addressed. What regrettably has not been addressed is the price per cartridge, which should be much lower considering the technology used (2.5" hard disks).

 

At $334.40 for a 500GB RDX cartridge (see http://www.iunitek.com/iunitek/index.cfm?fuseaction=shop.dspSpecs&part=2782502 ), the cost works out to 67 cents a gigabyte.

 

At $41.64 per 800 GB LTO4 cartridge (see http://www.issidata.com/shopexd.asp?id=27738&source=PriceGrabber&zmam=90031077&zmas=25&zmac=99&zmap=183906 ), the cost works out to 5 cents a gigabyte.

 

67/5 = 13.4 times more.

 

I stand corrected. However, you also need to take the initial cost of the system into account. What does an 800GB (native) LTO4 drive cost at that store? (Sorry, the link is not working for me; it times out.) Plus, most people need to buy a SCSI or SAS controller; how much does that add to the initial cost? The RDX drive's initial cost is very low. It's marketed more like an inkjet printer, I suppose: you only need a free USB 2.0 port.

 

Because RDX uses 2.5" hard disks, however, you can expect larger-capacity cartridges to appear; 640GB and 1000GB are just around the corner. And the beauty of it is that you don't need a new drive for them. We can also expect cartridge prices to drop considerably, while it seems LTO prices can't get much lower (not that they really need to).

 

Another point to consider is that RDX says its unrecoverable error rate is 1 bit in 10^14, whereas LTO4's is 1 bit in 10^17 (see http://www.qualstar.com/146250.html ). That's a thousandfold difference.

 

True, but that still is about 1 bit in every 113 Terabytes. And the original poster wants to keep data, not access it frequently.

By the way, where did you find that 10^15? All I can find is 10^14, which is regrettably even lower.

 

But I also wonder if that 10^17 figure is for the LTO drive itself, because I would like to know what happens to it if you reuse the same tape... I would expect the figure to come down a lot. Do they even consider the tape for that figure, and not just the SAS interface? Is there such a figure for the tape itself? For RDX, the 10^14 figure is for the cartridge.

 

While we're throwing figures:

 

LTO4 drive MTBF is 250,000 hrs and the head lifespan is 'only' 60,000 hrs (almost 7 years of round-the-clock use). RDX per-cartridge MTBF is 550,000 hrs, and a cartridge can be inserted 5,000 times.
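For a sense of scale, the quoted hour figures can be converted into years of continuous 24/7 operation. Keep in mind that MTBF is a statistical figure across many units, not a promised service life for any single drive or cartridge; the sketch below only does the unit conversion.

```python
HOURS_PER_YEAR = 24 * 365  # round-the-clock operation, ignoring leap days

# Figures quoted in the post above, in hours.
rated_hours = {
    "LTO-4 drive MTBF": 250_000,
    "LTO-4 head lifespan": 60_000,
    "RDX cartridge MTBF": 550_000,
}

for name, hours in rated_hours.items():
    years = hours / HOURS_PER_YEAR
    print(f"{name}: {hours:,} h = {years:.1f} years of 24/7 use")
```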

 

300 GB is 'slightly'? There's also the error rate difference, and there is also the chance of an electronic failure in an RDX/hard drive.

 

Like I mentioned, I didn't realise there were 800GB native-capacity LTO4 drives out there. Sorry for the mix-up.

 

If an RDX cartridge fails, you might be able to have a recovery service extract your data, just as they can with a regular hard disk (as it actually is just a regular hard disk). If your RDX drive fails, it's possible to access a cartridge by plugging it into a SATA port of a computer.

 

However, there is still the argument of speed. How many hours does 400GB of (let's say incompressible) data take with an LTO4 drive? Can that even be done in less than 16 hours, including a verify pass? RDX and drop-in hard disks can; can LTO?

 

Truth be told, while we can argue about some technological details, Maurice did prove that LTO is still an interesting technology. However, it can also be argued that hard disk is catching up rapidly. RDX is just one incarnation, and it is targeted more at replacing lower-end tape technology. LTO still has the edge in bulk scenarios.

 

Maurice, what interface for LTO-4 would you advise, SAS or SCSI? And what controller would suffice?

Arguably, LTO tape is, like any current magnetic storage medium (including hard-disk-based media), a "rehash of very old technology". Tape is actually older: 1928 versus 1956 for hard disk. But I don't see how this matters for the argument.

 

You said RDX is a future tape replacement technology. So my argument is that it's really an old, not future, technology (which I don't expect will replace tape).

 

Maurice, no offence, but you sound like a tape-evangelist.

 

This certainly sounds about right! :banana:

 

What does an 800GB (native) LTO4 drive cost at that store?

 

An LTO4 drive for a library configuration is close to $5000 (I know because I bought one), so there is, in fact, a big upfront cost. But it goes beyond that, because I can't imagine buying a standalone unit. At a capacity level suited for LTO4, you'd be using many tapes, and that's impractical without a library. Similarly, a single RDX drive would be even more impractical given its smaller capacity; you need to invest in one of their "library" units. Now, you could argue that a tape library is even more expensive, and it's true: tape libraries are very expensive. But in day-to-day use, I don't see an RDX system being very practical. You would need a lot of their "library" units to match the capacity of a tape library. LTO4 is intended to handle capacities much larger than are currently practical with RDX.

 

Plus, most people need to buy a SCSI or SAS controller; how much does that add to the initial cost?

 

This is very minor relative to the cost of an LTO4 drive.

 

However, it can also be argued that hard disk is catching up rapidly.

 

The price per gigabyte of hard disk has come down so considerably that it will eclipse tape in the near future.

 

1) RDX is currently over thirteen times the cost of tape.

2) The cost of LTO4 cartridges has halved in the last year.

3) There are plans for the industry to continue developing LTO; LTO5, which doubles capacity and performance, is supposedly due out in the first quarter of next year.

 

but that still is about 1 bit in every 113 Terabytes. And the original poster wants to keep data, not access it frequently.

By the way, where did you find that 10^15? All I can find is 10^14, which is regrettably even lower.

 

113 terabytes? 10^14 is 100 trillion bits; that's 12.5 terabytes. 10^17 is 100 quadrillion bits; that's 12.5 petabytes. I didn't see 10^15 anywhere, either.
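The conversion above can be checked directly, and the same rates give the expected number of unrecoverable errors from one full read of this client's 400 GB. The sketch assumes errors occur independently at the quoted average rate, which is a simplification of how drives actually fail.

```python
BITS_PER_BYTE = 8


def bytes_per_error(exponent: int) -> float:
    """Average bytes read per unrecoverable error at a rate of 1 in 10**exponent bits."""
    return 10**exponent / BITS_PER_BYTE


def expected_errors(n_bytes: float, exponent: int) -> float:
    """Expected unrecoverable errors when reading n_bytes once at that rate."""
    return n_bytes * BITS_PER_BYTE / 10**exponent


for medium, exponent in (("RDX", 14), ("LTO-4", 17)):
    print(f"{medium}: one error per ~{bytes_per_error(exponent) / 1e12:,.1f} TB; "
          f"expected errors reading 400 GB once: {expected_errors(400e9, exponent):.1e}")
```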

 

I also wonder if that 10^17 figure is for the LTO drive itself?

 

I'm sure it's for the tape cartridge. The drive doesn't store any data. However, I'm not sure how reusing a tape affects this number. :confused2:

 

If an RDX cartridge fails, you might be able to have a recovery service extract your data, just as they can with a regular hard disk (as it actually is just a regular hard disk).

 

This may be possible for tapes too. But because tapes are so cheap, you can keep duplicate tape sets.

 

However, there is still the argument of speed. How many hours does 400GB of (let's say incompressible) data take with an LTO4 drive? Can that even be done in less than 16 hours, including a verify pass? RDX and drop-in hard disks can; can LTO?

 

If you believe the specs, RDX can't compete with LTO4, being about 1/3 the speed on SATA and 1/6 on USB. However, don't believe anyone's specs. For example, it should take just an hour or so to write 400 GB on LTO4! But for various reasons, I've never seen Retrospect even approach quoted speeds, regardless of media type.
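The "hour or so" follows from LTO-4's native (uncompressed) rating of 120 MB/s. This sketch shows how the write time for 400 GB stretches as the sustained rate falls below that rating; the fractions are illustrative, 1 GB is taken as 1000 MB, and a separate verify pass would roughly double each figure.

```python
def hours_to_write(gigabytes: float, mb_per_s: float) -> float:
    """Hours to stream `gigabytes` at a sustained `mb_per_s` (1 GB = 1000 MB)."""
    return gigabytes * 1000 / mb_per_s / 3600


RATED_MB_S = 120.0  # LTO-4 native (uncompressed) rate, MB/s

for share in (1.0, 0.5, 0.1):
    rate = RATED_MB_S * share
    print(f"{share:4.0%} of rated ({rate:5.1f} MB/s): "
          f"{hours_to_write(400, rate):.1f} h to write 400 GB")
```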

 

Maurice, what interface for LTO-4 would you advise, SAS or SCSI? And what controller would suffice?

 

I'm not familiar with tape libraries using SAS. And SCSI is giving way to iSCSI.


I'm sure it's for the tape cartridge. The drive doesn't store any data. However, I'm not sure how reusing a tape affects this number. :confused2:

 

Well, that's what I find so odd about this number. According to Wikipedia, LTO tape durability is 200 passes (read/write). So 200 x 800 GB = roughly 156 TB.

 

So I think we can assume that the 10^17 figure is for the drive itself when using brand new tapes for each pass. It's the theoretically best figure achievable. How each pass would influence this figure is anybody's guess.
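Taking the 200-pass durability figure at face value, a quick calculation shows how little of 10^17 bits a single cartridge ever handles. The sketch uses decimal units, so it reports 160 TB; dividing 160,000 GB by 1,024 instead gives the roughly 156 TB above. It only illustrates the scale; it says nothing about how the rate was actually measured.

```python
PASSES = 200       # tape durability quoted above (full read/write passes)
NATIVE_GB = 800    # LTO-4 native capacity per cartridge
BER_EXPONENT = 17  # quoted LTO-4 rate: 1 unrecoverable error in 10**17 bits

lifetime_bytes = PASSES * NATIVE_GB * 1e9
lifetime_bits = lifetime_bytes * 8

print(f"Data moved over the tape's rated life: {lifetime_bytes / 1e12:.0f} TB")
print(f"...which is {lifetime_bits:.2e} bits")
print(f"Share of 10^{BER_EXPONENT} bits: {lifetime_bits / 10**BER_EXPONENT:.2%}")
```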

 

If you believe the specs, RDX can't compete with LTO4, being about 1/3 the speed on SATA and 1/6 on USB. However, don't believe anyone's specs. For example, it should take just an hour or so to write 400 GB on LTO4! But for various reasons, I've never seen Retrospect even approach quoted speeds, regardless of media type.

 

I'm currently testing an RDX drive (see http://forums.dantz.com/showtopic.php?tid/31568/) and have found a couple of shortcomings that are more Retrospect-related. We are getting 18-20 MB/sec write and 24-25 MB/sec read/verify. That's MegaBytes, not Megabits.

 

For LTO-4 (half height) the quoted speed is 80 MB/sec. Is this MegaBytes or Megabits? (I always assumed the latter, but your figures indicate the former.) What can you actually achieve in Retrospect?

 

To get back on topic: for the original poster, it would seem a standalone LTO-4 unit would be the ticket. I feel that for his application he doesn't need a library/robot system. He could even use WORM LTO tapes for legal purposes.

An LTO4 drive for a library configuration is close to $5000

A quick check of CDW moments ago shows that there is a Tandberg (Exabyte) 7 slot SCSI LTO-4 StorageLoader for $2576, an 8 slot SAS LTO-4 StorageLoader for $4356, an 8 slot SCSI LTO-4 StorageLoader for $4199, a 24 slot SCSI LTO-4 StorageLoader for $5104, and a 24 slot SAS LTO-4 StorageLoader for $4984.

 

Russ
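With these loader prices, the upfront-cost question can be framed as a break-even archive size: above it, LTO's cheaper cartridges outweigh the hardware cost. This sketch uses the cartridge prices quoted earlier in the thread and Maurice's roughly $5000 library drive figure; no RDX drive price was quoted, so `RDX_DRIVE` is a hypothetical placeholder set to zero (filling in a real price lowers the break-even point), and the SCSI/SAS controller cost is ignored.

```python
def breakeven_gb(lto_upfront: float, rdx_upfront: float,
                 lto_per_gb: float, rdx_per_gb: float) -> float:
    """Archive size in GB beyond which the LTO setup is cheaper overall.

    Total cost = upfront hardware + archive size * media cost per GB;
    the two totals are equal at the size returned here.
    """
    return (lto_upfront - rdx_upfront) / (rdx_per_gb - lto_per_gb)


LTO_PER_GB = 41.64 / 800    # LTO-4 cartridge price quoted earlier in the thread
RDX_PER_GB = 334.40 / 500   # RDX cartridge price quoted earlier in the thread
RDX_DRIVE = 0.0             # hypothetical placeholder: no RDX drive price was quoted

loaders = {
    "Tandberg 7-slot SCSI": 2576,
    "Tandberg 24-slot SAS": 4984,
    "LTO-4 library drive": 5000,
}

for name, price in loaders.items():
    tb = breakeven_gb(price, RDX_DRIVE, LTO_PER_GB, RDX_PER_GB) / 1000
    print(f"{name} (${price:,}): LTO cheaper beyond about {tb:.1f} TB")
```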


We are getting 18-20 MB/sec write and 24-25 MB/sec read/verify. That's MegaBytes, not Megabits.

 

For LTO-4 (half height) the quoted speed is 80 MB/sec.

 

LTO-4 is 120 megabytes per second uncompressed, but I get maybe 10% of that. But that's probably due to factors other than the tape drive.

 

Tandberg (Exabyte) 7 slot SCSI LTO-4 StorageLoader for $2576,

 

Very cheap. I'm impressed. :)

 

 


LTO-4 is 120 megabytes per second uncompressed, but I get maybe 10% of that. But that's probably due to factors other than the tape drive.

 

Yesterday I ordered an LTO-4 unit. I hope I'll get a bit more out of it than 12 MB/sec; that would be a problem, as I need to be able to write and verify at least 400-600GB within 8-12 hours. At least, that is what we want; whether we are going to get it is something else.
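The throughput that target implies can be worked out directly. This sketch assumes the verify pass reads the whole backup once at the same rate as the write (two passes in total) and uses 1 GB = 1000 MB; on those assumptions, even the easiest case needs about 18.5 MB/s sustained, well above 12 MB/s.

```python
def required_mb_per_s(gigabytes: float, window_hours: float, passes: int = 2) -> float:
    """Sustained MB/s needed to finish `passes` full passes (write + verify) in the window."""
    return gigabytes * 1000 * passes / (window_hours * 3600)


for gb in (400, 600):
    for hours in (8, 12):
        print(f"{gb} GB, write + verify in {hours} h: "
              f"{required_mb_per_s(gb, hours):.1f} MB/s")
```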

 

All I need it to do is transfer snapshots from iSCSI storage to the LTO drive. The (Win 2008) server will be a completely new machine with two quad-core Xeon (Nehalem) CPUs and 24GB of RAM. That should be fast enough, and the CPUs/memory Retrospect can't use will be used for emergency VMs.

 

So you see Maurice, you've -really- converted/convinced me! :teeth:


Transferring snapshots from iSCSI storage (gigabit Ethernet) to a SCSI-connected LTO-3 (not 4) autoloader gave us the following results:

 

 

- 2009-08-30 15:43:44: Transferring from WinDesktops

Local Disk (C:) (2009-08-28 21:22) on Client "PC-TMT3"

2009-08-30 16:06:05: Execution completed successfully

Completed: 194492 files, 23,9 GB

Performance: 1128,0 MB/minute

Duration: 00:22:19 (00:00:40 idle/loading/preparing)

 

 

- 2009-08-30 16:06:05: Transferring from WinDesktops

Local Disk (C:) (2009-08-28 22:10) on Client "pc-og"

2009-08-30 16:17:39: Execution completed successfully

Completed: 51275 files, 14,4 GB

Performance: 1326,0 MB/minute

Duration: 00:11:30 (00:00:27 idle/loading/preparing)

 

 

- 2009-08-30 16:17:39: Transferring from WinDesktops

WinXP (C:) (2009-08-28 23:28) on Client "PC-JPA XP"

2009-08-30 16:42:58: Execution completed successfully

Completed: 142454 files, 32,2 GB

Performance: 1366,4 MB/minute

Duration: 00:25:17 (00:01:11 idle/loading/preparing)

 

 

- 2009-08-30 16:42:58: Transferring from WinDesktops

Local Disk (C:) (2009-08-29 01:25) on Client "pc-lt2"

2009-08-30 17:05:19: Execution completed successfully

Completed: 96053 files, 25,7 GB

Performance: 1209,8 MB/minute

Duration: 00:22:17 (00:00:36 idle/loading/preparing)

 

 

- 2009-08-30 17:05:19: Transferring from WinDesktops

UtvecklingVista (J:) (2009-08-29 01:11) on Client "PC-JPA XP"

2009-08-30 17:12:38: Execution completed successfully

Completed: 71870 files, 5,9 GB

Performance: 835,8 MB/minute

Duration: 00:07:14 (00:00:06 idle/loading/preparing)

 

 

- 2009-08-30 17:12:38: Transferring from WinDesktops

Local Disk (C:) (2009-08-29 02:55) on Client "PC-DJ2"

2009-08-30 18:15:44: Execution completed successfully

Completed: 476357 files, 82,4 GB

Performance: 1369,9 MB/minute

Duration: 01:03:01 (00:01:30 idle/loading/preparing)

 

 

- 2009-08-30 18:15:44: Transferring from WinDesktops

UtvecklingXP (D:) (2009-08-29 00:02) on Client "PC-JPA XP"

2009-08-30 18:44:25: Execution completed successfully

Completed: 347145 files, 33,7 GB

Performance: 1227,1 MB/minute

Duration: 00:28:39 (00:00:36 idle/loading/preparing)

 

2009-08-30 18:45:12: Execution completed successfully

Total performance: 928,4 MB/minute

Total duration: 12:22:08 (00:24:30 idle/loading/preparing)
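Retrospect reports throughput in MB/minute with decimal commas, while drive specs are quoted in MB/s. A small script can pull the figures out of a log like the one above and convert them; the regular expressions are written against the lines in this excerpt only, and other Retrospect versions or languages may word their logs differently.

```python
import re

# Size and performance lines from the seven sessions in the log above.
LOG_EXCERPT = """\
Completed: 194492 files, 23,9 GB
Performance: 1128,0 MB/minute
Completed: 51275 files, 14,4 GB
Performance: 1326,0 MB/minute
Completed: 142454 files, 32,2 GB
Performance: 1366,4 MB/minute
Completed: 96053 files, 25,7 GB
Performance: 1209,8 MB/minute
Completed: 71870 files, 5,9 GB
Performance: 835,8 MB/minute
Completed: 476357 files, 82,4 GB
Performance: 1369,9 MB/minute
Completed: 347145 files, 33,7 GB
Performance: 1227,1 MB/minute
"""


def decimal_comma(text: str) -> float:
    """Convert a value written with a decimal comma, e.g. '23,9', to a float."""
    return float(text.replace(",", "."))


sizes_gb = [decimal_comma(s) for s in
            re.findall(r"Completed: \d+ files, ([\d,]+) GB", LOG_EXCERPT)]
rates_mb_min = [decimal_comma(r) for r in
                re.findall(r"Performance: ([\d,]+) MB/minute", LOG_EXCERPT)]

for size, rate in zip(sizes_gb, rates_mb_min):
    print(f"{size:5.1f} GB at {rate:7.1f} MB/min = {rate / 60:4.1f} MB/s")

print(f"Sessions: {len(sizes_gb)}, total {sum(sizes_gb):.1f} GB")
print(f"Overall 928,4 MB/minute = {decimal_comma('928,4') / 60:.1f} MB/s")
```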

 

 

 


Thanks Lennart.

 

Our LTO-4 will have SAS instead of SCSI, so I might be able to improve on these a little bit (LTO-4 vs LTO-3 being the most important factor). Any ideas about verification? Is media verification needed, given that LTO seems to provide a 'verify while you go' option? How does this work with respect to Retrospect?

 

Your iSCSI storage, might that by any chance be a DroboPro?

Link to comment
Share on other sites

Is media verification needed, given that LTO seems to provide a 'verify while you go' option? How does this work with respect to Retrospect?

If you turn on generation of MD5 digests (see page 270 of the User's Guide), then Retrospect (like other modern backup programs) will calculate an MD5 digest for each of the files it backs up, and will save that information separately from the backup set (media set, to use the new terminology).

 

Then, at any time later, you can do the "media verification": the MD5 digest of each file in the backup set is recalculated and checked against the value that was calculated at backup time, validating whether the data in the backup set is the same as it was at the time of backup.

 

This eliminates all of the comparison errors with files that changed between the time the backup was written and when the comparison was done.

 

To my way of thinking, the "media verification" title is a misnomer, and a better term should be used. It's not merely verification that the backup set can be read successfully (often an issue with tape or optical media), but that the data for each file remains unchanged from what it was at the time the backup was made. The correct term might be something like "backup integrity validation" or some such.

 

Russ
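The scheme Russ describes can be sketched in a few lines. This is a minimal illustration of the idea, not Retrospect's implementation: digests are recorded when the backup is made, and verification later re-hashes only the data read back from the backup, so live files that changed in the meantime never cause compare errors. The function and file names are hypothetical; `hashlib.new` also accepts other algorithm names such as "sha256".

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files never need to fit in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def record_digests(files: list[Path]) -> dict[str, str]:
    """Backup time: compute each file's digest, to be stored apart from the backup data."""
    return {str(p): file_digest(p) for p in files}


def verify(read_back: dict[str, Path], recorded: dict[str, str]) -> list[str]:
    """Later: re-hash data read back from the backup and list files whose digest differs.

    The original source files are never consulted, so files edited after
    the backup do not show up as errors.
    """
    return [name for name, path in read_back.items()
            if file_digest(path) != recorded.get(name)]


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, "report.txt")
    source.write_bytes(b"quarterly figures")
    digests = record_digests([source])

    backup_copy = Path(tmp, "backup_copy.txt")      # stands in for data read back from media
    backup_copy.write_bytes(b"quarterly figures")
    source.write_bytes(b"edited after the backup")  # live file changes; verify ignores it
    print(verify({str(source): backup_copy}, digests))  # []

    backup_copy.write_bytes(b"quarterly figureS")   # simulate corruption on the media
    print(verify({str(source): backup_copy}, digests))  # lists the report.txt path
```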


This eliminates all of the comparison errors with files that changed between the time the backup was written and when the comparison was done.

 

To my way of thinking, the "media verification" title is a misnomer, and a better term should be used. It's not merely verification that the backup set can be read successfully (often an issue with tape or optical media), but that the data for each file remains unchanged from what it was at the time the backup was made. The correct term might be something like "backup integrity validation" or some such.

 

I've used Retrospect's Media Verification for a long time; indeed, it eliminates all compare errors when backing up 'live' clients. It's also faster IF your storage is faster than your clients are.

 

I agree the term is a bit of a misnomer (I learned a new English word today), as it's not verifying the media itself but the written data. However, the word "validation" feels a bit strong to me. My interpretation of validation is tied to a certain (and maybe variable) standard, so it's not really a black-or-white thing, whereas with data it just is (right or wrong). Maybe "backup integrity verification" would be better. However, in this case the word "backup" might also be a problem. In theory there is a (very) slight chance that a correct MD5 hash is generated while there is still a compare error between source and target data, though this is a very, very slim chance indeed. Maybe that is why the former Dantz team came up with the word "media".

 

However, my intended question was not about Retrospect's verification methods, but whether this process is needed with LTO.

 

From Wikipedia ( http://en.wikipedia.org/wiki/Linear_Tape-Open ) :

LTO uses an automatic verify-after-write technology to immediately check the data as it is being written[citation needed], but some backup systems explicitly perform a completely separate tape reading operation to verify the tape was written correctly. This separate verify operation doubles the number of end-to-end passes for each scheduled backup, and reduces the tape life by half.

 

I just wondered how this works in conjunction with Retrospect.

 

No, they are two HP Proliant DL100 G2, also called HP All-In-One server.

Ah, okay. I thought somewhere on the forum you said you (had?) used one. The Proliant will probably give you better concurrency, which doesn't seem to be a strong point of the DroboPro. On the other hand, the DroboPro is very easy and simple. We only use it for some of our backup target storage, and for that it has worked pretty well. I wouldn't run production data from it, though; it's a little bit too 'closed' for that.


The Wikipedia entry is correct. Most tape drives (not just LTO) have a separate "read after write" head and do a read-after-write comparison, returning a write error if the data doesn't match. Some technologies (e.g., VXA) spread the data over the block with error correction bits, and can correct some errors and even take corrective action (back up and erase the bad spot, or leave a large inter-block gap over it). Retrospect handles those write errors "correctly", reporting an error and giving up on that tape (we saw this happen years back when a DAT drive of ours failed: it could still read, but the write circuitry had gone bad).

 

The only problem with "read after write" comparison is that the tape is known to be exactly aligned over the head when that read is done, so it's really a best-case situation that tests for dropouts on the media. It isn't the same as another pass over the tape with slightly different alignment.

 

I haven't done the math recently and compared it against bit error rates for various tape media and tape technologies, but I seem to recall that the MD5 digest, while not a perfect assurance that every bit in the file is accurate, has an error distance comparable to the bit error rate for tape, such that your chance of getting a good MD5 match when there is, in fact, an error in the file is about the same as the chance of not being able to read the tape a second time.

 

Russ
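One way to put a number on this is an idealised model in which accidental corruption makes a damaged file's 128-bit MD5 digest behave like a random value, so it matches the recorded digest with probability 2^-128. That model is an assumption, and it covers accidental damage only (deliberately constructed collisions are a separate matter). The quantities are also not directly comparable, since one is per bit read and the other per corrupted file, but the gap is more than twenty orders of magnitude.

```python
import math

DIGEST_BITS = 128  # MD5 produces a 128-bit digest

# Idealised model: after accidental corruption, the digest of the damaged file
# behaves like a uniformly random 128-bit value, so it matches the digest
# recorded at backup time with probability 2**-128.
p_undetected = 2.0 ** -DIGEST_BITS

print(f"Chance a corrupted file still matches: {p_undetected:.1e} "
      f"(about 10^{math.log10(p_undetected):.1f})")

for medium, exponent in (("RDX", 14), ("LTO-4", 17)):
    print(f"{medium} quoted unrecoverable bit error rate: 1e-{exponent}")
```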


Okay, it seems like Retrospect's media verification is the way to go.

 

About MD5: it has a known weakness that was exposed in 2007, though it concerns authenticity verification (deliberately constructed collisions). I'm not sure how slim the chances of an accidental match are. Probably it's extremely rare, so much so that the chance of a mechanical tape/drive failure is significantly higher.

 

Interesting read about the weak spot of MD5: http://www.mathstat.dal.ca/~selinger/md5collision/

 

Maybe SHA-1 could be integrated into Retrospect in the future, if that hash method is suited to the purpose.

No, they are two HP Proliant DL100 G2, also called HP All-In-One server.

Ah, okay. I thought somewhere on the forum you said you (had?) used one. The Proliant will probably give you better concurrency, which doesn't seem to be a strong point of the DroboPro. On the other hand, the DroboPro is very easy and simple. We only use it for some of our backup target storage, and for that it has worked pretty well. I wouldn't run production data from it, though; it's a little bit too 'closed' for that.

Sorry, no. Never even seen a Drobo in real life.

The Proliant servers don't do anything but act as storage servers, and there is only one backup set on each, so there isn't any concurrent reading and writing either.
