
Third tape in set full, even though first two tapes mostly empty



Hi all,

I recently started a new job, taking over from someone else, and this is my first time using Retrospect. We are using v14.6.2 on Mac OS 10.11.6 (El Capitan).

Our backup uses a single DLT drive with a tape loader that holds eight tapes. Each media set is three tapes, and it runs a full backup on a Friday, with daily incrementals up to the following Thursday, after which we switch over to the next set (which is preloaded in the tape loader). The recently finished set is then replaced with the next set. The data is pulled from a RAID that is directly attached to the server.

Each tape has a nominal capacity of 10.3 TB, so the media set has about 31 TB of capacity. Our current weekly usage is around 6.5 TB.

Until this morning, I hadn't looked at how this is distributed across tapes, but last night tape 3 became full and it asked for a new tape. I'm not sure what data changed on the drive to cause the massive incremental change (we are a media company, and it's not unusual for big files to be added) but I was concerned that the software chose to ask for another tape, wait an hour and then give up mid-backup, despite there being plenty of space on the previous two tapes.

I have attached a screenshot of the media set.

Is this just the way it works, or is my backup not configured correctly? Is there something I can do to the sets to 'defragment' them so that the other tapes are used to their full capacity?

I have seen this or similar questions asked, with various suggestions proposed, and people asking about compression and tape errors. We haven't played around with the compression, there are no tape errors reported, and backup speeds seem fine. Buying another tape is not practical because the magazine doesn't have space for four-tape sets and a cleaning tape.

Any help that can be offered would be greatly appreciated. As I said, I'm new to the software, haven't dealt with tape backups in fifteen years and am far too busy to spend too much time on something which really should be automated. I am happy to provide any logs or screenshots necessary to get this resolved.

Many thanks in advance,

Adam

backup_fun.png


Retrospect writes each tape until it reaches the end-of-tape marker or until it encounters an unrecoverable error.

It would be interesting to see if the log shows any errors.

If the data is not fed to the tape drive as fast as the tape drive needs it, the tape station will write empty blocks on the tape, reducing the capacity. It looks like there is a lot of wasted space on members 1 and 2.

What is the source of the backup? A fast local disk? Clients on a slow network? Clients on a fast network?


Hi,

thanks very much for the reply and apologies for the delay in responding.

The only error in the log is the one I described, where the tape is deemed full and it requests a fourth.

-  04/04/2019 19:32:50: Copying RAID
    Using Instant Scan
    04/04/2019 19:33:00: Found: 1683234 files, 187799 folders, 5.6 TB
    04/04/2019 19:33:12: Finished matching
    04/04/2019 19:33:22: Copying: 1361 files (26.9 GB) and 0 hard links
    stucFinished: [IBM|ULTRIUM-HH6|E6R3] incorrect scsiServiceResponse 0x1, scsiStatus 0x2
    stucFinished: [0|0|0] transaction result 0x6
    xopWrite: trouble writing, error -102 (trouble communicating)
    xopWrite: trouble writing, error -102 (trouble communicating)
    xopFlush: flush failed, error -102 (trouble communicating)
    !Trouble writing: "3-WeekThree" (3771793408), error -102 (trouble communicating)
    !Trouble writing media:
  "3-WeekThree"
error -102 (trouble communicating)


    Media request for "4-WeekThree" timed out after waiting .
    04/04/2019 20:42:41: Execution incomplete
    Remaining: 977 files, 19.2 GB
    Completed: 384 files, 7.8 GB, with 1% compression
    Performance: 2,194.6 MB/minute
    Duration: 01:09:51 (01:06:14 idle/loading/preparing)

    04/04/2019 20:43:06: Execution incomplete
    Total performance: 1,579.9 MB/minute with 1% compression
    Total duration: 01:12:41 (01:07:33 idle/loading/preparing)
    +  Normal backup using WeekFour at 05/04/2019, 19:00:00 (Activity Thread 1)
    To Backup Set WeekFour...
    05/04/2019 19:00:00: Recycle backup: The Backup Set was reset
    -  05/04/2019 19:00:00: Copying KerioConfigBackup on mailserve01
    Using Instant Scan
    05/04/2019 19:00:12: Found: 16 files, 1 folders, 24.4 GB
    05/04/2019 19:00:12: Finished matching
    05/04/2019 19:00:12: Copying: 16 files (24.4 GB) and 0 hard links
    05/04/2019 19:08:43: Building Snapshot...
    05/04/2019 19:08:44: Checking 1 folders for ACLs or extended attributes
    05/04/2019 19:08:44: Finished copying 1 folders with ACLs or extended attributes
    05/04/2019 19:08:44: Copying Snapshot: 2 files (202 KB)
    05/04/2019 19:08:49: Snapshot stored, 202 KB
    05/04/2019 19:08:49: Execution completed successfully
    Completed: 16 files, 24.4 GB
    Performance: 3,500.3 MB/minute
    Duration: 00:08:49 (00:01:41 idle/loading/preparing)
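For anyone wanting to tally an excerpt like this rather than read it by eye, here is a minimal sketch. It assumes the plain-text layout shown in the log above; the function name `summarise` and the two regular expressions are my own illustration and are not part of Retrospect.

```python
import re

# A few lines copied from the log excerpt above.
LOG = """\
xopWrite: trouble writing, error -102 (trouble communicating)
xopWrite: trouble writing, error -102 (trouble communicating)
xopFlush: flush failed, error -102 (trouble communicating)
!Trouble writing: "3-WeekThree" (3771793408), error -102 (trouble communicating)
Remaining: 977 files, 19.2 GB
Completed: 384 files, 7.8 GB, with 1% compression
"""

# Hypothetical patterns, written to fit the lines above only.
ERROR_RE = re.compile(r"error (-\d+) \(([^)]+)\)")
SUMMARY_RE = re.compile(r"^(Remaining|Completed): (\d+) files, ([\d.]+) GB")


def summarise(log_text):
    """Count error codes and collect the Remaining/Completed totals."""
    errors = {}
    summary = {}
    for raw in log_text.splitlines():
        line = raw.strip()
        err = ERROR_RE.search(line)
        if err:
            key = (int(err.group(1)), err.group(2))
            errors[key] = errors.get(key, 0) + 1
        tot = SUMMARY_RE.match(line)
        if tot:
            summary[tot.group(1)] = (int(tot.group(2)), float(tot.group(3)))
    return errors, summary


errors, summary = summarise(LOG)
print(errors)
print(summary)
```

The file counts it extracts (977 remaining plus 384 completed) add up to the 1361 files the log said it was going to copy, so the run stopped part-way through rather than losing track of anything.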

The source of the backup is a RAID drive attached directly to the server, so there shouldn't be any issue with speed.

If the space on tapes 1 and 2 is being wasted for some reason, is there a way of recovering it? We've now moved on to the next backup set, so these tapes will be out of rotation for the next couple of weeks.

I've also just noticed that only 1TB of data is on the tape, rather than ten, so is it possible that the tape is damaged and Retrospect stopped when it reached this error? Or could the RAID have disappeared so it just wrote 9TB of empty blocks?

Thanks again,

Adam


Come to think of it, are you sure it's a DLT drive and not an LTO drive? 

LTO-6 has a native capacity of 2.5 TB and a compressed capacity of 6.25 TB (assuming a 2.5:1 ratio). In reality such a high level of compression is rarely achieved. Many types of files are already compressed today: all video files, most image files, word processing documents and PDFs, just to name a few.

3 tapes times 2.5 TB = 7.5 TB, and that is enough for your roughly 6 TB of data.
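To make the arithmetic explicit, here is a rough sketch. The 2.5 TB native capacity and the 6.5 TB weekly usage are the figures from this thread; the function name, the four-tape case and the reading of "1% compression" as "1% of the data saved" are assumptions for illustration only.

```python
LTO6_NATIVE_TB = 2.5    # native (uncompressed) capacity per LTO-6 tape
WEEKLY_USAGE_TB = 6.5   # weekly usage quoted in the first post


def set_headroom_tb(tapes, native_tb, usage_tb, compression_saving=0.0):
    """Spare capacity of a media set in TB.

    compression_saving is the fraction of data saved by compression
    (0.01 means 1%). Treating the log's "1% compression" this way is
    an assumption, not something the thread confirms.
    """
    effective_usage = usage_tb * (1.0 - compression_saving)
    return tapes * native_tb - effective_usage


three_tapes = set_headroom_tb(3, LTO6_NATIVE_TB, WEEKLY_USAGE_TB)
three_tapes_1pct = set_headroom_tb(3, LTO6_NATIVE_TB, WEEKLY_USAGE_TB, 0.01)
four_tapes = set_headroom_tb(4, LTO6_NATIVE_TB, WEEKLY_USAGE_TB)  # hypothetical fourth tape

print(f"3 tapes, no compression: {three_tapes:.2f} TB spare")
print(f"3 tapes, 1% saving:      {three_tapes_1pct:.2f} TB spare")
print(f"4 tapes, no compression: {four_tapes:.2f} TB spare")
```

With next to no compression, a three-tape set leaves only about 1 TB of slack at the 6.5 TB figure, so a single large incremental could fill it.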

It is a bit strange that the capacity varies widely between your media sets. Do they consist of different kinds of tapes?

Also I'm not sure how Retrospect actually calculates "Free" and "Capacity". It might be a miscalculation there. If so, there is probably not a waste of capacity as I wrote earlier, as a local RAID would be fast enough to feed data to the tape drive.

Yes, Retrospect stopped because it encountered an irrecoverable error with the last tape. It might be a tape that has gone bad or you just need to run a cleaning tape.

The error was "Trouble writing", so it can't be trouble reading the RAID.


You're absolutely right, it's LTO, not DLT and they are all the same. And I agree about the capacity, having thought about it now, in which case it's doing exactly what it should be. I will have a proper read up on it so that I know exactly what the state of play is. We can't practically go above three tapes per set, so if they are approaching capacity, I'll have to archive some of the data on the RAID.

Anyway, it looks like it probably was just an error on the third tape, and it is an isolated incident. I guess I'll find out in four weeks' time! If it does go wrong again, I'll replace the tapes.

Thank you so much for your help.


Adam Ainsworth,

DLT is not "all the same" as LTO. As the only sentence in the second paragraph of the Wikipedia article says, "In 2007 Quantum stopped developing DLT drives, shifting its strategy to LTO."

Make sure that you are really running the Retrospect 14 Engine, not just the Retrospect 14.6.2 Console, which can be run with Engines as far back as Retrospect 12.5. Page 240 of the Retrospect Mac 14 User's Guide says Retrospect 14.0 "Fixed issue where auto-cleaning request for tape devices was ignored (#6171)". Pages 49-51 of the UG cover "Cleaning Your Tape Drive".

Also see this post regarding automating use of a cleaning tape on your tape drive. For the Retrospect Mac 14 User's Guide, the relevant page numbers are 49-51 and 44.

P.S.: Added 2nd paragraph.

P.P.S.: Added 3rd paragraph.


1 hour ago, twickland said:

The peculiar reported capacity of these tapes may be related to the media set locked-unlocked bug I reported here back in 2015 that still has not been fixed.

twickland,

You should file a Support Case for that bug, in case the head of Retrospect Tech Support forgot to feed it into their evidently-sketchy bug list.  You'd basically just need to copy the contents of your 2015 Retrospect Bug Reports post into the Case; I'm not posting this suggestion in that thread because I don't know if anybody reads that sub-Forum anymore.


To my eyes, that looks more like a SCSI error causing a premature end to tape writing than an error with the tape itself. Retrospect would, in that case, still move on to the next tape so the effect is the same.

Adam, you still haven't said what tapes you are using. I'm guessing from...

IBM|ULTRIUM-HH6|E6R3

...that Lennard's right and it's an LTO-6, and that Retrospect's reported nominal capacity is bogus. If we are right then, given that you are a media company and probably dealing with already-compressed video etc,  I reckon the "space used" on tapes 1 and 2 is both correct and the best you can expect (they're full but mis-reporting spare capacity), i.e. 2.5-3TB per tape because there's minimal compression, while tape 3 "finished early" because of the comms error. So each 3-tape set is barely big enough for your current weekly usage, even without problems like this.

But more info would help here -- what are the tapes, what's the drive and firmware version, what machine is the server on and how is it connected to the tape library, etc.

Oh, and I wouldn't recommend using Retrospect's "automated cleaning" with LTO drives. The drive itself should flag a cleaning requirement for you, either on its display or via remote management, when it needs it rather than after a certain amount of time.


Apologies - work is getting on top of me this week. Thank you for your replies.

Correct, it is an LTO-6 drive (an IBM Ultrium HH6, as you noted). The drive and the RAID are both connected to the file server, and the tapes are managed by an 8 tape library, which is also attached directly to the server. I've attached a screenshot from Retrospect and more info on the drive can be found at https://www.ibm.com/uk-en/marketplace/ibm-lto-ultrium-6-data-cartridge

It appears that the drive isn't set up for automatic cleaning, but I can find no record of a cleaning cycle in the logs, and I've not been asked by Retrospect to do it in the time I've been here (two months).

I have been fooled by the spare capacity and you are indeed correct in that we are running out of space. A great deal of the data is video and large image files, although there is also a lot of code and SQL, which I would expect to compress somewhat. However, I will probably need to look into either reducing the size of the backup or finding another solution (which I have been thinking about anyway as it is a bit of a faff every Monday morning, as this isn't the only backup process we have).

backup_devices.png


24 minutes ago, Adam Ainsworth said:

there is also a lot of code and SQL, which I would expect to compress somewhat

Let me put it this way. Assume your code and SQL scripts take up about the same amount of space as the whole contents of the Bible. That would be a huge amount of code, actually. But it would still be only about 3.5 MB. Yes, it compresses easily, but in terms of space use it is negligible compared to 6 TB. :)


6 minutes ago, Lennart_T said:

Let me put it this way. Assume your code and SQL scripts take up about the same amount of space as the whole contents of the Bible. That would be a huge amount of code, actually. But it would still be only about 3.5 MB. Yes, it compresses easily, but in terms of space use it is negligible compared to 6 TB. :)

I'm not sure what code bases you work on, but the WordPress install alone is 11.5 MB compressed (https://en-gb.wordpress.org/download/), and we have hundreds of them. When you add in themes and plugins, uploaded assets, mature DB dumps, and other paraphernalia, a good proportion of our client sites come to hundreds of MB. They all need to be backed up in their entirety, because if the worst happens, we need them back as quickly as possible.


I work with C++, Pascal, C# and Dataflex. All of them have development software that is some MB to download. But the code you write is quite small by comparison.

And in my world, a database dump is not SQL.

Themes, plugins and database dumps are (usually) not easily compressed, which can be seen in the log: 1% compression. 

So thanks for the clarification. I understand better now. :) 

 


17 hours ago, Adam Ainsworth said:

Correct, it is an LTO-6 drive (an IBM Ultrium HH6, as you noted). The drive and the RAID are both connected to the file server, and the tapes are managed by an 8 tape library, which is also attached directly to the server. I've attached a screenshot from Retrospect and more info on the drive can be found at https://www.ibm.com/uk-en/marketplace/ibm-lto-ultrium-6-data-cartridge

It appears that the drive isn't set up for automatic cleaning, but I can find no record of a cleaning cycle in the logs, and I've not been asked by Retrospect to do it in the time I've been here (two months).

I have been fooled by the spare capacity and you are indeed correct in that we are running out of space. A great deal of the data is video and large image files, although there is also a lot of code and SQL, which I would expect to compress somewhat.

Ah, but what is the library, how is it attached, and to what?

I'll take a punt that it is a FlexStor II 1-U, fibre connected. On the face of it that is supported by Retrospect, but so is my fibre-connected Scalar 24 *except* the fibre connection is via a Thunderbolt to fibre adapter, which introduces its own problems -- I see similar SCSI errors to the ones you showed when my adapter has a "moment", and the tape finishes before it is "full" (note that on a sequential-tape system you can't go back and "fill up the gaps" later).

You'll see RS's automated cleaning option if you click on "Options" when the drive (not the library) is selected as in your screenshot above. It is in "hours of use" and by default it is "0" (never) and I'd suggest you keep it at that. Cancel the cleaning slot and keep the cleaning tape in your desk for when needed and that'll give you an instant 33% bump in set capacity simply by adding a tape to each, which will buy you time to sort things out.

Get the library on the network and use the management GUI to monitor the cleaning requirement instead. How often you'd need to manually load the tape and run a cleaning cycle will depend on your environment but, IMO, most people do it far too often given that LTO drives self-clean every time a cartridge is inserted. The drive/library should tell you when it is required: on the front panel there's probably a "Clean" indicator, and it'll show in (and should be set to email/Slack/SMS/whatever an alert from) the network management UI.

And note on backing up the WP installs: Lots of small files will (almost) always take up more space on tape than the same amount of data in one single file -- assuming compression is the same, an unchanged WP install will use more tape capacity than the original installer. Also, many of the WP files across your client folders will be unchanged and thus the same and, under RS's default "don't copy the same thing twice" setting, don't actually get backed up a second/third/100th time and so don't contribute to the "compression percentage". So, across the whole set, the WP *code* contribution is minimal compared to media, assets, etc your clients have uploaded (which, I think, is the point Lennart was trying to get across).
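A toy illustration of that "don't copy the same thing twice" idea, for what it's worth. This is only a sketch: it uses a content hash as a stand-in for whatever matching criteria Retrospect really applies (the thread doesn't say), and the paths, file contents and function name are all made up.

```python
import hashlib


def bytes_to_store(files):
    """Return (total_bytes, stored_bytes) for an iterable of (path, content).

    A file whose content has already been seen is counted in the total
    but not stored again. The hash is a stand-in for the real matching rule.
    """
    seen = set()
    total = 0
    stored = 0
    for _path, content in files:
        total += len(content)
        digest = hashlib.sha256(content).hexdigest()
        if digest not in seen:
            seen.add(digest)
            stored += len(content)
    return total, stored


# Hypothetical client folders: the same core file three times, plus unique uploads.
wp_core = b"<?php // identical WordPress core file ?>" * 100
files = [
    ("client_a/wp-includes/core.php", wp_core),
    ("client_b/wp-includes/core.php", wp_core),
    ("client_c/wp-includes/core.php", wp_core),
    ("client_a/uploads/video.mp4", b"A" * 5000),
    ("client_b/uploads/photo.psd", b"B" * 7000),
]

total, stored = bytes_to_store(files)
print(f"source size: {total} bytes, written once each: {stored} bytes")
```

The three identical core files cost one copy between them, while every unique upload costs its full size, which is why the uploads dominate.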

Hope some of that ramble helps...


1 hour ago, Nigel Smith said:

Ah, but what is the library, how is it attached, and to what?

I'll take a punt that it is a FlexStor II 1-U, fibre connected. On the face of it that is supported by Retrospect, but so is my fibre-connected Scalar 24 *except* the fibre connection is via a Thunderbolt to fibre adapter, which introduces its own problems -- I see similar SCSI errors to the ones you showed when my adapter has a "moment", and the tape finishes before it is "full" (note that on a sequential-tape system you can't go back and "fill up the gaps" later).

It's a Neoseries of some description - I can't get round the back to see. I'm not sure of its age, but it hasn't even been turned off in nearly three years. The connection is via fibre though, and I'm guessing that some kind of interruption caused the problem. Given that it was never mentioned to me by my predecessor, it had either never happened before or was very rare.

1 hour ago, Nigel Smith said:

You'll see RS's automated cleaning option if you click on "Options" when the drive (not the library) is selected as in your screenshot above. It is in "hours of use" and by default it is "0" (never) and I'd suggest you keep it at that. Cancel the cleaning slot and keep the cleaning tape in your desk for when needed and that'll give you an instant 33% bump in set capacity simply by adding a tape to each, which will buy you time to sort things out.

Get the library on the network and use the management GUI to monitor the cleaning requirement instead. How often you'd need to manually load the tape and run a cleaning cycle will depend on your environment but, IMO, most people do it far too often given that LTO drives self-clean every time a cartridge is inserted. The drive/library should tell you when it is required: on the front panel there's probably a "Clean" indicator, and it'll show in (and should be set to email/Slack/SMS/whatever an alert from) the network management UI.

The hours of use is set to zero, and there is a light on the front, so I guess it has always been done manually. I'm not sure about messing around with the config right now as I have a few deadlines and Easter approaching, so I don't want to rock the boat too much. The same goes with getting it on the network - I don't know how practical that would be or whether it would be worth it.

1 hour ago, Nigel Smith said:

And note on backing up the WP installs: Lots of small files will (almost) always take up more space on tape than the same amount of data in one single file -- assuming compression is the same, an unchanged WP install will use more tape capacity than the original installer. Also, many of the WP files across your client folders will be unchanged and thus the same and, under RS's default "don't copy the same thing twice" setting, don't actually get backed up a second/third/100th time and so don't contribute to the "compression percentage". So, across the whole set, the WP *code* contribution is minimal compared to media, assets, etc your clients have uploaded (which, I think, is the point Lennart was trying to get across).

Something I hadn't mentioned was that most of the code is on virtual servers with virtual disks (split down in to chunks). So, while the number of files isn't important, it does mean that it only needs one to change for the entire chunk to be backed up, and it also means that duplicate files take up space, as do all the node_modules folders etc.

It's not an ideal situation, and one that has been on a to do list for some time to sort out (as well as archive old and ex customer sites). As with everything, time is the problem, as there tends to be customer work to do before internal housekeeping. 

1 hour ago, Nigel Smith said:

Hope some of that ramble helps...

It has - thank you. I am grateful to everyone who has contributed to this thread, as I now have a better understanding of our system, a good idea what went wrong, and it's made me think about what we ought to do to make things more efficient. I've also been given a helpful warning that we don't have as much space as I thought.


48 minutes ago, Adam Ainsworth said:

It's a Neoseries of some description

Probably a StorageLoader, then. <https://www.overlandstorage.com/products/tape-libraries-and-autoloaders/neos-storageloader.aspx#Overview>

 

49 minutes ago, Adam Ainsworth said:

The hours of use is set to zero, and there is a light on the front, so I guess it has always been done manually. I'm not sure about messing around with the config right now as I have a few deadlines and Easter approaching, so I don't want to rock the boat too much. The same goes with getting it on the network - I don't know how practical that would be or whether it would be worth it.

Yes, that'll be manual cleaning. I understand you don't want to mess around too much before the Bank Holiday weekend but, when you're back, these are both easy jobs to do. The first is just spitting out the cleaning tape to the mail slot and removing it, cancelling the "cleaning slot" designation on the library slot, adding a new member to each tape set -- all done through the Retrospect GUI. The second would be setting up static-IP settings as per your network via the StorageLoader's front panel (p. 48 of <https://www.overlandstorage.com/pdfs/support/NEO-S-Series-UG-ENG.pdf>), plugging in an ethernet cable, logging in with your web browser, then resetting the password etc. You can then use the Web GUI to do the important things like check/update the library firmware etc.

One word of warning (coz this always catches me out) -- pressing front panel buttons often (always?) rips control from Retrospect but, when you finish, RS (or the Mac OS SCSI routines) can't always gracefully resume operations without a restart. So pick an idle time when, if necessary, you can restart the backup server 😉

 

1 hour ago, Adam Ainsworth said:

Something I hadn't mentioned was that most of the code is on virtual servers with virtual disks (split down in to chunks). So, while the number of files isn't important, it does mean that it only needs one to change for the entire chunk to be backed up, and it also means that duplicate files take up space, as do all the node_modules folders etc.

Ooof! I excluded VHDs from our backups a l-o-n-g time ago, for that very reason. Not a practical solution for you though! And it may be worse than it first looks -- depending on how those VHDs are set up, every previous version of every edited file may still be on the VHD, hidden from a "file explorer" but still present in part or whole on the disk just as they are on a "real" hard drive. While you'll still get compression that might be countered by the large amounts of "unseen" cruft...

Others (David? Lennart?) may be able to chip in on how best to manage this -- I've never used block level incremental backups <https://www.retrospect.com/uk/documentation/user_guide/mac/block_level_incremental_backup> or Retrospect Virtual <https://www.retrospect.com/uk/products/virtual>, which are the two quick fixes that leap to mind.


Adam Ainsworth and Nigel Smith,

I've never used block-level incremental backups, which are discussed on pages 206-210 of the Retrospect Mac 14 User's Guide.  "Options" on page 208 says "With block level incremental backup enabled, files 100 MB or larger will be backed up incrementally by default. Smaller files will automatically be backed up in full because restore overhead outweighs the benefits of incremental backup. "  My only database (I'm a home user with a tiny business) is less than 6MB.
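For readers unfamiliar with the idea, the general technique can be sketched as below. This is a generic illustration of block-level change detection and is not Retrospect's implementation: only the 100 MB threshold comes from the User's Guide quote above, and the block size, hashing scheme, function names and the reading of "100 MB" as 100 x 1024 x 1024 bytes are assumptions.

```python
import hashlib

# Threshold quoted from the User's Guide; the exact byte count is an assumption.
THRESHOLD_BYTES = 100 * 1024 * 1024


def block_hashes(data, block_size):
    """Hash each fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old, new, block_size):
    """Indices of blocks in new that are added or differ from old."""
    old_h = block_hashes(old, block_size)
    new_h = block_hashes(new, block_size)
    return [i for i, h in enumerate(new_h) if i >= len(old_h) or old_h[i] != h]


def bytes_to_copy(old, new, block_size, threshold=THRESHOLD_BYTES):
    """Small files are copied whole; large files copy only changed blocks."""
    if len(new) < threshold:
        return len(new)
    return sum(
        len(new[i * block_size:(i + 1) * block_size])
        for i in changed_blocks(old, new, block_size)
    )


# Tiny demo with a 4-byte block size so the effect is visible.
old = b"AAAABBBBCCCCDDDD"
new = b"AAAABxBBCCCCDDDD"
print(changed_blocks(old, new, 4))
print(bytes_to_copy(old, new, 4, threshold=8))  # treated as a "large" file
print(bytes_to_copy(old, new, 4))               # below 100 MB: copied in full
```

In the demo only the second block differs, so the "large file" path copies 4 bytes instead of 16, while the default threshold sends the whole file.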

R.V., which I have been forbidden by the head of Retrospect Tech Support from discussing on these Forums, only runs on Windows machines; check the System Requirements. That worthy (or one of his subordinates) also told me R.V. doesn't even have the concept of a Client. Think of it as a cheaper competitor to Veeam. Then instead buy 4 more blank tapes, Adam, as Nigel suggested; IMHO it'd be cheaper and less work to set up.


It's been a week since the problem, and this backup set has been absolutely fine, so I'm hoping it was just a blip.

I'm going to see if I can reduce the size of our backups, as it appears that we're within 2 TB of running out of space on three tapes, and that will be by far the easiest immediate course of action.

I need to look at our VM solutions as a whole, and not just the backup side of things, so that will have to wait for the time being. The machines between them only take up around 15% of the backup size (HD video and massive PSDs seem to be the main culprit).

Thanks again to everyone for advice, I certainly know a great deal more than I did this time last week!
