Snapshot transfer not complete



#1 speleo14

speleo14

    Newbie

  • Members
  • 9 posts

Posted 19 May 2017 - 08:13 AM

Hi all

 

I am using Retrospect 14 to back up our servers with a disk-to-disk-to-tape strategy: I back up to disk and then use a "Copy Backup" script to copy the disk media set to tape.

 

Recently, I did a full backup of all my servers using new media sets both on disk and on tape. The backups to disk went fine, but when I copied the data to tape, the snapshot transfer of one server was left half-finished: only 2 TB of the 4.9 TB on disk were transferred to tape. There were no errors. It just stopped, and the remaining 2.9 TB were never copied afterwards.

Here's the log entry of the snapshot transfer:

 

  snapshot transfers: ndex/typeItemCount/srcCount=0/1/9,mdex/snapCount=5/9

     RAID_GR (14/04/17 11:56:09) on Client "ieu-fsgr"
    -  21/04/17 10:17:57: Verifying IEU-ServerT-F
    -  21/04/17 14:47:16: Transferring from IEU-ServerD-EF
    21/04/17 14:59:18: Execution completed successfully
    Remaining: 1'188'488 files, 2.9 TB
    Completed: 1'062'673 files, 2 TB
    Performance: 6'670.8 MB/minute
    Duration: 06:09:12 (00:50:10 idle/loading/preparing)
 
Now, when I try to restore "RAID_GR" from the tape media set "IEU-ServerT-F", Retrospect throws endless "bad media set header found" errors. Restoring from the disk media set (which I still have) completes successfully without any error. Restoring a different server from the tape media set "IEU-ServerT-F" also works fine.
 
What is more, I always copy my disk backup sets to 2 different tape media sets, and the same thing happened on both: the snapshot of "RAID_GR" was not completely transferred. However, a bit more data was copied to the second set (2.3 TB compared to 2 TB on the first). Here's the log entry for the other tape media set:
 
   snapshot transfers: ndex/typeItemCount/srcCount=0/1/9,mdex/snapCount=5/9
     RAID_GR (14/04/17 11:56:09) on Client "ieu-fsgr"
    -  21/04/17 05:36:17: Verifying IEU-ServerT-E
    -  21/04/17 11:56:48: Transferring from IEU-ServerD-EF
    21/04/17 12:09:18: Execution completed successfully
    Remaining: 1'188'417 files, 2.6 TB
    Completed: 1'062'744 files, 2.3 TB
    Performance: 6'813.7 MB/minute
    Duration: 07:06:27 (01:09:27 idle/loading/preparing)
 
I went through the whole log file to check whether the other half of the data was copied later on, but it was not. Up to today, the script has only ever copied small amounts of data, as I would expect with an incremental backup, so it really never copied the remaining 2.6 to 2.9 TB to tape!
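As a quick sanity check on the two log entries above, the "Remaining" and "Completed" figures can be added up to see whether both tape copies were working from the same snapshot. This is only a sketch: the summary lines are copied from the logs in this post, and the regular expression assumes the apostrophe thousands separator and the TB unit exactly as they appear there.

```python
import re

# Summary lines copied from the two Copy Backup log entries in this post.
LOG_T_F = """
Remaining: 1'188'488 files, 2.9 TB
Completed: 1'062'673 files, 2 TB
"""
LOG_T_E = """
Remaining: 1'188'417 files, 2.6 TB
Completed: 1'062'744 files, 2.3 TB
"""

# Assumption: sizes are always reported in TB, as in the excerpts above.
SUMMARY = re.compile(r"(Remaining|Completed):\s*([\d']+) files, ([\d.]+) TB")

def summarize(log_text):
    """Return {'Remaining': (files, tb), 'Completed': (files, tb)}."""
    out = {}
    for label, files, tb in SUMMARY.findall(log_text):
        out[label] = (int(files.replace("'", "")), float(tb))
    return out

def completed_share(summary):
    """Fraction of the snapshot's files that actually reached the tape."""
    done = summary["Completed"][0]
    left = summary["Remaining"][0]
    return done / (done + left)

for name, text in (("IEU-ServerT-F", LOG_T_F), ("IEU-ServerT-E", LOG_T_E)):
    s = summarize(text)
    total_files = s["Completed"][0] + s["Remaining"][0]
    total_tb = s["Completed"][1] + s["Remaining"][1]
    print(f"{name}: {total_files} files, {total_tb:.1f} TB in snapshot, "
          f"{completed_share(s):.1%} of files copied")
```

Both entries add up to the same snapshot size (2'251'161 files, 4.9 TB), and in both cases a little under half of the files reached the tape, which suggests the two runs stopped at nearly the same point in the source set.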
 
How can that happen? And what can I do? Should I fully back up "RAID_GR" to disk and then to tape again? But how can I do this without backing up all other servers again, too? I have about 30TB of data on my servers and doing a full backup all over again is not really what I want to do...
I guess if I just create a new tape media set and copy the whole disk set to it, it will again leave out half the files of that particular snapshot...
Or should I do a verify? Will this copy files that fail verification? If half of the files are simply not there, will "Verify" notice that?
 
Any help is appreciated!
 
 
 
 
 


#2 twickland

twickland

    Retrospect Guru

  • Members
  • 1,503 posts

Posted 22 May 2017 - 08:52 PM

What selection parameters are you using in your Copy Backups script? Is there any possibility these may have changed recently?

 

Do you obtain any different results if you try running a Copy Media Set script rather than a Copy Backups script?

 

If you do a verify, I'd select the option "Verify entire media set." In my experience, a verify operation either does not recognize bad media set headers or it ignores them, as they are not noted in the log.

 

My biggest concern is the bad media set errors you're encountering with the tape copies. That suggests something subtle may be amiss with the disk media set, since you are apparently able to restore successfully from that set. If the above suggestions lead nowhere, I'd be inclined to open a support case with Retrospect.


Tim
________________________________
Retrospect 13.5.0 (173)
Mac Pro 3.7 GHz Quad-Core Xeon E5
16 GB RAM, OS 10.10.5
ATTO ThunderLink SH 2068 SAS HBA


#3 speleo14

speleo14

    Newbie

  • Members
  • 9 posts

Posted 23 May 2017 - 09:06 AM




I use no selection criteria, just the "All Files" rule.

 

I haven't tried a Copy Media set...

 

I have done a Verify on one of the tape media sets - it completed with no errors. This didn't really surprise me - files that are not there will not be verified, and obviously it didn't miss them, just as it didn't miss them when it failed to copy them.

 

Meanwhile, I "tricked" the disk backup script into backing up the "RAID_GR" volume again by adding a new ACL to the RAID and then selecting "Use attribute modification date when matching" on the script. Then I ran the "Copy Backup" script to tape again, and this time it copied the whole 4.9 TB. So I agree, something must have been wrong with the disk media set.

Actually, there were indeed errors when backing up to the disk set. However, after these errors, the script successfully finished the copy phase and started comparing. I stopped the compare, which is why the log says only 171 files were completed; that figure refers to the compare, not the copy:

 

   -  06/04/17 17:43:26: Copying RAID_GR on ieu-fsgr

    06/04/17 17:50:00: Found: 2255083 files, 86865 folders, 4.8 TB
    06/04/17 17:50:04: Finished matching
    06/04/17 17:50:18: Copying: 2255083 files (4.8 TB) and 0 hard links
    *File "RAID_GR/.....": can't read, error -1'101 (file/directory not found)
 
    [some file not found errors, but only a few files and not the ones missing later. Probably deleted by the user between the matching and the actual copy]
 
    [*] xopFlush: flush failed, error -102 (trouble communicating)
    [*] xopFlush: flush failed, error -102 (trouble communicating)
    !Trouble writing: "1-IEU-ServerD-EF" (498987008), error -102 (trouble communicating)
    !Trouble writing media:
    "1-IEU-ServerD-EF"
    error -102 (trouble communicating)
 
 
    07/04/17 15:38:12: Building Snapshot...
    07/04/17 15:38:12: Checking 86'865 folders for ACLs or extended attributes
    07/04/17 15:40:44: Finished copying 86'860 folders with ACLs or extended attributes
    07/04/17 15:40:54: Copying Snapshot: 2 files (705.4 MB)
    07/04/17 15:41:01: Snapshot stored, 705.4 MB
    07/04/17 15:41:01: Comparing RAID_GR on ieu-fsgr
    07/04/17 15:41:20: Execution stopped by operator
    Remaining: 2'254'895 files, 4.8 TB
    Completed: 171 files, 447.3 MB
    Performance: 4'088.4 MB/minute (4'089 copy, 1'490.8 compare)
    Duration: 21:57:52 (01:20:54 idle/loading/preparing)
 
    07/04/17 15:41:22: Execution stopped by operator
    Total performance: 4'094.2 MB/minute
    Total duration: 22:03:33 (01:21:43 idle/loading/preparing)
    [*] NetConnTop::PreDispose: m_deferredPackets.MaxCount=15944
 

It is still strange that a restore from the disk backup worked fine while the Copy Backup script did not.

My problem now is simply this: how can I ever again trust the backups copied to tape, and is there anything else missing on the tapes? And if so, how do I find out? There were no errors in the Copy Backup script; I only noticed by sheer chance that more than 2 TB of data had been left out (I was analyzing the log because I wanted to find out about performance).
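One way to stop relying on chance would be to scan the operations log for entries that report "Execution completed successfully" but still list a non-zero "Remaining" count. The following is a rough sketch, not a Retrospect feature: it assumes the log has been exported or pasted as plain text and that the line layout matches the excerpts in this thread. The function name `find_silent_gaps` is made up for illustration.

```python
import re

# Line shapes taken from the log excerpts in this thread (an assumption for
# other Retrospect versions or locales).
STEP = re.compile(r"^\s*-\s+\d\d/\d\d/\d\d \d\d:\d\d:\d\d: (.+)$")
STATUS = re.compile(r"^\s*\d\d/\d\d/\d\d \d\d:\d\d:\d\d: (Execution .+)$")
REMAINING = re.compile(r"^\s*Remaining:\s*([\d']+) files, (.+)$")

def find_silent_gaps(log_text):
    """List (step, remaining_files, remaining_size) for every entry that
    claims success while still reporting files left to copy."""
    step, status, hits = None, None, []
    for line in log_text.splitlines():
        if m := STEP.match(line):
            step, status = m.group(1).strip(), None   # a new step begins
        elif m := STATUS.match(line):
            status = m.group(1).strip()
        elif m := REMAINING.match(line):
            files = int(m.group(1).replace("'", ""))
            if files > 0 and status == "Execution completed successfully":
                hits.append((step, files, m.group(2).strip()))
    return hits

# Excerpt from the first tape copy log earlier in this thread.
SAMPLE = """
     RAID_GR (14/04/17 11:56:09) on Client "ieu-fsgr"
    -  21/04/17 10:17:57: Verifying IEU-ServerT-F
    -  21/04/17 14:47:16: Transferring from IEU-ServerD-EF
    21/04/17 14:59:18: Execution completed successfully
    Remaining: 1'188'488 files, 2.9 TB
    Completed: 1'062'673 files, 2 TB
"""

print(find_silent_gaps(SAMPLE))
```

On the excerpt above this reports the "Transferring from IEU-ServerD-EF" step with 1'188'488 files (2.9 TB) remaining. An entry that ends with "Execution stopped by operator", like the disk backup above, is deliberately not flagged, since a non-zero remainder is expected there.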

 

I am doing a full test restore from tape of all my servers at the moment, but still...
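For the test restore, a simple check is to compare the restored folder tree against the live volume by relative path and file size. This is a minimal sketch using only the Python standard library; the function names are made up for illustration, and it assumes both trees are mounted and readable on the machine running the script. It does not compare file contents.

```python
import os

def tree_index(root):
    """Map each file's path (relative to root) to its size in bytes."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # lstat does not follow symlinks, so broken links do not raise.
            index[os.path.relpath(full, root)] = os.lstat(full).st_size
    return index

def compare_trees(source_root, restored_root):
    """Return (missing, size_mismatch): files present in the source but
    absent from the restore, and files whose sizes differ."""
    src = tree_index(source_root)
    dst = tree_index(restored_root)
    missing = sorted(set(src) - set(dst))
    size_mismatch = sorted(p for p in src.keys() & dst.keys()
                           if src[p] != dst[p])
    return missing, size_mismatch
```

Calling `compare_trees(live_path, restored_path)` with the two mount points returns the two lists. Files created or changed on the server after the backup date will show up as differences too, so the result needs to be read with the snapshot date in mind.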





