Don Lee Posted June 3, 2013 Report Share Posted June 3, 2013 I have seen this before, but I thought it was pilot error. Now that Retro 10 does not allow me to "accidentally" change my scripts, I am now certain that this bug is real, and it's nasty. I have have scripts that "rotate" the backups on the first of the month. I have the options set so that the media set X is copied to X_last, recycliing X_last first, and recycling X if the copy is successful. When I do this manually, it seems to work fine. (Doing this I can "rotate" my media sets without changing the scripts.) This last June 1st, these scripts ran, and my 50 GB media sets produced X_last copies that are only 200 MBytes!!! I don't know exactly what it's copying, but it's only copying a small fraction of the data I wanted copied. Note that I am copying from a disk media set to a file media set. Retro is throwing my data away. Is this a known bug? Is there a workaround? Do I have to run these rotations manually to be safe? Retro 10.1.0 (221) engine on Mac OS X 10.8.3 - Mac Mini with 16 GB memory 1 Quote Link to comment Share on other sites More sharing options...
Don Lee Posted June 3, 2013 Author Report Share Posted June 3, 2013 FYI - the log from the operation that "copied" the media set, turning 50 GB into 300 MBytes + Executing Monthly_iCompute_rotate at 6/1/13 (Activity Thread 2) To Backup Set v9_iCompute_last... 6/1/13 12:01:01 AM: Recycle backup: The Backup Set was reset - 6/1/13 12:01:05 AM: Transferring from v9_iCompute 6/1/13 12:01:19 AM: Execution completed successfully Completed: 1179 files, 297 MB Performance: 1,370.7 MB/minute Duration: 00:00:14 (00:00:01 idle/loading/preparing) - 6/1/13 12:01:19 AM: Verifying v9_iCompute_last 6/1/13 12:01:19 AM: Execution completed successfully ! 6/1/13: Recycle of Media Set v9_iCompute Quote Link to comment Share on other sites More sharing options...
Don Lee Posted July 1, 2013 Author Report Share Posted July 1, 2013 With my July "rotation", this happened again. My rotation from current sets to the "previous" sets have produced stunted backup sets. This is a serious problem. My current sets appear reliable, but if I use the "copy media set" function in a script ,it randomly fails without ANY indication of failure, unless you look at the output and check it. Logs are clean. It's just that the size of the output is impossibly small, and the number of files copied is impossibly small. The worst part is that the operation that triggers the lost data is the "recycle source on successful copy". If I leave that unchecked, the copy seems to be OK. This means that if I use that option, I lose BOTH my current, and my previous months' backups! Not using that option simply means I have to go in and recycle the current set(s) manually. I am surprised that no one has commented on this. Is no one else copying media sets? How do others "rotate" their sets so that they don't have to screw around with their scripts to re-connect to the new media sets? Quote Link to comment Share on other sites More sharing options...
Don Lee Posted July 1, 2013 Author Report Share Posted July 1, 2013 I just tried a restore from one of my "stunted" copies, and they restore - silently - just a fraction of the snapshot volume. No errors. No complaints, just lots and lots of missing files. There is no indication at all that anything is amiss, even though it only restores 4 out of 50 GB of data expected. ??????? Quote Link to comment Share on other sites More sharing options...
Don Lee Posted September 2, 2013 Author Report Share Posted September 2, 2013 I have another data point. I ran my monthly automated "rotation" on Sep 1, and some of the media set copies came out stunted, as before. The difference is that I am NOT using the "recycle source on success" option. I'm sure. There is something more sinister going on. I don't *think* I've ever seen this except when the operation was scheduled, FWIW. Quote Link to comment Share on other sites More sharing options...
Don Lee Posted October 1, 2013 Author Report Share Posted October 1, 2013 October rotations have started, and I'm seeing stunted media set copies again. I enclose 2 screen shots - of the source media set, and the destination set. The latter is a small fraction of the former, and only has about half the snapshots (backups) as seen in the console.These always seem to work when I run them "manually". Only when scripted and scheduled do they screw up. Quote Link to comment Share on other sites More sharing options...
Don Lee Posted October 1, 2013 Author Report Share Posted October 1, 2013 I'm walking through my monthly rotations, and have noticed that most f the media set copy operations say they copied a LOT more data then they verified. This one copied 56 GB, and the verify operation took ** 4 seconds**. I don't think it did much. The log: + Executing rotate_Skinnie at 10/1/13 (Activity Thread 1)To Backup Set Skinnie_last...10/1/13 2:47:55 AM: Recycle backup: The Backup Set was reset- 10/1/13 2:47:58 AM: Transferring from Skinnie10/1/13 4:44:23 AM: Execution completed successfullyCompleted: 594740 files, 56 GBPerformance: 500.1 MB/minuteDuration: 01:56:24 (00:01:42 idle/loading/preparing)- 10/1/13 4:44:23 AM: Verifying Skinnie_last10/1/13 4:44:27 AM: Execution completed successfullyDuration: 00:00:04 I'm thinking that this must be some variation of the "verify whole media set" option on the verify script. The verify of the copy must be deciding for some reason it only needs to verify part of the copied media set. There is no option on the "media set copy" script to "verify all". What's up with this? It makes me very nervous about my resulting media set copies. Quote Link to comment Share on other sites More sharing options...
Don Lee Posted November 1, 2013 Author Report Share Posted November 1, 2013 I made a change in my Nov 1 rotations. I broke the rotate script into two parts. Part 1 simply recycles the target media set. Part 2 does the media set copy. This works reliably. I draw a conclusion from this. The option to "recycle destination media set" at initiation of copy op does not work properly. When I use it, it screws up the destination set, and results in a stunted copy of the media set. If I do the recycle as a separate op, then everything works. Quote Link to comment Share on other sites More sharing options...
bdunagan Posted November 18, 2013 Report Share Posted November 18, 2013 Sorry for not replying to this thread sooner. We've tried to reproduce this issue a number of times, but we haven't been able to. Could you contact our support team to work with them on it? Please reference issue 4131 and highlight the Nov 1 script change. Quote Link to comment Share on other sites More sharing options...
Don Lee Posted November 22, 2013 Author Report Share Posted November 22, 2013 FWIW, I have contacted support, and re-run some tests on 10.5. So far (so far....) I cannot reproduce the problem with 10.5. I would normally insist on reproducing the problem on the last version that exhibited the problem, but I only have one machine that can run the server, and it's a production machine, so I won't be doing that. I'll keep trying, but it looks like it might be fixed. Quote Link to comment Share on other sites More sharing options...
bdunagan Posted November 26, 2013 Report Share Posted November 26, 2013 That's excellent news. I spoke with support, and we can't reproduce it on the latest release either. Let me know if the issue comes up again. Quote Link to comment Share on other sites More sharing options...
Don Lee Posted November 26, 2013 Author Report Share Posted November 26, 2013 May I ask if you know what was fixed that made it work? That's just to boost my warm fuzzies that it's really fixed, and not the result of lucky intermittency. When I hit this, I lose both the current and the last month's data, so it's a big deal for me. I plan to remove my workaround on Dec 1. Quote Link to comment Share on other sites More sharing options...
Don Lee Posted January 1, 2014 Author Report Share Posted January 1, 2014 I changed my scripts at my customer sites, and this bug bit me again. Here is one log file from a media set with 75 GB of data and 650,000 files. Note the file count, the data xferred, and the speed of the "verify". + Executing rotate_Skinnie at 1/1/14To Backup Set Skinnie_last...1/1/14 1:16:22 AM: Recycle backup: The Backup Set was reset- 1/1/14 1:16:25 AM: Transferring from Skinnie1/1/14 1:34:33 AM: Execution completed successfullyCompleted: 77304 files, 14.8 GBPerformance: 859.3 MB/minuteDuration: 00:18:07 (00:00:31 idle/loading/preparing)- 1/1/14 1:34:33 AM: Verifying Skinnie_last1/1/14 1:34:33 AM: Execution completed successfully The big difference here is that the engine at this site is running on a 10.6.8 Mac Mini. (not the 10.8 OS that I run in production) Quote Link to comment Share on other sites More sharing options...
Don Lee Posted January 1, 2014 Author Report Share Posted January 1, 2014 I re-ran the "rotate" scripts manually, but they may still fall short. The media set report shows that "skinnie" has 649,522 files, but the copy media set operation copies649,299 files, without any indication of error. Where are the remaining 223 files? When I re-ran the script by manually recycliing "skinnie_last" and then running the "copy media set" script. This makes me nervous. Log below: + Executing rotate_Skinnie at 1/1/14 (Activity Thread 1)To Backup Set Skinnie_last...- 1/1/14 4:12:22 AM: Transferring from Skinnie1/1/14 5:37:42 AM: Execution completed successfullyCompleted: 649299 files, 66.7 GBPerformance: 818.9 MB/minuteDuration: 01:25:19 (00:01:54 idle/loading/preparing)- 1/1/14 5:37:42 AM: Verifying Skinnie_last1/1/14 6:07:00 AM: Execution completed successfullyCompleted: 649299 files, 66.7 GBPerformance: 2,405.4 MB/minuteDuration: 00:29:18 (00:00:53 idle/loading/preparing) Quote Link to comment Share on other sites More sharing options...
Don Lee Posted January 2, 2014 Author Report Share Posted January 2, 2014 Note that I seem to have one installation at a customer site - Retro 9.0.2.107 (engine) that seems to work fine. The bad behavior appears to have been introduced in the 10.x versions. Quote Link to comment Share on other sites More sharing options...
Don Lee Posted January 5, 2014 Author Report Share Posted January 5, 2014 by the way.... the customer running 9.x retro is also running Mac OS X 10.5.... Quote Link to comment Share on other sites More sharing options...
bdunagan Posted January 6, 2014 Report Share Posted January 6, 2014 Thanks for following up on this. Sorry you encountered the issue again. We're still trying to reproduce it. 1 Quote Link to comment Share on other sites More sharing options...
Don Lee Posted January 6, 2014 Author Report Share Posted January 6, 2014 Let me know if there is any sort of extra debug logging that I can enable to gather info to help track this down..... Quote Link to comment Share on other sites More sharing options...
Don Lee Posted February 2, 2014 Author Report Share Posted February 2, 2014 I continue to monitor this, and in the latest 2/1/2014 rotations, I saw this: + Executing daily_mercy_nfs_rotate_zero at 2/1/14 (Activity Thread 2)To Backup Set v9_daily_mercy_nfs_last_tmp...2/1/14 3:34:53 PM: Recycle backup: The Backup Set was reset- 2/1/14 3:34:53 PM: Transferring from v9_daily_mercy_nfs_last2/1/14 3:39:58 PM: Execution completed successfullyCompleted: 22938 files, 10.1 GBPerformance: 2,047.7 MB/minuteDuration: 00:05:05- 2/1/14 3:39:58 PM: Verifying v9_daily_mercy_nfs_last_tmp2/1/14 3:40:31 PM: Execution completed successfullyCompleted: 20891 files, 3.1 GBPerformance: 5,900 MB/minuteDuration: 00:00:32 Whether the data is right or not, having the mismatch between the data copied and data verified causes me concern. Quote Link to comment Share on other sites More sharing options...
Don Lee Posted March 2, 2014 Author Report Share Posted March 2, 2014 Bad news. It's still busted in 10.5. My log from the Mar 1 rotation: + Executing daily_household_rotate at 3/1/14 (Activity Thread 2)To Backup Set v9_daily_household_last...3/1/14 3:02:40 AM: Recycle backup: The Backup Set was reset- 3/1/14 3:02:40 AM: Transferring from v9_daily_household3/1/14 3:03:17 AM: Execution completed successfullyCompleted: 564 files, 247 MBPerformance: 529.2 MB/minuteDuration: 00:00:37 (00:00:09 idle/loading/preparing)- 3/1/14 3:03:17 AM: Verifying v9_daily_household_last3/1/14 3:03:17 AM: Execution completed successfully! 3/1/14: Recycle of Media Set v9_daily_household Unfortunately, the a single "full" backup of this source is 789 MBytes, and 1496 files. The copy should have shown something like 5000 files and 900 MBytes. Judging from the log, the copy is only part of a set. Worse - no error reported. Interestingly, the backups in the set run every day, and there should be a backup for every day from 1 Feb to 28 Feb.. The stunted copy shows only backups from 2/22 to 2/28. I have tried to reproduce this, and it only seems to happen in the middle of the night when running unattended. It's madening, but it appears to be real. It also means that I cannot trust my rotations. I have delay the reset until after I have scanned the logs to ensure that the copies are complete. I might be overconfident that my copies *are* complete, judging just from the counts in the logs. This is just the first instance of this bug I've seen. I have not finished scanning my logs. I may see more...... Quote Link to comment Share on other sites More sharing options...
Don Lee Posted April 1, 2014 Author Report Share Posted April 1, 2014 I see this note in the latest 11.0.1 release: Copy backup script failure with recycle enabled for source and destination (#4557) Are there any additional notes available? For my customers who don't/can't upgrade, is there any sort of workaround? What set of circumstances, exactly, triggers the problem? 1 Quote Link to comment Share on other sites More sharing options...
Don Lee Posted October 15, 2014 Author Report Share Posted October 15, 2014 Still interested in more detail on this fix. I'd like to use the feature, but don't dare until I either have that detail, or do my own debugging and testing. That took months for me in the past, only to get bit again. Can I get confirmation/info on the fix? Quote Link to comment Share on other sites More sharing options...
bdunagan Posted October 26, 2014 Report Share Posted October 26, 2014 Sorry again for not following up sooner. The bug fix you mentioned from the release notes applies to a "Copy Backup" script that recycles both the destination (first) and then the source (last) sets--different from your scenario. We haven't been able to reproduce your issue, but it's on our bug list. What is your current workaround if you don't use "Copy Backup" or "Copy Media Set" scripts? 1 Quote Link to comment Share on other sites More sharing options...
Don Lee Posted October 27, 2014 Author Report Share Posted October 27, 2014 I think this is exactly my scenario. I want to copy the "current" set to "last", recycling "last" at the start, and recycling "current" ONLY if the copy media set succeeds. I will try it and report back if I see more troubles. Thanks, Quote Link to comment Share on other sites More sharing options...
bdunagan Posted October 28, 2014 Report Share Posted October 28, 2014 I see your comment from 2013 about the "Recycle source Media Set after successful copy", but issue #4557 referenced in the release notes was introduced in the v11.0.0 release in March (and fixed in v11.0.1). That said, we found and fixed other data scenarios similar to #4557 as well, so v11.0.1 and v11.5 might fix your issue. Without being able to reproduce your issue, we can't say for certain. From the logs in this thread, it looks like you've seen several different bugs: * "Verify" pass is sometimes too fast: We'll look into this. * "Recycle destination" option might have bugs: We'll look into this, but we tested this a number of times while trying to reproduce what you're seeing, along with our normal testing. We still haven't found a problem with this option. * "Recycle source Media Set after successful copy": The issue you see here is connected to the one above, as it transfers too little and says it's successful. Is this correct? * File counts being off by small numbers: This might be due to how we count hard links. We'll look into this. Let me know if I missed something. Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.