Jump to content
Don Lee

Nasty bug - scheduled "Copy media set" only copies fraction of data

Recommended Posts

Thank you for your detailed response.

 

In answer to your question, yes. My issue is a media set copy that transfers too little and says it's successful. As you can see above, it only seems to happen when I'm "not looking", and only when the destination and/or source is set to recycle.

 

I will try to get some good tests set up for my November rotation. (while ensuring that I don't lose my data!) Since this seems to be nearly impossible to reproduce manually, it is good that Nov 1 is soon. If everything works, it will be until Dec or Jan 1, with multiple "cycles" on my end before I can say with any confidence that it's fixed.

 

I'll report back. Thank you for your help.

Share this post


Link to post
Share on other sites

It's back. I attach the relevant log file (so the forum doesn't mess up the formatting)

 

This is with Retro 11.5.0.139.

 

It was a copy of a 160 GB set that resulted in only 21 GB being copied. It's not often, but it apparently still happens.

 

Note the recycle of the target set.

 

Will 11.5.1 fix it?

log.txt

Share this post


Link to post
Share on other sites

I went through my logs today, and it's definitely back. I have 3 more logs that show the problem. All have a "recycle if successful" option enabled. I can provide logs and more information if necessary.

 

"copy media set" when the "recycle when successful" option is set often results in stunted target media set. Let me help you reproduce this so it can be fixed.

Share this post


Link to post
Share on other sites

I just upgraded to retro 11.5.3 (103). I have tried several media set copies, with and without an initial recycle of the target media set.

 

Good news - the media set seems to copy the data reliably.

 

More good news - there seems to be no variation in behavior from run to run, and media set to media set.

 

Even more good news - it's fast!

 

The slightly bad news - The "verify" pass is speed-of-light fast. Dig the log below. Given that it takes *zero* seconds, I'm betting that the verify is not being done. I get this "fast verify" *only* when I do an initial recycle on the target media set. Note also that the performance of the script is *not* reported when the verify is "fast".

 

+ Executing acopy 3 at 12/24/14 9:35:12 PM (Activity Thread 2)

To Backup Set atrash3-3...

12/24/14 9:35:13 PM: Recycle backup: The Backup Set was reset

- 12/24/14 9:35:13 PM: Transferring from v9_daily_eims

12/24/14 10:08:34 PM: Execution completed successfully

Completed: 6,832 files, 108.9 GB

Performance: 3,441.2 MB/minute

Duration: 00:33:21 (00:00:55 idle/loading/preparing)

 

- 12/24/14 10:08:34 PM: Verifying atrash3-3

12/24/14 10:08:34 PM: Execution completed successfully

 

12/24/14 10:08:34 PM: Script "acopy 3" completed successfully

 

 

Not perfect yet, but this is much, much better. Thanks.

Share this post


Link to post
Share on other sites

Good to hear that the feature is now working for you, but the issue we fixed was introduced in v11.0. You first saw the issue on v10.1. We still haven't been able to reproduce this problem internally, so you still might encounter it again.

 

In terms of the zero-second verification pass, we're looking into that.

Share this post


Link to post
Share on other sites

I'm rolling this out carefully and gradually with my environments. I'll watch for it, and if I reproduce it, I'll take notes and report back.

 

Happy new year!

Share this post


Link to post
Share on other sites

I'm sorry to report that this is still broken in v 11.5.3

 

In last night's rotations, I had set up a fairly elaborate set of scripts to "catch" this problem without losing any of my data. The failure appeared in one of the rotations (so far - I have not yet checked them all)

 

The copy op that I want is from v9_daily_EIMS to v9_daily_EIMS_last. I want the copy to happen, and if successful, recycle the v9_daily_EIMS set.

 

What I have found is that if I check the "recycle source if successful" option, it will do a "short" copy, recycle the source, and report no error.

 

What I did to catch this is to set up an extra copy, where I have a scheduled script that simply recycles a "temp" destination, then another script that copies v9_daily_EIMS to the temp set. I then set the "real" rotation script to do the copy to "last", and recycle v9_daily_EIMS. Below is the log from the copy to "temp", and also from the "real" (failed) copy. It should have copied some 9000 files, but only copied 380.

 

This is with retro v 11.5.3. I hope this can be fixed. I have been working around it, but it's a pain to have to do the extra scripts and checking to make sure the rotation has not destroyed my backups.

 

First the "good" copy. (note that there is no other activity on these sets between these scripts, though there is lots of activity on other sets)

 

+ Executing daily_EIMS_rotate_tmp_copy at 3/1/15 3:40:09 AM (Activity Thread 2)

To Backup Set v9_daily_EIMS_tmp...
- 3/1/15 3:40:10 AM: Transferring from v9_daily_eims
3/1/15 4:59:35 AM: Execution completed successfully
Completed: 9,118 files, 169 GB
Performance: 2,268 MB/minute
Duration: 01:19:25 (00:03:06 idle/loading/preparing)

- 3/1/15 4:59:35 AM: Verifying v9_daily_EIMS_tmp
3/1/15 5:30:41 AM: Execution completed successfully
Completed: 9,118 files, 169 GB
Performance: 5,856.1 MB/minute
Duration: 00:31:06 (00:01:33 idle/loading/preparing)

3/1/15 5:30:41 AM: Script "daily_EIMS_rotate_tmp_copy" completed successfully

 

 

Now the "failed" (short) copy op:

 

+ Executing daily_EIMS_rotate at 3/1/15 6:09:37 AM (Activity Thread 2)
To Backup Set v9_daily_EIMS_last...
3/1/15 6:09:39 AM: Recycle backup: The Backup Set was reset
- 3/1/15 6:09:40 AM: Transferring from v9_daily_eims
3/1/15 6:12:55 AM: Execution completed successfully
Completed: 385 files, 9.3 GB
Performance: 3,014.8 MB/minute
Duration: 00:03:15 (00:00:04 idle/loading/preparing)

- 3/1/15 6:12:55 AM: Verifying v9_daily_EIMS_last
3/1/15 6:12:55 AM: Execution completed successfully

! 3/1/15 6:12:55 AM: Recycle of Media Set v9_daily_eims
3/1/15 6:13:03 AM: v9_daily_eims was recycled

3/1/15 6:13:03 AM: Script "daily_EIMS_rotate" completed successfully

 

 

A little speculation on my part: Where does Retro come up with the incorrect "385" file count? It so happens that the most recent two backups on the set have file counts that add up to 385 (screen shot enclosed)

 

Thanks for your help,

 

 

post-24400-0-95995700-1425229753_thumb.png

Share this post


Link to post
Share on other sites

Another one failed, too. This is another of the few I set up with "test" settings to recycle the destination set, and recycle the source if successful. I include screen shots of the "bad" and "good" versions. "Good" is the "temp" copy I did without the recycles, and the "bad" is the result of the scheduled copy WITH the recycles, of the same source media set. Nothing touched the set between the two copies.

 

 

post-24400-0-15045200-1425365538_thumb.png

post-24400-0-87178300-1425365549_thumb.png

Share this post


Link to post
Share on other sites

Unfortunately, we still haven't reproduced this internally, so it's unclear. I would hold off on testing again until we can work with you to reproduce it.

Share this post


Link to post
Share on other sites

I will likely move to v12. When I do, I'll keep trying it in a cautious way. With luck, I can figure out exactly what triggers it so it can be fixed.

 

I see the "secret prefs" are available. I might be able to use those to ferret out some useful details.

 

For that mater, I'm already set up in v 11.5.3. I'll keep futzing with that, too. I can't reliably reproduce it either. Every time I try to develop a "test case" it works flawlessly. Frustrating.

Share this post


Link to post
Share on other sites

My media set "rotations" will happen tomorrow night. The way I have it set up, there are two types of rotations. The ones that work reliably, and the ones that display the bug described above.  The scheduled script sequence that (seems to) run reliably looks like this:

 

    reset BKSET_last

    copy BKSET -> BKSET_last (NO recycle options)

    (check that copy worked manually)

    (if OK, reset BKSET manually. If not, manually re-do copy)

 

The scheduled script sequence that shows the stunted copies is below. The step that shows the stunted copy is the media set copy where the "recycle source" and/or "recycle destination" option is used.

 

This is all production/customer data, so I have to be careful with experiments. To "try" this, safely, The following script squence is run on the 1st of each month:

 

    reset BKSET_last_tmp

    copy BKSET -> BKSET_last_tmp

    copy BKSET -> BKSET_last (with dest_recycle option, and source_recycle_on_success option - this is the op that fails)

    (check for size of BKSET_last - if bad, do manual copy of BKSET_last_tmp -> BKSET_last)

 

I don't think I have ever seen the "recycle source on success" fail. The source is always reset.

 

QUESTION: Is there any debugging that I can turn on in "secret prefs" to help figure out what goes wrong? History tells me that I will see at least one instance of this problem tomorrow night, and it would be a great blessing to get this fixed.

Share this post


Link to post
Share on other sites

Yes, you should switch "Log Engine messages" to 6 via Secret Preferences. Let us know how it goes. If there is something happens, please send us the log via a support ticket.

  • Like 1

Share this post


Link to post
Share on other sites

Is this issue addressed by the bugfix in 12.0.1?

 

"Fix for snapshot transfer failing to transfer all necessary BLIB data under certain conditions (#5296)"

Share this post


Link to post
Share on other sites

I hit this again on the Aug 1 rotation with v 12.0.2 (116) One of the three rotations failed. It was a "disk" to "file" copy op.

 

This month I ran an experiment. Those 3 copy ops all had the "recycle target" option, but the "recycle source on completion" was *disabled*.

 

September 1, I will try with a separate op to recycle the target before the copy starts, but turn on the "recycle source on success *enabled*, and see what happens.

 

Given that I have quite a bit of experience doing these media set copies with the separate recycle op preceding the copy, and consider those copies completely reliable, it looks like the "recycle target" is the culprit. My Sept 1 rotations should confirm (and on Oct 1 as well, if it all runs without error)

Share this post


Link to post
Share on other sites

For what it's worth, I have changed my scripts so that none of them uses the "recycle" option at start of the copy. Since this change, I no longer see the error on any of my rotations. This is strong evidence that the bug is not linked to recycling the source at completion time, but linked to the recycle operation at the start of the script.

 

I plan to convert a few more scripts to operate this way - with a separate script doing the recycle op of the target, and enabling the "recycle source on completion" option. If this changed set of scripts continues to run error free, at least the option causing the problem will be clear. It's the "recycle target at start" op that somehow cripples the target of the copy.

Share this post


Link to post
Share on other sites

I can confirm that my testing has run cleanly for the last new months. It appears that the "recycle target" option at start of script causes the problem.

 

This means that I can do what I want by setting up two scripts - one to recycle the target media set, and then another to actually do the copy.

 

Any progress on getting this fixed?

Share this post


Link to post
Share on other sites

Great to hear that the new strategy is running well! We did identify a long-standing issue (#2336 in the Release Notes) relating to how much data was copied for "Copy Media Set" scripts and fixed it in Mac 13.0, but it's unclear whether it is the exact same issue that you were having.

Share this post


Link to post
Share on other sites

Given my difficulty in the past to reproduce this on command, verifying the fix would be non-trivial. What is clear is that if I stay away from the "recycle" option on the target media set, things appear to work.

 

The bad part is that what I want to do is "rotate" my media sets this way. At the end of month, I want to copy the current set to ->> "last" set and recycle the current set. The combination of the "recycle target" option and "recycle source when done" options does just what I want. Unfortunately, as it stands, if I use both of those options, the source gets recycled, and the target (copied) set ends up truncated and useless - sometimes. I lose BOTH sets - randomly. I can't take that risk, so I have been running with multiple scripts to zero out the "last" target, zero out a "safety" target, copy from current to the safety target, copy from current to "last" target (with potentially dangerous options), then go back later to recycle the "current" set, which usually has had at least one backup run on it since the last copy, making comparisons for the copy suboptimal. As it stands, I think I can do my rotations with two scripts, rather than 4, but it would be so much better if I could do it - reliably - with one.

 

Thanks for your help.

Share this post


Link to post
Share on other sites

i need to upgrade to Retro 13 so I can see if this is fixed. I have installed Retro 14-Windows at one of my clients. I may have an opportunity to try it there.

(No good deed goes unpunished - once you get it working reliably, no one ever upgrades again. ;-> )

Share this post


Link to post
Share on other sites

i need to upgrade to Retro 13 so I can see if this is fixed. I have installed Retro 14-Windows at one of my clients. I may have an opportunity to try it there.

(No good deed goes unpunished - once you get it working reliably, no one ever upgrades again. ;-> )

 

 

I'm getting a bit confused as to which Retrospect versions you are referring to.  I suspect you mean you need to upgrade to Retrospect Mac 13—which is one major release behind the latest—in order to see if this is fixed, and that you have installed Retrospect Windows 12—there won't be a Retrospect Windows 14 for two more major releases—at one of your clients.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×