Proactive job scheduling

Jan Löwe · September 24, 2020

Hi,

Just went back to Retrospect after many years, to backup about 30 clients. The new ProactiveAI is great because it makes good use of the capabilities of the backup server machine and its fast network connection by allowing parallel backups from much slower clients.

Now, when looking at what is happening, the scheduler seems to make a number of jobs active and then waits until ALL those jobs are finished, before starting another batch. That wastes a lot of time and resource and has the effect that if one client takes days to finish, everything else is held back. If doing a complete backup once in a while, it takes many days for all clients to be backed up for the first time (while others get incrementally backed up in the meantime).

To me that's poor behaviour of the scheduler that should make jobs active whenever required AND possible, and should prioritise those clients that have never been backup up to the current set.

Many thanks for your thoughts!

Jan

DavidHertzberg · September 25, 2020

21 hours ago, Jan Löwe said:

Hi,

Just went back to Retrospect after many years, to backup about 30 clients. The new ProactiveAI is great because it makes good use of the capabilities of the backup server machine and its fast network connection by allowing parallel backups from much slower clients.

Now, when looking at what is happening, the scheduler seems to make a number of jobs active and then waits until ALL those jobs are finished, before starting another batch.....

....

....

Jan

Jan Löwe,

First let's get some preliminary questions out of the way:

How long ago did you last use Retrospect? Was it before 2009, when Retrospect Mac was resurrected with a different User Interface from Retrospect Windows after being end-of-lifed?

Are you running Retrospect Windows on your "backup server" machine, or Retrospect Mac? The "backup server" Engines have common code, but the User Interfaces have different terminology and look different. (An old-timer in Sales says a key member of the team creating Retrospect Mac 8 was very bossy, and insisted that the old terminology and Graphic UI had to go. Mostly because Retrospect Mac 8—having been developed in a management-imposed rush—was very buggy and incompatible with older machines and existing backups, it got such a bad industry reputation that the Retrospect team decided not to make the corresponding changes in the Retrospect Windows terminology and UI.) Because you posted in the Windows—Professional ("Professional" as distinct from "Express"—a simplified Retrospect application for Windows that is no longer sold) Forum, I'll assume your "backup server" runs Retrospect Windows and use that terminology—which is a slight problem for me since I use Retrospect Mac.

Precisely what version of Retrospect Windows or Retrospect Mac are you running? You should upgrade ASAP to 17.5.0.x.

When you say "parallel backups", do you mean from different scripts? "jobs" isn't a Retrospect term, and an individual Retrospect script cannot backup multiple "clients" in parallel—unless the script's destination is a Storage Group instead of a Backup Set.

Are you using the Remote Backup feature? It was developed in 2018 for use by a centralized organization with a few employees/contractors working in scattered far-away places, but has been recently hurriedly adopted by centralized organizations that have sent many employees off to Work From Home nearby. The implementation of Remote Backup has a kludge that didn't matter much in 2018; IMHO you shouldn't use it if you don't have to.

DavidHertzberg · September 25, 2020

On 9/24/2020 at 11:48 AM, Jan Löwe said:

Hi,

.... The new ProactiveAI is great because it makes good use of the capabilities of the backup server machine and its fast network connection by allowing parallel backups from much slower clients.

Now, when looking at what is happening, the scheduler seems to make a number of jobs active and then waits until ALL those jobs are finished, before starting another batch. That wastes a lot of time and resource and has the effect that if one client takes days to finish, everything else is held back. If doing a complete backup once in a while, it takes many days for all clients to be backed up for the first time (while others get incrementally backed up in the meantime).

To me that's poor behaviour of the scheduler that should make jobs active whenever required AND possible, and should prioritise those clients that have never been backup up to the current set.

Many thanks for your thoughts!

Jan

Jan Löwe,

Proactive scripts were developed in the late 1990s for AFAIK the following use case: Sally SuperSalesperson visits the central office irregularly on a non-predictable schedule. You as administrator want to do a Normal incremental backup of Sally's laptop whenever she cables it up (or connects via WiFi) at the central office, so that the organization won't lose her valuable data if her laptop is stolen or is destroyed in a car crash. The organization has many female and male SuperSalespersons, each with his/her own schedule for central office visits—schedules which tend not to overlap. It is assumed that Sally's laptop was given a New Backup Set backup via a Backup script after the initial organization software needed for her position was installed, before her boss first handed it to her.

So why are you "doing a complete backup once in a while" via a Proactive script? A Retrospect "backup server" can run a maximum of 16 scripts in parallel "execution units" (page 211 of the Retrospect Windows 17 User's Guide and page 166 of the Retrospect Mac 17 UG still say 8 because they haven't been updated as of November 2012), and that's on a machine that has 20GB or more of RAM. Have each employee with a new "client" machine leave it at the central office overnight, so you can use it as a Source for a scheduled Backup script with the New Backup Set action to its own backup set—which you will then copy via a Transfer Backup Sets operation (pages 132-134 of the Retrospect Windows 17 UG, pages 118-120 of the Retrospect Mac 17 UG for Copy Media Set) to the backup set which will be the destination for its Proactive script with the incremental Normal action .

If you read "Algorithm" step 7 in this Knowledge Base article, you'll see "Sources with faster previous backups will be backed up sooner than sources with slower previous backups." That's a feature of the 2018 "ProactiveAI" decision tree, and it explains why you complain "it takes many days for all clients to be backed up for the first time (while others get incrementally backed up in the meantime)." Here's what to do if you want to attempt to get this feature changed, but I can assure you that the Retrospect engineers went to a lot of trouble to change the late-1990s scheduler algorithm that did have the prioritizing you want. You could try asking for an option to invert algorithm step 7, but who besides you would want it?

BTW, in my simple home installation I use only Backup scripts; I just boot the "backup server" to backup my single "client" machine at or after 3 a.m..

Nigel Smith · September 25, 2020

5 hours ago, DavidHertzberg said:

When you say "parallel backups", do you mean from different scripts? "jobs" isn't a Retrospect term, and an individual Retrospect script cannot backup multiple "clients" in parallel—unless the script's destination is a Storage Group instead of a Backup Set.

To expand on David's comment, because I think he's hit the nail on the head...

You can only run one backup at a time to a Media Set. If you want to parallelise you must have multiple Media Sets and distribute your clients across them. I do this, putting each "departments'" computers into their own Group, making a Disk Media Set for each "department", then making a Proactive script for each with the appropriate Group as Source and Media Set as target.

A Storage Group is, in essence, multiple Media Sets (one per client/volume) in a wrapper -- similar to above but with Retrospect doing the hard work and presenting you with a single UI element to use in your operations. So a single Proactive script can back up up to 16 clients in parallel to a Storage Group.

There are pros and cons to both approaches -- which you should use depends on your situation. You can even use both, for example multiple sets/scripts for local desktops and Storage Groups for Remote clients.

mbennett · September 25, 2020

In addition to what Nigel and David have mentioned above, there is a tunable in Preferences | General that limits the number of executions that can run at once. I think 16 is the default, but if you have the resources you can bump it up.

DavidHertzberg · September 25, 2020

Jan Löwe,

When I finished writing the first of my two up-thread posts, I still thought your problem was in not having enough "execution units" to provide sufficient parallelization of Proactive "client" backups. That accounts for my next-to-last paragraph in the post. However. as I started writing the second of my up-thread posts, I decided your fundamental problem is doing from-scratch backups of "client" machines via Proactive scripts that also do Normal backups of other "clients". Let's consider this in terms of what Nigel Smith said up-thread:

Quote

A Storage Group is, in essence, multiple Media Sets [Retrospect Mac term for Backup Sets] (one per client/volume) in a wrapper -- similar to above but with Retrospect doing the hard work and presenting you with a single UI element to use in your operations. So a single Proactive script can back up up to 16 clients in parallel to a Storage Group.

The maximum—not just the default per mbennett—number of "execution units" that can run simultaneously on one Retrospect "backup server"—with 20GB RAM—is 16 , whether the "execution units" are from explicitly-separate scripts or in-parallel "execution units" implicitly generated by using a Storage Group as the destination for a Proactive script—as Nigel Smith suggests. (Page 325 of the Retrospect Windows 17 User's Guide and page 166 of the Retrospect Mac 17 UG still say 8, but the fix for bug #1366 shown in the cumulative Retrospect Windows Release Notes shows that had been increased to 16—which I've confirmed in my Preferences ->General for Retrospect Mac 16.6—by November 2012.) Your OP says you have to back up about 30 "clients", and when I attended primary school 30 was greater than 16—so there's no way you can back up all your "clients" at once.

As I said in the third paragraph of my second up-thread post, a fundamental part of the ProactiveAI decision tree algorithm is that "Sources with faster previous backups will be backed up sooner than sources with slower previous backups." I don't know whether a Proactive script that has completed one of its incremental "client" backups will release the "execution unit" for that backup while a from-scratch backup using the same Storage Group destination continues. I suspect it will, but you're going to have to test that out with a couple of Proactive script runs that cover all 30 "clients" using the same Storage Group destination. It turns out it should but it won't; see this down-thread post.

If it works, you can have one Storage Group for all 30-odd "clients", and not run—as I suggested in my second up-thread post—separate scripted Backup runs followed by Transfer Backup Sets operations for the from-scratch backups. However be aware that, to avoid a confusing GUI mess, the Retrospect engineers required in 2018 that a Storage Group be defined with its first Member for all component Backup Sets on a single HDD or cloud equivalent. At least one administrator posting on these Forums has reported a problem with the first Member for one component Backup Set apparently prematurely running out of space. It's not yet clear whether one can define a second Member for an entire Storage Group, or whether one can do this for an individual component Backup Set (which isn't yet an option for a Retrospect Mac "backup server", because the engineers—before their new StorCentric masters diverted them to other tasks in Fall 2019—evidently didn't have time to enhance its GUI to enable accessing an individual Media Set—the Retrospect Mac term for Backup Set—component of a Storage Group). If one can't, you may have to Transfer Backup Sets off Storage Group component Backup Sets and re-initialize the components; that would be messier than what I suggested in my second post up-thread.

Nigel Smith's quote also says that a Storage Group creates a separate component Backup Set for each volume attached to each "client" machine—I've verified that. So if some "clients" have more than one volume attached, you'll have more than 30-odd component Backup Sets in your Storage Group.

Edited September 29, 2020 by DavidHertzberg
Revise next-to-last paragraph for clarity, including adding additional sentence. Add sentence to third-from-last paragraph.

Nigel Smith · September 25, 2020

16 hours ago, mbennett said:

In addition to what Nigel...

I think I've actually mis-represented things, although it's the behaviour most people will observe most of the time.

Really it's that "a catalog or media set can only written to by one process at a time". So if you had a single Proactive script targeting two or more media sets (sometimes done so you can rotate sets by alternating which is online) or multiple scripts using the same catalog, only one client could be backed up at a time.

Again, Storage Group catalogs behave like one catalog per client/volume (which is why there's no file level de-dup across clients [or volumes on the same client? I've not checked that]) which are presented as a single catalog for search, UI interaction, etc. Even if the catalog is just one file, internally it's probably a database with a table per client/volume.

On 9/24/2020 at 4:48 PM, Jan Löwe said:

To me that's poor behaviour of the scheduler that should make jobs active whenever required AND possible, and should prioritise those clients that have never been backup up to the current set.

The observed behaviour...

Edit: Oh dear, that'll teach me to keep up with the latest changes! See David's posts and links for the correct description of how ProactiveAI now works. But I'll leave this here, both as a monument to my own stupidity and because of the bit about single script/media set blocking.

...is that the ProactiveAI builds a list of clients to back up, based on the sources in currently active Proactive scripts, ordered by "least recently backed up". It works its way down that list, and starts to back up the first available client. If you only have one Proactive script and that doesn't use a Storage Group, it'll wait for the backup to finish before trying the next client in the list. But if you have multiple Proactive scripts using different media sets or your single script targets a Storage Group, a second process will start and try the next client in the list, and so on.

Clients "bubble down" the list after they are backed up so, with each iteration, the list remains ordered by "least recently backed up".

So it does what you want, if you have multiple (and correctly set up) Proactive scripts or use a single script and Storage Groups. Perhaps the only thing that is missing (at least, I've never observed it) is that if ProactiveAI is half-way down the list it doesn't jump back up to a higher-listed client if it becomes available -- the client will have to wait for the next loop round to get to it.

On 9/24/2020 at 4:48 PM, Jan Löwe said:

If doing a complete backup once in a while...

Complete backups should be a pretty rare event (and, ideally, done over the local network) -- Retrospect is very good at restoring from even a long series of incrementals in one go (unlike some other software where you do a restore, then overlay incremental 1, then incrementals 2, 3, 4...100). If you want to do it as part of "set management", have a look at the various transfer options -- instead of doing a (slow) complete backup you can copy the latest snapshot and data from old set to new, then start doing incrementals to that new media set.

But, in these days of remote working, getting a first complete backup can be a problem. You could use a separate script/media set for that and then transfer that backup to your "working" media set, so you didn't block other clients for the duration. Or you could use the same media set but a different script with an "only backup data newer than..." filter which you steadily roll back over a couple of weeks -- so you always back up the latest data and, after a few days, you'll have the rest too, with much-reduced impact on your other clients.

Retrospect is very flexible, with plenty of options from which you can chose what will work best for your situation.

Edited September 26, 2020 by Nigel Smith
My own stupidity

DavidHertzberg · September 26, 2020

Jan Löwe,

This post will be the last of my thoughts on your problem, unless and until we get further information from you. That information must include how many Proactive scripts you currently have scheduled, and what those schedules are.

What you are asking in your OP is whether we agree that "that's poor behaviour of the scheduler that should make jobs active whenever required AND possible, and should prioritise those clients that have never been backup up to the current set." I don't agree, because I think the use case for Proactive scripts described in the first paragraph of my second post in this thread has satisfied the needs of many administrators for over 20 years. You OTOH have a different use case that prioritizes clients that have never been backed up to the current Backup Set. I've explained in the remaining paragraphs of my second post how you can achieve that prioritization using separate scheduled Backup and Transfer scripts, and I've explained in my third post in this thread how you might be able to achieve that same prioritization using Storage Groups—despite their limitations.

What you may not fully appreciate, because the explanation on pages 197-198 of the Retrospect Windows 17 User's Guide is unclear (unlike page 108 of the Retrospect Mac 17 UG), is that the longest you can schedule a Proactive script is from midnight to midnight on a particular day. A daily Proactive script starts again with a new "backup window" on its next scheduled day, where—per "Algorithm" step 7 on page 191—"Sources with faster previous backups will be backed up sooner than sources with slower previous backups." The "Algorithm" sub-section (which is a duplicate of the Knowledge Base article I have linked to in up-thread posts) doesn't explain what priority is given to a brand-new source on its first Proactive backup. However the section lead on page 190 says "With ProactiveAI, backup scripts will optimize the backup window for the entire environment based on a decision tree algorithm and linear regression to ensure every source is protected as often as possible", which clearly means per step 7 that a new source that has been backed up at least once for several hours will get lower priority than an old source whose previous Normal incremental backup took a shorter time.

That's why per your OP "it takes many days for all clients to be backed up for the first time (while others get incrementally backed up in the meantime)". As I said in the third paragraph of my second post in this thread, you could file a Support Case asking for an option that would implement "Sources with faster previous backups will be backed up later than sources with slower previous backups." But I doubt the feature request will be accepted, for the reason I've given in the second paragraph of this post.

Nigel Smith · September 26, 2020

2 hours ago, DavidHertzberg said:

The "Algorithm" sub-section...

Thanks David -- I hadn't kept up with the changes, and have edited my post accordingly. Feel free to laugh -- "That Nigel, he's so version 16..." 🙂

Jan Löwe · September 26, 2020

Wow, so many replies already. Many thanks to all for that!

And now I need to apologise: I posted to the wrong group.

I am using Retrospect 17.0.2 (101), single server, unlimited clients, on a 2019 MacPro 16 cores, 32 GB RAM, 2 10 GBit network adapters, running MacOS Catalina. All clients use the latest Retrospect client software and are either MacOS (10.12-10.15) or Windows (ahem, XP-10). The backup goes to a single Disk Storage Group media set on a 2 PB EMC Isilon NAS, using a MacOS SMB mount point on the MacPro.

I agree that full backups are rare but the underlying issue still baffles me:

When my two ProactiveAI scripts (one for PCs, one for Macs, enabling the use of different rules) reach the time they are meant to do some work, they go through the decision tree (AI a big word for that) and schedule a number of Activity Threads (8-10 normally it appears, up to 14 allowed in Preferences). The Retrospect server engine runs in the background as a single MacOS multithreaded process (800-1000 %CPU) and does its work very quietly. So far so good. My issue is what happens next. The engine will wait until ALL Activity Threads have finished doing whatever they need doing. Only then, it seems, does the process look again if anything else needs doing by going through the decision tree. This looks like a simple loop to me rather than true parallelism inside the engine. The engine could/should check periodically if the decision tree allows any new Activity Threads to be started.

It is a simple complaint. If all backups take about the same amount of time, then this is not a huge issue. If they don't, very significant deviation from optimal performance occurs.

Anyway, so far I am glad that the backup seems to be running smoothly and the grooming feature (again not a great word) will hopefully create a steady state situation in terms of backup storage footprint that does not need much operator input.

Sorry again for the faux pas and thanks for all the help,

Jan

DavidHertzberg · September 26, 2020

Think nothing of it, Nigel Smith. 😄 My preceding post started out as a correction of your preceding post, but then I realized that Jan Löwe's fundamental problem seemed to be (he's just made a second post) that he didn't understand the use case for Proactive scripts—whether following the pre-2018 algorithm or the "AI" (I agree with him that "AI" is a bit of a naming stretch for what is really an "expert system") algorithm. I'm running 16.6.

Jan Löwe,

Don't worry about posting in the wrong Forum; over the years many Retrospect Mac administrators have been confused by the "Professional" title for this Forum. 🤣 I'll change the User's Guide links in my previous posts to also reference the Retrospect Mac 17 UG page numbers.

If your two Proactive scripts are each spawning multiple Activity Threads, then their destinations must be Storage Groups. What you don't yet realize is that Storage Groups, with their current limitations, are basically a kludge intended to save administrators from the bookkeeping that would be required for creating multiple Proactive scripts with explicitly-separate (to avoid conflict) Media Set destinations. AFAICT the use of a Storage Group currently doesn't create any "parent" Proactive thread that is looping while keeping tabs on the progress of the "child" Proactive activity threads; the looping is in each "child" thread—but that has AFAIK only one Source machine-volume so the looping is only theoretical. It would be wonderful if there were such a "parent" thread, but that new feature—which your Support Case should suggest—would require inter-process communication that would AFAIK be different for Retrospect Mac vs. Retrospect Windows (challenging the common-code-since-2009 foundation of the Engine). I've been told that the Retrospect "Inc." staff is now busy with meetings centered on StorCentric management's requirements, so don't expect that feature to be added soon. The third sentence in this paragraph is wrong; the Proactive script itself is supposed to be the "parent" thread—with no looping in the "child" thread, but there's an apparent bug in the "parent" looping—see this down-thread post.

Therefore IMHO what you should do is to go through the bookkeeping of creating multiple scripts. Some Proactive scripts can be for "client" machines that you know have been backed up before—possibly for "departmental" or "arrival-spread" subsets, and may use a Storage Group destination to enable parallel execution—which the Engine will keep track of but without looping. OTOH each "new client" machine should have its own separate script with its own separate destination Media Set, and IMHO that should be a one-time-scheduled Backup script—with a Recycle media action—that can keep running over a daily boundary until it finishes. After it finishes, you can run a Copy Media Set operation to copy the Media Set into the Storage Group; that run can overlap with other script runs using the same Storage Group for other "client" machines, and can then re-use the same Media Set with a Recycle media action for your next "new client" Backup. My warnings in my third post in this thread about Storage Group limitations continue to apply, especially for Retrospect Mac—where the GUI for accessing Storage Group component Media Sets doesn't exist yet.

BTW, are you in real life the director of a laboratory in Cambridge UK? (Or did you just pick his name as a Forums "handle"?) Just asking because it's helpful to know something about an administrator's installation; I've been quite open about my own. Nigel Smith's installation may resemble yours.

Nigel Smith · September 27, 2020

22 hours ago, Jan Löwe said:

The engine will wait until ALL Activity Threads have finished doing whatever they need doing.

My v17 tests wouldn't have been severe enough to bring about this situation (testing a work server and simultaneous multiple Remote clients all at your house is a sure way to cripple your home broadband, so I had to limit the time it ran!).

Previous versions did what you want, but that was using multiple scripts/sets rather than a Storage Group. In your setup -- if, say, the Mac script is backing up a Mac, does the PC script continue after it's finished a client? IE, is the Engine waiting until all execution threads have completed, or all execution threads for a script? And you may get some clues by setting the Engine Log Level to 5 (see p49 of the manual and this KB article).

Agreed that it's annoying if it isn't behaving optimally. OTOH, with the resources you've got, non-optimal performance shouldn't be an issue once you're doing incrementals (unless you've Remote Clients with significant amounts of churn).

23 hours ago, Jan Löwe said:

grooming feature (again not a great word)

AFAIK that was used in the context of backups long before the word gained its current "social" connotation. And it's a good word -- "pruning" is chopping off the oldest data, while "grooming" is cleverer and can use complex rules to remove unwanted data from any place in the set.

Jan Löwe · September 27, 2020

Thanks David and Nigel.

I see, using more scripts is the way forward. And thanks for the warnings about Storage Groups. They do look neat and tidy on the NAS but I get the point. I will try doing what David suggested once everything is fully stabilised. I will also look at the engine log level 5.

Yes, that's me. I shouldn't be dealing with stuff like workgroup backups myself (and I don't for the building's other ~800 machines) but it is a welcome distraction from less concrete and useful things that I seem to do normally 🙂

Jan

DavidHertzberg · September 27, 2020

Nigel Smith,

First, I don't know what you mean by "Mac script" and "PC script" in Jan Löwe's situation. He says all his scripts are running on a macOS Catalina "backup server", and there's nothing that says a single Retrospect script can't backup "clients" of both OS varieties. In fact from 2001–2004 I used to include as a "client" the drive on my Windows 95 "grey box" PC, which my bosses' boss had forced on me for work-from-home use so I wouldn't infect the rest of the office with a recurrence of shingles (his mother was Native American, and although he was a graduate of New York University he had certain non-modern health concepts), in a weekly Recycle Backup script—running on my Mac "backup server"—that also included my wife's and my Mac "clients". Although I don't use Proactive scripts, I'd expect that to apply to them as well.

Second, I don't think anything in the Engine's handling of multiple activity threads—execution units in Retrospect Windows parlance—has changed with the introduction of Storage Groups. Because Engine multi-threading was introduced in Retrospect Windows 7, here's a quote from page 161 of the Retrospect Windows 7.5 User's Guide (belatedly copyrighted in 2011)—whose explanation is simpler than in later editions:

Quote

NOTE: The software allows up to 8 [16 since at least November 2012] concurrent executions, provided the computer has enough memory and backup devices to support such a configuration.

When you're using multiple execution units [Retrospect Windows term for activity threads], you can run multiple operations at the same time. If you start more operations than there are available execution units, the additional operations are placed in a “Waiting” queue until an execution unit becomes available. See "Waiting Tab" on page 156.

NOTE: Proactive Backup scripts and User Initiated Restore operations do not go into the waiting queue. They only launch when an execution unit (and other required resources) is available.

IMHO the Notes would apply to a Proactive script whose destination is a Storage Group. Such a script simply attempts to launch a "child" Proactive script for each client-volume in the script's Sources; if there isn't an available activity thread, the corresponding "child" single-source Proactive script's launch goes into the "Waiting" queue—but only if the second Note doesn't apply). When such a "child" script has backed up its single Source, what's left for it to loop on? One solution would be if a multi-Source "parent" Proactive script were also running—looping while monitoring completion of the single-Source "child" Proactive scripts, but I can see nothing in pages 37–45 of the Retrospect Mac 17 User's Guide that says that feature exists yet. Don't expect it to be added soon. It won't be needed if the second Note quoted doesn't apply to a "child" Proactive script; Jan Löwe should test this. The feature does exist already, but has a bug; see this down-thread post.

Third, I agree with you about the aptness of the Retrospect term "grooming"—which appears in the Retrospect Windows 7.5 UG and thus can have originated no later than 2007. However I think Jan Löwe's objection is similar to the reason the bossy Retrospect developer had for banning the use of the term "snapshot" in the Retrospect Mac 8 UI. Both terms have acquired other more-popular connotations, in criminology for "grooming" and in Computer Science for "snapshot". Welcome to the wacky world of English language development. 🤣

Edited September 30, 2020 by DavidHertzberg
Add sentence at end of next-to-last paragraph.

Nigel Smith · September 28, 2020

10 hours ago, DavidHertzberg said:

First, I don't know what you mean by "Mac script" and "PC script" in Jan Löwe's situation.

Jan's running a script to back up Macs and another to back up PCs, so he can separate the different rules for each.

With his setup, we'd expect RS to start backing up the first 8-10 clients and, when the first of those is finished, it to go looking for the next available and start on that. It should maintain 8-10 parallel executions for as long as there are clients in need and available.

He appears to be seeing RS start to back up the first (say) 8 clients and, when the first of those is finished, it carries on with the remaining 7. Then 6, 5...1, and when that last finishes it goes looking for the next batch of available clients.

That sounds like a bug, and it would be a useful data point to know if that is happening across the whole Engine (Mac script is waiting because PC script is still backing up a client) or is per script (Mac script reloads next 6 when it finishes the last Mac, even while PC script is still backing up its last scheduled client). I suspect it's the former, both from Jan's description and my own vague sense of how things work, but if it's the latter that does suggest an easy workaround until the problem is fixed.

DavidHertzberg · September 28, 2020

Nigel Smith,

You're right on both counts; I didn't read Jan Löwe's second post in this thread anywhere nearly thoroughly enough 😢 , probably because he posted it while I was writing a post myself. Please accept my apologies.

However it appears that the analysis in the second point of my most-recent post isn't a total waste; the paragraph below the quote pinpoints what the bug is. When the destination of a Proactive script is a Storage Group, the "backup server" should immediately generate all "child" Proactive scripts—placing them into the "Waiting" queue in defiance of the second quoted Note. That way each "child" script for which there is initially no available activity thread would start to execute as a preceding "child" script finished executing—making an activity thread available.

Jan Löwe,

Please ASAP do the test Nigel Smith suggests in his last paragraph of the post immediately above this one, and submit a Support Case. It's now clear that it's really for a bug in an existing feature, so here's how to do that. I'd submit it myself, except that my past experience has shown that Retrospect Tech Support will ask the person who submitted the Support Case to test at least possible bug fix—and I no longer have enough "client" machines in my installation to do that. All you need to do for the Problem Statement is to copy the longest paragraph in your second post in this thread, and then append the result of that further test as an Additional Note.

Nigel Smith · September 28, 2020

29 minutes ago, DavidHertzberg said:

When the destination of a Proactive script is a Storage Group, the "backup server" should immediately generate all "child" Proactive scripts—placing them into the "Waiting" queue in defiance of the second quoted Note.

I don't think that's what is being said.

If I set up a bunch of scheduled scripts at the same time, the first kicks off and the others go into the waiting queue.

If I set up a bunch of Proactive scripts (note: scripts) they don't go in the queue. The note doesn't make it clear, but observed behaviour is that P-AI uses the source lists in all the scripts to generate the P-AI source list -- I don't know what happens if you are already running on all available units and another script starts (do the sources get added straight away, do you have to wait for a unit to finish before the P-AI list is regenerated [and, if so, does that take priority over skipping to the next client?]).

There's obviously something hinky in what Jan's seeing, but I think we're second-guessing at this point.

DavidHertzberg · September 29, 2020

Nigel Smith,

Sorry, per further reading the second paragraph in my preceding post proposes a wrong solution. Circumstances of my making the post are personally embarrassing; I did so early on the morning of Yom Kippur, when in other years I would have been on my way to a synagogue. However this year—due to the COVID-19 pandemic—I had to participate in services by watching YouTube live feeds at home, and—having made my preceding post just before the start of the Sunday evening service—I couldn't resist the temptation to take a quick look at this Forum before the start of the Monday morning service—and to post a quick reply while you and Jan Löwe would see it. Now I've got another sin to repent of, which is hastily-written reply posts.

On 9/26/2020 at 8:01 AM, Jan Löwe said:

....

....

....

I agree that full backups are rare but the underlying issue still baffles me:

When my two ProactiveAI scripts (one for PCs, one for Macs, enabling the use of different rules) reach the time they are meant to do some work, they go through the decision tree (AI a big word for that) and schedule a number of Activity Threads (8-10 normally it appears, up to 14 allowed in Preferences). The Retrospect server engine runs in the background as a single MacOS multithreaded process (800-1000 %CPU) and does its work very quietly. So far so good. My issue is what happens next. The engine will wait until ALL Activity Threads have finished doing whatever they need doing. Only then, it seems, does the process look again if anything else needs doing by going through the decision tree. This looks like a simple loop to me rather than true parallelism inside the engine. The engine could/should check periodically if the decision tree allows any new Activity Threads to be started.

It is a simple complaint. If all backups take about the same amount of time, then this is not a huge issue. If they don't, very significant deviation from optimal performance occurs.

....

....

....

Step 2 of The Algorithm's decision tree says:

Quote

2. Verify an execution unit is available: ProactiveAI only runs when an execution unit is available.

It looks to me as if the looping in the algorithm is not going back to Step 2 after each successful backup, maybe as an unintended (or trade-off?) result of the "10x Faster ProactiveAI" enhancement in 17.0.0.149. Here's The Algorithm's last step; "moves on" should mean looping back to Step 2:

Quote

10. Record next backup date: After a successful backup, Retrospect marks the next backup date for the source and moves on. As discussed earlier, this future date varies based on the script’s schedule.

If so, this is some kind of loop-back bug, possibly fixed in Retrospect Mac 17.5.0.185—the following bug fix sounds like it's to the same piece of code:

Quote

ProactiveAI: Fixed issue where backups could run even when script is inactive (#8739)

Jan Löwe,

Please upgrade your "backup server" to Retrospect Mac 17.5.0.185, rerun your previous test, and then—assuming you get the same erroneous results—submit a bug-fix Support Case. All you need to do for the Problem Statement is to copy the longest paragraph in your second post in this thread, and then append the result of that further test as an Additional Note.

Jan Löwe · September 30, 2020

thanks again for all the help and suggestions. I have now upgraded to 17.5 and still same behaviour. Will submit a bug-fix support case as suggested by David.

Jan

Nigel Smith · September 30, 2020

I assume it's the "!"s that are the problem?

If you click on one and check the "Status" in the "Summary" tab, what does it tell you? Might help to see the "Source" and "Destination" columns too, if that isn't too secret-squirrel to post.

DavidHertzberg · October 8, 2020

Jan Löwe,

Retrospect Windows 17.5.1.102 was released today. Its very-informative 🤣 entire entry in the cumulative Release Notes is:

Quote

Engine

Backup: Fixed issue that prevented simultaneous operations to storage groups (#8893)

Is #8893 the bug number assigned to the Support Case that you submitted about Proactive job scheduling?

P.S.: When I looked again at the cumulative Release Notes, they gave me an option to send a message. The first e-mailed reply, from "Jordan Shattuck", said

Quote

The issue was experienced and reported by multiple people. It was then reproduced internally and fixed. You can view the release notes using the link below [which was of course to the cumulative Release Notes].

After I replied to that e-mail, pointing out that I had initially read the Release Notes and describing your prominent position, I got back another e-mail saying

Quote

I have forwarded this on to our support team so they can assist with the information requested. They will be reaching out.

I strongly suspect that "Jordan Shattuck" is actually a bot. Just what I needed, StorCentric! 🙄 Real people may reach out Monday.

DavidHertzberg · October 12, 2020

Jan Löwe,

The head of Retrospect Tech Support did indeed reach out today, at what would be 5:24 a.m. California time. The second paragraph of his reply is:

Quote

Bug 8893 originally had the title "Running multiple executions at the same time to a storage group results in waiting executions". This bug was discovered internally during QA testing and not reported originally by individual customers, although it does apply to real life issues reported by users.

That sounds as if your bug was the one fixed in Retrospect Mac and Windows 17.5.1. As to who at Retrospect "inc." changed the bug title to be less informative, I'd better not speculate. I'd also be better off not speculating as to why QA testing supposedly (but see Jordan Shattuck's first e-mailed reply in my post directly above) discovered this bug after the 23 September release 17.5.0. However there is an Engine preference that specifies the maximum number (up to 16) of activity threads that can be running in parallel. So all an engineer had to do as an alpha-test is to set this preference to 3—leaving 2 for source "client" machine-drive combinations along with the thread for the Proactive "controller"—and then submit a script with 3 source "client" machine-drive combinations. Proper alpha-testing, including of the 17.0.0 "AI" speedup, should have caught this bug earlier.

Everybody,

The first paragraph of his reply is:

Quote

Please remember that the forum really is not an official method to reach technical support. If individual users of the forum have questions about bug fixes, they can contact support directly to get direct answers.

As stated in the post directly above, I originally contacted Retrospect "inc." about bug #8893 by using the messaging facility newly made available to readers of the cumulative Release Notes—not via this Forum. All subsequent communications were via e-mail. So I guess the head of R. T. S. is annoyed that I mentioned my communications with R.T.S. on this Forum. Be warned.

Also be warned that my sending a message from the cumulative Release Notes web page resulted in "Jordan Shattuck" creating a separate Support Case containing my original message and the subsequent e-mails. Maybe that's the only way he/she has of communicating them to R. T. S.. OTOH hitting option 7 on the Retrospect phone line and typing in "Shatt" elicited a statement that no one whose last name starts with those letters is listed, so I guess Inside Sales Manager Jordan Shattuck is considered too junior—maybe because he still has hair 🤣—to have a separate phone extension.

Edited December 20, 2020 by DavidHertzberg
Add three sentences to first long ¶, describing how _proper_ alpha-testing _should_ have caught this bug earlier. In 3rd ¶, Jordan Shattuck is a person—not a bot. In 1st ¶, Shattuck's first e-mail belies the head of R.T.S's reply.

Proactive job scheduling

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

ProactiveAI: Fixed issue where backups could run even when script is inactive (#8739)

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Engine

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation