Jump to content
Doug_M

NAS Volumes Disconnect During Scan

Recommended Posts

Greetings!

Running Retrospect Server 16.6.0 (114) on a MacPro trashcan updated to Catalina 10.15.3.

Currently there are no 'client' machines, I'm simply backing up 5 shared volumes on a Synology RS2414rp+ (also current DSM) to LTO-6.  I've added the shares via Sources using 'afp://myFileServer.local/Share_Name', and set it to log in using a registered user with admin privileges.  The shares all show up in Sources with the green status indicator.  Backup scripts are set up to run after hours, and I've got a 5-member media set, two blanks, and a cleaning tape in my 8 slot robot.  All Good, right?  Nope.

 

Shortly after a script starts, Retro begins scanning.  Then it hangs and I get a macOS system alert saying the server is shutting down.  After that, my Retro log fills with 'file not found' errors and nothing gets backed up.  Checking the Sources pane after this, the target volume shows as disconnected.  Status goes green again via either Browse button or Locate.

 

Any ideas?

Thanks in advance!

Share this post


Link to post
Share on other sites

Doug_M,

In regard to Catalina, you need to read this Knowledge Base article and also this one.  Note that the first KB article, in its last multi-sentence paragraph, erroneously links to the Mojave version of the second KB article rather than the Catalina version—which it should have (except that somebody on Retrospect "Inc."'s. august Documentation Committee didn't coordinate what he wrote in the first article with somebody else who was rewriting the second article for Catalina).

I'd guess your problem is covered under "Technical Details" at the bottom of the second Knowledge Base article I linked to in the first sentence of this post.  That article says Network Volumes are among the "folders" [the article's term—originally less sloppy when the first KB article article was originally written to apply to Mojave] "that require 'Application Data Privacy' ('Full Disk Access') privileges".  A sentence in the first KB article's last section says the same thing.

Share this post


Link to post
Share on other sites

Since I'm in testing mode with the latest RS...

I've just backed up a latest-DSM share over AFP, so it's possible. That you are seeing scanning start (and no "Retrospect has detected it is not listed under "Full Disk Access") suggests that FDA is enabled. That RS then cannot find the files it has scanned to then back them up suggests a network disconnect between those two phases -- possibly triggered by whatever is throwing the shutdown alert, though it could be that the alert is the result of the disconnect. What's the actual alert text?

Is your test a manual run, rather than sitting waiting to watch the out-of-hours schedule? What else is the trashcan doing? Any task or power scheduling on the Mac or Synology? Any non-default options/preferences in your script or RS server?

Share this post


Link to post
Share on other sites

Thanks for the replies!  Full Disk Access is indeed enabled for both the Engine and Instant Scan, the Synology doesn't sleep, and neither does the trashcan.

These are scheduled scripts, but I have kicked them off manually to test.  The only other thing going on with the Retro server is a time machine backup of the boot drive, to another share on the same fileserver, which I've disabled temporarily for testing.  Everything in the scripts is default.  I've checked the Console, and logs in Catalina are now very different and in this case useless, the logs on the Synology don't show anything unusual... successful connections to the specified shares, but nothing about disconnects.

The maddening thing is that I only get the last 500 lines of the Retro log, and I'd like to see the whole thing.  What I can see is mostly this:

*good path to actual file/filename - can't read, error -1,101 (file/directory not found)

And then the script closes with this:

 2/20/20 7:38:15 PM: Building Snapshot...
    !Can't read state information, error -1,101 (file/directory not found)
    2/20/20 7:38:17 PM: Execution incomplete
    No longer present: 4,117 files, 219.2 GB
    Completed: 10 files, 0 B
    Performance: 0 MB/minute
    Duration: 01:38:14 (01:37:41 idle/loading/preparing)

Share this post


Link to post
Share on other sites

One other detail - the only thing between the two units is a network switch.  They live next to each other in the rack.

The Synology and switch are set up for Dynamic Link Aggregation (IEEE 802.3ad LACP) bonding all four ports on the 2414, and functions perfectly in every other scenario.

 

Looking at the 'Past Backups' tab, the Status alert says Backup Interrupted.

Share this post


Link to post
Share on other sites

Doug_M,

This is just a stab in the dark, but make sure Instant Scan is Disabled in the Retrospect  Engine Systems Preferences pane per the screenshot at the bottom of page 206 of the Retrospect Mac 16 User's Guide.  This setting is Enabled in my copy of Retrospect Mac 16.1 (I haven't gone to 16.6 out of lack of use of Catalina and abject fear of hurried foul-ups by StorCentric-harried engineers).

FYI Instant Scan is described on page 203 of the UG, but it only has ever worked for local or "client" drives running under Windows or macOS—not any fileservers running under other OSes.  Instant Scan still works for Mac drives formatted with HFS+, so I have it enabled in the Retrospect Engine for three local drives that are mounted on my "backup server" machine.  However Instant Scan's been officially declared by Retrospect "Inc." (see "Known Issues") not to work for Mac drives formatted with the newer and more-glorious APFS, so I've disabled it for my MacBook Pro "client" Source—whose SSD was (despite my best efforts) forcibly converted to APFS when my MBP was upgraded to macOS 10.13 High Sierra.  Maybe 16.6 has a bug enabling server Instant Scan.

 

Share this post


Link to post
Share on other sites

Hey David!  I saw that, too.  I've had Instant Scan disabled in System Prefs/Retrospect pane, though it does have Full Disk Access permission.  As the NAS can't run Client in a sanctioned way (I'm not installing the Linux client), I was under the impression that Instant Scan would simply ignore that volume regardless.  I switched it off anyway, with no apparent difference.

 

P.S. @Nigel Smith - the alert is a simple system alert dialog with yellow triangle (!)  "Server is shutting down.  The file server "mySynologyNAS" is shutting down. [OK] 

Share this post


Link to post
Share on other sites

Doug_M,

To increase the size of the Retrospect Mac Console Log, see pages 155-156 of the Retrospect Mac 16 User's Guide.  However you may not even need to do that, because the imprecisely-written (doesn't distinguish between the Console Log and the Operations log) paragraph says

Quote

When the log reaches the limit set, the oldest entries are no longer shown, though they remain in the operations_log.utx file stored in/Library/Application Support/Retrospect/, up to the maximum log size specified in General Preferences (see “Log size limit” below).  You can view the operations log by choosing View > Log, or by pressing Cmd-L. Type the length you want for the log in the entry field.

That Operations log—of which the latest portion is copied into the Console log—size limit can be increased per page 157 of the UG.

When you say "the only thing between the two units is a network switch", I assume the two units are your Mac Pro and your  Synology.  You might also consider playing around with "Advanced Settings" on pages 161-163 of the UG.  Or maybe (dare I say it?) switch from AFP to SMB.😲

Share this post


Link to post
Share on other sites
On 2/21/2020 at 5:45 PM, Doug_M said:

P.S. @Nigel Smith - the alert is a simple system alert dialog with yellow triangle (!)  "Server is shutting down.  The file server "mySynologyNAS" is shutting down. [OK] 

So it's the Synology that's "shutting down", and not the Mac. Quite a useful data point!

Still sounds like a network problem -- AFP is a lot more resilient than Retrospect, and can retry/recover from a network flutter when RS will just say "Nope, server's gone", because the OS-recovered connection will often be named differently in /Volumes.

Try manually mounting the share on the Mac (rather than adding it to RS as a server and letting RS login) and backing up as a local volume. And as David suggests, unless you have a pressing need for AFP, try using SMB. Also try taking the switch and aggregation out of the equation by running a single direct connection from Synology to trashcan. I'd do this as disk-based backups, so as not to waste time/tape -- maybe also by creating a small-sized "Favourite" on the Synology to save time, then expanding the backup to a full share, then all the shares...

Basically, start as simple as possible and see if it works -- then add complexity until it breaks!

Share this post


Link to post
Share on other sites
On 2/24/2020 at 6:19 AM, Nigel Smith said:

Try manually mounting the share on the Mac (rather than adding it to RS as a server and letting RS login) and backing up as a local volume. And as David suggests, unless you have a pressing need for AFP, try using SMB. Also try taking the switch and aggregation out of the equation by running a single direct connection from Synology to trashcan.

I'll give the local volume backup a try.  I've not been using SMB due to name mangling, which is partly due to an smb.conf setting on the NAS, and years of Mac users not following our file/folder naming protocols.  That, coupled with the default performance hit due to signing, is why I'm pursuing AFP.  Additionally, the volumes in question are on our live production server, so testing some of these changes would require more down-time than I can get.

 

Here's what I'm seeing in Console.app during the hang / reconnect:


AFP_VFS afpfs_DoReconnect:  Restoring session /Volumes/Test_Volume
ASP_TCP ReplayPendingReqs: replaying slot 122 with reqID 5967 afpCmd 72 on so 0x62f0e214331bf2ef
ASP_TCP ReplayPendingReqs: replaying slot 123 with reqID 5968 afpCmd 72 on so 0x62f0e214331bf2ef
ASP_TCP ReplayPendingReqs: replaying slot 124 with reqID 5969 afpCmd 72 on so 0x62f0e214331bf2ef
AFP_VFS afpfs_DoReconnect:  get the reconnect token
ASP_TCP CancelOneRequest: cancelling slot 83 error 32 reqID 29780 flags 0x9 afpCmd 0x40 so 0x62f0e214331bf2ef
ASP_TCP Disconnect: triggering reconnect by bumping reconnTrigger from curr value 3993 on so 0x62f0e214331bf2ef
AFP_VFS afpfs_DoReconnect:  GetReconnectToken failed 32 /Volumes/Test_Volume
AFP_VFS afpfs_DoReconnect:  doing reconnect on /Volumes/Test_Volume
AFP_VFS afpfs_DoReconnect:  soft mounted and hidden volume so do not notify KEA for /Volumes/Test_Volume
AFP_VFS afpfs_DoReconnect:  Max reconnect time: 30 secs, Connect timeout: 15 secs for /Volumes/Test_Volume
AFP_VFS afpfs_DoReconnect:  connect to the server /Volumes/Test_Volume
Ignoring source 0x7fdf9641ca40, snapshot is from kernel kernel_task TCP4 flow id 44472 (close) so pkts rx 6 tx 9, bytes 456 845 cell 0 0 wifi 0 0 wired 696 1205 deltas 456 845 0 0 0 0 696 1205
AFP_VFS afpfs_DoReconnect:  Logging in with uam 8 /Volumes/Test_Volume
AFP_VFS afpfs_DoReconnect:  Restoring session /Volumes/Test_Volume

 

These are generated by 'kernel', and repeats several times every second - it's a flood of reconnect attempts.  Funny thing is that everyone here uses AFP, most are running Catalina, and we have no issues with availability except in this instance.

 

And here's what I'm seeing in my Retrospect test script log when it actually makes it through (rare):

 

+  Normal backup using NAS TEST at 2/21/20, 4:13:48 PM (Activity Thread 1)

    To Backup Set TestSet...

    -  2/21/20 4:13:48 PM: Copying Marketing_Sales

    [*] 2/21/20, 4:17:53 PM: Not done after 30s:

    [*]     0:/Volumes/Marketing_Sales/Production for Proposals/2018/JOB 18-1997/01 Accounting/

    [*]     1:/Volumes/Marketing_Sales/Production for Proposals/2018/JOB 18-1997/Proposal/

    [*]     2:/Volumes/Marketing_Sales/Production for Proposals/2018/JOB 18-1997/00 Project Management/

    [*]   > 3:/Volumes/Marketing_Sales/Production for Proposals/2018/JOB 18-2101/00 Project Management/

    [*] 2/21/20, 4:20:55 PM: mvlcProcessScanNodes resumed after 212s

    *File "/Volumes/Marketing_Sales/Production for Proposals/Other Folder”: can't read, error -1,101 (file/directory not found)

    *File "/Volumes/Marketing_Sales/Production for Proposals/Yet Another Folder“: can't read, error -1,101 (file/directory not found)

    2/21/20 4:20:58 PM: Found: 19188 files, 8200 folders, 198.2 GB

    2/21/20 4:20:59 PM: Finished matching

    *File "/Volumes/Marketing_Sales/Production for Proposals/2018/JOB 18-1997/01 Accounting/Invoices/": can't read, error -1101 (file/directory not found)

    and 8,297 others

    2/21/20 4:20:59 PM: Selector "No Files" was used to select 0 files out of 19,188.

    2/21/20 4:20:59 PM: Copying: 0 files (0 bytes) and 0 hard links

    2/21/20 4:21:02 PM: Building Snapshot...

    !Can't read state information, error -1,101 (file/directory not found)

    2/21/20 4:21:02 PM: Execution incomplete

    Duration: 00:07:14 (00:07:12 idle/loading/preparing)

Share this post


Link to post
Share on other sites

Doug_M,

You haven't said whether you have a brand-new installation of Retrospect Mac, or whether you've been using it for years and something's changed—including an upgrade to Retrospect Mac 16.

If you recently installed/upgraded to Retrospect Mac 16, I believe you're entitled to 30/45 days of free personalized support.  Here's why and how to file a Support Case for a bug, in which you'll have to disclose those trivial details.  In case you don't realize it, everyone on these Forums is a volunteer; no Retrospect "Inc." employees post here.

 

Share this post


Link to post
Share on other sites

Hey David!

It's an old installation, and had been working reliably.  Things got weird when the Retrospect Server machine was updated to Catalina / Retrospect Mac 16 (neither of those updates were my decision, in case you're wondering ;) ).  All of this happened over the New Year break, and I stepped in to sort it out last week.

 

I'm fully aware that there aren't any employees on these boards, but there are some knowledgeable folks lurking about... and since the 30/45 day support thing has expired, I thought I'd try my luck.

Thanks for the links!

Share this post


Link to post
Share on other sites

Doug_M,

Upgraded to Retrospect Mac 16 from what version?  As I've said in the third paragraph of this post in another thread, "my impression from several recent Forums threads is that the 'backup server' Engine—which is the same for both variants of Retrospect aside from GUI—lost its ability in version 15 to handle certain old versions of network protocols used by NASes."

Pleas look again at the fifth paragraph in the boilerplate Support Case post I linked to earlier in this thread.  Retrospect Tech Support may no longer give you personalized support, since your installation upgraded more than 45 days ago, but they've probably got a generalized response they can send you if you file a Support Case about the bug—which as I said in that fifth paragraph you're entitled to do. 

Besides, they may take pity on you because the updating was done over the New Year break—probably by ignoramuses from IT management.🤣  To enhance that pity I suggest you contact Retrospect Sales by phone, and threaten to procure another backup application to backup your live production server.  In the meantime, and to enhance the credibility of your threat to Sales, I suggest you put another Mac or Windows machine—running the older version of Retrospect that used to work—into service backing up your production NAS.

 

Share this post


Link to post
Share on other sites
On 2/26/2020 at 6:29 PM, Doug_M said:

I've not been using SMB due to name mangling, which is partly due to an smb.conf setting on the NAS, and years of Mac users not following our file/folder naming protocols.

Yep -- that's a pressing need (I'm dealing with similar at the moment, and have had to use the same AFP work-round).

On 2/26/2020 at 6:29 PM, Doug_M said:

Additionally, the volumes in question are on our live production server, so testing some of these changes would require more down-time than I can get.

Shouldn't need much, if any. Volumes definitions, direct mounting, etc, can be be done on the RS server without impacting the Synology.

I'd hoped your aggregated link would be resilient enough that you could simply re-route one cable to the trashcan instead of the switch and then change the network settings, but I've just done a quick test and it looks like you can't just remove a NIC from the LACP bond -- you'll have to delete the Bond then re-create it with three NICs. Maybe 5 minutes of downtime.

Did you start new media sets when you upgraded? If so, it could be that the "old" system was skipping already backed-up files which the new system is now trying to first-time access -- only there's a problem on the Synology. Have you run Data Scrubbing (Storage Manager->Storage Pool->Data Scrubbing) lately? I'd be happier doing that after a complete, successful, backup, but needs must when the devil drives...

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×