
Nigel Smith


Everything posted by Nigel Smith

  1. Smart move, IMO! These are deep waters, best left unrippled. Especially when you remember that network communication is not directly via IP address, but is next-hop routing via the mapping of IP addresses to gateway/MAC addresses in ARP tables. Table updates aren't instant, which is why I can quite easily see how my guess might happen -- step 5 is based on the MAC address of the previously detected client, obviously still "valid" since the interface used wasn't changed (just the IP address). But when we get to step 7 it's aged out/replaced, the IP address is no longer valid, and you get a comms fail.
  2. Not so fast... This is what I think might be happening (and why a WireShark run would help):
      1. Client is on "Automatic" location -- x.x.x.202
      2. You switch to "Retrospect Priority"; client address is now x.x.x.201, and you immediately run the server script
      3. Server multicasts to all devices, asking for the client
      4. Client responds, but we know the client doesn't instantly reflect a network change, so says "Yay! Me! Here and ready on x.x.x.202!"
      5. Scan gets done
      6. By now, the client is listening on x.x.x.201:497 (or, rather, is no longer listening on x.x.x.202:497)
      7. Server initiates the backup: "Hey, x.x.x.202, give me all these things!"
      8. Silence... More silence...
      9. Server assumes network communication has failed and throws -519
Step 4 is total guesswork from me -- all we know is that there must be some mechanism for a multicasted client to tell the server its IP address. If I'm right, they might be able to fix this on the client, though it may depend on the OS promptly informing all network-using services of an IP change (the client unnecessarily spamming the OS for updates would be horribly inefficient). Or they might be able to fix this on the server, with a re-multicast after step 8's failure to pick up the new address. But, even in these days of devices often changing networks, I doubt the above crops up very often and probably isn't worth fixing (directly, at least). x509's "binding to a bogus address" is much more common, and if solving that solves other issues too -- bonus!
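The suspected failure mode can be modelled as a toy sketch -- purely illustrative, since the real Piton discovery exchange is proprietary and this is only my guess at what's going wrong:

```python
# Toy model of the suspected discovery race -- NOT the real Piton
# protocol, just an illustration of "stale cached address" failure.

class Client:
    def __init__(self, ip):
        self.bound_ip = ip      # address the OS has actually bound
        self.reported_ip = ip   # address the client *thinks* it has

    def change_ip(self, new_ip):
        # The OS rebinds immediately, but the client process hasn't
        # refreshed its cached address yet.
        self.bound_ip = new_ip

    def answer_multicast(self):
        # Responds with its stale cached address (step 4 above).
        return self.reported_ip

def run_backup(client):
    # Server asks via multicast, caches whatever the client reports...
    target = client.answer_multicast()
    # ...then later connects directly to that cached address (step 7).
    if target != client.bound_ip:
        return -519  # "network communication failed"
    return 0

c = Client("x.x.x.202")
c.change_ip("x.x.x.201")  # location switch: client cache now stale
print(run_backup(c))      # -519
```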
  3. You're viewing the Piton protocol too narrowly. It's the protocol(s) by which server and client communicate and includes discovery, access and data transfer (amongst other things), and is used in the unicast (defined IP client, as above), broadcast and multicast "location" (using that since "discovery" usually means "first time ever finding a client" in RS) of a client on the network and all subsequent communication. You'll have to do a lot more digging with eg WireShark to know exactly why you saw what you saw -- I'd expect it to throw a -530 (because the client was still listening on x.x.x.202:497) or just work, not throw a -519 -- but I suspect that permanently binding the client to x.x.x.201 with "ipsave" might eliminate the issue. -530 is quite clear -- the client couldn't be found. That -519 is separate implies that the client could be found but then there was a problem, but I'm probably reading too much into it. All we really know is that "network communication failed", for whatever reason.
  4. Would just warn that different routers' DHCP servers behave in different ways. Some treat the address blocks reserved for statics as inviolate, some will continue to offer those addresses when no MAC address has been set, etc. I always belt-and-brace, putting MAC addresses in the router's table and setting static IPs on the clients, when I need a definitely-fixed IP. Also, some routers force a certain (often limited) range for statics and others let you do as you will, so check your docs before planning.
  5. There are pros and cons to both approaches. But consider this first -- how will you restore your system disk if there's a disaster, have you tested it, and does splitting it into separate "Favourite" folders result in way more work than the benefits are worth?
  6. Of course -- would I offer anything simple? 😉 More seriously, if the client is "confused" by network interfaces when it starts up, can we guarantee it won't also be "confused" on a restart? While it should be better, since it is restarting when there is (presumably) an active interface, it might be safer to explicitly tell the client what to do rather than hoping it gets it right. And a batch script triggered by double-click is a lot easier for my users than sending them to the command prompt. As always, horses for courses -- what's best for me isn't best for a lot of people here, but might nudge someone to their own best solution.
  7. Not just statics -- you can also use it for DHCP clients. And it wouldn't take much work to write a script that would find the current active IP and do a temporary rebind. On a Mac you can even tie it in to launchd using either NetworkState, or with WatchPaths on /private/var/run/resolv.conf (although, in my experience, Mac clients do get there eventually and rebinding is only necessary if you are in a hurry to do something after a network change).
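Finding the current active IP (the first step of such a temporary rebind script) can be sketched with the standard UDP-connect trick -- no packets are actually sent, the OS just picks the source address it would use to reach the probe target. The actual rebind would then use the Retrospect client's own binding mechanism, which I'm not scripting here:

```python
# Sketch: discover the currently active local IP address.
# The probe target is only used for a routing lookup -- connecting a
# UDP socket sends no packets, it just selects a source address.
import socket

def active_ip(probe_host="8.8.8.8", probe_port=53):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect((probe_host, probe_port))  # routing lookup only
        return s.getsockname()[0]            # the source IP chosen
    finally:
        s.close()

print(active_ip())  # eg 192.168.1.23 on a typical home LAN
```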
  8. From my earlier back-of-an-envelope calculations, both D2D and D2T should fit in overnight. More importantly, because he isn't backing up during the day, the "to tape" part can happen during the day as well (my guess is that he was assuming nightlies would take as long as the weekend "initial" copy, rather than being incremental), so he should have bags of time. I know nothing about Veeam's file format, only that it's proprietary (rather than eg making a folder full of copies of files). It may be making, or updating, single files or disk images -- block level incrementals may be the answer. Or it may be that Veeam is actually set to do a full backup every time... It is a snapshot, in both computerese and "normal" English -- a record of state at a point in time. I don't think the fact that it is different to a file system snapshot, operating system snapshot, or ice hockey snap shot 😉 requires a different term -- the context makes it clear enough what's meant, IMO.
  9. No, no, and no 😉 Long time since I've seen Norton firewall, but make sure that you are opening port 497 on both TCP and UDP protocols (direct connections only need TCP, discovery uses UDP). Windows also has a habit of changing your network status after updates, deciding your "Home/Private" network is "Public" instead, if Norton makes use of those distinctions (Windows Firewall does). Easiest way to check for discovery is Configure->Devices->Add... and click Multicast -- is the device listed? Also try Subnet Broadcast. I have no particular problems with DHCPed PCs at work, so it's something about your setup. As David says, you could get round it by assigning static IPs -- check your router documentation first, some "home" routers supplied by ISPs have severely limited ranges that can be reserved for static mapping -- which can also make life easier for other things, eg just use "\\192.168.1.x" to access a share instead of hoping Windows network browsing is having a good day... Question: Are client and server both on the wired network, or is one (or both) wireless?
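A quick way to sanity-check the TCP side from another machine is a simple connection probe (a sketch only -- UDP "checks" are unreliable because there's no handshake, so the discovery side really needs the Multicast/Subnet Broadcast test in the console or a packet capture):

```python
# Quick reachability check for the Retrospect client port (497/tcp).
import socket

def tcp_port_open(host, port=497, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical client address -- substitute your own:
print(tcp_port_open("192.168.1.10"))
```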
  10. Thanks for that, David -- a very clear explanation. And you're right -- the thing that's missing is a definition of "active backup", which is unfortunate given that it is fundamental to how "Copy Backups" works. Indeed, that's the only place the term "active backup" appears in the User Guide! Which is disappointing, since one of the things that should really, really, be clear in any instructions for a backup program is what will be backed up. But from what you say we'll get similar results using either "Copy Media" or "Copy Backup" with grooming retention set to at least the number of backups in a "rotation period" -- in Joriz's case that would be 15 (Mon-Fri = 5 backups a week, 3 tape sets rotated weekly -- 5x3=15). I'm starting to think that if you want everything, rather than a subset, on every tape set then "Copy Media" is more appropriate. But, again, I'd hesitate to say without proper testing.
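The retention arithmetic for Joriz's rotation, spelled out (trivial, but it's the figure the grooming setting hinges on):

```python
# Grooming retention must cover at least one full rotation period,
# so nothing is groomed away before it reaches every tape set.
backups_per_week = 5   # Mon-Fri
tape_sets = 3          # rotated weekly
retention = backups_per_week * tape_sets
print(retention)  # 15
```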
  11. Currently a 2014 Mac Mini in a Sonnet enclosure with dual interface 10GbE card. Attached to an old (but brilliantly reliable) ADIC Scalar 24 with a single LTO-2 drive via a Promise SANLink2 Thunderbolt to FC adapter (flakey, leads to a lot of prematurely "full" tapes, but the data is always good). Disk-wise we use a Thunderbolt-chained pair of LaCie 8Bigs for 84TB of space, but can overflow onto networked Synologys if required. The server itself, including RS catalogs, is backed up with Time Machine to an external USB drive as well as with RS. Next iteration will probably be a new Mac Mini with built-in 10GbE in a Thunderbolt 3 enclosure, still using the 8Bigs but permanently adding one or more Synology NASs for even more disk capacity ("private" network via the card while built-in handles the clients), and adding a SAS card (H680?) to connect to a Quantum Superloader 3 with single LTO-8 drive. As you can tell, there's a lot of using what we've got while upgrading what we can/must. That's partly funding issues, partly because I hate spending money (even other people's!) if I don't have to, but mainly because I like to stick with what I know works rather than deal with the inevitable issues with new kit. The above is with all due respect to our hosts. I've had a Drobo on my desk since soon after they came out in the UK, have been very happy with it, but relatively low storage density/lack of dual PSU[1]/rack mounting means chained 8Ds or similar aren't an option. And I've dreamt of having a BEAST in the room for years, but they are too deep for our racks (shallow because they have built-in chilled water cooling), and since we're on the third floor we have a loading limit -- the BEASTs have great storage density, but we'd have to leave the rack half empty because of the weight! If/when we relocate the room or replace the racks, BEASTs will be on my shopping list... 
I'll skip over the Windows PC running RS that came after the Mac Mini, when we were considering jumping ship after rumours of Apple letting the Mini die, because <shudder>Windows!</shudder>. The RS side isn't too bad -- oh look, it's Retrospect 6 with some new features! -- but, as a Mac guy, Windows always seems clunky to me. It's still running, still doing its job (in parallel with the Mac Mini), but it never feels right... As always, different requirements and situations lead to different solutions -- and I'd never hold up what we've done as a good example, just what works for us 😉 [1] Yeah, I know -- "Why do you want dual PSUs when the Mini only has one?" Because, in my experience, it's a lot easier to recover/rebuild the server than a big storage volume after a hard power-off. And we've got a good UPS, so the most likely reason for a power-off is clumsiness in the rack (yanking the wrong lead!) or a PSU failure which we've never had with a Mini but have seen a few of in other server room devices -- expensive Apple components, for the win!
  12. etc... And this is my confusion: p120 ...which implies that you can select whether to have all, some, or only the most recent while Copy Media is always all, and p121 ...where "backups" is plural, even for a single source. While I realise no manual is big enough to explain every single situation someone might come across, "Copy the most recent backup for each source" wouldn't take up much more virtual ink if, indeed, that is what happens.
  13. I wrote that because p118 of the guide seems to describe "Copy Media" and "Copy Backup" as two methods of achieving the same goal, though "Copy Backup" also has options to restrict/select what gets copied. Which makes me wonder why they aren't the same thing with expanded options. Which makes me think I'm missing something, either in what gets transferred, what can be restored from the resulting set, or in how it is done. Since the first two are generally important (data integrity) and the last may impact time required (and therefore backup windows needed in Joriz's scheme) and I can't run tests at the moment, I haven't a clue which is better suited.
  14. Retrospect makes a "Retrospect" directory on the media you select as a target -- any backups go in there, and RS manages folder structure etc., which is why it doesn't show you more in the console. So for the new media destination just select the directory containing the "Retrospect" folder, RS will use the "Retrospect" folder and create a new folder in there to store the rdb files (probably Smaller Backup:Retrospect Data:FT_FD:Retrospect:FT_FD-1) and you should be good. And no -- it's not immediately obvious that's how it works. I've quite a few volumes with "Retrospect" folders inside "Retrospect" folders where I've selected the wrong one. But if you think about the expected operation -- you pick a volume to store files on (rather than a sub-directory of a volume) -- and it becomes a little clearer.
  15. This happens -- I don't know if it is Retrospect that does the remembering or the tape library, but RS knows which tape is in the drive, what's in the other slots etc without a re-scan. Indeed, even if you quit and re-launch RS it seems to assume the last-known state is the same until proven otherwise. I just trust RS and the library to do their thing during the week; my only "hardware intervention" is to swap tapes and then do a single re-scan of the library on tape-change day. I don't know if tapes are rewound between sessions when left in the drive, but it doesn't matter much in terms of tape life. I know I mentioned tape life above, but there's a lot more wear in completely re-writing at least one tape every time -- even then, you should be averaging at least 200 full writes, which is >10 years using the above scheme 🙂 It's more a way of shoehorning an extra 3 tapes out of the boss, and look at the advantages that brings in terms of backup retention and reduction of the overall backup window required.
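The ">10 years" figure above works out like this (back-of-envelope, assuming a conservative ~200 full end-to-end rewrites per cartridge -- rated figures vary by LTO generation):

```python
# Back-of-envelope tape life under the "rewrite each set every 3rd
# week" rotation described above.
full_rewrites = 200          # conservative media rating
weeks_between_rewrites = 3   # each set is fully rewritten every 3rd week
years = full_rewrites * weeks_between_rewrites / 52
print(f"{years:.1f} years")  # ~11.5 years
```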
  16. Perhaps a word of explanation, since the above makes it sound like we don't "off-site". We do, but we use yet another method -- backing up the backup files! Yep, I just copy the .rdb files to tape and take those away. The downside is that a restore from these takes considerable time -- you have to restore all the .rdb files for the set, and maybe re-catalog it, before you can restore a single file. The upsides are that it is really easy to do the backups and, crucially, these can be done using another machine while regular client backups are taking place. It works for us because our off-sites are one-part disaster recovery but nine-parts archive. I've never had to use them for DR and only once in the last five years have I had to go back to them to restore data from an old backup set (anything recent is handled using the onsite disk sets). But we have to have them because of retention policies, and this is an efficient way to handle a lot of data using relatively low-spec equipment with minimal impact on daily client backups. It also followed on neatly from my old RS6-based kludge for D2D2T using Internet backups... Our backup system re-spec has been put on hold because of COVID-19 -- after that's done I expect we'll opt for a more "normal" methodology similar to what's described above.
  17. There may also be differences in how and what you can restore. They both match files in the source to files already in the destination, so I don't know what the practical difference is. I haven't used either often enough to really know the pros and cons of each and, being stuck at home, don't want to risk banjaxing a work system that I can't then reset remotely just to run some tests. Perhaps someone who has used both in real life will chip in...
  18. Not quite right, IMO. I believe the first sentence is true, but the second is only correct if you have to erase and restore that volume -- and the point of snapshots is that you can roll back the volume to the earlier state without erasing it. Bear in mind that eg WannaCry can (and will) delete volume shadow copies on Windows when user access to VSSadmin.exe hasn't been disabled -- it was, and I believe still is, enabled by default -- which is why they aren't considered a strong defence against ransomware "out of the box". You'd need admin privs to delete CCC/Time Machine/etc snapshots, and they can't be otherwise modified, so CCC et al are a much better defence by comparison. Part of the problem here is that "backup" is such a woolly concept, meaning different things to different people. So whether CCC is good for backing up depends primarily on what you require from your backup... But any backup is better than none and, IMO, the extra redundancy and flexibility gained by using both RS and CCC is well worth the extra expense.
  19. You could also do the same using Copy Media Set scripts -- see the User Guide, particularly p120, for a comparison of the two. Indeed, now that we're grooming the disk set a Copy Media Set script might be the better option. TBH, I don't really understand the difference between the two -- try them yourself and see which is more suitable 😉 Looks good. My main concern would be that you're re-writing all the tapes every rotation, reducing their life-span considerably. Adding another tape to each set would increase the time between re-writes enormously, from every third week for each set to every 30th week! You could even stagger it so one was done every couple of months, giving you the ability to go back 6 months for any data restore -- and you can sell it to the boss as a cost benefit: spend a little more now to save in the longer term.
  20. Reasonable argument. ZFS does do a lot in the background -- it may even be that that's causing Retrospect to throw those errors. Your network change certainly won't hurt but, assuming the number of errors is a small percentage of the files being backed up, I'd be inclined to ignore them and trust RS to get the files on the next run. Maybe monitor, and see if there's any consistency in which files error -- a certain share, during similar times, particular file types, etc. Speed-wise, my feeling is that you are getting a reasonable write speed given the age of the hardware and the average file size. But that is just a gut feeling, with nothing solid behind it. Using RS does involve some overhead and a test using "tar" to write directly to tape would remove that, giving you an indication of "hardware speed". More importantly, ~6GB/min write speed implies ~4GB/min overall (write and compare) -- even if we play safe and call it 3GB/min that's 180GB/hour, a good figure to work with. If we say the three servers hold 3, 6.5 and 1TB respectively (allows plenty of margin for error) that's 16 hours, 36 hours, and 6 hours for full backups. So if you start the tape run after your Friday night D2D has completed you'll easily be done before Monday's D2D is due to start. Your total daily churn is ~80GB, so less than an hour including scanning. Will easily complete after a nightly D2D. Weekly churn is ~560GB (using 7 days to allow for extras), so allow 4 hours. Will easily complete after Friday's D2D. Tape-wise, two LTO-7s per set gives a set capacity of 12TB without compression (again, to give us plenty of wiggle room). Easily enough for a full backup plus 2 weeks of churn. That suggests the following:
      • Nightly D2D backups continue as you are doing now. Use groom scripts set to keep 4 backups, and run these before Friday's D2D scripts so you can always go back at least 5 days to eg restore a deleted file.
      • First Friday, after the D2D has finished, run a "Copy Backup Script", all backups, to Tape Set 1, recycling the tape media. Leave those tapes in and Mon-Thur of the next week run a nightly "Copy Backup Script", most recent backups, adding on to what's already in the tape set.
      • Second Friday, do the same with Tape Set 2. Third Friday, same with Tape Set 3. Fourth Friday, back to Tape Set 1.
So you'd always have the last 5-9 days of restore points of the P2000 and a rolling 21 days on tape, basically unattended apart from swapping tapes on a Friday and monitoring the logs for problems. You can easily vary the above to suit -- only have 2 sets of 3 tapes so you go longer between recycles but have less redundancy. Add 3 more tapes to go longer but keep the redundancy. Only copy "most recent" backups to tape (again, sacrificing redundancy for an increase in capacity), or groom the P2000's disk sets more aggressively. A lot will depend on how people work and what you expect to have to recover from -- my most frequent request is for files "lost" some time ago, so I keep a lot more than someone who was just expecting to restore the most recent version of a failed drive. There are many ways to skin the backup cat, and I'm not claiming the above is in any way definitive. With luck someone (David? Lennart?) will be along to improve it, and I'm sure you'll find a better way of doing it yourself. But hopefully it'll give you a good starting point from which to roll your own solution.
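The back-of-envelope figures above can be sanity-checked in a few lines (server names are placeholders; the sizes are the generous 3/6.5/1TB estimates):

```python
# Sanity check of the backup-window arithmetic above.
write_speed = 3 * 60   # GB/hour -- the "play safe" 3GB/min incl. compare
servers_gb = {"server_a": 3000, "server_b": 6500, "server_c": 1000}

for name, size in servers_gb.items():
    print(f"{name}: {size / write_speed:.1f} h for a full backup")

daily_churn, weekly_churn = 80, 560   # GB
print(f"daily churn: {daily_churn / write_speed:.1f} h")    # well under 1 h
print(f"weekly churn: {weekly_churn / write_speed:.1f} h")  # ~3 h, allow 4

set_capacity = 2 * 6000   # two LTO-7s, native capacity (no compression)
full_total = sum(servers_gb.values())
spare_weeks = (set_capacity - full_total) / weekly_churn
print(f"set holds a full backup plus {spare_weeks:.1f} weeks of churn")
```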
  21. Quickly, before addressing the other points properly... I was complaining about my poor English, not yours! I had to edit that bit to make it clearer, and wanted to put a reason in case David or yourself had already quoted the then edited post. It seems even my explanations need explaining... 🙂
  22. You probably had no response because nobody felt like playing "guess the missing info". What's your OS version, what options have you set for those backups, what are your system's sleep settings, etc. But it looks like Retrospect (assuming this is from the RS log, not sys.log or elsewhere) is trying to stop your Mac from going to sleep but can't create the required assertion. The second fragment shows RS trying to write to "something" which is no longer reading -- could be a network socket, inter-process communication, a file which has been closed, etc -- which is what generates the SIGPIPE, then the same failed sleep assertion. Bottom line -- if it isn't causing problems, isn't breaking your backups, and isn't doing much beyond putting a couple of extra lines in a log file -- ignore it.
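What a SIGPIPE boils down to can be shown in a couple of lines -- writing to a pipe whose reading end has gone away (Python ignores the signal itself and raises BrokenPipeError instead; a C program would receive the actual signal):

```python
# Demonstrate the "write to something no longer reading" failure.
import os

r, w = os.pipe()
os.close(r)                 # the reader disappears
try:
    os.write(w, b"hello?")  # nobody is listening any more
except BrokenPipeError as e:
    print("EPIPE:", e.errno)
finally:
    os.close(w)
```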
  23. You still haven't stated size of shares to back up, expected churn on each, retention policies (ie how long must a backed up file be available for) and much more. I find it helps to think like you're asking a really nit-picky contractor to do the job -- and he'll give you exactly what you ask for, no more and no less, and take your money and leave you in the lurch if that wasn't what you actually wanted 🙂 And you'll have to compromise. For example: These are mutually exclusive -- a server room fire on Saturday will mean your disaster recovery will be from a week ago. To me that's acceptable -- it's a very small risk, we have no compliance issues to worry about, and if our server room catches fire we have a lot more problems than the loss of a week's worth of data -- but it may not be to you.
  24. Easier stuff first... This is usually either disk/filesystem problems on the NAS (copy phase) or on the NAS or target (compare phase), or networking issues (RS is more sensitive to these than file sharing is, the share can drop/remount and an OS copy operation will cope but RS won't). So disk checks and network checks may help. But if a file isn't backed up because of an error, RS will try again next time (assuming the file is still present). RS won't run again because of the errors, so you either wait for the next scheduled run or you trigger it manually. Think of it this way -- if you copy 1,000 files with a 1% chance of errors, on average 10 files will fail. So on the second run, when only those 10 files need to be copied, there's only a 1-in-10 chance that an error will be reported. Easy enough to check -- are the reported-to-error files from the first backup present in the backup set after the second? Now the harder stuff 😉 Is this overall? Or just during the write phase? How compressed is the data you are streaming (I'm thinking video files, for some reason!)? You could try your own speed test using "tar" in the Terminal, but RS does a fair amount of work in the "background" during a backup so I'd expect considerably slower speeds anyway... A newer Mac could only help here. I'm confused -- are you saying you back up your sources nightly, want to only keep one backup, but only go to tape once a week? So you don't want to off-site the Mon/Tues/Wed night backups? Regardless -- grooming only happens when either a) the target drive is full, b) you run a scripted groom, or c) you run a manual groom. It sounds like none of these apply, which is why disk usage hasn't dropped. If someone makes a small change to a file, the space used on the source will hardly alter -- but the entire file will be backed up again, inflating the media set's used space. 
If you've set "Use attribute modification date when matching" then a simple permissions change will mean the whole file is backed up again. If "Match only file in same location/path" is ticked, simply moving a file to a different folder will mean it is backed up again. It's expected the backup of an "in use" source is bigger than the source itself (always excepting exclusion rules, etc). At this point it might be better to start from scratch. Describe how your sources are used (capacity, churn, etc), define what you are trying to achieve (eg retention rules, number of copies), the resources you'll allocate (tapes per set, length of backup windows (both for sources and the tape op)), then design your solution to suit. You've described quite a complex situation, and I can't help but feel that it could be made simpler. And simpler often means "less error prone" -- which is just what you want!
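The "errors mostly vanish on the second run" arithmetic from the earlier part of this post, made concrete:

```python
# 1,000 files with a 1% per-file error rate: first run leaves ~10
# files to retry; on the retry run the chance of seeing *any* error
# at all is only about 1 in 10.
files = 1000
error_rate = 0.01

first_run_failures = files * error_rate               # ~10 files to retry
p_any_error_on_retry = 1 - (1 - error_rate) ** int(first_run_failures)

print(first_run_failures)              # 10.0
print(round(p_any_error_on_retry, 2))  # ~0.1 -- the "1-in-10 chance"
```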
  25. Because it works in a different way to the others you list, so is much more impacted by Apple's ever-moving security goal posts. That "different way" also gives it many of the advantages David lists above. Horses for courses. RS can quite easily do a "bare metal restore" if you can mount the device to be recovered in Target Disk Mode, or recover to an external drive and then use Migration Assistant -- obviously not options for everyone, which is why I'd recommend a combination of RS plus something else for people in that situation.