  1. Well... hopefully we all have good reasons for doing what we do. ... Really though, I get what you're trying to convey ... in a purist sense it might seem that way but I suspect that is a fallacy because of the plethora of software out there which is wonderful and loved (or "good enough" and liked) by users who at the same time live with that software's kinks, gaps, and so forth. This seems more the norm than the exception. The imperfections aren't a sign the source code isn't well-crafted nor that the developers aren't talented. Things sometimes (if not usually) just evolve in ways that can leave something obvious missing for so many understandable reasons. It's funny... your inferred take on Retrospect's history for this gap is different from mine. I've always imagined Retrospect as having been originally created by developers who were tired of redundancy in backups caused by a simple rearranging of folders, so they added folder-level deduplication. Whether or not folder-level deduplication (aka Progressive Backup) was part of the initial Retrospect might offer a clue into why the feature is the way it is. I can see various possibilities that have nothing to do with bad design or development... My best guess is that decades ago CPUs and RAM were far weaker/slower than today, so costly hashing of an entire file had to be done judiciously. Including full deduplication would require re-reads of files in particular cases which by today's standards are quite manageable and outweighed by the cost and time required to manage the unnecessary redundancy caused by renaming several terabytes of files, or even far less, like a mere 100GB. (Post #3 discusses this.) There are other things that could have affected developers... the original implementation of folder-level-only deduplication, what we live with today, might have been deemed a good starting point for various reasons. Effectively it does contain the filenames... just view a catalog without the set present and you'll see file names... I believe you therefore can't mean just that so... When you say "directly" I believe you may be referring to potential abstractions between catalog structures or areas... if so, those are non-issues in the successful implementation of this feature in a backward compatible manner... and in a manner without the significant/negative catalog size impact you'd like to avoid... feel free to post a concrete hypothetical if you want me to elaborate more specifically/technically... but I generally feel the catalog is conceptually simple enough that there isn't a big piece of magic here... if you post a hypothetical with the worst catalog design you can conjure up, I can describe an easy path forward. The catalog and its size are not issues here... at least I'm not hearing anything to indicate such nor seeing it as a user with a good nose for that sort of thing. I'm sorry to hear about the marriage... those kinds of breakups are never fun... especially when you consider most any breakup isn't a fun thing. But you don't have any digital cameras?? ... get yourself a digital camera, my friend! Well, if you have any memories you cherish, or even if you're not sure but would like to save them in digital form, flatbed scanners are fairly cheap these days... you can take an afternoon or two and capture things. ... and thank you for the sympathies on the feature... and for listening to what it was all about... wasn't sure anyone would see my post in this neck of the forum. LoL ... I will double-check but I think they added it to a list... 
here is what they said... ... That being said, the product management team did mention that these changes "would have been worthwhile if we had many users requiring this feature." and also "if we start hearing from enough users, but for now, it doesn't seem as much of a high-request." ... So while we do see the benefit of having such a feature, at this time we do not have enough demand for it to warrant the cost of implementation. ... The request has been put on a request list which is reviewed each year when determining which features will be added to a release. ... If you think there's a more official thing to do, let me know. Thanks again!
  2. Okay, so I infer you see the feature suggestion as a value-add and are a +1 for it, with the qualification that its implementation not significantly impact Retrospect's catalog file size. Maybe I don't know the history of Retrospect's choices, but I feel doubtful that the Retrospect team would implement a relatively straightforward feature addition like this—one which is more a tweak to the product than a complete revamp—in a manner which would cause a snapshot to be "10 times as large." My technical assessment of things is fairly decent in areas like this... I realize you don't know me, so I'm not expecting you to take this on faith, but this feature suggestion can, with relative and reasonable certainty, be added to Retrospect with minimal impact to users' expectations of Retrospect both in performance as well as in resulting catalog/set size, and for users benefiting from the feature, it could save lots of hours and lots of hard disk space, if not related cloud costs. There's no need for me to see its source code or any highly technical documents of Retrospect's to make this assessment. It's based on very commonly available information on software systems, files, file-related information, cryptographic hash functions, and managing the persistence of such minutiae. What I'm inferring here is not required for the original purpose of this thread, but to the extent we do get into discussing implementation pitfalls as reasons for avoiding this feature, I just want to do my best to ensure I chime in as strongly as possible about the simplicity I observe here before me. If I don't do that, it could lead other Retrospect users to either skip this thread (if they aren't already) or believe this feature represents a huge, risky change to the software, or perhaps even cause the goal of the feature to be lost within the technical discussion. So I appreciate your focus on implementation-related concerns, but I also have to respond equally if I don't see things the same way... hope that makes sense. Consider media produced by family vacations, content creators, wedding photographers, and so many others... all from devices which produce funky names. Over time it's certain many will develop new wisdom about better ways to name things and better folders to use. A file's name is sort of like a label on an old paper file. Simply changing the label on the file itself shouldn't force a user into storing the data in duplicate for any one file cabinet (any one backup set). If I have a 4TB library of such files, I shouldn't have to store 8TB merely because I "re-label" things. I personally think Retrospect should just implement this feature suggestion simply because it takes the product in the right direction, to a nice place, with little impact to their team, and creates a really nice value-add, one which saves lots of time and money for affected users. Let me predict that full general deduplication will eventually become a common expectation of most any user of backup software. It may not be today, or next year... but perhaps that's part of the point... Retrospect has a following, so implementing this feature today just keeps them ahead of the game by letting them catch up in some sense before others do it for them (as is happening already). There's a movement going on right now with more and more people aware of content creation, cloud storage, and backup processes connected with all that. 
Renaming stored data should neither force a user to store it in duplicate within a single backup set nor force a user to ditch a backup set for a new one... the former option makes users' organizing efforts inefficient and costly, while the latter greatly harms the implicit protective/guard features a lifetime backup set can provide (against human error, malware and the like). Both choices make it difficult for a user to reorganize "labels" as there's no good avenue.
  3. I think it's important not to dismiss the viability of the feature suggestion based on assumptions about the product's implementation. Putting aside assumptions about how the feature suggestion might be implemented or any related obstacles in that endeavor, what would be more interesting to hear is whether or not you would find this feature, if implemented/released, a value-add, a positive, and a good thing... Do you give the feature suggestion a +1 vote or not? That's truly the purpose of this thread, as the Retrospect team suggested I seek out user feedback... if they hear enough such feedback, they'll be more apt to consider the suggestion. An aside... while I don't have access to Retrospect source code and don't want to make assumptions about its catalog file's implementation, I feel compelled to address your assumptions by saying that common sense tells me the catalog file's current design is either ripe for this feature or very easily extended to accommodate it in a backward compatible manner, and that all of the issues you've raised so far are non-issues when considering whether or not you like the idea of this feature from a conceptual standpoint (regardless of Wikipedia, manuals, implementation/design assumptions, etc.). This is all to repeat the question.
  4. If you restore Rebellia's new replacement hard drive using the snapshot taken at the time Rebellia last backed up her computer (in your example, that would be June 30), the restored files on Rebellia's replacement disk would have the names that existed when the snapshot was created, in this case the names prefixed with "CharlieCust" per your example. The need for any such deduplication considerations already exists today without implementing the suggested feature. Since Retrospect performs partial deduplication today (on by default, but it can be disabled), you would have a similar scenario if you restated it such that all users renamed (a.k.a. "moved") the folder location of a file. For example, if both UserA's disk and UserB's disk each have identical files in C:\SpecialPlace, where UserA renames it to C:\SpecialPlaceForUserA and UserB renames it to C:\SpecialPlaceForUserB, you effectively have a rename which Retrospect deduplicates today. It is effectively the same scenario from the standpoint of evaluating disaster recovery scenarios. If UserB's system requires a replacement hard drive, UserB would end up with file names on the replacement drive based on the Snapshot used for restore.
  5. If the issues you outlined were potentials as a result of the suggested DeDuplication feature, they would already exist today in Retrospect, since today Retrospect already does partial DeDuplication. Keep in mind I'm not suggesting anything new or novel, but rather asking that Retrospect fully implement DeDuplication rather than leave it partially implemented as it has been for a long time. Today, Retrospect properly DeDuplicates identical files across different folders so long as the file's name remains the same (by "name" I mean the trailing portion of the file's real full/complete name, the full name being the one that includes its folder... if the folder portion of the name changes, Retrospect still DeDuplicates today; that is, for folder changes it already implements the behavior I'm proposing work uniformly/completely, not partially as it does today). So you already have DeDuplication today, and the issues you describe would not be new issues by virtue of implementing the feature suggestion. I don't have Retrospect's source code, but I believe today the issue you outline is currently solved by restoring from a snapshot which retains the original file names at the time a backup is created. I believe in addition to the snapshot, today Retrospect likely retains info of a file's name (at the time of backup) within the set itself even if it skips backing up the file due to DeDuplication... I believe it must do this in order to rebuild a catalog file, which it can do today.
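To make that folder-move vs. file-rename distinction concrete, here is a rough sketch in Python (my own illustration, not Retrospect's code; the match-key fields and example paths are hypothetical): a key built from the trailing name plus size/dates/attributes still matches after a folder move, but not after an in-place rename.

    from pathlib import PureWindowsPath

    def partial_dedup_key(path, size, mtime, attrs):
        # Today's partial matching (as I understand it): only the trailing file name
        # counts, plus size/dates/attributes; the folder portion is ignored.
        return (PureWindowsPath(path).name, size, mtime, attrs)

    original = partial_dedup_key(r"C:\SpecialPlace\video.mp4", 4_000_000_000, "2017-06-30", "A")
    moved    = partial_dedup_key(r"C:\SpecialPlaceForUserB\video.mp4", 4_000_000_000, "2017-06-30", "A")
    renamed  = partial_dedup_key(r"C:\SpecialPlace\wedding.mp4", 4_000_000_000, "2017-06-30", "A")

    print(original == moved)    # True  -> folder change only: deduplicated today
    print(original == renamed)  # False -> file name change: backed up again today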
  6. There’s actually nothing inherently difficult about implementing this feature in a relative sense. There are many backup solutions beginning to support/emphasize deduplication. Retrospect’s “progressive” backup is essentially a partial implementation of a full deduplication feature… it goes some distance in deduplication but falls short of some competing local and cloud backup solutions that offer full deduplication. Don’t get me wrong, Retrospect’s concept of “progressive” backups has been something I’ve been appreciative of, and has been a strong point for some time, but using hashes and managing duplicates is not something out of the ordinary any longer… I feel “progressive backup” is no longer the precise novelty it used to be and I’m certainly feeling pain where it’s falling short. My feature suggestion is really about adding an option users can activate to have Retrospect go the full distance. The only change really required for the simplest introduction of this feature is the initial thing I described: When Retrospect’s present-day “progressive backup” logic detects no matches (a supposed new file), generate a hash of that supposed new source file so a secondary and complete deduplication check can be performed beyond Retrospect’s partial present-day deduplication check. That’s it in its simplest form (a rough sketch appears at the end of this post). The following are some links relating to deduplication. If you prefer to web search instead of clicking the links below, search for "deduplication backup" and "deduplication cloud backup", and you should find the Wikipedia article and some of the other companies. I have no affiliation I’m aware of with any companies at the following links. These might help to highlight that deduplication is likely not difficult to achieve and may be worthwhile. I personally consider deduplication a CS102 task… it really shouldn’t be outside the realm of most software engineers even starting out in this space.
https://en.wikipedia.org/wiki/Data_deduplication
https://www.google.com/#q=deduplicating+backup
https://www.google.com/#q=deduplicating+cloud+backup
http://www.acronis.com/en-us/resource-center/resource/deduplication/
https://www.druva.com/blog/why-global-dedupe-is-the-killer-feature-of-cloud-backup/
https://www.druva.com/public-cloud-native/scale-out-deduplication/
http://www.asigra.com/product/deduplication
https://www.barracuda.com/products/backup/features
https://borgbackup.readthedocs.io/en/stable/
http://zbackup.org/
https://attic-backup.org/
http://opendedup.org/odd/
Your needs sound different, unrelated to requirements that benefit from full deduplication. I don’t want to delete historical copies of files, yet I want to be able to rename files within the library without a backup storage penalty (disk, cloud, or otherwise). A full deduplication feature achieves this. Today’s Retrospect partly does this already, so it’s within its scope. It just breaks as soon as you rename enormous files or large numbers of files… because your backup set essentially grows by the size of all files renamed. Makes no sense and is easily averted… that’s my point. Grooming can somewhat solve the storage issue by harming historical integrity for the sake of freeing up space, a deal-breaker for me... I don't want to lose historical integrity. Even with grooming, though, I’d still have storage impacts to the degree I need to retain history that is not yet groomed. Unless I always groom everything but one copy, I’ll be impacted the same way, yet historical integrity will be ruined/erased. 
I just don’t see how grooming works to replace deduplication. (But I get that it works for you... that's great, but you don't have deduplication requirements.)
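Here is the rough sketch of the "simplest form" mentioned above, in Python (my own illustration, not Retrospect's code; the function names and the catalog_by_content lookup are hypothetical, and I'm assuming an MD5 content hash since that's what Retrospect already computes for new files):

    import hashlib

    def md5_of_file(path, chunk_size=1024 * 1024):
        # Stream the file in chunks so multi-gigabyte media files don't have to fit in RAM.
        digest = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def is_content_duplicate(path, size, mtime, attrs, catalog_by_content):
        # Secondary check, run only when the existing name-based progressive check
        # finds no match: same content hash plus the same size/dates/attributes.
        # catalog_by_content is assumed to be a set of (md5, size, mtime, attrs) tuples.
        return (md5_of_file(path), size, mtime, attrs) in catalog_by_content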
  7. I can't see how grooming solves the same problem... Can you elaborate?
  8. Why check every file? Currently Retrospect performs a progressive backup and avoids duplicates by attempting to match the source file to the catalog merely on name, size, dates, and attributes. If that match checking fails, Retrospect considers the file new and necessarily must read the entire file. As well, it by default generates an MD5 hash. So we know Retrospect must read all new files at least once (currently only once). This proposed feature has overhead of potentially reading such a source file twice, but that need can be optimized away with a little effort. Even without such optimizations, I would find this feature quite useful. The feature I’m suggesting could easily be implemented by performing an advance extra read (before writing to the set) of any such new files that fail the initial duplicates checking. This advance extra read would be performed in order to generate the MD5 hash toward determining if a file is a duplicate. With that MD5, Retrospect can perform the same matching as it does initially for finding duplicates, but instead of using name/size/dates/attributes, it would use MD5/size/dates/attributes. If that matches a file in the catalog, Retrospect avoids a duplicate and instead inserts a reference to the existing file already in the set. The overhead of the above suggestion is the so-called “advance extra read” that is required to produce the MD5 outside of Retrospect’s normal behavior of reading the file to produce that same MD5 while it also writes to the set. Yes, that is extra perf/overhead for new files, but a few things on this… First, it’s worth it as described above. I could have it always enabled and be happy with that in my usage. However, I could also enable it for backups that I know have a lot of renamed files, then disable it afterwards. This would allow me to get a backup set in sync with a huge rename. But even better than worrying about enabling/disabling, there are some potential optimizations which can be added… The key to avoiding the extra overhead is to avoid the extra source file read to produce the MD5 in advance, which is needed to see if it’s a duplicate in content to anything already in the catalog. There are a number of ways which come to mind offhand that Retrospect could do this… For example, as Retrospect encounters each file that appears to be new (which could be a renamed file that is actually a duplicate), Retrospect looks at the catalog information for all files that match that potential new source file’s dates, size and attributes (without the name). If there are no matches, Retrospect proceeds forward as it normally would, adding the new file to the set (no extra overhead here except that catalog search, which should be negligible). If there are one or more matches of size/dates/attributes to files in the catalog, Retrospect then proceeds to generate an MD5 for the source file, which it then uses to check the catalog for a match on MD5/dates/size/attributes. If there is a match, Retrospect considers the file to be a duplicate and proceeds forward without backing up the new copy (but the file will appear in the snapshot of course, just as with any progressive backup). The example optimization just described is basically making sure the catalog has at least one matching file with the same dates/size/attributes before doing that extra read to produce the MD5. It seems all optimizations here are about avoiding that extra work. 
You need something strong, like that hash, to do this content check, but you want to avoid that check if existing simpler data can be used for faster checks. But even more could be done… Retrospect could maintain additional "simpler" data (than a hash) to help with further optimizations. For example, Retrospect could maintain a simple checksum for arbitrary but strictly defined sections of a file, such as checksums for a few sectors of the file's beginning, middle, and end. These could be added to the prior optimization’s check. For example, as mentioned before, given a potential duplicate/new source file’s size/dates/attributes, Retrospect could now check the catalog for one or more files with matching size/dates/attributes but also matches on those simple checksums, then (only if a catalog file match is found) proceed forward with the advance (extra) read of the source file to produce the MD5 and perform the final check with MD5/dates/size/attributes. Again, I would LOVE this feature even without any such optimizations. The extra read would only occur for new files that fail all of Retrospect's current checks for changed or duplicate files, the checks that enable today's progressive backups. If all those fail to match, the extra read would then be required. So an extra read on all such new files to avoid duplicates on renames? To me... that's totally worth it. So the complexity I added above, which is not required for a good feature, is merely about finding creative ways to avoid that extra source read to get an MD5 to perform that MD5 check (by using checks that are faster and may eliminate the need to do the advance/extra file read to get that MD5). Does this make sense? (A rough sketch of this check order follows at the end of this post.) I found it to be a nightmare that simply renaming files to converge several different naming conventions for a large library caused Retrospect to want to back up everything that had been renamed all over again, even though only the names had changed. I actually already thought Retrospect did the above. I was surprised it did not. I would really like to see this feature added. Obviously they shouldn't do anything without understanding a worthwhile benefit beyond one user. I think they are open to suggestions but want to hear that other users would like this. 
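Here is the promised rough sketch of that check order, in Python (my own illustration, not Retrospect's code; the catalog entry fields, the CRC32 spot checksums, and the function names are all hypothetical assumptions): metadata pre-check first, then cheap spot checksums, and only then the full-content MD5 read.

    import hashlib
    import zlib

    PROBE = 4096  # bytes checksummed at the start, middle, and end of the file

    def spot_checksums(path, size):
        # Cheap "simpler data": CRC32 of a few fixed regions, to rule out most
        # non-duplicates before paying for a full read of the file.
        sums = []
        with open(path, "rb") as f:
            for offset in (0, max(size // 2 - PROBE // 2, 0), max(size - PROBE, 0)):
                f.seek(offset)
                sums.append(zlib.crc32(f.read(PROBE)))
        return tuple(sums)

    def md5_of_file(path, chunk_size=1024 * 1024):
        digest = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def should_skip_as_duplicate(path, size, mtime, attrs, catalog):
        # 1) Only catalog entries with identical size/dates/attributes are candidates.
        candidates = [e for e in catalog
                      if (e["size"], e["mtime"], e["attrs"]) == (size, mtime, attrs)]
        if not candidates:
            return False  # genuinely new file: back it up as usual, no extra read
        # 2) Optional cheap spot-check before the expensive full read.
        spots = spot_checksums(path, size)
        candidates = [e for e in candidates if e.get("spots") in (None, spots)]
        if not candidates:
            return False
        # 3) Final check: the full-content MD5 must match a cataloged file.
        digest = md5_of_file(path)
        return any(e["md5"] == digest for e in candidates)

The whole point of the first two steps is just to avoid the advance extra read when it obviously can't match anything already in the catalog.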
  9. To All Retrospect Users, can you please read the following feature suggestion and offer a +1 vote response if you would like to see this feature added in a future version of Retrospect? If you have time to send a +1 vote to the Retrospect support team, that would be even better. Thank you! I contacted Retrospect support and proposed a new feature which would avoid redundant backups of renamed files which are otherwise the same in content, date, size, and attributes. Currently, Retrospect performs progressive backups, avoiding duplicates, if a file's name remains the same, even if the folder portion of the name has changed. However, if a file remains in the same folder location and is merely renamed, Retrospect will back up the file as if it's a new file, duplicating the data within the backup set. This costs time and disk space if a massive number of files are renamed but otherwise left unchanged, or if the same file (in content, date, size, attributes) appears in various places throughout a backup source under a different name. If this proposed feature is implemented, it would allow a Retrospect user to rename a file in a backup source without that file subsequently being redundantly backed up, provided the file's contents, date, size, and attributes did not change (i.e., just a file name change doesn't cause a duplicate backup). I made this suggestion in light of renaming a bunch of large files, which caused Retrospect to want to re-back up tons of stuff it had already backed up, merely because I changed the files' names. I actually mistakenly thought Retrospect's progressive backup avoided such duplication, because I had observed Retrospect avoiding such duplication when changing a file's folder. For a folder name change, Retrospect is progressive and avoids duplicates, but if a file is renamed, Retrospect is not progressive and backs up a duplicate as if it's a completely new file. If you +1 vote this suggestion, you will be supporting the possible implementation of a feature that will let you rename files without incurring a duplicate backup of each renamed file. This can allow you to reorganize a large library of files with new names to your liking without having to re-back up the entire library. Thanks for your time in reading this feature suggestion.
  10. Retrospect seems to have a bug on Windows in its handling of file names and path/folder names containing ampersand characters. The bug is shown in the two annotated screenshots attached to this post. When specifying a single ampersand '&' character in a file/path name, the ampersand is omitted and the following character is shown underlined. When doubling the ampersands (&&), the problem seems to go away, but the selector is then internally incorrect. I verified this with at least one selector: despite it looking incorrect with the underline (caused by the single ampersand), it worked fine in finding the path containing the single ampersand, but when doubling the ampersands (which escapes them for the UI components), the UI looked correct while the selector was then incorrect, containing a double ampersand. Some classic Windows UI elements use the ampersand as a way of indicating accelerator characters for UI fields and controls. For example, specifying "&First name" while creating UI resources leads to the F appearing underlined in the UI to indicate to the user that it is the accelerator character. My guess is that Retrospect isn't filtering or dealing with this, and is just displaying raw user entry in Windows components that are interpreting the characters as accelerators. Note, I notice this problem also affects files already backed up which contain paths with ampersands... if I go to the backup sets and view snapshots or sessions for such a file, and choose Properties, the UI will show paths which actually contained a single ampersand by omitting the ampersand itself and underlining the following character... my guess is that the UI is incorrect but that the underlying backup and the name of the path are correct. For example, a backed up file with a path "C:\This & That" will show up in the UI properties for the file as "C:\This _That" because the UI control sees a single non-escaped ampersand, uses it to underline the next character (the space before "That"), and you get "C:\This _That" in the UI, which is very confusing. Thankfully this appears to be a UI issue only, but it's incredibly painful when one is trying to solve integrity problems elsewhere and examining things, so an eventual fix would be nice. It seems as though some filtering/escaping is required to fix this. Thanks so much for listening!
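For what it's worth, the usual fix for this class of bug is to escape user-supplied text only at the point where it is handed to a control that treats '&' as an accelerator prefix, and to keep the raw single-ampersand value for anything stored or matched internally (such as selectors). A tiny sketch of the idea in Python (mine, not Retrospect's code):

    def escape_for_accelerator_control(text):
        # Classic Windows controls treat '&' as an accelerator prefix and underline
        # the next character; doubling it makes the control display a literal '&'.
        return text.replace("&", "&&")

    def unescape_from_accelerator_control(text):
        return text.replace("&&", "&")

    path = r"C:\This & That"
    print(escape_for_accelerator_control(path))              # C:\This && That (renders as "C:\This & That")
    print(unescape_from_accelerator_control("C:\\This && That"))  # C:\This & That

The selector would keep the single-ampersand value; only the text pushed into the UI control would be escaped on the way in and unescaped on the way out.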
  11. I wanted to take several older external USB hard drives, some of them containing non-Retrospect data folders, others containing Retrospect Backup Sets, and transfer all of them into a single backup set I named "Consolidation." I did this by using both Immediate backups to capture (back up) non-Retrospect data folders into the Consolidation set, and Transfer Backup Set scripts to transfer any Retrospect backup sets on an older external USB drive to the newer Consolidation Backup Set. The process worked well and ended up spanning two new external 4TB drives (the 2nd drive is barely full yet, so there's room for further growth/consolidation). I'd say it's about 5 or 6TB of data spanning two 4TB external USB drives. Then I wanted to create a duplicate or "backup" of that single Consolidation set... just in case the first one ever had issues, I would have at least two copies. To do this, I am using a Transfer Backup Set script to transfer from the original Consolidation backup set to a secondary, newly created "Consolidation2" backup set (the number '2' appended to the name, a different backup set but ultimately a duplicate of Consolidation). During the Transfer from Consolidation to Consolidation2, I hit the dreaded "bad backup set header found" message... so far the Activity monitor shows a count of about 18,000. My system: MSI GT72S 6QD (a web search yields its specs and support page)... at first I had been using a powered USB hub without issue for most of my backups and stuff. Not sure if that's related, but based on a Retrospect KB article (https://www.retrospect.com/en/support/kb/troubleshooting_bad_backup_set_header_and_other_similar_errors) I decided to plug directly into my laptop's USB 3.0 ports, but I still see those errors. I also see them on another laptop. Some questions... 1) I think my experience above indicates with relative certainty that the original Consolidation set has some corrupted files/backup data... correct? Okay, so to assess the damage and move forward, I'm trying to gain insight into the following... I'm currently letting the Transfer from Consolidation to Consolidation2 continue despite the 18,000 errors because it seems to be continuing and transferring (copying) files. But I'm left with some important questions... 2) I see this error message "bad backup set header found", but Retrospect is showing me no feedback as to what files are affected. It's unclear what snapshots or backups were affected. This might help me understand the impact and how much effort I should take to resolve it... the corrupted files may not be that critical to me, but I can't tell... Retrospect is only emitting the lower-level error message. Is there a way to get Retrospect to give me a user-friendly set of information about what files were affected? ... and perhaps what snapshots, Sessions, etc. from the original Consolidation are not reliable or missing after the transfer to Consolidation2? Or, is there a way to diff Consolidation's and Consolidation2's catalogs to see what made it successfully to Consolidation2? That information seems imperative to assess damage. 3) Given I see about 18,000 "bad backup set header found" errors, but the Transfer operation is continuing, apparently continuing to copy good/accessible files, what will the new Consolidation2 Backup Set look like when complete? Will it contain only good data that could be retrieved? Will it include garbage data based on a best effort to copy from the partly corrupted source set? etc. 
While this question seems similar to #2 above, I'm asking something different here... I'm trying to find out what data from the corrupted files in the source set, if any, makes it to the destination. I'm guessing none of the corrupted data makes it, that it is ignored, but I want to find out for certain. 4) What about snapshots in the partly corrupted source set... will the "good", uncorrupted parts be copied, while ignoring the corrupted parts? Does this mean the snapshot is transferred to the new Consolidation2 without the corrupted files? Or are the corrupted file names still moved to the destination even if the destination data archive is missing or contains bogus data? I can't imagine Retrospect copying over garbage data... but I don't want to involve my personal assumptions here... I want to get the facts. Basically, Retrospect is encountering errors during a transfer due to corruption in the source set, but it's unclear how Retrospect is going to deal with those errors at a very specific, concrete level... meaning what will the result of its work, the new destination Consolidation2 set, look like (assuming the destination drive is good and there is no new corruption)? Based on the answer to the above, my plan is to continue to let the transfer complete, and once completed, I want to go back over all the older external USB drives which were used to create the original Consolidation set, and re-back them up to Consolidation2. After all that, I'll erase Consolidation and recreate it using a Transfer from Consolidation2 (the new and only "good" one) to a newly recreated Consolidation. My goal here is to salvage what was good about the first round of backups that created Consolidation, in hopes that going back over all the older USB external drives will go quickly given the files will already be in the Consolidation2 set. Keep in mind these Consolidation and Consolidation2 sets are essentially archival in nature... I don't really need the structure as much as the file history... though I want to retain the snapshot data as best as possible. So far Retrospect seems to transfer that data just fine. Does the approach in the prior paragraph sound reasonable? My Transfer operation is vanilla except I checked "Transfer any needed intermediate database Snapshots." Thanks for any insight!