TO ALL USERS: New feature proposal: Avoid redundant backup of renamed files.

redundant md5 digest hash backup file rename duplicate duplicates avoid

21 replies to this topic

#21 David Hertzberg

David Hertzberg

    Occasional Forum Poster

  • Members
  • 410 posts
  • LocationNew York, NY

Posted 07 July 2017 - 09:36 AM

Ash7 may very well be correct that the former slowness of CPUs and RAM explains why Retrospect's developers originally considered full deduplication too computationally expensive; U.S. Patent 5,150,473 was filed by Dantz Development Corp. in 1990.  If you read the lead section of the Wikipedia article on Retrospect, especially the third and fifth paragraphs, you will see that the sales target for Retrospect shifted after Time Machine (and later the equivalent Windows facility) was developed to meet the backup needs of home installations.  Ash7 and I may be among the few home users of Retrospect left, which IMHO explains why Retrospect Inc. does not think it has "many users requiring this feature."

 

I mentioned the childlessness and breakup of my marriage to explain why I don't have a family archive with a lot of kids' pictures.  The scenic trip I took pictures of was in 1999, when digital cameras were still somewhat expensive and a bit exotic.  As a result of that trip I swore off taking my own pictures, because I found I was spending so much time planning the shots that I wasn't really looking at the sights.  Today I see so many people in my scenic part of Manhattan taking pictures (presumably for their Facebook pages), and wonder whether they are making the same mistake.  Anyway I now own a cheap digital camera, but I rarely use it; I'm not really a visual person.



#22 David Hertzberg

Posted 15 July 2017 - 07:27 AM

It has been bugging me that the Google facility I mentioned in the first paragraph of post #19 in this thread only shows patents that reference the patent number you enter (well, you'd expect that; it's Google implementing the facility it thinks the world needs!).  To actually access the original U.S. patent 5,150,473, you have to browse to https://www.google.com/patents/US5150473 (omitting the commas, naturally).

 

So, having figured this out, I took another look at U.S. patent 5,150,473.  It doesn't discuss the Snapshot, only the Catalog File.  If I click the leftmost image under Images, it shows Figs. 1-5—unfortunately Figs. 6-20 are not shown anywhere.  To the extent that I understand Figs. 2-5, each figure seems to show the format of a possible type of node within the tree shown in Fig. 1.  Fig. 4 is a File Info node, and it contains a single name.

 

IMHO, as a result of the calculation I discussed in the third paragraph of post #16, Snapshot nodes on average aren't that large.  Therefore I infer that Snapshot nodes link to Catalog File nodes.  This in turn would mean that, if Ash7's feature suggestion were implemented, a Snapshot node would have to be enhanced to include a file name, so that the file name in a Snapshot node could differ from the name in the Catalog File node representing the last backup of that file.  Changing the Catalog File node instead is almost certainly a no-no, because you'd also have to change the copy of the Catalog File stored on the possibly-tape Backup Set medium (if you didn't make that change, you'd create the problem I mentioned in post #9).  That enhancement would enable renaming of files without their being re-backed-up, but would likely require a significant change in Retrospect's handling of Snapshots.  That's why I don't think Retrospect Inc. is likely to implement Ash7's feature suggestion: as I said at the end of the first paragraph of post #21, the feature is most likely to benefit the probably-few home users of Retrospect.
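
To make the idea concrete, here is a minimal Python sketch of the kind of enhancement I am guessing at.  Everything in it is hypothetical: the names CatalogFileNode, SnapshotEntry, name_override and take_snapshot are mine, the use of an MD5 digest as the matching key is an assumption taken from this thread's topic, and none of it reflects Retrospect's actual Catalog File or Snapshot formats, which I have not seen.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class CatalogFileNode:
    """Hypothetical stand-in for a Catalog File node: one stored copy, one name."""
    name: str
    digest: str  # MD5 of the file contents (assumed matching key)


@dataclass
class SnapshotEntry:
    """Hypothetical Snapshot node: a link to a catalog node plus an optional name."""
    catalog_node: CatalogFileNode
    name_override: Optional[str] = None  # set only when the file was renamed

    @property
    def name(self) -> str:
        if self.name_override is not None:
            return self.name_override
        return self.catalog_node.name


def take_snapshot(
    files: Dict[str, bytes], catalog: Dict[str, CatalogFileNode]
) -> Tuple[List[SnapshotEntry], List[str]]:
    """Return the new snapshot and the names of files that had to be backed up."""
    snapshot: List[SnapshotEntry] = []
    backed_up: List[str] = []
    for name, data in files.items():
        digest = hashlib.md5(data).hexdigest()
        node = catalog.get(digest)
        if node is None:
            # Contents not seen before: add a catalog node (a real backup would copy data here).
            node = CatalogFileNode(name=name, digest=digest)
            catalog[digest] = node
            backed_up.append(name)
        # Same contents under a new name: record the name in the snapshot only,
        # leaving the existing catalog node untouched.
        override = name if name != node.name else None
        snapshot.append(SnapshotEntry(node, override))
    return snapshot, backed_up


catalog: Dict[str, CatalogFileNode] = {}
_, first_run = take_snapshot({"trip.jpg": b"pixels"}, catalog)
snap, second_run = take_snapshot({"holiday.jpg": b"pixels"}, catalog)
```

In this sketch the second run backs up nothing, yet the snapshot shows the file under its new name while the catalog node keeps the old one.  That is exactly the extra field I suspect today's Snapshot nodes lack.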

 

Of course the question of whether Snapshot nodes contain file names is best answered by someone actually taking a look at one.  Unfortunately, since I am not yet doing any programming on my home Macs, I don't know what "file dump" app to use to view data whose structure is unknown.  If someone reading this post is familiar with Mac programming, maybe he/she could offer a suggestion?
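
Failing a dedicated app, a generic hex-plus-ASCII dump is one way to look for readable file names in data whose structure is unknown.  The following is only a sketch in Python; the function name hex_dump is my own, and I have not tried it on a real Catalog File or Snapshot.  (macOS also ships the command-line tools hexdump and xxd, which do much the same thing from Terminal.)

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex values, and printable ASCII, one row per `width` bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


# Example with made-up bytes; for a real file, read it in binary mode first,
# e.g. data = open(path, "rb").read(4096), where path is whatever file you want to inspect.
sample = hex_dump(b"Fig4.txt\x00")
```

Any file names stored as plain text should show up in the right-hand column of the dump.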






