
Media versus thorough verification


amkassir


They do different things, and have different purposes. One (media verification) computes an MD5 checksum over the media set contents and compares it against the checksum recorded in the catalog when the backup was made. The other actually compares the source against what's in the media set, which often is not possible if files have changed since the backup was made. Consider constantly changing log files: a comparison verification of them is worthless and can never succeed, because the source file will always be different.
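To make the distinction concrete, here is a minimal Python sketch of the two kinds of check. It assumes each backed-up file is available as a plain file and that the catalog stores one hex MD5 digest per file; Retrospect's real media-set and catalog formats are not modeled, and the names `media_verify` and `compare_verify` are hypothetical.

```python
import hashlib
from pathlib import Path

CHUNK = 1 << 20  # read 1 MiB at a time


def md5_of_file(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            h.update(chunk)
    return h.hexdigest()


def media_verify(backup_copy: Path, catalog_digest: str) -> bool:
    """MD5 ('media') verification: reads ONLY the backup copy and checks it
    against the digest recorded in the catalog at backup time."""
    return md5_of_file(backup_copy) == catalog_digest


def compare_verify(backup_copy: Path, source: Path) -> bool:
    """Comparison ('thorough') verification: reads both the backup copy and
    the current source and compares them byte for byte."""
    with open(backup_copy, "rb") as b, open(source, "rb") as s:
        while True:
            b_block = b.read(CHUNK)
            s_block = s.read(CHUNK)
            if b_block != s_block:
                return False
            if not b_block:  # both files exhausted with no difference
                return True
```

With a log file that grows after the backup, `media_verify` still passes (the backup copy matches its catalog digest) while `compare_verify` fails, because the source no longer matches what was backed up.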

 

Personally, I would prefer that MD5 verification be called exactly that, because that's what it is, but some users may not understand what the MD5 calculation is, or why it's useful, etc.

 

Russ


No. Read my explanation again. The two types have entirely different purposes. Use the right tool for the job.

 

The only time I would consider comparison verification (which is what I believe the operation should be named, for clarity) is for backup of a quiescent volume.

 

Examples would be:

(a) a share that has been taken offline so that no users or applications can modify the volume;

(b) an archive volume;

(c) a volume that is being recommissioned from storage;

and so on. Almost every other case involves a volume whose contents are, or may be, modified between backup and comparison.

 

It also depends on whether you trust MD5 technology and Retrospect's implementation of that technology. That's why you need to establish a backup policy and then test thoroughly against that backup policy's requirements.

 

Russ


"Media verification" (MD5 verification) should also normally be faster than Thorough verification (Comparison verification), because the MD5 verification reads only the backup, while the Comparison verification reads both the backup and the source.

You draw incorrect conclusions.

 

MD5 verification is two operations in parallel (assuming reasonable buffering and reasonable thread factoring, and ignoring the small effects at start and finish): read the backup, and compute the MD5 checksum. The limiting factor may be the speed of the I/O channel and the backup device, or it may be the MD5 calculation (depending on how well Retrospect's calculation is implemented, and on the CPU horsepower of the machine running the Retrospect engine).

 

Comparison verification is three operations in parallel (assuming reasonable buffering and reasonable thread factoring, and assuming that the source and backup devices are different and don't compete for the same I/O resources): read the source, read the backup, and compare. The comparison itself (even with a poor implementation) is certainly not compute bound. If the two reads are on different devices, they can overlap completely, with full parallelism. If the backup device is the slower device, the comparison verification may be as fast as MD5 verification, or potentially even faster (if the MD5 verification is compute bound).
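The argument above is a pipeline argument: when stages overlap, the slowest stage sets the pace. A minimal Python sketch of that model follows. The stage rates are hypothetical inputs (MB/s), not measurements of Retrospect, and the `shared_channel` case is an added simplifying assumption for when both reads go through the same I/O path.

```python
def pipeline_rate(*stage_rates_mb_s: float) -> float:
    """Steady-state throughput of fully overlapped stages: the slowest stage
    gates the whole pipeline."""
    return min(stage_rates_mb_s)


def md5_verify_rate(backup_read: float, md5_compute: float) -> float:
    """MD5 ('media') verification: read the backup, hash it."""
    return pipeline_rate(backup_read, md5_compute)


def compare_verify_rate(backup_read: float, source_read: float,
                        compare: float, shared_channel: bool = False) -> float:
    """Comparison ('thorough') verification: read backup, read source, compare."""
    if shared_channel:
        # Assumption: both reads contend for one channel, so each verified MB
        # needs one MB from each device in turn and the per-MB read times add.
        combined_read = 1.0 / (1.0 / backup_read + 1.0 / source_read)
        return pipeline_rate(combined_read, compare)
    return pipeline_rate(backup_read, source_read, compare)


if __name__ == "__main__":
    # Hypothetical rates in MB/s: slow backup device, compute-bound MD5.
    print(md5_verify_rate(backup_read=20, md5_compute=15))                     # prints 15
    print(compare_verify_rate(backup_read=20, source_read=100, compare=2000))  # prints 20
```

With these made-up numbers the comparison verification comes out faster, which is exactly the case described above; with a fast MD5 stage both modes are gated by the same 20 MB/s backup read.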

 

It's not a simple question to answer. You should test on your setup, with your particular choices of hardware and technology for the backup device and the source device. The choice between the two approaches should not be made based on assumed speed. Again, the two approaches serve different purposes.
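Testing on your own setup can start with a rough measurement like the Python sketch below. It times MD5 hashing of in-memory data (CPU cost only) and a sequential read of a file you choose. The function names are hypothetical, and Python's `hashlib` is not Retrospect's MD5 code, so the numbers are only a rough reference point, not a prediction of Retrospect's speed.

```python
import hashlib
import os
import time


def md5_throughput_mb_s(total_mb: int = 64, chunk_mb: int = 1) -> float:
    """Hash total_mb of in-memory random data with hashlib.md5 and return MB/s.
    No disk or tape I/O is involved, so this measures CPU cost only."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    h = hashlib.md5()
    start = time.perf_counter()
    for _ in range(total_mb // chunk_mb):
        h.update(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total_mb / elapsed


def read_throughput_mb_s(path: str, chunk_bytes: int = 1 << 20) -> float:
    """Read a file sequentially and return MB/s. The OS cache can make a
    recently read file look much faster than the underlying device."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_bytes)
            if not block:
                break
            total += len(block)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total / (1024 * 1024) / elapsed


if __name__ == "__main__":
    print(f"MD5 (CPU only): {md5_throughput_mb_s():.0f} MB/s")
```

If the hashing rate is far above the read rate of the backup device, MD5 is unlikely to be the gating stage on that machine.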

 

Russ


I disagree with Russ on one point: a full comparison verification can be useful even if volumes are changing. You may be perfectly fine with changed files, and because mismatches are logged so that you can identify them, the process is still useful. A file that does not pass verification is still backed up; it just did not pass verification. A project share may need to be backed up at night, but Marketing may be working on a presentation through the night, so their one file may change. Yet at the end you still know that the backup is intact, except for acceptable changes to the system (i.e. "yep, Marketing was working last night"). If you need (or rather, desire) full comparison verifications, then by all means do them: quite often the results are still useful, and they allow a more thorough backup check than MD5 alone.


I disagree with Russ on one point: a full comparison verification can be useful even if volumes are changing. You may be perfectly fine with changed files, and because mismatches are logged so that you can identify them, the process is still useful.

You misinterpret what a compare mismatch means. It means exactly that the data version retrieved from the media set (backup set) doesn't match the data version retrieved from the source. No more, no less.

 

Usually that means that the file changed since backup, but not always. For example, RAID 5 issues can cause successive reads of data to be different (an explanation is far beyond the scope of this forum), as can uncorrected memory errors, I/O channel errors, bit errors in tape or disk media, etc. In short, it doesn't mean that the file on the source changed; it only means that data retrieved from the source doesn't match data retrieved from the backup.
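The point that a mismatch only says "these two reads disagree" can be shown with a small Python sketch. `first_mismatch` is a hypothetical helper: it reports where the data differs, and the same report comes out whether the source was edited or the backup copy was damaged.

```python
from typing import Optional


def first_mismatch(backup: bytes, source: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical.
    The result only says the two reads disagree at this offset; it cannot
    say which side holds the correct data."""
    for i, (x, y) in enumerate(zip(backup, source)):
        if x != y:
            return i
    if len(backup) != len(source):
        return min(len(backup), len(source))  # one side is longer
    return None


if __name__ == "__main__":
    original = b"quarterly report v1"
    # Case 1: the source file was edited after the backup.
    print(first_mismatch(original, b"quarterly report v2"))  # prints 18
    # Case 2: the source is unchanged, but the backup copy was damaged.
    damaged = bytearray(original)
    damaged[-1] ^= 0x03  # turns b"1" (0x31) into b"2" (0x32)
    print(first_mismatch(bytes(damaged), original))  # prints 18
```

Both cases produce the identical report, so the compare log alone cannot distinguish "Marketing edited the file" from a media, channel, or memory error.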

 

A file that does not pass verification is still backed up; it just did not pass verification.

Not true. The data in the backup set (um, media set) just doesn't compare with the data read from the source. Either or both could be changed. You don't necessarily have the correct data in the backup set.

 

A project share may need to be backed up at night, but Marketing may be working on a presentation through the night, so their one file may change. Yet at the end you still know that the backup is intact, except for acceptable changes to the system (i.e. "yep, Marketing was working last night"). If you need (or rather, desire) full comparison verifications, then by all means do them: quite often the results are still useful, and they allow a more thorough backup check than MD5 alone.

Again, see above. Your assumptions are wrong, causing your conclusions to be wrong.

 

Russ


"Media verification" (MD5 verification) should also normally be faster than Thorough verification (Comparison verification), because the MD5 verification reads only the backup, while the Comparison verification reads both the backup and the source.

You draw incorrect conclusions.

 

MD5 verification is two operations in parallel (assuming reasonable buffering and reasonable thread factoring, and ignoring the small effects at start and finish): read the backup, and compute the MD5 checksum. The limiting factor may be the speed of the I/O channel and the backup device, or it may be the MD5 calculation (depending on how well Retrospect's calculation is implemented, and on the CPU horsepower of the machine running the Retrospect engine).

 

Comparison verification is three operations in parallel (assuming reasonable buffering and reasonable thread factoring, and assuming that the source and backup devices are different and don't compete for the same I/O resources): read the source, read the backup, and compare. The comparison itself (even with a poor implementation) is certainly not compute bound. If the two reads are on different devices, they can overlap completely, with full parallelism. If the backup device is the slower device, the comparison verification may be as fast as MD5 verification, or potentially even faster (if the MD5 verification is compute bound).

 

It's not a simple question to answer. You should test on your setup, with your particular choices of hardware and technology for the backup device and the source device. The choice between the two approaches should not be made based on assumed speed. Again, the two approaches serve different purposes.

 

Russ

On my setup (media verification of a backup to DVD+RW on a 2.4 GHz dual-core MacBook Pro, Mac OS X 10.6.4, Retrospect 8.2), the system consistently shows about 35-40% idle CPU while verifying. So the MD5 calculation in the verify is not making the system CPU bound.

 

MD5 calculation by itself (using /sbin/md5 over a large HDD file) uses about 20% of one CPU.
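A figure like "about 20% of one CPU" is the process's CPU time divided by elapsed wall-clock time. The Python sketch below computes that ratio while hashing a file with `hashlib`; it is a stand-in for watching /sbin/md5 in a process monitor, the function name is hypothetical, and it measures Python's MD5, not Retrospect's.

```python
import hashlib
import time


def md5_cpu_fraction(path: str, chunk_bytes: int = 1 << 20) -> float:
    """Hash a file and return (CPU time / wall-clock time) for this process.
    Near 1.0 means one core stayed busy (CPU bound); well below 1.0 means
    the process spent most of its time waiting on I/O."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_bytes), b""):
            h.update(block)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else 0.0
```

Note that `process_time` includes system time spent in read calls, and a file already in the OS cache is read from memory, so it will look far more CPU bound than a cold read from a slow disk would.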


MD5 calculation by itself (using /sbin/md5 over a large HDD file) uses about 20% of one CPU.

Again, an incorrect assumption. You assume that the code in /sbin/md5 is the same as that used by Retrospect. The algorithm may (should) be the same, but the implementation in Retrospect may not be as fast.

Russ

It's not an assumption, it's a measurement.

 

If the Retrospect MD5 code is faster, then that would give greater weight to my evidence that Media Verification isn't CPU bound.

 

Anyway, when I observe my configuration doing Media Verification, it's not CPU bound.

