'File "xyz" appears incomplete' when backup media is not corrupt


Just a couple of things.

 

- Apologies for typing RAID 0 above when trying to list our "known" configurations. Brain fart.

- I don't think that this is related to ACLs, but it's an easy test to try (assuming that a small defined Subvolume used as a Source will always result in file compare errors).

 

I'll give this a try.

 

- Although the RAID volume is named "Boot OSX" can we infer that it is not, in fact, booting anything?

 

This turns out to be an artifact of the listing in Profiler. Apparently it lists only one partition on a drive. The RAIDed drives each have three partitions: a small EFI partition, a small "Boot OSX" partition, then the big data partition. The RAID volume is _not_ booting anything. Strangely, the actual boot drive has no "Boot OSX" partition.

 

- System Profiler lists Capacity for the RAID volume, but not Available space; how much data are we talking about?

 

The capacity is 768GB. There is 386GB of space used.

 

This is some serious hardware, and Retrospect is reporting some serious errors when you try and use it. I've seen enough times that Retrospect ended up being the digital canary in the virtual coal mine, so I'd advise taking this seriously.

 

At this point, I'd also advise opening a Support Incident with EMC; other than testing with a File Backup Set instead of the Removable Disk Backup Set you've been using, I'm out of ideas for what to try.

 

I appreciate your help. I guess this isn't just a case of me doing something stupid. It sounds like an obscure hardware or OS problem or a bug in Retrospect.

 

-Glenn

I like Dave's summaries, so I'm posting an updated summary as an introduction for EMC tech support:

 

- Apple Xserve1,1 with two 2 GHz Dual-Core Intel Xeon processors

- OS X Server 10.4.11

- Retrospect 6.1.138 RDU version: 6.1.13.101

- 3 internal hard drives, one used as boot volume, other two 750 GB SATA drives configured as RAID 1 mirror

- Apple software RAID with built-in SAS/SATA controller board

- Local backups to Removable Disk Backup Sets exhibit file errors during the Compare pass when the Source is the RAID 1 volume. Other Sources work as expected

- Backup media is removable hard drives in two two-bay firewire enclosures

- Errors occur when known-good source media is used in backup set (tried this twice so three backup drives all exhibit errors)

- Source volume passes Apple Disk Utility verification

- Source volume has roughly 800,000 files using roughly 380GB

- Roughly 200,000 files are flagged with "appears incomplete" errors in the Compare phase of every backup. It's not always the same files, although some files seem to never produce errors.

- Most files with errors have not changed in months

- Backing up to other File Backup Sets has not been tried

 

 

Glenn


I have been continuing to follow up on the ACL idea and have found that files which get the "appears incomplete" error have an ACL problem and files that never get "appears incomplete" errors do not have the ACL problem. The "ACL problem" is this: The "ls -le" command produces this error:

 

Unable to translate qualifier on ACL

0: inherited allow list,add_file,search,delete,add_subdirectory,delete_child,readattr,writeattr,readextattr,writeextattr,readsecurity,writesecurity,chown,file_inherit,directory_inherit

 

I'm an old BSD Unix guy and have to admit I know very little about ACL attributes. I'm going to study up to see what this means. User accounts are managed with the Apple Workgroup Manager and Open Directory. I suspect Open Directory has some role in setting up users' ACL configs, but I don't know for sure. I mention this because the problem occurs for all files in some users' folders, while other users have no such errors under their home folders, so it seems to be user-specific.
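
For anyone who wants to repeat this check across a whole folder tree, here is a minimal Python sketch. It assumes a Mac with Python 3 available, that "ls -led" reports the problem with the same "Unable to translate qualifier" text quoted above, and that the text may arrive on either stdout or stderr. The function names (has_untranslatable_acl, run_ls, scan) are made up for this example.

```python
import os
import subprocess

# Error text quoted in this thread. Matching on it is an assumption about
# how `ls -le` reports an ACL entry whose user or group cannot be resolved.
ACL_ERROR = "Unable to translate qualifier"


def has_untranslatable_acl(ls_output):
    """Return True if captured `ls -le` output contains the ACL error."""
    return ACL_ERROR in ls_output


def run_ls(path):
    """Run `ls -led` on one path and return stdout and stderr together."""
    result = subprocess.run(
        ["ls", "-led", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


def scan(root, lister=run_ls):
    """Yield every path under root whose listing shows the ACL error."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if has_untranslatable_acl(lister(path)):
                yield path
```

Calling list(scan("/Users/someuser")) on one home folder would return the flagged paths. It starts one "ls" process per file, so on a volume with hundreds of thousands of files it is better pointed at one home folder at a time.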

 

-Glenn

 

PS I've turned off backing up ACLs for tonight's backups.


I suspect Open Directory has some role in setting up users' ACL configs

Not really according to your problem statement.

 

If I understand your corrected configuration statement, you have Retrospect running on an Xserve with Mac OS Server 10.4.11, backing up local volumes.

 

WGM sets the ACLs, inheritance, etc., and permissions/ACLs for the volumes. OD allows the Xserve to propagate its UID/GID structure for all logins across the network, rather than the traditional Unix model of different UID/GID settings for each machine.

 

Perhaps the reason that it happens for some users and not others is that you made an inheritance change after some users had been added, such that new user homedirs inherit different ACLs than older ones. You might check the UIDs and/or GIDs of those users to see if they are in some common range. One painful alternative (that might upset users) would be to change top-level ACLs for /Users and propagate them to everyone. You might hose everyone or fix everyone.
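
The UID/GID comparison could be sketched roughly as follows in Python. This is only an illustration: the helper names (lookup_ids, uid_range, ranges_overlap) are made up, and it assumes the accounts can be resolved through the standard account database that Python's pwd module reads.

```python
import pwd


def lookup_ids(usernames):
    """Map each username to its (uid, gid) from the account database."""
    ids = {}
    for name in usernames:
        entry = pwd.getpwnam(name)  # raises KeyError for unknown users
        ids[name] = (entry.pw_uid, entry.pw_gid)
    return ids


def uid_range(ids):
    """Return (lowest, highest) UID in the mapping, or None if it is empty."""
    uids = [uid for uid, _gid in ids.values()]
    return (min(uids), max(uids)) if uids else None


def ranges_overlap(a, b):
    """True if two (low, high) ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]
```

Run lookup_ids once on the users whose files show the error and once on the users whose files are clean, then compare the two uid_range results. If the ranges do not overlap, that would fit the idea that the affected accounts were created before or after some change.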

 

Good troubleshooting.

 

Russ


Could it be an ACL referencing a deleted group? I suspect by turning off ACL backups I will have solved the problem.

I don't know. Understand that the "ACL disabling" hack is for a different bug in Apple's "Carbon" API library that Retrospect Mac presently uses because of its legacy codebase that can't easily be ported to the modern "Cocoa" API library (hence the reason that Retrospect runs on an Intel Mac as emulated PPC code under Rosetta). See:

Cocoa vs. Carbon APIs

 

Apparently, Apple's Carbon API library, when used on the Universal Binary version of MacOS (server and non-server), has at least one bug that causes crashing when a Carbon app makes certain ACL syscalls. Apple is not motivated to fix this bug, and might, in fact, be making a conscious decision not to fix the bug as incentive for developers (such as EMC) to rewrite their apps for the Cocoa APIs. So Retrospect has this workaround to not back up ACLs so as to avoid Apple's bug. As a comment, the Universal Binary version of MacOS (server and non-server, first appearing in 10.4.6 on Intel Macs) is a very different animal, with a different codebase, from the PPC only version of MacOS. In short, there are two very different versions of MacOS 10.4.6 through 10.4.11, depending on whether you have the PPC version or the Universal Binary version. Although no PPC Mac ever shipped with the Universal Binary version of 10.4.6 through 10.4.11, it is possible to install the UB version if you know what you are doing, and some have installed the MacOS Server version of 10.4.6 through 10.4.11 to get around some nasty AFP bugs.

 

I don't know the extent of the ACL crashing bug or what syscalls trigger it, but there is more than one trigger. The best we users can do now, until Retrospect X arrives, is to work around the bug.

 

Russ


I have been continuing to follow up on the ACL idea and have found that files which get the "appears incomplete" error have an ACL problem and files that never get "appears incomplete" errors do not have the ACL problem. The "ACL problem" is this: The "ls -le" command produces this error:

 

Unable to translate qualifier on ACL
0: inherited allow list,add_file,search,delete,add_subdirectory,delete_child,readattr,writeattr,readextattr,writeextattr,readsecurity,writesecurity,chown,file_inherit,directory_inherit

 

If this is accurate (every file that Retrospect complains about also reports an error with the "ls" Unix tool, and every file that does not report an ls error works fine with Retrospect), then you don't have a Retrospect problem at all.

 

As we so often see here, Retrospect has alerted you to an existing problem with your computer, one that won't be truly solved by making changes to Retrospect's settings.


  • 2 weeks later...

Dave,

 

Yes, what you say is true. Retrospect is having problems only with the files that have the ACL with an unknown group or user.

 

Note that if we were dealing with the usual POSIX permissions there would be nothing wrong with having files owned by un-named users or groups. This is normal and ok in POSIX compliant systems. I would, however, have to dig deep into the ACL standard to see if what I have is a standards-compliant file system.
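
To illustrate the POSIX point: a file's owner is stored as a numeric UID and GID, and the name is only looked up when something needs to display it. A minimal Python sketch (owner_names is a made-up helper name for this example) shows that a missing name is not an error at the file level:

```python
import grp
import os
import pwd


def owner_names(path):
    """Return (user, group) names for path, or None where no name exists."""
    info = os.stat(path)  # succeeds whether or not the IDs have names
    try:
        user = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        user = None  # numeric UID with no matching account
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:
        group = None  # numeric GID with no matching group
    return user, group
```

A result such as (None, None) just means the IDs have no names on this machine; the file itself is still readable and its ownership is still valid.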

 

I would hope that the developers of Retrospect would at least take a look at this issue. The fact that the files in question would be included some nights in the incremental but not others even though they hadn't been changed in months is suspicious. That erratic behavior could show up in other circumstances where the file system is totally clean.

 

Furthermore, the "appears incomplete" error is not documented. I would hope that the developers of Retrospect would improve this reporting so any other users who encounter it may find the cause with fewer headaches than I had. Before I started this forum discussion, the only suggested solution for this error was to make sure your backup media was good/clean. That turned out to point me in entirely the wrong direction.

 

I'll also mention one other interesting tidbit. If you review my system configuration you will see that there are two nearly identical Xserve servers. It turns out there are similar "broken" ACLs on the second server but Retrospect has never complained about them. The main difference is that the second server is a Retrospect client. Only the drive on the Xserve that's hosting the Retrospect server logs the "appears incomplete" errors.

 

I've removed all the bogus ACLs and I'm getting no more "appears incomplete" errors. The nightly incremental backups for the problem drive have dropped from 200,000 files per night to around 10,000 files, i.e., the files that actually changed that day.

 

All is well. My problem is resolved. Thanks for all the suggestions. Whoever brought up the idea to turn off the ACL backups was right-on!

 

-Glenn


I would hope that the developers of Retrospect would at least take a look at this issue.

It's my understanding that they have (see my post above) and reported the bug(s) to Apple long ago. The bug only happens when the Carbon APIs are used under Rosetta emulation, and Apple doesn't seem motivated to fix it.

 

I sympathize with EMC's programmers because I've got a show-stopper RADAR bug on our Xserve that has been pending with Apple for three years now (Apple Hardware RAID fails to fully flush write cache on graceful power down; only workaround is to disable write cache; causes RAID corruption). The firmware fix is known, and was fixed by LSI Logic in their version of the card a few years ago, after Apple split from the LSI Logic code tree for the firmware. Sigh.

 

Russ


Russ,

 

What makes you think that my bug ("appears incomplete" errors along with incremental backups backing up files that didn't change) is the same as the bug that crashes Retrospect?

 

I see the following text in the knowledgebase article you referenced for the crashing bug:

 

Problems have been seen with ACL backups under 10.4.10

 

Instead of the word, "Crashes", they used the word, "Problems", as if there are problems _other_ than crashes. Is my issue one of the problems?

 

 

-Glenn

Well, first, the KB article is wrong when it says this is an Intel issue. From my post above:

 

Understand that the "ACL disabling" hack is for a different bug in Apple's "Carbon" API library that Retrospect Mac presently uses because of its legacy codebase that can't easily be ported to the modern "Cocoa" API library (hence the reason that Retrospect runs on an Intel Mac as emulated PPC code under Rosetta).

 

Apparently, Apple's Carbon API library, when used on the Universal Binary version of MacOS (server and non-server), has at least one bug that causes crashing when a Carbon app makes certain ACL syscalls. Apple is not motivated to fix this bug, and might, in fact, be making a conscious decision not to fix the bug as incentive for developers (such as EMC) to rewrite their apps for the Cocoa APIs. So Retrospect has this workaround to not back up ACLs so as to avoid Apple's bug. As a comment, the Universal Binary version of MacOS (server and non-server, first appearing in 10.4.6 on Intel Macs) is a very different animal, with a different codebase, from the PPC only version of MacOS. In short, there are two very different versions of MacOS 10.4.6 through 10.4.11, depending on whether you have the PPC version or the Universal Binary version. Although no PPC Mac ever shipped with the Universal Binary version of 10.4.6 through 10.4.11, it is possible to install the UB version if you know what you are doing, and some have installed the MacOS Server version of 10.4.6 through 10.4.11 to get around some nasty AFP bugs.

 

I don't know the extent of the ACL crashing bug or what syscalls trigger it, but there is more than one trigger. The best we users can do now, until Retrospect X arrives, is to work around the bug.

The "issues" (problems, etc., whatever you want to call them) are with the Universal Binary version of 10.4.x and 10.5.x, and are artifacts of bugs in the Carbon API library. Each time that EMC has reported over the past year or so that Apple has "fixed" the bug(s) or that Retrospect has worked around the bug(s), it has manifested itself with a different effect with each subsequent MacOS update (because of the different memory locations of the re-linked images). Not that I know what the bug(s) is (or are), or where in the API library the bug(s) is (or are), but consider if it's a wild pointer that is munging some data structure. With each subsequent OS release, the effect of such random bits from space will be different.

 

All I do know is that each time this oddness has been seen, if you "disable ACL backup", then Retrospect doesn't execute the syscalls in the Carbon API library that trigger these issues.

 

You might ask why only Retrospect triggers these bugs. Well, there are very few programs that do ACL syscalls, and the ACL syscalls are not Unix standard syscalls - they are Apple enhancements to the basic POSIX permissions scheme. Primarily just the Finder and backup-type programs make such Apple-specific ACL syscalls; none of the standard Unix programs or standard Unix backup programs do. Other programs don't care - they just try to read and write, and when they fail on access, well, they just fail (but never do any syscalls to manipulate ACLs). And other Apple-specific backup programs were long ago ported to Cocoa so that they could be Universal Binary. Only Retrospect lags the pack.

 

Until Apple fixes the bug(s) in the Carbon API library (unlikely) or until Retrospect moves to a Universal Binary Cocoa API version (let us hope...) (so that the buggy Carbon API library is not used for these syscalls), the problem will manifest itself in one or another fashion.

 

Just my thoughts,

 

Russ
