whalemeat, posted July 23, 2007

Got a weird one here. G5 Xserve running Server 10.4.9 and Retrospect 6.1.126 with driver 6.1.11.101. We have about 30 network clients backing up to file storage sets that were previously stored on an external USB drive. We have successfully moved 28 of them to a volume mounted from an Open-E iSCSI appliance using Atto XtendSan iSCSI initiator software. However, the remaining two cause Retrospect, and eventually the entire system, to freeze when it goes to back them up.

One is a large backup - 186GB and counting - the other is much smaller, about 16GB. One is a PPC client, the other is Intel. Sometimes it freezes as soon as it starts writing to the storage set, other times it gets 90+ percent done and then freezes. If you force-quit Retrospect, the Finder freezes and will not relaunch, and on an attempted restart or shutdown the entire system freezes and must be cold started with the power button. As soon as I move the storage set back to the USB drive, it works fine.

There is nothing in the Retrospect log or in the system logs to give me a clue what's going on here. I am able to move large files on and off the iSCSI volume with the Finder without issue.

Anyone have any ideas? There don't seem to be a lot of people on the forums here who are backing up to iSCSI volumes. Thanks.
CallMeDave, posted July 23, 2007

First, note that Dantz changed "Storage Set" to "Backup Set" forever ago. So to help newer users understand, I'll stick to the current terminology.

> One is a large backup - 186GB and counting - the other is much smaller, about 16GB.
> One is a ppc client, the other is Intel

- Does this mean that each of these two problem Backup Sets backs up data from only one specific physical client computer? If so, this would make some testing easier.

The first thing to try would be to separate the Backup Sets themselves (which work on one physical/logical medium but fail on another) from the specific Sources they use.

- Can you create a new File Backup Set and attempt to copy data from one (or the other) of the "problem" clients? If so, might you be able to Recycle the problem Backup Set and simply do another full backup?

If the problem follows the client(s), can you install the initiator software onto another machine? That way, if things continue to be a problem, you don't have to reboot a working server.
whalemeat, posted July 23, 2007

Oops, yes, I've been using Retrospect since shortly after the earth cooled. Old terminology dies hard sometimes.

You are correct, each of the 30 clients backs up to its own Backup Set, so that does make matching easier. In the case of the two problem ones, the 186GB set is exclusively from a G4 and the 16GB one is exclusively from a MacBook Pro. I have tried creating new sets for each one but the freezes continue. After recovering from the freeze, it's generally a good backup of what it has managed to get. It's not corrupted at all, though it may need a little catalog repair.

I have reinstalled the client software on each of the problem systems, v 6.1.130 fresh off the Insignia site. It doesn't make any difference, the freezes continue.

I also thought it might have something to do with the size of the iSCSI volume - 1.78TB - but I created a smaller 500GB iSCSI volume, tried placing the sets on it, and it still froze. The USB drive is 699GB.

Unfortunately I only have the one server available to run these from. For the meantime I have kept the USB drive connected and am still using it for these two, but I need that drive elsewhere. I also have a few more Storage, er, Backup Sets I need to move to the iSCSI volume, but I want to know what's going on before I move them in case any of them have trouble too.

Thanks
CallMeDave, posted July 23, 2007

OK, so you have 30 clients, and all of them work correctly except for two. That's a big enough set to rule out anything wrong with your general setup (although I assume the Atto software uses kexts, and might be the actual underlying cause of the freezes).

The Backup Sets from the old physical media are not at fault, since fresh ones also fail to work, so we can point to the clients and/or the data on those clients.

- If you define a small folder as a Subvolume, and use that as a Source, can you get reliable executions?

> Unfortunately I only have the one server available to run these from

Why does it have to be a server? I've yet to see a Mac shop with 30 machines that didn't have some old iMacs hidden away in a broom closet somewhere. Or how about your machine? It would be great to be able to reproduce the issue on another Retrospect install, and further narrow down the clients as being the trigger.

Of course, nothing on a client machine should cause the backup machine to freak out this way. But this low-level software was probably never tested against the version of Retrospect that you (and the rest of us) are using. It would probably be worthwhile to open a support incident with EMC over this, once you have a few more data points.

Dave
rhwalker, posted July 23, 2007

Have you tried defining a subvolume on these problematic clients to see if some data can be backed up from them?

Is there anything different about the network infrastructure to these two clients?

Russ
whalemeat, posted July 24, 2007

Thanks for your help, guys.

They are actually backing up a subvolume; I should have mentioned that. By default I create a subvolume of the Users folder and exclude the Movies and Music folders and all the cache files. It's the only way to keep the sizes manageable. (Plus I really don't care if your rip of 'Gili' gets lost in a crash.) But the two that aren't working and all the rest that are were created from the same template script, and all use the same selector. The 186GB one is the largest by a fair margin, but there are plenty of others in the 2-110GB range that work fine.

I also ran drive repairs on both clients' hard drives (nothing found) and even tracked down the specific files they were working on when it froze and removed them, but then it freezes on another random file.

Something freaky in the network is also on my mind, but all 30 of these clients are on the same public and routable subnet. They're all on DHCP, but their addresses are registered with the server so they never change. The Retrospect server is on a different public and routable subnet in this building, but we have fiber interconnects, so throughput has never been an issue - I don't think anything is timing out. There is a firewall between them, but again, the traffic between the server and all the clients passes through the same filters - why would it interfere with just two? And what possible effect could switching to a USB drive have on a firewall?

What I need to figure out is what makes these two clients different from all the rest. But as far as I can see, there is absolutely nothing in common between them that is also different from the others that are working. I've been beating my head against the wall on this one for over a month.

When I look at the state of my semi-frozen server - I can launch the Terminal after Retrospect locks up but before I try to restart it and the whole thing goes down - running top tells me there are 3 stuck threads. Does anyone know how I can tell which they are? I think one is a umount process - like something has caused it to want to unmount the iSCSI disk but something else is not letting it.

Thanks.
rhwalker, posted July 24, 2007

Quote: What I need to figure out is what makes these two clients different from all the rest.

The easiest way, from what you have described, would be to put one of them on the same switch port as one of the working clients. That would eliminate all of the layer 2 and 3 network issues.

Quote: I can launch the terminal after Retrospect locks up but before I try to restart it and the whole thing goes down - running top tells me there are 3 stuck threads. Does anyone know how I tell which they are?

Code:
    ps axlww -O "flags"

Open your Terminal window up wide. The unrunnable processes will have a "U" in the status ("STAT") column. Look at the Flags ("F") column. I would bet that the threads are stuck on physical I/O. Here are the flag bits:

Code:
    P_ADVLOCK    0x00001  Process may hold a POSIX advisory lock
    P_CONTROLT   0x00002  Has a controlling terminal
    P_INMEM      0x00004  Loaded into memory
    P_NOCLDSTOP  0x00008  No SIGCHLD when children stop
    P_PPWAIT     0x00010  Parent is waiting for child to exec/exit
    P_PROFIL     0x00020  Has started profiling
    P_SELECT     0x00040  Selecting; wakeup/waiting danger
    P_SINTR      0x00080  Sleep is interruptible
    P_SUGID      0x00100  Had set id privileges since last exec
    P_SYSTEM     0x00200  System proc: no sigs, stats or swapping
    P_TIMEOUT    0x00400  Timing out during sleep
    P_TRACED     0x00800  Debugged process being traced
    P_WAITED     0x01000  Debugging process has waited for child
    P_WEXIT      0x02000  Working on exiting
    P_EXEC       0x04000  Process called exec
    P_NOSWAP     0x08000  Another flag to prevent swap out
    P_PHYSIO     0x10000  Doing physical I/O
    P_OWEUPC     0x20000  Owe process an addupc() call at next ast
    P_SWAPPING   0x40000  Process is being swapped

For details, see man ps.

If they are Retrospect processes, you might try something like:

Code:
    ps axlww -O "flags" | fgrep retro

to whittle the data down to something manageable.

Russ
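The flag bits listed above are combined into a single number in the F column, so reading them by eye takes some hex arithmetic. As a rough sketch only: the helper below (`decode_flags` is a hypothetical name, not part of ps) tests each bit from that list against a value copied from the F column. It assumes the value is hexadecimal, with or without a leading 0x; check man ps on your system to confirm how the column is printed.

```shell
#!/bin/sh
# decode_flags: hypothetical helper that turns a process flag word into
# the flag names from the list in the post above.
# Assumption: the argument is hexadecimal, with or without a 0x prefix.
decode_flags() {
    hex=${1#0x}                 # drop a leading 0x if present
    value=$(( 0x$hex ))         # convert the hex digits to a number
    names=""
    for entry in \
        0x00001:P_ADVLOCK  0x00002:P_CONTROLT 0x00004:P_INMEM \
        0x00008:P_NOCLDSTOP 0x00010:P_PPWAIT  0x00020:P_PROFIL \
        0x00040:P_SELECT   0x00080:P_SINTR    0x00100:P_SUGID \
        0x00200:P_SYSTEM   0x00400:P_TIMEOUT  0x00800:P_TRACED \
        0x01000:P_WAITED   0x02000:P_WEXIT    0x04000:P_EXEC \
        0x08000:P_NOSWAP   0x10000:P_PHYSIO   0x20000:P_OWEUPC \
        0x40000:P_SWAPPING
    do
        bit=${entry%%:*}        # text before the colon: the bit mask
        name=${entry#*:}        # text after the colon: the flag name
        # Keep the name when this bit is set in the flag word.
        if [ $(( $value & $bit )) -ne 0 ]; then
            names="${names:+$names }$name"
        fi
    done
    printf '%s\n' "$names"
}
```

For example, `decode_flags 0x10004` prints `P_INMEM P_PHYSIO`, which is the combination Russ expects to see on a process stuck on physical I/O.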
whalemeat, posted August 22, 2007

Just to give this topic a bit of a bump and an update - I have been working with the Atto support people on this (they are fantastic) and they sent me a beta of the next release of the iSCSI initiator. It has fixed the problem with the smaller of the two systems, but the large one continues to freeze. We are trying to get a tcpdump of the communication to the iSCSI device, but the crash seems to render the dump file corrupt.

Then I moved another backup set to the iSCSI storage and, lo and behold, it started freezing in the same way. It's a pain, but I have spotted something - these 3 sets that are causing problems have the largest catalog files of all my backups. The one that started working is the smallest of the three at 99.5MB - the other two are well over 100MB. So my suspicion is focused on the catalogs at this point - probably writing to the catalogs, since it seems to be able to scan and match okay; it's only when it starts writing data that it locks up.

Does anybody know if Retrospect does something different to large catalogs than to small ones?

Thanks.
whalemeat, posted August 24, 2007

More info: Thinking the problem may be in writing to the catalog rather than the data file, I tried putting the catalog on the local storage where it has been known to work, and using first an alias and then a soft link to point to the data file on the iSCSI volume. They both failed outright, but not by freezing. This time I got error messages: 105 (unexpected end of data) and 211 (media locked) respectively.

I didn't really expect it to work, but the interesting thing is that, having now pointed it back to the catalog on the iSCSI volume, it's no longer freezing; it continues to give me error 105. The only other thing I changed was to set "ignore permissions on this volume" for the iSCSI volume.

The knowledgebase article for error 105 seems to be missing, but a few others that refer to 105 under specific circumstances (writing to a CD or to a certain model of Samsung drive) seem to indicate that driver updates were necessary to resolve those issues, so it sounds like pretty low-level stuff.
CallMeDave, posted August 25, 2007

Having the individual parts of a separated File Backup Set (Foo and Foo.cat) in different locations is not supported. It would be nice if hard/soft links fooled the program, but it was never designed for that and it doesn't work.

In general, the Finder's "respect ownership" setting of a volume where a File Backup Set is stored doesn't matter, as long as the volume is writable.

But perhaps iSCSI is different. Perhaps there are ownership issues with this technology. In your communication with ATTO, you should be sure to let them know that the program attempting to write to the volume is running with a different UID than the Finder user.

Or, you could try logging into the Finder as root (you might have to set a root password in NetInfo Manager first) and seeing if it makes any difference when Retrospect _and_ the Finder user are both UID=0.

Dave
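The UID mismatch Dave describes can be looked at from the Terminal before trying a root login. This is only a sketch under stated assumptions: the mount point `/Volumes/iSCSI_Backups` is a made-up name, the backup process is assumed to have "retrospect" somewhere in its command line, and `owner_uid` is a hypothetical helper rather than a system command.

```shell
#!/bin/sh
# owner_uid: hypothetical helper that prints the numeric UID owning a path.
# ls -n prints numeric IDs; the owner is the third field of the long listing.
owner_uid() {
    ls -ldn "$1" | awk '{ print $3 }'
}

# Hypothetical mount point; substitute the real iSCSI volume name.
VOLUME="/Volumes/iSCSI_Backups"

if [ -d "$VOLUME" ]; then
    echo "Volume owner UID: $(owner_uid "$VOLUME")"
fi

# Show UID, PID and command for any process mentioning retrospect.
# The [r] keeps grep from matching its own command line.
ps ax -o uid,pid,command | grep -i '[r]etrospect' \
    || echo "No Retrospect process found"
```

If the UID in the ps output differs from the volume owner's UID, that is the difference worth reporting to ATTO.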