TheBum (Author) posted November 17, 2005:
Let me start the diagnosis by noting that error 9 from the close() call is EBADF, "bad file descriptor." So either a file descriptor is being closed twice, or memory is being "stepped on" and corrupting the descriptor value. It would help to have a version of pitond that outputs more debug info, such as the value of the file descriptor passed to the failing close() call.
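To see why error 9 points toward a double close, here is a minimal sketch using Python's os.close, which is a thin wrapper over close(2). Closing the same descriptor twice makes the second call fail with EBADF, which is errno 9 on Mac OS X and Linux. This only illustrates the system-call behavior; it is not pitond's code. One caveat: in a multithreaded daemon the second close fails only if nothing has reused that descriptor number in between. If something has, the stray close silently shuts another file or socket, which can look a lot like memory corruption.

```python
import errno
import os


def close_twice() -> int:
    """Open a descriptor, close it, then close it again.

    Returns the errno from the second close(), which on Unix-like
    systems should be EBADF ("bad file descriptor").
    """
    read_fd, write_fd = os.pipe()  # two fresh, valid descriptors
    os.close(write_fd)             # unrelated cleanup
    os.close(read_fd)              # first close: succeeds
    try:
        os.close(read_fd)          # second close: number is no longer valid
    except OSError as exc:
        return exc.errno
    return 0                       # only if the number was reused in between


if __name__ == "__main__":
    err = close_twice()
    print(err, errno.errorcode.get(err))  # prints: 9 EBADF (Mac OS X, Linux)
```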
CallMeDave posted November 18, 2005:
Quote: It would help to have a version of pitond that outputs more debug info
We do. In a shell, type:
    $ /Applications/Retrospect\ Client.app/Contents/Resources/pitond --help
to get:
    --help              Display this message
    -setpw <password>   Set the first access password. Once the password is set, it can only be changed from Retrospect
    -testpw             Test if the first access password has been set
    -log n              Set the logging level to n
    Logs are saved in /var/log/retroclient.log
The -log level ranges from 1 to 9, with 9 spewing log entries fast and furious. You can add this flag (at whatever level you find useful) to the /Library/StartupItems/RetroClient/RetroClient script to keep pitond spewing across restarts.
Dave
TheBum (Author) posted November 18, 2005:
Many thanks. I'll post back with any findings. I'm a Unix software developer by trade, so I enjoy a good code mystery, but only if I have the tools to find the clues.
[Update] I ran pitond with a log level of 9 and discovered that the problem appears to be the closing of a file handle that was never opened (at least according to the debug info). The crash appears to happen in a function called NetCancelSockets when the machine starts going to sleep. Here's the last part of the log:
    1132287973: NetCancelSockets: wait cancelling sockets
    1132287973: NetCancelSockets: cancelling socket 9
    1132287973: connAccept: Handle 9 closed
    1132287973: connAccept: close(socket) failed with error 9
    1132287973: Assertion failure at pitond/object.c-477
    1132287973: LogFlush: program exit(-1) called, flushing log file to disk
There are a number of "connAccept: Handle <x> closed" messages in the log, each of which has a corresponding "TransStart: Handle <x> start" message preceding it at some point. Handle 9 is the exception: there is no corresponding TransStart message. Hope this helps with the diagnosis. I'll keep checking to see whether future crashes involve the same file handle.
[Yet another update] The next time it happened, it was with a different file handle. The only consistent factors are that it always happens inside NetCancelSockets and always involves closing a file handle that was apparently never opened.
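The cross-check described above (every connAccept close should have an earlier open message for the same handle) is easy to automate. A minimal sketch, assuming the line format quoted in this thread ("<timestamp>: <function>: Handle <n> open/closed") and treating both TransStart and connListener open messages as evidence that a handle was opened at some point; the function name unopened_closes and the regular expressions are hypothetical, not part of Retrospect.

```python
import re
from typing import Iterable, List, Tuple

# Patterns modeled on the log lines quoted in this thread (assumed format).
LINE_RE = re.compile(r"^(\d+): (.*)$")
OPEN_RE = re.compile(r"(?:TransStart|connListener): Handle (\d+) (?:open|start)")
CLOSE_RE = re.compile(r"connAccept: Handle (\d+) closed")


def unopened_closes(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (timestamp, handle) for each connAccept close whose handle
    never appeared in an earlier open message."""
    opened = set()
    suspects = []
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if m is None:
            continue  # skip blank or non-log lines
        stamp, msg = int(m.group(1)), m.group(2)
        opened_match = OPEN_RE.search(msg)
        if opened_match:
            opened.add(int(opened_match.group(1)))
            continue
        closed_match = CLOSE_RE.search(msg)
        if closed_match and int(closed_match.group(1)) not in opened:
            suspects.append((stamp, int(closed_match.group(1))))
    return suspects
```

To scan a real log, pass it an open file, e.g. unopened_closes(open("/var/log/retroclient.log")); reading that file may require admin rights.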
rdzman posted November 18, 2005:
I was able to reproduce the pitond crash (once) by putting my PowerBook to sleep while it was being scanned by the server. Here is the verbose output from the log, which is consistent with what Alan reports. I haven't checked yet, but Alan, is the crash 100% reproducible for you (by sleeping during a scan)? [Update: I was only able to reproduce the crash 2 times in about 7 or 8 attempts.]
Here's the log... the first line looks interesting to me as well. Now that I think about it, I'm being backed up over my wired Ethernet connection, but I sometimes have a wireless connection active as well. I wonder if that is related? Alan, is your setup similar? I know that in the past I've had problems under these circumstances where it couldn't even complete a scan without getting a network error. Turning off AirPort fixed that. (Sorry, I'm just thinking out loud.)
    1132319735: netDelInterface: deleting interface 2
    1132319735: iplud: interface for socket 4 deleted
    1132319735: connListener: starting thread connAccept
    1132319735: connListener: Handle 4 open
    1132319735: SThreadSpawn: starting thread 25212416
    1132319735: NetConnAdd: adding socket 4
    1132319735: connAccept: Handle 4 closed
    1132319735: NetConnDel: removing socket 4
    1132319735: sThreadExit: exiting thread 25212416
    1132319735: connListener: interface for socket 5 deleted
    1132319735: sThreadExit: exiting thread 25169408
    1132319735: sThreadExit: exiting thread 25168384
    1132319735: connTCPConnection: conn = 3151360, code = 120, tid = 288, count = 24
    1132319735: TransStart: Handle 4 open
    1132319735: TransStart: Handle 5 open
    1132319735: TransStart: starting 'GHst' on Builtin
    1132319735: transSpawn: starting thread transSpawnTop
    1132319735: SThreadSpawn: starting thread 25169408
    1132319735: Builtin: startup for GHst (288)
    1132319735: ServDone_bi (288): result 0
    1132319735: ServClear: Handle 4 closed
    1132319735: transSpawnTop: Handle 5 closed
    1132319735: transSpawnTop: Handle 4 closed
    1132319735: sThreadExit: exiting thread 25169408
    1132319736: connTCPConnection: conn = 3151360, code = 120, tid = 289, count = 24
    1132319736: TransStart: Handle 4 open
    1132319736: TransStart: Handle 5 open
    1132319736: TransStart: starting 'SNte' on Builtin
    1132319736: transSpawn: starting thread transSpawnTop
    1132319736: SThreadSpawn: starting thread 25168384
    1132319736: Builtin: startup for SNte (289)
    1132319736: ServDone_bi (289): result 0
    1132319736: ServClear: Handle 4 closed
    1132319736: transSpawnTop: Handle 5 closed
    1132319736: transSpawnTop: Handle 4 closed
    1132319736: sThreadExit: exiting thread 25168384
    1132319736: connTCPConnection: conn = 3151360, code = 101, tid = 0, count = 0
    1132319736: connTCPConnection: conn = 3151360, code = 120, tid = 290, count = 24
    1132319736: TransStart: Handle 4 open
    1132319736: TransStart: Handle 5 open
    1132319736: TransStart: starting 'GDef' on Builtin
    1132319736: transSpawn: starting thread transSpawnTop
    1132319736: SThreadSpawn: starting thread 25169408
    1132319736: Builtin: startup for GDef (290)
    1132319736: ServDone_bi (290): result 0
    1132319736: ServClear: Handle 4 closed
    1132319736: transSpawnTop: Handle 5 closed
    1132319736: transSpawnTop: Handle 4 closed
    1132319736: sThreadExit: exiting thread 25169408
    1132319736: connTCPConnection: conn = 3151360, code = 120, tid = 291, count = 24
    1132319736: TransStart: Handle 4 open
    1132319736: TransStart: Handle 5 open
    1132319736: TransStart: starting 'GNst' on Builtin
    1132319736: transSpawn: starting thread transSpawnTop
    1132319736: SThreadSpawn: starting thread 25168384
    1132319736: Builtin: startup for GNst (291)
    1132319736: ServDone_bi (291): result 0
    1132319736: ServClear: Handle 4 closed
    1132319736: transSpawnTop: Handle 5 closed
    1132319736: transSpawnTop: Handle 4 closed
    1132319736: sThreadExit: exiting thread 25168384
    1132319736: NetCancelSockets: wait cancelling sockets
    1132319736: NetCancelSockets: cancelling socket 7
    1132319736: connAccept: Handle 7 closed
    1132319736: connAccept: close(socket) failed with error 9
    1132319736: Assertion failure at pitond/object.c-477
    1132319736: LogFlush: program exit(-1) called, flushing log file to disk
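The numeric prefix on each log line appears to be a Unix epoch timestamp in seconds (the crash entries decode to November 18, 2005, the day these posts were written). Decoding it makes it easier to line the crash up against other logs or against when the machine went to sleep. A minimal sketch under that assumption; log_time is a hypothetical helper name.

```python
from datetime import datetime, timezone


def log_time(line: str) -> datetime:
    """Convert the leading numeric field of a retroclient.log line to UTC,
    assuming it is a Unix epoch timestamp in seconds."""
    stamp = int(line.split(":", 1)[0])
    return datetime.fromtimestamp(stamp, tz=timezone.utc)


if __name__ == "__main__":
    print(log_time("1132319736: NetCancelSockets: cancelling socket 7"))
    # prints: 2005-11-18 13:15:36+00:00
```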
TheBum (Author) posted November 18, 2005:
I did not check whether a scan was in progress. Were you checking that on the server side or the client side?
djr posted November 18, 2005:
I'm having similar problems. I'm just starting to investigate, but here's what I can share: I run a network of 40 OS X machines, mostly 10.3 but a few running Tiger. I've found that a few of them have had their Retrospect Clients "turned off". I think it's only the Tiger machines, but I'm not sure yet. It seems to be happening almost once a day on some of my Tiger clients. I haven't seen anything in the logs yet, though.
-dan
TheBum (Author) posted November 18, 2005:
Quote: Now that I think about it, I'm being backed up over my wired Ethernet connection, but I sometimes have a wireless connection active as well. I wonder if that is related? Alan is your setup similar?
I always do my backups wirelessly.
rdzman posted November 22, 2005:
Quote: I did not check whether a scan was in progress. Were you checking that on the server side or the client side?
I had a VNC connection to the server, so I was watching that. But I was also watching the client log file in the Console on the client, and there is plenty of activity there during a scan too. On the other hand, even before and after a scan, there is activity in the client log when the server is just polling the various clients.
From my experience, I would be surprised if the bug were triggered only by sleeping during a scan. I don't sleep the machine during scans that often; it just so happens that on the 2 occasions I was able to trigger the crash, that's what it was doing. My guess is that sleeping during any connection to the server can potentially trigger it.
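If the double-close theory is right, a sleep-triggered cancel (NetCancelSockets) racing with a connection thread (connAccept) that closes the same socket would fit the guess that any active connection can be affected. One common defense against that kind of race is to let exactly one caller "claim" the descriptor under a lock before closing it. The following is a hypothetical Python sketch of that pattern (OwnedFD is an invented name), not a description of how pitond actually works.

```python
import os
import threading


class OwnedFD:
    """Hypothetical wrapper: any number of threads may ask to close the
    descriptor, but close(2) is issued on it exactly once."""

    def __init__(self, fd: int) -> None:
        self._fd = fd
        self._lock = threading.Lock()

    def close(self) -> bool:
        """Return True only for the one caller that actually closed the fd."""
        with self._lock:
            if self._fd < 0:
                return False             # another thread already claimed it
            fd, self._fd = self._fd, -1  # claim the descriptor under the lock
        os.close(fd)                     # only the claiming thread reaches this
        return True
```

Every thread that wants the socket gone calls close(); only the first issues the real close(2), and the rest get False instead of EBADF (or, worse, closing a reused descriptor number).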
rdzman posted November 22, 2005:
Quote: I think it's only the Tiger machines but I'm not sure yet. It seems to be happening almost once a day for some of my Tiger clients. I haven't seen anything in the logs yet though.
I'm pretty sure I only began to see the problem after upgrading to Tiger too.
Qlite posted December 12, 2005:
Does anyone know whether Dantz has acknowledged this client turn-off problem? Since we upgraded to all-Tiger machines (most of the clients are PowerBooks connected via Wi-Fi), the client app turns itself off. A sleep/scan conflict seems likely.
I'm also having a problem where the backup server gets caught in a loop with "net retry". Eventually I have to stop the execution and then resume the server. Has anyone seen this occur? Have any solutions been found for the pitond shutdown besides the AppleScript?
natew posted December 12, 2005:
Qlite,
Have you tried using the 6.1 client? That will take care of the net retry.
Thanks,
Nate
Qlite posted December 15, 2005:
Nate,
I am currently running version 6.1.107, which I believe is the most current. I ran some further tests and spoke to Dantz about this issue. What we've concluded is that the client turns itself off when the AirPort connection is turned off (or put to sleep). I call this a BUG, as one does not keep a laptop running 24/7. Dantz is supposed to get back to me on whether it's a bug or not. The NetRetry failure may have been an anomaly; time will tell.
Thank you,
Ben
xdavid posted December 21, 2005:
From the Dantz Knowledgebase:
TITLE: Net Retry error after upgrading to Retrospect 6.1
Discussion: Some users have reported NetRetry error messages when connecting to 6.1 or updating to the 6.1 version of the client software for Macintosh. The 6.1 version of the Retrospect Client installer released prior to October 12, 2005 may experience a problem deleting the old retropds.log files on the client system, resulting in NetRetry errors. This issue has been fixed with the latest version of the client software, available at http://www.dantz.com/updates
Source: http://kb.dantz.com/display/2n/articleDirect/index.asp?aid=8119&r=0.2452661
punga posted January 30, 2006:
Any solutions to the client disabling itself? I've tried the AppleScript mentioned above and it does not work. Also, some of the workstations I'm seeing this on have no AirPort card or no wireless network, so it can't be related to wireless networking. Problem machines range from PowerBooks to G5s running 10.3.9 and 10.4.4.
And regarding the Net Retry errors, this issue is still not resolved under the latest client (October 2005 release). I started a separate thread about it here: http://forums.dantz.com/ubbthreads/showflat.php?Cat=0&Number=66743&page=0&view=collapsed&sb=5&o=&fpart=1
In my opinion, this isn't a client issue: in every case where I've seen it crop up, a user has left the network and the Retrospect backup server won't give up looking for that client. Shouldn't Retrospect be able to give up after a set amount of time and move on if a client is still unavailable after, say, 5 minutes? It used to in older versions.
natew posted February 3, 2006:
Hi,
Have you ever moved the client application from its default location? That can cause this problem.
Option-click the Retrospect settings button to set a backup speed threshold. This will force Retrospect to give up on clients whose speeds are too low.
Nate
punga posted February 3, 2006:
Quote: Have you ever moved the client application from its default location? That can cause this problem.
No, in all cases the client is left in its installed location, /Applications. Any other suggestions? I set up the speed threshold you suggested in the preferences. That makes sense, and I'll see how it works.
Thanks for your help,
Shawn
CallMeDave posted February 4, 2006:
Quote: Also, in some of the workstations I'm seeing this on, there is no airport card or network so it can't be related to wireless network. Problem machines range from powerbooks to G5's running 10.3.9 and 10.4.4.
By "this" do you mean the same logged crash that Alan reported in the first post of this thread:
    1116545396: connAccept: close(socket) failed with error 9
    1116545396: ServicePurge: service not found
    1116545396: Assertion failure at pitond/object.c-477
    1116545396: LogFlush: program exit(-1) called, flushing log file to disk
Does pitond start correctly after a system restart?
punga posted February 7, 2006:
Quote: By "this" do you mean the same logged crash that Alan reported in the first post of this thread:
I'm not on site at any of the clients where this happened (I'm a consultant who supports a number of different studios and workgroups), so I'll need to investigate further. However, I noticed just this morning that my own client was turned off. I turned it back on, but that appears to wipe retroclient.log, so the next time it happens I will check the log before re-enabling the client.
BTW, in some of the cases where I've seen this happen, AirPort could not have been a factor because there is no card installed in the machine or no wireless network on site. My PowerBook does have an AirPort card, but I need to check the log the next time my client turns itself off.
Thanks,
Shawn
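Since turning the client back on appears to wipe retroclient.log, it may be worth copying the log aside before re-enabling it. A minimal sketch, assuming the /var/log/retroclient.log path reported by pitond --help earlier in the thread; snapshot_log is a hypothetical helper, and reading the log may require running it with sudo.

```python
import shutil
import time
from pathlib import Path

LOG = Path("/var/log/retroclient.log")  # location reported by pitond --help


def snapshot_log(src: Path = LOG, dest_dir: Path = Path.home()) -> Path:
    """Copy the client log to a timestamped file so re-enabling the client
    doesn't lose the crash trace. Returns the path of the copy."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"retroclient-{stamp}.log"
    shutil.copy2(src, dest)  # copy2 also preserves the file's timestamps
    return dest
```

Run it right after noticing the client is off, then check the copy for the "close(socket) failed with error 9" signature before turning the client back on.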