wmconlon

Redhat 9 client fails during backup


I cannot get a backup to finish from Redhat to Mac OS 9.1. I have two volumes selected for backup: / and /boot. / always fails, but /boot finishes.

 

On the Mac, I get a 519 error report while attempting to back up /. Then Retrospect moves on to /boot and completes successfully. This suggests to me that the problem isn't really a communication error.

 

The log on the linux box shows:

 

# tail -f /var/log/retroclient.log

1079629887: iplud: bound to address 0.0.0.0

1079629887: ipludAddMembership: adding membership for 0.0.0.0

1079629893: IPNSRegister(0): registered: "white"/"7093841c7b286ba5"

1079629893: ConnStartListen: starting thread ConnStartListen for 192.168.181.240:0

1079629899: IPNSRegister(0): registered: "white"/"7093841c7b286ba5"

1079629937: Connection established by 192.168.181.252:50285

1079629937: ConnReadData: Connection with 192.168.181.252:50285 was reset

1079629937: Connection established by 192.168.181.252:50286

1079629939: ConnReadData: Connection with 192.168.181.252:50286 was reset

1079629941: Connection established by 192.168.181.252:50287

1079632487: ConnReadData: Connection with 192.168.181.252:50287 was reset

1079632487: ConnWriteData: send() failed with error 9

1079632487: ConnWriteData: send() failed with error 9

1079632677: Connection established by 192.168.181.252:50288

1079633481: ConnReadData: Connection with 192.168.181.252:50288 was reset

1079633490: Connection established by 192.168.181.252:50290

1079633490: ConnReadData: Connection with 192.168.181.252:50290 was reset

1079633790: Connection established by 192.168.181.252:50298

1079633790: ConnReadData: Connection with 192.168.181.252:50298 was reset

 

From this point on there is a continuing sequence of connections being established and reset. I then have to kill and start rcl to get another backup going.
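For anyone reading the excerpt above, here is a minimal sketch that pairs each "Connection established" line with the matching "was reset" line and reports how long the connection stayed open. It assumes the leading number on each line is a count of seconds; the function name and regular expressions are illustrative only and are not part of the Retrospect client.

```python
import re

# A few lines copied from the log excerpt above.
LOG = """\
1079629937: Connection established by 192.168.181.252:50285
1079629937: ConnReadData: Connection with 192.168.181.252:50285 was reset
1079629937: Connection established by 192.168.181.252:50286
1079629939: ConnReadData: Connection with 192.168.181.252:50286 was reset
1079629941: Connection established by 192.168.181.252:50287
1079632487: ConnReadData: Connection with 192.168.181.252:50287 was reset
1079632677: Connection established by 192.168.181.252:50288
1079633481: ConnReadData: Connection with 192.168.181.252:50288 was reset
"""

OPENED = re.compile(r"^(\d+): Connection established by (\S+)$")
ENDED = re.compile(r"^(\d+): ConnReadData: Connection with (\S+) (?:was reset|closed)$")


def connection_lifetimes(text):
    """Return a list of (peer, seconds_open) for every opened-then-ended connection."""
    opened = {}   # peer "ip:port" -> time the connection was established
    result = []
    for line in text.splitlines():
        line = line.strip()
        match = OPENED.match(line)
        if match:
            opened[match.group(2)] = int(match.group(1))
            continue
        match = ENDED.match(line)
        if match and match.group(2) in opened:
            started = opened.pop(match.group(2))
            result.append((match.group(2), int(match.group(1)) - started))
    return result


for peer, seconds in connection_lifetimes(LOG):
    print(f"{peer}: open for {seconds} s")
```

On these lines, the connection from port 50287 stays open for 2546 seconds (roughly 42 minutes) before it is reset, while the ones around it are reset within a couple of seconds.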

 

There are several curious things:

1. Why does this bind to 0.0.0.0 when rcl explicitly states

$CLIENTDIR/retroclient -daemon -ip 192.168.181.240

2. Why does the backup server at 192.168.181.252 keep trying all the high ports? I thought it used 497?

3. What does ConnWriteData: send() failed with error 9 mean?
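On question 2, one possible reading (an assumption, nothing in this thread confirms it) is that the number after the colon in "Connection established by 192.168.181.252:50285" is the backup server's outgoing source port, not the port it is connecting to. A TCP client normally gets a fresh, OS-assigned source port for every connection, which would explain the climbing values even if the destination stays on 497. The loopback sketch below shows the effect; it lets the OS pick the listening port as a stand-in for 497, since binding a port below 1024 usually needs root.

```python
import socket


def peer_port_demo():
    """Accept one loopback connection and return
    (listening_port, peer_port_seen_by_listener, client_local_port)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5)
    listener.bind(("127.0.0.1", 0))   # port 0: OS picks a free port (stand-in for 497)
    listener.listen(1)
    listen_port = listener.getsockname()[1]

    # The connecting side does not choose its own source port; the OS assigns one.
    client = socket.create_connection(("127.0.0.1", listen_port), timeout=5)
    conn, peer = listener.accept()
    client_port = client.getsockname()[1]

    for sock in (conn, client, listener):
        sock.close()
    return listen_port, peer[1], client_port


listen_port, peer_port, client_port = peer_port_demo()
print(f"listening on port {listen_port}")
print(f"listener sees the peer at port {peer_port}")
print(f"connecting side's local port is {client_port}")
```

The port the listener reports for its peer is the connecting side's local port, not the port being listened on.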

 

Also related to debugging:

1. Why doesn't the client report real timestamps in the log?

2. Why is the log cleared when I restart the client? It seems to me that it should be appended to.
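On the timestamp question, the leading numbers look like Unix epoch seconds (an assumption, but the values fall in March 2004, which fits). A minimal sketch for turning them into readable times while reading the log; the helper name is made up for illustration:

```python
from datetime import datetime, timezone


def humanize(line):
    """Replace a leading epoch-seconds stamp ("1079629887: ...") with a UTC date and time.
    Lines that do not start with a number followed by ": " are returned unchanged."""
    stamp, sep, rest = line.partition(": ")
    if not sep or not stamp.isdigit():
        return line
    when = datetime.fromtimestamp(int(stamp), tz=timezone.utc)
    return when.strftime("%Y-%m-%d %H:%M:%S") + " UTC: " + rest


print(humanize("1079629887: iplud: bound to address 0.0.0.0"))
```

To filter a whole log, the same function can be applied to every line of /var/log/retroclient.log.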


Just a follow-up:

 

I'm finally getting occasional backups to complete, though I continue to get 519 communication failures. Essentially, the problem above would occur after several hundred megabytes were copied. By stopping and starting the rcl client on redhat, I could finally get everything backed up.

 

1. Of course, comparison wasn't occurring because failure would occur during the backup.

2. Adding a new backupset will cause this problem to rear its ugly head again.

3. I also get 519 (communication) errors when the backup server tries to connect, saying the client can't be reserved; this requires rcl restart.

 

I'm pretty sure the network is not the problem, as this Dell server has high-quality Intel NICs, and its file sharing performance (SMB and Netatalk) has been tested. I've looked at the threads regarding dual NICs, but a configuration issue doesn't jump out at me.


Hi

 

Sad to say I don't have good answers for you about the logging. One thing you might want to try is allocating more memory to Retrospect on the backup machine. A memory management issue could at least explain why the backup fails after a large amount of data has been transferred.

 

Nate


Moved Retrospect 5.1 to another OS9 system with twice as much (256 MB vs 104 MB) RAM. Increased memory allocation to 200 MB.

 

It still cannot back up my linux client. I then clicked the Preview button and started getting Net Retry messages after scanning about 13768 folders and about 180000 files. Nonetheless, about an hour later there were two windows listing files to be marked (one window for /boot, with 10 MB of data, and one for /, with 8 GB of data).

 

I then selected backup and got a 541 error (client not installed or not running). The client shows:

 

# ./rcl status

Server "white":

Version 6.5.108

back up according to normal schedule

currently on

readonly is off

exclude is off

1 connections, 1 authenticated

 

So the discrepancy is that the client thinks the server is connected and authenticated, but the server has somehow dropped the connection and can't reconnect.

 

The log is uninformative without real time stamps:

 

# tail -f /var/log/retroclient.log

1080764610: Connection established by 192.168.181.220:49152

1080764619: ConnReadData: Connection with 192.168.181.220:49152 was reset

1080770139: Connection established by 192.168.181.220:49152

1080770139: ConnReadData: Connection with 192.168.181.220:49152 was reset

1080770139: ServicePurge: service not found

1080770140: Connection established by 192.168.181.220:49153

1080772847: ConnWriteData: send() failed with error 104

1080772847: ConnWriteData: send() failed with error 32

1080772847: ConnReadData: Connection with 192.168.181.220:49153 closed

1080773507: Connection established by 192.168.181.220:49154

 

Does anyone know what the errors 104 and 32 mean? And why is the connection shown in the status window as connected and authenticated, but shown in the log as reset and closed? There seems to be a failure to communicate (and to trap and handle errors) between client and server.
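If the numbers after "send() failed with error" are raw errno values passed through from the Linux send() call (an assumption; nothing in this thread confirms it), then 9, 32 and 104 have standard meanings in the Linux errno table. A minimal lookup sketch, with those three values hard-coded so it gives the same answer on any machine:

```python
import re

# Linux errno values (these numbers differ on some other operating systems).
LINUX_ERRNO = {
    9: ("EBADF", "Bad file descriptor"),
    32: ("EPIPE", "Broken pipe"),
    104: ("ECONNRESET", "Connection reset by peer"),
}


def explain(line):
    """Return 'NAME (description)' for a 'failed with error N' log line, or None
    if the line has no such code or the code is not in the table above."""
    match = re.search(r"failed with error (\d+)", line)
    if not match:
        return None
    entry = LINUX_ERRNO.get(int(match.group(1)))
    return f"{entry[0]} ({entry[1]})" if entry else None


for sample in (
    "1079632487: ConnWriteData: send() failed with error 9",
    "1080772847: ConnWriteData: send() failed with error 104",
    "1080772847: ConnWriteData: send() failed with error 32",
):
    print(sample, "->", explain(sample))
```

Read that way, 104 and 32 would mean the client was still writing to a connection the other end had already reset or closed, and 9 would mean it was writing to a socket descriptor that was no longer valid.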


I'm inclined to believe that the failure has to do with some contention between rcl and other processes. I started another immediate backup just before leaving the office yesterday afternoon -- and it completed. Then it completed again as part of the daily backup script at 10pm.

 

My goal is to migrate from an old AppleShare server onto this linux system, but we need reliable backups first. The existing AppleShare server (running web, file, print, SMB, mail, DNS, FTP) NEVER fails to back up properly. But the rcl client on linux seems troubled when there is any activity, even though top typically shows 97 to 99% idle CPU.


This is a continuation of the same problems.

 

Once the backups became incremental (only a few hundred to a thousand files), linux backup has been reliable. For about two months, I haven't had any issues with the linux client.

 

This week, the backup server started reporting 505 errors (client reserved). Yet:

 

# /etc/init.d/rcl status

Server "white":

Version 6.5.108

reserved by xxxxxxxxxx for firewire backup

back up according to normal schedule

currently on

readonly is off

exclude is off

1 connections, 1 authenticated

 

xxxx above is the name of the backup server.

 

The retroclient.log continues to be uninformative (especially without real timestamps):

 

1085426088: ConnStartListen: starting thread ConnStartListen for 127.0.0.1:0

1085426088: iplud: bound to address 0.0.0.0

1085426088: ipludAddMembership: adding membership for 0.0.0.0

1085426094: IPNSRegister(0): registered: "white"/"7093841c7b286ba5"

1085426094: ConnStartListen: starting thread ConnStartListen for 192.168.181.240:0

1085426100: IPNSRegister(0): registered: "white"/"7093841c7b286ba5"

1085498094: Connection established by 192.168.181.220:49198

1085539755: Connection established by 192.168.181.220:49156

1085539755: ConnReadData: Connection with 192.168.181.220:49156 was reset

1085539755: ServicePurge: service not found

1085550204: Connection established by 192.168.181.220:49161

1085550204: ConnReadData: Connection with 192.168.181.220:49161 was reset

1085550204: ServicePurge: service not found

1085636522: Connection established by 192.168.181.220:49165

1085636522: ConnReadData: Connection with 192.168.181.220:49165 was reset

1085636522: ServicePurge: service not found

1085722959: Connection established by 192.168.181.220:49169

1085722959: ConnReadData: Connection with 192.168.181.220:49169 was reset

1085722959: ServicePurge: service not found

 

192.168.181.220 is the address of the backup server.

 

 

Interestingly, this trouble began AFTER rebooting the linux machine. Sure would be nice to have real logging to debug this. Anyone at Dantz listening?


Hi

 

You can turn up client logging with retroclient -log x, where x is the logging level and 9 is the highest.

 

I would disable virtual memory on the backup machine and allocate 50 or 60 MB to Retrospect.

 

Any chance you can try running this backup on an OSX machine?

 

Thanks

Nate


thx,

 

I'll try the logging feature when this next crops up.

 

Regarding OSX, I can certainly run a backup with this version, but I've posted similar problems with OS X as the server, and the OSX version will not let me back up my AppleShareIP server, while the OS9 version does.
