
OS X server crashing during Duplicate



Hello,

 

For a while now I have been having trouble backing up one particular server using Retrospect Backup 6.1.126. It starts backing up our main project server and stops after a while, giving the following message in the Retrospect log:

 

'Trouble reading files, error 519 (network communication failed).'

 

The server is not responding anymore, neither directly nor over the network. At first this only occurred with scheduled Duplicates, so I ran them manually for a while. But now it happens every time, and I cannot afford to lose our main server every time it is backed up.

 

Retrospect backs up the other servers without problems, and one of them has the same configuration as our main server.

 

Server:

Apple G5 DP 2.0 GHz, 1 GB RAM, OS X Server 10.4.11. Data is on a separate RAID system (Arena Sivy 7230).

 

Backup server:

Apple G4 DP 1 GHz, 768 MB RAM, OS X client 10.4.10. It uses the same RAID as our main server for storage. Retrospect Backup 6.1.126.

 

Thanks!


-519 is the same as pulling the network cable: The client suddenly goes "dead".

-519 isn't a "crash", it's just an error code.

 

It isn't clear what you do to make it work the next time. Do you have to reboot?

The server is not responding anymore

Does that mean the client isn't responding to any kind of access, or "just" to Retrospect?

Is there anything in the logs on the client?

Have you checked the disks on the client?

 


-519 is the same as pulling the network cable: The client suddenly goes "dead".

-519 isn't a "crash", it's just an error code.

 

The OS X server freezes completely; it does not respond even when accessed directly with mouse and keyboard. The spinning beachball never goes away, and Force Quit no longer works.

 

It isn't clear what you do to make it work the next time. Do you have to reboot?

 

I have to force-restart the server. In the beginning the problem only occurred with automatic Duplicates (run by a scheduled script); when I started the script manually, it would back up without any problems.

 

Is there anything in the logs on the client?

Have you checked the disks on the client?

 

The system and crash logs do not show any relevant information; there is simply a gap in the system log from the moment it loses its connection.


For a while now I have been having trouble backing up one particular server using Retrospect Backup 6.1.126. It starts backing up our main project server and stops after a while, giving the following message in the Retrospect log:

 

'Trouble reading files, error 519 (network communication failed).'

 

The timing of things is pretty important, but I'll go out on a limb here and assume that the main project server is crashing for whatever reason, and since it's crashed, Retrospect can no longer talk to it.

 

At first this only occurred with scheduled Duplicates, so I ran them manually for a while. But now it happens every time...

 

So it _sounds_ as if there is a correlation of some sort, but correlation does not prove causation.

 

- How does the machine running Retrospect 6.1.126 communicate with the crashing server? Is the crashing server running Retrospect OS X Client software? If so, what version?

 

It starts backing up our main project server and stops after a while

 

- What's shown in Retrospect's Operations Log; does the backup scan successfully? Does it copy? Does it start to compare? Is the loss of communication happening at a consistent time (always before complete scan, always after complete scan, etc)?

 

there is simply a gap in the system log from the moment it loses its connection

 

- So the exact time that the Retrospect Operations Log reports the -519 error is the exact last entry in any otherwise verbose log on the server?

 

Data is on a separate RAID system (Arena Sivy 7230)

 

- Do backups of local volumes on the G5 also cause a crash?

(bummer that entering "7230" in the search field on the Arena website results in no hits; pretty disappointing for a $7,000 device)

- Is the RAID device directly connected to the G5 through an SATA host bus adapter card of some sort?

 

 

Backup server ... uses the same RAID as our main server for storage

 

Those of us reading this have no way of understanding exactly what this means.

- Are you using a File Backup Set stored on the same Arena Sivy 7230 that is also the Source of your backups?

- How exactly are things configured?

 

Assuming that you are using client software on the machine that is crashing, it might be a good idea to bump up the logging on the client and open a support incident with EMC.


The timing of things is pretty important, but I'll go out on a limb here and assume that the main project server is crashing for whatever reason, and since it's crashed, Retrospect can no longer talk to it.

 

- How does the machine running Retrospect 6.1.126 communicate with the crashing server? Is the crashing server running Retrospect OS X Client software? If so, what version?

 

The client version on the server is 6.1.130; the Backup application is 6.1.126.

 

- What's shown in Retrospect's Operations Log; does the backup scan successfully? Does it copy? Does it start to compare? Is the loss of communication happening at a consistent time (always before complete scan, always after complete scan, etc)?

 

- 23-10-2008 09:24:04: Copying _MASSIVESERVER on xserver…

File “robbie_full.mov” appears incomplete, path: “_MASSIVESERVER/Massive Music Algemeen/Massive Communicatie/Massive Websites/Massivemusic.com/Films TV/Massive Site/Mov's/TMF/robbie_full.mov”.

Trouble reading files, error 519 (network communication failed).

23-10-2008 09:37:09: Execution incomplete.

Remaining: 72279 files, 347,3 GB

Completed: 10605 files, 9,3 GB

Performance: 1075,5 MB/minute

Duration: 00:13:05 (00:04:15 idle/loading/preparing)

 

The Duplicate script for this server always starts, but it stops after a while. It never stops after the same amount of time (it is always different), and it never stops at the same file, so it is difficult to determine whether a particular file is causing the network interruption.

 

- So the exact time that the Retrospect Operations Log reports the -519 error is the exact last entry in any otherwise verbose log on the server?

 

The last time I tried the Duplicate (last Thursday), the logs showed the following:

 

XSERVER System log:

 

Oct 23 09:06:27 xserver bootpd[23156]: DHCP REQUEST [en0]: 1,0:17:f2:ed:64:8d

Oct 23 09:06:27 xserver bootpd[23156]: ACK sent Macbook-Joep 192.168.1.102 pktsize 363

Oct 23 09:30:28 xserver bootpd[23534]: interface en0: ip 192.168.1.10 mask 255.255.255.0

Oct 23 09:30:28 xserver bootpd[23534]: server name xserver.massivemusic.com

Oct 23 09:30:28 xserver bootpd[23534]: bsdpd: re-reading configuration

Oct 23 09:30:28 xserver bootpd[23534]: bsdpd: shadow file size will be set to 48 megabytes

Oct 23 09:30:28 xserver bootpd[23534]: bsdpd: age time 00:15:00

Oct 23 09:30:28 xserver bootpd[23534]: DHCP DISCOVER [en0]: 1,0:11:24:85:a3:ca

Oct 23 09:30:28 xserver bootpd[23534]: OFFER sent Powerbook-Michiel 192.168.1.117 pktsize 363

Oct 23 09:30:29 xserver bootpd[23534]: DHCP REQUEST [en0]: 1,0:11:24:85:a3:ca

Oct 23 09:30:29 xserver bootpd[23534]: ACK sent Powerbook-Michiel 192.168.1.117 pktsize 363

Oct 23 09:50:45 localhost kernel[0]: standard timeslicing quantum is 10000 us

Oct 23 09:50:45 localhost kernel[0]: vm_page_bootstrap: 253220 free pages

Oct 23 09:50:45 localhost kernel[0]: mig_table_max_displ = 70

Oct 23 09:50:45 localhost kernel[0]: 95 prelinked modules

Oct 23 09:50:45 localhost kernel[0]: Copyright © 1982, 1986, 1989, 1991, 1993

Oct 23 09:50:45 localhost kernel[0]: The Regents of the University of California. All rights reserved.

Oct 23 09:50:45 localhost kernel[0]: using 2621 buffer headers and 2621 cluster IO buffer headers

Oct 23 09:50:45 localhost kernel[0]: DART enabled

 

 

The Duplicate started at 9:24. Between 9:30 and 9:50 there are 20 minutes of nothing, and the entries at 9:50 are the boot messages from the forced restart. The Retrospect log shows that contact with the server was lost at around 9:37.
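Lining up the two logs can also be scripted. The following is a minimal Python 3 sketch, not part of Retrospect: it assumes the classic syslog line format shown above ("Oct 23 09:30:29 host process[pid]: message") with English month names, takes the year separately because those lines omit it, and treats any quiet period longer than an arbitrary 10-minute threshold as a gap. The function name `find_gaps` is hypothetical.

```python
import sys
from datetime import datetime, timedelta


def find_gaps(lines, year, min_gap=timedelta(minutes=10)):
    """Return (last_entry_before, first_entry_after) pairs for every pair of
    consecutive syslog entries that are more than min_gap apart."""
    stamps = []
    for line in lines:
        # Classic syslog lines start with a 15-character "Mon DD HH:MM:SS" stamp.
        try:
            stamps.append(datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S"))
        except ValueError:
            continue  # blank or continuation lines carry no timestamp
    return [(prev, cur) for prev, cur in zip(stamps, stamps[1:]) if cur - prev > min_gap]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 loggaps.py /var/log/system.log 2008
    with open(sys.argv[1], errors="replace") as log:
        for before, after in find_gaps(log, int(sys.argv[2])):
            print(f"{before} -> {after} ({after - before} without entries)")
```

Run against the excerpt above, it flags two gaps: 09:06:27 to 09:30:28, during which the server was still alive (it logged again at 09:30), and 09:30:29 to 09:50:45, which contains Retrospect's 09:37:09 failure time and ends with the boot messages. A quiet log alone proves nothing; what matters is the gap that brackets the -519 time and ends in a reboot.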

 

- Do backups of local volumes on the G5 also cause a crash?

(bummer that entering "7230" in the search field on the Arena website results in no hits; pretty disappointing for a $7,000 device)

- Is the RAID device directly connected to the G5 through an SATA host bus adapter card of some sort?

 

Nothing else is causing problems besides this. We have an identical G5 DP 2.0 GHz server with the same OS, which Duplicates fine.

 

 

Those of us reading this have no way of understanding exactly what this means.

- Are you using a File Backup Set stored on the same Arena Sivy 7230 that is also the Source of your backups?

- How exactly are things configured?

 

The G5 xserver and the Retrospect backup G4 use the same physical Arena Sivy. So the running projects (data on the server) and their backup are stored in the same device, but on different volumes. In the end, everything is also stored on data tapes.

 

Assuming that you are using client software on the machine that is crashing, it might be a good idea to bump up the logging on the client and open a support incident with EMC.

 

Thanks a lot for your help! The problem is driving me crazy...


The client version on the server is 6.1.130; the Backup application is 6.1.126.

A couple of things to check.

 

(1) Just as a comment, you are running a very old version of the Retrospect application (and an old version of the client). You might consider getting the free updates to the current versions:

Retrospect Mac updates

 

(2) You don't say what version of the Retrospect Driver Update ("RDU") you are running - see the Retrospect log, which logs the version numbers of Retrospect and RDU on each launch. See above link for update to current version.

 

(3) The EMC knowledge base article is wrong about which versions of MacOS are affected by Apple's ACL crashing bug in the Carbon API (which affects programs built on older code bases, like the current version of Retrospect):

Retrospect 6.1 release notes

 

The bug affects any version of Tiger or Leopard, on PPC or Intel architecture, that has ACLs enabled. Try setting the Retrospect preference to not back up ACLs.

 

In the updated version, of course.

 

Let us know if this helps.

 

Russ


Before I go through your responses, perhaps you could take another stab at answering the questions posed. Questions such as "How exactly are things configured?" deserve a more thorough telling than what's provided here. Something as basic as which computer the RAID device is physically connected to remains unanswered.

 

To add confusion, this post uses the term "Duplicate" for the first time. So knowing exactly how you are using Retrospect (what exactly is the Source? What exactly is the Destination? Is this a Backup or a Duplicate? Etc?) is critical for even _starting_ to try and help.

 

Russ, I recognize that the ACL bug can cause Retrospect to crash; have there been reports of it causing OS X Server to crash?

 

 

Dave


Yes, I believe so. I'd have to pore back over the forum reports. The problem statements have been, shall we say, less than precise.

 

I'm convinced that Apple has no interest or desire in fixing this Carbon API bug, and that it will only be fixed for those of us on Retrospect when Retrospect X (Cocoa) arrives.

 

Russ


(1) Just as a comment, you are running a very old version of the Retrospect application (and an old version of the client). You might consider getting the free updates to the current versions:

 

I've just installed the Retrospect updates on both the host and the client side, tried the script again, and it stopped after 14 minutes:

 

+ Duplicate using XSERVER at 28-10-2008 12:47

28-10-2008 12:47:39: Connected to xserver

 

- 28-10-2008 12:47:39: Copying _MASSIVESERVER on xserver…

File “ONMnieuw.mov” appears incomplete, path: “_MASSIVESERVER/Massive Music Algemeen/Massive Communicatie/Massive Websites/Massivemusic.com/Films TV/Massive Site/Mov's/ONMnieuw.mov”.

Trouble reading files, error 519 (network communication failed).

28-10-2008 13:01:37: Execution incomplete.

Remaining: 73093 files, 356,9 GB

Completed: 10614 files, 6,0 GB

Performance: 648,9 MB/minute

Duration: 00:13:58 (00:04:33 idle/loading/preparing)
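The Operations Log figures from the two runs can be cross-checked with a little arithmetic. This is a minimal Python 3 sketch under two assumptions that the thread does not confirm: that Retrospect's Performance figure is the completed amount divided by the non-idle part of the Duration, and that it counts 1 GB as 1024 MB. The helper names are hypothetical.

```python
def parse_hms(text):
    """Convert an "HH:MM:SS" duration from the Operations Log to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def active_rate_mb_per_min(completed_gb, duration, idle):
    """Completed amount divided by non-idle time, in MB per minute.
    Assumption: idle/loading/preparing time is excluded and 1 GB = 1024 MB."""
    active_seconds = parse_hms(duration) - parse_hms(idle)
    return completed_gb * 1024 / (active_seconds / 60)


# Figures copied from the two Operations Log excerpts in this thread.
oct23 = active_rate_mb_per_min(9.3, "00:13:05", "00:04:15")
oct28 = active_rate_mb_per_min(6.0, "00:13:58", "00:04:33")
print(f"23-10-2008: {oct23:.1f} MB/minute (log reports 1075,5)")
print(f"28-10-2008: {oct28:.1f} MB/minute (log reports 648,9)")
```

Both results land within a few MB/minute of the logged figures, which is consistent with the GB amounts being rounded. Both runs also failed after roughly nine minutes of actual copying (530 s and 565 s of non-idle time), though two data points cannot show whether that is more than coincidence.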

 

The xserver system logs look like this:

 

Oct 28 12:46:04 xserver bootpd[2861]: DHCP REQUEST [en0]: 1,0:23:6c:10:1d:d8

Oct 28 12:46:04 xserver bootpd[2861]: ACK sent Steve-s-iPhone 192.168.1.108 pktsize 300

Oct 28 12:49:34 xserver bootpd[2861]: DHCP REQUEST [en0]: 1,0:23:6c:10:1d:d8

Oct 28 12:49:34 xserver bootpd[2861]: ACK sent Steve-s-iPhone 192.168.1.108 pktsize 300

Oct 28 12:55:47 xserver bootpd[3152]: interface en0: ip 192.168.1.10 mask 255.255.255.0

Oct 28 12:55:47 xserver bootpd[3152]: server name xserver.massivemusic.com

Oct 28 12:55:47 xserver bootpd[3152]: bsdpd: re-reading configuration

Oct 28 12:55:47 xserver bootpd[3152]: bsdpd: shadow file size will be set to 48 megabytes

Oct 28 12:55:47 xserver bootpd[3152]: bsdpd: age time 00:15:00

Oct 28 12:55:47 xserver bootpd[3152]: DHCP REQUEST [en0]: 1,0:21:e9:7d:f1:bf

Oct 28 12:55:47 xserver bootpd[3152]: ACK sent 192.168.1.110 pktsize 300

Oct 28 13:10:45 localhost kernel[0]: standard timeslicing quantum is 10000 us

Oct 28 13:10:45 localhost kernel[0]: vm_page_bootstrap: 253220 free pages

Oct 28 13:10:45 localhost kernel[0]: mig_table_max_displ = 70

Oct 28 13:10:45 localhost kernel[0]: 95 prelinked modules

Oct 28 13:10:45 localhost kernel[0]: Copyright © 1982, 1986, 1989, 1991, 1993

Oct 28 13:10:45 localhost kernel[0]: The Regents of the University of California. All rights reserved.

Oct 28 13:10:45 localhost kernel[0]: using 2621 buffer headers and 2621 cluster IO buffer headers

Oct 28 13:10:45 localhost kernel[0]: DART enabled

Oct 28 13:10:45 localhost lookupd[56]: lookupd (version 369.5) starting - Tue Oct 28 13:10:45 2008

Oct 28 13:10:45 localhost kernel[0]: Enabling ECC Error Notifications

Oct 28 13:10:45 localhost kernel[0]: USB caused wake event (EHCI)

Oct 28 13:10:45 localhost kernel[0]: FireWire (OHCI) Apple ID 42 built-in now active, GUID 001124ff fe37a3a4; max speed s800.

 

While copying and pasting this info, I noticed one similarity with the previous attempt: both times it stopped while backing up files from the same folder ('Massive Site'). I'll try moving that entire folder off the server and running the script again.
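Before moving the folder, it may be worth checking whether simply reading it locally on xserver, with no Retrospect client and no network involved, causes the same hang. Below is a minimal Python 3 sketch of such a read test. `read_all_files` is a hypothetical helper, and the approach (print each path before reading it, so that if the machine freezes the last line on screen names the file in progress) is a troubleshooting assumption, not a known cause.

```python
import os
import sys


def read_all_files(root, chunk_size=1024 * 1024, verbose=False):
    """Read every regular file under root from start to finish.

    Returns (files_read, bytes_read, errors), where errors is a list of
    (path, message) pairs for files or directories that could not be read.
    With verbose=True each path is printed *before* it is read and the
    output is flushed, so after a hang the last printed line is the suspect.
    """
    files_read = 0
    bytes_read = 0
    errors = []

    def record(err):
        errors.append((err.filename, str(err)))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=record):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, device nodes, etc.
            if verbose:
                print(path, flush=True)
            try:
                with open(path, "rb") as f:
                    while True:
                        chunk = f.read(chunk_size)
                        if not chunk:
                            break
                        bytes_read += len(chunk)
                files_read += 1
            except OSError as err:
                errors.append((path, str(err)))
    return files_read, bytes_read, errors


if __name__ == "__main__" and len(sys.argv) == 2:
    n, size, errs = read_all_files(sys.argv[1], verbose=True)
    print(f"read {n} files, {size} bytes, {len(errs)} errors")
    for path, msg in errs:
        print(f"ERROR {path}: {msg}")
```

For example, run it over SSH as `python3 readtest.py "/Volumes/_MASSIVESERVER/Massive Music Algemeen/Massive Communicatie/Massive Websites/Massivemusic.com/Films TV/Massive Site"` (the script name and the /Volumes mount point are assumptions). A hang during a purely local read would point toward the storage path rather than Retrospect or the network; a clean run would make the client and network side more suspect.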

 

(2) You don't say what version of the Retrospect Driver Update ("RDU") you are running - see the Retrospect log, which logs the version numbers of Retrospect and RDU on each launch. See above link for update to current version.

 

Updated as well.

 

The bug affects any version of Tiger or Leopard, on PPC or Intel architecture, that has ACLs enabled. Try setting the Retrospect preference to not back up ACLs.

 

I had read that it only affects Intel machines (ours is a G5), but if you say it also affects PPC machines, I'll turn off the ACL preference and see if that helps.

In the updated version, of course.


To add confusion, this post uses the term "Duplicate" for the first time.

 

Well, except for the actual title of the thread.

 

So here's what it sounds as if the OP is trying to do, but I'm guessing way too much:

 

- Computer hosting RAID w/SATA host adapter ("xserver") is running OS X Server and sharing RAID volumes via AFP (and/or other protocols)

 

- "xserver" is running the Retrospect OS X Client software

 

- Computer running Retrospect has a volume of that same RAID mounted via AFP (or some other protocol)

 

- Computer running Retrospect is connecting to xserver via the client software and attempting to Duplicate files from a volume on the RAID ("_MASSIVESERVER") that's local to the client, to a shared logical volume of the same RAID, pulling all files through the network and then pushing them back through the network again to wind up on the same physical device from which they started.

 

- During the process, xserver is crashing in some manner.

 

 

If this is actually what you are trying to do, Retrospect might not be the best tool to use. Why not just Duplicate the files directly, volume-to-volume, on xserver, using any of the multitude of good Unix or Cocoa duplication utilities?

 

Or, if your configuration is actually different, please describe it.

 

 

Dave
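Dave's volume-to-volume suggestion can be illustrated with a minimal Python 3 sketch. It is a stand-in, not one of the utilities he has in mind: `mirror` is a hypothetical helper that copies new or changed files (judged by size and modification time) and never deletes anything. It relies on `shutil.copy2`, which on Mac OS X copies file data, timestamps, and permission bits but not ACLs, resource forks, or other extended attributes, so a Mac-aware tool remains the better choice for real server data.

```python
import os
import shutil
import sys


def mirror(src, dst):
    """One-way copy of src into dst (hypothetical helper, not a Retrospect feature).

    A file is copied when it is missing from dst or its size or
    modification time differs. Nothing is ever deleted from dst.
    Returns the number of files copied.
    """
    copied = 0
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            source = os.path.join(dirpath, name)
            target = os.path.join(target_dir, name)
            s = os.stat(source)
            if os.path.isfile(target):
                t = os.stat(target)
                if t.st_size == s.st_size and int(t.st_mtime) == int(s.st_mtime):
                    continue  # unchanged since the last run
            # copy2 copies the data plus timestamps and permission bits,
            # but not ACLs, resource forks or other extended attributes.
            shutil.copy2(source, target)
            copied += 1
    return copied


if __name__ == "__main__" and len(sys.argv) == 3:
    print(f"copied {mirror(sys.argv[1], sys.argv[2])} files")
```

This assumes both the source and the destination volume are mounted locally on xserver, e.g. `python3 mirror.py /Volumes/_MASSIVESERVER /Volumes/<backup volume>` (paths hypothetical). Because everything stays on one machine, the files do not travel over the network at all.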


Backup server:

Apple G4 DP 1 GHz, 768 MB RAM, OS X client 10.4.10. It uses the same RAID as our main server for storage. Retrospect Backup 6.1.126 (now updated).

I assume you mean "MacOS 10.4.10 non-server" rather than the non-existent "OS X client 10.4.10".

 

Just a comment: that's really not enough RAM for backing up the number and size of files you are dealing with. It may not be the problem, but it may be contributing.

 

Russ


Before I go through your responses, perhaps you could take another stab at answering the questions posed. Questions such as "How exactly are things configured?" deserve a more thorough telling than what's provided here. Something as basic as which computer the RAID device is physically connected to remains unanswered.

 

Sorry about that... Both the source (my xserver) and the destination (Retrospect machine, G4 DP 1.0 GHz) are connected via SCSI to the Arena Sivy 7230.

 

Disk Utility on the xserver tells me:

 

Disk Description: SA-7230 Media

Connection bus: SCSI

Connection Type: External

Connection ID: SCSI Target ID 0. Logical Unit 0

Disk Utility on the Retrospect machine tells me the same.

 

So there are two computers connected to the RAID via SCSI. All scripts on the Retrospect machine are Duplicates. The clients are connected through a Gigabit network, and all machines work normally on this network. The scripts run on a daily schedule, and this used to work for the xserver as well. Then, from one day to the next, the scheduled Duplicate started causing this problem; back then, running it manually worked around it. Now that no longer works either.

 

To add confusion, this post uses the term "Duplicate" for the first time. So knowing exactly how you are using Retrospect (what exactly is the Source? What exactly is the Destination? Is this a Backup or a Duplicate? Etc?) is critical for even _starting_ to try and help.

 

By 'Duplicate' I mean that when creating the script for this xserver, I chose 'Duplicate' rather than 'Backup'.

 

As I wrote in my previous post, I'll try removing from the source the folders where it got stuck the last two times.

 

Thanks!


Before doing that (or if it works with those folders removed from the source), it would be useful to see what happens if you disable ACL backup.

 

Do you have any ACLs set up on the machine running MacOS Server 10.4.11? On the machine running MacOS non-server 10.4.10, it's possible, with some effort, to turn on ACLs by hand from the command line, but you haven't mentioned doing so.

 

Russ
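One way to answer Russ's ACL question for a given folder, sketched in Python 3 under stated assumptions: it shells out to the stock `ls -led` command (`-l` long listing, `-e` to print ACL entries, `-d` to list a directory itself rather than its contents), assumes the installed `ls` supports `-e`, and assumes the entry format shown in the code comment. `acl_entries` and `has_acl` are hypothetical helpers; checking a few top-level shared folders is much quicker than walking every file.

```python
import re
import subprocess
import sys

# On Mac OS X, `ls -le` prints each ACL entry on its own line after the
# file's long listing, e.g.  " 0: group:everyone deny delete".
ACL_ENTRY = re.compile(r"^\s*\d+: ")


def acl_entries(ls_output):
    """Return the ACL entry lines from the output of `ls -led <path>`."""
    lines = ls_output.splitlines()
    # The first line is the normal long listing; ACL entries follow it.
    return [line.strip() for line in lines[1:] if ACL_ENTRY.match(line)]


def has_acl(path):
    """True if `ls -led` reports at least one ACL entry for path (Mac OS X only)."""
    result = subprocess.run(["ls", "-led", path], capture_output=True, text=True, check=True)
    return bool(acl_entries(result.stdout))


if __name__ == "__main__" and len(sys.argv) > 1:
    for p in sys.argv[1:]:
        print(f"{p}: {'ACL present' if has_acl(p) else 'no ACL'}")
```

If none of the shared folders on the 10.4.11 server report ACL entries, the Carbon ACL bug becomes a less likely explanation, though turning off ACL backup in Retrospect, as suggested above, is still the more direct test.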
