
Nearly every night when Retrospect is running, I get this kernel panic: Uncorrectable parity error detected in DIM2/J13

 

My first thought was that it must be a faulty memory module, but I have tested with other memory modules, and the motherboard and CPU have also been replaced.

 

I'm running Retrospect 6.1.126, and the backup is a file backup made to a FireWire disk.

 

I have uninstalled and reinstalled Retrospect.

 

Does anybody have an idea what I should do next?


The server is a dual G5 Xserve with a MegaRAID RAID controller and 2 GB of RAM (4 x 512 MB). All the hardware replacement was done under warranty.

The OS is Mac OS X Server 10.4.5.

Retrospect 6.1.126

 

What Apple has done to the machine: replaced the memory, the motherboard and the CPU.

 

What more information do you need?


Quote:

What more information do you need?

 


 

What more have you got? It's your dime...

 

- Why did Apple replace parts? Was it solely due to Retrospect issues?

 

- Have you experienced kernel panics from any other activity?

 

- Is your Source a RAID volume? What flavor of RAID?

 

- If you remove the DIMM in DIM2/J13 does the error change?

 

- What RDU is loading with Retrospect?


Quote:

What more have you got? It's your dime...

 

- Why did Apple replace parts? Was it solely due to Retrospect issues?

 

- Have you experienced kernel panics from any other activity?

 

- Is your Source a RAID volume? What flavor of RAID?

 

- If you remove the DIMM in DIM2/J13 does the error change?

 

- What RDU is loading with Retrospect?

 


 

The parts were replaced to solve the kernel panic issue (the kernel panic indicated a memory problem), but the replacement didn't solve the problem.

 

I have only experienced kernel panics while Retrospect is doing a backup or rebuilding the catalog.

 

I have tested every combination, with and without DIMMs in DIM2 and the other slots.

 

My source is a RAID 5 volume on three physical disks, and the controller is a MegaRAID controller.

 

RDU 6.1.4.103 is loaded with Retrospect.


Quote:

I have tested every combination with or without DIMMs in DIM2 and the other slots.

 

My source is a RAID 5 volume on three physical disks, and the controller is a MegaRAID controller.

 

 


Per, we've got a very similar configuration to yours: an Apple Hardware RAID card ("megaraid") configured as two RAID 5 LUNs on 3 x 250 GB SATA ADMs. The difference is that ours is a single-processor 2.0 GHz Xserve G5 with 2 GB RAM. We also had ECC errors on one of our DIMMs while running Retrospect, starting from day 1. Ours were correctable errors, not hard errors. The problem was fixed when the bad DIMM was replaced under our Xserve maintenance agreement.

It's pretty clear that you've got a hardware problem. However, your description above doesn't make clear exactly which tests you have done with the bad DIMM. First, I'm sure you are aware that Xserve G5 DIMMs go in pairs. A completely bad DIMM (i.e., one half of the pair is bad) would give hard errors and would probably fail the POST (power-on self test).

The most instructive test is to swap the suspect DIMM with another DIMM and see whether the error moves. That's what AppleCare had us do. Because our problem moved with the DIMM, it indicated that the fault was in the DIMM rather than in the memory slot. If the problem doesn't move with the DIMM, then the fault is in the slot on the motherboard itself. Could you try that test?

Also, perhaps your eyes are better than mine, but it took a very bright light and close scrutiny for me to read the slot labels on the motherboard PCB. The DIMM pairs are on alternate sides of the middle of the memory slots, with the even slots on one side and the odd slots on the other. Both members of a pair have to be the same size.

Dave's suggestion to "remove the DIMM in DIMM2/J13" won't work because the DIMMs go in pairs. You would either have to remove both DIMMs of the pair or, better yet, swap the two DIMMs of the pair so that the bad DIMM stays in the same pair but changes slots. Also make sure that the DIMMs are seated correctly. Please try this test (move, not remove, the suspect DIMM) and report back.

Retrospect makes this error surface because it pumps lots of data, but Retrospect is not what causes it. It's a hardware problem with either the motherboard or the DIMM itself.

 

Completely unrelated to this, I hope you are aware of an apparent bug in the Apple Hardware RAID card. It doesn't flush its write caches on a graceful power down. This happens only on power down, not on reboot. I've had a RADAR report open on this since November. It's a very hard bug to track down, but I have a reproducible test case that triggers it 100% of the time. I strongly suggest that you disable the write cache on all of your LUNs on the Apple Hardware RAID card until this is fixed, unless you want "mystery garbage blocks from space".

Regards,
Russ


Wow! What an amazingly informative post from Russ!

 

> Dave's suggestion to "remove the DIMM in DIMM2/J13" won't work because the DIMMs go in pairs

 

What I was trying to get Per to report is this: with DIMM2/J13 empty (either by itself or along with its matched pair, which I didn't realize), what is the text of the kernel panic error?

 

It's a mystery to me why users would hold back information when asking others for help.

 

Dave


Archived

This topic is now archived and is closed to further replies.
