HeapValidate error on client backup


rcohen

I'm running Retrospect Multi-server 7.7.203 on Windows Server 2003 R2 SP2 64-bit. When I back up an iSCSI drive with lots of files on a Windows Server 2003 SP2 32-bit machine, I get these errors about 50% of the time:

TMemory::freeSpace: HeapValidate failed

TMemory::freeSpace: HeapValidate failed

TMemory::freeSpace: HeapValidate failed

TMemory::freeSpace: HeapValidate failed

Can't access source volume, error -519 ( network communication failed)

This problem didn't happen in Retrospect 7.6. After this error, the rest of the clients back up without problems.

I'd appreciate any help, workaround advice, or bug fixes.

- Rob

I came to the forum this morning after getting the same error:

TMemory::freeSpace: HeapValidate failed

This script had groomed 78.3 GB and backed up 8.4 GB.

The only change I made before this error was enabling thorough verification. In this case, the volumes in the script were local (on the Retrospect server itself). The backup media is elsewhere on a solid network.

Version 7.7.203 (64-bit)

Driver Update and Hot Fix, version 7.7.1.102 (64-bit)

Windows Server 2003 R2 Standard, 64-bit

Hmm... I might be dealing with two separate problems.

It looks like I'm always getting the HeapValidate error now. I'll see whether it goes away if I restart the server and client.

- 12/28/2009 10:06:26 AM: Copying MailStore (E:) on mailserver

TMemory::freeSpace: HeapValidate failed

File "E:\System Volume Information\10{3808876b-c176-4e48-b7ae-04046e6cc752}": can't read, error -1020 ( sharing violation)

12/28/2009 10:28:30 AM: Snapshot stored, 99.8 MB

12/28/2009 10:28:41 AM: 1 execution errors

Remaining: 1 files, 300.0 MB

Completed: 2516 files, 1.8 GB, with 74% compression

Performance: 476.1 MB/minute

Duration: 00:22:15 (00:18:31 idle/loading/preparing)

The "can't access source volume" error only seems to happen with simultaneous backups.

I don't have any problems with my other servers, though.

I changed my scripts so that that server doesn't get backed up in parallel with other backup jobs.

That solved the problem with that server, but now I am getting a HeapValidate error on another server (which is still backed up in parallel with other jobs).

- 1/3/2010 3:21:39 AM: Copying Local Disk (C:) on picasso

TMemory::freeSpace: HeapValidate failed

While scanning volume Local Disk,

Folder C:\RECYCLER\S-1-5-21-1454471165-527237240-682003330-1003\Dc59\Props\Bicycle #1\...,

Scanning incomplete, error -1123 ( volume structure corrupt)

It looks like there is a memory corruption bug in 7.7.203, possibly due to a synchronization bug. I'll change my backup scripts to run serially and see if that solves the problem. Of course, this will slow my backups down dramatically.

A HeapValidate error isn't harmless. It means that heap memory is being corrupted, which could make the backups unreliable or corrupt. That kind of defeats the point of backups.

So far, going down to one execution unit has eliminated the HeapValidate errors. Hopefully this bug will be fixed, so I can run simultaneous executions again, like I did in 7.6.

I'm still getting this error, though: Can't access source volume, error -519 ( network communication failed)

It's on a large iSCSI drive.

After some more testing, it looks like the iSCSI was a red herring. That server is in our DMZ, and it looks like our firewall was interfering with communication between it and our Retrospect server. For some reason, the problem happens with 7.7 and not 7.6, but since everything works fine when the traffic doesn't go through our firewall, I'm going to blame the firewall. :)

This is separate from the HeapValidate error, which happens for me with simultaneous executions regardless of the firewall. I'm blaming that one on 7.7.

In one of my first support conversations, the tech told me Exchange had to be local to the Retrospect installation. So I set it up that way and mapped the backup drives via UNC paths in Retrospect. I believe this (network-based backup media) may be causing some timing issues within Retrospect 7.7.

In response to those and other issues I'm working through with EMC support, I moved the Retrospect installation to the server that the backup media (external HDDs) is attached to, and installed the Retrospect client software on the Exchange server. For the moment, I no longer receive the HeapValidate messages. I'll post an update in two weeks or so.

I am also using networked backup media (via a UNC path). It is conceivable that this is introducing a race condition, since it's a consumer-level NAS device, and I have run into synchronization problems with them before. I think I'll try moving the backup storage to a real server on an isolated network.

What are you using? I'm using Seagate FreeAgent Pro 750 GB drives, which have eSATA issues that, while not acknowledged by Seagate, are well documented in user forums across the net. While I'm not convinced these issues are related to the other issues I've been solving within Retrospect, if I ever win the lottery, I'll be replacing them ASAP. :)

The Thecus N5200Pro is the one that has given me concurrency problems. Specifically, it wasn't handling file locking properly. When I contacted their support about it, they said, "We're a consumer device, so this isn't a priority for us." File locking shouldn't be an issue here, since I'm using separate directories, but there's no reason to gamble.

I'll back up to a real server and use the Thecus as secondary backup storage. That's nice to have when something goes wrong and I need to rebuild my backups from scratch.

2 weeks later...

Just out of interest, what sort of problems do you see in practice?

Thecus themselves write: "Designed for hardware enthusiasts and SMBs, the N5200PRO packs some serious power under the hood." The part about SMBs doesn't 'sound' consumer to me...

We don't use a Thecus, but (among other things) a DroboPro. So far I haven't seen real concurrency problems with that device, although it does get (relatively) slower when running a couple of execution units.

Regarding the Thecus, file locking wasn't working reliably with Windows file shares. I had an application that relied on file locking to handle multiple simultaneous users, and as a result, files would occasionally get corrupted on the Thecus devices. The application works without problems on proper Windows and Linux (SMB/CIFS) server file shares.

I would not expect this to be an issue with Retrospect, since Retrospect does not permit simultaneous execution threads to write to the same Backup Set (and therefore the same file).

I was just suspicious of the Thecus with concurrent access in general because of this problem. In my experiments, the Thecus didn't appear to be a factor at all.

One workaround for the Thecus file-locking problem is to use it in iSCSI mode and let Windows handle the file locking.

The experience convinced me not to trust a consumer NAS as a real file server. I have found local storage, DAS, and OpenFiler to give much better performance and reliability. The Thecus seems to be doing fine as a network backup drive with Retrospect (aside from the 7.7 bugs, which aren't specific to the Thecus).

Thanks,

I suppose I assumed you had used the Thecus in iSCSI mode. That's actually about the only mode you can use the DroboPro in (apart from connecting it directly to the host via its FireWire or USB port).

I do agree that most NAS devices are nice to use at home but can be troublesome in a business environment; most are clearly designed for consumer use. But they are changing the storage landscape, just as 3D game graphics changed the professional 3D graphics scene. Eventually they will have a large impact even on professional storage.

On a side note, the DroboPro is a much simpler product than a Thecus, so we figured it should be more reliable software-wise. So far it has not let us down. But we only use it as D2D (disk-to-disk) storage for Retrospect, from which we then go D2T (disk-to-tape).

I don't think I would use it as primary storage, although it would probably work well enough for that.

With 7.7.208, I am still getting lots of heap corruption errors. So far, I have been testing with 4 execution units. I'll drop to one to see if that's different.

I am getting lots of HeapValidate errors across different clients, volumes, and backup sets. This is just one example.

I am also seeing some other new errors. I don't know whether these are useful clues or just side effects of the memory corruption, but here they are, just in case.

- 1/29/2010 3:00:01 AM: Copying System (C:)

TMemory::freeSpace: HeapValidate failed

TMemory::freeSpace: HeapValidate failed

MapError: unknown Windows error -1,017

xpmlStoreMetadata: couldn't copy file "\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\System Volume Information\SRM\Settings\SrmGlobalSettings.xml" to the state folder "C:\Documents and Settings\All Users\Application Data\Retrospect\RtrExec.dir\Exec-3\State\MetaInfo\writer0000\comp0000\file0000", osErr -1017, error -1001

MapError: unknown Windows error -1,017

xpmlStoreMetadata: couldn't copy file "\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\System Volume Information\SRM\Settings\ReportSettings.xml" to the state folder "C:\Documents and Settings\All Users\Application Data\Retrospect\RtrExec.dir\Exec-3\State\MetaInfo\writer0000\comp0000\file0001", osErr -1017, error -1001

After switching to a single execution unit and recycling my backup sets, here's what I've noticed.

Since the heap corruption is global, once it begins, running multiple executions shows errors everywhere. With a single execution unit, I can see that it consistently starts on the same drive.

The HeapValidate errors start on a local volume (on the backup server) with a huge number of files and folders: 1,664,270 files and 700,291 folders.

Hopefully this will make it easier for Dantz to reproduce the problem. Meanwhile, I'll try splitting it into multiple backup volumes.

I am also seeing a couple of other errors that I hadn't seen before this version. These happen repeatedly on other clients before I start getting HeapValidate errors.

MapError: unknown Windows error -1,017

necoIncoming: empty stream packet, tid 20
