rcohen

HeapValidate error on client backup


Running Retrospect Multi-server 7.7.203 on Server 2003 R2 SP2 64-bit, when I back up an iSCSI drive with lots of files on a Server 2003 SP2 32-bit machine, I get these errors about 50% of the time:

 

TMemory::freeSpace: HeapValidate failed

TMemory::freeSpace: HeapValidate failed

TMemory::freeSpace: HeapValidate failed

TMemory::freeSpace: HeapValidate failed

Can't access source volume, error -519 ( network communication failed)

 

This problem didn't happen in Retrospect 7.6. After this, the rest of the clients back up without problems.

 

I'd appreciate any help, workaround advice, or bug fixes.

 

- Rob


I came into the forum this morning after getting the same error. I get:

 

TMemory::freeSpace: HeapValidate failed

 

This script had groomed 78.3 GB and backed up 8.4 GB.

 

The only change I'd made prior to this error was enabling thorough verification. In this case, the volumes were local to the backup server. The backup media is elsewhere on a solid network.

 

Version 7.7.203 (64-bit)

Driver Update and Hot Fix, version 7.7.1.102 (64-bit)

Windows Server 2003 R2 Standard, 64-bit


Also, I think I have only seen this problem when I'm running multiple simultaneous backups, which I normally do for scripted backups. I'll try changing my scripts so that server gets backed up on a different schedule.


Hmm...I might be dealing with two separate problems.

 

It looks like I'm always getting the HeapValidate error, now. I'll see if that goes away if I restart the server and client.

 

- 12/28/2009 10:06:26 AM: Copying MailStore (E:) on mailserver

TMemory::freeSpace: HeapValidate failed

File "E:\System Volume Information\10{3808876b-c176-4e48-b7ae-04046e6cc752}": can't read, error -1020 ( sharing violation)

12/28/2009 10:28:30 AM: Snapshot stored, 99.8 MB

12/28/2009 10:28:41 AM: 1 execution errors

Remaining: 1 files, 300.0 MB

Completed: 2516 files, 1.8 GB, with 74% compression

Performance: 476.1 MB/minute

Duration: 00:22:15 (00:18:31 idle/loading/preparing)

 

The "can't access source volume" only seems to happen with simultanious backups.

 

I don't have any problems with my other servers, though.

> It looks like I'm always getting the HeapValidate error, now. I'll see if that goes away if I restart the server and client.

 

It's a bug, but I don't think it's too significant. It doesn't seem to cause any other error or affect the backup process.


While I have yet to call tech support on this (I will do so in early January), I agree that this seems informational in nature and therefore does not cause me great concern. If it does deteriorate, believe me, I will follow up. :)


I changed my scripts so that server no longer gets backed up in parallel with other backup jobs.

 

That solved the problem with this server, but now I am getting a HeapValidate error on another server (which is going in parallel with other jobs).

 

- 1/3/2010 3:21:39 AM: Copying Local Disk (C:) on picasso

TMemory::freeSpace: HeapValidate failed

While scanning volume Local Disk,

Folder C:\RECYCLER\S-1-5-21-1454471165-527237240-682003330-1003\Dc59\Props\Bicycle #1\...,

Scanning incomplete, error -1123 ( volume structure corrupt)

 

 

It sounds like there is a memory corruption bug in 7.7.203, possibly due to a synchronization bug. I'll change my backup scripts to run serially and see if that solves the problem. Of course, this will dramatically slow my backups down.

 

A HeapValidate error isn't harmless. It means that memory is being corrupted, which could make the backups unreliable or corrupt. That kind of defeats the point of backups.
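To see why the error matters: a heap validator works by checking bookkeeping data that every allocation carries; if a buffer overrun (for example, from a race between threads) clobbers that data, validation fails. Here is a minimal toy sketch of the idea in Python, using trailing guard bytes. This is an illustration only, not how Retrospect's TMemory or the Win32 HeapValidate call actually work internally:

```python
GUARD = b"\xab" * 8

class ToyHeap:
    """Toy allocator sketching what a HeapValidate-style check does:
    every block carries trailing guard bytes, an overrun clobbers them,
    and a later validation pass detects the corruption."""

    def __init__(self):
        self.blocks = {}

    def alloc(self, size):
        buf = bytearray(size + len(GUARD))
        buf[size:] = GUARD                      # plant the guard bytes
        handle = len(self.blocks)
        self.blocks[handle] = (buf, size)
        return handle

    def write(self, handle, data):
        buf, _size = self.blocks[handle]
        buf[:len(data)] = data                  # no bounds check: can overrun

    def validate(self):
        # Like HeapValidate: walk every block and check its guards.
        return all(bytes(buf[size:]) == GUARD
                   for buf, size in self.blocks.values())

heap = ToyHeap()
h = heap.alloc(16)
heap.write(h, b"0123456789ABCDEF")              # exactly fills the block
print(heap.validate())                          # True: guards intact
heap.write(h, b"0123456789ABCDEF!!")            # 2 bytes past the end
print(heap.validate())                          # False: guards clobbered
```

The key point: validation only tells you corruption has *already* happened somewhere, which is why the error shouldn't be dismissed as informational.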


BTW, I'm rebuilding my catalog files, to be safe, and I configured Retrospect to only use 1 execution unit. We'll see how that goes.


So far, dropping to 1 execution unit has eliminated the HeapValidate errors. Hopefully, this bug will be fixed so I can do simultaneous executions again, like I did in 7.6.

 

I'm still getting this error, though: Can't access source volume, error -519 ( network communication failed)

 

It's on a large iSCSI drive.

 

 

 


After some more testing, it looks like the iSCSI was a red herring. That server is in our DMZ, and it looks like our firewall was interfering with communication between it and our Retrospect server. For some reason, the problem happens with 7.7 and not 7.6, but since it works fine when it doesn't go through our firewall, I'm going to assign the firewall the blame. :)

 

This is separate from the HeapValidate error. That happens for me with simultaneous executions, regardless of the firewall. I'm blaming that one on 7.7.


Update:

After running for a couple of weeks, dropping to one execution unit (to prevent simultaneous executions) is still working reliably - no HeapValidate errors.


In one of my first support conversations, the tech told me Exchange had to be local to the Retrospect installation. So I did that and mapped the backup drives via UNC paths in Retrospect. I believe this may be causing some timing issues within Retrospect 7.7 (network-based backup media).

 

In response to those and other issues I'm working through with EMC support, I moved the Retrospect installation to the server local to the backup media (external HDDs). I installed the Retrospect client software on the Exchange server, and for the moment I no longer receive the HeapValidate info messages. I'll post an update in two weeks or so.


I am also using (UNC path) networked backup media. It is conceivable that this is introducing a race condition, since it's a consumer-level NAS device. I have run into synchronization problems with them before. I think I'll try moving the backup storage to a real server, over an isolated network.

 


What are you using? I'm using Seagate FreeAgent Pro 750 GB drives, which have eSATA issues that, while not acknowledged by Seagate, are well documented in user forums across the net. While I'm not convinced these issues are related to the other issues I've been solving within Retrospect, if I ever win the lottery, I'll be replacing them ASAP. :)


The Thecus N5200Pro is the one that has given me concurrency problems. Specifically, it wasn't handling file locking properly. When I contacted their support about it, they said, "We're a consumer device, so this isn't a priority for us." File locking isn't an issue here, since I'm using separate directories, but there's no reason to gamble.

 

I'll back up to a real server and use the Thecus as secondary backup storage. That's nice to have when something goes wrong and I need to rebuild my backups from scratch.


Bad news. I'm getting "TMemory::freeSpace: HeapValidate failed" errors again, even using a proper server for backup shares, over a segregated network.

 

It appears that the only way I can avoid the HeapValidate errors with 7.7 is to drop to 1 execution unit.


After using a single execution unit for a few weeks, I'm getting HeapValidate errors, again. For whatever reason, it is happening less frequently with a single execution unit, but it is still happening.

 

This was never a problem with 7.6.

> The Thecus N5200Pro is the one that has given me concurrency problems. Specifically, it wasn't handling file locking properly. When I contacted their support about it, they said, "We're a consumer device, so this isn't a priority for us." File locking isn't an issue here, since I'm using separate directories, but there's no reason to gamble.

Just out of interest, what sort of problems do you see in real life?

 

Thecus themselves write: "Designed for hardware enthusiasts and SMBs, the N5200PRO packs some serious power under the hood." The part about SMBs doesn't 'sound' like consumer to me...

 

We do not use a Thecus but (amongst other things) a DroboPro. So far I haven't seen real concurrency problems with that device, but it can indeed get (relatively) slower when using a couple of execution units.


Regarding the Thecus, file locking wasn't working reliably with Windows file shares. I had an application that relied on file locking to handle multiple simultaneous users. As a result, files would occasionally get corrupted on the Thecus devices. The application works without problems on proper Windows and Linux (SMB/CIFS) server file shares.

 

I would not expect this to be an issue with Retrospect, since Retrospect does not permit simultaneous execution threads to write to the same Backup Set (and therefore, the same file).

 

I was just suspicious about Thecus with concurrent access in general, because of this problem. In my experiments, Thecus didn't appear to be a factor at all.

 

One workaround for the Thecus file-locking problem is to use it in iSCSI mode and let Windows handle the file locking.
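For context, the behavior a file server has to honor is simple: once one writer holds an exclusive lock, a second writer's lock attempt must be refused. Here is a minimal sketch of that contract. It uses POSIX `flock` via Python for portability; on the actual Windows shares in this thread the equivalent mechanism is SMB byte-range locking, which is what the Thecus was mishandling (the file name below is made up for the example):

```python
import fcntl
import os
import tempfile

# Create a scratch file to lock (hypothetical path, illustration only).
path = os.path.join(tempfile.mkdtemp(), "shared.dat")
open(path, "w").close()

writer1 = open(path, "r+")
fcntl.flock(writer1, fcntl.LOCK_EX | fcntl.LOCK_NB)   # first writer locks

writer2 = open(path, "r+")
try:
    fcntl.flock(writer2, fcntl.LOCK_EX | fcntl.LOCK_NB)
    print("lock not enforced")        # effectively what a broken NAS does
except BlockingIOError:
    print("second writer refused")    # correct: exclusive lock is honored

fcntl.flock(writer1, fcntl.LOCK_UN)
writer1.close()
writer2.close()
```

Putting the device in iSCSI mode sidesteps the problem because the NAS then only serves raw blocks, and the Windows host's own filesystem enforces the locking.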

 

The experience convinced me not to trust consumer NAS as a real file server. I have found local storage, DAS, and OpenFiler to give much better performance and reliability. The Thecus seems to be doing fine as a network backup drive, using Retrospect (aside from 7.7 bugs, which are non-specific to Thecus).


Thanks,

 

I had expected that you were using the Thecus in iSCSI mode. That is actually about the only mode you can use the DroboPro in (apart from using its FireWire or USB port and connecting it directly to the host).

 

I do agree that most NAS devices are nice to use at home but can be troublesome in a business environment. Most NAS devices are clearly designed for consumer usage. But they are changing the storage realm, just like 3D game graphics changed the professional 3D graphics scene. Eventually they will have a large impact even on professional storage.

 

 

On a side note, the DroboPro is a much simpler product than a Thecus, so we figured it should be more reliable software-wise. So far it has not let us down. But we only use it as D2D storage for Retrospect, after which we copy to tape (D2T).

 

I don't think I would use it as primary storage, although I must say it would probably work well enough for that.


With 7.7.208, I am still getting lots of heap corruption errors. So far, I have been testing with 4 execution units. I'll drop to one to see if that makes a difference.

 

I am getting lots of HeapValidate errors, across different clients, volumes, and backup sets. This is just one example.

 

I am seeing some other new errors. I don't know if these are helpful, or side effects of the memory corruption, but here they are, just in case.

 

- 1/29/2010 3:00:01 AM: Copying System (C:)

TMemory::freeSpace: HeapValidate failed

TMemory::freeSpace: HeapValidate failed

MapError: unknown Windows error -1,017

xpmlStoreMetadata: couldn't copy file "\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\System Volume Information\SRM\Settings\SrmGlobalSettings.xml" to the state folder "C:\Documents and Settings\All Users\Application Data\Retrospect\RtrExec.dir\Exec-3\State\MetaInfo\writer0000\comp0000\file0000", osErr -1017, error -1001

MapError: unknown Windows error -1,017

xpmlStoreMetadata: couldn't copy file "\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\System Volume Information\SRM\Settings\ReportSettings.xml" to the state folder "C:\Documents and Settings\All Users\Application Data\Retrospect\RtrExec.dir\Exec-3\State\MetaInfo\writer0000\comp0000\file0001", osErr -1017, error -1001

 


After switching to a single execution unit, and recycling my backup sets, here's what I've noticed.

 

Since the heap corruption is global, running multiple executions shows errors everywhere once it begins. With a single execution unit, I can see that it happens consistently on the same drive.

 

The HeapValidate errors are starting on a local volume (on the backup server) with a huge number of files and folders. It has 1,664,270 files and 700,291 folders.
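A quick way to measure a volume before deciding whether it's big enough to be worth splitting is to walk it and count. This is a hypothetical helper script, not part of Retrospect; for a tree the size of the one above (1.66M files) the walk itself will take a while:

```python
import os

def volume_stats(root):
    """Count files and folders under root -- useful for spotting volumes
    large enough to be worth carving into subvolumes."""
    files = folders = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        folders += len(dirnames)
        files += len(filenames)
    return files, folders

# Example usage (drive letter is illustrative):
# files, folders = volume_stats(r"D:\\")
# print(f"{files:,} files, {folders:,} folders")
```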

 

Hopefully, this will make it easier for Dantz to reproduce it. Meanwhile, I'll try splitting it into multiple backup volumes.

 

I am also seeing a couple of other errors, which I hadn't seen before this version. These happen repeatedly on other clients before I start getting HeapValidate errors.

 

MapError: unknown Windows error -1,017

necoIncoming: empty stream packet, tid 20

 


Splitting the large volume into two subvolumes has eliminated the HeapValidate errors, using a single execution unit.

 

I haven't tried multiple execution units, yet.

