rcohen Posted December 28, 2009

Running Retrospect Multi-server 7.7.203 on Server 2003 R2 SP2 64-bit, when I back up an iSCSI drive with lots of files on a Server 2003 SP2 32-bit machine, I get these errors about 50% of the time:

TMemory::freeSpace: HeapValidate failed
TMemory::freeSpace: HeapValidate failed
TMemory::freeSpace: HeapValidate failed
TMemory::freeSpace: HeapValidate failed
Can't access source volume, error -519 ( network communication failed)

This problem didn't happen in Retrospect 7.6. After this error, the rest of the clients back up without problems. I'd appreciate any help, workaround advice, or bug fixes.

- Rob
bcssomadude Posted December 28, 2009

I came into the forum this morning after getting the same error:

TMemory::freeSpace: HeapValidate failed

This script had groomed 78.3 GB and backed up 8.4 GB. The only change I made prior to this error was enabling thorough verification. In this case, the source volumes were local to the script; the backup media is elsewhere on a solid network.

Version 7.7.203 (64-bit)
Driver Update and Hot Fix, version 7.7.1.102 (64-bit)
Windows Server 2003 R2 Standard, 64-bit
rcohen Posted December 28, 2009

Also, I think I have only seen this problem when I'm running multiple simultaneous backups, which I normally do for scripted backups. I'll try changing my scripts so that that server gets backed up on a different schedule.
rcohen Posted December 28, 2009

Hmm... I might be dealing with two separate problems. It looks like I'm always getting the HeapValidate error now. I'll see if that goes away if I restart the server and client.

12/28/2009 10:06:26 AM: Copying MailStore (E:) on mailserver
TMemory::freeSpace: HeapValidate failed
File "E:\System Volume Information\10{3808876b-c176-4e48-b7ae-04046e6cc752}": can't read, error -1020 ( sharing violation)
12/28/2009 10:28:30 AM: Snapshot stored, 99.8 MB
12/28/2009 10:28:41 AM: 1 execution errors
Remaining: 1 files, 300.0 MB
Completed: 2516 files, 1.8 GB, with 74% compression
Performance: 476.1 MB/minute
Duration: 00:22:15 (00:18:31 idle/loading/preparing)

The "can't access source volume" error only seems to happen with simultaneous backups. I don't have any problems with my other servers, though.
mauricev Posted December 28, 2009

"It looks like I'm always getting the HeapValidate error now. I'll see if that goes away if I restart the server and client."

It's a bug, but I don't think it's too significant. It doesn't seem to cause any other errors or affect the backup process.
bcssomadude Posted December 30, 2009

While I have yet to call tech support on this (I will do so in early January), I agree that this seems informational in nature and therefore doesn't cause me great concern. If things deteriorate, believe me, I will follow up.
rcohen Posted January 4, 2010

I changed my scripts so that that server doesn't get backed up in parallel with other backup jobs. That solved the problem with this server, but now I'm getting a HeapValidate error on another server (which does run in parallel with other jobs):

1/3/2010 3:21:39 AM: Copying Local Disk (C:) on picasso
TMemory::freeSpace: HeapValidate failed
While scanning volume Local Disk, Folder C:\RECYCLER\S-1-5-21-1454471165-527237240-682003330-1003\Dc59\Props\Bicycle #1\..., Scanning incomplete, error -1123 ( volume structure corrupt)

It sounds like there is a memory corruption bug in 7.7.203, possibly a synchronization bug. I'll change my backup scripts to run in serial and see if that solves the problem. Of course, this will dramatically slow my backups down.

A HeapValidate error isn't harmless. It means that memory is being corrupted, which could make the backups unreliable or corrupt. That kind of defeats the point of backups.
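For anyone wondering what this error actually indicates: the Windows HeapValidate API walks the heap's bookkeeping metadata and reports FALSE if an overrun or stray write has stomped on it. A minimal, portable sketch of the same idea, using a hypothetical guard word rather than the real Windows heap layout:

```c
#include <stddef.h>

/* Hypothetical sketch of heap-metadata validation: a guard word placed
   right after each payload, checked later the way HeapValidate checks
   the Windows heap's own bookkeeping. Not the real heap layout. */
#define CANARY 0xDEADBEEFu

typedef struct {
    char data[16];   /* payload */
    unsigned canary; /* guard word immediately after the payload */
} GuardedBlock;

void guarded_init(GuardedBlock *b) {
    b->canary = CANARY;
}

/* 1 = metadata intact, 0 = corrupted -- the moral equivalent of
   HeapValidate() returning FALSE for the process heap. */
int guarded_validate(const GuardedBlock *b) {
    return b->canary == CANARY;
}

/* Simulate a buffer overrun: write n bytes starting at the payload.
   The writes stay inside the struct so the demo itself is well-defined,
   but any n > 16 tramples the guard word, just as a real overrun
   tramples the allocator's metadata. */
void overrun(GuardedBlock *b, size_t n) {
    unsigned char *p = (unsigned char *)b;
    for (size_t i = 0; i < n && i < sizeof *b; i++)
        p[i] = 0x78;
}
```

The point of the sketch: once a guard word is trampled, nothing the allocator hands out afterward can be trusted, which is why a failed validation during a backup is worth taking seriously.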
rcohen Posted January 4, 2010

I am also getting HeapValidate errors in all my groom scripts. I'll try running them in serial, too.
rcohen Posted January 4, 2010

BTW, I'm rebuilding my catalog files to be safe, and I've configured Retrospect to use only 1 execution unit. We'll see how that goes.
rcohen Posted January 7, 2010

So far, going down to 1 execution unit has eliminated the HeapValidate errors. Hopefully this bug will be fixed so I can use simultaneous execution again, like I did in 7.6.

I'm still getting this error, though:

Can't access source volume, error -519 ( network communication failed)

It's on a large iSCSI drive.
rcohen Posted January 8, 2010

After some more testing, it looks like the iSCSI was a red herring. That server is in our DMZ, and it looks like our firewall was interfering with communication between it and our Retrospect server. For some reason the problem happens with 7.7 and not 7.6, but since it works fine when it doesn't go through our firewall, I'm going to blame the firewall.

This is separate from the HeapValidate error. That happens for me with simultaneous executions, regardless of the firewall. I'm blaming that one on 7.7.
rcohen Posted January 11, 2010

Update: After running for a couple of weeks, dropping to one execution unit (to prevent simultaneous executions) is still working reliably - no HeapValidate errors.
bcssomadude Posted January 11, 2010

In one of my first support conversations, the tech told me Exchange had to be local to the Retrospect installation. I therefore did so, and mapped the backup drives via UNC paths in Retrospect. I believe this network-based backup media may be causing some timing issues within Retrospect 7.7.

In response to those and other issues I'm working through with EMC support, I moved the Retrospect installation to the server local to the backup media (external HDDs). I installed the Retrospect client software on the Exchange server, and I no longer receive the HeapValidate info messages at the moment. I'll update in two weeks or so.
rcohen Posted January 11, 2010

I am also using (UNC path) networked backup media. It is conceivable that this is introducing a race condition, since it's a consumer-level NAS device; I have run into synchronization problems with them before. I think I'll try moving the backup storage to a real server over an isolated network.
bcssomadude Posted January 11, 2010

What are you using? I'm using Seagate FreeAgent Pro 750 GB drives, which have eSATA issues that, while not acknowledged by Seagate, are well documented in user forums across the net. While I'm not convinced these issues are related to the other issues I've been solving within Retrospect, if I ever win the lottery, I'll be replacing them ASAP.
rcohen Posted January 11, 2010

The Thecus N5200Pro is the one that has given me concurrency problems. Specifically, it wasn't handling file locking properly. When I contacted their support about it, they said, "We're a consumer device, so this isn't a priority for us."

File locking isn't an issue for this, since I'm using separate directories, but there's no reason to gamble. I'll back up to a real server and use the Thecus as secondary backup storage. That's nice to have when something goes wrong and I need to rebuild my backups from scratch.
rcohen Posted January 13, 2010

Bad news. I'm getting "TMemory::freeSpace: HeapValidate failed" errors again, even using a proper server for the backup shares over a segregated network. It appears that the only way I can avoid the HeapValidate errors with 7.7 is to drop to 1 execution unit.
rcohen Posted January 24, 2010

After using a single execution unit for a few weeks, I'm getting HeapValidate errors again. For whatever reason, it happens less frequently with a single execution unit, but it is still happening. This was never a problem with 7.6.
Ramon88 Posted January 25, 2010

"The Thecus N5200Pro is the one that has given me concurrency problems. Specifically, it wasn't handling file locking properly. When I contacted their support about it, they said, 'We're a consumer device, so this isn't a priority for us.' File locking isn't an issue for this, since I'm using separate directories, but there's no reason to gamble."

Just interested: what sort of problem do you see in real life? Thecus themselves write: "Designed for hardware enthusiasts and SMBs, the N5200PRO packs some serious power under the hood." The part about SMBs doesn't 'sound' like consumer to me...

We do not use a Thecus but (amongst other things) a DroboPro. So far I haven't seen real concurrency problems with that device, but it can indeed get (relatively) slower when using a couple of execution units.
rcohen Posted January 25, 2010

Regarding the Thecus: file locking wasn't working reliably with Windows file shares. I had an application that relied on file locking in order to handle multiple simultaneous users. As a result, files would occasionally get corrupted on the Thecus devices. The application works without problems on proper Windows and Linux (SMB/CIFS) server file shares.

I would not expect this to be an issue with Retrospect, since Retrospect does not permit simultaneous execution threads to write to the same Backup Set (and therefore the same file). I was just suspicious about Thecus with concurrent access in general because of this problem. In my experiments, the Thecus didn't appear to be a factor at all.

One workaround for the Thecus file locking problem is to use it in iSCSI mode and let Windows handle the file locking. The experience convinced me not to trust a consumer NAS as a real file server. I have found local storage, DAS, and OpenFiler to give much better performance and reliability. The Thecus seems to be doing fine as a network backup drive with Retrospect (aside from the 7.7 bugs, which are not specific to Thecus).
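To make concrete what "handling file locking properly" means here: when one handle holds an exclusive lock on a file, a second handle asking for an exclusive lock must be refused, not silently granted - a server that grants both lets two writers corrupt the file. A small sketch using POSIX flock() for illustration (the Windows SMB analogue would be LockFileEx; the path used in the test is arbitrary):

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Open the same file twice and confirm that the second open file
   description cannot grab an exclusive lock while the first holds it.
   Returns 1 if locking behaves correctly, 0 if the second lock was
   wrongly granted, -1 on setup failure. */
int second_lock_refused(const char *path) {
    int fd1 = open(path, O_CREAT | O_RDWR, 0644);
    int fd2 = open(path, O_RDWR);
    if (fd1 < 0 || fd2 < 0)
        return -1;

    int refused = 0;
    if (flock(fd1, LOCK_EX) == 0) {
        /* A non-blocking attempt on the second description must fail
           with EWOULDBLOCK while the first lock is held. */
        if (flock(fd2, LOCK_EX | LOCK_NB) < 0 && errno == EWOULDBLOCK)
            refused = 1;
        flock(fd1, LOCK_UN);
    }
    close(fd1);
    close(fd2);
    return refused;
}
```

On a local filesystem this check passes; the Thecus problem described above is the NAS-share case where the equivalent SMB check does not hold.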
Ramon88 Posted January 25, 2010

Thanks. I had expected you were using the Thecus in iSCSI mode. That is actually about the only mode you can use the DroboPro in (apart from using its FireWire or USB port and connecting it directly to the host).

I agree that most NAS devices are nice to use at home but can be troublesome in a business environment; most are clearly designed for consumer usage. But they are changing the storage realm, just like 3D game graphics changed the professional 3D graphics scene. Eventually they will have a large impact even on professional storage.

On a side note, the DroboPro is a much simpler product than a Thecus, so we figured it should be more reliable software-wise. So far it has not let us down. But we only use it as D2D storage for Retrospect, after which we go D2T. I don't think I would use it as primary storage, although I must say it would probably work well enough for that.
rcohen Posted January 29, 2010

With 7.7.208, I am still getting lots of heap corruption errors. So far I have been testing with 4 execution units; I'll drop to one to see if that makes a difference. I am getting lots of HeapValidate errors across different clients, volumes, and backup sets - this is just one example.

I am also seeing some other new errors. I don't know if these are helpful or just side effects of the memory corruption, but here they are, just in case:

1/29/2010 3:00:01 AM: Copying System (C:)
TMemory::freeSpace: HeapValidate failed
TMemory::freeSpace: HeapValidate failed
MapError: unknown Windows error -1,017
xpmlStoreMetadata: couldn't copy file "\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\System Volume Information\SRM\Settings\SrmGlobalSettings.xml" to the state folder "C:\Documents and Settings\All Users\Application Data\Retrospect\RtrExec.dir\Exec-3\State\MetaInfo\writer0000\comp0000\file0000", osErr -1017, error -1001
MapError: unknown Windows error -1,017
xpmlStoreMetadata: couldn't copy file "\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\System Volume Information\SRM\Settings\ReportSettings.xml" to the state folder "C:\Documents and Settings\All Users\Application Data\Retrospect\RtrExec.dir\Exec-3\State\MetaInfo\writer0000\comp0000\file0001", osErr -1017, error -1001
rcohen Posted January 29, 2010

Another new error in my logs:

necoIncoming: empty stream packet, tid 20
rcohen Posted February 1, 2010

After switching to a single execution unit and recycling my backup sets, here's what I've noticed. Since the heap corruption is global, running multiple executions shows errors everywhere once it begins. With a single execution unit, I can see that it happens consistently on the same drive.

The HeapValidate errors start on a local volume (on the backup server) with a huge number of files and folders: 1,664,270 files and 700,291 folders. Hopefully this will make it easier for Dantz to reproduce. Meanwhile, I'll try splitting it into multiple backup volumes.

I am also seeing a couple of other errors, which I hadn't seen before this version. These happen repeatedly on other clients before I start getting HeapValidate errors:

MapError: unknown Windows error -1,017
necoIncoming: empty stream packet, tid 20
rcohen Posted February 4, 2010

Splitting the large volume into two subvolumes has eliminated the HeapValidate errors, using a single execution unit. I haven't tried multiple execution units yet.