rcohen

  1. Is this problem in the 32-bit version as well? I have been using the 64-bit version.
  2. I'm doing backups to disk, using a Thecus NAS. I have already tried backing up to other systems (actual Windows and Linux servers), with no change in behavior.
  3. 32 GB. Retrospect's peak memory usage is about 1 GB.
  4. Splitting the large volume into two subvolumes has eliminated the HeapValidate errors, using a single execution unit. I haven't tried multiple execution units, yet.
  5. After switching to a single execution unit and recycling my backup sets, here's what I've noticed. Since the heap corruption is global, running multiple executions shows errors everywhere once it begins. The single execution unit shows that it happens consistently on the same drive. The HeapValidate errors start on a local volume (on the backup server) with a huge number of files and folders: 1,664,270 files and 700,291 folders. Hopefully, this will make it easier for Dantz to reproduce it. Meanwhile, I'll try splitting it into multiple backup volumes. I am also seeing a couple of other errors, which I haven't seen before this version. These happen repeatedly on other clients, before I start getting HeapValidate errors:
      MapError: unknown Windows error -1,017
      necoIncoming: empty stream packet, tid 20
  6. Another new error in my logs: necoIncoming: empty stream packet, tid 20
  7. With 7.7.208, I am still getting lots of heap corruption errors. So far, I have been testing with 4 execution units. I'll drop to one to see if that's different. I am getting lots of HeapValidate errors, across different clients, volumes, and backup sets. This is just one example. I am seeing some other new errors. I don't know if these are helpful, or side effects of the memory corruption, but here they are, just in case.
      - 1/29/2010 3:00:01 AM: Copying System (C:)
      TMemory::freeSpace: HeapValidate failed
      TMemory::freeSpace: HeapValidate failed
      MapError: unknown Windows error -1,017
      xpmlStoreMetadata: couldn't copy file "\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\System Volume Information\SRM\Settings\SrmGlobalSettings.xml" to the state folder "C:\Documents and Settings\All Users\Application Data\Retrospect\RtrExec.dir\Exec-3\State\MetaInfo\writer0000\comp0000\file0000", osErr -1017, error -1001
      MapError: unknown Windows error -1,017
      xpmlStoreMetadata: couldn't copy file "\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\System Volume Information\SRM\Settings\ReportSettings.xml" to the state folder "C:\Documents and Settings\All Users\Application Data\Retrospect\RtrExec.dir\Exec-3\State\MetaInfo\writer0000\comp0000\file0001", osErr -1017, error -1001
  8. Regarding the Thecus, file locking wasn't working reliably with Windows file shares. I had an application that relied on file locking in order to handle multiple, simultaneous users. As a result, files would occasionally get corrupted on the Thecus devices. The application works without problems on proper Windows and Linux (SMB/CIFS) server file shares. I would not expect this to be an issue with Retrospect, since Retrospect does not permit simultaneous execution threads to write to the same Backup Set (and therefore, file). I was just suspicious about Thecus with concurrent access in general, because of this problem. In my experiments, Thecus didn't appear to be a factor at all. One workaround for the Thecus problem with file locking is to use it in iSCSI mode, and let Windows handle the file locking. The experience convinced me not to trust consumer NAS as a real file server. I have found local storage, DAS, and OpenFiler to give much better performance and reliability. The Thecus seems to be doing fine as a network backup drive, using Retrospect (aside from 7.7 bugs, which are not specific to Thecus).
  9. After using a single execution unit for a few weeks, I'm getting HeapValidate errors, again. For whatever reason, it is happening less frequently with a single execution unit, but it is still happening. This was never a problem with 7.6.
  10. Bad news. I'm getting "TMemory::freeSpace: HeapValidate failed" errors again, even using a proper server for backup shares, over a segregated network. It appears that the only way I can avoid the HeapValidate errors with 7.7 is to drop to 1 execution unit.
  11. The Thecus N5200Pro is the one that has given me concurrency problems. Specifically, it wasn't handling file locking properly. When I contacted their support about it, they said, "We're a consumer device, so this isn't a priority for us." File locking isn't an issue for this, since I'm using separate directories, but there's no reason to gamble. I'll back up to a real server, and use the Thecus as secondary backup storage. That's nice to have when something goes wrong and I need to rebuild my backups from scratch.
  12. I am also using (UNC path) networked backup media. It is conceivable that this is introducing a race condition, since it's a consumer-level NAS device. I have run into synchronization problems with them before. I think I'll try moving the backup storage to a real server, over an isolated network.
  13. Update: After running for a couple of weeks, dropping to one execution unit (to prevent simultaneous executions) is still working reliably, with no HeapValidate errors.
  14. rcohen

    Grooming misery

    I have found grooming to be very intolerant of read/write failures, more so than backups. In the past, I have had problems due to simultaneous FTPs or copies of backup results to another drive, and iSCSI on a non-segregated network. Defragging and anti-virus could also potentially interfere with read/write operations. Also, before the 64-bit version, running out of memory could corrupt a groom file. Once I segregated iSCSI and scheduled the copy & FTP jobs so they don't run during grooming, grooms have been very reliable. I run them once a week. When a groom does fail, you need to rebuild the catalog file. Also, starting with 7.7, I have had occasional errors when running simultaneous jobs, so I had to drop to one execution unit. If things are still acting up, try recycling the backup set and starting from scratch. Of course, you lose your backed-up data that way, unless you have another copy.
  15. After some more testing, it looks like the iSCSI was a red herring. That server is in our DMZ, and it looks like our firewall was interfering with communication between it and our Retrospect server. For some reason, the problem happens with 7.7 and not 7.6, but since it works fine when it doesn't go through our firewall, I'm going to assign the firewall the blame. This is separate from the HeapValidate error. That happens for me with simultaneous executions, regardless of the firewall. I'm blaming that one on 7.7.
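
Several of the posts above compare how often the HeapValidate, MapError, and necoIncoming messages appear under different setups (one execution unit versus four, split versus unsplit volumes). One rough way to make that comparison is to tally the messages in a saved copy of the operations log. The following is only a sketch: `tally_errors`, the `SIGNATURES` names, and the sample text are made up for illustration, and it assumes the log has been exported to plain text with one message per line. It does not use any Retrospect API.

```python
import re
from collections import Counter

# Error messages quoted in the posts above. The dictionary keys are
# hypothetical labels chosen for this sketch, not Retrospect terminology.
SIGNATURES = {
    "heap_validate": re.compile(r"TMemory::freeSpace: HeapValidate failed"),
    "map_error": re.compile(r"MapError: unknown Windows error -[\d,]+"),
    "empty_stream": re.compile(r"necoIncoming: empty stream packet, tid \d+"),
}


def tally_errors(lines):
    """Count how many lines match each known error signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Sample assembled from the messages quoted in the posts; a real run would
# read lines from an exported log file instead (assumed plain text).
SAMPLE_LOG = """\
TMemory::freeSpace: HeapValidate failed
TMemory::freeSpace: HeapValidate failed
MapError: unknown Windows error -1,017
necoIncoming: empty stream packet, tid 20
"""

if __name__ == "__main__":
    for name, count in sorted(tally_errors(SAMPLE_LOG.splitlines()).items()):
        print(f"{name}: {count}")
```

Running the tally once per configuration and comparing the `heap_validate` counts would give a number to go with the "less frequently with a single execution unit" observation.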