"Delayed Write Failed" -- RP6 Catalog corruption


awnews

Today I got hit with a Windows popup: "Application popup: Windows - Delayed Write Failed : Windows was unable to save all the data for the file \\Awpc8\r$\Backups\AWPC15\AWPC15_C_to_AWPC8_R.rbf.rfc. The data has been lost." This resulted in the Retrospect File backup being corrupted (specifically the catalog file portion). I had seen this previously when "Write back" caching was enabled on some drives, but I hadn't seen it in the last several months after disabling write caching.

 

Environment: Backup from the IDE C: drive on a PC running XP Pro SP1 and Retrospect Pro 6.0 RDU 3.7.105 to a File backup on a secondary IDE hard drive on another PC running W2K Pro SP3. The PCs are connected by Cat5 100Mbps network cables through a Netgear 10/100M switch. The backup is a File backup to the second drive, run as Normal or Recycle depending on the schedule. Both PCs are up-to-date with all Service Packs, critical updates, etc.

 

I don't believe that *anything* is wrong with the cabling, connectors, switch, NICs, etc. The systems work flawlessly with very large amounts of data moved across the network on a regular basis. The *only* time I have ever seen this kind of error is when running Retrospect (sustained high load?), and each time it has resulted in corruption of the Catalog of a File backup.

 

I've attached some related info at the bottom suggesting that the issue may be with the MS OS (W2K? XP?) file system under heavy stress conditions. The article implies that the issue was fixed with W2K SP3, but perhaps it wasn't fully corrected. I do think Dantz should monitor this and consider how they could work around it with corrective action and recovery (e.g. retries, auto-correction of corrupted catalogs, etc.).
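
As a rough illustration of the "retries" idea, here is a minimal Python sketch of a write that is read back and retried on failure. It is only a sketch under stated assumptions: `write_verified`, its parameters, and the demo file name are hypothetical, and nothing here reflects how Retrospect actually writes its catalog. A read-back may also be served from a cache rather than from the remote disk, so this narrows the window for silent loss rather than closing it.

```python
import os
import tempfile
import time


def write_verified(path, data, attempts=3, delay=1.0):
    """Write data to path, read it back, and retry on error or mismatch.

    Returns the number of attempts used. Hypothetical helper for illustration.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            with open(path, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())  # ask the OS to push buffered data out
            with open(path, "rb") as f:
                if f.read() == data:
                    return attempt
            last_error = IOError("read-back mismatch on %s" % path)
        except OSError as exc:
            last_error = exc
        if attempt < attempts:
            time.sleep(delay)  # brief pause before the next attempt
    raise last_error


# Demo against a local temporary file (hypothetical name, not a real catalog).
demo_path = os.path.join(tempfile.mkdtemp(), "demo_catalog.rfc")
attempts_used = write_verified(demo_path, b"catalog bytes", delay=0)
print(attempts_used)
```

The point of the design is that a failed or short write surfaces as an error at save time, when it can still be retried, instead of being discovered later as a corrupted catalog.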

 

I also had a question. If a Catalog is corrupted in this way, I can manually recover the catalog (sidebar: the Tools->Repair catalog wording is awkward. "Update Existing Catalog" seems like what I should do, but "Repair File Backup Set" is what I actually have to do). But if I do nothing and don't notice the error, will subsequent backups succeed (assuming the error doesn't recur) with fresh catalog(s) that may then be used, i.e. do I only lose the single catalog & session that had the error? And are previous sessions & catalogs in that File backup set usable? Or once the Catalog is corrupted, is the entire backup set unusable until the Catalog is rebuilt? If under this condition previous sessions aren't usable, or if Retrospect can't/doesn't auto-recover the File backup/Catalog for subsequent backups, this should be addressed (to make the program less "fragile").

 

 

#######################################

MS XP Event Viewer log messages on XP PC (backup-from C-drive).

 

Event Type: Warning

Event Source: MRxSmb

Event Category: None

Event ID: 50

Date: 3/7/2003

Time: 5:30:19 AM

User: N/A

Computer: AWPC15

Description:

{Delayed Write Failed} Windows was unable to save all the data for the file \Device\LanmanRedirector. The data has been lost. This error may be caused by a failure of your computer hardware or network connection. Please try to save this file elsewhere.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Data:

0000: 04 00 04 00 02 00 56 00 ......V.

0008: 00 00 00 00 32 00 04 80 ....2..?

0010: 00 00 00 00 0c 02 00 c0 .......À

0018: 00 00 00 00 00 00 00 00 ........

0020: 00 00 00 00 00 00 00 00 ........

0028: 0c 02 00 c0 ...À

 

--------------------------------------------------------------------

 

Event Type: Information

Event Source: Application Popup

Event Category: None

Event ID: 26

Date: 3/7/2003

Time: 5:30:19 AM

User: N/A

Computer: AWPC15

Description:

Application popup: Windows - Delayed Write Failed : Windows was unable to save all the data for the file \\Awpc8\r$\Backups\AWPC15\AWPC15_C_to_AWPC8_R.rbf.rfc. The data has been lost. This error may be caused by a failure of your computer hardware or network connection. Please try to save this file elsewhere.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

 

 

============================================

MS W2K Event Viewer log messages on W2K PC (backup-to R-drive).

 

Event Type: Warning

Event Source: Disk

Event Category: None

Event ID: 32

Date: 3/7/2003

Time: 8:13:01 AM

User: N/A

Computer: AWPC8

Description:

The driver detected that the device \Device\Harddisk1\DR1 has its write cache enabled. Data corruption may occur.

Data:

0000: 0f 00 04 00 01 00 62 00 ......b.

0008: 00 00 00 00 20 00 04 80 .... ..?

0010: 01 00 00 00 10 00 00 c0 .......À

0018: 00 00 00 00 00 00 00 00 ........

0020: 00 00 00 00 00 00 00 00 ........

0028: 00 00 00 00 ....

 

Note that this message wasn't logged during the time of the 5:20 AM backup (with the write error & catalog failure) but rather when I canceled the popup on the XP PC, just before I rebuilt the catalog. I *think* that "Harddisk1" is the C: drive on the backup PC (not the R drive where the data is being saved). And a check of the separate physical C & R hard drives on the backup PC shows that write caching is *not* enabled. But I also noticed periodic messages in the Event Viewer log suggesting that the drive(s)' write caching is enabled after each reboot (by default?) and *then* write caching is disabled by the driver.

 

 

###########################################################

Retrospect Pro 6.0 RDU 3.7.105 log

(backup to secondary drive on same XP PC OK, then Write Cache failure during backup to network drive)

 

+ Normal backup using AWPC15_FullBackup_AllFiles at 3/7/2003 4:30 AM

To Backup Set AWPC15_FullBackup...

 

- 3/7/2003 4:30:18 AM: Copying DRIVE_C (C:)

File "C:\Documents and Settings\All Users\Application Data\Microsoft\Network\Downloader\qmgr0.dat": can't read, error -1020 (sharing violation)

File "C:\Documents and Settings\All Users\Application Data\Microsoft\Network\Downloader\qmgr1.dat": can't read, error -1020 (sharing violation)

File "C:\Documents and Settings\andrew\Local Settings\Temp\Perflib_Perfdata_648.dat": can't read security information, error -1020 (sharing violation)

File "C:\WINDOWS\Temp\ZLT05d27.TMP": can't read security information, error -1020 (sharing violation)

3/7/2003 5:10:23 AM: Snapshot stored, 107.9 MB

3/7/2003 5:10:36 AM: Comparing DRIVE_C (C:)

File "C:\Program Files\Utilities\WallMaster\WallMaster Wallpaper (andrew).bmp": different modify date/time (set: 3/7/2003 3:47:54 AM, vol: 3/7/2003 4:47:55 AM)

File "C:\WINDOWS\Prefetch\RTHLPSVC.EXE-20806002.pf": different modify date/time (set: 3/6/2003 4:34:40 AM, vol: 3/7/2003 4:34:54 AM)

File "C:\WINDOWS\Prefetch\VSSVC.EXE-1F033002.pf": different modify date/time (set: 3/7/2003 4:33:23 AM, vol: 3/7/2003 5:10:27 AM)

3/7/2003 5:11:28 AM: 7 execution errors

Completed: 223 files, 126.8 MB, with 60% compression

Performance: 128.9 MB/minute (113.5 copy, 149.1 compare)

Duration: 00:41:09 (00:39:10 idle/loading/preparing)

 

 

+ Normal backup using AWPC15_C_to_AWPC8_R at 3/7/2003 5:20 AM

To Backup Set AWPC15_C_to_AWPC8_R...

 

- 3/7/2003 5:20:10 AM: Copying DRIVE_C (C:)

File "C:\Documents and Settings\All Users\Application Data\Microsoft\Network\Downloader\qmgr0.dat": can't read, error -1020 (sharing violation)

File "C:\Documents and Settings\All Users\Application Data\Microsoft\Network\Downloader\qmgr1.dat": can't read, error -1020 (sharing violation)

MapError: unknown Windows error 999

TFile::Write: WriteFile failed, \\Awpc8\r\Backups\AWPC15\AWPC15_C_to_AWPC8_R.rbf.rfc, winerr 999, error -1001

MapError: unknown Windows error 999

TFile::Read: ReadFile failed, \\Awpc8\r\Backups\AWPC15\AWPC15_C_to_AWPC8_R.rbf.rfc, winerr 999, error -1001

MapError: unknown Windows error 999

TFile::Write: WriteFile failed, \\Awpc8\r\Backups\AWPC15\AWPC15_C_to_AWPC8_R.rbf.rfc, winerr 999, error -1001

Can't save Catalog File, error -1001 (unknown Windows OS error)

3/7/2003 5:30:23 AM: Execution incomplete

Remaining: 2 files, 10 KB

Completed: 9767 files, 857.6 MB, with 49% compression

Performance: 118.8 MB/minute

Duration: 00:10:13 (00:02:59 idle/loading/preparing)

 

 

========== manually "Repair File Backup Set" to fix catalog

+ Executing Recatalog at 3/7/2003 8:17 AM

To Backup Set AWPC15_C_to_AWPC8_R...

3/7/2003 8:42:15 AM: Execution completed successfully

Completed: 91159 files, 10.0 GB

Performance: 410.2 MB/minute

Duration: 00:24:52

 

 

//////////////////////////////////////////////////////////////////////////////////////////////////

No help from Microsoft with Event ID 50

 

Event Details

Event ID: 50

Source: MRxSmb

 

We're sorry

There is no additional information about this issue in the Error and Event Log Messages or Knowledge Base databases at this time. You can use the links in the Support area to determine whether any additional information might be available elsewhere.

 

------------------------------------------------------------------------------------------------------------

Possibly related info from the MS Knowledgebase

 

"Lost Delayed-Write Data" Error Message Under Extreme File System Stress

The information in this article applies to:

Microsoft Windows 2000 Server SP1

Microsoft Windows 2000 Server SP2

Microsoft Windows 2000 Advanced Server SP1

Microsoft Windows 2000 Advanced Server SP2

Microsoft Windows 2000 Professional SP1

Microsoft Windows 2000 Professional SP2

Microsoft Windows 2000 Datacenter Server SP2

 

This article was previously published under Q293842

SYMPTOMS

Under conditions in which the file system is heavily stressed, the following error message may be displayed:

 

Lost Delayed-Write Data

 

The system was attempting to transfer file data from buffers to Filename.

The write operation failed, and only some of the data may have been written to the file.

You may also receive the following entry in the program log:

Event Type: Warning

Event Source: MRxSmb

Event Category: None

Event ID: 50

Description: {Lost Delayed-Write Data} The system was attempting to transfer file data from buffers to \Device\LanmanRedirector. The write operation failed, and only some of the data may have been written to the file.

 

The event contains the same text as the error message, but also contains the error status in the data section. Double-click the event, and then click Words for the data type; the last word contains the status. If the status code is c000020c, this hotfix may resolve this issue.

CAUSE

Under heavy disk input/output (I/O) stress condition while an SMB network connection is being used through the loopback interface, a server (SRV) thread and redirector (RDR) thread may deadlock.

 

This behavior has been observed on a Windows 2000 Datacenter Server-based cluster running Microsoft SQL Server 2000 Enterprise Edition under extreme disk-stress conditions with a concurrent run of a complex query script and a database backup.

RESOLUTION

To resolve this problem, obtain the latest service pack for Windows 2000. For additional information, click the following article number to view the article in the Microsoft Knowledge Base:

260910 How to Obtain the Latest Windows 2000 Service Pack

 

The English version of this fix should have the following file attributes or later:

Date         Time   Version        Size       File name
----------------------------------------------------------
04-May-2001  11:48  5.0.2195.3573  1,685,440  Ntkrnlmp.exe
04-May-2001  11:49  5.0.2195.3573  1,685,248  Ntkrnlpa.exe
04-May-2001  11:49  5.0.2195.3573  1,705,856  Ntkrpamp.exe
04-May-2001  11:48  5.0.2195.3573  1,663,360  Ntoskrnl.exe
02-Apr-2001  11:46  5.0.2195.3444    237,072  Srv.sys
04-May-2001  14:25  5.0.2195.3407     73,488  Srvsvc.dll

 

 

STATUS

Microsoft has confirmed that this is a problem in the Microsoft products that are listed at the beginning of this article. This problem was first corrected in Windows 2000 Service Pack 3.
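
The status check the article describes (view the event data as Words and read the last word) can be applied to the MRxSmb Event ID 50 data from the XP log earlier in this post. A minimal Python sketch, assuming the "Words" view is the same bytes read as little-endian 32-bit values, and with the ASCII column of the dump stripped off by hand (`parse_words` is a hypothetical helper name):

```python
import struct

# Data section of the MRxSmb Event ID 50 entry from the XP log in this post,
# hex bytes only (the trailing ASCII column has been removed).
EVENT_DATA = """
0000: 04 00 04 00 02 00 56 00
0008: 00 00 00 00 32 00 04 80
0010: 00 00 00 00 0c 02 00 c0
0018: 00 00 00 00 00 00 00 00
0020: 00 00 00 00 00 00 00 00
0028: 0c 02 00 c0
"""


def parse_words(dump):
    """Turn an 'offset: hex bytes' dump into a list of 32-bit words."""
    raw = bytearray()
    for line in dump.strip().splitlines():
        _, _, hex_part = line.partition(":")
        raw.extend(int(tok, 16) for tok in hex_part.split())
    count = len(raw) // 4
    # "<" = little-endian, "I" = unsigned 32-bit integer
    return list(struct.unpack("<%dI" % count, bytes(raw[:count * 4])))


words = parse_words(EVENT_DATA)
last_status = "%08x" % words[-1]
print(last_status)
```

For the dump in this post the last word comes out as c000020c, which is the status code the article says the hotfix is meant to address.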

 
