Jump to content

Backup stuck in 'Closing'


pscs

Recommended Posts

  • Replies 93
  • Created
  • Last Reply

Not to seem stupid, but honestly, I couldn't tell you what the "retroeventhandler" is. There These are automated jobs, created using the wizards. I find no option for pre or post job activities, and I have created no scripts manually. If that doesn't answer the question, tell me what to look for, and I'll check. Thanks.

 

Link to comment
Share on other sites

Just my 2c worth; but I, too, have had this "closing" problem ever since I upgraded to 6.5 last year in June. I'm now on the latest (336) release - but it has made no difference. I also upgraded all my clients to the latest version.

 

This problem has persisted through a complete replacement of my backup machine - complete with a fresh install of Retrospect Multi Server 6.5, and a change of OS from Windows 2000 Server to Windows XP Pro SP1.

 

I back up to disk sets - and I have no Retrorun handler. I do back up a number of machines - with multiple subvolumes on some of them.

 

The most recent occurrence was my Friday night backup - which got as far as session 4 of 7; and remained stuck there until now. As you can see from the following log it has remained stuck since successfully completing a session, midnight on Friday:

 

- 2/20/2004 11:35:08 PM: Copying SS on PTM1

2/21/2004 12:40:09 AM: Snapshot stored, 37.8 MB

2/21/2004 12:40:29 AM: Comparing SS on PTM1

2/21/2004 12:57:18 AM: Execution completed successfully

Completed: 6867 files, 4.1 GB, with 14% compression

Performance: 191.1 MB/minute (154.0 copy, 252.0 compare)

Duration: 01:22:10 (00:38:21 idle/loading/preparing)

 

 

I have spoken with Retrospect support in the past - but their only suggestion was to recreate the catalogs - which did not solve the problem.

 

What's clear to me at this point is:

- It's not related to re-cataloging backup sets

- It's not releated to external scripts

- It’s not related to the System Restore feature on XP (it’s disabled on my machine).

- It occurs randomly during a backup.

- It's a major issue somewhere in Retrospect 6.5. There is no way to recover from it and complete a backup where it has occurred.

- When it happens Retrospect cannot be closed; and no changes made during that session are saved.

 

This problem renders Retrospect pretty much useless for me; and I am now researching replacement products. Sadly a backup system is only useful if, in fact, performs backups. I used to recommend Retrospect – but cannot do so in good conscience any longer.

 

If anyone has a practical suggestion on how to solve it I'm all ears - but I'm not hopeful. it seems v6.0 was the last good version; and one option, I guess, is to revert to that version. If anyone can recommend an alternative product that actually works I’d appreciate that too.

 

-Neville

 

 

 

Link to comment
Share on other sites

Hi

 

I've been trying to reproduce this error but haven't had any luck so far. As a test can you uninstall the 6.5 client on a couple of machines and install the 6.0 client instead? This will disable the open file backup function but it could help us isolate the problem to the 6.5 client software.

 

Are you running proactive backup or standard scripts? Are you using the proactive backup agent? Is the "quit outlook" option turned on in your scripts? Do you have Retrospect configured to run as the logged in user or a specified user?

 

What does the client control panel say is happening when the backup gets stuck in closing? Does turning it to off make any difference? (It may take a few minutes for the BU server to notice the change.)

 

Sorry for the random questions - I'm just trying to see if there is anything common about the reports in this thread.

 

Thanks

Nate

 

 

Link to comment
Share on other sites

Not to muddle the waters any further on this, but we had this problem big time a little while ago. It was after the upgrade to 6.5.

But what we figured out was that around the same time we started backing up a large number of OSX machines. It happened randomly (once or twice a week) and was a real pain.

Then we added some linux clients for the first time and it happened every night.

What we finally did was split the linux clients and OSX clients into their own scripts/backup sets and the problem went away.

The only thing we could figure was that the number of files on each server or workstation running a UNIX like OS was an order of magnitude over our OS9/Windows servers/workstations and that Retro couldn't handle the number of files. Our catalog files weren't too large, but rebuilding them never was able to get us through a full backup. It would work fine during incrementals, but always fail on recycles.

 

just my two cents,

 

Alex

Link to comment
Share on other sites

Hi

 

You are right, I think everyone is using disk sets although one user implies that he had been using a "file" backup set. Can you try a file set and see if it makes a difference?

 

Disk sets store all the data in small chunks so you would think that it would perform better than file.

 

Hey - one thought. Do you have the compress catalog setting enabled for your backup set? If so try turning it off and see if that helps.

 

Nate

Link to comment
Share on other sites

I was using a File backup set, and got the problem. Then, I changed to Disk backup, and still got the problem.

 

I also do tape backups, but I don't recall ever having the problem with those.

 

Only one tape backup can occur at once, but multiple disk/file backups can occur at the same time. Maybe it's to do with concurrency - isn't that a new feature in 6.5

 

Oh, how I wish Retrospect would take it seriously and start getting some real diagnosis information together..

 

 

Link to comment
Share on other sites

Hi Nate,

 

It's interesting that you are unable to reproduce this error; and yet so many different users are seeing it.

 

Let me try and answer your questions:

- I'm running standard scripts - not proactive.

- Quit Outlook is not used - my backup clients are not running Outlook.

- Retrospect is configured to run as a "specific user" - but it's the same user that's logged in.

- Doing things on the client makes no difference once the backup is stuck in "Closing". The control panel says it's busy

 

I will try the 6.0 client a little and see whether it makes any difference...

 

What I'm surprised by is the lack of response from Dantz in terms of any form of diagnostic or log that can tell us what's wrong. It seems there are a great many possible things that can make this software unstable (judging from the range of suggestions and experiences here). Perhaps it would make sense for Dantz to put into the software some kind of flag so that detailed logging could be turned on; and then we would not have to guess what's happening when it goes into the "Closing" state?

 

I'll hazard a guess that it's some sort of race condition - where one part of the software is waiting indefinitely for another part to do something; but timing idiosyncrasies that the developer did not think about are causing the events to occur out of order. It's bad practice to have an infinite wait in a real-time program; but that's what it looks like to me. It ought to time out and do something more elegant than force a task manager kill.

 

Backup software needs to reliably produce restorable backups.

 

I'll re-iterate that this makes the software pretty much useless to me - and I'm not alone. Dantz can choose to take it seriously or we - the customers - will take our business somewhere else.

 

-Neville

 

 

 

Link to comment
Share on other sites

Hi,

 

I understand your frustration. There is actually a _ton_ of logging we can turn on.

 

Retrospect does a number of things at the end of a backup:

Roll the temporary log file contents into the main log file

Combine temporary session and snapshot information into the catalog file

Update the configuration files.

export the backup report

Call external scripts

Disconnect from the client

 

That means there are a number of logs we need to look at. To turn on additional logging type CTRL + ALT + P + P at any screen in Retrospect. There are a lot of possible causes so we will need to try a number of different logging types. To get the logs we need please do the following:

 

In the standard preferences set your log file size to 100MB. Get into the hidden prefs and turn trees and volumes logging to 6. Reproduce the error. Then turn trees and volumes back to 3. Repeat this process with "network"and "Backup sets and catalogs". The idea is to keep each logging type seperate - otherwise the logs get tough to read. The logfile is stored in c:\documents and settings\all users\application data\retrospect\operations_log.utx. You may want to rename your current log so we start fresh.

 

We will need a system profile report for the machines involved when the error occurs as well as Retrospect and Retrospect client version numbers. If the logs are small enough to email then send them to me at pacrimsupport@dantz.com. If they are large let me know and I will get you login information for an FTP server.

 

I will collect the information you have posted into a report. Any other details you can provide would be helpful.

 

Thanks

Nate @ Dantz

 

 

 

 

Link to comment
Share on other sites

Hi Nate,

 

Firstly, thanks for your continuing assistance with this issue!

 

I saved off the old log file, set the log size to 100MB, and configured "Backup sets and Catalogs" to log at level 6 (leaving all other settings at 3).

 

I then ran the backup - but it completed without incident. Of course the problem is intermittent - so I guess I should not be too surprised.

 

I'll keep it in the current state until it fails again - and then get the log to you. I know I need to repeat with the "trees and volumes" and "network" logs too - but I'll get them to you one by one as they happen.

 

Regards,

-Neville

 

Link to comment
Share on other sites

Hi Nate,

 

I worked with tech support for a while and never got anywhere. I sent many assert_log.utx and operations_log.utx files with a higher level of logging.

 

You might look at case # 1-2157819, maybe you can find some usefull information.

 

I still have a copy of many of the logs if you need me to resend them.

 

Thank You

-Pete

Link to comment
Share on other sites

Hi

 

Thanks, I'll take a look.

 

Dantz has just released an update to 6.5 that you may want to try. It addresses an issue where invalid information is sent from a client machine. I'm not sure if that is the cause of this specific issue but it is worth a shot anyway.

 

Nate

Link to comment
Share on other sites

I have one client that the backup gets stuck on a lot since upgrading from 5.6 to 6.5. If I stop the retrospect client service on the client, the server finally sees that it isn't getting anywhere and reports an error about backing up the COM database. I've found that when the "Off" button in the Retrospect Client app doesn't do any good, stopping the service does.

 

 

 

Stopping the client service also lets the server close out the backup, and roll the logs into the main logs. The prior posts regarding losing logs are from killing Retrospect on the Server, when it's the client that is hanging. (Yes, I know the Server SHOULD time out on a client after a while, no idea why it doesn't, and I've mentioned this a lot in the past.)

 

 

 

Anyway, a few days ago, I separated this client into a separate script that is scheduled to start a little after the main script that backs up all the clients. I did this so I could try turning off the option to backup the System State on just this client, and so far it hasn't hung up since. This client is Windows 2000 Pro with latest service pack and Windows Updates, and is NOT configured to standby, sleep, etc. I have no idea what is causing the hang, but I'm more concerned with backing up all the files on this workstation than getting a good registry backup so this is an adequate workaround for me. You can use this idea and see if it helps.

 

 

Link to comment
Share on other sites

Hi!

 

I read up on this thread last week after having similar problems. Our setup is a Win2K Pro running Retrospect Server 6.5.336 (since upgraded to .343) is backing up a mix of OS 9 & X clients.

 

We started backing up to a file on the server in January. Everything was fine until the last week of February when we started getting the "stuck in 'closing…'" problem. This would happen on random clients on any of our three scripts that run every night (each script is backing up to a file),

 

After upgrading to .343 on Friday (3/5), all of our backup jobs have executed without any problems!

 

nuvs

Link to comment
Share on other sites

Hi Nate,

 

I have not yet upgraded my Retrospect to .343 - it's still on .336.

 

Last night I again had the backup stuck in Closing. This time it was with a freshly recycled disk backup set; and as it happens the backup got stuck on the first client it connected to. I tried the "stop the service" trick on the client, but nothing happened even after a few hours.

 

The backup log in c:\documents and settings\all users\application data\retrospect\operations_log.utx contained nothing, so I searched my drive; and found a file named C:\Documents and Settings\All Users\Application Data\Retrospect\RtrExec.dir\RtrC0C.tmp, which had a timestamp shortly after the timestamp on the catalog (.rbc) file. It looked like the file most recently modified by Retrospect. Inside is what looks like the part of the log file that would get written if the backup ever finished.

 

It's very short. The first session is a backup of the local C drive; and the second is the first client. Here's the second session:

 

Code:



- 3/10/2004 11:15:21 PM: Copying $[*!20612,,14,+3]BOOT (C:)$[3] on PTMADS$[4]

3/10/2004 11:15:21 PM: Connected to $[*!20780,,14,+3]PTMADS

arcDoMatch: hash loading: 0.000 of 0.0 secs (0.00%) for 0 ops (0.0 uSec/op)

arcDoMatch: hash saving: 0.735 of 1.3 secs (55.68%) for 2 ops (367,500.0 uSec/op)

arcDoMatch: session loading: 0.468 of 1.3 secs (34.92%) for 1 ops (468,000.0 uSec/op)

arcDoMatch: matching: 1.297 of 1.3 secs (96.07%) for 1 ops (1,297,000.0 uSec/op)

$[*20750] File "$[1]C:\Documents and Settings\All Users\Application Data\Microsoft\Network\Downloader\qmgr0.dat$[2]": can't read, error -1020 (sharing violation)

$[*20750] File "$[1]C:\Documents and Settings\All Users\Application Data\Microsoft\Network\Downloader\qmgr1.dat$[2]": can't read, error -1020 (sharing violation)

$[*20750] File "$[1]C:\TerminalServices\LServer\LServer\edb.log$[2]": can't read, error -1020 (sharing violation)

$[*20750] File "$[1]C:\TerminalServices\LServer\LServer\TLSLic.edb$[2]": can't read, error -1020 (sharing violation)

$[*20750] File "$[1]C:\TerminalServices\LServer\LServer\tmp.edb$[2]": can't read, error -1020 (sharing violation)

$[*20750] File "$[1]C:\WINNT\system32\Perflib_Perfdata_3e0.dat$[2]": can't read, error -1020 (sharing violation)

3/10/2004 11:55:03 PM: Snapshot stored, 81.8 MB

3/10/2004 11:55:09 PM: Comparing $[*!20612,,14,+3]BOOT (C:)$[3] on PTMADS$[4]

$[*20750] File "$[1]C:\WINNT\system32\inetsrv\urlscan\urlscan.031004.log$[2]": different modify date/time (set: 3/10/2004 11:41:16 PM, vol: 3/10/2004 11:50:12 PM)

3/11/2004 12:14:03 AM: 7 execution errors $[127234341211770000]

Completed: 11922 files, 4.8 GB, with 24% compression

Performance: 193.7 MB/minute (155.2 copy, 257.5 compare)

Duration: 00:58:42 (00:08:57 idle/loading/preparing)


 

Note that the setting "Backup sets and Catalogs" is set to log at level 6 (leaving all other settings at 3).

 

I don't know whether the above log snippet will help at all - but I will now try changing the logging to "trees and volumes" and see what happens.

 

In the meantime if the above log tells you anything please let me know.

 

Thanks,

-Neville

Link to comment
Share on other sites

Nevillev,

 

Thanks for posting the details. This log entry looks perfect i.e. I don't see anything wrong. I am setting up a backup now so I can compare your log entries with mine.

 

There is one other secret preference that we should enable - under the program menu item there is an option to update the log after each write. That should save you from having to hunt down the temp file anyway.

 

I'll let you know what I find.

 

Nate

Link to comment
Share on other sites

Hi Nate,

 

Well - here's the next installment.

 

I upgraded my Retrospect to the latest version (6.5.343). I also set the logging level to 6 for "Trees and volumes"; and back to 3 for the rest. Then I re-ran the backup that had failed on 3/10.

 

Once again I recycled the backup set before starting.

 

I had not seen your email about the log setting in the program menu before starting the backup - but nevertheless it turns out that the actual log contains all the entries that are in the tmp file this time.

 

It failed on a different client machine this time - in the 4th backup of the session.

The log is much longer now; so I will email it to you at the address you gave above. In the meantime, though, here are the very last entries. Again I can see nothing wrong; but hopefully it'll mean more to you:

 

Code:



FEncEnd: not started

FEncEnd: not started

FEncEnd: not started

FEncEnd: not started

vlocRDiskFileOpen: device number = 0x2

File "D:\Common\SS\data\s\safcaaaa": different modify date/time (set: 3/11/2004 8:33:30 PM, vol: 3/11/2004 10:29:14 PM)

File "D:\Common\SS\data\s\safcaaaa.a": can't read, error -1101 (file/directory not found)

File "D:\Common\SS\data\s\sfvbaaaa": different modify date/time (set: 3/11/2004 10:20:25 PM, vol: 3/11/2004 10:29:10 PM)

File "D:\Common\SS\data\s\sfvbaaaa.b": can't read, error -1101 (file/directory not found)

FEncEnd: not started

FEncEnd: not started

FEncEnd: not started

FEncEnd: not started

vlocRDiskFileOpen: device number = 0x2

FEncEnd: not started

FEncEnd: not started

FEncEnd: not started

FEncEnd: not started

vlocRDiskFileOpen: device number = 0x2

FEncStart: done

FEncEnd: not started

FEncEnd: not started

FEncEnd: not started

FEncEnd: not started

vlocRDiskFileOpen: device number = 0x2

FEncEnd: begin

FEncEnd: starting to wait

FEncEnd: still waiting

FEncEnd: done

3/11/2004 11:43:08 PM: 80 execution errors

Completed: 78521 files, 7.7 GB, with 14% compression

Performance: 111.4 MB/minute (92.5 copy, 140.0 compare)

Duration: 02:24:15 (00:02:50 idle/loading/preparing)

 


 

I will do it one more time, now - with the log setting on network.

 

Thanks again,

-Neville

Link to comment
Share on other sites

Hi Nate,

 

I confess I'm very very frustrated at this time.

 

I ran the backup again - after changing the logging to be "6" for network and resetting everything else.

 

It got stuck in closing again - on about the 3rd session I think. I had set it to "flush" the log - as suggested - and so I could see the entries in the Retrospect log view. There were thousands of rows at the end with some repeating sequence of network messages - all with zero bytes. So I terminated retrospect with TaskManager again. It generated an assertion - and created an assert_log file - which I will email to you.

 

it also erased the operations log. All the stuff about the network is not there anymore. I can barely believe it. I also looked in the RtExec.dir folder. It's empty now. I should have zipped up the file before killing Retrospect.

 

Anyhow I'll restart the backup now and hope that it creates some useful info this time. I can't believe how bad this problem is and how hard it is to get to the bottom of it.

 

Regards,

-Neville

Link to comment
Share on other sites

Before you terminate Retrospect on the SERVER, try stopping the client's Retrospect Client service (I do this with Computer Management, connecting to the remote computer). This always lets my server figure out that the client is hung, and it ends the backup and rolls the temp logs into the main logs.

 

I haven't had a hang on the problem client since upgrading to 6.5.343.

Link to comment
Share on other sites

Just to keep this list up to date:

 

I have not so far found that terminating the Retrospect Client service has any impact on the backup server when it's stuck - although I confess the longest I've waited is around 4 hours.

 

Also I have continued to have the problem with 6.5.343.

 

Nate is currently working with me on the various log files (now including a Network log) and hopefully we'll get to a resolution for this issue.

 

-Neville

Link to comment
Share on other sites

Archived

This topic is now archived and is closed to further replies.


×
×
  • Create New...