
Frequently crashes again when building snapshots


waldorfm


Hello,

 

I posted about this around 18-Aug, but that post seems to be gone, so I'm making a new one. The backup server is again crashing almost every night with a "terminated due to power failure" message. The action that leads to the problem is not recorded in the operation log. I just happened to see the application frozen during the "building snapshot" operation; it did not move for 24 hours. I had the log open and it showed about 400,000 errors, mostly wrong time/date or size. I also noticed that the affected files have non-English (Arabic, Cyrillic) filenames. The backup was processing our fileserver. I don't think OpenFile backup had kicked in, because it could not find enough idle time.

 

In the previous post the suggestion was to put in larger disks due to memory needs, etc. I have replaced the whole computer and installed the system from scratch. The system is Compaq DL360 running Win2K, latest patches, 512 MB RAM, and plenty of space. The problem is happening again. Any ideas how to fix it?

 

After using Retrospect for 11 years, I'm still trying to defend its use, but I'm running out of support.

 

Best regards,

Markus Waldorf


Hello,

 

Thanks for your reply! It's Multiserver Version 7.0.326, Retrospect Update, version 7.0.6.108.

The backup device is IBM Ultrium 3584 LTO2, with 5 drives, partitioned. With Open-File backup and Disaster Recovery add-on.

 

I'm running the same configuration on another machine, which has been very reliable so far, but it is dealing with fewer files and not backing up any user workstations.

 

The one computer that is constantly having trouble actually used to work reliably for a long time, until a couple of months ago. The only thing that changed is that the fileserver has a lot more files now. The backup server backs up a couple of workstations and also the corporate fileserver. There could be some defragmentation programs and other things on the fileserver, but whatever it is, it's causing the backup server to crash every couple of nights now. The problem seems to occur when building the snapshot. Could it be some sort of memory leak that leads to other resource problems? As I said, I have replaced the whole machine with bigger drives, different CPU, different memory, etc. I also tried starting a new backup set, which did not help.

 

I'm out of ideas. We will soon use "snap-in" software on our EMC, which means I will be able to do local instead of network backups of the fileserver, but I'm afraid it won't solve the problem. Could the problem be with non-English or strange filenames, or the total number of files? I think there are currently about 4 million files in the backup set.

 

I have set up a test backup server that deals with just a handful of clients, but they all have Arabic and Cyrillic filenames. I have the same crashing issues there. It's just my guess, but it seems that the snapshot build has problems with certain extra-long or non-US-ASCII filenames.

 

Best regards,

RFE/RL

Markus Waldorf


Hi

 

How many files are on the file server (number not size)?

 

You said you can reproduce this issue on a test server? Have you tried the following to narrow down the problem file names?

1) Define the folders on the root of a volume as subvolumes in Retrospect.

2) Back up each subvolume separately.

3) When the backup fails on a subvolume, define the folders contained within that subvolume as subvolumes and back up each one separately.

4) Repeat until you find the problem files.

 

File names should not be a problem, but it's possible there is a strange character somewhere that throws Retrospect off.
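
As a possible shortcut before splitting a volume into subvolumes, the folder tree can be scanned for names that are unusually long or contain non-ASCII characters. The following is only a rough Python sketch and has nothing to do with Retrospect itself: the function name `find_suspect_names` and the 200-character threshold are made up for illustration, and a name being flagged does not mean it is the one causing trouble.

```python
import os

def find_suspect_names(root, max_len=200):
    """Yield (path, reason) for file and folder names under root that are
    non-ASCII or longer than max_len characters.

    max_len is an arbitrary illustrative threshold, not a Retrospect limit.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            reasons = []
            if not name.isascii():          # requires Python 3.7+
                reasons.append("non-ASCII")
            if len(name) > max_len:
                reasons.append("long name")
            if reasons:
                yield os.path.join(dirpath, name), ", ".join(reasons)

# Example use (the path is a placeholder):
# for path, reason in find_suspect_names(r"D:\shares"):
#     print(reason, path)
```

Folders that contain many flagged names would be reasonable first candidates to define as subvolumes and back up on their own.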

 

Thanks

Nate


The snapshot build was hanging again over the weekend, this time while doing the Windows profile servers. I was able to stop the execution by pressing the stop button, and had to stop it again right afterwards, since it had failed to run one previously scheduled execution. Btw, it would be nice if Retrospect scripts could be configured to automatically remove executions past the current date.

 

According to the log it hung when doing a normal backup - not new or full. It completed 4862 files, 3.1 GB; remaining 1014 files, 556 MB. Performance 297.4 copy, 1.4 compare. This time there were no Unicode filenames in the log, but it showed a very long list of "different modify date/time" errors.

 

Yes, I can reproduce the error on a test server. Actually it's not really a test server; this server used to back up some of the clients that were moved to the profile server and fileserver a couple of months ago (wasn't my idea). And now I have a problem on the server backing up the profile server and fileserver, maybe because some of the users' files were moved to it. It does not hang at any particular user or file pattern though. I think something just triggers a severe resource issue, like a memory leak, when the server does the snapshot.

 

I don't think I can implement your suggestion to define folders for backup to isolate the problem, since there are no real logical containers to choose besides usernames. Sorry, I'm not going to define hundreds of subfolders. I will disable backup data verification first to see if that gives me more reliable backups for now.

 

Best regards!


I noticed something in the Windows Application Event log:

 

Beginning at 07:00:00 it wrote "Script execution terminated at specified Stop time" 35 times, once every second. That does not look normal to me. Is there a bug in the snapshot build, or in properly terminating running executions?


Ok. I've added 2 hours of wrap-up time to the proactive backup script (MoSu). I cannot do this with the script that does the backup/snapshot for the above file/profile server, as that backup is handled by a normal script. The schedule is:

 

Normal Script (Servers - Priority) every day starting 5 PM.

Proactive Backup (MoSu) every day 8 PM to 7 AM.

Proactive Backup (MoSu w/o filter) every day 3 AM to 5 AM.

 

The idea is to have "Servers - Priority" run every day starting at 5 PM. If the server backup finishes after 8 PM, "MoSu" should kick in, followed by "MoSu w/o filter" if "MoSu" and "Servers - Priority" manage to finish the job by 3 AM.
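
To make the overlap of the two proactive windows easier to see, here is a small Python sketch. It only models the time windows from the schedule above, taking the second window as 3 AM to 5 AM; it says nothing about how Retrospect actually arbitrates between scripts, and the names `in_window`, `WINDOWS` and `active_windows` are made up for illustration. "Servers - Priority" is left out because it has a start time but no stop time.

```python
from datetime import time

def in_window(start, end, t):
    """True if time t falls in [start, end), allowing windows that wrap past midnight."""
    if start <= end:
        return start <= t < end
    # Window wraps over midnight, e.g. 8 PM to 7 AM.
    return t >= start or t < end

# Proactive windows taken from the schedule above.
WINDOWS = {
    "MoSu": (time(20, 0), time(7, 0)),
    "MoSu w/o filter": (time(3, 0), time(5, 0)),
}

def active_windows(t):
    """Return the names of all proactive windows that are open at time t."""
    return [name for name, (start, end) in WINDOWS.items() if in_window(start, end, t)]
```

For example, `active_windows(time(4, 0))` returns both script names, so between 3 AM and 5 AM both proactive windows are open at once.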

 

Looking at the above schedule once more I wonder about one thing:

 

The message "terminated at specified stop time 7 AM" is coming from the proactive backup "MoSu". On the screen, however, "building snapshot" for "Servers - Priority" is still showing, albeit frozen. How can MoSu stop execution at its specified stop time if Servers - Priority is still running or building snapshots? MoSu should not have started yet in the first place.

 

Btw, there is only one tape drive and one execution unit. Backup ran fine last night - no crash or hanging.


I compared what I noticed above with the other "test" server that shows the same problem. It has the same strange issue: "terminated at specified stop time" repeated every second for 37 seconds in the Windows Application Event log. This server, however, does not use normal scripts, just 2 backup server scripts running from 5 PM to 7 AM, one from Mo - Fr and one from Fr - Su.


Hi

 

Do you also have a global stop time set in the Retrospect preferences? Try removing that stop time.

 

When you stop a proactive script I believe it gives a stop message for every source in the script. Less than ideal but that is the way it works anyway.

 

Thanks

Nate


Hello,

 

The default schedule in the preferences was and is still set to "always".

 

I compared this with what's written to the Windows Application Event log when Retrospect doesn't crash. Nothing is written there about "terminated at specified stop time" at 7 AM, although it's in the Retrospect operation log.

 

There seems to be a clear pattern: when the above message appears in the event log about 35 times, which does not match the number of sources in any of my scripts, then Retrospect has crashed. If I don't see "terminated at stop time" in the Windows Application Event log, it worked fine.
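
A rough way to check for this pattern automatically is sketched below in Python. It assumes the event log has already been exported into a list of (timestamp, message) pairs; how that export is done is left open. The function name `find_bursts` and the thresholds (at least 10 matching events, at most 2 seconds apart) are just illustrative guesses, not anything from Retrospect or Windows.

```python
from datetime import datetime, timedelta

def find_bursts(events, needle, min_count=10, max_gap=timedelta(seconds=2)):
    """Return (start_time, count) for each run of events whose message
    contains needle and whose timestamps are at most max_gap apart."""
    times = sorted(t for t, msg in events if needle in msg)
    bursts = []
    run_start, run_len, prev = None, 0, None
    for t in times:
        if prev is not None and t - prev <= max_gap:
            run_len += 1                      # still inside the current run
        else:
            if run_len >= min_count:          # close the previous run
                bursts.append((run_start, run_len))
            run_start, run_len = t, 1         # start a new run
        prev = t
    if run_len >= min_count:                  # close the final run
        bursts.append((run_start, run_len))
    return bursts
```

Called with `needle="terminated at specified Stop time"`, a night with a single stop message gives an empty list, while a night with a run of about 35 messages one second apart gives one entry with its start time and count.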

 

Maybe that could help to pin down the problem.

 

Best regards,

Markus


Hi

 

Thanks for the detailed information.

 

Are both your test machine and the main backup server upgrades from Retrospect 6.5? Specifically, were preference files imported from 6.5 to 7.0? I'm wondering if the import doesn't have something to do with it. The way to test this theory would be to rename the Retrospect preferences folder and create a new test backup script.

 

I work with one customer over here with 75 proactive scripts all of which have stop times. They aren't seeing these same problems so I'm not sure if we can call this a bug yet.

 

I'd like to try to reproduce this. In your experience would you say this affects every proactive script with a stop time or are there other factors I need to recreate?

 

Thanks

Nate


Hello,

 

Since I stopped verification for the backups it seems to be going better. No crashes during the last couple of days. I checked the Windows Application Event log again to verify what I noticed yesterday. There is no note about "script termination" in the log, only an entry from yesterday, "Execution stopped between sources, wrap up time reached", which reflects the changes I made 2 days ago. I suppose that without verification it has a better chance of completing the backup, and that could be why I'm not seeing "script termination" in the event log.

 

One thing I can certainly verify is that the server always crashed when there was a run of 35 "script termination" events at one-second intervals. Normally, if the server continues to run without hanging, there is only one.

 

I have already tried many things to troubleshoot this issue, and I did rebuild the settings from scratch, at least in version 6.5, which was also giving me trouble. But as far as I remember, version 7 was an upgrade. The problem is that rebuilding is time-consuming, and I also have complex selectors (filters). I'm not sure you will succeed in recreating my situation, because Windows is very complex and everything is tied together. If it helps, I have no problem zipping and uploading the settings, or a backup of the whole drive, if you have FTP upload.

 

Best regards,

Markus Waldorf


Hi

 

Turning off verification is a bit of a problem in itself. I'm not sure which is worse.

 

If you move the stop time to 12 noon, will the scripts have finished working? That, or removing the stop time altogether, may be a better solution than turning off the verify.

 

FWIW you can export and import selectors with 6.5 and 7.0. If you wanted to try clean configs you could at least transfer that part over.

 

I'll try to reproduce this here by scheduling stop times that should occur in the middle of snapshot creation. I may ask you for your configs later.

 

Thanks

Nate


Hello,

 

Sorry for the late reply, but I have been very busy with other pressing issues. I know that disabling verification is bad, but it ran fine for at least the last 2 weeks. This weekend, however, it crashed again. The Windows Application Event log shows that script MoSu finished at 4:35 AM, and then I can see 37 events about script execution terminated at specified stop time. This always happens before it crashes. To me it seems clear that there is a problem with script termination, or maybe with how it works with the Windows event manager. Unfortunately, when it crashes it does not seem to store the last actions in the log file.

 

I may have found another interesting thing. I checked the Config70 files:

 

Config70.bak 6,975 KB 10/15/2005 4:53 AM

Config70.dat 9,028 KB 10/16/2005 4:49 AM

 

This looks a bit strange to me. Why did it create a backup, and why at 4:53 AM? And why is the size different?

Maybe it's also related to the crashing noted above.

 

Best regards,

Markus


Hi

 

The .bak file is created on a clean shutdown. Since the file is one day older it looks like yesterday's backups went OK.

 

A crash will indeed lose the last actions that Retrospect made, because they cannot be properly written to the settings file.

 

I hate to say it but a clean configuration file is the next step in this case. Please try it when you can.

 

Thanks

Nate

