Recover full system: Solaris 8 x86 and Legato NetWorker 6.1
- From: D G Teed <donald.teed@xxxxxxxxx>
- Date: Thu, 16 Jul 2009 21:37:43 -0300
We have only two systems left to migrate off FreeBSD. These require
an older backup client, and thus our older backup server, which runs
Solaris 8 on an x86 clone with Legato NetWorker 6.1.
There was a problem with the tape unit one night, and the next morning
we shut the server down and rebooted it. When it came up, there were
severe file system errors on all partitions of our disks. Repair from
a Solaris 8 environment booted from CD-ROM was impossible on some
partitions, and on others fsck reported losing a huge number of files.
Solaris was reinstalled, with patches for NetWorker 6.1 and the st driver.
The indexes were recreated by running the scanner command on the last
full backup tape of the backup server. Then a recover was started, with
all partitions added (add /, add /var, add /usr, etc.). I elected to
overwrite the existing files in place on the system. I was outside the
server room when an email arrived for root, which indicated that
sendmail was somehow now configured to route mail. I found this a
little surprising, as the service had not been restarted during
the recover. The message stated that there had been twenty-something
errors on /dev/rmt/0ubn and that the device was now disabled.
I was expecting to wait until the recover was complete, fix up the
vfstab for the current system partition assignments, and then reboot.
Instead, an operator with good knowledge of the backup software
told me the screen had gone blue. Somehow, the system had
spontaneously rebooted before the recover was complete.
The screen showed "Solaris Primary Boot" or something similar.
I'm looking for opinions on the method of recovering
over a live system. I've done a full recover over a live system
before, with Solaris 7 on a SPARC server, and did not have
any problem like this.
Given the initial corruption of the disks, the trouble with the
tape unit responding, the email about device errors, and this
second instance of the system going kaflooey, I'm thinking we
may have an intermittent hardware fault, maybe in one of the
SCSI controllers.
Should I be recovering to a second system disk and then doing
installboot, etc., or can I exclude certain files from the recover
(e.g. /etc/mnttab) in case overwriting them triggered the reboot?
I'd like to hear some opinions on this...
sunmanagers mailing list