BOOT, HALT, RESTART, 1, 2, 3

helbig_at_astro.multiNOSPAMvax.de
Date: 12/12/03


Date: Fri, 12 Dec 2003 10:44:50 +0000 (UTC)

Please reply to the group AND to me via email (for the reason, see
below).

I'm a bit confused about BOOT, HALT and RESTART (or the equivalent
numbers 1, 2 and 3---somewhere I have a note of what corresponds to
what, but if anyone has the information handy, please post it in the
reply). HALT is clear: whenever the machine gets to the console prompt,
it stays there until one tells it to boot. BOOT, in my experience,
causes it to boot after the power is switched on.

What does RESTART do?

I seem to recall three, quite distinct, explanations. Which is correct?
Is more than one correct? Does this depend on the hardware model?

1: RESTART means that, instead of rebooting after a power cycle, the
machine "picks up where it left off", the contents of RAM having been
preserved by power supplied by batteries (I think some old VAX machines
had this option---if so, then RESTART doesn't apply to all machines).

2: RESTART means that the machine will reboot not only after a power
cycle but after a crash.

3: RESTART means that the machine will reboot (only after a power cycle
or also after a crash I'm not sure) from the default boot device and, if
this doesn't work, try a network boot.

If the behaviour depends on the hardware model, I'm especially
interested in the following:

   VAXstation 4000 60

   VAXstation 4000 90

   VAXstation 3100 38

   VAXstation 3100 30

   VAX 4000 100A

   ALPHAstation 255/233

   DEC 3000 600

   DEC 3000 300LX

   ALPHAserver 2000

   ALPHAserver 2100

I started out with a (new!) ALPHAstation 255/233 with 2 internal disks
(RZ 26 and RZ 28) and have gradually been building up a robust cluster.
I now have 3 nodes, each with its own (shadowed) system disk and all
important disks shadowed with members on two different nodes (one of
which contains SYSUAF.DAT etc with the corresponding logicals defined on
each node). A DSL "router" essentially directs all connections to the
TCPIP cluster alias.

This works fine.

The cluster, which currently contains the ALPHAstation, a VAXstation 4000
60 and a VAX 4000 100A, is 500 km away at the moment and I'm logged in
from the 3000 600 (using a nice 21" |d|i|g|i|t|a|l| colour monitor with
knobs on the front instead of an on-screen menu). The hardware setup on
the cluster is fine. I have now upgraded the VAX machines to 7.3 and
the ALPHA to 7.3-1, TCPIP 5.3 on all nodes, all patches current as of
last Sunday.

About 30 hours ago, the ALPHA crashed. AUTO_ACTION is BOOT. Hence my
questions above: would it have tried to boot again after the crash if
AUTO_ACTION was RESTART instead of BOOT?

Next on my list of things to do is to get software running on VAX which
now runs only on ALPHA, for example NEWSRDR (thus I don't have my normal
news access at the moment, hence the desire to receive copies of posts
via email) and LYNX (which I use in batch to update the DNS information
for the dynamic IP address). Thus I'll have to update the DNS by hand
for a while and monitor the IP address externally (possibly by
periodically and automatically opening a TCPIP connection to the outside
world).
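
The external monitoring idea could be sketched roughly as follows, in
Python on the outside host rather than on VMS. This is only an
illustration under assumptions: the port number and the function names
are hypothetical, and it assumes the outside host sees the DSL router's
current public address as the peer address of each incoming connection.

```python
import socket


def address_changed(last_seen, peer_ip):
    """Return True when the peer address differs from the last one recorded."""
    return last_seen is not None and peer_ip != last_seen


def watch(listen_port=5000, max_connections=None):
    """Accept connections and report whenever the caller's address changes.

    listen_port is a hypothetical choice; max_connections=None loops forever.
    """
    last_seen = None
    handled = 0
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind(("", listen_port))
        server.listen(1)
        while max_connections is None or handled < max_connections:
            conn, (peer_ip, _peer_port) = server.accept()
            conn.close()  # only the source address matters, not any data
            if address_changed(last_seen, peer_ip):
                print(f"address changed: {last_seen} -> {peer_ip}")
            last_seen = peer_ip
            handled += 1
    return last_seen
```

The cluster side would then only need to open a connection to that port
at regular intervals; the DNS update itself would still be done by hand.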

I think I've only had one real machine crash before. (The machine ran
7.1 for a couple of years then 7.2-1 for four years.) My batch job
which updates the DNS information via LYNX used to crash once every day
or two with a "BUGCHECK, internal consistency failure" message (without
crashing the machine). Since upgrading to VMS 7.3-1 last weekend, I was
happy to see this go away, and everything else looked fine. I am a bit
concerned, though, of course, that a machine crash happened so soon
after upgrading to 7.3-1. (I have had no problems with 7.3 on the VAX
machines.)

Again, after the upgrade I applied all patches which were current last
Sunday except ACRTL. (I now know the reason the installation failed and
will increase the size of the page file---the machine only has 64 MB of
memory, though I hope to upgrade it to maybe 1GB in January---and try
again.) Could the crash have been the result of the upgrade? Is there
anything I could have forgotten? Perhaps some system parameter which is
not set correctly? (7.3 used to have problems with XFC, but I assume
these are long solved.)

As the result of a programming error, a few hours before the crash, an
application was started which, after an unknown time, filled up
DISK$SCRATCH (a physical disk; each user has DISK$SCRATCH:[username] and
the SYS$SCRATCH logical points to this directory). The disk (an RZ 23L)
now has about 100 of the 123 MB free and the programming error has been
corrected (or, rather, the application temporarily disabled). In
particular, the last DNS update occurred several hours after this
application was started (I don't know when the disk actually filled up)
so things must have been more or less OK then.

I can understand that a full SYS$SCRATCH will do nasty things to some
applications, but could it cause a machine crash?

When the machine is back up, I'll have a look at the dump, if there is
one. The only thing I can see from the other nodes now is an error
count of 1 on the system disk and of course the system disk of the ALPHA
is in the state "MntVerifyTimeout" and the members are
"HostUnavailable". (Normally, all system disks (shadow sets) are mounted
on all nodes, as indeed are all disks.)

I have been noticing a few errors on PEA0: recently. This appears to
happen during shadow copies. Perhaps I should set things up so that
only one happens at a time; ALL traffic (LAT, TCPIP (no DECnet at the
moment), MSCP, SCS etc) is going over a 10 Mb/s UTP LAN.
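
As a rough back-of-the-envelope check on how heavily a shadow copy loads
such a link, here is a small Python sketch. It is only a lower bound
under stated assumptions: the copies share the link equally, protocol
overhead and all other traffic are ignored, and the disk size passed in
is whatever the member being copied actually holds (the 1000 MB in the
usage note is an arbitrary example, not one of my disks).

```python
def min_copy_seconds(disk_mb, link_mbit_per_s=10.0, concurrent_copies=1):
    """Lower bound, in seconds, on the time to copy disk_mb megabytes.

    Assumes the link is shared equally among concurrent_copies and that
    nothing else uses it; real copies will take longer.
    """
    if link_mbit_per_s <= 0 or concurrent_copies < 1:
        raise ValueError("link rate must be positive and copies at least 1")
    megabits = disk_mb * 8                        # 8 bits per byte
    share = link_mbit_per_s / concurrent_copies   # bandwidth per copy
    return megabits / share
```

For example, min_copy_seconds(1000) gives 800 seconds for a single copy
at the full 10 Mb/s, and min_copy_seconds(1000, concurrent_copies=2)
gives 1600 seconds for each of two simultaneous copies, so each copy
keeps the link saturated for twice as long when two run at once.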

Thanks in advance for any help!

This is the first "real-world" test of the robustness of my cluster. It
is great to have hardware which is almost 15 years old in some cases,
running the latest version of VMS and doing fancy stuff like shadowing,
and survive a node crash so gracefully. (The fact that it wasn't even
more graceful is due to the fact that I don't yet have all software
running on VAX.) As long as the ALPHA has no hardware error, and 7.3-1
isn't too crash-prone, I'll be happy again after I find out the reason
for the crash.


