Re: BOOT, HALT, RESTART, 1, 2, 3

mckinneyj_at_cpva.saic.com
Date: 12/12/03


Date: 12 Dec 03 05:58:19 PST

In article <brc672$2nap$1@news.xenopsyche.net>,
 helbig@astro.multiNOSPAMvax.de writes:
> Please reply to the group AND to me via email (for the reason, see
> below).
>
> I'm a bit confused about BOOT, HALT and RESTART (or the equivalent
> numbers 1, 2 and 3---somewhere I have a note of what corresponds to
> what, but if anyone has the information handy, please post it in the
> reply). HALT is clear: whenever the machine gets to the console prompt,
> it stays there until one tells it to boot. BOOT, in my experience,
> causes it to boot after the power is switched on.
>
> What does RESTART do?
>
> I seem to recall three, quite distinct, explanations. Which is correct?
> Is more than one correct? Does this depend on the hardware model?
>
> 1: RESTART means that, instead of rebooting after a power cycle, the
> machine "picks up where it left off", the contents of RAM having been
> preserved by power supplied by batteries (I think some old VAX machines
> had this option---if so, then RESTART doesn't apply to all machines).

No.

>
> 2: RESTART means that the machine will reboot not only after a power
> cycle but after a crash.
>
> 3: RESTART means that the machine will reboot (only after a power cycle
> or also after a crash I'm not sure) from the default boot device and, if
> this doesn't work, try a network boot.
>
> If the behaviour depends on the hardware model, I'm especially
> interested in the following:
>
> VAXstation 4000 60
>
> VAXstation 4000 90
>
> VAXstation 3100 38
>
> VAXstation 3100 30
>
> VAX 4000 100A
>
> ALPHAstation 255/233
>
> DEC 3000 600
>
> DEC 3000 300LX
>
> ALPHAserver 2000
>
> ALPHAserver 2100
>
> I started out with a (new!) ALPHAstation 255/233 with 2 internal disks
> (RZ 26 and RZ 28) and have gradually been building up a robust cluster.
> I now have 3 nodes, each with its own (shadowed) system disk and all
> important disks shadowed with members on two different nodes (one of
> which contains SYSUAF.DAT etc with the corresponding logicals defined on
> each node). A DSL "router" essentially directs all connections to the
> TCPIP cluster alias.
>
> This works fine.
>
> The cluster, which at the moment contains the ALPHAstation, a VAXstation
> 4000 60 and a VAX 4000 100A, is 500 km away, and I'm logged in
> from the 3000 600 (using a nice 21" |d|i|g|i|t|a|l| colour monitor with
> knobs on the front instead of an on-screen menu). The hardware setup on
> the cluster is fine. I have now upgraded the VAX machines to 7.3 and
> the ALPHA to 7.3-1, TCPIP 5.3 on all nodes, all patches current as of
> last Sunday.
>
> About 30 hours ago, the ALPHA crashed. AUTO_ACTION is BOOT. Hence my
> questions above: would it have tried to boot again after the crash if
> AUTO_ACTION was RESTART instead of BOOT?
>

Yes:

  AUTO_ACTION = HALT => never boot automatically
              = BOOT => boot on power-on
              = RESTART => boot after power-on or crash

If you want a crash dump written on your Alphas after a bugcheck or
machine check, then SET AUTO_ACTION RESTART.
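
As a rough sketch, assuming the Alpha is halted at the SRM console
prompt (shown here as >>>), setting and checking the variable would
look something like this. I have not checked it against each of the
models listed, so treat it as illustrative:

  >>> SET AUTO_ACTION RESTART
  >>> SHOW AUTO_ACTION

The second command should display the current value, so you can confirm
the change before booting again.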

For the VAXes, are you referring to the value of SET HALT, or to
hardware switches?

> Next on my list of things to do is to get software running on VAX which
> now runs only on ALPHA, for example NEWSRDR (thus I don't have my normal
> news access at the moment, hence the desire to receive copies of posts
> via email) and LYNX (which I use in batch to update the DNS information
> for the dynamic IP address). Thus I'll have to update the DNS by hand
> for a while and monitor the IP address externally (possible by
> periodically automatically opening a TCPIP connection to the outside
> world).
>
> I think I've only had one real machine crash before. (The machine ran
> 7.1 for a couple of years then 7.2-1 for four years.) My batch job
> which updates the DNS information via LYNX used to crash once every day
> or two with a "BUGCHECK, internal consistency failure" message (without
> crashing the machine). Since upgrading to VMS 7.3-1 last weekend, I was
> happy to see this go away, and everything else looked fine. I am a bit
> concerned, though, of course, that a machine crash happened so soon
> after upgrading to 7.3-1. (I have had no problems with 7.3 on the VAX
> machines.)
>
> Again, after the upgrade I applied all patches which were current last
> Sunday except ACRTL. (I now know the reason the installation failed and
> will increase the size of the page file---the machine only has 64 MB of
> memory, though I hope to upgrade it to maybe 1GB in January---and try
> again.) Could the crash have been the result of the upgrade? Is there
> anything I could have forgotten? Perhaps some system parameter which is
> not set correctly? (7.3 used to have problems with XFC, but I assume
> these are long solved.)
>
> As the result of a programming error, a few hours before the crash, an
> application was started which, after an unknown time, filled up
> DISK$SCRATCH (a physical disk; each user has DISK$SCRATCH:[username] and
> the SYS$SCRATCH logical points to this directory). The disk (an RZ 23L)
> now has about 100 of the 123 MB free and the programming error has been
> corrected (or, rather, the application temporarily disabled). In
> particular, the last DNS update occurred several hours after this
> application was started (I don't know when the disk actually filled up)
> so things must have been more or less OK then.
>
> I can understand that a full SYS$SCRATCH will do nasty things to some
> applications, but could it cause a machine crash?
>
> When the machine is back up, I'll have a look at the dump, if there is
> one. The only thing I can see from the other nodes now is an error
> count of 1 on the system disk and of course the system disk of the ALPHA
> is in the state "MntVerifyTimeout" and the members are
> "HostUnavailable". (Normally, all system disks (shadow sets) are mounted
> on all nodes, as indeed are all disks.)
>
> I have been noticing a few errors on PEA0: recently. This appears to
> happen during shadow copies. Perhaps I should set things up so that
> only one happens at a time; ALL traffic (LAT, TCPIP (no DECnet at the
> moment), MSCP, SCS etc) is going over a 10 Mb/s UTP LAN.
>
> Thanks in advance for any help!
>
> This is the first "real-world" test of the robustness of my cluster. It
> is great to have hardware which is almost 15 years old in some cases,
> running the latest version of VMS and doing fancy stuff like shadowing,
> and survive a node crash so gracefully. (The fact that it wasn't even
> more graceful is due to the fact that I don't yet have all software
> running on VAX.) As long as the ALPHA has no hardware error, and 7.3-1
> isn't too crash-prone, I'll be happy again after I find out the reason
> for the crash.
>


