Re: BOOT, HALT, RESTART, 1, 2, 3
mckinneyj_at_cpva.saic.com
Date: 12/12/03
- Next message: Bob Koehler: "Re: Full or Partial file spec ?"
- Previous message: Bob Koehler: "Re: Scott McNealy;s Dilemma"
- In reply to: helbig_at_astro.multiNOSPAMvax.de: "BOOT, HALT, RESTART, 1, 2, 3"
- Next in thread: Ken Fairfield: "Re: BOOT, HALT, RESTART, 1, 2, 3"
- Reply: Ken Fairfield: "Re: BOOT, HALT, RESTART, 1, 2, 3"
Date: 12 Dec 03 05:58:19 PST
In article <brc672$2nap$1@news.xenopsyche.net>,
helbig@astro.multiNOSPAMvax.de writes:
> Please reply to the group AND to me via email (for the reason, see
> below).
>
> I'm a bit confused about BOOT, HALT and RESTART (or the equivalent
> numbers 1, 2 and 3---somewhere I have a note of what corresponds to
> what, but if anyone has the information handy, please post it in the
> reply). HALT is clear: whenever the machine gets to the console prompt,
> it stays there until one tells it to boot. BOOT, in my experience,
> causes it to boot after the power is switched on.
>
> What does RESTART do?
>
> I seem to recall three, quite distinct, explanations. Which is correct?
> Is more than one correct? Does this depend on the hardware model?
>
> 1: RESTART means that, instead of rebooting after a power cycle, the
> machine "picks up where it left off", the contents of RAM having been
> preserved by power supplied by batteries (I think some old VAX machines
> had this option---if so, then RESTART doesn't apply to all machines).
No.
>
> 2: RESTART means that the machine will reboot not only after a power
> cycle but after a crash.
>
> 3: RESTART means that the machine will reboot (only after a power cycle
> or also after a crash I'm not sure) from the default boot device and, if
> this doesn't work, try a network boot.
>
> If the behaviour depends on the hardware model, I'm especially
> interested in the following:
>
> VAXstation 4000 60
>
> VAXstation 4000 90
>
> VAXstation 3100 38
>
> VAXstation 3100 30
>
> VAX 4000 100A
>
> ALPHAstation 255/233
>
> DEC 3000 600
>
> DEC 3000 300LX
>
> ALPHAserver 2000
>
> ALPHAserver 2100
>
> I started out with a (new!) ALPHAstation 255/233 with 2 internal disks
> (RZ 26 and RZ 28) and have gradually been building up a robust cluster.
> I now have 3 nodes, each with its own (shadowed) system disk and all
> important disks shadowed with members on two different nodes (one of
> which contains SYSUAF.DAT etc with the corresponding logicals defined on
> each node). A DSL "router" essentially directs all connections to the
> TCPIP cluster alias.
>
> This works fine.
>
> The cluster currently contains the ALPHAstation, a VAXstation 4000
> 60 and a VAX 4000 100A. It is 500 km away at the moment and I'm logged in
> from the 3000 600 (using a nice 21" |d|i|g|i|t|a|l| colour monitor with
> knobs on the front instead of an on-screen menu). The hardware setup on
> the cluster is fine. I have now upgraded the VAX machines to 7.3 and
> the ALPHA to 7.3-1, TCPIP 5.3 on all nodes, all patches current as of
> last Sunday.
>
> About 30 hours ago, the ALPHA crashed. AUTO_ACTION is BOOT. Hence my
> questions above: would it have tried to boot again after the crash if
> AUTO_ACTION was RESTART instead of BOOT?
>
Yes.
  AUTO_ACTION = HALT    => never boot automatically
              = BOOT    => boot on power-on
              = RESTART => boot after power-on or crash
If you want a crash dump written on your Alphas after a bugcheck or
machine check, then SET AUTO_ACTION RESTART.
For the VAXes, are you referring to the value of SET HALT, or to
hardware switches?
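
As a rough illustration only, the AUTO_ACTION table in this reply can be written as a small lookup. This is a sketch that models just the three lines of the table; the event names ("power-on", "crash") and the function boots_automatically are hypothetical names chosen for the example, not console commands, and real behaviour may differ by model and firmware.

```python
# Sketch of the AUTO_ACTION table from this reply.
# The event names and the helper function are hypothetical, for illustration;
# they are not part of any console or OpenVMS interface.

AUTO_ACTION_BOOTS_ON = {
    "HALT": set(),                      # never boot automatically
    "BOOT": {"power-on"},               # boot on power-on
    "RESTART": {"power-on", "crash"},   # boot after power-on or crash
}


def boots_automatically(auto_action: str, event: str) -> bool:
    """Return True if, per the table, the system boots after `event`."""
    return event in AUTO_ACTION_BOOTS_ON[auto_action.upper()]


if __name__ == "__main__":
    # Print the full table for both events.
    for setting in ("HALT", "BOOT", "RESTART"):
        for event in ("power-on", "crash"):
            result = boots_automatically(setting, event)
            print(f"{setting:8} {event:9} -> {result}")
```

Under this model, a crash with AUTO_ACTION set to BOOT leaves the machine at the console prompt, which matches the situation described in the question.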
> Next on my list of things to do is to get software running on VAX which
> now runs only on ALPHA, for example NEWSRDR (thus I don't have my normal
> news access at the moment, hence the desire to receive copies of posts
> via email) and LYNX (which I use in batch to update the DNS information
> for the dynamic IP address). Thus I'll have to update the DNS by hand
> for a while and monitor the IP address externally (possible by
> periodically automatically opening a TCPIP connection to the outside
> world).
>
> I think I've only had one real machine crash before. (The machine ran
> 7.1 for a couple of years then 7.2-1 for four years.) My batch job
> which updates the DNS information via LYNX used to crash once every day
> or two with a "BUGCHECK, internal consistency failure" message (without
> crashing the machine). Since upgrading to VMS 7.3-1 last weekend, I was
> happy to see this go away, and everything else looked fine. I am a bit
> concerned, though, of course, that a machine crash happened so soon
> after upgrading to 7.3-1. (I have had no problems with 7.3 on the VAX
> machines.)
>
> Again, after the upgrade I applied all patches which were current last
> Sunday except ACRTL. (I now know the reason the installation failed and
> will increase the size of the page file---the machine only has 64 MB of
> memory, though I hope to upgrade it to maybe 1GB in January---and try
> again.) Could the crash have been the result of the upgrade? Is there
> anything I could have forgotten? Perhaps some system parameter which is
> not set correctly? (7.3 used to have problems with XFC, but I assume
> these are long solved.)
>
> As the result of a programming error, a few hours before the crash, an
> application was started which, after an unknown time, filled up
> DISK$SCRATCH (a physical disk; each user has DISK$SCRATCH:[username] and
> the SYS$SCRATCH logical points to this directory). The disk (an RZ 23L)
> now has about 100 of the 123 MB free and the programming error has been
> corrected (or, rather, the application temporarily disabled). In
> particular, the last DNS update occurred several hours after this
> application was started (I don't know when the disk actually filled up)
> so things must have been more or less OK then.
>
> I can understand that a full SYS$SCRATCH will do nasty things to some
> applications, but could it cause a machine crash?
>
> When the machine is back up, I'll have a look at the dump, if there is
> one. The only thing I can see from the other nodes now is an error
> count of 1 on the system disk and of course the system disk of the ALPHA
> is in the state "MntVerifyTimeout" and the members are
> "HostUnavailable". (Normally, all system disks (shadow sets) are mounted
> on all nodes, as indeed are all disks.)
>
> I have been noticing a few errors on PEA0: recently. This appears to
> happen during shadow copies. Perhaps I should set things up so that
> only one happens at a time; ALL traffic (LAT, TCPIP (no DECnet at the
> moment), MSCP, SCS etc) is going over a 10 Mb/s UTP LAN.
>
> Thanks in advance for any help!
>
> This is the first "real-world" test of the robustness of my cluster. It
> is great to have hardware which is almost 15 years old in some cases,
> running the latest version of VMS and doing fancy stuff like shadowing,
> and survive a node crash so gracefully. (The fact that it wasn't even
> more graceful is due to the fact that I don't yet have all software
> running on VAX.) As long as the ALPHA has no hardware error, and 7.3-1
> isn't too crash-prone, I'll be happy again after I find out the reason
> for the crash.
>