RE: Cluster hang -- Getting Crash Dump
From: Stuart, Ed (Ed.Stuart_at_austinenergy.com)
Date: 03/18/04
- Next message: JF Mezei: "Re: Message-ID ??? How to replicate?"
- Previous message: JF Mezei: "Re: Postal mail sorting application"
- Maybe in reply to: Dave Baxter: "Cluster hang -- Getting Crash Dump"
- Next in thread: Keith Parris: "Re: Cluster hang -- Getting Crash Dump"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Thu, 18 Mar 2004 13:46:45 -0600
Believe it or not, the Alphas have a console command to initiate a crash and
write a dump. At the console prompt, enter: crash
Ed
**Please apply a generous amount of all the usual disclaimers here.**
> -----Original Message-----
> From: dave.baxter@bannerhealth.com
> [mailto:dave.baxter@bannerhealth.com]
> Sent: Thursday, March 18, 2004 11:30 AM
> To: Info-VAX@Mvb.Saic.Com
> Subject: Cluster hang -- Getting Crash Dump
>
>
> In the past month, I have had two occasions when my (2-node ES40,
> OVMS731) cluster has hung. All of the symptoms point to a probable
> Quorum Hang (quite possible since I don't have a Quorum Disk);
> however, there are some indications that this might not be the case.
> (Note: VOTES = 1 each, EXPECTED_VOTES = 1)
>
> 1. Neither node crashed (so no quorum loss there).
> 2. The cluster uses two fully independent GB ethernet interconnects
> (switches), which are private and do not connect to the network.
> 3. My other cluster (which uses the same interconnect, i.e. the same
> pair of switches) was unaffected by the hang.
> 4. On examination (i.e. after driving in from home to take care of
> the problem), all link lights are green on the GB Switches. This
> would seem to rule out the interconnects as the source of any quorum
> loss. And even if they did somehow simultaneously lose their
> connection and cause a loss of quorum, would quorum not be
> restored when the interconnect links were reestablished?
>
> I would really appreciate any comments/suggestions here.
> (Please don't start berating me about the lack of a Quorum
> Disk unless you think it would have avoided this problem, and
> can explain why).
>
> On a second, equally important issue: in order to break the
> hang I had to HALT the nodes (one at a time with Ctrl/P).
> (Comment: Again, with the votes set as above, this should
> have released the hang on the other node; it didn't!)
>
> Because the nodes were HALTed, they didn't automatically
> generate a Crash Dump, so I don't have anything to diagnose
> with (the Error Log contains no indications of problems, and
> neither does Operator.log).
>
> The crash dump file SYSDUMP.DMP is set up off the system
> disk, on an internal drive, (and is correctly set up in
> sysgen and at the console level).
>
> HOW CAN I FORCE A DUMP AFTER I AM AT THE "P00>>" CONSOLE PROMPT ???
>
> I am sure that I remember being told that there is a
> command that can be entered at the "P00>>" prompt that
> forces a dump of the Registers. I would really appreciate it
> if anyone can give me this information. This is probably more
> important to me than the cause of the hang since, at the moment,
> I have nowhere to go except to endlessly analyse the symptoms
> in my head (this leads to insanity, ultimately).
>
> The only other information I can think of which might be
> useful is to mention that I am running the Cerner Millennium
> Clinical Application, with Oracle 8.1.7.4.
>
> Thanks
>
> Dave.
>
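The vote settings quoted above (VOTES = 1 on each node, EXPECTED_VOTES = 1) can be checked with a little arithmetic. What follows is only a rough sketch in Python, not anything taken from the systems in question. It uses the documented OpenVMS quorum formula, (EXPECTED_VOTES + 2) / 2 with the fraction dropped. It also ASSUMES that a running cluster works from the larger of the configured EXPECTED_VOTES and the total votes of its current members, and that quorum is not lowered on its own when a member is halted. The function names are made up for illustration.

```python
def quorum(expected_votes: int) -> int:
    """Documented OpenVMS formula: (EXPECTED_VOTES + 2) / 2, truncated."""
    return (expected_votes + 2) // 2


def effective_expected_votes(configured: int, member_votes: list[int]) -> int:
    """ASSUMPTION: the running cluster uses whichever is larger, the
    configured EXPECTED_VOTES or the sum of the members' votes."""
    return max(configured, sum(member_votes))


configured_expected = 1   # EXPECTED_VOTES from the quoted message
member_votes = [1, 1]     # VOTES = 1 on each of the two nodes

# Quorum a single node would compute from its own parameters alone.
quorum_at_boot = quorum(configured_expected)

# Quorum once both nodes are members, under the assumption above.
quorum_running = quorum(
    effective_expected_votes(configured_expected, member_votes)
)

# One node halted: its vote is gone, but (by assumption) quorum stays put.
votes_after_halt = sum(member_votes[:1])
survivor_hangs = votes_after_halt < quorum_running

print("quorum at boot:", quorum_at_boot)
print("quorum with both nodes up:", quorum_running)
print("survivor below quorum after one HALT:", survivor_hangs)
```

If those assumptions hold for this configuration, the node left running would still be below quorum after the other was halted, which would be consistent with the hang not releasing.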