[SUMMARY] Sunfire v880 reboot
From: Bill R. Williams (brwms_at_etsu.edu)
Date: 02/18/05
- Previous message: Michael Segale: "motd for ftp"
- In reply to: Bill R. Williams: "Sunfire v880 reboot"
Date: Fri, 18 Feb 2005 15:40:19 -0500
To: Sun Managers <sunmanagers@sunmanagers.org>
For the benefit of others who might run into the same problem, here
are the responses to my post regarding: Sunfire v880 reboot
(My original post from Mon, Feb 14, 2005 is included at the end.)
I should have mentioned in my original post that this system has been
running the same levels of software (Solaris 9, i.e. SunOS 5.9),
firmware (OBP), and hardware configuration since August 2004, and
nothing like this has happened before.
Many thanks to those who offered opinions, theories, possibilities,
and suggestions! My remarks concerning my particular situation are
interspersed among their responses.
---------------------------------------------
Bill R. Williams <brw@etsu.edu>
------------------------ ETSU Library Systems
From: Peter A. van Gemert
>I have no clue what went on in your system, but could it be a
>faulty UPS?
Possibly, but I don't think the UPS is the culprit.
From: Eric Noriega
>Have you looked for a crash dump under /var/crash ?
There was no crash dump in there.
(That is the area defined in my 'dumpadm'.)
The following from joe_fletcher gets my vote for most probable cause:
From: "joe_fletcher"
>Usual thing in these situations is a watchdog reset. Tends
>to be nothing in the logs as it's about as hard a reset as
>you can get short of using a hammer. The only place you will
>see anything is on the console so, assuming you have it
>configured, take a look in the RSC buffer logs for whatever
>records remain.
>Cause is generally hardware related. I'd also run psrinfo.
>You might find the thing is now running on an odd number of CPUs. I've
>seen this happen a few times.
My CPUs are all online & functioning.
Also, prtdiag -v indicates everything within tolerances and "OK".
From: "Michael Horton"
>How is your power run?
>
>3 v880 power supplies into 1 ups?
>(no redundancy)
>3 v880 power supplies into 1 power circuit?
>(no redundancy)
>
>if your ups has a glitch (and they do), you have a power event.
I am not going to rule this out.
From: "Eric Paul"
>We had a similar issue a few months ago with one of our servers...
>They replaced two CPU modules, and several banks of RAM before the
>problem went away. Something to be aware of, there is an FCO for
>certain memory modules which were installed on a number of 880s
>(though Sun is not talking about it much...) I only found out from
>my FE. You might want to put in a call to tech support and see if
>they can give you the lot numbers and check the RAM out.
>
>The other thing you might want to do is set up syslog to point to a
>central logging server. I've found a lot of times when Sun boxes go
>down hard, they don't flush the last logs to disk. But the central
>server does get the logs and that's given me more information to go
>on.
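For anyone who wants to try the central-logging suggestion, here is a
minimal sketch of the forwarding entry. It assumes the name "loghost"
resolves to the central log server rather than to the local machine
(for example via an alias in /etc/hosts); the selector list is only an
illustration and mirrors what is commonly sent to /var/adm/messages.

```
# /etc/syslog.conf on the V880 (sketch only).
# The selector and the action must be separated by TAB characters,
# not spaces. "loghost" is an assumed name for the central server.
*.err;kern.debug;daemon.notice;mail.crit	@loghost
```

syslogd re-reads its configuration when sent SIGHUP, so the change
should take effect without a reboot.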
From: Daniel Vega
>obp down rev maybe?
On Mon, Feb 14, 2005 at 06:00:17PM -0500, Bill R. Williams wrote:
> SunOS localhost 5.9 Generic_117171-07 sun4u sparc SUNW,Sun-Fire-880
> This afternoon, this machine just rebooted, and I cannot find the why!
>
> Following the reboot, all status lights on the v880 are normal, and
> all disk drives are functioning.
> There is no crash dump, and the only thing I can find in the logs
> which indicates a glitch is in the /var/adm/messages file: the last
> entry before the "new" boot-up entries is a "line" of ~308 NUL bytes.
>
> I've run prtdiag and all temperatures, fans, etc. look Ok.
> Things look correct from 'metastat'.
>
> This unit has 3 power supplies which are plugged into a UPS, so it
> wasn't a glitch in power service coming to the machine, and if it's a
> power supply the thing is supposed to be able to continue with two of
> them functioning. And there's no indication of any problems (prtdiag)
> with any of the three.
>
> Anybody seen this sorta thing happen?
> (Maybe there's some gremlin in the v880 and/or Solaris 9 that I've
> missed.)
>
> This sorta thing makes me nervous.
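The run of NUL bytes mentioned in the original post is easy to miss by
eye. Below is a rough sketch of a script that reports such runs in a log
file. The function names and the 16-byte threshold are my own choices
for illustration, not anything from Sun; treat it as a starting point
rather than a diagnostic tool.

```python
import re


def find_nul_runs(data: bytes, min_len: int = 16):
    """Return (offset, length) for each run of at least min_len NUL bytes.

    min_len is an arbitrary illustrative threshold, not a documented value.
    """
    pattern = re.compile(rb"\x00{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(data)]


def scan_log(path: str, min_len: int = 16):
    """Read a log file in binary mode and report its NUL-byte runs."""
    with open(path, "rb") as fh:
        return find_nul_runs(fh.read(), min_len)
```

Calling scan_log("/var/adm/messages") returns a list of (byte offset,
run length) pairs; an empty list means no run of that length was found.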