[SUMMARY] Sunfire v880 reboot

From: Bill R. Williams (brwms_at_etsu.edu)
Date: 02/18/05

    Date: Fri, 18 Feb 2005 15:40:19 -0500
    To: Sun Managers <sunmanagers@sunmanagers.org>
    
    

    For the benefit of others who might run into the same problem, here
    are the responses to my post "Sunfire v880 reboot".
    (My original post from Mon, Feb 14, 2005 is included at the end.)

    I should have mentioned in my original post that this system has been
    running the same levels of software (Solaris 9, i.e. SunOS 5.9), firmware
    (OBP), and hardware configuration since August 2004, and nothing like
    this has ever happened before.

    Many thanks to those who offered opinions, theories, possibilities,
    and suggestions! My remarks concerning my particular situation are
    intertwined in their responses.
     ---------------------------------------------
     Bill R. Williams <brw@etsu.edu>
     ------------------------ ETSU Library Systems

    From: Peter A. van Gemert
    >I have no clue what went on in your system, but could it be a
    >faulty UPS?
    Possibly, but I don't think the UPS is the culprit.

    From: Eric Noriega
    >Have you looked for a crash dump under /var/crash ?
    There was no crash dump in there.
    (That is the area defined in my 'dumpadm'.)

    The following from joe_fletcher gets my vote for most probable cause:
    From: "joe_fletcher"
    >Usual thing in these situations is a watchdog reset. Tends
    >to be nothing in the logs as it's about as hard a reset as
    >you can get short of using a hammer. The only place you will
    >see anything is on the console so, assuming you have it
    >configured, take a look in the RSC buffer logs for whatever
    >records remain.

    >Cause is generally hardware related. I'd also run psrinfo.
    >You might find the thing is now running on an odd number of CPUs. I've
    >seen this happen a few times.
    My CPUs are all online & functioning.
    Also, prtdiag -v indicates everything within tolerances and "OK".
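    joe_fletcher's psrinfo check can be scripted so it runs after every
    unexplained reboot. The sketch below is a hypothetical helper,
    `count_online_cpus`, that assumes the default psrinfo output of one line
    per processor (id, state, then a "since" timestamp) and flags a missing or
    odd processor count, the symptom he describes. It is an illustration, not
    a Sun-supplied tool.

```python
import subprocess
import sys
from typing import Tuple


def count_online_cpus(psrinfo_output: str) -> Tuple[int, int]:
    """Return (online, total) processor counts from default `psrinfo` text.

    Assumes one line per processor: id, state, then free-form text, e.g.
        0       on-line   since 02/14/2005 18:02:11
    """
    online = 0
    total = 0
    for line in psrinfo_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # ignore blank or unexpected lines
        total += 1
        if fields[1] == "on-line":
            online += 1
    return online, total


if __name__ == "__main__":
    # Runs the real psrinfo command; only meaningful on a Solaris host.
    out = subprocess.run(["psrinfo"], stdout=subprocess.PIPE,
                         universal_newlines=True, check=True).stdout
    online, total = count_online_cpus(out)
    print("%d of %d processors on-line" % (online, total))
    if online < total or online % 2:
        print("warning: processor missing or odd count; check the console/RSC log")
        sys.exit(1)
```

    Parsing the text rather than trusting the exit status keeps the check
    useful even when psrinfo succeeds but reports a processor as off-line or
    faulted.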

    From: "Michael Horton"
    >How is your power run?
    >
    >3 v880 power supplies into 1 ups?
    >(no redundancy)
    >3 v880 power supplies into 1 power circuit?
    >(no redundancy)
    >
    >if your ups has a glitch (and they do), you have a power event.
    I am not going to rule this out.

    From: "Eric Paul"
    >We had a similar issue a few months ago with one of our servers...
    >They replaced two CPU modules, and several banks of RAM before the
    >problem went away. Something to be aware of, there is an FCO for
    >certain memory modules which were installed on a number of 880s
    >(though Sun is not talking about it much...) I only found out from
    >my FE. You might want to put in a call to tech support and see if
    >they can give you the lot numbers and check the RAM out.
    >
    >The other thing you might want to do is set up syslog to point to a
    >central logging server. I've found a lot of times when Sun boxes go
    >down hard, they don't flush the last logs to disk. But the central
    >server does get the logs and that's given me more information to go
    >on.
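    On the Solaris side, forwarding is typically an /etc/syslog.conf entry
    whose action is `@loghost` (selector and action separated by tabs). The
    receiving end can be any syslog daemon; purely as an illustration of what
    it does, here is a minimal Python sketch that listens on UDP port 514,
    decodes the BSD syslog `<PRI>` prefix (facility = PRI // 8,
    severity = PRI % 8), and fsyncs after every line so the last messages
    from a dying client survive. The function names and the output path are
    hypothetical.

```python
import os
import re
import socket
from typing import Optional, Tuple

# BSD syslog (RFC 3164) datagrams begin with "<PRI>", where PRI = facility * 8 + severity.
_PRI_RE = re.compile(rb"^<(\d{1,3})>")


def parse_pri(packet: bytes) -> Tuple[Optional[int], Optional[int], bytes]:
    """Split a raw syslog datagram into (facility, severity, message)."""
    match = _PRI_RE.match(packet)
    if match is None:
        return None, None, packet  # no PRI header; keep the payload unchanged
    pri = int(match.group(1))
    return pri // 8, pri % 8, packet[match.end():]


def serve(log_path: str, host: str = "0.0.0.0", port: int = 514) -> None:
    """Append each received datagram to log_path, syncing after every write.

    Binding to UDP port 514 normally requires root privileges.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    with open(log_path, "ab") as log:
        while True:
            packet, (sender, _) = sock.recvfrom(8192)
            facility, severity, message = parse_pri(packet)
            record = "%s fac=%s sev=%s " % (sender, facility, severity)
            log.write(record.encode() + message.rstrip(b"\n") + b"\n")
            log.flush()
            os.fsync(log.fileno())  # force the line to disk before reading the next one


if __name__ == "__main__":
    serve("/var/log/remote-hosts.log")  # hypothetical output path
```

    Syncing every line is slow under heavy load, but for a handful of servers
    it trades a little throughput for not losing the messages logged just
    before a hard reset.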

    From: Daniel Vega
    >obp down rev maybe?

    On Mon, Feb 14, 2005 at 06:00:17PM -0500, Bill R. Williams wrote:
    > SunOS localhost 5.9 Generic_117171-07 sun4u sparc SUNW,Sun-Fire-880
    > This afternoon, this machine just rebooted, and I cannot find out why!
    >
    > Following the reboot, all status lights on the v880 are normal, and
    > all disk drives are functioning.
    > There is no crash dump, and the only thing I can find in the logs
    > which indicate a glitch is in the /var/adm/messages file: the last
    > entry before the "new" boot-up entries is a "line" of ~308 NULL bytes.
    >
    > I've run prtdiag and all temperatures, fans, etc. look Ok.
    > Things look correct from 'metastat'.
    >
    > This unit has 3 power supplies which are plugged into a UPS, so it wasn't
    > a glitch in the power service coming to the machine, and if it's a power
    > supply, the machine is supposed to be able to continue with two of them
    > functioning. And there's no indication of any problems (prtdiag) with
    > any of the three.
    >
    > Anybody seen this sorta thing happen?
    > (Maybe there's some gremlin in the v880 and/or Solaris 9 that I've
    > missed.)
    >
    > This sorta thing makes me nervous.
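    One common explanation for a run of NUL bytes at the end of a log written
    just before a hard reset is that the file had already been extended on
    disk when the machine went down, but its data had not yet been flushed, so
    that region reads back as zeros. The sketch below (hypothetical helpers
    `find_nul_runs` and `line_before`) locates such runs in a file like
    /var/adm/messages and prints the last text line before each one, which is
    the closest thing to a timestamp for the reset.

```python
import re
import sys
from typing import List, Tuple


def find_nul_runs(data: bytes, min_len: int = 16) -> List[Tuple[int, int]]:
    """Return (offset, length) for every run of at least min_len NUL bytes."""
    pattern = re.compile(rb"\x00{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(data)]


def line_before(data: bytes, offset: int) -> bytes:
    """Return the text line immediately preceding offset."""
    return data[:offset].rstrip(b"\n").rsplit(b"\n", 1)[-1]


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/adm/messages"
    with open(path, "rb") as f:  # binary mode, since the file contains NULs
        data = f.read()
    for offset, length in find_nul_runs(data):
        print("%d NUL bytes at offset %d" % (length, offset))
        print("  last line before them: %r" % line_before(data, offset))
```

    The `min_len` threshold (an assumption, 16 here) keeps stray single NULs
    from being reported; a run of ~308 bytes like the one described above
    would be found either way.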
    _______________________________________________
    sunmanagers mailing list
    sunmanagers@sunmanagers.org
    http://www.sunmanagers.org/mailman/listinfo/sunmanagers

