SUMMARY: DS20 won't boot

From: Bill Sadvary (sadvary_at_dickinson.edu)
Date: 06/21/04

    Date: Mon, 21 Jun 2004 07:22:02 -0400 (EDT)
    To: Tru64-UNIX-Managers <tru64-unix-managers@ornl.gov>
    
    

    Most replies suggested reseating the CPU board and memory, which I did with
    no luck, and pulling the PCI cards one at a time and trying to boot.

    After pulling the SAN card, it did boot to SRM. I swapped in a card from
    an unused system, reconfig'd the SAN access and now all is fine!

    A thanks goes out to:

    Thomas.Blinn
    Kjell Andresen
    Peter Reynolds
    Tom Traina

    My original post is below along with the first three replies.

    -Bill Sadvary

    ---------- Original message ----------

    I had our DS20E shut down for a couple of hours and now it won't boot. The
    four LEDs on the front panel all flash on, then #2 and #4 shut off for half
    a second, then all are on and stay on. This all happens within a
    second or so.

    I get nothing on the graphics monitor (I'm almost certain the console is
    set to graphics) and nothing on COM1's terminal connection.

    The Systems Features Board says all is OK, the two pwr supplies, the
    system fans, the CPU fan and the temperature. There is no spare pwr supply
    installed.

    I connected a terminal up to the CPU diagnostics port and I get...

    DP264...V00005201.01.000000567ace.0018.02.000007d1.03.0373bef8c1.05.04..
    0620000000.14#0000000000000204#.15.00900000.17.

    I press enter and never get the SRM prompt.

    It has only one CPU, boots off an internal drive, has storage on a SAN
    (the SAN switch port does not give a green status like it should).

    Any hardware gurus out there with any ideas?

    Thanks,
    -Bill

    ----------Replies-----------

    Dr Thomas.Blinn
    ---------------
    I have the DS20E maintenance guide but it's in the office and I'm
    at home.

    I'd start by reseating the PCI modules and things like the CPU
    and memories. If you are able to talk to the management board
    then the serial hardware is working to some extent, but that's
    no guarantee that COM1 is really working. But if the power-on
    self-test (the software that runs before the SRM console) is
    managing to load the SRM console, and the hardware's connected
    and working, then even if the console is set to graphics, you
    should be able to get the SRM prompt on COM1 by entering a
    few "enter" key strokes (unless auto_action is boot in which
    case it's going to try to boot, and if the problem is the SAN
    card which it sounds like it might be then you might have to
    break it out of that mode).

    It does sound to me like something is "hanging" the system, so
    try removing PCI options until you can get the SRM console to
    talk to you through the serial port (COM1). If you can't get
    that to work, you've got a problem in the motherboard, the CPU,
    or the memory. If you can get the "bare" system board to talk
    to you, then you can start adding the options back in to see if
    the hang comes back; that will help isolate the problem.
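
    The add-the-options-back-in procedure above can be sketched in Python as
    follows. This is only an illustration: reaches_srm stands in for the
    manual step of powering on and watching COM1 for the SRM prompt, and the
    card names are hypothetical, not taken from the system in question.

```python
from typing import Callable, List, Optional, Sequence


def find_hanging_card(
    cards: Sequence[str],
    reaches_srm: Callable[[List[str]], bool],
) -> Optional[str]:
    """Return the first card whose installation stops the system reaching SRM.

    `reaches_srm` is a hypothetical stand-in for a manual power-on test with
    the given cards installed. Returns None if the fully populated system
    still reaches SRM.
    """
    installed: List[str] = []

    # Start from the bare system board: if this fails, no option card is to
    # blame and the fault is in the motherboard, CPU, or memory.
    if not reaches_srm(installed):
        raise RuntimeError(
            "bare system board does not reach SRM: "
            "suspect motherboard, CPU, or memory"
        )

    # Add the options back one at a time until the hang comes back.
    for card in cards:
        installed.append(card)
        if not reaches_srm(installed):
            return card
    return None


if __name__ == "__main__":
    # Simulated test: pretend the SAN adapter hangs the power-up sequence.
    slots = ["graphics", "scsi", "san-hba", "nic"]
    simulated = lambda present: "san-hba" not in present
    print(find_hanging_card(slots, simulated))
```

    Adding cards back one by one costs a power cycle per card, but it matches
    the advice as given and needs no assumption about how faults interact.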

    Good luck..

    Tom

    Kjell Andresen
    --------------
    http://h18002.www1.hp.com/alphaserver/download/ds20e_reference_d.pdf
    Page 8-3:

    All 4: Starting console
    #2+#4 off: Setting memory low limit
    All 4 (again): Probing I/O

    NOTE: The first two LED patterns (LEDs 1-4 on, followed by LEDs 1-3 on
          and LED 4 off) are identical to the last two patterns, but
          represent different startup phases. Observe the LED pattern on
          power-up to ensure that the first two patterns execute
          successfully. If power-up does not succeed, and a LED pattern is
          lit that is the same as one of the first two patterns, the
          problem lies with one of the last two phases of the power-up
          sequence.

    Any beep codes? --> p.8-2

    Table 8-1 Error Beep Codes

    Beeps  Message/Meaning                            Action to Repair
    -----  -----------------------------------------  ----------------
    1-2-3  Indicates fail-safe booter startup. The    Update the firmware.
           firmware in flash ROM is unavailable and   See Section 8.11.
           fail-safe booter has begun running.
    4      No valid header in ROM. Loading entire     Replace the ROM.
           ROM. The header in the ROM is not valid.
    6      Memory error detected. A checksum error    Check memory
           occurred after the ROM image was copied    configuration.
           into memory. Either memory is              Reseat or replace DIMM.
           misconfigured or a memory DIMM needs to
           be reseated.

    This was how far I got - hope you can get some information to take it a
    bit further..

    Didn't find information about the CPU diags port in the manual.

    I'd remove the I/O card and try without the SAN in the first place.

    Regards,
    Kjell Andresen Systems administrator, University of Oslo, Norway
                    Center for Information Technology Services and
                    Department of Geosciences

    Peter Reynolds
    --------------
    According to the DS20E service guide, the problem you may have can be
    caused by a defective DIMM or B-cache, or a faulty PCI card. In addition
    to this, a faulty CPU or main logic board can pass its POST, but fail to
    start the SRM.

    Is the LED on the floppy drive illuminated? If it is, the firmware is
    corrupted.

    Is there a 'beep' code? 1-2-3 indicates that the firmware in the flash
    ROM is unavailable, and the firmware needs to be updated/replaced. 4
    means the ROM has failed and needs replacing, 6 means that there was a
    ROM checksum error, and the fault lies with memory. Either memory is
    misconfigured or a faulty DIMM needs to be reseated or replaced.

    To get over the problem of the firmware being unavailable, you need a
    diskette with the file PC264SRM.ROM renamed to DP264SRM.ROM on it. If the
    floppy activity LED is on, then the failsafe loader is already active,
    and expecting this diskette. Insert the disk, then reset the system.
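
    If the diskette is prepared on another machine, the copy-and-rename step
    can be sketched in Python as below. Both paths in the usage note are
    assumptions (wherever the firmware image was downloaded to, and wherever
    the floppy happens to be mounted); only the two file names come from the
    description above.

```python
import shutil
from pathlib import Path

# Name the failsafe loader looks for, per the description above.
TARGET_NAME = "DP264SRM.ROM"


def stage_failsafe_image(source_rom: Path, floppy_root: Path) -> Path:
    """Copy the firmware image onto the diskette under the expected name.

    `source_rom` is the downloaded image (PC264SRM.ROM in the text above);
    `floppy_root` is the directory where the diskette is mounted.
    """
    if not source_rom.is_file():
        raise FileNotFoundError(f"firmware image not found: {source_rom}")
    if not floppy_root.is_dir():
        raise NotADirectoryError(f"diskette not mounted at: {floppy_root}")

    target = floppy_root / TARGET_NAME
    shutil.copyfile(source_rom, target)  # copy contents, writing the new name
    return target
```

    A call might look like
    stage_failsafe_image(Path("PC264SRM.ROM"), Path("/mnt/floppy")), with the
    mount point adjusted to the machine used to write the diskette.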

    Hope this helps - I'm sorry I can't be more specific.

