Machine check error analysis

From: McCracken, Denise (Denise.McCracken_at_misyshealthcare.com)
Date: 10/28/04

  • Next message: Rajaya, Kiran: "Cannot bring the system back on."
    Date: Thu, 28 Oct 2004 09:34:37 -0700
    To: "Tru64 managers' list (E-mail)" <TRU64-UNIX-MANAGERS@ornl.gov>
    
    

    Can anyone tell me where to find documentation on how to interpret the
    contents of a machine check error entry? I have a machine that is logging a
    lot of these, and although they are reported as correctable, we are also
    having trouble installing the OS, and I need to narrow down the cause. I'm
    not having any luck getting through to HP support, so I would appreciate
    any help I can get.

    thanks

    -d

    ******************************** ENTRY 36 ********************************
     
     
    Logging OS 2. Digital UNIX
    System Architecture 2. Alpha
    Event sequence number 1639.
    Timestamp of occurrence 02-OCT-2004 07:35:02
    Host name osflab
     
    System type register x00000016 Alpha 4000/1200 Series
    Number of CPUs (mpnum) x00000001
    CPU logging event (mperr) x00000000
     
    Event validity 1. O/S claims event is valid
    Event severity 5. Low Priority
    Entry type 100. CPU Machine Check Errors
     
    CPU Minor class 4. System Correctable Error (620)
     
    Software Flags x0000000000000000
    Active CPUs x00000001
    Hardware Rev x00000000
    System Serial Number NI83404258
    Module Serial Number
    Module Type x0000
    System Revision x00000000
     
    Machine Check Reason x0204 IOD Detected Soft Error
     
    Ext Interface Status Reg x0000000000000000 Register Contents Not Valid For This Error
    Ext Interface Address Reg x0000000000000000 Register Contents Not Valid For This Error
    Fill Syndrome Reg x0000000000000000 Register Contents Not Valid For This Error
    Interrupt Summary Reg x0000000000000000 Register Contents Not Valid For This Error
    WHOAMI x00000000 Register Contents Not Valid For This Error
     
    --IOD REGISTERS FOLLOW--
    This Bus Bridge Phy Addr x000000FBE0000000
                                         IOD# 1
    Dev Type & Rev Register x06002432 CAP Chip Revision: x00000002
                                         Host to PCI Revision: x00000003
                                         I/O Backplane Revision: x00000004
                                         Internal CAP Chip Arbiter: Enabled
                                         Device Class: Host Bus to PCI Bridge
    MC Error Info Register 0 x03FE5230
                                         MC Bus Trans Addr<31:4>: 3FE5230
    MC Error Info Register 1 x800E9600 MC bus trans addr <39:32> x00000000
                                         MC Command is WriteBack Mem
                                         CPU0 Master at Time of Error
                                         Device ID: x00000002
                                         MC error info valid
    CAP Error Register x89000000 Error Detected but Not Logged
                                         Correctable ECC err det by MDPA
                                         MC error info latched
    MDPA Status Register x00000000 MDPA Status Register Data Not Valid
    MDPA Error Syndrome Reg x00000000 MDPA Syndrome Register Data Not Valid
    MDPB Status Register x00000000 MDPB Status Register Data Not Valid
    MDPB Error Syndrome Reg x00000000 MDPB Syndrome Register Data Not Valid
     
    PALcode Revision Palcode Rev: 1.23-2
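
    When a machine logs many entries like the one above, it can help to
    tally them by Machine Check Reason and to see which bits are set in a
    given register before digging into any single entry. The following is a
    minimal Python sketch, assuming the formatted entries have been saved as
    plain text in the layout shown above. The function names and the SAMPLE
    excerpt are illustrative only, and the sketch does not attach meanings
    to individual register bits, since those would have to come from the
    hardware documentation.

```python
import re
from collections import Counter

# Entry banner, e.g. "***** ENTRY 36 *****" (layout assumed from the log above).
ENTRY_RE = re.compile(r"^[ \t]*\*+ ENTRY (\d+) \*+[ \t]*$", re.MULTILINE)
# Reason line, e.g. "Machine Check Reason x0204 IOD Detected Soft Error".
REASON_RE = re.compile(r"Machine Check Reason\s+x([0-9A-Fa-f]+)\s+(.*)")

# Short excerpt of the entry above, used only to exercise the functions.
SAMPLE = """\
******************************** ENTRY 36 ********************************
Timestamp of occurrence 02-OCT-2004 07:35:02
Machine Check Reason x0204 IOD Detected Soft Error
CAP Error Register x89000000 Error Detected but Not Logged
"""


def split_entries(text):
    """Return {entry_number: entry_body} for each entry banner found."""
    matches = list(ENTRY_RE.finditer(text))
    entries = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entries[int(m.group(1))] = text[m.end():end]
    return entries


def machine_check_reason(entry_body):
    """Return (reason_code, description), or None if the line is absent."""
    m = REASON_RE.search(entry_body)
    if m is None:
        return None
    return int(m.group(1), 16), m.group(2).strip()


def register_value(entry_body, name):
    """Return the hex value printed after a register name, or None."""
    m = re.search(re.escape(name) + r"\s+x([0-9A-Fa-f]+)", entry_body)
    return int(m.group(1), 16) if m else None


def set_bits(value):
    """List the positions of the bits that are set, lowest first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def tally_reasons(text):
    """Count entries per (reason_code, description)."""
    counts = Counter()
    for body in split_entries(text).values():
        reason = machine_check_reason(body)
        if reason is not None:
            counts[reason] += 1
    return counts
```

    Running tally_reasons() over the full saved log would show whether all
    of the correctable errors share the same reason code, and calling
    set_bits(register_value(body, "CAP Error Register")) on each entry would
    show whether the same bits are set every time, which may help decide
    whether a single component is involved.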

    "Customer service may be the only way that a
    company can distinguish itself from its
    competition these days." -H. Frank Gibbard

    Denise McCracken, Systems Software Specialist
    Misys Healthcare Systems, Tucson, AZ

    Certified Tru64 v5 Systems Administrator
    Comptia Network+ Certified Professional

