crash

paf1_at_email.cz
Date: 07/29/04

    Date: Thu, 29 Jul 2004 13:05:16 +0200 (CEST)
    To: tru64-unix-managers@ornl.gov
    
    

    Hi,

    Can anybody explain what exactly is wrong, please?

    It looks like a QBB0 backplane issue, but who knows...

    WSEA - analyzing
    ******************************************************************************

    ---------- Problem Found: Duplicate tag parity error detected in the DTAG of SoftQbb0 (HardQbb0) at Mon 26 Jul 2004 14:09:19 GMT+01:00 ----------

    Problem Report Times:
        Event Time: Thu 15 Jul 2004 19:40:25 GMT+03:00
        Report Time: Mon 26 Jul 2004 14:09:19 GMT+01:00
        Expiration Time: Thu 15 Jul 2004 19:40:25 GMT+03:00

    Managed Entity:
    System Name : ux23
    System Type : AlphaServer GS80 6/731
    System Serial : AY12302979
    OS Type : Tru64 UNIX/Compaq Tru64 UNIX V5.1B (Rev. 2650)

    Service Obligation Data:

       Service Obligation: Valid
       Service Obligation Number: AY85151089
       System Serial Number: AY85151089
       Service Provider Company Name: Hewlett-Packard Company

    Brief Description:
    Duplicate tag parity error detected in the DTAG of SoftQbb0 (HardQbb0)

    Callout ID:
    Theory Code : 0x03C003000007AF05
    HQBB.Ent.Flt : 0.24.3

    Severity:
    1

    Reporting Node:
    zacerp

    Full Description:
    The Dtag control register (DTAG_CONTROL) indicated a parity or double bit ECC
    error in the duplicate tag store. Before the Dtag writes information in its tag
    rams it generates parity (dtag1) or ECC (dtag2). Therefore the fact that we did
    get this error on a read means the problem is caused by the tag rams or by the
    Dtag control logic. Dtag rams and the control logic are part of the QBB
    backplane. The error was detected in Duplicate Tag 0 block 3. This error causes
    a system fault condition.

    FRU List:

    Probability : High
    Fru Manufacturer : -
    Fru Model : -
    Fru PartNumber : 54-30354-03.B01
    Fru SerialNumber : AY10547388
    Fru FirmwareRev : -
    Fru SiteLocation : -
    Fru CabinetId : 600mm 4P System Cabinet
    Fru Position : 4P System Cabinet, Full Depth, box, located 4 from the bottom
    Fru Chassis : System Drawer
    Fru Assembly : -
    Fru Subassembly : -
    Fru Slot : -

    Evidence:
    Time of Event : 15 Jul 2004 15:31:28 GMT+03:00 (Thu)
    Unique ID : 9572.0 (cdl)
    Analysis Revision : GS320_UCE_RULE V6.5 (06may2004)

    SEA Version:
    System Event Analyzer for Tru64 UNIX V4.3.3 (Build 40)

    ****************************************************************************************

    WSEA - full output event

    Event: 1586
    Description: Console Data Log Event at Thu 15 Jul 2004 19:40:25 GMT+03:00 from ux23
    File: ./binary.errlog
    ================================================================================

    COMMON EVENT HEADER (CEH) V2.0
    Event_Leader xFFFF FFFE
    Header_Length 260
    Event_Length 680
    Header_Rev_Major 2
    Header_Rev_Minor 0
    OS_Type 1 -- Tru64 UNIX
    Hardware_Arch 4 -- Alpha
    CEH_Vendor_ID 3,564 -- Hewlett-Packard Company
    Hdwr_Sys_Type 35 -- GS40/80/160/320 Series
    Logging_CPU 0 -- CPU Logging this Event
    CPUs_In_Active_Set 1
    Major_Class 113
    Minor_Class 0
    Entry_Type 113 -- Console Data Log Event
    DSR_Msg_Num 1,967 -- AlphaServer GS80
    Chip_Type 11 -- EV67 - 21264A
    CEH_Device 255
    CEH_Device_ID_0 x0000 03FF
    CEH_Device_ID_1 x0000 0007
    CEH_Device_ID_2 x0000 0007
    Unique_ID_Count 0
    Unique_ID_Prefix 9,572
    Num_Strings 5

    TLV Section of CEH
    TLV_DSR_String AlphaServer GS80 6/731
    TLV_OS_Version Compaq Tru64 UNIX V5.1B (Rev. 2650)
    TLV_Sys_Serial_Num AY12302979
    TLV_Time_as_Local Thu 15 Jul 2004 19:40:25 GMT+03:00
    TLV_Computer_Name ux23
    Entry_Type 113

    Console_Data_log

    START OF SUBPACKETS IN THIS EVENT

    Halt Frame Header Subpacket - V1.0
    Time_Stamp x0000 3407 0F0D 1F1C Time Stamp
       Seconds[7:0] 28 Seconds
       Minutes[15:8] 31 Minutes
       Hours[23:16] 13 Hours Unix = GMT Ovms = Local
       Day[31:24] 15 Day
       Month[39:32] 7 July
       Year[47:40] 52 2004
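
The Time_Stamp value above packs one field per byte, as the bit ranges in the dump show. A minimal sketch of that decoding follows. The function name is hypothetical, and the year base of 1952 is an assumption inferred only from this dump, where a raw year of 52 is printed as 2004.

```python
from datetime import datetime


def decode_halt_timestamp(raw, year_base=1952):
    """Split the 48-bit Time_Stamp into its byte-wide fields.

    year_base=1952 is an assumption taken from this dump (52 -> 2004).
    """
    seconds = raw & 0xFF          # Seconds[7:0]
    minutes = (raw >> 8) & 0xFF   # Minutes[15:8]
    hours = (raw >> 16) & 0xFF    # Hours[23:16]
    day = (raw >> 24) & 0xFF      # Day[31:24]
    month = (raw >> 32) & 0xFF    # Month[39:32]
    year = (raw >> 40) & 0xFF     # Year[47:40]
    return datetime(year_base + year, month, day, hours, minutes, seconds)


# Raw value copied from the Halt Frame Header Subpacket.
stamp = decode_halt_timestamp(0x000034070F0D1F1C)
print(stamp.isoformat())  # 2004-07-15T13:31:28
```

The dump notes that the hours field is GMT on Tru64 UNIX, so the decoded value is a GMT time rather than the local time shown in the event header.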

    System Machine Check Error Frame Subpacket - Version 1
    whami 0 CPU Reporting Error
    frame_size x0000 00E8
    frame_flags x0000 0000
    processor_offset x0000 0018
    system_offset x0000 00A0
    mchk_code x0000 0200
       ev6_mchk_code[31:0] x200 660 - System Fault
    frame_revision x0000 0001 GS80-160-320 BitToText Revision=2106.2002.01
    i_stat x0000 0000 0000 0000 IBox Status Register
    dc_stat x0000 0000 0000 0000 Dcache Status Register
    c_addr x0000 0000 0000 0000 Cbox read register field
       error_address[42:6] x0 Error Address of last reported ECC or Parity error
    c_syndrome_1 x0000 0000 0000 0000 CBox Syndrome 1
       upper_qw_syndrome[7:0]x0 Syndrome for Upper Quadword
    c_syndrome_0 x0000 0000 0000 0000 Cbox Syndrome 0
       lower_qw_syndrome[7:0]x0 Syndrome for Lower Quadword
    c_stat x0000 0000 0000 0000 CBox Read C_STAT
    c_sts x0000 0000 0000 0000 CBox Read Register C_STS
       block_status[3:0] x0 Shared
    mm_stat x0000 0000 0000 0000 Memory Management Status Register
       opcode[9:4] x0 Opcode of the Instruction that Caused the Error
    exc_addr x0000 0000 0000 0000 Exception Address Register
       pc[63:2] x0 Exception Address
    ier_cm x0000 0000 0000 0000 Interrupt Enable and Current Processor Mode Register
       cm[4:3] x0 Kernel
       asten[13] x0 AST Interrupt Enable
       sien[28:14] x0 Software Interrupt Enables
       pcen[30:29] x0 Performance Counter Interrupt Enables
       eien[38:33] x0 External Interrupt Enable
    isum x0000 0000 0000 0000 Interrupt Summary Register
       astk[3] x0
       aste[4] x0
       asts[9] x0
       astu[10] x0
       si[28:14] x0
       pc[30:29] x0
       cr[31] x0
       sl[32] x0
       ei[38:33] x0
    pal_base x0000 0000 0000 0000 Pal Base Register
       pal_base[43:15] x0 Base Physical Address for PALcode
    i_ctl x0000 0000 0000 0000 Ibox Control Register
       ic_en[2:1] x0
       spe[5:3] x0
       sde[7:6] x0
       sbe[9:8] x0
       bp_mode[11:10] x0
       hwe[12] x0
       sl_xmit[13] x0
       sl_rcv[14] x0
       va_48[15] x0
       va_form_32[16] x0
       single_issue_h[17] x0
       pct0_en[18] x0
       pct1_en[19] x0
       call_pal_r23[20] x0
       mchk_en[21] x0
       tb_mb_en[22] x0
       bist_fail[23] x0
       chip_id[29:24] x0 ChipId = EV6 PASS 1
       vptp[47:30] x0
       sext[63:48] x0
    process_context x0000 0000 0000 0000 Process Context Register
       ppce[1] x0 Process Performance Counting Enable
       fpe[2] x0 Floating Point Enable
       aster[8:5] x0 AST Enable
       astrr[12:9] x0 AST Request
       asn[46:39] x0 Address Space Number
    uncorr_cpu_error_sum x0000 0000 0000 0001 Uncorrectable Error or Fault Summary
       QBB0[0] x1 QBB0 uncorrectable Error or Fault
    QBB0_csrs_to_be_logged x0000 0000 0100 0000 Registers logged for QBB0:
       dtag0[24] x1 DTAG0
    QBB1_csrs_to_be_logged x0000 0000 0000 0000
    QBB2_csrs_to_be_logged x0000 0000 0000 0000
    QBB3_csrs_to_be_logged x0000 0000 0000 0000
    QBB4_csrs_to_be_logged x0000 0000 0000 0000
    QBB5_csrs_to_be_logged x0000 0000 0000 0000
    QBB6_csrs_to_be_logged x0000 0000 0000 0000
    QBB7_csrs_to_be_logged x0000 0000 0000 0000
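
The summary and "csrs_to_be_logged" words above are plain bit masks, so the flagged entries can be cross-checked by listing the set bit positions. This is only a sketch with a hypothetical helper name. The bit meanings (bit 0 = QBB0, bit 24 = DTAG0) are taken from the decoded lines in the dump, not from hardware documentation.

```python
def set_bits(value):
    """Return the positions of all bits set in value, lowest first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


# Raw values copied from the System Machine Check Error Frame Subpacket.
uncorr_cpu_error_sum = 0x0000000000000001
qbb0_csrs_to_be_logged = 0x0000000001000000

print(set_bits(uncorr_cpu_error_sum))    # [0]  -> QBB0[0]
print(set_bits(qbb0_csrs_to_be_logged))  # [24] -> dtag0[24]
```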

    System Error Frame Header Subpacket - V1.0

    DTag Error Frame Subpacket - Version 2
    base_physical_address x0000 0FFF FFE0 0000 Base physical addess
       entity[22:18] x18 Duplicate Tag 0 (DTAG0)
       qbb_id[41:36] x3F QBB0
    DTAG_CONTROL x0000 0000 0000 0011 DTAG Control Register
       ena_fault[0] x1 Enable DTAG Fault
       pe_sum[5:2] x4 Tag RAM Parity Error or ECC DBE Summary
    DTAG_ERR_SUM x0000 0000 0000 0040 DTAG Error Summary Register
       bist_err_sum[3:0] x0 BIST ok
       nxm_err[6] x1 Non-existent memory error (ignore)
    DTAG_ERR_CID x0000 0000 0000 0003 DTAG Error commander ID Register
       cid[5:0] x3 Commander ID
    DTAG_ERR_CMD x0000 0000 0000 001B DTAG Error Command Register
       cmd[6:0] x1B Command
    DTAG_ERR_ADDR_0 x0000 0000 0000 0004 DTAG Error Address 0 Register
    DTAG_ERR_ADDR_1 x0000 0000 0000 00AF DTAG Error Address 1 Register
    DTAG_ERR_ADDR_2 x0000 0000 0000 008E DTAG Error Address 2 Register
    DTAG_ERR_ADDR_3 x0000 0000 0000 00D8 DTAG Error Address 3 Register
    DTAG_ECC_CONTROL x0000 0000 0000 008E DTagII ECC Control Register
       SBE_Err_Sum[3:0] xE DTag SRAM sub-block 1, 2 and 3 detected a Single Bit Error (SBE)
       Ena_SBE_Interrupt[5]x0 Disable DTag ECC SBE interrupts
       Ena_ECC[6] x0 Disable DTag ECC on DTAG RAMs
       Force_SBE[7] x1 Force an DTag ECC SBE
    DTAG_ECC_SYNDROME x0000 0000 0000 008E DTagII ECC Syndrome Register
       ECC_Syndrome[5:0] xE ECC syndrome value
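
The decoded fields in the DTag subpacket can be verified by hand from the raw register values, using the [high:low] bit ranges printed next to each field name. The sketch below assumes nothing beyond those ranges. The helper name `field` is hypothetical.

```python
def field(value, hi, lo):
    """Extract bits hi..lo (inclusive) from a raw register value."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


# (field name, raw register value, high bit, low bit), all copied from the dump.
checks = [
    ("entity[22:18]", 0x00000FFFFFE00000, 22, 18),
    ("qbb_id[41:36]", 0x00000FFFFFE00000, 41, 36),
    ("ena_fault[0]", 0x11, 0, 0),
    ("pe_sum[5:2]", 0x11, 5, 2),
    ("nxm_err[6]", 0x40, 6, 6),
    ("SBE_Err_Sum[3:0]", 0x8E, 3, 0),
    ("Force_SBE[7]", 0x8E, 7, 7),
    ("ECC_Syndrome[5:0]", 0x8E, 5, 0),
]

for name, raw, hi, lo in checks:
    print(f"{name:<20} x{field(raw, hi, lo):X}")
```

Run against the values in this event, the extracted fields agree with what WSEA printed (x18, x3F, x1, x4, x1, xE, x1, xE).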

    Thanks for any info,
    Jiri

