System crash: invalid memory read access from kernel

From: Florent Boucher (Florent.Boucher_at_cnrs-imn.fr)
Date: 11/26/03

  • Next message: lawries_at_btinternet.com: "tar bug in 5.1B PK3???"
    Date: Wed, 26 Nov 2003 16:15:27 +0100
    To: tru64-unix-managers@ornl.gov
    
    

    Dear Managers,
    We have just upgraded the memory on an Alpha DS20 computer (2x500MHz)
    running 4.0F. Now, every two or three days, the system crashes with the
    following error reported by the uerf command:

    ********************************* ENTRY 6. *********************************

    ----- EVENT INFORMATION -----

    EVENT CLASS ERROR EVENT
    OS EVENT TYPE 302. PANIC
    SEQUENCE NUMBER 316.
    OPERATING SYSTEM DEC OSF/1
    OCCURRED/LOGGED ON Wed Nov 26 13:08:37 2003
    OCCURRED ON SYSTEM cubitus
    SYSTEM ID x00080022
    SYSTYPE x00000000
    PROCESSOR COUNT 2.
    PROCESSOR WHO LOGGED x00000000
    MESSAGE panic (cpu 0): kernel memory fault

    I was able to read the following information from the crash data:

    trap: invalid memory read access from kernel mode

        faulting virtual address: 0x00000000000001e9
        pc of faulting instruction: 0xfffffc000025a1d8
        ra contents at time of fault: 0xfffffc0000878750
        sp contents at time of fault: 0xfffffffe8f67f2d8

    panic (cpu 0): kernel memory fault
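As a first reading of the trap values above, the short sketch below sorts each address into a coarse range. The two boundaries are assumptions rather than values taken from the crash data: an 8 KB Alpha page size, and a kernel range starting at 0xfffffc0000000000, chosen because the reported pc and ra begin with that prefix. The names `classify` and `fault` are hypothetical.

```python
# Coarse classification of the addresses reported in the trap message.
# Assumptions (not from the crash data): 8 KB pages and a kernel range
# that starts at 0xfffffc0000000000.
PAGE_SIZE = 8192
KERNEL_BASE = 0xFFFFFC0000000000


def classify(addr):
    """Return a rough label for a 64-bit virtual address."""
    if addr < PAGE_SIZE:
        return "first page (null or near-null pointer plus offset)"
    if addr >= KERNEL_BASE:
        return "kernel range"
    return "other"


# Values copied from the trap message.
fault = {
    "faulting virtual address": 0x00000000000001E9,
    "pc of faulting instruction": 0xFFFFFC000025A1D8,
    "ra contents": 0xFFFFFC0000878750,
    "sp contents": 0xFFFFFFFE8F67F2D8,
}

for name, addr in fault.items():
    print(f"{name:28s} {addr:#018x}  {classify(addr)}")

# Offset of the faulting address from address zero, in bytes.
print("offset from zero:", fault["faulting virtual address"])
```

Under these assumptions, the faulting address lies 489 bytes above zero while pc, ra and sp are all kernel addresses, which is what a read through a null or corrupted pointer inside the kernel would look like. That pattern alone does not show whether the pointer was damaged by faulty memory or by software.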

    Can anybody help us understand in more depth what is happening on the
    system? Is it a memory problem or a software problem?
    Thanks a lot for your help.
    With my best regards
    Florent Boucher

    PS: Please find below more details on the crash

    #
    # Crash Data Collection (Version 1.4)
    #
    _crash_data_collection_time: Wed Nov 26 15:02:06 MET 2003
    _current_directory: /
    _crash_kernel: /var/adm/crash/vmunix.15
    _crash_core: /var/adm/crash/vmzcore.15
    _crash_arch: alpha
    _crash_os: Digital UNIX
    TruCluster Software
    _host_version: Digital UNIX V4.0F (Rev. 1227); Tue May 21 15:11:23 MET DST 2002
    TruCluster Software V1.6-12 (Rev. 225); 04/05/99 15:11
    _crash_version: Digital UNIX V4.0F (Rev. 1227); Tue May 21 15:11:23 MET DST 2002
    TruCluster Software V1.6-12 (Rev. 225); 04/05/99 15:11

    _crashtime: struct {
        tv_sec = 1069848517
        tv_usec = 388787
    }
    _boottime: struct {
        tv_sec = 1069661446
        tv_usec = 779696
    }
    _config: struct {
        sysname = "OSF1"
        nodename = "cubitus"
        release = "V4.0"
        version = "1227"
        machine = "alpha"
    }
    _cpu: 57
    _system_string: 0xffffffffff800b30 = "AlphaServer DS20 500 MHz"
    _ncpus: 2
    _avail_cpus: 2
    _partial_dump: 1
    _physmem(MBytes): 2303
    _panic_string: 0xfffffc000080bc20 = "kernel memory fault"
    _paniccpu: 0
    _panic_thread: 0xfffffc007461ca80
    _preserved_message_buffer_begin:
    struct {
        hdr = struct {
            msg_magic = 0x880524
            msg_bufx = 0x10c8
            msg_bufr = 0xed1
            msg_size = 0x3fe0
        }
        msg_bufc = "Alpha boot: available memory from 0x356e000 to 0x8fffa000
    Digital UNIX V4.0F (Rev. 1227); Tue May 21 15:11:23 MET DST 2002
    physical memory = 2304.00 megabytes.
    available memory = 2249.82 megabytes.
    using 8836 buffers containing 69.03 megabytes of memory
    Master cpu at slot 0.
    Firmware revision: 6.5-13
    PALcode: Digital UNIX version 1.92-74
    AlphaServer DS20 500 MHz
    pci1 at nexus
    isp0 at pci1 slot 7
    isp0: QLOGIC ISP1040B/V2
    isp0: Firmware revision 5.57 (loaded by console)
    isp0: Fast RAM timing enabled.
    scsi0 at isp0 slot 0
    rz5 at scsi0 target 5 lun 0 (LID=0) (DEC RRD47 (C) DEC 1206)
    tu0: DECchip 21140: Revision: 2.2
    tu0: auto negotiation capable device
    tu0 at pci1 slot 9
    tu0: DEC TULIP (10/100) Ethernet Interface, hardware address: 00-00-F8-07-C3-BF
    tu0: auto negotiation off: selecting 100BaseTX (UTP) port: full duplex
    gpc0 at isa0
    PCI device at bus 0, slot 8, function 0 could not be configured:
    Vendor ID 0x14e4, Device ID 0x16a7, Base class 0x2, Sub class 0x0
    Sub-VID 0xe11 Sub-DID 0x601b
        has no matching entry in the PCI option table
    pci0 at nexus
    isa0 at pci0
    gpc1 not probed
    gpc1 not probed
    ace0 at isa0
    ace1 at isa0
    lp0 at isa0
    fdi0 at isa0
    fd0 at fdi0 unit 0
    ata0 at pci0 slot 105 (slot 5, function 1)
    ata0: CYPRESS 82C693
    scsi1 at ata0 slot 0
    ata1 at pci0 slot 205 (slot 5, function 2)
    ata1: CYPRESS 82C693
    scsi2 at ata1 slot 0
    usb0 at pci0 slot 305 (slot 5, function 3)
    isp1 at pci0 slot 7
    isp1: QLOGIC ISP1040B/V2
    isp1: Firmware revision 5.57 (loaded by console)
    isp1: Fast RAM timing enabled.
    scsi3 at isp1 slot 0
    rz24 at scsi3 target 0 lun 0 (LID=1) (DEC RZ1CB-CA (C) DEC LYJ0) (Wide16)
    rz26 at scsi3 target 2 lun 0 (LID=2) (DEC RZ1CB-CS (C) DEC 0844) (Wide16)
    rz27 at scsi3 target 3 lun 0 (LID=3) (DEC RZ2DA-LA (C) DEC N1H1) (Wide16)
    rz28 at scsi3 target 4 lun 0 (LID=4) (SEAGATE ST318406LC 010A) (Wide16)
    tz29 at scsi3 target 5 lun 0 (LID=5) (DEC TLZ10 (C) DEC 04a8)
    mchan0: Module revision = 34
    mchan0: jumpered as HUB configuration
    mchan0 at pci0 slot 9
    Created FRU table binary error log packet
    lvm0: configured.
    lvm1: configured.
    kernel console: ace0
    dli: configured
    ATM Subsystem configured with 2 restart threads
    ATM IFMP: configured
    clubase: configured
    dlmsl: configured
    drd: configured.
    cnxagent: configured
    dlm: configured.
    ATMUNI: configured
    ATMSIG: 3.x (module=uni3x) configured
    ILMI: 3.x (module=ilmi) configured
    ATM IP: configured
    ATM LANE: configured.
    i2c: Server Management Hardware Present
    ADVFS: using 21039 buffers containing 164.36 megabytes of memory
    vm_swap_init: warning /sbin/swapdefault swap device not found
    vm_swap_init: swap is set to lazy (over commitment) mode
    Starting secondary cpu 1
    rm_sw_init: begin MC initialization.
    rm_boot_am_i_alone: entered
    checking for existing memory channel nodes
    rm_slave_init
    slave unit boot phase 0: checking cables
    slave unit boot phase 1: request data ...
    slave unit boot phase 2: get lock data from all nodes
    slave unit boot phase 3: update request ...
    memory channel software inited - node 0 on mc0
    memory channel - adding node 2
    ccomsub: configured
    mcnet: configured
    MEMORY CHANNEL API - initializing
    Environmental Monitoring Subsystem Configured.
    chk_bf_quota: user quota underflow for user 402 on fileset /
    memory channel - removing node 2
    rm_remove_node: removal took 0x0 ticks
    MEMORY CHANNEL API - node 2 has left the cluster
    MEMORY CHANNEL API - cleaning up after node 2
    ccomsub: Successfully reconfigured for member 2 down
    ccomsub: state change detected by this node via callback
    memory channel request from node 2
    memory channel update request from node 2
    memory channel - adding node 2
    MEMORY CHANNEL API - node 2 has joined the cluster
    chk_bf_quota: group quota underflow for group 7 on fileset /
    chk_bf_quota: group quota underflow for group 7 on fileset /

    trap: invalid memory read access from kernel mode

        faulting virtual address: 0x00000000000001e9
        pc of faulting instruction: 0xfffffc000025a1d8
        ra contents at time of fault: 0xfffffc0000878750
        sp contents at time of fault: 0xfffffffe8f67f2d8

    panic (cpu 0): kernel memory fault
    syncing disks... device string for dump = SCSI 0 7 0 0 0 0 0.
    DUMP.prom: dev SCSI 0 7 0 0 0 0 0, block 300000
    device string for dump = SCSI 0 7 0 0 0 0 0.
    DUMP.prom: dev SCSI 0 7 0 0 0 0 0, block 300000
    "
    }
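The `_crashtime` and `_boottime` structures in the crash data give the uptime before the panic directly. The sketch below is a minimal illustration that assumes both `tv_sec` fields are seconds since the Unix epoch in UTC; the helper name `to_datetime` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the _crashtime and _boottime structures.
crashtime = {"tv_sec": 1069848517, "tv_usec": 388787}
boottime = {"tv_sec": 1069661446, "tv_usec": 779696}


def to_datetime(tv):
    """Convert a timeval-style dict to an aware UTC datetime.

    Assumes tv_sec counts seconds since the Unix epoch (UTC).
    """
    base = datetime.fromtimestamp(tv["tv_sec"], tz=timezone.utc)
    return base + timedelta(microseconds=tv["tv_usec"])


crash = to_datetime(crashtime)
boot = to_datetime(boottime)
uptime = crash - boot

print("booted :", boot.isoformat())
print("crashed:", crash.isoformat())
print("uptime :", uptime)
```

The crash time converts to 12:08:37 UTC on 26 November 2003, which is 13:08:37 MET as shown in the uerf entry, and the uptime comes out at a little over two days, in line with a crash every two or three days.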

    -- 
     --------------------------------------------------------------------------
    | Florent BOUCHER                    |                                     |
    | Institut des Matériaux Jean Rouxel | Mailto:Florent.Boucher@cnrs-imn.fr  |
    | 2, rue de la Houssinière           | Phone: (33) 2 40 37 39 24           |
    | BP 32229                           | Fax:   (33) 2 40 37 39 95           |
    | 44322 NANTES CEDEX 3 (FRANCE)      | http://www.cnrs-imn.fr              |
     --------------------------------------------------------------------------
    
