SUMMARY: fatal error on 4500

From: Adam Mazza (adam_at_68e.com)
Date: 01/05/04

    Date: Mon, 5 Jan 2004 14:45:09 -0500 (EST)
    
    

    Thanks for all the responses. Most respondents thought it was definitely
    an ecache error on the CPU (400MHz, 8MB cache), and that the process
    running on it at the time could have been anything; it just happened to be
    the one mentioned. Since I have support and can't afford to have the
    machine go down outside of a maintenance window, I am going to swap out
    the CPU.
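
    To check whether one CPU keeps logging these events before committing to
    a swap, the log entries can be tallied per CPU. The following is only a
    minimal Python sketch, not a Sun-supplied tool: the function name
    count_aft_events and the regular expression are assumptions made for
    illustration, and it assumes each syslog entry sits on a single line
    (the quoted log below is wrapped by the mail client).

```python
import re
from collections import Counter

# Matches event summary lines such as "[AFT1] EDP event on CPU5".
# The pattern is an assumption based on the log excerpt in this thread.
EVENT_RE = re.compile(r"\[AFT\d\].*?\b(\w+) event on CPU(\d+)")


def count_aft_events(lines):
    """Count AFT events per (CPU number, event type). Hypothetical helper."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            event, cpu = match.group(1), int(match.group(2))
            counts[(cpu, event)] += 1
    return counts


# Three entries from the log quoted below, rejoined onto single lines.
SAMPLE = [
    "Jan 3 15:55:21 testbox SUNW,UltraSPARC-II: [ID 667935 kern.info] NOTICE: "
    "[AFT2] errID 0x002018e3.40e4dc3a DBI event on CPU5",
    "Jan 3 15:55:22 testbox SUNW,UltraSPARC-II: [ID 672661 kern.warning] "
    "WARNING: [AFT1] EDP event on CPU5 Data access at TL=0, "
    "errID 0x002018e3.8237619a",
    "Jan 3 15:55:22 testbox SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] "
    "E$Data (0x00): 0x00000001.00000000",
]

for (cpu, event), n in sorted(count_aft_events(SAMPLE).items()):
    print(f"CPU{cpu} {event}: {n}")
```

    In practice the list would be replaced by the lines of the system log
    file. Repeated hits on the same CPU number would be consistent with the
    hardware diagnosis above, though a single event proves little either way.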

    Regards,

    Adam Mazza

    On Sun, 4 Jan 2004, Adam Mazza wrote:

    > Hi,
    >
    > I had an E4500 reboot itself recently, and on a first glance at the
    > logfile I assumed it was a CPU issue, either the cache on the CPU or the
    > CPU itself. Then I noticed that the OS is reporting a java process
    > caused it to crash while in User mode. My understanding is that a user
    > process should never be able to do that, so I am wondering if the process
    > just tickled a HW issue, or if it's something in Solaris or the JRE. I
    > am running Solaris 8 02/02 with the recommended patch cluster from ~6
    > months ago. I am running JRE 1.4.1_01. Here is a snippet of the logfile:
    >
    > Jan 3 15:55:21 testbox SUNW,UltraSPARC-II: [ID 667935 kern.info] NOTICE:
    > [AFT2] errID 0x002018e3.40e4dc3a DBI event on CPU5
    > Jan 3 15:55:21 testbox SUNW,UltraSPARC-II: [ID 931584 kern.info] [AFT2]
    > errID 0x002018e3.40e4dc3a PA=0x00000000.85e5a0c0
    > Jan 3 15:55:21 testbox E$tag 0x00000000.09c010bc E$State: Modified
    > E$parity 0x04
    > Jan 3 15:55:21 testbox SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
    > E$Data (0x00): 0x00000001.00000000
    > Jan 3 15:55:21 testbox SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
    > E$Data (0x08): 0x0011d2bc.00000000
    > Jan 3 15:55:21 testbox SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
    > E$Data (0x10): 0xfa40a0a0.fa441550
    > Jan 3 15:55:21 testbox SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
    > E$Data (0x18): 0x00000000.00000000
    > Jan 3 15:55:21 testbox SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2]
    > E$Data (0x20): 0xcd51cb00.01100660 *Bad* PSYND=0x4000
    > Jan 3 15:55:21 testbox SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
    > E$Data (0x28): 0xb500012b.b80004ad
    > Jan 3 15:55:21 testbox SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
    > E$Data (0x30): 0xffc0e359.00000000
    > Jan 3 15:55:21 testbox SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
    > E$Data (0x38): 0x00000000.00150001
    > Jan 3 15:55:22 testbox SUNW,UltraSPARC-II: [ID 672661 kern.warning]
    > WARNING: [AFT1] EDP event on CPU5 Data access at TL=0,
    > errID 0x002018e3.8237619a
    > Jan 3 15:55:22 testbox AFSR 0x00000000.00404000<EDP> AFAR
    > 0x00000000.85e5a0e0
    > Jan 3 15:55:22 testbox AFSR.PSYND 0x4000(Score 95) AFSR.ETS 0x00
    > Fault_PC 0xfa40a2a4
    > Jan 3 15:55:22 testbox UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000
    > UDBL.ESYND 0x00
    > Jan 3 15:55:22 testbox SUNW,UltraSPARC-II: [ID 731945 kern.info] [AFT2]
    > errID 0x002018e3.8237619a PA=0x00000000.85e5a0e0
    > Jan 3 15:55:22 testbox E$tag 0x00000000.09c010bc E$State: Modified
    > E$parity 0x04
    > Jan 3 15:55:22 testbox SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
    > E$Data (0x00): 0x00000001.00000000
    > Jan 3 15:55:22 testbox SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
    > E$Data (0x08): 0x0011d2bd.00000000
    > Jan 3 15:55:22 testbox SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
    > E$Data (0x10): 0xfa40a0a0.fa441550
    > Jan 3 15:55:22 testbox SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
    > E$Data (0x18): 0x00000000.00000000
    > Jan 3 15:55:22 testbox SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2]
    > E$Data (0x20): 0xcd51cb00.01100660 *Bad* PSYND=0x4000
    > Jan 3 15:55:22 testbox SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
    > E$Data (0x28): 0xb500012b.b80004ad
    > Jan 3 15:55:22 testbox SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
    > E$Data (0x30): 0xffc0e359.00000000
    > Jan 3 15:55:22 testbox SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
    > E$Data (0x38): 0x00000000.00150001
    > Jan 3 15:55:22 testbox SUNW,UltraSPARC-II: [ID 734837 kern.info] [AFT2]
    > errID 0x002018e3.8237619a AFAR was derived from E$Tag
    > Jan 3 15:55:22 testbox unix: [ID 321153 kern.notice] NOTICE: Scheduling
    > clearing of error on page 0x00000000.85e5a000
    > Jan 3 15:55:22 testbox SUNW,UltraSPARC-II: [ID 130088 kern.info] [AFT3]
    > errID 0x002018e3.8237619a Above Error is in User Mode
    > Jan 3 15:55:22 testbox and is fatal: will reboot
    > Jan 3 15:55:22 testbox unix: [ID 855177 kern.warning] WARNING: [AFT1]
    > initiating reboot due to above error in pid 25059 (java)
    >
    >
    > Regards,
    >
    > Adam Mazza
    > PGP Key:http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x382775D1
    > Key fingerprint = 5A82 FA7F 459C E805 6C00 3211 48AC 6069 3827 75D1
    > _______________________________________________
    > sunmanagers mailing list
    > sunmanagers@sunmanagers.org
    > http://www.sunmanagers.org/mailman/listinfo/sunmanagers
    >
    >


