Crashing E220R(s)



I recently upgraded one of our workgroup servers from an E250 to an
E220R. The 220R had substantially more memory and CPU, so it was a
nice upgrade, even ignoring the fact that at the same time it moved from
a barely-maintained Solaris 7 installation to Solaris 10.

However, after a couple of months of service the 220 started crashing
two or three times a day. There was little useful diagnostic output,
but what there was suggested that either a memory module or a CPU
module was bad.

I swapped the disks over to an identical E220R we had about. I haven't
had a chance to run diagnostics on the old 220, although a simple run
through the OpenBoot tests didn't produce anything.

Now, after a few weeks in service, the *NEW* 220 has crashed today. Unlike
with the old one, I actually got some log messages this time:

Jul 5 07:39:17 E220R-2 SUNW,UltraSPARC-II: [ID 677095 kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU2 Data access at TL=0, errID 0x0006613d.f77cf492
Jul 5 07:39:17 E220R-2 AFSR 0x00000000.80200000<PRIV,UE> AFAR 0x00000000.bf41e4f8
Jul 5 07:39:17 E220R-2 AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x1041228
Jul 5 07:39:17 E220R-2 UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0203<UE> UDBL.ESYND 0x03
Jul 5 07:39:17 E220R-2 UDBL Syndrome 0x3 Memory Module U1001 U1002 U1003 U1004
Jul 5 07:39:18 E220R-2 SUNW,UltraSPARC-II: [ID 761793 kern.warning] WARNING: [AFT1] errID 0x0006613d.f77cf492 Syndrome 0x3 indicates that this may not be a memory module problem
Jul 5 07:39:18 E220R-2 SUNW,UltraSPARC-II: [ID 490361 kern.info] [AFT2] errID 0x0006613d.f77cf492 PA=0x00000000.bf41e4f8
Jul 5 07:39:18 E220R-2 E$tag 0x00000000.0fc017e8 E$State: Modified E$parity 0x07
Jul 5 07:39:18 E220R-2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x00000000.00000000
Jul 5 07:39:18 E220R-2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x00000000.00000000
Jul 5 07:39:18 E220R-2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x00000000.00000000
Jul 5 07:39:18 E220R-2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x00000000.00000000
Jul 5 07:39:18 E220R-2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0x00000000.00000000
Jul 5 07:39:18 E220R-2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x28): 0x00000000.00000000
Jul 5 07:39:18 E220R-2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0x00000000.00000000
Jul 5 07:39:18 E220R-2 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x38): 0x00000040.00000000 *Bad* PSYND=0x00ff
Jul 5 07:39:18 E220R-2 SUNW,UltraSPARC-II: [ID 301547 kern.warning] WARNING: [AFT1] Additional errors detected during error processing on CPU2 at TL=0, errID 0x0006613d.f77cf492
Jul 5 07:39:18 E220R-2 AFSR 0x00000000.008000ff<WP> AFAR 0x00000000.bf41e4f0
Jul 5 07:39:18 E220R-2 AFSR.PSYND 0x00ff(Score 05) AFSR.ETS 0x00 Fault_PC 0x1041228
Jul 5 07:39:18 E220R-2 UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0003 UDBL.ESYND 0x03
Jul 5 07:39:18 E220R-2 SUNW,UltraSPARC-II: [ID 940362 kern.warning] WARNING: [AFT1] AFAR was derived from UE report, CP event on CPU0 (caused Data access error on CPU2), errID 0x0006613d.f77cf492
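
For comparing this crash against later ones, a minimal sketch along the following lines could pull the key fields out of the messages. The function name `summarize` and the regular expressions are hypothetical; the patterns are guessed purely from the excerpt above and are not taken from any Solaris documentation.

```python
import re

# Patterns inferred from the log excerpt only (an assumption, not a spec).
ERRID_RE = re.compile(r"errID (0x[0-9a-f]+\.[0-9a-f]+)")
AFAR_RE = re.compile(r"AFAR (0x[0-9a-f]+\.[0-9a-f]+)")
BAD_RE = re.compile(
    r"E\$Data \((0x[0-9a-f]+)\): (0x[0-9a-f]+\.[0-9a-f]+) \*Bad\*"
)


def summarize(lines):
    """Collect errIDs, AFAR values and E$Data words flagged *Bad*."""
    summary = {"errids": set(), "afars": [], "bad_ecache": []}
    for line in lines:
        m = ERRID_RE.search(line)
        if m:
            summary["errids"].add(m.group(1))
        m = AFAR_RE.search(line)
        if m:
            summary["afars"].append(m.group(1))
        m = BAD_RE.search(line)
        if m:
            # (offset within the E$ line, data word reported as bad)
            summary["bad_ecache"].append((m.group(1), m.group(2)))
    return summary


if __name__ == "__main__":
    import sys

    # Usage: python script.py < /var/adm/messages
    print(summarize(sys.stdin))
```

If the same errID groups or the same E$Data offsets keep turning up across crashes, that would at least make it easier to say whether the failures look alike.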

Am I right in thinking that this is a manifestation of the Ultra-II ecache
problems from years ago? These servers - spare equipment we obtained at
no cost, and thus with no support contract - are the only ones of their
type we have. We essentially skipped over that entire generation, going
from the E250 straight to T2000s.

And... as I type this... it's just crashed again.

--
Brandon Hume - hume -> BOFH.Ca, http://WWW.BOFH.Ca/