V240 ECC errors
From: Chris Cameron (Chris.Cameron_at_NetThruPut.com)
Date: 10/27/04
To: sunmanagers@sunmanagers.org
Date: Wed, 27 Oct 2004 11:28:35 -0600
Have a V240 that isn't happy. We have a hardware support contract
through a 3rd party (bad idea), and they're insisting that the error
messages aren't pointing to any given component in the server. Because
of this they're dragging their feet on doing anything.
I'm certain the error is in memory, and I'm 99% sure that the error
message points to (at least) the bank that the error is coming from.
Could someone give me their interpretation of this information?
And just to pilfer more information from this post: have many people
here experienced RAM going bad on its own? This server has been working
fine for 8 months now and its older V240 brother hasn't had any
problems.
Thanks,
Chris
SUNWvts fails on memory with the error:
10/26/04 17:42:43 prod2 SunVTS5.1ps2: VTSID 6002 pmemtest.ERROR mem: "2
persistent errors on MB/P1/B0/D0: B0/D0."
10/26/04 17:42:43 prod2 SunVTS5.1ps2: VTSID 7012 vtsk.INFO : *Failed
test*
mem(pmemtest) passes: 56 errors: 1
10/26/04 17:43:23 prod2 SunVTS5.1ps2: VTSID 7005 vtsk.INFO : *Stop all
tests*
System Passes: 30, Cumulative Errors: 1, Elapsed Test Time: 000:47:53
cpu-unit0(iutest) passes: 3493 errors: 0
cpu-unit0(iutest).1 passes: 3486 errors: 0
cpu-unit0(fputest) passes: 134 errors: 0
cpu-unit0(fputest).1 passes: 134 errors: 0
cpu-unit1(iutest) passes: 3463 errors: 0
cpu-unit1(iutest).1 passes: 3468 errors: 0
cpu-unit1(fputest) passes: 135 errors: 0
cpu-unit1(fputest).1 passes: 135 errors: 0
kmem(vmemtest) passes: 31 errors: 0
kmem(vmemtest).1 passes: 30 errors: 0
mem(pmemtest) passes: 56 errors: 1
mem(pmemtest).1 passes: 57 errors: 0
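For a long SunVTS run, one way to pull out only the tests that reported errors is a small filter over the per-test summary lines. The following is a minimal sketch, assuming the summary has been saved as text in the "name passes: N errors: N" layout shown above; the names VTS_SUMMARY and failing_tests are made up for illustration and are not part of SunVTS.

```python
import re

# A few summary lines copied from the SunVTS output above.
VTS_SUMMARY = """\
cpu-unit0(iutest) passes: 3493 errors: 0
cpu-unit1(fputest).1 passes: 135 errors: 0
kmem(vmemtest) passes: 31 errors: 0
mem(pmemtest) passes: 56 errors: 1
mem(pmemtest).1 passes: 57 errors: 0
"""

# One summary line: test name, pass count, error count.
LINE_RE = re.compile(r"^(\S+) passes: (\d+) errors: (\d+)$", re.MULTILINE)


def failing_tests(summary):
    """Return (name, passes, errors) for every test with a nonzero error count."""
    return [
        (name, int(passes), int(errors))
        for name, passes, errors in LINE_RE.findall(summary)
        if int(errors) > 0
    ]


if __name__ == "__main__":
    for name, passes, errors in failing_tests(VTS_SUMMARY):
        print(f"{name}: {errors} error(s) in {passes} passes")
```

On the lines above, only mem(pmemtest) is reported, which matches the single cumulative error in the run.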
A (much) shortened /var/adm/messages:
Oct 27 08:23:06 prod2 SUNW,UltraSPARC-IIIi: [ID 147594 kern.info]
NOTICE: [AFT0] Corrected memory (CE) Event detected by CPU1 at TL=0,
errID 0x0000d080.af559597
Oct 27 08:23:06 prod2 AFSR 0x00100002<PRIV,CE>.00000051 AFAR
0x00000012.36eba6a0
Oct 27 08:23:06 prod2 Fault_PC 0x1009a608 Esynd 0x0051 MB/P1/B0/D0:
B0/D0
Oct 27 08:23:06 prod2 SUNW,UltraSPARC-IIIi: [ID 419725 kern.info] [AFT0]
errID 0x0000d080.af559597 Corrected Memory Error on MB/P1/B0/D0: B0/D0
is Persistent
Oct 27 08:23:06 prod2 SUNW,UltraSPARC-IIIi: [ID 557903 kern.info] [AFT0]
errID 0x0000d080.af559597 Data Bit 44 was in error and corrected
Oct 27 08:23:06 prod2 unix: [ID 596940 kern.warning] WARNING: [AFT0] 118
soft errors in less than 24:00 (hh:mm) detected from Memory Module
MB/P1/B0/D0: B0/D0
Oct 27 08:23:06 prod2 SUNW,UltraSPARC-IIIi: [ID 548377 kern.info] [AFT2]
errID 0x0000d080.af559597 PA=0x00000012.36eba680
Oct 27 08:23:06 prod2 E$tag 0x00000000.16048dba E$state Exclusive
E$indx 1.000ba680
Oct 27 08:23:06 prod2 SUNW,UltraSPARC-IIIi: [ID 895151 kern.info] [AFT2]
E$Data (0x00) 0x00000000.00000000 0x00000000.00000000 ECC 0x000
Oct 27 08:23:06 prod2 SUNW,UltraSPARC-IIIi: [ID 895151 kern.info] [AFT2]
E$Data (0x10) 0x00907466.030108c0 0x00000000.00000000 ECC 0x04e
Oct 27 08:23:06 prod2 SUNW,UltraSPARC-IIIi: [ID 895151 kern.info] [AFT2]
E$Data (0x20) 0x00000300.02f1c0a8 0x00000310.04b3b820 ECC 0x008
Oct 27 08:23:06 prod2 SUNW,UltraSPARC-IIIi: [ID 895151 kern.info] [AFT2]
E$Data (0x30) 0x00000310.04aba700 0x00000310.04aba640 ECC 0x1b0
Oct 27 08:23:06 prod2 SUNW,UltraSPARC-IIIi: [ID 929717 kern.info] [AFT2]
D$ data not available
Oct 27 08:23:06 prod2 SUNW,UltraSPARC-IIIi: [ID 335345 kern.info] [AFT2]
I$ data not available
Oct 27 08:23:12 prod2 SUNW,UltraSPARC-IIIi: [ID 460234 kern.info]
NOTICE: [AFT0] Corrected memory (CE) Event detected by CPU1 at TL=0,
errID 0x0000d082.149b28a7
Oct 27 08:23:12 prod2 AFSR 0x00100002<PRIV,CE>.00000007 AFAR
0x00000012.36ebb540
Oct 27 08:23:12 prod2 Fault_PC <unknown> Esynd 0x0007 MB/P1/B0/D0:
B0/D0
Oct 27 08:23:12 prod2 SUNW,UltraSPARC-IIIi: [ID 983987 kern.info] [AFT0]
errID 0x0000d082.149b28a7 Corrected Memory Error on MB/P1/B0/D0: B0/D0
is Intermittent
Oct 27 08:23:12 prod2 SUNW,UltraSPARC-IIIi: [ID 779590 kern.info] [AFT0]
errID 0x0000d082.149b28a7 Data Bit 47 was in error and corrected
Oct 27 10:28:06 prod2 SUNW,UltraSPARC-IIIi: [ID 776888 kern.info]
NOTICE: [AFT0] Corrected memory (FRC) Event detected by CPU1 at TL=0,
errID 0x0000d752.e47752a2
Oct 27 10:28:06 prod2 AFSR 0x00000000.10000051<FRC> AFAR
0x00000012.36ebb540 INVALID
Oct 27 10:28:06 prod2 Fault_PC 0x100456e8 Esynd 0x0051 J_AID 0
Oct 27 10:28:06 prod2 SUNW,UltraSPARC-IIIi: [ID 178586 kern.info] [AFT0]
errID 0x0000d752.e47752a2 Data Bit 44 was in error and corrected
Oct 27 10:28:06 prod2 SUNW,UltraSPARC-IIIi: [ID 776888 kern.info]
NOTICE: [AFT0] Corrected memory (FRC) Event detected by CPU1 at TL=0,
errID 0x0000d752.e47752a2
Oct 27 10:28:06 prod2 AFSR 0x00000000.10000051<FRC> AFAR
0x00000012.36ebb540 INVALID
Oct 27 10:28:06 prod2 Fault_PC 0x100456e8 Esynd 0x0051 J_AID 0
Oct 27 10:28:06 prod2 SUNW,UltraSPARC-IIIi: [ID 178586 kern.info] [AFT0]
errID 0x0000d752.e47752a2 Data Bit 44 was in error and corrected
Oct 27 10:28:06 prod2 SUNW,UltraSPARC-IIIi: [ID 524728 kern.info]
NOTICE: [AFT0] Corrected remote memory/cache (RCE) Event detected by
CPU0 at TL=0, errID 0x0000d752.e4776390
Oct 27 10:28:06 prod2 AFSR 0x00100000<PRIV>.81000000<RCE> AFAR
0x00000012.36eba6a0
Oct 27 10:28:06 prod2 Fault_PC 0x1009a608 J_REQ 1
Oct 27 10:28:06 prod2 MB/P1/B0: B0/D0 B0/D1 (applicable only if
corresponding FRC Event also logged)
Oct 27 10:28:06 prod2 SUNW,UltraSPARC-IIIi: [ID 470073 kern.info] [AFT2]
errID 0x0000d752.e4776390 PA=0x00000012.36eba680
Oct 27 10:28:06 prod2 E$tag 0x00000000.16048dba E$state Exclusive
E$indx 0.000ba680
Oct 27 10:28:06 prod2 SUNW,UltraSPARC-IIIi: [ID 895151 kern.info] [AFT2]
E$Data (0x00) 0x00000000.00000000 0x00000000.00000000 ECC 0x000
Oct 27 10:28:06 prod2 SUNW,UltraSPARC-IIIi: [ID 895151 kern.info] [AFT2]
E$Data (0x10) 0x00907466.030108c0 0x00000000.00000000 ECC 0x04e
Oct 27 10:28:06 prod2 SUNW,UltraSPARC-IIIi: [ID 895151 kern.info] [AFT2]
E$Data (0x20) 0x00000300.02f1c0a8 0x00000310.04b3b820 ECC 0x008
Oct 27 10:28:06 prod2 SUNW,UltraSPARC-IIIi: [ID 895151 kern.info] [AFT2]
E$Data (0x30) 0x00000310.04aba700 0x00000310.04aba640 ECC 0x1b0
Oct 27 10:28:06 prod2 SUNW,UltraSPARC-IIIi: [ID 929717 kern.info] [AFT2]
D$ data not available
Oct 27 10:28:06 prod2 SUNW,UltraSPARC-IIIi: [ID 335345 kern.info] [AFT2]
I$ data not available
Oct 27 10:28:12 prod2 SUNW,UltraSPARC-IIIi: [ID 416787 kern.info]
NOTICE: [AFT0] Corrected memory (FRC) Event detected by CPU1 at TL=0,
errID 0x0000d754.4a13c207
Oct 27 10:28:12 prod2 AFSR 0x00000000.10000007<FRC> AFAR
0x00000012.36ebb540 INVALID
Oct 27 10:28:12 prod2 Fault_PC <unknown> Esynd 0x0007 J_AID 0
Oct 27 10:28:12 prod2 SUNW,UltraSPARC-IIIi: [ID 696726 kern.info] [AFT0]
errID 0x0000d754.4a13c207 Data Bit 47 was in error and corrected
Oct 27 10:28:22 prod2 SUNW,UltraSPARC-IIIi: [ID 717637 kern.info]
NOTICE: [AFT0] Corrected remote memory/cache (RCE) Event detected by
CPU0 at TL=0, errID 0x0000d756.b7d82b5c
Oct 27 10:28:22 prod2 AFSR 0x00100000<PRIV>.81000000<RCE> AFAR
0x00000012.36ebb540
Oct 27 10:28:22 prod2 Fault_PC <unknown> J_REQ 1
Oct 27 10:28:22 prod2 MB/P1/B0: B0/D0 B0/D1 (applicable only if
corresponding FRC Event also logged)
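To show a support vendor how consistently the log names one module, the corrected-error entries can be tallied by module and by data bit. This is a rough sketch, assuming the log text follows the wording of the excerpt above; the regular expressions and the names SAMPLE and summarize are illustrative only, and the whitespace is flattened first because the lines here were wrapped by the mail client.

```python
import re
from collections import Counter

# Entries copied from the excerpt above, wrapped as they appear in the email.
SAMPLE = """\
Oct 27 08:23:06 prod2 SUNW,UltraSPARC-IIIi: [ID 419725 kern.info] [AFT0]
errID 0x0000d080.af559597 Corrected Memory Error on MB/P1/B0/D0: B0/D0
is Persistent
Oct 27 08:23:06 prod2 SUNW,UltraSPARC-IIIi: [ID 557903 kern.info] [AFT0]
errID 0x0000d080.af559597 Data Bit 44 was in error and corrected
Oct 27 08:23:12 prod2 SUNW,UltraSPARC-IIIi: [ID 983987 kern.info] [AFT0]
errID 0x0000d082.149b28a7 Corrected Memory Error on MB/P1/B0/D0: B0/D0
is Intermittent
Oct 27 08:23:12 prod2 SUNW,UltraSPARC-IIIi: [ID 779590 kern.info] [AFT0]
errID 0x0000d082.149b28a7 Data Bit 47 was in error and corrected
"""

# Module path and classification (Persistent / Intermittent) of a corrected error.
MODULE_RE = re.compile(r"Corrected Memory Error on (\S+): \S+ is (\w+)")
# Data bit reported as corrected for a given errID.
BIT_RE = re.compile(r"errID (\S+) Data Bit (\d+) was in error")


def summarize(text):
    """Count corrected errors per (module, classification) and per data bit."""
    flat = " ".join(text.split())  # undo the email line wrapping
    modules = Counter(MODULE_RE.findall(flat))
    bits = Counter(int(bit) for _errid, bit in BIT_RE.findall(flat))
    return modules, bits


if __name__ == "__main__":
    modules, bits = summarize(SAMPLE)
    for (module, kind), count in modules.items():
        print(f"{module}: {count} {kind}")
    for bit, count in sorted(bits.items()):
        print(f"data bit {bit}: {count}")
```

Run against the full /var/adm/messages rather than this sample, the same counts would show whether every corrected error lands on MB/P1/B0/D0 and whether the same data bits (44 and 47 in the excerpt) keep recurring.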