v880 memory problem

From: Thomas Carter (TCarter_at_memc.com)
Date: 05/28/03

  • Next message: Kumar Guhan: "correctly assessing memory usage"
    To: sunmanagers@sunmanagers.org
    Date: Wed, 28 May 2003 10:48:23 -0500
    
    

    I have a V880 halfway around the world (in Japan) that is exhibiting
    memory problems:

    May 25 08:26:31 coeha02 SUNW,UltraSPARC-III: [ID 404677 kern.info] NOTICE:
    [AFT0] Corrected system bus (CE) Event detected by CPU3 at TL=0, errID
    0x00185cc9.e6b46ca0
    May 25 08:26:31 coeha02 AFSR 0x00000002<CE>.00000037 AFAR
    0x00000040.ca397780
    May 25 08:26:31 coeha02 Fault_PC 0x1009d0db4 Esynd 0x0037 Slot B:
    J3200
    May 25 08:26:31 coeha02 SUNW,UltraSPARC-III: [ID 897010 kern.info] [AFT0]
    errID 0x00185cc9.e6b46ca0 Corrected Memory Error on Slot B: J3200 is
    Persistent
    May 25 08:26:31 coeha02 SUNW,UltraSPARC-III: [ID 693738 kern.info] [AFT0]
    errID 0x00185cc9.e6b46ca0 Data Bit 66 was in error and corrected
    May 25 08:26:31 coeha02 unix: [ID 596940 kern.warning] WARNING: [AFT0] 14
    soft errors in less than 24:00 (hh:mm) detected from Memory Module Slot B:
    J3200
    May 25 08:26:31 coeha02 SUNW,UltraSPARC-III: [ID 587212 kern.info] [AFT2]
    errID 0x00185cc9.e6b46ca0 PA=0x00000040.ca397780
    May 25 08:26:31 coeha02 E$tag 0x00000081.94480000 E$state_6 Exclusive
    May 25 08:26:31 coeha02 SUNW,UltraSPARC-III: [ID 895151 kern.info] [AFT2]
    E$Data (0x00) 0x2d302e30.32323334 0x014e2c01.0a094d4a ECC 0x03b
    May 25 08:26:31 coeha02 SUNW,UltraSPARC-III: [ID 895151 kern.info] [AFT2]
    E$Data (0x10) 0x4c505244.3330300b 0x33523244.44414130 ECC 0x13a
    May 25 08:26:31 coeha02 SUNW,UltraSPARC-III: [ID 895151 kern.info] [AFT2]
    E$Data (0x20) 0x34303205.53465044 0x50043536.303003c2 ECC 0x00e
    May 25 08:26:31 coeha02 SUNW,UltraSPARC-III: [ID 895151 kern.info] [AFT2]
    E$Data (0x30) 0x04080933.52324444 0x41413034.02c10301 ECC 0x13d
    May 25 08:26:31 coeha02 SUNW,UltraSPARC-III: [ID 422670 kern.info] [AFT2]
    D$Tag 0x040ca397 D$state Valid D$utag 0xad D$snp 0x040ca396
    May 25 08:26:31 coeha02 SUNW,UltraSPARC-III: [ID 582021 kern.info] [AFT2]
    PAtag 0x040.ca397780 PAsnp 0x040.ca397780 VAutag 0x2b7780
    May 25 08:26:31 coeha02 SUNW,UltraSPARC-III: [ID 842398 kern.info] [AFT2]
    D$Data (0x00) 0x2d302e30.32323334 0x014e2c01.0a094d4a
    May 25 08:26:31 coeha02 SUNW,UltraSPARC-III: [ID 842398 kern.info] [AFT2]
    D$Data (0x10) 0x4c505244.3330300b 0x33523244.44414130
    May 25 08:26:31 coeha02 SUNW,UltraSPARC-III: [ID 335345 kern.info] [AFT2]
    I$ data not available
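
When many of these events accumulate, it can help to tally which module and which data bit they name. The following is a minimal Python sketch, not part of the original post. It assumes the messages look like the excerpt above, including the mail-wrapped continuation lines, and the names `unwrap` and `summarize` are hypothetical helpers introduced only for illustration.

```python
import re
from collections import Counter

# Assumption: each syslog record starts with a timestamp like "May 25 08:26:31".
# Any line that does not is treated as a wrapped continuation of the previous one.
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s")
MODULE = re.compile(r"Corrected Memory Error on (.+?) is (\w+)")
DATA_BIT = re.compile(r"Data Bit\s+(\d+) was in error")


def unwrap(text):
    """Rejoin mail-wrapped lines into one string per syslog record."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if TIMESTAMP.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def summarize(text):
    """Count corrected-error reports per (module, classification) and per data bit."""
    modules = Counter()
    bits = Counter()
    for record in unwrap(text):
        m = MODULE.search(record)
        if m:
            modules[(m.group(1), m.group(2))] += 1
        b = DATA_BIT.search(record)
        if b:
            bits[int(b.group(1))] += 1
    return modules, bits


SAMPLE = """\
May 25 08:26:31 coeha02 SUNW,UltraSPARC-III: [ID 897010 kern.info] [AFT0]
errID 0x00185cc9.e6b46ca0 Corrected Memory Error on Slot B: J3200 is
Persistent
May 25 08:26:31 coeha02 SUNW,UltraSPARC-III: [ID 693738 kern.info] [AFT0]
errID 0x00185cc9.e6b46ca0 Data Bit 66 was in error and corrected
"""

if __name__ == "__main__":
    modules, bits = summarize(SAMPLE)
    for (module, kind), count in modules.items():
        print(f"{module}: {count} report(s), classified {kind}")
    for bit, count in bits.items():
        print(f"data bit {bit}: {count} report(s)")
```

Running the same function over a full messages file would show whether the reports keep pointing at the same module and bit, which is useful to know before scheduling a swap.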

    And the memory configuration (from prtdiag) is this:
                   Logical  Logical  Logical
              MC   Bank     Bank     Bank         DIMM    Interleave  Interleaved
     Brd      ID   num      size     Status       Size    Factor      with
    ----      ---  ----     ------   -----------  ------  ----------  -----------
      A        0     0      512MB    no_status    256MB   8-way       0
      A        0     1      512MB    no_status    256MB   8-way       0
      A        0     2      512MB    no_status    256MB   8-way       0
      A        0     3      512MB    no_status    256MB   8-way       0
      B        1     0      512MB    no_status    256MB   8-way       1
      B        1     1      512MB    no_status    256MB   8-way       1
      B        1     2      512MB    no_status    256MB   8-way       1
      B        1     3      512MB    no_status    256MB   8-way       1
      A        2     0      512MB    no_status    256MB   8-way       0
      A        2     1      512MB    no_status    256MB   8-way       0
      A        2     2      512MB    no_status    256MB   8-way       0
      A        2     3      512MB    no_status    256MB   8-way       0
      B        3     0      512MB    no_status    256MB   8-way       1
      B        3     1      512MB    no_status    256MB   8-way       1
      B        3     2      512MB    no_status    256MB   8-way       1
      B        3     3      512MB    no_status    256MB   8-way       1
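
To see how much memory sits on each board, the bank sizes in the listing can be summed. This is a rough Python sketch under stated assumptions: each data row has eight whitespace-separated fields, the fourth field is the logical bank size in MB, and `bank_totals` is a hypothetical helper name, not a Solaris tool.

```python
from collections import defaultdict


def bank_totals(listing):
    """Sum the logical bank sizes (MB) per board from prtdiag-style rows."""
    totals = defaultdict(int)
    for row in listing.splitlines():
        fields = row.split()
        # Skip headers, rulers, and anything that is not an 8-field data row.
        if len(fields) != 8 or not fields[3].endswith("MB"):
            continue
        board, size_mb = fields[0], int(fields[3][:-2])
        totals[board] += size_mb
    return dict(totals)


ROWS = """\
A 0 0 512MB no_status 256MB 8-way 0
A 0 1 512MB no_status 256MB 8-way 0
B 1 0 512MB no_status 256MB 8-way 1
"""

if __name__ == "__main__":
    for board, total in sorted(bank_totals(ROWS).items()):
        print(f"board {board}: {total} MB")
```

Applied to all 16 banks in the listing, this gives 4096 MB on board A and 4096 MB on board B, 8 GB in total.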

    Is there a way to disable this memory in Solaris until we have time to
    shut down the machine and swap the memory module?

    Thanks,
    Thomas Carter
    MEMC Southwest
    _______________________________________________
    sunmanagers mailing list
    sunmanagers@sunmanagers.org
    http://www.sunmanagers.org/mailman/listinfo/sunmanagers

