SUMMARY: Memory group (A0) failed on V880 CPU/mem board
- From: Stoyan Genov <stoyan.genov@xxxxxxxxxxxxxx>
- Date: Tue, 28 Feb 2006 09:56:05 +0100
Good day,
Although not definitive, it seems the CPU/memory board might be
causing the trouble. I have had no chance to replace the board yet.
I have replaced the failed DIMMs, leaving group B1 empty, and there
have been no errors so far.
Thanks to Joe Fletcher and Sandwich Maker:
Joe Fletcher wrote:
Sounds like what used to be a fairly common fault on a load of the
UltraIII stuff. The fault is most likely a flaky system board. A call
in to SUN and they will replace it. I thought they'd mostly sorted the
manufacturing problems, especially with the 1.2GHz versions and upwards
but I guess there will always be a few duff ones around.
Sandwich Maker wrote:
i haven't experienced it personally, but iirc there was an earlier
generation of boards [500MHz?] that would show memory errors if the
cpu itself wasn't secured properly on the board. notoriously on these
boards the cpu heatsink screws were often either too loose or worse,
too tight, leading to phantom dimm errors. iirc a clue was that all
memory would suddenly show bad.
Best Regards,
Stoyan Genov
Stoyan Genov wrote:
Good day,
A fully-equipped V880 (8 x CPU @ 1.2GHz, 4 boards, 64GB RAM),
spontaneously and irregularly restarted a couple of times.
Logs from two days ago showed a soft memory error on Slot D, J8101.
After the restarts, it no longer showed errors in this bank, but reported
all banks in the required group A0 (J3000, J3001, J2900, J2901) with
hard errors. The machine is configured to restart on hardware errors
(error-reset-recovery=boot in eeprom), so I believe restarts are normal
given the errors and the configuration.
I have asr-disable'd cpu5 and cpu7, thus cutting off access to this
board and its memory.
I have the chance to swap the DIMMs reported as faulty in the next hour.
What bothers me:
Am I too paranoid to think that a simultaneous fault of all DIMMs in one
group is not actually a problem with the DIMMs?
Is it possible that the board is faulty?
Is it possible that another failed DIMM (J8101) is actually causing the
trouble?
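
The suspicion behind these questions can be put as a rough rule of thumb:
if every DIMM in a group is reported bad at once, the common path (board
or CPU) is a more likely suspect than the DIMMs themselves. A minimal
Python sketch of that rule follows. The helper name classify and the
GROUPS table are hypothetical; only the A0 sockets named in this post are
listed, and the socket names of the other groups are not assumed.

```python
# Hypothetical group table: only group A0 is known from this post.
GROUPS = {
    "A0": {"J3000", "J3001", "J2900", "J2901"},
}


def classify(failed_sockets, groups=GROUPS):
    """Return {group: verdict} for every group with at least one failed DIMM.

    This is a rule of thumb, not a diagnosis: a whole group failing at once
    points at something the DIMMs share, a partial failure at the DIMMs.
    """
    failed = set(failed_sockets)
    verdicts = {}
    for name, sockets in groups.items():
        bad = sockets & failed
        if not bad:
            continue  # nothing reported for this group
        if bad == sockets:
            verdicts[name] = "whole group failed: suspect board or CPU seating"
        else:
            verdicts[name] = "partial failure: suspect DIMM(s) " + ", ".join(sorted(bad))
    return verdicts


if __name__ == "__main__":
    # The situation described above: all four A0 sockets report hard errors.
    print(classify(["J3000", "J3001", "J2900", "J2901"]))
    # For contrast: a single socket in the group reports an error.
    print(classify(["J3001"]))
```

Sockets outside the table (J8101, for example) are simply ignored by this
sketch, since their group membership is not stated here.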
Any comments and advice are welcome. I will summarize.
sunmanagers mailing list
sunmanagers@xxxxxxxxxxxxxxx
http://www.sunmanagers.org/mailman/listinfo/sunmanagers