Re: [PATCH] Machine Check Architecture on amd64



* Suleiman Souhlal <ssouhlal@xxxxxxxxxxx> wrote:
Hi,

I have a simple patch for amd64 that uses the Machine Check
Architecture/Exceptions on most recent x86 CPUs to detect memory errors:

http://people.freebsd.org/~ssouhlal/testing/mce-20070621.diff

It will report uncorrected and corrected errors (the latter, only if sysctl
machdep.mce.log_corrected=1).
You can ask the kernel to panic if it gets an uncorrected error by setting
machdep.mce.panic_on_uc=1.
All this can be disabled by setting the machdep.mce.enable tunable to 0. I'm
still not sure if I want this enabled by default, as I don't have any Intel
machines to test this on, but I have tested it on Opteron (both corrected
and uncorrected errors).

I would appreciate it if someone would try this, especially if you have
Intel machines with bad RAM.

Comments are welcome.

| /*
| * Uncorrected MCEs will generate a #MC, while corrected
| * don't, so we have to periodically poll for them.
| */

What about adding an option to print only uncorrected MCEs? That's the
most interesting data, and we can get it without using a kthread,
right?
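
To make the corrected/uncorrected distinction concrete, here is a minimal
sketch of how a bank status value could be classified. It is not taken from
the patch: the names mc_classify() and mc_should_log() are hypothetical, and
only the bit positions (VAL = bit 63, UC = bit 61 of MCi_STATUS) come from
the architecture. The log_corrected argument is assumed to mirror the
machdep.mce.log_corrected sysctl described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bits of a machine check bank status register (MCi_STATUS). */
#define MC_STATUS_VAL	(1ULL << 63)	/* register holds a valid error */
#define MC_STATUS_UC	(1ULL << 61)	/* hardware did not correct the error */

enum mc_kind { MC_NONE, MC_CORRECTED, MC_UNCORRECTED };

/* Hypothetical helper: classify one bank's status value. */
static enum mc_kind
mc_classify(uint64_t status)
{
	/* Without VAL set, the other bits carry no meaning. */
	if ((status & MC_STATUS_VAL) == 0)
		return (MC_NONE);
	return ((status & MC_STATUS_UC) != 0 ? MC_UNCORRECTED : MC_CORRECTED);
}

/*
 * Hypothetical policy helper: uncorrected errors are always reported,
 * corrected ones only when log_corrected is set (assumed to correspond
 * to machdep.mce.log_corrected).
 */
static bool
mc_should_log(uint64_t status, bool log_corrected)
{
	switch (mc_classify(status)) {
	case MC_UNCORRECTED:
		return (true);
	case MC_CORRECTED:
		return (log_corrected);
	default:
		return (false);
	}
}
```

With log_corrected left at false, only statuses with both VAL and UC set
would be reported, which is the "uncorrected only" behaviour asked about
here. Whether the poller can then be skipped entirely is a question about the
patch itself, not something this sketch answers.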

Nice work! :-)

--
Ed Schouten <ed@xxxxxx>
WWW: http://g-rave.nl/



