2ND SUMMARY: UPDATE/Alpha particles and cosmic rays - Bcache Tag Parity Error

David.Knight_at_clubcorp.com
Date: 09/14/04

  • Next message: curt: "SUMMARY: "su" in init script"
    Date: Tue, 14 Sep 2004 09:02:20 -0500
    To: tru64-unix-managers@ornl.gov
    
    

    Managers,
    Here is what I have found. Thanks to all who replied; I really appreciate
    it!
    It seems that several IT shops out there have been given this same story
    by HP and even by Sun Microsystems.
    However, HP has not blamed cosmic causes alone. Some respondents report
    that HP also pointed the finger at RFI/RFC soundwaves.
    As one response recommended, I searched through some blogs and archives on
    solar activity and found (listed below under "Cosmic Reports") that there
    were indeed reports of solar activity on or near the dates of our BCACHE
    errors.
    I'm still not sure that I am a believer; however, several IT personnel
    out there are.
    Below is some interesting reference documentation that I was referred to,
    along with listings from the sites that record solar events.

    Thanks again to all that helped with this cosmic issue.

    -David

    Here is a note from a respected HP engineer who makes a good argument:

    It's basic physics. Both alpha particles and cosmic rays are what is
    called "ionizing radiation" -- stuff that when it interacts with other
    matter can induce ionization or in other words random electrical charge
    perturbations. When this happens in the context of, say, your radio
    or TV, it's heard or seen as "static". When it happens in your CPU
    or memory, it's apt to change a bit somewhere from a one to zero or
    vice versa. But usually it induces a single bit error. Depending
    on the part involved, such an error may be undetected, detected and
    corrected, or detected but not correctable. Parts that have parity
    checking (some data paths in the CPU but not all, most cache memory)
    but not EDC/ECC (error detection and correction) are susceptible to
    this "static" just as much as parts that either don't detect errors
    (often inside CPU chip registers, for example) or detect and correct
    errors; all it takes is one sufficiently energetic interaction in
    the wrong place and you've got an error. With "parity only" parts
    like the Bcache, such an error (if reported to the OS) is usually
    treated as fatal. Of course, if no OS is running, you can get the
    error and it will have no effect. And while sometimes the error that
    is detected is actually irrelevant, as a general rule (since
    the system software has to assume that the contents of memory and
    persistent storage matter and that data integrity is paramount)
    the only safe thing to do in the face of such an error is to halt
    the system (i.e., "panic").

    Parity errors can, of course, have other causes as well, including
    defective parts (either defective in design, or defective as a side
    effect of aging, usually due to heat stress). In most computer
    system applications, there is enough shielding against electrical
    and electronic "emissions" that alpha particles and cosmic rays are
    an unlikely cause of parity errors in the kinds of components that
    are provided with only error detection. The more likely cause in
    most cases where a part that had been reliable starts failing is
    heat induced failure.
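
    The distinction the note draws between "parity only" parts and EDC/ECC
    parts can be illustrated with a small sketch. This is not how the Bcache
    or any Alpha hardware is implemented; it is a minimal software model,
    assuming a single even-parity bit per word for the parity case and a
    textbook Hamming(7,4) code for the ECC case. All function names here are
    hypothetical and chosen only for illustration.

```python
from typing import Tuple


def parity_bit(word: int) -> int:
    """Even-parity bit: 1 when the word has an odd number of set bits."""
    return bin(word).count("1") % 2


def parity_ok(word: int, stored_parity: int) -> bool:
    """True when the word still matches the parity bit stored with it."""
    return parity_bit(word) == stored_parity


def flip_bit(word: int, pos: int) -> int:
    """Model a single-event upset: invert the bit at position pos."""
    return word ^ (1 << pos)


def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit Hamming codeword.

    Codeword positions 1..7 hold p1 p2 d0 p4 d1 d2 d3; bit i of the
    returned integer holds position i + 1.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # covers positions 4, 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    return sum(b << i for i, b in enumerate(bits))


def hamming74_correct(code: int) -> Tuple[int, int]:
    """Return (data nibble, syndrome) after fixing at most one flipped bit.

    The syndrome is 0 for a clean codeword, otherwise the 1-based
    position of the bit that was corrected.
    """
    bits = [(code >> i) & 1 for i in range(7)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos - 1]:
            syndrome ^= pos
    if syndrome:
        bits[syndrome - 1] ^= 1
    nibble = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return nibble, syndrome


if __name__ == "__main__":
    word = 0b10110010
    stored = parity_bit(word)
    upset = flip_bit(word, 3)
    # Parity notices the flip but cannot say which bit changed.
    print("parity detects single flip:", not parity_ok(upset, stored))

    codeword = hamming74_encode(0b1011)
    data, syndrome = hamming74_correct(flip_bit(codeword, 4))
    # ECC both locates and repairs the flipped bit.
    print("ECC recovered data:", bin(data), "corrected position:", syndrome)
```

    With parity alone the only information available is "this word is
    wrong", which is why the safe response is to halt. The Hamming code
    spends extra check bits to recover the position of the flipped bit, so
    a single-bit upset can be repaired and the system can keep running.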

    _____________________________________________________________________________

    ------A book on the topic has been published by Cypress Semiconductor
    Corp:

    http://www.eeproductcenter.com/showArticle.jhtml?articleID=46200051

    -----Research document published by the Nuclear Physics Laboratory,
    University of CO:

    http://www.taek.gov.tr/taek/tudnaem/yayinlar/yayinlar_pdf/fundamental/Fundamental-42.PDF

    ----A scientific explanation of cosmic rays:

    http://zebu.uoregon.edu/~js/glossary/cosmic_rays.html

    Cosmic Reports from two different Sites:

    --------- 1st:
    http://data.gns.cri.nz/hazardwatch/2003_02_01_solararch.html

    30.7.04
      A moderate geomagnetic storm occurred on 23-24 July, and on 25-26 July
      another, more severe, storm produced auroras in North America. A third
      geomagnetic storm reached extreme levels on 27-28 July and spectacular
      auroras were seen from Dunedin. All of these storms were caused by
      solar coronal mass-ejections associated with powerful flares.

    19.12.03
      Low level geomagnetic storms caused by the wind stream from a solar
      coronal hole continued until 15 December. Activity is presently at low
      levels, but gusts from another coronal hole may strike the Earth's
      magnetosphere on 21 or 22 December, causing more geomagnetic
      disturbances.

    21.2.03
      The Earth has been inside the high-speed wind stream from a solar
      coronal hole for the six days, but there have been only minor
      disturbances to Earth's magnetic field.
      2:12 PM

    14.2.03
      Solar conditions have been quiet since the aurora of 1-3 February.
      However, disturbances to Earth's magnetic field may increase as the
      solar wind stream from a hole in the sun's corona impacts the Earth on
      Saturday or Sunday.
      2:36 PM

    --------- 2nd:

    http://www.bbso.njit.edu/cgi-bin/ActivityReport

    BBSO Solar Activity Report 28-JUL-2004 17:29:42 UT
    Sunny with light winds and fair seeing.
    Solar activity has been at a slightly lower level with only C-class events
    from NOAA 0652. Region continues to decay and is expected to produce
    C-class and M-class events.

    NOAA 0652, N07W72. Decaying beta-gamma region. Region continues to decay
    both in sunspot area and magnetic complexity. Region has only produced
    C-class events since yesterday. Expect C-class and M-class events to
    continue.

    NOAA 0653, S14W75. Decaying region.

    NOAA 0654, N07E15. Simple beta region. Little change.

    Positions are for July 28,2004 at 17:00 UT.

    RF
     
    ~~~~
     
     
     Partly cloudy with high clouds.
     Solar activity has been low with multiple C-class events from NOAA 0525.
    The largest flare was a C8.6 at 0931 UT today. Solar activity should
    remain about the same with C-class events from NOAA 0525.
     
     NOAA 0520, S11W32. Slowly decaying beta region.
     
     NOAA 0521, S11W30. Decaying beta region.
     
     NOAA 0523, S15E48. Single stable sunspot.
     
     NOAA 0524, S08E44. Small beta region.
     
     NOAA 0525, N09E46. Beta-gamma region. Region remains mostly unchanged
    from yesterday. Region produced a C8.6@0931 UT today. Expect C-class
    events with a very slight chance for a low level M-class event.
     
     NOAA 0526, N12W64. Small beta region.
     
     NOAA 0527, S15W51. Small beta region.
     
     NOAA 052-, N08E75. Middle size single sunspot.
     
     Positions are for December 18,2003 at 17:00 UT.
     
     RF
     
    ~~~~~
     
     
     High thick clouds today.
     Solar activity has been very low; there is only one spotted region on
    the disk.
     
     NOAA 0285, S12W36. Decaying plage.
     
     NOAA 0288, N11E50. Simple beta region. Region remains mostly unchanged
    from yesterday. C-class event possible.
     
     Positions are for 16:00 UT.
     
     Solar activity is expected to remain very low. Region NOAA 0288 may
    produce a low level C-class event.
     
     RF

