write errors

From: Charles Ballowe (hangman_at_steelballs.org)
Date: 09/26/03

    Date: Fri, 26 Sep 2003 09:41:07 -0500
    To: tru64-unix-managers@ornl.gov
    
    

    The DBAs called me the other day to report write errors on some files.
    Digging through the event log with evmget, I found errors like:
    25-Sep-2003 01:18:26 sys.unix.hw.error_counter_changed.disk._hwid.86 200 oraproddb A change has occurred in an error counter for device (HWID=86 lid=16)
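
    As a rough aid for spotting which devices keep reporting, here is a
    minimal sketch that tallies events like the one above by HWID. It assumes
    every formatted line follows the same layout as that single sample (date,
    time, event name, a numeric field taken here to be the priority, host,
    message). The names parse_event and count_by_hwid are hypothetical helpers
    and not part of any Tru64 tool.

```python
import re
from collections import Counter

# Assumed layout, inferred from the one event line quoted in the post:
# <date> <time> <event name> <numeric field> <host> <message>
# Treating the numeric field as a priority is an assumption.
EVENT_RE = re.compile(
    r"^(?P<date>\d{2}-[A-Za-z]{3}-\d{4})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<name>\S+)\s+"
    r"(?P<priority>\d+)\s+"
    r"(?P<host>\S+)\s+"
    r"(?P<message>.*)$"
)
HWID_RE = re.compile(r"HWID=(\d+)")


def parse_event(line):
    """Split one formatted event line into fields; return None if it does not match."""
    m = EVENT_RE.match(line.strip())
    if m is None:
        return None
    event = m.groupdict()
    hw = HWID_RE.search(event["message"])
    event["hwid"] = int(hw.group(1)) if hw else None
    return event


def count_by_hwid(lines):
    """Count error_counter_changed events per hardware ID."""
    counts = Counter()
    for line in lines:
        event = parse_event(line)
        if (
            event is not None
            and event["hwid"] is not None
            and "error_counter_changed" in event["name"]
        ):
            counts[event["hwid"]] += 1
    return counts


# The event line quoted in the post, used as sample input.
SAMPLE = (
    "25-Sep-2003 01:18:26 sys.unix.hw.error_counter_changed.disk._hwid.86 "
    "200 oraproddb A change has occurred in an error counter for device "
    "(HWID=86 lid=16)"
)

if __name__ == "__main__":
    print(count_by_hwid([SAMPLE]))
```

    Saving the formatted event output to a file and passing its lines to
    count_by_hwid would give a per-device tally. If no new events show up for
    any HWID after the disk swap, that is consistent with the remaining write
    errors coming from somewhere other than the hardware.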

    I placed a support call with HP and sent them the binary.errlog. They were
    able to determine which disk in that RAID set was having problems, and it
    was replaced.
    The DBAs are still having their problems, but I am no longer seeing anything
    logged to the errlog. I don't believe it is a hardware problem at this point,
    so I'm wondering whether some error state could be stored in the kernel that
    won't be cleared until the system reboots or some other action is taken.
    Is there something else I should be looking for?

    Any thoughts on how to get past this would be appreciated.
    The system is Tru64 V5.1A PK5 and is clustered; both members are GS-80s.

    -Charlie

