RA7000 I/O errors

From: Bryan Daniel (Bryan.Daniel_at_ucfv.bc.ca)
Date: 05/22/03

    Date: Thu, 22 May 2003 14:49:41 -0700
    To: tru64-unix-managers@ornl.gov
    
    

    Managers:

    I have an RA7000 RAID array with HSZ70 controllers attached to a DS20.
    Some time ago one of the controllers failed and has not yet been
    replaced, so we have been running on a single controller. It has
    been working all right, but last week and again today we experienced I/O
    errors on one of the RAID sets, and the following error was written
    continually to the messages log:

    May 22 10:45:30 whistler vmunix: Deferring I/O (errno 5) for block(0x31a440, 0x31a440) on device 19,162
    May 22 10:45:30 whistler vmunix: Deferring I/O (errno 5) for block(0x3065c0, 0x3065c0) on device 19,162
    May 22 10:45:30 whistler vmunix: Deferring I/O (errno 5) for block(0x595440, 0x595440) on device 19,162

    The only way I found to clear the error was to reboot the server.

    What I am wondering is, could this be a problem caused by an overrun of
    the single controller? It only seems to occur when we have high I/O for
    an extended period such as a database export. Any advice would be
    helpful.
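One way to test the high-I/O theory is to check which devices the deferred-I/O messages name and whether they cluster in time, for example inside the export window. The Python sketch below is hypothetical (DEFER_RE, summarize_deferred_io and SAMPLE are names made up for illustration) and assumes each message occupies one line of the messages log in exactly the format quoted above. It counts messages per major,minor device pair, per errno value (errno 5 is EIO on most Unix systems) and per minute.

```python
import errno
import re
import sys
from collections import Counter

# Hypothetical pattern for the vmunix "Deferring I/O" message quoted above,
# assuming each message occupies a single line in the messages log.
DEFER_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) (?P<host>\S+) vmunix: "
    r"Deferring I/O \(errno (?P<errno>\d+)\) for "
    r"block\((?P<start>0x[0-9a-fA-F]+), (?P<end>0x[0-9a-fA-F]+)\) "
    r"on device (?P<major>\d+),(?P<minor>\d+)"
)


def summarize_deferred_io(lines):
    """Tally deferred-I/O messages per device, per errno and per minute."""
    per_device = Counter()  # (major, minor) -> count
    per_errno = Counter()   # errno value -> count
    per_minute = Counter()  # "Mon DD HH:MM" -> count, in log order
    for line in lines:
        m = DEFER_RE.match(line.strip())
        if not m:
            continue  # skip unrelated log lines
        per_device[(int(m["major"]), int(m["minor"]))] += 1
        per_errno[int(m["errno"])] += 1
        per_minute[m["ts"].rsplit(":", 1)[0]] += 1  # drop the seconds
    return per_device, per_errno, per_minute


# The three messages quoted in the post, one per line.
SAMPLE = [
    "May 22 10:45:30 whistler vmunix: Deferring I/O (errno 5) for "
    "block(0x31a440, 0x31a440) on device 19,162",
    "May 22 10:45:30 whistler vmunix: Deferring I/O (errno 5) for "
    "block(0x3065c0, 0x3065c0) on device 19,162",
    "May 22 10:45:30 whistler vmunix: Deferring I/O (errno 5) for "
    "block(0x595440, 0x595440) on device 19,162",
]


def main(argv):
    # Read a log file named on the command line, else use the sample lines.
    if len(argv) > 1:
        with open(argv[1], errors="replace") as f:
            per_device, per_errno, per_minute = summarize_deferred_io(f)
    else:
        per_device, per_errno, per_minute = summarize_deferred_io(SAMPLE)
    for (major, minor), n in sorted(per_device.items()):
        print(f"device {major},{minor}: {n} deferred I/Os")
    for code, n in sorted(per_errno.items()):
        print(f"errno {code} ({errno.errorcode.get(code, 'unknown')}): {n}")
    for minute, n in per_minute.items():
        print(f"{minute}: {n}")


if __name__ == "__main__":
    main(sys.argv)
```

Comparing the per-minute counts against the start and end times of the export would show whether the bursts coincide with it; the script says nothing about the controller itself.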

    The attached message was sent to the root account.

    Thank you,
    Bryan Daniel
    Systems Administrator
    University College of the Fraser Valley
    Abbotsford, BC Canada

    Subject: EVM ALERT [700]: SCSI event

    ======================= Binary Error Log event =======================
    EVM event name: sys.unix.binlog.hw.scsi

        Binary error log events are posted through the binlogd daemon, and
        stored in the binary error log file, /var/adm/binary.errlog. This
        event is used to report all SCSI device errors, including disk,
        tape, HSZ raid events, and adapter errors.

    ======================================================================

    Formatted Message:
        SCSI event

    Event Data Items:
        Event Name : sys.unix.binlog.hw.scsi
        Priority : 700
        PID : 326
        PPID : 1
        Event Id : 2054
        Timestamp : 21-May-2003 13:43:33
        Host IP address : 198.162.97.2
        Host Name : whistler
        User Name : root
        Format : SCSI event
        Reference : cat:evmexp.cat:300

    Variable Items:
        subid_class (INT32) = 199
        subid_num (INT32) = 2
        subid_unit_num (INT32) = 2047
        subid_type (INT32) = 55
        binlog_event (OPAQUE) = [OPAQUE VALUE: 360 bytes]

    ============================ Translation =============================
    Sequence number of error: 2099250215
    Time of error entry: 21-May-2003 13:43:33
    Host name: whistler

    SCSI CAM ERROR PACKET
    Controller type: DISK
    SCSI device class: UNKNOWN
    Bus Number: 2
    Target number: 7
    Lun Number: 7

    Name of routine that logged the event: isp_reinit
    Event information: Beginning Adapter/Chip reinitialization (0x1)
    ======================================================================
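The translated packet above is plain "Label: value" text, so it can be pulled apart mechanically, for example to record the bus/target/LUN and routine of each adapter reinitialization next to the deferred-I/O times. This is a minimal sketch, assuming the translation uses exactly the field labels shown here. parse_cam_packet, scsi_address and SAMPLE_PACKET are hypothetical names, and the code only handles the already-translated text, not the binary records in /var/adm/binary.errlog.

```python
import re

# One "Label: value" field per line, as in the translated packet above.
FIELD_RE = re.compile(r"^\s*(?P<key>[A-Za-z][A-Za-z /]*?)\s*:\s*(?P<value>.+?)\s*$")

# The SCSI CAM error packet from the EVM alert, copied as sample input.
SAMPLE_PACKET = """\
SCSI CAM ERROR PACKET
Controller type: DISK
SCSI device class: UNKNOWN
Bus Number: 2
Target number: 7
Lun Number: 7

Name of routine that logged the event: isp_reinit
Event information: Beginning Adapter/Chip reinitialization (0x1)
"""


def parse_cam_packet(text):
    """Return the 'Label: value' fields of a translated CAM error packet."""
    fields = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if m:  # headings such as "SCSI CAM ERROR PACKET" have no colon
            fields[m["key"]] = m["value"]
    return fields


def scsi_address(fields):
    """Return (bus, target, lun) as integers, or None if any is missing."""
    keys = ("Bus Number", "Target number", "Lun Number")
    try:
        return tuple(int(fields[k]) for k in keys)
    except (KeyError, ValueError):
        return None


if __name__ == "__main__":
    fields = parse_cam_packet(SAMPLE_PACKET)
    print("routine:", fields.get("Name of routine that logged the event"))
    print("bus/target/lun:", scsi_address(fields))
```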

