SUMMARY SCSI Disk Errors - sense key: not ready



Thanks to Jason Grove, Chris Ruhnke and Brad Morrison for their input on
this topic.

Possible reasons for the disk problems included high humidity, power supply
problems, a dying motor, poor cabling and a problem with the SCSI interface
to the server. Unfortunately, I didn't receive any information on the
meaning of the vendor's error code. If anyone has access to this, I'd
appreciate the information.

I suspect the drives' motors were slowly dying. I replaced both disks one
week ago and have not experienced any SCSI errors since then, which would
seem to rule out problems with power, cabling and the server's interface.
Also, the climate in our server room is controlled, with humidity typically
below 20%.

I've included my initial question along with some of the responses I
received below.

-Damian

Those drives (IBM), if I remember correctly, had some problems with
humidity. Make sure the environment they are in does not have high
humidity. Sun was replacing them a while ago. Run iostat -En and see how
many hard and media errors there are. If it is over 10, I think, then you
need to replace the drive.

jason
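
Jason's rule of thumb can be checked mechanically. The following is a
minimal Python sketch, not something from the thread: it parses text in the
layout that iostat -En prints on Solaris and flags any device whose hard or
media error count is above a threshold. The sample text and its counts are
invented for illustration, the function names are hypothetical, and the
threshold of 10 is Jason's recollection rather than a documented limit.

```python
import re

# Illustrative text in the layout printed by `iostat -En`.
# The error counts and serial numbers are invented, not real data.
SAMPLE = """\
c1t2d0          Soft Errors: 0 Hard Errors: 12 Transport Errors: 0
Vendor: IBM      Product: DDYST1835SUN18G Revision: S94A Serial No: EXAMPLE0001
Size: 18.11GB <18110967808 bytes>
Media Error: 3 Device Not Ready: 9 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c1t3d0          Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: IBM      Product: DDYST1835SUN18G Revision: S94A Serial No: EXAMPLE0002
Size: 18.11GB <18110967808 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
"""

# First line of each device block: name plus soft/hard/transport counters.
HEADER = re.compile(
    r"^(\S+)\s+Soft Errors:\s*(\d+)\s+Hard Errors:\s*(\d+)"
    r"\s+Transport Errors:\s*(\d+)"
)
MEDIA = re.compile(r"Media Error:\s*(\d+)")


def parse_iostat_en(text):
    """Return {device: {"hard": n, "media": n}} from iostat -En style text."""
    devices = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            devices[current] = {"hard": int(header.group(3)), "media": 0}
            continue
        media = MEDIA.search(line)
        if media and current is not None:
            devices[current]["media"] = int(media.group(1))
    return devices


def suspect_drives(devices, threshold=10):
    """List devices whose hard or media error count is above the threshold."""
    return sorted(
        name for name, counts in devices.items()
        if counts["hard"] > threshold or counts["media"] > threshold
    )


if __name__ == "__main__":
    for name in suspect_drives(parse_iostat_en(SAMPLE)):
        print(name, "is above the error threshold; consider replacing it")
```

On a live system the text would come from running iostat -En rather than
from the SAMPLE string. Whether "over 10" applies to each counter or to
their sum is not clear from Jason's note; the sketch assumes each counter
separately.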


I have an E450 which has exhibited similar problems on Fuji and Sun SCSI
hard drives.

"device not ready" means exactly that -- the device has spun down for some
reason. In my case I have been able to "unplug" the disk from the
backplane, wait one minute and plug it back in, and the drive will spin
back up. You will then have to re-enable it with SVM and it should sync up
with its mirror -- "# metareplace -e <metadevice> <slice>".

If the drive is truly bad, it won't spin back up. It could also be an early
indication of a failing drive, but you won't know for sure until it dies
completely. Or your power supply may be marginal, and under the "stress" of
heavy activity the power level may fall below the level needed by this
drive.


--CHRis

Chris H. Ruhnke
Technical Services Professional
IBM Global Services
Dallas, TX


My opinion is that it's a bad cable or a physical problem with the
interface on the machine. It seems like a very, very remote possibility
that both drives have the same problem. Yes, they're old, but what are the
odds of two having the same problem, i.e., transport failures at high
bandwidth usage? Don't let the block identifier fool you: "Drive not ready"
means that the operation was interrupted because the "drive ready" signal
went to zero during the operation. Although this can be caused by a bad
drive, it doesn't seem likely that both drives would come/go on/offline.

Hmmm. It is possible (not too likely, IMHO) that you have a power problem.
It's unlikely b/c a drive that fails in this way would have to spin up
again after having lost power, i.e., you'd be seeing many more "drive not
ready" messages during the spin-up.

OTOH, given that you have replacement drives handy, you could prove this
by swapping them out and causing the high traffic. In fact, since they're
mirrored with SVM, you could perform one drive replacement to the mirror
and determine whether the same errors happen with the replacement drive.
I'm guessing that it will. :-)

Be sure to summarize this one. SCSI errors (and their associated
resolutions) can always use more exposure. ;-)

Brad Morrison | The Capital Group Companies
Location: SNO | x43199 | (210) 474-3199 | Cell: (281) 704-5375
E-mail: Brad_Morrison@xxxxxxxxxxxx
[ Mailing: 3500 Wiseman Blvd San Antonio, TX 78251-4320 USA ]


-----Original Message-----
From: Wiest, Damian [mailto:dmwiest@xxxxxxxxxxx]
Sent: Tuesday, January 31, 2006 8:25 AM
To: 'sunmanagers@xxxxxxxxxxxxxxx'
Subject: SCSI Disk Errors - sense key: not ready


Greetings all,

I have a couple of IBM SCSI drives that are requiring maintenance on a
weekly basis. I have six 18GB drives installed in the first half of a D1000
array which is attached to a dual-channel Symbios SCSI card in an old E-250.
Four of the disks are from IBM (product number DDYST1835SUN18G, revision
S94A) and the other two are from Fujitsu (product number MAJ3182M SUN18G,
revision 0804). I have configured the disks as three two-way mirrors under
SVM; one of the mirrors with IBM drives is logging errors. Here's a sample
entry from /var/adm/messages:

Jan 28 06:30:01 lcidev01 unix: WARNING: /pci@1f,4000/scsi@5,1/sd@2,0 (sd47):
Jan 28 06:30:01 lcidev01   Error for Command: write(10)   Error Level: Fatal
Jan 28 06:30:01 lcidev01 unix:   Requested Block: 6137304   Error Block: 6137304
Jan 28 06:30:01 lcidev01 unix:   Vendor: IBM   Serial Number: 00361EE587
Jan 28 06:30:01 lcidev01 unix:   Sense Key: Not Ready
Jan 28 06:30:01 lcidev01 unix:   ASC: 0x4 (<vendor unique code 0x4>), ASCQ: 0x1, FRU: 0x0
Jan 28 06:30:01 lcidev01 unix: WARNING: md: d112: write error on /dev/dsk/c1t2d0s7
Jan 28 06:30:11 lcidev01 unix: WARNING: md: d112: /dev/dsk/c1t2d0s7 needs maintenance
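
The sense key and ASC/ASCQ pair in an entry like the one above can be pulled
out and looked up programmatically. The following is a minimal Python
sketch, not part of the original thread. The lookup table holds only a few
ASC 0x04 entries from the standard SCSI additional sense code list; the sd
driver labelled this code "vendor unique", so whether this drive's firmware
follows the standard meaning is an assumption. The function and variable
names are hypothetical.

```python
import re

# The sd error entry from the message above, with syslog prefixes dropped.
ENTRY = """\
WARNING: /pci@1f,4000/scsi@5,1/sd@2,0 (sd47):
Error for Command: write(10)    Error Level: Fatal
Requested Block: 6137304        Error Block: 6137304
Vendor: IBM     Serial Number: 00361EE587
Sense Key: Not Ready
ASC: 0x4 (<vendor unique code 0x4>), ASCQ: 0x1, FRU: 0x0
"""

# A few ASC 0x04 entries from the standard SCSI ASC/ASCQ assignments.
# Assumption: the drive reports these codes with their standard meanings.
ASC_04_MEANINGS = {
    (0x04, 0x00): "Logical unit not ready, cause not reportable",
    (0x04, 0x01): "Logical unit is in process of becoming ready",
    (0x04, 0x02): "Logical unit not ready, initializing command required",
    (0x04, 0x03): "Logical unit not ready, manual intervention required",
}

SENSE_KEY = re.compile(r"Sense Key:\s*(.+)")
# "ASC:" does not occur inside "ASCQ:", so the first group is the ASC value.
ASC_ASCQ = re.compile(r"ASC:\s*0x([0-9a-fA-F]+).*?ASCQ:\s*0x([0-9a-fA-F]+)")


def decode_sense(entry):
    """Extract sense key, ASC and ASCQ from an sd error entry and look them up."""
    key = SENSE_KEY.search(entry)
    codes = ASC_ASCQ.search(entry)
    if not key or not codes:
        raise ValueError("entry does not look like an sd error report")
    asc = int(codes.group(1), 16)
    ascq = int(codes.group(2), 16)
    return {
        "sense_key": key.group(1).strip(),
        "asc": asc,
        "ascq": ascq,
        "meaning": ASC_04_MEANINGS.get((asc, ascq), "not in this sketch's table"),
    }


if __name__ == "__main__":
    print(decode_sense(ENTRY))
```

If the standard meaning applies, 0x04/0x01 is what a drive reports while it
is still spinning up, which would fit a disk that spins down and restarts
under load. That reading is a hedged interpretation, not a diagnosis.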

The disks typically begin exhibiting this behavior during periods of high
activity. I do have a couple of replacements lying around, but before
simply swapping them out I'd like some advice as to whether this problem is
related to the drives or indicative of a bigger problem.

TIA!

-Damian
_______________________________________________
sunmanagers mailing list
sunmanagers@xxxxxxxxxxxxxxx
http://www.sunmanagers.org/mailman/listinfo/sunmanagers