Re: A1000: Determining bad disk

From: Mr. Johan Andersson (johan_at_solace.mh.se)
Date: 09/29/03


Date: Mon, 29 Sep 2003 08:51:13 +0200


On Sat, 27 Sep 2003, Vikas Agnihotri wrote:

> On Thu, 25 Sep 2003 18:33:09 GMT, Darren Dunham <ddunham@redwood.taos.com>
> wrote:
>
> >>> I am seeing some SCSI transport failures in /var/adm/messages on one of
> >>> my LUNs. The A1000 has all RAID5 luns.
> >>>
> >>> I suspect the disk is going bad.
> >
> > Why? If you do, you should run rm6 and run a healthcheck.
>
> I don't like the rm6 GUI; the CLI equivalent is 'healthck', right? I ran
> 'healthck -a' and got 'Optimal'. I didn't expect anything else.

Well, if you get Optimal, then the A1000 itself thinks it's OK.

> I don't know how thorough 'healthck' is anyway. Say the disk was going bad,
> and I knew about it proactively, I could, on demand, mark the drive failed
> using 'drivutil' and take the reconstruction hit when I want to, instead of
> waiting for it to happen at any time!

If a disk were failing, the A1000 would probably log a few events anyway;
it's quite good at kicking bad drives. That's why you use RAID5, so that it
CAN kick a drive without you losing data. If you have a hot spare
activated, that's even better.

> How about 'parityck', is that a more exhaustive disk check?

Yes and no. It checks the parity of the RAID5 LUN, which it happens to do
by reading all the data on the disks. That does exercise the disks in a
way, but it's really the data on them that is checked.

> Anyway, in this particular case, as it turned out, my SCSI errors were due
> to the "disconnected tagged commands", for which Sun support suggested that
> I consider reducing 'set sd:sd_max_throttle' (in /etc/system) to something
> like 10 (default is 256) or so.

Yup, it's in the best practices for the A1000-A3500.

> Is this common practice to throttle down the 'sd' driver with the RAID
> A1000? Is this because the disks are too fast for the sd driver? [Or is it
> the other way around?]

I'll leave that for a SCSI expert, but basically it doesn't have to do with
slow or fast, but rather with how many SCSI commands you can "queue" to the
controller. I don't remember all the facts, but I'm sure someone else does
:-)
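
To make the queueing point concrete, here is a rough Python sketch of one
common rule of thumb: divide the number of commands the controller can
queue by the number of LUNs behind it, and use the result as the per-LUN
throttle. The rule itself, the function names, and the figures (a queue
depth of 64 shared by 6 LUNs) are illustrative assumptions, not A1000
specifications or Sun's recommendation. Only the 'set sd:sd_max_throttle'
tunable in /etc/system comes from this thread.

```python
def per_lun_throttle(controller_queue_depth: int, lun_count: int) -> int:
    """Split a controller's command queue evenly across its LUNs.

    Both inputs are assumptions supplied by the caller; check the array
    documentation for the real queue depth of your controller.
    """
    if controller_queue_depth < 1 or lun_count < 1:
        raise ValueError("queue depth and LUN count must be positive")
    # Integer division, but never go below one outstanding command per LUN.
    return max(1, controller_queue_depth // lun_count)


def etc_system_line(throttle: int) -> str:
    """Format the /etc/system entry for the sd driver throttle."""
    return f"set sd:sd_max_throttle={throttle}"


if __name__ == "__main__":
    # Hypothetical figures: 64 queued commands shared by 6 LUNs.
    throttle = per_lun_throttle(64, 6)
    print(etc_system_line(throttle))
```

With those made-up numbers the script prints 'set sd:sd_max_throttle=10'.
A change to /etc/system only takes effect after a reboot.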

/Johan A


