Re: A1000: Determining bad disk

From: Darren Dunham (ddunham_at_redwood.taos.com)
Date: 09/29/03


Date: Mon, 29 Sep 2003 16:22:39 GMT

In comp.unix.solaris Vikas Agnihotri <fornewsgroups@vikas.mailshell.com> wrote:
>>>> I suspect the disk is going bad.
>>
>> Why? If you do, you should run rm6 and run a healthcheck.

> I don't like the rm6 GUI; the CLI equivalent is 'healthck', right? I ran
> 'healthck -a' and got 'Optimal'. I didn't expect anything else.

Then why do you think a disk is going bad?

> I don't know how thorough 'healthck' is anyway. Say the disk was going bad
> and I knew about it ahead of time: I could mark the drive failed on demand
> using 'drivutil' and take the reconstruction hit when I choose, instead of
> waiting for it to happen at any time!

I suppose, but what makes you think a disk is going bad?

> How about 'parityck', is that a more exhaustive disk check?

> Anyway, in this particular case, as it turned out, my SCSI errors were due
> to the "disconnected tagged commands", for which Sun support suggested that
> I consider reducing 'set sd:sd_max_throttle' (in /etc/system) to something
> like 10 (default is 256) or so.

Yes. That's what I was (badly) trying to say. Since the OS can't "see"
any of the individual disks behind the RAID controller anyway, any SCSI
errors in /var/adm/messages will be unrelated to disk errors. They would
have to do with the RAID controller, the cable, the host adapter, or the
SCSI settings.
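
For reference, the /etc/system change Sun support described in the quoted
text would look roughly like the fragment below. This is only a sketch: the
value 10 is simply the figure mentioned above, not a tested recommendation
for any particular adapter, and /etc/system changes only take effect after
a reboot.

```
* /etc/system: comment lines start with an asterisk.
* Limit the number of commands the sd driver queues per device.
* The value 10 is the example figure from this thread (an assumption,
* not a tuned value); check the Sun Alert for your adapter first.
set sd:sd_max_throttle=10
```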

> Is it common practice to throttle down the 'sd' driver with the A1000
> RAID? Is this because the disks are too fast for the sd driver? [Or is it
> the other way around?]

Which adapter is this? I see a Sun Alert here...

http://sunsolve.sun.com/pub-cgi/retrieve.pl?doc=fsalert/22803

-- 
Darren Dunham                                           ddunham@taos.com
Unix System Administrator                    Taos - The SysAdmin Company
Got some Dr Pepper?                           San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >

