Re: LVM / DiskSuite question

From: Georges Tomazi (gt_at_diapason.com)
Date: 01/31/05

  • Next message: kiran.nagaraja_at_gmail.com: "System administration survey"
    Date: Mon, 31 Jan 2005 00:13:09 +0100
    
    

    Peter -

    On Sun, 30 Jan 2005 17:54:11 +0100, Peter C. Tribble wrote
    (in article <ctj3fj$f0a$1@helium.hgmp.mrc.ac.uk>):

    [...]

    > So why did disksuite fail it? It must have had some reason for doing so that
    > is likely to be logged somewhere. Disksuite is pretty aggressive at failing
    > devices (it doesn't allow many errors before chucking it out completely) but
    > I've always seen at least one error that explains it.

    I found these error messages in the logs:

    Jan 29 22:45:40 gbr2-p40 scsi: [ID 107833 kern.warning] WARNING:
    /pci@1f,0/pci@1,1/scsi@2/sd@2,0 (sd2):
    Jan 29 22:45:40 gbr2-p40 SCSI transport failed: reason 'reset':
    retrying command
    Jan 29 22:45:48 gbr2-p40 md_stripe: [ID 641072 kern.warning] WARNING: md:
    d33: read error on /dev/dsk/c0t2d0s3
    Jan 29 22:45:48 gbr2-p40 md_mirror: [ID 842313 kern.info] NOTICE: md: d33:
    B_FAILFAST I/O retry
    Jan 29 22:46:00 gbr2-p40 md_stripe: [ID 641072 kern.warning] WARNING: md:
    d33: read error on /dev/dsk/c0t2d0s3
    Jan 29 22:46:00 gbr2-p40 md_mirror: [ID 104909 kern.warning] WARNING: md:
    d33: /dev/dsk/c0t2d0s3 needs maintenance
    Jan 29 22:46:00 gbr2-p40 md_stripe: [ID 241980 kern.notice] NOTICE: md: d33:
    hotspared device /dev/dsk/c0t2d0s3 with /dev/dsk/c0t4d0s7
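
    To make the sequence easier to follow (two read errors on the same slice,
    then "needs maintenance", then the hot spare taking over), here is a rough
    Python sketch that pulls the md events out of the excerpt above. The regular
    expressions and the function names are my own assumptions, fitted only to
    these few lines with the newsreader's line wrapping rejoined; this is not a
    general syslog parser.

```python
import re
from collections import Counter

# Log lines copied from the excerpt above, with wrapped lines rejoined.
LOG = """\
Jan 29 22:45:40 gbr2-p40 scsi: [ID 107833 kern.warning] WARNING: /pci@1f,0/pci@1,1/scsi@2/sd@2,0 (sd2):
Jan 29 22:45:40 gbr2-p40 SCSI transport failed: reason 'reset': retrying command
Jan 29 22:45:48 gbr2-p40 md_stripe: [ID 641072 kern.warning] WARNING: md: d33: read error on /dev/dsk/c0t2d0s3
Jan 29 22:45:48 gbr2-p40 md_mirror: [ID 842313 kern.info] NOTICE: md: d33: B_FAILFAST I/O retry
Jan 29 22:46:00 gbr2-p40 md_stripe: [ID 641072 kern.warning] WARNING: md: d33: read error on /dev/dsk/c0t2d0s3
Jan 29 22:46:00 gbr2-p40 md_mirror: [ID 104909 kern.warning] WARNING: md: d33: /dev/dsk/c0t2d0s3 needs maintenance
Jan 29 22:46:00 gbr2-p40 md_stripe: [ID 241980 kern.notice] NOTICE: md: d33: hotspared device /dev/dsk/c0t2d0s3 with /dev/dsk/c0t4d0s7
"""

# Hypothetical pattern: matches only lines logged by an md_* kernel module.
MD_EVENT = re.compile(
    r"^(?P<when>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ "
    r"(?P<module>md_\w+): \[ID \d+ kern\.\w+\] "
    r"(?:WARNING|NOTICE): md: (?P<meta>d\d+): (?P<msg>.*)$"
)
READ_ERROR = re.compile(r"read error on (?P<slice>\S+)")


def md_events(text):
    """Return (timestamp, module, metadevice, message) for each md line."""
    events = []
    for line in text.splitlines():
        m = MD_EVENT.match(line)
        if m:
            events.append((m["when"], m["module"], m["meta"], m["msg"]))
    return events


def read_errors_per_slice(events):
    """Count 'read error on <slice>' messages per slice."""
    counts = Counter()
    for _when, _module, _meta, msg in events:
        m = READ_ERROR.search(msg)
        if m:
            counts[m["slice"]] += 1
    return counts


if __name__ == "__main__":
    events = md_events(LOG)
    for when, module, meta, msg in events:
        print(f"{when}  {meta}  {module}: {msg}")
    print(dict(read_errors_per_slice(events)))
```

    The SCSI transport lines are skipped on purpose, since they come from the
    sd driver and not from md; they would need a pattern of their own.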

    > I usually (in the cases where the disk is responsive at all) do a
    > format/analyze/read/repair and try replacing it just to see if it was a
    > single bad block. Sometimes works.

    I checked the grown defects list and it's still empty. The disk is a 73 GB
    Maxtor Atlas 10K IV bought in May 2004.

    defect> prim
    Extracting primary defect list...Extraction complete.
    Defect List has a total of 684 defects.

    defect> g
    Extracting grown defects list...Extraction complete.
    Defect List has a total of 0 defects.

    [...]

    > Seems complicated. How about just
    >
    > metareplace -e d3 c0t2d0s3

    I tried it and it worked. Thanks a lot! Much simpler and easier than what I
    was going to do.

    [...]

    > That's a concatenation. Don't think you want that...

    Definitely not ;-)

    [...]

    > If the metareplace succeeds. And if the metareplace fails, the hot spare
    > should still be in place.

    You're right. When the metareplace started to resync the failed slice, the
    hot spare switched back to "available".

    So what do you think now? Do you believe an LVM software failure is
    possible, or is the drive definitely dying? Is it worth breaking the
    mirror to reformat the disk and recreate the mirror?

    Thx again,

    Georges

    -- 
    Georges Tomazi - gt@diapason.com
    
