Re: gmirror disk fail questions...
- From: Gary Newcombe <gary@xxxxxxxxxxxxxxxxxxxxx>
- Date: Sat, 19 Apr 2008 11:26:31 +1000
On Fri, 18 Apr 2008 10:40:04 -0700, Christopher Cowart
<ccowart@xxxxxxxxxxxxxxxxxxxx> wrote:
Gary Newcombe wrote:
[...]
# gmirror status
[mesh:/var/log]# gmirror status
      Name    Status  Components
mirror/gm0  DEGRADED  ad4
Looking in /dev/, however, we have:
crw-r----- 1 root operator 0, 83 17 Apr 13:58 ad4
crw-r----- 1 root operator 0, 91 17 Apr 13:58 ad4s1
crw-r----- 1 root operator 0, 84 17 Apr 13:58 ad6
crw-r----- 1 root operator 0, 92 17 Apr 13:58 ad6a
crw-r----- 1 root operator 0, 99 17 Apr 13:58 ad6as1
crw-r----- 1 root operator 0, 93 17 Apr 13:58 ad6b
crw-r----- 1 root operator 0, 94 17 Apr 13:58 ad6c
crw-r----- 1 root operator 0, 100 17 Apr 13:58 ad6cs1
crw-r----- 1 root operator 0, 95 17 Apr 13:58 ad6d
crw-r----- 1 root operator 0, 96 17 Apr 13:58 ad6e
crw-r----- 1 root operator 0, 97 17 Apr 13:58 ad6f
crw-r----- 1 root operator 0, 98 17 Apr 13:58 ad6s1
crw-r----- 1 root operator 0, 101 17 Apr 13:58 ad6s1a
crw-r----- 1 root operator 0, 102 17 Apr 13:58 ad6s1b
crw-r----- 1 root operator 0, 103 17 Apr 13:58 ad6s1c
crw-r----- 1 root operator 0, 104 17 Apr 13:58 ad6s1d
crw-r----- 1 root operator 0, 105 17 Apr 13:58 ad6s1e
crw-r----- 1 root operator 0, 106 17 Apr 13:58 ad6s1f
I am guessing that a failing disk is responsible for the data
corruption, but I have no errors in /var/log/messages or console.log.
On every boot, the mirror is marked clean and there are no warnings about
a disk failing anywhere. Where should I be looking, or what should I
be doing, to get any warnings?
Also, how come, if ad4 is the working disk, ad4's slices seem to be
labelled as ad6? What's going on here? To me, ad6 appears to have
the correct labelling for the mirror, from ad6s1a-f.
I believe the kernel hides individual labels for a gmirror volume. The
labels on ad4 should be visible in /dev/mirror/. Because gmirror really
just mirrors the data block by block (with a little bit of meta data at
the very end of the drive), once the drive is no longer a member of an
array, the kernel treats it as an individual drive and allows visibility
of all the labels.
OK, so not to worry about the slices.
How can I test for sure whether the disk is damaged or dying, or
whether this is just a temporary glitch in the mirror? This is the
first time I've had a gmirror raid give me problems.
The first time a drive gets kicked out, I typically try to re-insert it.
We have monitoring, so we receive notifications if it fails again. After
that, I get the vendor to replace it.
Assuming ad6 has been deactivated/disconnected, I was thinking of
trying:
gmirror activate gm0 ad6
gmirror rebuild gm0 ad6
Is this safe?
You have to kick ad6 out and re-insert it:
# gmirror forget gm0
# gmirror insert gm0 /dev/ad6
After doing that, I would watch closely for a while in case your drive
is actually failing. I've written a small nagios check for gmirror; let
me know if you'd like me to send it (it could easily be adapted to a
cron job). You can also get `gmirror status' output in your dailies by
adding daily_status_gmirror_enable="YES" to /etc/periodic.conf.
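For readers who want something similar, here is a minimal sketch of a Nagios-style check. It is not the script mentioned above; the function name check_gmirror and the "run" argument are made up for illustration. It assumes `gmirror status' prints one "mirror/NAME STATUS COMPONENT" line per mirror, as in the output earlier in this thread, and treats any status other than COMPLETE as a problem.

```shell
#!/bin/sh
# Hypothetical sketch of a Nagios-style gmirror check (not the script
# referred to in the thread). Exit codes follow the Nagios convention:
# 0 = OK, 2 = CRITICAL, 3 = UNKNOWN.

check_gmirror() {
    # $1: text in the format printed by `gmirror status`
    status_text=$1

    # Keep only the mirror lines, e.g. "mirror/gm0 DEGRADED ad4".
    mirrors=$(printf '%s\n' "$status_text" |
        awk '$1 ~ /^mirror\// { print $1, $2 }')

    if [ -z "$mirrors" ]; then
        echo "UNKNOWN: no mirrors found"
        return 3
    fi

    # Assumption: any status other than COMPLETE deserves an alert.
    bad=$(printf '%s\n' "$mirrors" |
        awk '$2 != "COMPLETE" { printf "%s%s=%s", sep, $1, $2; sep = ", " }')

    if [ -n "$bad" ]; then
        echo "CRITICAL: $bad"
        return 2
    fi

    echo "OK: all mirrors COMPLETE"
    return 0
}

# Query the live system only when called with "run" (e.g. from cron or NRPE).
if [ "${1:-}" = "run" ]; then
    check_gmirror "$(gmirror status 2>&1)"
    exit $?
fi
```

Keeping the parsing in a function that takes the status text as an argument means it can be tried against saved output without touching a real mirror. For cron use, the exit code can decide whether to send mail.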
I've since added the gmirror entry to periodic.conf, but your script
sounds ideal. I would like that, thanks. I would much rather get some
warning about this happening as it does appear to have caused some data
corruption.
But, given it's timing out on boot, I would personally bag the drive and
replace it. You'll still need to run the same 2 commands above.
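As a rough sketch, the sequence can be wrapped in a small helper. Note that gmirror forget takes the mirror name as an argument. The function name reinsert_component and the DRY_RUN variable are made up for illustration; by default the helper only prints the commands, so nothing is changed until DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical helper: forget the missing component, insert the
# (re)connected disk, then show the mirror status.
# With DRY_RUN unset or 1 the commands are only printed.

reinsert_component() {
    mirror=$1   # mirror name, e.g. gm0
    disk=$2     # provider name without /dev/, e.g. ad6

    if [ "${DRY_RUN:-1}" = "0" ]; then
        run=""      # execute the commands for real
    else
        run="echo"  # just print them
    fi

    # Drop components that are no longer connected from the mirror.
    $run gmirror forget "$mirror"
    # Add the disk as a new component; gmirror then synchronizes it.
    $run gmirror insert "$mirror" "/dev/$disk"
    # Check progress; the mirror stays DEGRADED until the sync finishes.
    $run gmirror status "$mirror"
}

# Dry run with the names used in this thread.
reinsert_component gm0 ad6
```

Running the script as written prints the three commands for review. Setting DRY_RUN=0 in the environment (as root) would execute them.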
[mesh:/dev/mirror]# gmirror forget
Missing device(s).
[mesh:/dev/mirror]# gmirror status
      Name    Status  Components
mirror/gm0  DEGRADED  ad4
[mesh:/dev/mirror]# gmirror insert gm0 /dev/ad6
Not all disks connected.
Looks like it is new disk time then after all.
Thanks for your advice.
Gary
_______________________________________________
--
Chris Cowart
Network Technical Lead
Network & Infrastructure Services, RSSP-IT
UC Berkeley