ZFS raidz recovery



Hi all, I'm trying to simulate a disk failure and replacement in
a raidz array, and failing at it myself. What am I doing wrong? Here's
a transcript with interspersed commentary:

root@file:~# zpool status
pool: raid
state: ONLINE
scrub: scrub completed after 0h0m with 0 errors on Sat Nov 27 13:20:06 2010
config:

        NAME        STATE     READ WRITE CKSUM
        raid        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
            ad13    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0

errors: No known data errors
root@file:~# zpool offline raid ad12

reboot
dd if=/dev/zero of=/dev/ad12 ..

root@file:~# zpool replace raid ad12
cannot replace ad12 with ad12: ad12 is busy
root@file:~# zpool replace -f raid ad12
cannot replace ad12 with ad12: ad12 is busy

The handbook suggests 'replace', but I guess that only applies
if the disk is physically replaced and gets a new identifier?
Trying with 'online' instead:

root@file:~# zpool online raid ad12
root@file:~# zpool status
pool: raid
state: ONLINE
scrub: resilver completed after 0h0m with 0 errors on Sat Nov 27 13:29:14 2010
config:

        NAME        STATE     READ WRITE CKSUM
        raid        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad12    ONLINE       0     0     0  15.5K resilvered
            ad13    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0

errors: No known data errors

The output stays like this. Is that normal?

root@file:~# zpool scrub raid
root@file:~# zpool status
pool: raid
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
scrub: scrub completed after 0h0m with 0 errors on Sat Nov 27 13:30:37 2010
config:

        NAME        STATE     READ WRITE CKSUM
        raid        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad12    ONLINE       0     0 2.11K  87.7M repaired
            ad13    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0

errors: No known data errors
root@file:~# zpool scrub raid
root@file:~# zpool status
pool: raid
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
scrub: scrub completed after 0h0m with 0 errors on Sat Nov 27 13:30:55 2010
config:

        NAME        STATE     READ WRITE CKSUM
        raid        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad12    ONLINE       0     0 2.11K
            ad13    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0

errors: No known data errors

Are these checksum errors? Does that mean the disk hasn't been
integrated properly?
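
For anyone wanting to pick such devices out of the status output quickly, here is a minimal sketch. It assumes the five-column NAME/STATE/READ/WRITE/CKSUM layout shown in this transcript, and `cksum_errors` is a hypothetical helper name of my own, not part of ZFS:

```shell
# Print "device count" for every config row whose CKSUM column is non-zero.
# Assumption: rows look like "NAME STATE READ WRITE CKSUM [notes...]",
# as in the zpool status output in this transcript.
cksum_errors() {
    awk '
        # Only consider rows whose second field is a vdev state and that
        # carry at least the five standard columns.
        $2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ &&
        NF >= 5 && $5 != "0" { print $1, $5 }
    '
}

# Usage (reads status text on stdin):
#   zpool status raid | cksum_errors
```

It only filters text, so it says nothing about why the counter is non-zero.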

root@file:~# zpool clear raid ad12
root@file:~# zpool status
pool: raid
state: ONLINE
scrub: scrub completed after 0h0m with 0 errors on Sat Nov 27 13:39:09 2010
config:

        NAME        STATE     READ WRITE CKSUM
        raid        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
            ad13    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0

errors: No known data errors
root@file:~# zpool status -x
all pools are healthy

To make sure this is the case, I fail a different disk:

root@file:~# zpool offline raid ad6
root@file:~# zpool status
pool: raid
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scrub: scrub completed after 0h0m with 0 errors on Sat Nov 27 13:40:52 2010
config:

        NAME        STATE     READ WRITE CKSUM
        raid        DEGRADED     0     0     0
          raidz1    DEGRADED     0     0     0
            ad12    ONLINE       0     0     0
            ad13    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     OFFLINE      0     0     0

errors: No known data errors

On reboot the status changes:

root@file:~# zpool status
pool: raid
state: FAULTED
status: The pool metadata is corrupted and the pool cannot be opened.
action: Destroy and re-create the pool from a backup source.
see: http://www.sun.com/msg/ZFS-8000-72
scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        raid        FAULTED      0     0     1  corrupted data
          raidz1    DEGRADED     0     0     6
            ad12    OFFLINE      0     0     0
            ad13    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     1


The same happens if I recreate the array and try again.
_______________________________________________
freebsd-stable@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@xxxxxxxxxxx"


