Re: Hot-changing a failed HDD with ahci.ko



On Wed, Dec 14, 2011 at 09:29:52AM +0100, Patrick M. Hausen wrote:
> Hi, all,
>
> while most cheap servers with SATA disks are not really hot-plug
> capable, changing a failed disk (either gmirror or zfs) was possible
> without a reboot by executing e.g. if ad4 failed:
>
> atacontrol detach ata2
> <change disks>
> atacontrol attach ata2
>
> What is the proper equivalent for ahci, ada0 and camcontrol?

None is needed: yank the disk, reinsert, wait a few seconds, done.
Validation, with full output, hardware details, etc.:

http://koitsu.wordpress.com/2010/07/22/freebsd-and-zfs-hot-swapping-sata-disks-with-ahci/

I've made videos to demonstrate this as well, but need to edit them and
upload them.
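
To confirm that a reinserted disk has actually come back, one option is to poll the CAM device list. The following is only a minimal sketch, not a tested procedure: it assumes "camcontrol devlist" prints the peripheral names in parentheses at the end of each line, e.g. "(ada0,pass0)", and the DEVLIST_CMD variable is a hypothetical hook added here so the functions can be tried without real hardware.

```shell
#!/bin/sh
# Sketch: wait until a named disk shows up in "camcontrol devlist".
# DEVLIST_CMD is a hypothetical override hook, not part of camcontrol.
: "${DEVLIST_CMD:=camcontrol devlist}"

# disk_present NAME
# Exit 0 if NAME appears in the trailing peripheral list of any line,
# e.g. "(ada0,pass0)" or "(pass0,ada0)".
disk_present() {
    $DEVLIST_CMD | grep -Eq "[(,]$1[,)]"
}

# wait_for_disk NAME [SECONDS]
# Poll once per second; exit 0 as soon as NAME is listed,
# exit 1 if it has not appeared after SECONDS tries (default 30).
wait_for_disk() {
    name=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        disk_present "$name" && return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}
```

Once "wait_for_disk ada0" succeeds, the usual next step would be to hand the disk back to the volume manager, for example "zpool replace tank ada0" for ZFS, or "gmirror forget gm0" followed by "gmirror insert gm0 /dev/ada0" for gmirror. The names tank and gm0 are placeholders, not taken from this thread.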

> Stop unit commands seem not to work with SATA disks, so I
> tried:
>
> <forcefully unplug "broken" disk>
> -> system logs about lost device, so far so good
> <insert new disk>
> camcontrol reset 1
> camcontrol devlist
> -> disk still not there
> camcontrol rescan 1
> -> command hangs
> <login to a second session, system still responsive>
> shutdown -r now
> -> system panics, eventually reboots

Before you yanked the disk, were any non-ZFS filesystems mounted?

This sounds similar to what happens if you were to yank a classic SATA
disk from a non-AHCI system, or under ata(4), without detaching first.
Or, on some systems, when SATA disks are yanked without use of a
hot-swap backplane.
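
For the question above about non-ZFS filesystems, a quick check before pulling a disk could look like the sketch below. It assumes input in fstab(5) layout (device, mount point, type, ...), which is what "mount -p" prints; the function name non_zfs_mounts is made up for this example.

```shell
#!/bin/sh
# Sketch: list mounted filesystems that sit on a /dev node and are not ZFS.
# Reads fstab(5)-style lines on stdin: device, mount point, fstype, ...
# Pseudo filesystems such as devfs have no /dev/... device and are skipped.
non_zfs_mounts() {
    awk '$1 ~ /^\/dev\// && $3 != "zfs" { print $1, $2, $3 }'
}
```

Typical use would be "mount -p | non_zfs_mounts". Any line it prints is a filesystem (UFS on a gmirror, for instance) that may still be holding the disk open when it is removed.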

> I can provide details about the panic if someone is interested,
> but maybe there is a proper procedure already, which I simply missed.
>
> System is RELENG_8_2 amd64.
> ahci0: <Intel Cougar Point AHCI SATA controller> port 0xf090-0xf097,0xf080-0xf083,0xf070-0xf077,0xf060-0xf063,0xf020-0xf03f mem 0xfb921000-0xfb9217ff irq 19 at device 31.2 on pci0
> ada0 at ahcich0 bus 0 scbus1 target 0 lun 0
> ada0: <ST31000340NS SN05> ATA-8 SATA 1.x device
> ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
> ada0: Command Queueing enabled
> ada0: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
> ada1 at ahcich1 bus 0 scbus2 target 0 lun 0
> ada1: <ST31000340NS SN05> ATA-8 SATA 1.x device
> ada1: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
> ada1: Command Queueing enabled
> ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)

You might try booting RELENG_9 (which has ahci.ko as the default, so no
need to mess about) from a LiveCD or equivalent and attempting the same
thing. I'm left wondering whether there are changes in RELENG_8 (not a
typo for the RELENG_9 mentioned above) that you do not have in
RELENG_8_2.

--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |

_______________________________________________
freebsd-stable@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@xxxxxxxxxxx"


