Vital Patches for ataraid with Intel Matrix RAID (ICH7)



Here are some vital patches for the ataraid driver when using Intel Matrix
RAID (often found built into mainboards these days).

These are problems that will bite at the worst time: when a disk drops
out of your RAID.

A combined patch that applies to FreeBSD 6 and 7 is attached, and the
individual problem reports and issues are outlined below.

Cheers,

Stef Walter



Fix an early boot panic if you reboot with all drives present when your
RAID is marked DEGRADED. This can happen if a drive has an unreadable
block and the drive gets detached from the RAID. Rebooting at this point
will panic. Yoichi created a patch for this over a year ago.

http://www.freebsd.org/cgi/query-pr.cgi?pr=102211
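
The combined patch handles this by masking map->disk_idx[disk] with
0xffff before using it to index the metadata disk table. Here is a
minimal userland sketch of that idea, not the driver code. It assumes
the upper bits of the stored index carry status flags once a disk has
been detached (my reading of the mask, not something the metadata
format documents here). The struct, the function name and the extra
bounds check are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_IDX_MASK 0xffffu   /* same mask the patch applies */

/* Hypothetical stand-in for one entry of the metadata disk table. */
struct sketch_disk {
    uint32_t sectors;
    uint32_t flags;
};

/*
 * Look up a disk-table entry from a raw map index. The raw value is
 * masked first, so any (assumed) flag bits in the upper half cannot
 * push the lookup past the end of the table. Returns NULL if the
 * masked index is still out of range.
 */
const struct sketch_disk *
sketch_lookup(const struct sketch_disk *table, size_t ndisks,
    uint32_t raw_idx)
{
    uint32_t idx = raw_idx & SKETCH_IDX_MASK;

    assert(table != NULL);
    if (idx >= ndisks)
        return NULL;
    return &table[idx];
}
```

Without the mask, a raw value such as 0x01000001 would be used directly
as an array index and read far outside the table.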


Don't duplicate the RAID amoeba style if you boot with a drive present
that was detached from a RAID. This can happen if you manage to get past
the above panic problem. You'll end up with two devices like ar0 and
ar1. This can be a major mess if ar1 already contained active file
systems.

http://www.freebsd.org/cgi/query-pr.cgi?pr=121899
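
The config_id hunk in the patch appears aimed at this: rather than
stamping a fresh timestamp into config_id on every metadata write, it
generates an id once and reuses it, while the generation counter keeps
increasing. A rough userland sketch of that pattern follows. The struct
and function names are hypothetical, and gettimeofday() stands in for
the kernel's microtime().

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical stand-in for the per-array state kept by the driver. */
struct sketch_array {
    uint32_t config_id;    /* 0 means "no id assigned yet" */
    uint32_t generation;   /* bumped on every metadata write */
};

/*
 * Prepare the identifying fields for a metadata write. The generation
 * always advances; the config id is created only if none exists, so
 * repeated writes keep describing the same array.
 */
void
sketch_prepare_write(struct sketch_array *a)
{
    a->generation++;

    if (a->config_id == 0) {
        struct timeval tv;

        gettimeofday(&tv, NULL);
        a->config_id = (uint32_t)tv.tv_sec ^ (uint32_t)tv.tv_usec;
    }
}
```

With a stable id, a stale disk can still be matched to its array and
told apart by its older generation, instead of looking like a member of
a different array.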


If you reboot after adding a spare, or during the rebuilding process,
the RAID will become magically READY by itself. Not cool.

http://www.freebsd.org/cgi/query-pr.cgi?pr=102210
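
As I read the flag-handling hunks in the patch, the fix is twofold: a
spare stays assigned but is never online, and finding a disk at boot
only marks it present instead of also forcing it online. The sketch
below mirrors that logic in userland. The SK_* constants are
placeholders with made-up values, not the real INTEL_F_* and AR_DF_*
definitions, and the function names are hypothetical.

```c
#include <stdint.h>

/* Placeholder on-disk flags (values are illustrative only). */
#define SK_INTEL_SPARE    0x01u
#define SK_INTEL_ASSIGNED 0x02u
#define SK_INTEL_DOWN     0x04u
#define SK_INTEL_ONLINE   0x08u

/* Placeholder in-memory disk flags (values are illustrative only). */
#define SK_DF_PRESENT     0x01u
#define SK_DF_ASSIGNED    0x02u
#define SK_DF_SPARE       0x04u
#define SK_DF_ONLINE      0x08u

/* Translate on-disk flags to in-memory flags, as the patched loop does. */
uint32_t
sketch_translate_flags(uint32_t intel_flags)
{
    uint32_t f = 0;

    if (intel_flags & SK_INTEL_ONLINE)
        f |= SK_DF_ONLINE;
    if (intel_flags & SK_INTEL_ASSIGNED)
        f |= SK_DF_ASSIGNED;
    if (intel_flags & SK_INTEL_SPARE) {
        /* A spare belongs to the array but holds no valid data yet. */
        f &= ~SK_DF_ONLINE;
        f |= SK_DF_SPARE | SK_DF_ASSIGNED;
    }
    if (intel_flags & SK_INTEL_DOWN)
        f &= ~SK_DF_ONLINE;
    return f;
}

/* Finding the disk at boot marks it present and nothing more. */
uint32_t
sketch_mark_present(uint32_t disk_flags)
{
    return disk_flags | SK_DF_PRESENT;
}
```

The point of the second function is that a disk recorded as down or
rebuilding does not come back online just because it was detected.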


--- sys/dev/ata/ata-raid.c.orig 2008-03-19 11:20:15.000000000 +0000
+++ sys/dev/ata/ata-raid.c 2008-03-19 21:53:37.000000000 +0000
@@ -848,10 +848,17 @@
rdp->status &= ~AR_S_READY;
}

+ /*
+ * Note that when the array breaks so comes up broken we
+ * force a write of the array config to the remaining
+ * drives so that the generation will be incremented past
+ * those of the missing or failed drives (in all cases).
+ */
if (rdp->status != status) {
if (!(rdp->status & AR_S_READY)) {
printf("ar%d: FAILURE - %s array broken\n",
rdp->lun, ata_raid_type(rdp));
+ writeback = 1;
}
else if (rdp->status & AR_S_DEGRADED) {
if (rdp->type & (AR_T_RAID1 | AR_T_RAID01))
@@ -860,6 +867,7 @@
printf("ar%d: WARNING - parity", rdp->lun);
printf(" protection lost. %s array in DEGRADED mode\n",
ata_raid_type(rdp));
+ writeback = 1;
}
}
mtx_unlock(&rdp->lock);
@@ -2157,22 +2165,23 @@

/* clear out any old info */
for (disk = 0; disk < raid->total_disks; disk++) {
+ u_int32_t disk_idx = map->disk_idx[disk] & 0xffff;
raid->disks[disk].dev = NULL;
- bcopy(meta->disk[map->disk_idx[disk]].serial,
+ bcopy(meta->disk[disk_idx].serial,
raid->disks[disk].serial,
sizeof(raid->disks[disk].serial));
raid->disks[disk].sectors =
- meta->disk[map->disk_idx[disk]].sectors;
+ meta->disk[disk_idx].sectors;
raid->disks[disk].flags = 0;
- if (meta->disk[map->disk_idx[disk]].flags & INTEL_F_ONLINE)
+ if (meta->disk[disk_idx].flags & INTEL_F_ONLINE)
raid->disks[disk].flags |= AR_DF_ONLINE;
- if (meta->disk[map->disk_idx[disk]].flags & INTEL_F_ASSIGNED)
+ if (meta->disk[disk_idx].flags & INTEL_F_ASSIGNED)
raid->disks[disk].flags |= AR_DF_ASSIGNED;
- if (meta->disk[map->disk_idx[disk]].flags & INTEL_F_SPARE) {
- raid->disks[disk].flags &= ~(AR_DF_ONLINE | AR_DF_ASSIGNED);
- raid->disks[disk].flags |= AR_DF_SPARE;
+ if (meta->disk[disk_idx].flags & INTEL_F_SPARE) {
+ raid->disks[disk].flags &= ~AR_DF_ONLINE;
+ raid->disks[disk].flags |= (AR_DF_SPARE | AR_DF_ASSIGNED);
}
- if (meta->disk[map->disk_idx[disk]].flags & INTEL_F_DOWN)
+ if (meta->disk[disk_idx].flags & INTEL_F_DOWN)
raid->disks[disk].flags &= ~AR_DF_ONLINE;
}
}
@@ -2183,7 +2192,7 @@
if (!strncmp(raid->disks[disk].serial, atadev->param.serial,
sizeof(raid->disks[disk].serial))) {
raid->disks[disk].dev = parent;
- raid->disks[disk].flags |= (AR_DF_PRESENT | AR_DF_ONLINE);
+ raid->disks[disk].flags |= AR_DF_PRESENT;
ars->raid[raid->volume] = raid;
ars->disk_number[raid->volume] = disk;
retval = 1;
@@ -2233,11 +2242,16 @@
}

rdp->generation++;
- microtime(&timestamp);
+
+ /* Generate a new config_id if none exists */
+ if (!rdp->magic_0) {
+ microtime(&timestamp);
+ rdp->magic_0 = timestamp.tv_sec ^ timestamp.tv_usec;
+ }

bcopy(INTEL_MAGIC, meta->intel_id, sizeof(meta->intel_id));
bcopy(INTEL_VERSION_1100, meta->version, sizeof(meta->version));
- meta->config_id = timestamp.tv_sec;
+ meta->config_id = rdp->magic_0;
meta->generation = rdp->generation;
meta->total_disks = rdp->total_disks;
meta->total_volumes = 1; /* XXX SOS */
_______________________________________________
freebsd-hackers@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@xxxxxxxxxxx"
