Re: aac0: COMMAND 0xffffffffxxxxxxxx TIMEOUT AFTER xx SECONDS



Chris Hedley wrote:
On Fri, 9 Jun 2006, Doug White wrote:

On Fri, 9 Jun 2006, Chris Hedley wrote:

I've been receiving this message quite a lot lately if I put my Adaptec 2410SA aac controller under really heavy load. A quick look at the archives suggests that it used to be a problem a couple of years ago, but was apparently fixed. Personally I've had no bother with it until a few months ago when I upgraded my version of -CURRENT, at which point it started misbehaving.


I assume you've checked cabling and termination? Frequently, driver updates can improve performance which means less tolerance for marginal configurations.


The 2410SA uses SATA discs (I was trying to get SCSI performance on the cheap, ever the optimist!) so I'm assuming that the cables are okay. At least there are no user-breakable termination settings for me to worry about...

I'm also wondering if I might not be better off actually replacing the card with something better, or at least something better suited to FreeBSD: with the discs' and controller's write-caching turned off, the 2410SA is s-l-o-w, about 6MB/s for contiguous writes to an array (either RAID-5 or RAID-10) (benchmarked using the admittedly somewhat crude "dd various block sizes to/from a /dev entry" technique), although reads are acceptable at ~50-60MB/s, if not especially earth-shattering. Any suggestions (for something inexpensive! If money were no object I'd've gone for a SCSI-only system), or might I just as well stick with the 2410SA?


6MB/s sounds like you aren't getting any help from the card's write cache; it's having to do stripe reads to recalculate parity instead of doing full stripe writes. Many cards disable write-back cache if the battery module isn't present -- make sure you have one and it's working. /dev accesses also use physio so you don't get any benefit from write combining in the filesystem layer.
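
As a rough illustration of the stripe-read point (a sketch only, not the 2410SA's firmware logic; the function names and the 3+1 disc layout are made up for the example): with XOR parity, a full stripe write can compute parity from the new data alone, whereas updating a single block means reading the old block and the old parity first.

```python
from functools import reduce


def xor_blocks(*blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def full_stripe_write(new_data):
    """All data blocks of the stripe are new: parity needs no reads."""
    parity = xor_blocks(*new_data)
    # One write per data disc plus one for the parity disc.
    return parity, {"reads": 0, "writes": len(new_data) + 1}


def small_write(old_data, old_parity, new_data):
    """One block changes: read old data and old parity, then write both back."""
    parity = xor_blocks(old_parity, old_data, new_data)
    return parity, {"reads": 2, "writes": 2}


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]  # hypothetical 3+1 stripe
    parity, io_full = full_stripe_write(stripe)
    new_block = b"\xaa\xbb"
    new_parity, io_small = small_write(stripe[1], parity, new_block)
    # The incremental update matches a full recomputation over the new stripe.
    assert new_parity == xor_blocks(stripe[0], new_block, stripe[2])
    print("full stripe:", io_full, "single block:", io_small)
```

Without a write-back cache to gather small writes into full stripes, every small write pays the two extra reads, which is consistent with (though not proof of) the slow contiguous writes described above.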


I've deliberately turned off write-caching because the 2410SA doesn't support battery-backed memory. I'm not sure if it's really necessary to disable it, but having experienced the odd disc crash in the past I've become a little paranoid about my data...


What the battery gives you is consistency of the parity data in the case of power loss. You can have a situation where a block is being modified, and thus the parity also needs to be modified. If the block gets written but not the parity, or the parity gets written but not the block, the stripe will be inconsistent. You won't see this until you have a drive failure and are trying to do a rebuild from the parity. By that point it's too late: you'll have silent data corruption due to the inconsistency. For RAID-0, the battery is pointless, and for RAID-1, the battery is nearly pointless; the mirror members will either agree or not, and if they disagree the worst that will happen is that you'll get old data. This is no different than if the OS crashes without flushing out all buffers. Old data is much easier to recover from than corrupt data, which is what you get if the parity is inconsistent.
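
The inconsistency described above can be shown with a toy model (a sketch under assumptions: a hypothetical 3+1 XOR-parity stripe held in memory, with made-up helper names; no real controller is modelled):

```python
from functools import reduce


def xor_blocks(*blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def rebuild(surviving_data, parity):
    """Reconstruct the one missing data block from the survivors and the parity."""
    return xor_blocks(parity, *surviving_data)


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]
    parity = xor_blocks(*data)

    # Consistent stripe: losing disc 0 is recoverable.
    assert rebuild([data[1], data[2]], parity) == b"AAAA"

    # Block 1 is rewritten, but power is lost before the parity update lands.
    data[1] = b"XXXX"  # the new data reached the disc
    # parity is now stale; nothing notices while all discs are healthy.

    # Later, disc 0 fails and is rebuilt from the stale parity.
    rebuilt = rebuild([data[1], data[2]], parity)
    assert rebuilt != b"AAAA"  # block 0 was never written to, yet comes back wrong
    print("rebuilt block 0:", rebuilt)
```

Note that the corrupted block is one that was not even being written at the time of the power loss, which is why the damage stays silent until a rebuild.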

Also, in general, hardware RAID beats PCI RAID, hands down.


In my case, software RAID beats it too! I have my "fast discs" attached to an old 3960 controller and mirror them with gmirror, and the write performance is an order of magnitude better than the 2410SA, which tells me that something somewhere must be wrong. I know I shouldn't really expect SCSI performance from SATA discs, but this seems a bit much to me (I also have write-caching turned off on my SCSI discs, but I have enabled tagged queueing). I'm still slightly uncomfortable with the idea of software RAID, but it hasn't lost anything yet, in spite of a few "unplanned outages".

Software RAID will almost always be faster for trivial tasks than PCI RAID. What PCI RAID gives you is task offloading from the CPU, and protection while the OS is not running. If your CPU is sitting idle most of the time, then software RAID often is a win.

That said, the design of a PCI RAID controller plays a huge role in how it performs. Let's just say that the 2410 design is, um, "low end". There are other cards out there from several vendors, especially the newer generation ones that use PCI-Express and PCI-X, that perform a whole heck of a lot better. I have several cards that beat software RAID by a wide margin, but they are also expensive.

Scott
_______________________________________________
freebsd-current@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@xxxxxxxxxxx"


