Recommendations for servers running SATA drives



I'm forking the thread on fsck/soft-updates in hopes of getting some practical advice based on the discussion here of background fsck, softupdates and write-caching on SATA drives.

On Fri, 26 Sep 2008, Jeremy Chadwick wrote:

Let's be realistic. We're talking about ATA and SATA hard disks, hooked
up to on-board controllers -- these are the majority of users. Those
with ATA/SATA RAID controllers (not on-board RAID either; most/all of
those do not let you disable drive write caching) *might* have a RAID
BIOS menu item for disabling said feature.

While I would love to deploy every server with SAS, that's not practical in many cases, especially for light-duty servers that are not being pushed very hard. I am taking my chances with multiple affordable drives and gmirror where I cannot throw in a 3Ware card. I imagine that many non-desktop FreeBSD users are doing the same considering you can fetch a decent 1U box with plenty of storage for not much more than $1K. I assume many here are in agreement on this point -- just making it clear that the bargain crowd is not some weird edge case in the userbase...

Regardless of all of this, end-users should, in no way shape or form,
be expected to go to great lengths to disable their disk's write cache.
They will not, I can assure you. Thus, we must assume: write caching
on a disk will be enabled, period. If a filesystem is engineered with
that fact ignored, then the filesystem is either 1) worthless, or 2)
serves a very niche purpose and should not be the default filesystem.

Arguments about defaults aside, here are my first questions. If I've got a server with multiple SATA drives mirrored with gmirror, is turning on write-caching a good idea? What kind of performance impact should I expect? What is the relationship between caching, soft-updates, and either NCQ or TCQ?

Here's an example of a Seagate, trimmed for brevity:

Protocol                      Serial ATA v1.0
device model                  ST3160811AS

Feature                       Support  Enable  Value    Vendor
write cache                   yes      yes
read ahead                    yes      yes
Native Command Queuing (NCQ)  yes      -       31/0x1F
Tagged Command Queuing (TCQ)  no       no      31/0x1F

TCQ is clearly not supported, NCQ seems to be supported, but I don't know how to tell if it's actually enabled or not. Write-caching is currently on.
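
For anyone who wants to check a listing like this across several boxes, here is a rough Python sketch that reads the feature rows and reports what they say. The function names (parse_features, describe) are made up for illustration, and the sketch assumes that a "-" in the Enable column means the tool simply does not report an enabled state for that feature. That is my reading of the output, not something confirmed in this thread.

```python
import re

# Sample text copied from the capability listing quoted above.
SAMPLE = """\
Protocol Serial ATA v1.0
device model ST3160811AS

Feature Support Enable Value Vendor
write cache yes yes
read ahead yes yes
Native Command Queuing (NCQ) yes - 31/0x1F
Tagged Command Queuing (TCQ) no no 31/0x1F
"""

# A feature row is: name, Support (yes/no), Enable (yes/no/-), optional Value.
# Header and identification lines do not fit this shape and are skipped.
ROW = re.compile(
    r"^(?P<name>.+?)\s+(?P<support>yes|no)\s+(?P<enable>yes|no|-)"
    r"(?:\s+(?P<value>\S+))?\s*$"
)


def parse_features(text):
    """Return {feature name: {"support", "enable", "value"}} for each row."""
    features = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            features[match.group("name")] = {
                "support": match.group("support"),
                "enable": match.group("enable"),
                "value": match.group("value"),
            }
    return features


def describe(row):
    """Summarise one row; '-' is treated as 'state not reported' (assumption)."""
    if row["support"] == "no":
        return "not supported"
    if row["enable"] == "-":
        return "supported, enable state not reported"
    if row["enable"] == "yes":
        return "supported, enabled"
    return "supported, disabled"


if __name__ == "__main__":
    for name, row in parse_features(SAMPLE).items():
        print(f"{name}: {describe(row)}")
```

This only interprets the text of the listing. It cannot tell whether the driver is actually issuing queued commands to the drive, which is the part I'm still unsure about.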

The tradeoff is apparently performance vs. more reliable recovery should the machine lose power, smoke itself, etc., but all I've seen is anecdotal evidence of how bad performance gets.

FWIW, this machine in particular had its mainboard go up in smoke last week. One drive was too far gone for gmirror to rebuild it without doing a "forget" and "insert". The remaining drive was too screwy for background fsck, but a manual check in single-user left me with no real surprises or problems.

The system is already up and the filesystems mounted. If the error in
question is of such severity that it would impact a user's ability to
reliably use the filesystem, how do you expect constant screaming on
the console will help? A user won't know what it means; there is
already evidence of this happening (re: mysterious ATA DMA errors which
still cannot be figured out[6]).

IMHO, a dirty filesystem should not be mounted until it's been fully
analysed/scanned by fsck. So again, people are putting faith into
UFS2+SU despite actual evidence proving that it doesn't handle all
scenarios.

I'll ask, but it seems like the consensus here is that background fsck, while the default, is best left disabled. The cases where it might make sense are:

-desktop systems
-servers that have incredibly huge filesystems (and even there being able to selectively background fsck filesystems might be helpful)

The first example is obvious: people want a fast-booting desktop. The second trades long fsck times in single-user mode for some uncertainty about the state of the filesystem.

The problem here is that when it was created, it was sort of an
"experiment". Now, when someone installs FreeBSD, UFS2 is the default
filesystem used, and SU are enabled on every filesystem except the root
fs. Thus, we have now put ourselves into a situation where said
feature ***must*** be reliable in all cases.

You're also forgetting a huge focus of SU -- snapshots[1]. However, there
are more than enough facts on the table at this point concluding that
snapshots are causing more problems[7] than previously expected. And
there's further evidence filesystem snapshots shouldn't even be used in
this way[8].

...

Filesystems have to be reliable; data integrity is focus #1, and cannot
be sacrificed. Users and administrators *expect* a filesystem to be
reliable. No one is going to keep using a filesystem if it has
disadvantages which can result in data loss or "waste of administrative
time" (which I believe is what's occurring here).

The softupdates question seems tied quite closely to the write-caching question. If write-caching "breaks" SU, that makes things tricky. So another big question:

If write-caching is enabled, should SU be disabled?

And again, what kind of performance and/or reliability sacrifices are being made?

I'd love to hear some input from both admins dealing with this stuff in production and from any developers who are making decisions about the future direction of all of this.

Thanks,

Charles


[1]: http://www.usenix.org/publications/library/proceedings/bsdcon02/mckusick/mckusick_html/index.html
[6]: http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting
[7]: http://wiki.freebsd.org/JeremyChadwick/Commonly_reported_issues
[8]: http://lists.freebsd.org/pipermail/freebsd-stable/2007-January/032070.html

--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |

_______________________________________________
freebsd-stable@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@xxxxxxxxxxx"



