Re: Status of support for 4KB disk sectors
- From: Jeremy Chadwick <freebsd@xxxxxxxxxxxxxxxx>
- Date: Mon, 18 Jul 2011 21:05:44 -0700
On Mon, Jul 18, 2011 at 11:38:00PM -0400, Glen Barber wrote:
On 7/18/11 7:41 PM, Jeremy Chadwick wrote:
On Mon, Jul 18, 2011 at 03:50:15PM -0700, Kevin Oberman wrote:
I just want to check on the status of 4K sector support in FreeBSD. I read
a long thread on the topic from a while back and it looks like I might hit some
issues if I'm not REALLY careful. Since I will be keeping the existing Windows
installation, I need to be sure that I can set up the disk correctly without
screwing up Windows 7.
I was planning on just DDing the W7 slice over, but I am not sure how well this
would play with GPT. Or should I not try to use GPT at all? I'd like to, as this
laptop spreads Windows 7 over two slices and adds a third for recovery, leaving
only one for FreeBSD, and I'd like to put my files in a separate slice. GPT
would offer that fifth slice.
I have read the handbook and don't see any reference to 4K sectors and only a
one-liner about gpart(8) and GPT. Once I get this all figured out, I'll see
about writing an update about this, as GPT looks like the way to go in the
future.
When you say "4KB sector support", what do you mean by this? All the
drives on the market that I've seen, as of this writing, claim a
physical/logical sector size of 512 bytes -- yes, even SSDs, and EARS
drives which we know use 4KB sectors. They do this to guarantee full
compatibility with existing software.
Since you're talking about gpart and "4KB sector support", did you mean
to ask "what's the state of FreeBSD and aligned partition support to
ensure decent performance with 4KB-sector drives?"
If so: there have been some commits in recent days to RELENG_8 to help
try to address the shortcomings of the existing utilities and GEOM
infrastructure. Read the most recent commit text carefully:
But the current "known method" is to use gnop(8). Here's an example:
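The following is a minimal sketch of the gnop(8) sequence as it is commonly described, not necessarily the example originally linked here. It only prints the commands rather than running them, since they create a pool on the named disk. The function name gnop_4k_plan, the pool name "tank" and the disk "ada0" are placeholders chosen for illustration.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) a gnop(8) sequence for creating
# a ZFS pool on a provider that reports 4096-byte sectors.
# Arguments: pool name, disk name (both placeholders, e.g. tank ada0).
gnop_4k_plan() {
    pool=$1
    disk=$2
    # Overlay a provider reporting 4096-byte sectors; it appears as <disk>.nop
    echo "gnop create -S 4096 /dev/${disk}"
    # Create the pool on the overlay so ZFS sees the 4K sector size
    echo "zpool create ${pool} /dev/${disk}.nop"
    # Remove the overlay and re-import the pool from the underlying disk
    echo "zpool export ${pool}"
    echo "gnop destroy /dev/${disk}.nop"
    echo "zpool import ${pool}"
}
```

Running `gnop_4k_plan tank ada0` prints the five commands in order, which can be reviewed before being typed in by hand as root.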
Notice: I'm reading this as "how badly do 'green drives' suck?"
It's important to note that not all WD Caviar Green drives use 4KB
sectors. WD, as of this writing, uses the 4-letter "EARS" string in the
drive model to denote use of 4KB sectors.
The Green series do have other problems that people have experienced,
such as bugs/quirks in the firmware causing the drive to repetitively
park its heads in the landing zone (witnessed as either really bad drive
performance, or the drive falling off the bus + reattaching). You can
detect this situation by looking at SMART attribute 193
(Load_Cycle_Count). A very high number (in the tens or hundreds of
thousands for a drive that has only been in use for a week or so) is an
indicator of the problem.
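As a rough sketch of that check, the helper below pulls the raw Load_Cycle_Count out of `smartctl -A` output (smartctl comes from smartmontools). It assumes the usual 10-column attribute table with the raw value in the last column; the function name load_cycle_count is made up for this example.

```shell
#!/bin/sh
# Read `smartctl -A` output on stdin and print the raw value of
# SMART attribute 193 (Load_Cycle_Count).
# Assumption: the attribute table has 10 columns and the raw value
# is the tenth. Prints nothing if attribute 193 is not present.
load_cycle_count() {
    awk '$1 == 193 { print $10 }'
}
```

Typical use would be `smartctl -A /dev/ada0 | load_cycle_count`, with the device name adjusted to the drive in question. Sampling the value twice, some hours apart, shows how quickly the count is climbing.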
WD apparently has given people firmware updates to fix the issue.
However, the drive firmware version number does not change after updating
the microcode, even though the update does fix the problem. (For what it's worth,
Samsung pulled this same manoeuvre when it came to firmware updates for
a catastrophic bug on their SpinPoint F4 drives.) What I'm saying is
there's no way to detect whether or not your drive is running the fixed
firmware, other than looking at said SMART attribute.
I do have references for this issue, but it will take me some time to
dig up the URLs and so on.
FWIW, I've recently done the gnop(8) trick to two "green" drives in one
of my machines because I was seeing horrifying performance problems with
what I consider to be basic stuff, like 'portsnap extract', but more
severely with copying large data (file-backed bacula files to be exact)
into said datasets. I have yet to retry my read/write tests with drives
I have not converted with gnop(8).
I imagine this would have a tremendous effect on performance. With
SSDs, the estimated performance impact is between 30% and 50% depending
on what the workload is. Meaning with SSDs, drives with aligned partitions
perform 30-50% better. When you read about how NAND cells and NAND flash
pages work (look it up on Wikipedia, look for FTL (flash translation
layer)) it makes sense. With mechanical HDDs, I'm not sure what the
performance hit is, but I imagine it's large.
Furthermore, talking about SSDs again: I want to make folks aware of the
fact that Intel SSDs use an 8KB NAND flash page (not 4KB!). NAND pages
are erased 256 pages at a time (8KB * 256 = 2MByte). When it comes to
alignment, flash page size is what's of concern. So for Intel SSDs (X25
series, 320 series, and 510 series), 8KByte-aligned is the way to go.
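The alignment arithmetic can be sketched as follows. This only checks whether a partition's starting byte offset is a multiple of a chosen alignment; the function name is_aligned and its argument order are assumptions made for this example, and the starting LBA has to be read off the partition table by hand.

```shell
#!/bin/sh
# Report whether a partition's starting byte offset is a multiple of the
# desired alignment. All arguments are plain integers:
#   $1 start_lba - first sector of the partition
#   $2 sector    - logical sector size in bytes (512 on the drives discussed)
#   $3 align     - alignment in bytes (4096 for 4KB sectors, 8192 for 8KB pages)
is_aligned() {
    start_lba=$1
    sector=$2
    align=$3
    offset=$((start_lba * sector))
    if [ $((offset % align)) -eq 0 ]; then
        echo "aligned"
    else
        echo "misaligned"
    fi
}
```

For example, `is_aligned 63 512 4096` reports "misaligned" (the traditional MBR start at LBA 63 is byte 32256), `is_aligned 40 512 4096` reports "aligned", and `is_aligned 40 512 8192` reports "misaligned", which shows why a 4KB-aligned start is not automatically good enough for an 8KB page. A start at LBA 2048 (1MByte) satisfies both.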
I have not conclusively tested all possible combinations of
configurations, nor reverted the changes to the drives to retest, but if
it is of any interest, here's what I'm seeing.
I have comparisons between WD "green" and "black" drives.
Unfortunately, the machines are not completely similar - one is a
Core2Quad, the other Core2Duo; one has 6GB RAM, the other 8GB RAM; also,
'orion' is running a month-old 8-STABLE; 'kaos' is running a 2-week-old
-CURRENT. Both machines are using ZFSv28:
orion % sysctl -n hw.ncpu; sysctl -n hw.physmem
kaos % sysctl -n hw.ncpu; sysctl -n hw.physmem
The drives in 'orion' are 1TB WD green drives in a ZFS mirror; the
drives in 'kaos' are 1TB WD black drives in a raidz1 (3 drives).
First the read test:
kaos % sh -c 'time find /usr/src -type f -name \*.\[1-9\] >/dev/null'
12.94 real 0.60 user 11.95 sys
orion % sh -c 'time find /usr/src -type f -name \*.\[1-9\] >/dev/null'
118.02 real 0.46 user 8.74 sys
I guess no real surprise here. 'kaos' has more spindles to read from,
on top of faster seek times.
Next the write test:
The 'compressed' and 'dedup' datasets referenced below are 'lzjb' and
'sha256,verify', respectively. I'd wait for the 'compressed+dedup'
tests to finish, but I have to wake up tomorrow morning.
orion# sh -c 'time portsnap extract -p /zstore/perftest >/dev/null'
306.71 real 44.37 user 110.28 sys
orion# sh -c 'time portsnap extract -p /zstore/perftest_compress >/dev/null'
166.62 real 43.87 user 109.49 sys
orion# sh -c 'time portsnap extract -p /zstore/perftest_dedup >/dev/null'
3576.43 real 44.98 user 109.12 sys
kaos# sh -c 'time portsnap extract -p /perftest >/dev/null'
311.31 real 51.23 user 193.37 sys
kaos# sh -c 'time portsnap extract -p /perftest_compress >/dev/null'
269.85 real 49.55 user 191.56 sys
kaos# sh -c 'time portsnap extract -p /perftest_dedup >/dev/null'
4655.73 real 51.86 user 196.22 sys
Like I said, I have not yet had the time to retest this on drives
without the gnop(8) fix (another similar zpool with 2 drives), so maybe
the data I'm providing isn't relevant, but since the gnop(8) fix for 4K
sector drives was mentioned, I thought it might be relevant to a point.
The problem with what you're testing here is that it's not really
"testing the drive" -- it's testing multiple drives with ZFS in the
middle. Using dd would address that. For testing "non-aligned" offsets
(for the EARS drive), use the seek= parameter. I would also recommend
picking an awkwardly-sized bs= value, such as 61340.
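A minimal sketch of such a dd run is below. The wrapper name dd_write_test and its argument order are invented for this example, and the target should be a scratch file or a disk whose contents do not matter, because the write is destructive. Note that seek= counts in units of bs, so the starting byte offset is seek * bs. Raw disk devices may also reject block sizes that are not a multiple of the sector size, in which case a value such as 61440 would be needed there.

```shell
#!/bin/sh
# Write `count` blocks of `bs` bytes from /dev/zero to `target`,
# starting `seek` output blocks into it (byte offset = seek * bs).
# WARNING: this overwrites data on `target`.
# Arguments: target path, bs (bytes), seek (blocks), count (blocks).
dd_write_test() {
    target=$1
    bs=$2
    seek=$3
    count=$4
    # dd reports bytes transferred and elapsed time on stderr when done
    dd if=/dev/zero of="$target" bs="$bs" seek="$seek" count="$count"
}
```

Something like `dd_write_test /tmp/scratch.img 61340 1 1000` exercises the awkward block size; repeating the run with a 4096-multiple bs and comparing the throughput dd reports gives a rough aligned-versus-unaligned comparison without ZFS in the middle.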
Now, that's for ZFS, but I'm under the impression the exact same is
needed for FFS/UFS.
<rant> Do I bother doing this with my SSDs? No. Am I suffering in
performance? Probably. Why do I not care? Because the level of
annoyance is extremely high -- remember, all of this has to be done from
within the installer environment (referring to "Emergency Shell"), which
on FreeBSD lacks an incredible amount of usability, and is even worse to
deal with when doing a remote install via PXE/serial. Fixit is the only
decent environment. Given that floppies are more or less gone, I don't
understand why the Fixit environment doesn't replace the "Emergency Shell".
Not that it necessarily helps in a PXE environment, but a memstick of
9-CURRENT has helped me recover minor "oops" situations a few times over
the past few months. What are these "floppies" you speak of, again? :)
Sure, USB flash drives work great. But it's a little hard to install a
USB flash drive when you're 3000 miles away. :-) mm's mfsBSD is also
useful for recovery situations:
My point, though, was this: Fixit was separate from Emergency Shell
because of space concerns on floppy disks (Fixit wouldn't fit). Since
floppies really aren't used much any more, this concern should be
revisited. IMHO Fixit should be removed and Emergency Shell should
provide the same environment/utilities/etc. as Fixit.
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |