Re: ATA 4K sector issues



We experimented a bit with aligning fdisk (DOS slices) by changing
the sector offset to 2, but I came to the conclusion that it was better
to do the alignment in disklabel, GPT, or whatever higher-level
partitioner floats your boat, and not to mess with anything the BIOS
uses to boot the machine.

My recommendation is to use a 1MB physical base alignment. That's what
I adjusted DragonFly's disklabel64 to do. It's definitely best to
have the partitioner deal with it instead of messing around
manually, because the partitioner can calculate the actual physical
alignment by querying the kernel's disk subsystem, regardless of the
topology.

There are several reasons for using a large alignment:

* A variety of media already uses much larger physical block sizes.
MLC flash uses 128K and SLC uses 64K blocks. See the note below
on why this matters even though SSDs do write combining.

* A larger alignment is more likely to work well as a default in
RAID configurations and doesn't hurt non-RAID.

* The kernel cluster I/O subsystem wants to collect stuff into 64K-256K
clusters for reading and writing (writing being the most important).
A larger alignment plus some minor tweaks in the cluster code will
cause the cluster writes to also be well aligned.

* Even though UFS does not take advantage of cluster alignment
(because BMAP tends to align only to the UFS block size, which
is usually fairly small, <= 32K), filesystems such as ZFS (with
128K blocks, I believe) and HAMMER (with 64K blocks and 8MB super
blocks) will. And fixing up UFS isn't difficult. One might need
to mess with the cylinder group alignment and make some minor tweaks
to the bmap allocator, but that's about it.

* A large alignment hurts nothing. Who cares about ~512K-1MB of wasted
space at the beginning of the drive? I don't.

This is particularly important for SSDs. Even though SSDs do write
combining, a properly aligned write will theoretically greatly improve
write endurance by reducing internal fragmentation, reducing write
amplification effects, and also reducing the amount of internal
rewriting the drive does to defragment and wear-level. It is hard
to test this, but I am seeing wear rates consistent with a 100TB write
endurance on 40GB Intel drives that the vendor specs for a 35TB write
endurance.

So even though you might not see a major difference in performance,
you could very well see a big difference in write endurance. It isn't
possible to measure this with a standard benchmark, which keeps the
SSD 100% active, so I've been using real workloads, and it just takes
forever to tick down the SSD's wear meter. The SSD also needs idle
time to implement internal defragmentation and wear leveling efficiently
(this seems more apparent with the OCZs than with the Intels).

There are a lot of moving parts in the kernel related to alignment.
The cluster code and the filesystem block allocation code are the two
biggest issues, and adjustments have to be made to take proper advantage
of it, particularly for SSDs.

So the answer is: aligning things certainly isn't going to hurt
anything, so you might as well kick it hard (use a large alignment)
so you don't have to revisit the problem a year from now.

--

For hard drives with larger physical sector sizes it shouldn't matter
for asynchronous writes. It really shouldn't. And nearly all of UFS's
writes are asynchronous. That said:

I read Thiago's posting. I will note something specific about a ports
tarball. Ports has 261,000+ files in it, mostly small. UFS and the
cluster code CANNOT COMBINE those writes (because the buffer-cache for
file data is per-vnode), so UFS will wind up doing a very large number
of fragment-sized writes.

These fragment-sized writes (4K in Thiago's aligned test that ran in
1:25, and 2K in Thiago's aligned test that ran in 10:24) should STILL
be write-combined in the drive. That is, UFS STILL has good write
linearity even with the small writes.

So I suspect the issue here is that the drive is not properly
write-combining the writes, possibly coupled with additional issues
in UFS's bmap and inode allocator that might not be presenting the
drive with enough write-combinable data that fits in the drive's cache,
forcing the drive to do a lot of read-before-write.

In terms of write-combinable data and UFS, it could be a cylinder-group
alignment issue. Bitmap blocks are a particular problem because they
use an odd-sized block size (typically 6K if I remember right), though
I'm not sure how the filesystem fragment size affects it.

You would have to instrument the write activity to determine how
good the linearity is versus the size of the drive's RAM cache.
There are definitely several possible explanations for the horrible
performance when using 2K fragments.

ZFS (and also HAMMER) would not have this particular problem. ZFS
clearly has other issues in those tests but I don't know enough about
its internals to guess, other than maybe it is a ZIL tuning issue.

-Matt

_______________________________________________
freebsd-hackers@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers


