Re: [PATCH] Mantaining turnstile aligned to 128 bytes in i386 CPUs



On Jan 18, 2007, at 2:28 PM, Maxim Sobolev wrote:
Unfortunately, there are simply different tradeoffs between mechanisms for copying depending on whether you want to use or avoid using/thrashing the L1/L2 caches, whether the data is cache-aligned, and so forth; the CPU can't infer what you want to occur, so you have to tell it. I find it interesting that some of the architectures (PA-RISC,

Well, of course there are some special cases, but in general there should be some baseline suitable for most uses. That's why we (and most other operating systems) provide only a single version of the mem*(3) APIs.

Well, a truly generic version is in lib/libc/string/bcopy.c; it's architecture-neutral (i.e., it's pure C code) and it handles all kinds of things like overlapping source and destination addresses, non-aligned access, and so forth. The downside is that it's slower than using movl/movsl, much less some of the fancier variants that Bruce and Matt have been discussing (in considerable, interesting detail) earlier:

http://now.cs.berkeley.edu/Td/bcopy.html
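
To make the "overlapping source and destination" point concrete, here is a minimal sketch of an overlap-safe, byte-at-a-time copy in pure C. This is not the code in lib/libc/string/bcopy.c (which, among other things, also copies a word at a time when alignment allows); the name sketch_memmove is hypothetical and exists only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical illustration only; not the libc implementation.
 * Copies len bytes from src to dst, choosing the copy direction so
 * that overlapping regions are handled correctly.
 */
static void *
sketch_memmove(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	if (d == s || len == 0)
		return (dst);

	if ((uintptr_t)d < (uintptr_t)s) {
		/* Destination starts below source: copy forward. */
		while (len--)
			*d++ = *s++;
	} else {
		/* Destination starts above source: copy backward. */
		d += len;
		s += len;
		while (len--)
			*--d = *--s;
	}
	return (dst);
}
```

The direction test is what buys correctness for overlapping buffers; copying one byte per iteration is a large part of why a loop like this loses to movsl or anything wider.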

If you're only moving, say, 5 bytes, the overhead of fancy loop unrolling and prefetching and so forth isn't going to help compared with a simple movb/movl combination, so it really depends.
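
As a rough sketch of that size dependence, a copy routine can branch on the length and fall back to a trivial loop for tiny copies. Everything here is an assumption for illustration: the name copy_dispatch is hypothetical, the 16-byte cutoff is a placeholder rather than a measured value, and memcpy() merely stands in for whatever tuned large-copy path is available. The buffers are assumed not to overlap.

```c
#include <stddef.h>
#include <string.h>

/* Assumed cutoff for illustration; a real value would come from benchmarks. */
#define SMALL_COPY_MAX	16

/*
 * Hypothetical illustration only. Small copies use a plain byte loop
 * with no setup cost; larger ones are handed to memcpy(), standing in
 * for an unrolled or prefetching variant. Buffers must not overlap.
 */
static void
copy_dispatch(void *dst, const void *src, size_t len)
{
	if (len <= SMALL_COPY_MAX) {
		unsigned char *d = dst;
		const unsigned char *s = src;

		while (len--)
			*d++ = *s++;
	} else {
		memcpy(dst, src, len);
	}
}
```

Where the cutoff belongs depends on the CPU and on whether the data is already in cache, which is exactly the tradeoff under discussion.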

--
-Chuck

_______________________________________________
freebsd-arch@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "freebsd-arch-unsubscribe@xxxxxxxxxxx"