Re: Optimized copy&move (was: Re: [PATCH] Mantaining turnstile aligned to 128 bytes in i386 CPUs)
- From: Bruce Evans <bde@xxxxxxxxxxx>
- Date: Sun, 21 Jan 2007 15:03:48 +1100 (EST)
On Sat, 20 Jan 2007, David Malone wrote:
> On Thu, Jan 18, 2007 at 11:16:19AM +1100, Bruce Evans wrote:
>> - the FPU routines are faster on Athlons (XP and 64 at least), but these
>>   didn't exist until 2001. The introduction of these CPUs may have
>>   been the trigger for turning off the FPU routines in -current in 2001.
>>   Until then problems were limited to Pentium-1's since the dynamic
>>   configuration prevented the routines being used on all other machines.
>
> I think a very quirky K6-2 machine that I had let us reproduce the
> problem fairly dependably and may have been part of the reason it
> was finally turned off.

I just looked again at your old (2001) mail about this. The userland
benchmark was flawed. It tried 3 methods sequentially without warming
up caches, so all methods did unintended testing of I-cache misses
(including branch target cache) and the first method (userland
bzero) warmed up the D-cache for the other 2. The kernel runtime
configuration also fails to either warm or cool the caches initially.
It assumes P1 cache sizes and depends on a 1MB buffer being much larger
than caches. Maybe this was not enough for K6-2. It is certainly not
enough for Athlon64, but I think it would mostly cause false negatives
so I don't understand why it gave a false positive for the K6-2.
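
One way to avoid the ordering problem described above is to give every
method its own untimed warm-up passes before timing it, so that no method
pays for cold I-cache and none inherits a D-cache warmed by the previous
method. The following is only a rough sketch under that assumption, not the
benchmark from the 2001 mail: the names time_method, zero_libc and
zero_words, the pass counts, and the use of memset as a stand-in for
userland bzero are all made up for illustration.

```c
#define _POSIX_C_SOURCE 200112L	/* for clock_gettime() under strict -std */

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE	(1024 * 1024)	/* 1MB, the size discussed above */
#define WARM_PASSES	2		/* hypothetical; untimed */
#define TIMED_PASSES	16		/* hypothetical; timed */

typedef void zero_fn(void *buf, size_t len);

/* Not static, so the compiler is less free to discard stores to it. */
uint64_t bench_buf[BUF_SIZE / sizeof(uint64_t)];

/* Stand-in for userland bzero (assumption: plain memset). */
static void
zero_libc(void *buf, size_t len)
{
	memset(buf, 0, len);
}

/* Stand-in for an alternative method: a simple 64-bit store loop. */
static void
zero_words(void *buf, size_t len)
{
	volatile uint64_t *p = buf;
	size_t i;

	assert(len % sizeof(uint64_t) == 0);
	for (i = 0; i < len / sizeof(uint64_t); i++)
		p[i] = 0;
}

static double
now_sec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (ts.tv_sec + ts.tv_nsec / 1e9);
}

/*
 * Return the average seconds per pass for fn.  The warm-up passes are
 * not timed; they put fn's code and the buffer into the same starting
 * state for every method, whatever ran before it.
 */
static double
time_method(zero_fn *fn, void *buf, size_t len)
{
	double start, end;
	int i;

	for (i = 0; i < WARM_PASSES; i++)
		fn(buf, len);
	start = now_sec();
	for (i = 0; i < TIMED_PASSES; i++)
		fn(buf, len);
	end = now_sec();
	return ((end - start) / TIMED_PASSES);
}
```

Calling time_method(zero_libc, bench_buf, sizeof(bench_buf)) and then
time_method(zero_words, bench_buf, sizeof(bench_buf)) should give results
that do not depend on the order of the calls. Whether the buffer then ends
up cached depends on the cache sizes of the machine, so a 1MB buffer tests
the cached case on some CPUs and the uncached case on others.
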
After fixing the userland benchmark, userland bzero did much better
and your benchmark agreed with mine that FPU methods for bzero are
just pessimizations on A64-AXP. However, the behaviour for bcopy
is quite different on A64-AXP -- even the old FPU methods are small
optimizations in some cases (on A64, about 25% in the fully-L2 cached
case; little difference for other large copies).
Bruce
_______________________________________________
freebsd-arch@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "freebsd-arch-unsubscribe@xxxxxxxxxxx"