Re: Optimized copy&move (was: Re: [PATCH] Mantaining turnstile aligned to 128 bytes in i386 CPUs)
- From: Nick Evans <nevans@xxxxxxxxxxxxx>
- Date: Wed, 17 Jan 2007 15:29:50 -0500
On Wed, 17 Jan 2007 14:41:44 -0500
Ivan Voras <ivoras@xxxxxx> wrote:
Bruce Evans wrote:
And MMX/XMM registers are not needed to get movnt on machines with SSE2,
since movnti is part of SSE2. This reduces the advantages of using MMX/XMM
registers on P4's and A64's in 32-bit mode to the non-nt parts of the
above (fully cached case), which I think are less important than the nt
parts.
Hmm, I'm looking at i386/i386/support.s and there are several versions
of bcopy and bmove functions, including some that optimize by using FPU
registers (large_i586_bcopy_loop), and a version that uses movnti
(sse2_pagezero), but I can't find the bit of magic which glues them to
the bzero() call.
Also, as far as I can tell from the comments, the FPU version works by
manually saving context... why is this possible (i.e. won't something
preempt it?)
Potentially stupid question, but is it not possible to benchmark these
variations at build or boot time and use the most appropriate method? Or
at least the one most appropriate 90% of the time?
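
To make the question concrete, here is a rough userland sketch in C of what
I mean by benchmarking at startup. Everything in it is hypothetical: the
names (zero_fn, zero_bytes, zero_libc, select_zero_impl, my_bzero) are made
up for illustration and are not taken from support.s, and clock() is a very
crude timer. It only shows the shape of the idea: time each candidate once,
store the winner in a function pointer, and route later calls through it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature shared by every candidate zeroing routine. */
typedef void (*zero_fn)(void *buf, size_t len);

/* Candidate 1: plain byte-at-a-time loop. */
static void zero_bytes(void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len-- > 0)
        *p++ = 0;
}

/* Candidate 2: whatever the C library's memset does. */
static void zero_libc(void *buf, size_t len)
{
    memset(buf, 0, len);
}

/* The routine currently in use; starts with a safe default. */
static zero_fn active_zero = zero_bytes;

/* Time `rounds` calls of one candidate on the scratch buffer. */
static clock_t time_candidate(zero_fn fn, void *buf, size_t len, int rounds)
{
    clock_t start = clock();

    for (int i = 0; i < rounds; i++)
        fn(buf, len);
    return clock() - start;
}

/* Run once at startup: keep the candidate with the lowest measured time. */
void select_zero_impl(void *scratch, size_t len, int rounds)
{
    static const zero_fn candidates[] = { zero_bytes, zero_libc };
    size_t n = sizeof(candidates) / sizeof(candidates[0]);
    clock_t best = time_candidate(candidates[0], scratch, len, rounds);

    active_zero = candidates[0];
    for (size_t i = 1; i < n; i++) {
        clock_t t = time_candidate(candidates[i], scratch, len, rounds);

        if (t < best) {
            best = t;
            active_zero = candidates[i];
        }
    }
}

/* Public entry point: one indirect call to whichever routine won. */
void my_bzero(void *buf, size_t len)
{
    assert(active_zero != NULL);
    active_zero(buf, len);
}
```

A caller would run select_zero_impl() once on a scratch buffer of a
representative size and then use my_bzero() everywhere. The obvious catch
is that the winner depends on buffer size and on whether the data is
already cached, so a single measurement may pick the wrong routine for
some workloads.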
Nick
_______________________________________________
freebsd-arch@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "freebsd-arch-unsubscribe@xxxxxxxxxxx"