Optimized copy&move (was: Re: [PATCH] Mantaining turnstile aligned to 128 bytes in i386 CPUs)



Bruce Evans wrote:

And MMX/XMM registers are not needed to get movnt on machines with SSE2,
since movnti is part of SSE2. This reduces the advantages of using MMX/XMM
registers on P4's and A64's in 32-bit mode to the non-nt parts of the
above (fully cached case), which I think are less important than the nt
parts.

Hmm, I'm looking at i386/i386/support.s and there are several versions
of the bcopy and bmove functions, including some that optimize by using
FPU registers (large_i586_bcopy_loop), and a version that uses movnti
(sse2_pagezero), but I can't find the bit of magic which glues them to
the bzero() call.
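
For illustration only, here is a user-space sketch of the two ideas in play: zeroing memory with non-temporal stores from general-purpose registers (the SSE2 intrinsic _mm_stream_si32 compiles to movnti, so no MMX/XMM registers are involved), and picking an implementation through a function pointer. The names zero_fn, zero_generic, zero_movnti and select_zero are hypothetical. This is not the code in support.s, and whether the kernel glues its variants to bzero() this way is an assumption, not something established here.

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_stream_si32; also provides _mm_sfence */
#endif

/* Hypothetical type for a swappable zeroing routine. */
typedef void (*zero_fn)(void *buf, size_t len);

/* Fallback: ordinary cached stores via memset. */
static void zero_generic(void *buf, size_t len)
{
    memset(buf, 0, len);
}

#if defined(__SSE2__)
/*
 * Non-temporal variant. Assumes buf is suitably aligned for int.
 * Each _mm_stream_si32 is a movnti from a general-purpose register.
 */
static void zero_movnti(void *buf, size_t len)
{
    int *p = buf;
    size_t n = len / sizeof(int);
    size_t i;

    for (i = 0; i < n; i++)
        _mm_stream_si32(&p[i], 0);
    _mm_sfence();        /* order the streaming stores before returning */

    /* Zero any trailing bytes that do not fill a whole int. */
    memset((char *)buf + n * sizeof(int), 0, len - n * sizeof(int));
}
#endif

/* Pick a routine once; here the choice is made at compile time. */
static zero_fn select_zero(void)
{
#if defined(__SSE2__)
    return zero_movnti;
#else
    return zero_generic;
#endif
}
```

A caller would do something like `zero_fn z = select_zero(); z(page, 4096);`. A real kernel would more likely make the choice at boot from CPU feature bits rather than at compile time, but that part is left out of the sketch.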

Also, as far as I can tell from the comments, the FPU version works by
manually saving context... why is this possible (i.e. won't something
preempt it?)



