Re: Assembly string functions in i386 libc



On Fri, 13 Jul 2007, Bruce Evans wrote:

On Thu, 12 Jul 2007, Sean C. Farley wrote:

On Thu, 12 Jul 2007, Bruce Evans wrote:

Now I've looked at it. I think it is not testing strlen() at all,
except for the libc case, because __pure prevents more than 1 call
to strlen(). (The existence of __pure is also a bug. __pure was
the FreeBSD spelling of the __const__ attribute in gcc-1. It was
removed when special support for gcc-1 was dropped, and should not
have been recycled.) __pure is a syntax error in the old version of
FreeBSD that I tested on. I first tried __pure2, which is the
FreeBSD spelling of the __const__ attribute in gcc-2. I think it is
weaker than the __pure__ attribute in gcc-3.
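For reference, here is a minimal sketch of the two GCC attributes behind these macros, assuming gcc >= 2.96. The macro names DEMO_PURE2 and DEMO_PURE and both functions are hypothetical stand-ins for FreeBSD's __pure2 and __pure. Per the GCC manual, const imposes greater restrictions on a function than pure does: a const function may not read memory through its pointer arguments, while a pure one may.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the FreeBSD macros discussed above:
   __pure2 maps to gcc's const attribute, __pure (gcc >= 2.96) to pure. */
#define DEMO_PURE2 __attribute__((__const__))
#define DEMO_PURE  __attribute__((__pure__))

/* const: the result depends only on the argument values; the function
   must not read global memory or memory reached through pointers. */
int demo_square(int x) DEMO_PURE2;

/* pure: no side effects, but it may read memory such as the bytes that
   s points to, which is what any strlen() has to do. */
size_t demo_strlen(const char *s) DEMO_PURE;

int demo_square(int x)
{
    return x * x;
}

size_t demo_strlen(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```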

From what I could find, strlen() should not have the __const__
(__pure2) attribute since it is being passed a pointer, but __pure__
(__pure) should work. Are you saying that __pure used to mean
__const__ in gcc-1 but now it means __pure__ for gcc-2.96 and above?
The redefinition of __pure is what you are saying is a bug. Yes?

Yes to most of this. __pure2 is actually stronger than __pure[>2.96].
__pure2 has the very large effect of removing all calls to strlen()
from the loop. This affected everything except libc strlen() since
everything else was named xstrlen() and declared as __pure*, while
libc strlen() was declared in <string.h> without __pure*.

Actually, the reason I had __pure in main.c was because it exists in
string.h.

OTOH, __pure[>2.96] has no effect on this benchmark, at least with
gcc-3.3.3. I don't understand why it has no effect. It has no effect
even when I change the arg to a literal. The context is very simple,
with no aliasing problems in sight, at least with the literal arg
(with the arg possibly being argv[2], maybe gcc has to worry about the
arg being modified by a signal handler). If __pure[>2.96] doesn't
work in this simple context, then it isn't clear when it can work.
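The loop in question has roughly the shape below. This is a hypothetical reconstruction, not the actual main.c: since the argument is a literal and xstrlen() is declared pure, gcc is permitted to hoist the call out of the loop and reuse one result, but, as observed above, gcc-3.x does not necessarily do so.

```c
#include <stddef.h>

/* Hypothetical benchmark helper, declared pure like the xstrlen() here. */
size_t xstrlen(const char *s) __attribute__((__pure__));

size_t xstrlen(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* The argument is loop-invariant and xstrlen() is pure, so the compiler
   is allowed (not required) to call it once and reuse the result. */
size_t sum_lengths(unsigned long iters)
{
    size_t total = 0;

    for (unsigned long i = 0; i < iters; i++)
        total += xstrlen("a literal argument");
    return total;
}
```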

Using or not using __pure with gcc-3.4.6 has no effect for me even with
the literal argument regardless of optimization (-O0, -O1, or -O2).

BTW, starting somewhere near gcc-3.4 for -O2 and gcc-4.2 for -O,
simple loops like this don't always work in benchmarks, because the
compiler removes the whole loop if it can see that it doesn't do
anything. The compiler can see this if it can see inside any function
calls in the loop (this currently requires the functions to be in the
same source file or #included there), or if the functions are declared
as sufficiently __pure. When I used __pure2 with gcc-3.3.3 -O, gcc
removed the function calls but not the loop. gcc-4.2 would also
remove the loop.
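One common way to keep such a benchmark honest, sketched here assuming a C99 compiler: read the argument through a volatile pointer so that every call must actually be made, and store the result to a volatile object so the loop cannot be discarded as dead code. The names bench_strlen and bench_sink are hypothetical, and clock() measures CPU time rather than wall-clock time.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Stores to a volatile object cannot be optimized away. */
static volatile size_t bench_sink;

/* Hypothetical harness: calls strlen(str) iters times, stores the CPU
   seconds used in *secs, and returns the summed lengths. */
size_t bench_strlen(const char *str, unsigned long iters, double *secs)
{
    /* A volatile pointer must be re-read on every iteration, so the
       compiler cannot treat the argument as loop-invariant. */
    const char *volatile vstr = str;
    size_t total = 0;
    clock_t start = clock();

    for (unsigned long i = 0; i < iters; i++)
        total += strlen(vstr);

    *secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    /* Publishing the result keeps the whole loop alive. */
    bench_sink = total;
    return total;
}
```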

Interesting. I need to remember this.

Just to note, __pure2 is not valid with strlen() since it examines data
passed via a pointer, according to the GCC docs.

...[A64 in 32-bit mode similar to AXP]

BTW, does AXP refer to Athlon XP or Alpha AXP? When I first saw you
write AXP, I thought it was an Alpha. :)

...[asm version more than twice as slow on P3-P4]

The Athlon XP did much better with the assembly version than either
Intel CPU for me. For all three CPUs, using string lengths from 1 to
256 bytes, the C versions always beat the assembly version, although
the assembly version came somewhat close to basestrlen for lengths of
9 to 32 bytes.
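For comparison with the assembly version, here is a sketch of two common C strategies: a byte-at-a-time loop and a word-at-a-time scan using the well-known "has a zero byte" bit trick. These are illustrative only and not necessarily the C versions (or basestrlen) benchmarked here. The word loop may read up to sizeof(uintptr_t) - 1 bytes past the terminator; libc implementations rely on an aligned load never crossing a page boundary, but strict ISO C and tools such as AddressSanitizer do not allow it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time baseline. */
size_t bytewise_strlen(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

#define ONES  ((uintptr_t)-1 / 0xff)    /* 0x0101...01 */
#define HIGHS (ONES * 0x80)             /* 0x8080...80 */

/* Nonzero iff at least one byte of x is zero. */
static int has_zero_byte(uintptr_t x)
{
    return ((x - ONES) & ~x & HIGHS) != 0;
}

size_t wordwise_strlen(const char *s)
{
    const char *p = s;

    /* Step bytewise until p is word aligned. */
    while ((uintptr_t)p % sizeof(uintptr_t) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }
    /* Scan one aligned word at a time until a word holds a NUL. */
    for (;;) {
        uintptr_t w;

        memcpy(&w, p, sizeof(w));
        if (has_zero_byte(w))
            break;
        p += sizeof(w);
    }
    /* Find the exact terminator within that final word. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```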

Intel CPUs are remarkably different from AXP :-). I'm surprised at
the sign of the difference here -- I would have expected them to be
better for the string instructions.
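The string instructions in question are things like repne scasb. Below is a minimal sketch of a strlen() built on that instruction with GCC inline assembly. It is assumed, not verified against the libc source, to resemble the i386 libc assembly version, and it falls back to a plain loop on non-x86 targets.

```c
#include <stddef.h>

size_t scasb_strlen(const char *s)
{
#if defined(__i386__) || defined(__x86_64__)
    const char *p = s;
    size_t count = (size_t)-1;    /* (e|r)cx starts at the maximum count */

    /* Clear DF, then compare each byte with al == 0; every step
       decrements the count, and the scan stops after consuming the NUL. */
    __asm__ volatile ("cld\n\trepne scasb"
                      : "+D" (p), "+c" (count)
                      : "a" (0)
                      : "memory", "cc");

    /* count is now -(n + 2), so ~count == n + 1 (n chars plus the NUL). */
    return ~count - 1;
#else
    /* Non-x86 fallback: plain byte loop. */
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
#endif
}
```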

That is what has been confusing me. Possibly Intel has left the basic
implementation of these string instructions untouched for longer than AMD has.

Sean
--
scf@xxxxxxxxxxx
_______________________________________________
freebsd-arch@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "freebsd-arch-unsubscribe@xxxxxxxxxxx"


