Re: Assembly string functions in i386 libc
- From: "Sean C. Farley" <scf@xxxxxxxxxxx>
- Date: Thu, 12 Jul 2007 16:02:47 -0500 (CDT)
On Thu, 12 Jul 2007, Bruce Evans wrote:
On Thu, 12 Jul 2007, Bruce Evans wrote:
On Wed, 11 Jul 2007, Sean C. Farley wrote:
While looking at increasing the speed of strlen(), I noticed that on
i386 platforms (PIII, P4 and Athlon XP) the performance is abysmal
in libc compared to the version I was writing. After more testing,
I found it was only the assembly version that is really slow. The C
version is fairly quick. Is there a need to continue to use the
assembly versions of string functions on i386? Does it mainly help
slower systems such as those with i386 or i486 CPUs?
I think you are mistaken about the asm version being slow. In my
tests ...
Partly.
I have the results from my P4 (Id = 0xf24 Stepping = 4) system and
the test program here[1]. strlen.tar.bz2 is the archive of it for
anyone's testing. In the strlen/results subdirectory, there are the
results for strings of increasing lengths.
Sorry, I didn't look at this. I just wrote a quick re-test and ran
it
Now I've looked at it. I think it is not testing strlen() at all,
except for the libc case, because __pure prevents more than 1 call to
strlen(). (The existence of __pure is also a bug. __pure was the
FreeBSD spelling of the __const__ attribute in gcc-1. It was removed
when special support for gcc-1 was dropped, and should not have been
recycled.) __pure is a syntax error in the old version of FreeBSD
that I tested on. I first tried __pure2, which is the FreeBSD
spelling of the __const__ attribute in gcc-2. I think it is weaker
than the __pure__ attribute in gcc-3.
From what I could find, strlen() should not have the __const__ (__pure2)
attribute since it is being passed a pointer, but __pure__ (__pure)
should work. Are you saying that __pure used to mean __const__ in gcc-1
but now it means __pure__ for gcc-2.96 and above? The redefinition of
__pure is what you are saying is a bug. Yes?
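To illustrate the __pure point, here is a minimal sketch (not taken from
strlen.tar.bz2) of how a pure-attributed candidate lets the compiler
collapse a timing loop, and one way to force a real call per iteration.
The names my_strlen, bench_hoistable and bench_forced are hypothetical,
and the attribute spelling assumes gcc 2.96 or later.

```c
#include <stddef.h>

/* Hypothetical candidate under test; declared pure as in the discussion. */
size_t my_strlen(const char *s) __attribute__((__pure__));

size_t
my_strlen(const char *s)
{
	const char *p = s;

	while (*p != '\0')
		p++;
	return ((size_t)(p - s));
}

/*
 * Because my_strlen() is declared pure and s does not change, an
 * optimizing compiler is allowed to evaluate the call once and reuse
 * the result, so this loop may time almost nothing.
 */
size_t
bench_hoistable(const char *s, unsigned long iters)
{
	size_t total = 0;
	unsigned long i;

	for (i = 0; i < iters; i++)
		total += my_strlen(s);
	return (total);
}

/*
 * Calling through a volatile function pointer hides the attribute from
 * the call site: the pointer is reloaded and called on every iteration.
 */
size_t
bench_forced(const char *s, unsigned long iters)
{
	size_t (*volatile fn)(const char *) = my_strlen;
	size_t total = 0;
	unsigned long i;

	for (i = 0; i < iters; i++)
		total += fn(s);
	return (total);
}
```

Both loops return the same total; only the number of calls actually
executed can differ, which is what matters for a timing test.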
After removing __pure* and adding -static -g to CFLAGS, with
gcc-3.3.3:
On an old Celeron (400MHz) (all P2's probably behave like this):
%%%
libcstrlen: time spent executing strlen(string) = 64: 7.786868
basestrlen: time spent executing strlen(string) = 64: 3.816736
strlen: time spent executing strlen(string) = 64: 3.364313
strlen2: time spent executing strlen(string) = 64: 2.662973
%%%
rep scasb is apparently very slow on P2's.
On an A64 in i386 mode:
%%%
libcstrlen: time spent executing strlen(string) = 64: 0.709657
basestrlen: time spent executing strlen(string) = 64: 0.691397
strlen: time spent executing strlen(string) = 64: 0.527339
strlen2: time spent executing strlen(string) = 64: 0.441090
%%%
Now rep scasb is only slightly slower than the simple C loop (since
all small loops take 2 cycles on AXP and A64...). strlen and strlen2
are marginally faster since their loops do more.
basestrlen is fastest for lengths <= 5 on the Celeron.
basestrlen is fastest for lengths <= 9 on the A64.
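As an illustration of a loop that "does more" per iteration, here is a
sketch of a word-at-a-time strlen in C. Whether strlen or strlen2 in
the test archive work this way is an assumption; word_strlen is a
hypothetical name. The sketch also assumes that reading a whole aligned
32-bit word containing the terminator is safe, which holds on i386 in
practice (an aligned word never crosses a page) but is not guaranteed by
the C standard.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical word-at-a-time strlen; a sketch, not the libc code. */
size_t
word_strlen(const char *s)
{
	const char *p = s;
	uint32_t w;

	/* Check bytes one at a time until p is 4-byte aligned. */
	while (((uintptr_t)p & (sizeof(w) - 1)) != 0) {
		if (*p == '\0')
			return ((size_t)(p - s));
		p++;
	}

	/*
	 * Test four bytes per iteration. The expression below is nonzero
	 * exactly when at least one byte of w is zero. This may read up
	 * to 3 bytes past the terminator, within the same aligned word.
	 */
	for (;;) {
		memcpy(&w, p, sizeof(w));
		if (((w - 0x01010101u) & ~w & 0x80808080u) != 0)
			break;
		p += sizeof(w);
	}

	/* The terminator is somewhere in this word; find it bytewise. */
	while (*p != '\0')
		p++;
	return ((size_t)(p - s));
}
```

The byte-at-a-time head and tail are why a simple loop such as
basestrlen can still win for very short strings: the setup cost is paid
before the wide loop does any useful work.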
I removed __pure from main.c and added -static -g.
Athlon XP 2100 (1.72 GHz):
libcstrlen: time spent executing strlen(string) = 64: 0.994755
asmstrlen: time spent executing strlen(string) = 64: 0.989012
basestrlen: time spent executing strlen(string) = 64: 0.879722
strlen: time spent executing strlen(string) = 64: 0.626727
strlen2: time spent executing strlen(string) = 64: 0.587162
P4 1.6 GHz:
libcstrlen: time spent executing strlen(string) = 64: 2.412558
asmstrlen: time spent executing strlen(string) = 64: 2.413904
basestrlen: time spent executing strlen(string) = 64: 1.049927
strlen: time spent executing strlen(string) = 64: 0.543575
strlen2: time spent executing strlen(string) = 64: 0.547015
PIII 450MHz:
libcstrlen: time spent executing strlen(string) = 64: 6.976066
asmstrlen: time spent executing strlen(string) = 64: 6.974106
basestrlen: time spent executing strlen(string) = 64: 3.464854
strlen: time spent executing strlen(string) = 64: 2.541872
strlen2: time spent executing strlen(string) = 64: 2.339469
The Athlon XP did much better with the assembly version than either
Intel CPU for me. For all three CPUs, using string lengths from 1 to
256, the C versions always beat the assembly version, although the
assembly version came somewhat close to basestrlen for lengths of 9 to
32 bytes.
Even if this does not show that the assembly version should be replaced,
I find this performance testing interesting. I learned something new.
Sean
--
scf@xxxxxxxxxxx