Re: Assembly string functions in i386 libc



On 2007-Jul-11 15:24:01 -0500, "Sean C. Farley" <scf@xxxxxxxxxxx> wrote:
> libc compared to the version I was writing. After more testing, I found
> it was only the assembly version that is really slow. The C version is
> fairly quick. Is there a need to continue to use the assembly versions
> of string functions on i386? Does it mainly help slower systems such as
> those with i386 or i486 CPU's?

The performance of string instructions has varied wildly across
various x86 implementations. Definitely, for short strings, the
overhead in initialising the various registers outweighs any actual
difference in loop performance. For any recent CPU, the location of
the string in the memory hierarchy far outweighs implementation
issues. bde@ has done various testing in the past and posted results.

Some comments:
- comparing the strlen() in a shared libc with a statically linked one
is unfair - especially on the i386.
- Your results don't include non-aligned inputs
- Your results don't include non-power-of-2 lengths
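
As a rough illustration of the last two points, here is a minimal
sketch of a timing helper that takes the string length and the start
offset as parameters, so unaligned starts and odd lengths can be
measured alongside the aligned, power-of-2 cases. The names
time_strlen and strlen_fn are made up for this example and are not
taken from the test program under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef size_t (*strlen_fn)(const char *);

/*
 * Time `iters` calls of fn on a string of `len` bytes that starts
 * `offset` bytes past the address returned by malloc().  Returns the
 * elapsed CPU time in seconds, or -1.0 if fn reported a wrong length.
 */
static double
time_strlen(strlen_fn fn, size_t len, size_t offset, unsigned long iters)
{
	/* Calling through a volatile pointer stops the compiler from
	 * hoisting the call out of the loop. */
	volatile strlen_fn call = fn;
	volatile size_t sink = 0;	/* forces every result to be used */
	char *buf = malloc(offset + len + 1);
	clock_t start, end;
	unsigned long i;

	assert(buf != NULL);
	memset(buf + offset, 'a', len);
	buf[offset + len] = '\0';

	start = clock();
	for (i = 0; i < iters; i++)
		sink += call(buf + offset);
	end = clock();

	free(buf);
	if (sink != len * iters)
		return (-1.0);
	return ((double)(end - start) / CLOCKS_PER_SEC);
}
```

A driver could call this for every offset from 0 to 7 and for lengths
such as 1, 3, 7, 13, 31 and 100, passing each strlen() variant in
turn, for example time_strlen(strlen, 13, 3, 1000000UL).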

> I would appreciate it if anyone could see if strlen and strlen2 perform
> any better on an amd64. Although the current C version of strlen() in
> 7-CURRENT is faster than mine for smaller values, they perform better
> for larger strings.

I've tested on:
FreeBSD 6.2-STABLE #28: Fri Jun 22 11:44:13 EST 2007
root@xxxxxxxxxxxxxxxxxxxxxxx:/usr/obj/usr/src/sys/turion
CPU: AMD Turion(tm) 64 Mobile ML-40 (2194.52-MHz K8-class CPU)
Origin = "AuthenticAMD" Id = 0x20f42 Stepping = 2
Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
Features2=0x1<SSE3>
AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!>
AMD Features2=0x1<LAHF>

There is no asm strlen so libcstrlen and basestrlen should be identical
(and disassembling [x]strlen() shows that the code _is_ identical) but
there are significant differences for short strings and measurable
differences for all lengths except 32 bytes. This indicates that your
program is not able to accurately compare strlen() performance.

I've tried statically linking all the test programs and this removes
the libcstrlen/basestrlen differences. The very poor results for 4
and 8 byte strings are unexpected but, as expected, your unrolled
strlen() implementations behave better for longer strings.
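
For readers without the original code to hand, the following is only
a generic sketch of what an unrolled strlen() can look like. It is
not the strlen or strlen2 implementation being tested here, and the
name unrolled_strlen is invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Byte-at-a-time strlen() unrolled four times.  Each pass checks four
 * consecutive bytes in order, so it never reads past the terminator,
 * and it pays for only one pointer update and one loop branch per
 * four bytes.
 */
static size_t
unrolled_strlen(const char *s)
{
	const char *p = s;

	assert(s != NULL);
	for (;;) {
		if (p[0] == '\0')
			return ((size_t)(p - s));
		if (p[1] == '\0')
			return ((size_t)(p - s) + 1);
		if (p[2] == '\0')
			return ((size_t)(p - s) + 2);
		if (p[3] == '\0')
			return ((size_t)(p - s) + 3);
		p += 4;
	}
}
```

The saving in loop overhead grows with the string length, which is
one plausible reason an unrolled version pulls ahead only on longer
inputs.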

The attached results all reflect your code with '-static' added to
every gcc/link step.

--
Peter Jeremy
x libcstrlen.01
+ basestrlen.01
* strlen.01
% strlen2.01
(ministat ASCII distribution plot; layout not recoverable)
N Min Max Median Avg Stddev
x 10 0.056836 0.076297 0.0571435 0.0600979 0.0065785842
+ 10 0.056941 0.079997 0.0571135 0.0605826 0.0075392895
No difference proven at 95.0% confidence
* 10 0.045764 0.057498 0.047954 0.0489822 0.0031965751
Difference at 95.0% confidence
-0.0111157 +/- 0.00485944
-18.496% +/- 8.08587%
(Student's t, pooled s = 0.00517184)
% 10 0.0642 0.067644 0.0662535 0.0662219 0.00087897572
Difference at 95.0% confidence
0.006124 +/- 0.00440962
10.19% +/- 7.33739%
(Student's t, pooled s = 0.0046931)
x libcstrlen.02
+ basestrlen.02
* strlen.02
% strlen2.02
(ministat ASCII distribution plot; layout not recoverable)
N Min Max Median Avg Stddev
x 10 0.06611 0.076365 0.0663895 0.0673789 0.0031655271
+ 10 0.065865 0.068425 0.0662385 0.0664137 0.00072301053
No difference proven at 95.0% confidence
* 10 0.059657 0.08414 0.06171 0.0648375 0.0073763495
No difference proven at 95.0% confidence
% 10 0.079855 0.089286 0.0801355 0.0812853 0.0029096056
Difference at 95.0% confidence
0.0139064 +/- 0.00285662
20.6391% +/- 4.23963%
(Student's t, pooled s = 0.00304026)
x libcstrlen.04
+ basestrlen.04
* strlen.04
% strlen2.04
(ministat ASCII distribution plot; layout not recoverable)
N Min Max Median Avg Stddev
x 10 0.082714 0.086885 0.08454 0.0848665 0.0011530442
+ 10 0.084376 0.113232 0.0852015 0.0900253 0.0098820415
No difference proven at 95.0% confidence
* 10 0.089334 0.091925 0.089935 0.090297 0.00094932268
Difference at 95.0% confidence
0.0054305 +/- 0.000992314
6.39887% +/- 1.16926%
(Student's t, pooled s = 0.00105611)
% 10 0.105559 0.131599 0.1080435 0.1111049 0.0078954972
Difference at 95.0% confidence
0.0262384 +/- 0.00530137
30.9173% +/- 6.24671%
(Student's t, pooled s = 0.00564218)
x libcstrlen.08
+ basestrlen.08
* strlen.08
% strlen2.08
(ministat ASCII distribution plot; layout not recoverable)
N Min Max Median Avg Stddev
x 10 0.121139 0.146324 0.1226245 0.125436 0.0079127939
+ 10 0.12069 0.145306 0.1219265 0.1250433 0.0076468179
No difference proven at 95.0% confidence
* 10 0.194276 0.218771 0.1965875 0.1998421 0.0075342985
Difference at 95.0% confidence
0.0744061 +/- 0.00725919
59.318% +/- 5.78717%
(Student's t, pooled s = 0.00772586)
% 10 0.162464 0.173597 0.164107 0.1656017 0.0041219019
Difference at 95.0% confidence
0.0401657 +/- 0.00592774
32.0209% +/- 4.72571%
(Student's t, pooled s = 0.00630882)
x libcstrlen.16
+ basestrlen.16
* strlen.16
% strlen2.16
(ministat ASCII distribution plot; layout not recoverable)
N Min Max Median Avg Stddev
x 10 0.267459 0.292087 0.2683745 0.2740531 0.0087043756
+ 10 0.267893 0.292362 0.2707545 0.2747774 0.0078299701
No difference proven at 95.0% confidence
* 10 0.212733 0.236073 0.2156615 0.2196616 0.007802558
Difference at 95.0% confidence
-0.0543915 +/- 0.00776649
-19.8471% +/- 2.83394%
(Student's t, pooled s = 0.00826577)
% 10 0.185465 0.208264 0.186767 0.1902279 0.0071633648
Difference at 95.0% confidence
-0.0838252 +/- 0.0074897
-30.5872% +/- 2.73294%
(Student's t, pooled s = 0.0079712)
x libcstrlen.32
+ basestrlen.32
* strlen.32
% strlen2.32
(ministat ASCII distribution plot; layout not recoverable)
N Min Max Median Avg Stddev
x 10 0.414594 0.45129 0.4222915 0.4275801 0.014180339
+ 10 0.412549 0.438513 0.426586 0.4282437 0.0078657702
No difference proven at 95.0% confidence
* 10 0.249627 0.276534 0.260869 0.2597264 0.0093742772
Difference at 95.0% confidence
-0.167854 +/- 0.0112939
-39.2567% +/- 2.64135%
(Student's t, pooled s = 0.01202)
% 10 0.212447 0.236769 0.2154385 0.2202512 0.0093600922
Difference at 95.0% confidence
-0.207329 +/- 0.0112887
-48.4889% +/- 2.64014%
(Student's t, pooled s = 0.0120144)
x libcstrlen.64
+ basestrlen.64
* strlen.64
% strlen2.64
(ministat ASCII distribution plot; layout not recoverable)
N Min Max Median Avg Stddev
x 10 0.709884 0.744587 0.7251365 0.7276627 0.011059351
+ 10 0.711416 0.742595 0.723867 0.7256684 0.010705189
No difference proven at 95.0% confidence
* 10 0.324096 0.345875 0.330732 0.33092 0.0067527522
Difference at 95.0% confidence
-0.396743 +/- 0.0086092
-54.5229% +/- 1.18313%
(Student's t, pooled s = 0.00916267)
% 10 0.267484 0.290712 0.273968 0.2746264 0.0073885214
Difference at 95.0% confidence
-0.453036 +/- 0.00883668
-62.2591% +/- 1.21439%
(Student's t, pooled s = 0.00940477)



