Suggested improvement to radixsort()

From: Markus B Krüger (markusk_at_pvv.org)
Date: 10/31/03

  • Next message: David Malone: "Re: O_NOACCESS?"
    Date: Fri, 31 Oct 2003 15:13:53 +0100 (CET)
    To: freebsd-hackers@freebsd.org
    
    

    (I hope this is the correct forum for this post; if not, please let me
    know where it should go.)

    I've found a possible improvement to the implementation of radixsort()
    located in /usr/src/lib/libc/stdlib/radixsort.c, which will make the
    implementation more efficient when sorting data sets containing many
    strings that share a common prefix.

    The improvement consists in checking for bins where all strings have
    the same character at the current position of the radix sort, and
    moving on to the next character instead of needlessly shuffling the
    strings in the bin around. When I used the modified radixsort() to
    sort a web log file, which typically contains many strings with common
    prefixes (IP addresses), I registered a speedup of approximately 15%.
    On a constructed dataset where all strings had a long common prefix,
    the speedup was 50%.

    My modifications are listed below in patch format. Corrections,
    comments and questions are welcome. Is this patch something that I
    should send with send-pr, or should it be submitted elsewhere?

    --- begin radixsort.c.patch ---
    *** radixsort.c Tue Mar 11 12:39:57 1997
    --- radixsort.c.patch Fri Oct 31 12:55:10 2003
    ***************
    *** 175,180 ****
    --- 175,191 ----
                      }

                      /*
    + * Special case: if all strings have the same
    + * character at position i, move on to the next
    + * character.
    + */
    + if (nc == 1 && count[bmin] == n) {
    + push(a, n, i+1);
    + nc = count[bmin] = 0;
    + continue;
    + }
    +
    + /*
                       * Set top[]; push incompletely sorted bins onto stack.
                       * top[] = pointers to last out-of-place element in bins.
                       * count[] = counts of elements in bins.
    --- end radixsort.c.patch ---

    -- 
     ,-------------------  Markus Bjartveit Krüger  ---------------------.
    '                                                                     `
    ` E-mail: markusk@pvv.org           WWW: http://www.pvv.org/~markusk/ '
     )-------------------------------------------------------------------(
    _______________________________________________
    freebsd-hackers@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
    To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
    

  • Next message: David Malone: "Re: O_NOACCESS?"

    Relevant Pages

    • Re: Determining the common prefix for several strings
      ... puts items.first ... We could compute the length of the shortest string and use that finer upper limit, but that's not necessary because even in the case that the shortest string is the common prefix, we would get some nil in the map through columns in the next iteration, and thus at least two distinct values. ... On the other hand there's the corner case when all strings are equal, that's not handled in the above code because they way to do it depends on the details. ...
      (comp.lang.ruby)
    • Smarter ways to define MACRO to strings?
      ... I have a number of strings in an array, ... I want to add the common prefix to all the string, ...
      (comp.lang.c)