Re: BSD license compatible hash algorithm?



"Aryeh M. Friedman" <aryeh.friedman@xxxxxxxxx> writes:
Dag-Erling Smørgrav <des@xxxxxx> writes:
"Aryeh M. Friedman" <aryeh.friedman@xxxxxxxxx> writes:
All hashs have issues with pooling.... see
http://www.burtleburtle.net/bob/hash/index.html... btw it is a
old wives tale that the number of buckets should be prime (mostly
based on the very weak implementation Knuth offered)
Not an "old wives' tale", but rather an easy way to implement a
hash algorithm that is good enough for most simple uses: metric
modulo table size, where metric is a number derived from the item
in such a manner as to give a good spread.
Sorry for taking a while to reply.... but the above only applies if
your using a very primitive hash like Knuth's multiplication one....

You are overlooking a very common case: the use of a hash table to
implement an in-memory dictionary (aka associative array) where the key
is an integer with poor variability in the high-order bits. K % N where
K is the key and N is the size of the table requires very little code,
is reasonably efficient, and, provided that N is prime, gives a
reasonably good spread (excpet in pathological cases where the values of
K are clustered around multiples of N).

every modern hash I know of should have 2^k buckets actually for some
k<2^32 [in almost all cases <2^16 except for algorithms like the one I
mentioned I am working on which sets k=n where n=the bit count of the
key].

I certainly hope not. 2^(2^32) is several of billion orders of
magnitude more than the number of elementary particles in the known
universe (currently estimated at 10^80). Even 2^(2^16) is too big by
about sixty thousand orders of magnitude.

DES
--
Dag-Erling Smørgrav - des@xxxxxx
_______________________________________________
freebsd-hackers@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@xxxxxxxxxxx"



Relevant Pages

  • Hash function, Birthday paradox and probabilities.
    ... background but I think this is mainly a probability theory problem. ... return an hash of length N bits ... that takes as input a bit string of length L ... and how to minimize the "spread" of m around its average. ...
    (sci.math)
  • Re: MD5 and SHA-1 Digest: What is the probability of repeating hash-values?
    ... that we have no reason to believe that MD5 doesn't spread the hash ... to say whether it's spread "pretty well" or even "moderately well", ... each n-bit integer to a different hash value rather than spreading ...
    (sci.crypt)
  • Re: Hashtable efficiency (was: Re: Over-riding equals method dilemma)
    ... >> good a hash that is will depend on the implementation of the JVM. ... >> will not have as much spread as the programmer might naively expect. ... codes, thus fuller buckets because the codes aren't different. ... FirstSQL/J Object/Relational DBMS ...
    (comp.lang.java.programmer)
  • Re: Mephisto Master specs
    ... With PC programs using a couple of orders of magnitude more memory for hash tables, I wonder if 118K is really adding much, other than in the rare simple endgame position where it can go deep and see the right move. ...
    (rec.games.chess.computer)
  • Hashtable efficiency (was: Re: Over-riding equals method dilemma)
    ... > good a hash that is will depend on the implementation of the JVM. ... > I have seen real problems caused by this issue; ... but I can't imagine why the spread of the hash codes ...
    (comp.lang.java.programmer)