Consider the set H of all linear (or affine) transformations between two ve
ctor spaces over a finite field F. We study how good SE is as a class of ha
sh functions, namely we consider hashing a set S of size n into a range hav
ing the same cardinality n by a randomly chosen function from H and look at
the expected size of the largest hash bucket. H is a universal class of ha
sh functions for any finite field. but with respect to our measure differen
t fields behave differently.
If the finite field F has n elements, then there is a bad set S subset of F
-2 of size n with expected maximal bucket size Ohm(n1/3). If n is a perfect
square, then there is even a bad set with largest bucket size always at le
ast root n. (This is worst possible, since with respect to a universal clas
s of hash functions every set of size n has expected largest bucket size be
low root n + 1/2.)
If, however, we consider the field of two elements, then we yet much better
bounds. The best previously known upper bound on the expected size of the
largest bucket for this class was O(2(root log n)). We reduce this upper bo
und to O(log n log log n). Note that this is not far from the guarantee for
a random function. There. the average largest bucket would be theta(log ni
log log n).
In the course of our proof we develop a tool which may be of independent in
terest. Suppose we have a subset S of a vector space D over Z(2), and consi
der a random linear mapping of D to a smaller vector space R. If the cardin
ality of S is larger than c(epsilon)\R\log\R\, then with probability 1 - ep
silon, the image of S will cover all elements in the range.