Motivated by a problem of filtering near-duplicate Web documents, Broder, C
harikar, Frieze and Mitzenmacher defined the following notion of epsilon-ap
proximate min-wise independent permutation families. A multiset F of permut
ations of {0, 1,..., n - 1} is such a family if for all K subset of or equa
l to {0, 1,..., n - 1} and any x is an element of K, a permutation n chosen
uniformly at random from F satisfies
[GRAPHICS]
We show connections of such families with low discrepancy sets for geometri
c rectangles, and give explicit constructions of such families F of size n(
O(root log n)) for epsilon = 1/n(Theta(1)), improving upon the previously b
est-known bound of Indyk. We also present polynomial-size constructions whe
n the min-wise condition is required only for \K\ less than or equal to 2(O
(log2/3n)), with epsilon greater than or equal to 2(-O(log2/3 n)). (C) 2000
Published by Elsevier Science B.V. All rights reserved.