Mining for associations between items in large transactional databases is a
central problem in the field of knowledge discovery. When the database is
partitioned among several share-nothing machines, the problem can be addres
sed using distributed data mining algorithms. One such algorithm, called CD
, was proposed by Agrawal and Shafer in [1] and was later enhanced by the F
DM algorithm of Cheung, Han et al. [5].
The main problem with these algorithms is that they do not scale well with
the number of partitions. They are thus impractical for use in modern distr
ibuted environments such as peer-to-peer systems, in which hundreds or thou
sands of computers may interact. In this paper we present a set of new algo
rithms that solve the Distributed Association Rule Mining problem using far
less communication. In addition to being very efficient, the new algorithm
s are also extremely robust. Unlike existing algorithms, they continue to b
e efficient even when the data is skewed or the partition sizes are imbalan
ced. We present both experimental and theoretical results concerning the be
havior of these algorithms and explain how they can be implemented in diffe
rent settings.