ITA
ENG

Communication-efficient distributed mining of association rules

Authors

Schuster, A Wolff, R

Citation

A. Schuster et R. Wolff, Communication-efficient distributed mining of association rules, SIG RECORD, 30(2), 2001, pp. 473-484

Citations number

Categorie Soggetti

Computer Science & Engineering

Journal title

SIGMOD RECORD

ISSN journal

01635808 → ACNP

Volume

Issue

Year of publication

2001

Pages

473 - 484

Database

ISI

SICI code

0163-5808(200106)30:2<473:CDMOAR>2.0.ZU;2-E

Abstract

Mining for associations between items in large transactional databases is a central problem in the field of knowledge discovery. When the database is partitioned among several share-nothing machines, the problem can be addres sed using distributed data mining algorithms. One such algorithm, called CD , was proposed by Agrawal and Shafer in [1] and was later enhanced by the F DM algorithm of Cheung, Han et al. [5]. The main problem with these algorithms is that they do not scale well with the number of partitions. They are thus impractical for use in modern distr ibuted environments such as peer-to-peer systems, in which hundreds or thou sands of computers may interact. In this paper we present a set of new algo rithms that solve the Distributed Association Rule Mining problem using far less communication. In addition to being very efficient, the new algorithm s are also extremely robust. Unlike existing algorithms, they continue to b e efficient even when the data is skewed or the partition sizes are imbalan ced. We present both experimental and theoretical results concerning the be havior of these algorithms and explain how they can be implemented in diffe rent settings.