Large scale data mining based on data partitioning

Authors
Citation
Sc. Zhang and Xd. Wu, Large scale data mining based on data partitioning, APPL ARTIF, 15(2), 2001, pp. 129-139
Number of citations
7
Subject Categories
AI Robotics and Automatic Control
Journal title
APPLIED ARTIFICIAL INTELLIGENCE
ISSN journal
08839514
Volume
15
Issue
2
Year of publication
2001
Pages
129 - 139
Database
ISI
SICI code
0883-9514(200102)15:2<129:LSDMBO>2.0.ZU;2-E
Abstract
Dealing with very large databases is one of the defining challenges in data mining research and development. Some databases are simply too large (e.g., with terabytes of data) to be processed at one time. For efficiency and space reasons, partitioning them into subsets for processing is necessary. However, since the number of item sets in each partitioned data subset can be a combinatorial amount and each of them may be a large item set in the original database, data mining results from these subsets can be very large in size. Therefore, the key to data partitioning is how to aggregate the results from these subsets. It is not realistic to keep all results from each subset, because the rules from one subset need to be verified for usefulness in other subsets. This article presents a model of aggregating association rules from different data subsets by weighting. In particular, the aggregation efficiency is enhanced by rule selection.
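
The abstract does not spell out the weighting model, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each subset is weighted by its share of all transactions, that a rule not reported by a subset contributes zero support there, and that rule selection means keeping rules reported by at least `min_votes` subsets. The names `aggregate_rules`, `min_support`, and `min_votes` are hypothetical.

```python
from collections import defaultdict


def aggregate_rules(partitions, min_support=0.3, min_votes=2):
    """Combine per-subset association rules into one weighted rule set.

    partitions: list of (num_transactions, {rule: support}) pairs, where
    each dict holds the rules mined from one data subset.
    """
    total = sum(n for n, _ in partitions)
    weighted = defaultdict(float)  # rule -> weighted support
    votes = defaultdict(int)       # rule -> number of subsets reporting it

    for n, rules in partitions:
        weight = n / total  # assumption: weight = subset's share of the data
        for rule, support in rules.items():
            weighted[rule] += weight * support
            votes[rule] += 1

    # Assumed rule selection: keep rules seen in enough subsets whose
    # weighted support also clears the threshold.
    return {
        rule: round(support, 4)
        for rule, support in weighted.items()
        if votes[rule] >= min_votes and support >= min_support
    }


# Two hypothetical subsets of different sizes.
partitions = [
    (1000, {("bread", "butter"): 0.40, ("milk", "eggs"): 0.10}),
    (3000, {("bread", "butter"): 0.50, ("tea", "sugar"): 0.35}),
]
result = aggregate_rules(partitions)
print(result)
```

In this toy input only the rule reported by both subsets survives selection, which illustrates why discarding weakly supported rules early can keep the aggregated result small.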