Is sampling useful in data mining? A case in the maintenance of discovered association rules

Authors

Lee, SD; Cheung, DW; Kao, B

Citation

S.D. Lee et al., Is sampling useful in data mining? A case in the maintenance of discovered association rules, DATA M K D, 2(3), 1998, pp. 233-262

Citations number

Subject categories

AI Robotics and Automatic Control

Journal title

DATA MINING AND KNOWLEDGE DISCOVERY

ISSN journal

1384-5810

Volume

2

Issue

3

Year of publication

1998

Pages

233 - 262

Database

ISI

SICI code

1384-5810(199809)2:3<233:ISUIDM>2.0.ZU;2-X

Abstract

By nature, sampling is an appealing technique for data mining, because approximate solutions in most cases may already satisfy the needs of the users. We attempt to use sampling techniques to address the problem of maintaining discovered association rules. Some studies have been done on the problem of maintaining the discovered association rules when updates are made to the database. All proposed methods must examine not only the changed part but also the unchanged part of the original database, which is very large, and hence take much time. Worse yet, if the rules are updated frequently but the underlying rule set has not changed much, then the effort could be mostly wasted. In this paper, we devise an algorithm which employs sampling techniques to estimate the difference between the association rules in a database before and after the database is updated. The estimated difference can be used to determine whether we should update the mined association rules or not. If the estimated difference is small, then the rules in the original database are still a good approximation to those in the updated database. Hence, we do not have to spend the resources to update the rules. We can accumulate more updates before actually updating the rules, thereby avoiding the overhead of updating the rules too frequently. Experimental results show that our algorithm is very efficient and highly accurate.
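
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: the abstract does not give the estimator or the difference measure, so this sketch assumes a simple one (the fraction of frequent itemsets that differ between a random sample of the old database and a random sample of the updated one). All function names, the `max_size` limit, and the `threshold` parameter are hypothetical.

```python
import random
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return itemsets (frozensets of up to max_size items) with support >= min_support."""
    n = len(transactions)
    if n == 0:
        return set()
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s for s, c in counts.items() if c / n >= min_support}


def estimated_difference(old_db, new_db, sample_size, min_support, seed=0):
    """Estimate how much the frequent itemsets changed, using samples only.

    Assumed measure (not from the paper): size of the symmetric difference
    of the two sampled frequent-itemset collections divided by the size of
    their union, giving a value between 0.0 and 1.0.
    """
    rng = random.Random(seed)
    old_sample = rng.sample(old_db, min(sample_size, len(old_db)))
    new_sample = rng.sample(new_db, min(sample_size, len(new_db)))
    old_sets = frequent_itemsets(old_sample, min_support)
    new_sets = frequent_itemsets(new_sample, min_support)
    union = old_sets | new_sets
    if not union:
        return 0.0
    return len(old_sets ^ new_sets) / len(union)


def should_update(old_db, new_db, sample_size, min_support, threshold=0.1):
    """Re-mine the rules only when the estimated difference exceeds the threshold."""
    return estimated_difference(old_db, new_db, sample_size, min_support) > threshold
```

With this sketch, a caller would keep accumulating updates while `should_update` returns `False` and run the full rule-maintenance step only once it returns `True`.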