SET-ORIENTED DATA MINING IN RELATIONAL DATABASES

Authors
M. Houtsma, A. Swami
Citation
M. Houtsma and A. Swami, SET-ORIENTED DATA MINING IN RELATIONAL DATABASES, Data & Knowledge Engineering, 17(3), 1995, pp. 245-262
Number of citations
24
Subject Categories
Computer Science, Artificial Intelligence; Computer Science, Information Systems
ISSN journal
0169023X
Volume
17
Issue
3
Year of publication
1995
Pages
245 - 262
Database
ISI
SICI code
0169-023X(1995)17:3<245:SDMIRD>2.0.ZU;2-P
Abstract
Data mining is an important real-life application for businesses. It is critical to find efficient ways of mining large data sets. In order to benefit from the experience with relational databases, a set-oriented approach to mining data is needed. In such an approach, the data mining operations are expressed in terms of relational or set-oriented operations. Query optimization technology can then be used for efficient processing. In this paper, we describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and thus may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss optimization of these algorithms. After analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. Algorithm SETM uses only simple database primitives, viz., sorting and merge-scan join. Algorithm SETM is simple, fast, and stable over the range of parameter values. It is easily parallelized and we suggest several additional optimizations. The set-oriented nature of Algorithm SETM makes it possible to develop extensions easily and its performance makes it feasible to build interactive data mining tools for large databases.
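
To make the idea of expressing a mining step as a relational operation concrete, the following is a minimal sketch, not the SETM algorithm as published. It counts co-occurring item pairs with a single SQL self-join and a grouped count, run through Python's standard `sqlite3` module. The table name `sales`, its columns `trans_id` and `item`, and the function name `frequent_pairs` are assumptions made for illustration; the abstract does not specify a schema.

```python
import sqlite3


def frequent_pairs(transactions, min_support):
    """Return (item_a, item_b, support) rows for pairs meeting min_support.

    `transactions` maps a transaction id to an iterable of items.
    The schema (sales.trans_id, sales.item) is a hypothetical one chosen
    for this sketch, not taken from the paper.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (trans_id INTEGER, item TEXT)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        # set(items) drops duplicate items within one transaction
        [(tid, item) for tid, items in transactions.items() for item in set(items)],
    )
    rows = conn.execute(
        """
        SELECT a.item, b.item, COUNT(*) AS support
        FROM sales AS a
        JOIN sales AS b
          ON a.trans_id = b.trans_id   -- same transaction
         AND a.item < b.item           -- each unordered pair counted once
        GROUP BY a.item, b.item
        HAVING COUNT(*) >= ?           -- keep pairs meeting minimum support
        ORDER BY a.item, b.item
        """,
        (min_support,),
    ).fetchall()
    conn.close()
    return rows
```

For example, with five transactions in which bread and milk appear together three times, `frequent_pairs(data, 3)` would return only that pair. Extending the sketch to larger item sets would mean repeating the join against the previous result, which is where the multiple joins mentioned in the abstract come from.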