SET-ORIENTED DATA MINING IN RELATIONAL DATABASES

Authors

HOUTSMA M; SWAMI A

Citation

M. Houtsma and A. Swami, SET-ORIENTED DATA MINING IN RELATIONAL DATABASES, Data & knowledge engineering, 17(3), 1995, pp. 245-262

Citations number

Subject Categories

Computer Science, Artificial Intelligence; Computer Science, Information Systems

Journal title

Data & knowledge engineering

ISSN journal

0169023X

Volume

17

Issue

3

Year of publication

1995

Pages

245 - 262

Database

ISI

SICI code

0169-023X(1995)17:3<245:SDMIRD>2.0.ZU;2-P

Abstract

Data mining is an important real-life application for businesses. It is critical to find efficient ways of mining large data sets. In order to benefit from the experience with relational databases, a set-oriented approach to mining data is needed. In such an approach, the data mining operations are expressed in terms of relational or set-oriented operations. Query optimization technology can then be used for efficient processing. In this paper, we describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and thus may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss optimization of these algorithms. After analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. Algorithm SETM uses only simple database primitives, viz., sorting and merge-scan join. Algorithm SETM is simple, fast, and stable over the range of parameter values. It is easily parallelized and we suggest several additional optimizations. The set-oriented nature of Algorithm SETM makes it possible to develop extensions easily and its performance makes it feasible to build interactive data mining tools for large databases.
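
To illustrate what expressing a mining step as a relational operation can look like, the sketch below counts frequently co-occurring item pairs with a single SQL self-join. It is not the paper's SETM algorithm, only a minimal illustration of the set-oriented idea. The table name `sales`, its columns `trans_id` and `item`, the function name `frequent_pairs`, and the use of SQLite are all assumptions made for this example.

```python
import sqlite3


def frequent_pairs(transactions, min_support):
    """Return (item_a, item_b, support) for item pairs that occur together
    in at least `min_support` transactions.

    `transactions` maps a transaction id to a list of items.
    The table and column names are hypothetical, chosen for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (trans_id INTEGER, item TEXT)")

    # One row per (transaction, item); set() drops duplicate items
    # within the same transaction so they are not counted twice.
    rows = [
        (tid, item)
        for tid, items in transactions.items()
        for item in set(items)
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

    # Self-join on the transaction id; a.item < b.item keeps each
    # unordered pair once. GROUP BY / HAVING applies the support threshold.
    query = """
        SELECT a.item, b.item, COUNT(*) AS support
        FROM sales AS a
        JOIN sales AS b
          ON a.trans_id = b.trans_id AND a.item < b.item
        GROUP BY a.item, b.item
        HAVING COUNT(*) >= ?
        ORDER BY support DESC, a.item, b.item
    """
    result = conn.execute(query, (min_support,)).fetchall()
    conn.close()
    return result
```

Because the whole step is one declarative query, the database engine is free to choose the join strategy (for example, sorting followed by a merge join), which is the kind of benefit from query optimization that the abstract refers to.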