SQLEM: Fast clustering in SQL using the EM algorithm

Citation
C. Ordonez and P. Cereghini, SQLEM: Fast clustering in SQL using the EM algorithm, SIGMOD RECORD, 29(2), 2000, pp. 559-570
Number of citations
17
Subject categories
Computer Science & Engineering
Journal title
SIGMOD RECORD
ISSN journal
0163-5808
Volume
29
Issue
2
Year of publication
2000
Pages
559 - 570
Database
ISI
SICI code
0163-5808(200006)29:2<559:SFCISU>2.0.ZU;2-Q
Abstract
Clustering is one of the most important tasks performed in Data Mining applications. This paper presents an efficient SQL implementation of the EM algorithm to perform clustering in very large databases. Our version can effectively handle high dimensional data, a high number of clusters and more importantly, a very large number of data records. We present three strategies to implement EM in SQL: horizontal, vertical and a hybrid one. We expect this work to be useful for data mining programmers and users who want to cluster large data sets inside a relational DBMS.