SQLEM: Fast clustering in SQL using the EM algorithm

Authors

Ordonez, C; Cereghini, P

Citation

C. Ordonez and P. Cereghini, SQLEM: Fast clustering in SQL using the EM algorithm, SIG RECORD, 29(2), 2000, pp. 559-570

Citations number

Subject categories

Computer Science & Engineering

Journal title

SIGMOD RECORD

ISSN journal

01635808

Volume

29

Issue

2

Year of publication

2000

Pages

559 - 570

Database

ISI

SICI code

0163-5808(200006)29:2<559:SFCISU>2.0.ZU;2-Q

Abstract

Clustering is one of the most important tasks performed in Data Mining applications. This paper presents an efficient SQL implementation of the EM algorithm to perform clustering in very large databases. Our version can effectively handle high dimensional data, a high number of clusters and more importantly, a very large number of data records. We present three strategies to implement EM in SQL: horizontal, vertical and a hybrid one. We expect this work to be useful for data mining programmers and users who want to cluster large data sets inside a relational DBMS.