Clustering is one of the most important tasks performed in data mining
applications. This paper presents an efficient SQL implementation of the
Expectation-Maximization (EM) algorithm to perform clustering in very
large databases. Our version can effectively handle high-dimensional data,
a large number of clusters and, more importantly, a very large number of
data records. We present three strategies to implement EM in SQL:
horizontal, vertical, and hybrid. We expect this work to be useful for
data mining programmers and users who want to cluster large data sets
inside a relational DBMS.