Finding generalized projected clusters in high dimensional spaces

Citation
C.C. Aggarwal and P.S. Yu, Finding generalized projected clusters in high dimensional spaces, SIGMOD RECORD, 29(2), 2000, pp. 70-81
Number of citations
18
Subject categories
Computer Science & Engineering
Journal title
SIGMOD RECORD
Journal ISSN
0163-5808
Volume
29
Issue
2
Year of publication
2000
Pages
70 - 81
Database
ISI
SICI code
0163-5808(200006)29:2<70:FGPCIH>2.0.ZU;2-N
Abstract
High dimensional data has always been a challenge for clustering algorithms because of the inherent sparsity of the points. Recent research results indicate that in high dimensional data, even the concept of proximity or clustering may not be meaningful. We discuss very general techniques for projected clustering which are able to construct clusters in arbitrarily aligned subspaces of lower dimensionality. The subspaces are specific to the clusters themselves. This definition is substantially more general and realistic than currently available techniques which limit the method to only projections from the original set of attributes. The generalized projected clustering technique may also be viewed as a way of trying to redefine clustering for high dimensional applications by searching for hidden subspaces with clusters which are created by inter-attribute correlations. We provide a new concept of using extended cluster feature vectors in order to make the algorithm scalable for very large databases. The running time and space requirements of the algorithm are adjustable, and are likely to trade off with better accuracy.