Finding generalized projected clusters in high dimensional spaces

Authors

Aggarwal, CC Yu, PS

Citation

C. C. Aggarwal and P. S. Yu, Finding generalized projected clusters in high dimensional spaces, SIGMOD RECORD, 29(2), 2000, pp. 70-81

Subject Categories

Computer Science & Engineering

Journal title

SIGMOD RECORD

ISSN journal

0163-5808

Volume

29

Issue

2

Year of publication

2000

Pages

70 - 81

Database

ISI

SICI code

0163-5808(200006)29:2<70:FGPCIH>2.0.ZU;2-N

Abstract

High dimensional data has always been a challenge for clustering algorithms because of the inherent sparsity of the points. Recent research results indicate that in high dimensional data, even the concept of proximity or clustering may not be meaningful. We discuss very general techniques for projected clustering which are able to construct clusters in arbitrarily aligned subspaces of lower dimensionality. The subspaces are specific to the clusters themselves. This definition is substantially more general and realistic than currently available techniques which limit the method to only projections from the original set of attributes. The generalized projected clustering technique may also be viewed as a way of trying to redefine clustering for high dimensional applications by searching for hidden subspaces with clusters which are created by inter-attribute correlations. We provide a new concept of using extended cluster feature vectors in order to make the algorithm scalable for very large databases. The running time and space requirements of the algorithm are adjustable, and are likely to trade off with better accuracy.