High dimensional data has always been a challenge for clustering algorithms
because of the inherent sparsity of the points. Recent research results
indicate that in high dimensional data, even the concept of proximity or
clustering may not be meaningful. We discuss very general techniques for
projected clustering which are able to construct clusters in arbitrarily aligned
subspaces of lower dimensionality. The subspaces are specific to the
clusters themselves. This definition is substantially more general and realistic
than currently available techniques, which limit the method to projections
from the original set of attributes. The generalized projected clustering
technique may also be viewed as a way of redefining clustering for high
dimensional applications by searching for hidden subspaces with clusters
which are created by inter-attribute correlations. We provide a new concept
of using extended cluster feature vectors in order to make the algorithm
scalable for very large databases. The running time and space requirements
of the algorithm are adjustable, and are likely to trade off against
accuracy.
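
As an illustration of how an additive cluster summary can support scalability, the following is a minimal Python sketch. It is not the algorithm's actual definition of extended cluster feature vectors. It assumes a summary that holds a point count, per-attribute sums, and sums of pairwise attribute products; the class name `ClusterSummary` and its methods are hypothetical. Under that assumption, two summaries merge by simple addition, and the covariance matrix, which captures inter-attribute correlations, can be recovered without revisiting the individual points.

```python
from typing import List, Sequence


class ClusterSummary:
    """Hypothetical additive summary of a cluster (an assumed form, for illustration)."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.count = 0
        # Per-attribute sums: sum over points of x[i].
        self.linear_sum: List[float] = [0.0] * dim
        # Pairwise product sums: sum over points of x[i] * x[j].
        self.product_sum: List[List[float]] = [[0.0] * dim for _ in range(dim)]

    def add(self, point: Sequence[float]) -> None:
        """Fold one point into the summary in O(dim^2) time."""
        if len(point) != self.dim:
            raise ValueError("point has the wrong dimensionality")
        self.count += 1
        for i in range(self.dim):
            self.linear_sum[i] += point[i]
            for j in range(self.dim):
                self.product_sum[i][j] += point[i] * point[j]

    def merge(self, other: "ClusterSummary") -> "ClusterSummary":
        """Combine two summaries by entry-wise addition; no raw points are needed."""
        if other.dim != self.dim:
            raise ValueError("summaries have different dimensionality")
        merged = ClusterSummary(self.dim)
        merged.count = self.count + other.count
        merged.linear_sum = [a + b for a, b in zip(self.linear_sum, other.linear_sum)]
        merged.product_sum = [
            [a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(self.product_sum, other.product_sum)
        ]
        return merged

    def covariance(self) -> List[List[float]]:
        """Population covariance matrix derived from the stored sums alone."""
        if self.count == 0:
            raise ValueError("cannot compute covariance of an empty summary")
        n = self.count
        mean = [s / n for s in self.linear_sum]
        return [
            [self.product_sum[i][j] / n - mean[i] * mean[j] for j in range(self.dim)]
            for i in range(self.dim)
        ]


if __name__ == "__main__":
    # Toy data: the second attribute is always twice the first.
    left, right = ClusterSummary(2), ClusterSummary(2)
    for p in [(1.0, 2.0), (2.0, 4.0)]:
        left.add(p)
    for p in [(3.0, 6.0), (4.0, 8.0)]:
        right.add(p)
    print(left.merge(right).covariance())
```

In this toy data the two attributes are perfectly correlated, so the covariance matrix of the merged summary is singular: the points have no spread along one direction, and that direction is not aligned with either original attribute axis. A cluster of this kind is the sort that projections onto the original attributes alone would miss.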