We present a novel approach to finding the number of clusters in data based
on the minimization of a regularized cost function. Minimizing the proposed
cost function minimizes both the sum of squared distances of the data points
from their respective nearest cluster centers and the sum of squared
distances of the individual cluster centers from the cluster centers in
their neighborhood. Smaller values of the neighborhood encourage the
formation of more distinct cluster centers, while larger values of the
neighborhood encourage the formation of fewer distinct cluster centers. We
identify the neighborhood as a scale parameter and obtain the number of
cluster centers at varying values of the scale parameter. The number of
cluster centers in the data is then obtained based on persistence over the
largest range of the scale parameter. Four simulations are presented to
illustrate the efficacy of the proposed algorithm. (C) 1999 Elsevier Science
B.V. All rights reserved.
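
The persistence criterion described above can be illustrated with a minimal
sketch. It is not the authors' implementation: the function name
`most_persistent_count`, the use of a plain difference of scale values to
measure the "range", and the sample numbers are all assumptions made for
illustration. The sketch takes the number of cluster centers found at each
scale value and returns the count whose contiguous run covers the widest
range of the scale parameter.

```python
from itertools import groupby


def most_persistent_count(scales, counts):
    """Return (count, span) for the cluster count that persists longest.

    scales: scale-parameter values, assumed sorted in ascending order.
    counts: number of distinct cluster centers found at each scale value.
    The span of a run is measured as last scale minus first scale, which
    is an assumption; the source does not say how the range is measured.
    """
    if not scales or len(scales) != len(counts):
        raise ValueError("scales and counts must be non-empty and equal in length")

    best_count, best_span = None, -1.0
    start = 0
    # groupby yields one group per run of consecutive equal counts.
    for count, run in groupby(counts):
        length = len(list(run))
        span = scales[start + length - 1] - scales[start]
        if span > best_span:
            best_count, best_span = count, span
        start += length
    return best_count, best_span


# Hypothetical sweep: the count 4 persists from scale 0.4 to scale 1.6.
scales = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4]
counts = [9, 6, 4, 4, 4, 2, 1]
count, span = most_persistent_count(scales, counts)
print(count)
```

In this sketch a count seen at only one scale value has zero span, so a
sufficiently dense sweep of the scale parameter is assumed. Measuring the
span on a logarithmic scale axis would be an equally plausible choice.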