We consider the problem of determining the structure of clustered data, without prior knowledge of the number of clusters or any other information about their composition. Data are represented by a mixture model in which each component corresponds to a different cluster. Models with varying geometric properties are obtained through Gaussian components with different parametrizations and cross-cluster constraints. Noise and outliers can be modelled by adding a Poisson process component. Partitions are determined by the expectation-maximization (EM) algorithm for maximum likelihood, with initial values from agglomerative hierarchical clustering. Models are compared using an approximation to the Bayes factor based on the Bayesian information criterion (BIC); unlike significance tests, this allows comparison of more than two models at the same time, and removes the restriction that the models compared be nested. The problems of determining the number of clusters and the clustering method are solved simultaneously by choosing the best model. Moreover, the EM result provides a measure of uncertainty about the associated classification of each data point. Examples are given, showing that this approach can give performance that is much better than standard procedures, which often fail to identify groups that are either overlapping or of varying sizes and shapes.
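As a rough illustration of the model-selection strategy described above, the sketch below fits Gaussian mixtures over a grid of cluster counts and covariance structures and keeps the model with the best BIC, then reads off the posterior membership probabilities as a measure of classification uncertainty. It is a minimal sketch using scikit-learn's GaussianMixture, not the method developed in the paper: the `covariance_type` options only approximate the parametrizations and cross-cluster constraints discussed here, EM is initialized from k-means rather than agglomerative hierarchical clustering, no Poisson noise component is included, and the synthetic data are invented for the example.

```python
# Minimal sketch: choose the number of clusters and the covariance structure
# jointly by fitting Gaussian mixtures with EM and comparing BIC values
# (scikit-learn's bic() follows the "lower is better" convention).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic data: two Gaussian clusters of different sizes and shapes.
X = np.vstack([
    rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=200),
    rng.multivariate_normal([4, 1], [[0.3, 0.0], [0.0, 0.3]], size=60),
])

best_bic, best_model = np.inf, None
for k in range(1, 7):                                      # candidate numbers of clusters
    for cov in ("spherical", "diag", "tied", "full"):      # candidate geometric models
        gmm = GaussianMixture(n_components=k, covariance_type=cov,
                              random_state=0).fit(X)       # EM for maximum likelihood
        bic = gmm.bic(X)
        if bic < best_bic:
            best_bic, best_model = bic, gmm

labels = best_model.predict(X)                  # hard classification from the best model
posteriors = best_model.predict_proba(X)        # per-point membership probabilities
uncertainty = 1.0 - posteriors.max(axis=1)      # uncertainty of each classification
print(best_model.n_components, best_model.covariance_type, round(best_bic, 1))
```

The grid search over both the number of components and the covariance structure mirrors the abstract's point that the number of clusters and the clustering model are chosen simultaneously by picking the model the criterion prefers.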