K. Roeder, A Graphical Technique for Determining the Number of Components in a Mixture of Normals, Journal of the American Statistical Association, 89(426), 1994, pp. 487-495
When a population is assumed to be composed of a finite number of subpopulations, a natural model to choose is the finite mixture model. Often, however, the number of component distributions is unknown and must be estimated. This problem can be difficult; for instance, the density of a mixture of two normals with a common variance is not bimodal unless the means are separated by more than 2 standard deviations. Hence modality of the data per se can be an insensitive approach to component estimation. We demonstrate that a mixture of two normals divided by a normal density having the same mean and variance as the mixed density is always bimodal. This analytic result and other related results form the basis for a diagnostic and a test for the number of components in a mixture of normals. The density is estimated using a kernel density estimator. Under the null hypothesis, the proposed diagnostic can be approximated by a stationary Gaussian process. Under the alternative hypothesis, components in the mixture express themselves as major modes in the diagnostic plot. A test for mixing is based on the amount of smoothing necessary to suppress these large deviations from a Gaussian process.
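The central analytic claim can be checked numerically. The sketch below assumes Python with NumPy, which the paper does not prescribe. It evaluates an equal-weight two-normal mixture whose means are one standard deviation apart, so the mixture density itself is unimodal. It then divides that density by the normal density with the same mean and variance, and counting local maxima on a grid should show one mode for the mixture and two for the ratio. All helper names (`normal_pdf`, `count_modes`, `mixture_ratio`, `kde_ratio`) are hypothetical. `kde_ratio` is only a naive plug-in analogue of the kernel-based diagnostic. It divides a Gaussian kernel estimate with bandwidth `h` by the normal density matching the estimate's mean and variance (sample variance plus `h**2`), which is a choice made here, and it does not reproduce the paper's exact diagnostic or its Gaussian-process calibration.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd**2) evaluated at x (broadcasts over arrays)."""
    z = (x - mean) / sd
    return np.exp(-0.5 * z * z) / (sd * np.sqrt(2.0 * np.pi))


def count_modes(y):
    """Count strict interior local maxima of a curve sampled on a grid."""
    y = np.asarray(y)
    return int(np.sum((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])))


def mixture_ratio(x, p, mu1, mu2, sd):
    """Return (mixture density, mixture / moment-matched normal density)
    for a two-normal mixture whose components share the standard deviation sd."""
    mix = p * normal_pdf(x, mu1, sd) + (1.0 - p) * normal_pdf(x, mu2, sd)
    mean = p * mu1 + (1.0 - p) * mu2
    # Mixture variance = within-component variance + between-component spread.
    var = sd**2 + p * (1.0 - p) * (mu1 - mu2) ** 2
    return mix, mix / normal_pdf(x, mean, np.sqrt(var))


def kde_ratio(x, data, h):
    """Hypothetical plug-in diagnostic: Gaussian KDE divided by the normal
    density with the same mean and variance as the KDE (var(data) + h**2)."""
    data = np.asarray(data, dtype=float)
    # Average of N(X_i, h**2) kernels evaluated at every grid point.
    f_hat = normal_pdf(x[:, None], data[None, :], h).mean(axis=1)
    ref = normal_pdf(x, data.mean(), np.sqrt(data.var() + h**2))
    return f_hat / ref


if __name__ == "__main__":
    x = np.linspace(-6.0, 6.0, 1201)
    # Means one standard deviation apart: below the 2-sd bimodality threshold.
    mix, ratio = mixture_ratio(x, p=0.5, mu1=-0.5, mu2=0.5, sd=1.0)
    print(count_modes(mix), count_modes(ratio))  # prints: 1 2

    # Simulated sample from the same mixture; mode counts depend on the draw.
    rng = np.random.default_rng(0)
    n = 500
    labels = rng.random(n) < 0.5
    sample = np.where(labels, rng.normal(-0.5, 1.0, n), rng.normal(0.5, 1.0, n))
    grid = np.linspace(-4.0, 4.0, 801)
    for h in (0.2, 0.4, 0.8):
        print(h, count_modes(kde_ratio(grid, sample, h)))
```

The bandwidth loop illustrates the idea behind the test only informally: one can watch how the number of modes in the estimated ratio changes as `h` grows. Deciding how much smoothing is "necessary" requires the null distribution that the paper derives from the Gaussian-process approximation, which this sketch does not implement.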