Stochastic complexity is treated as a tool of classification, i.e., of
inferring the number of classes, the class descriptions, and the clas
s memberships for a given data set of binary vectors. The stochastic c
omplexity is evaluated with respect to the family of statistical model
s defined by finite mixtures of multivariate Bernoulli distributions o
btained by the principle of maximum entropy. It is shown that stochast
ic complexity is asymptotically related to the classification maximum
likelihood estimate. The Formulae for stochastic complexity have an in
terpretation as minimum code lengths for certain universal source code
s for storing the binary data Vectors and their assignments into the c
lasses in a classification. There is also a decomposition of the class
ification uncertainty in a sum of an intraclass uncertainty, an interc
lass uncertainty, and a special parsimony term. It is shown that minim
izing the stochastic complexity amounts to maximizing the information
content of the classification. An algorithm of alternating minimizatio
n of stochastic complexity is given. We discuss the relation of the me
thod to the AUTOCLASS system of Bayesian classification. The applicati
on of classification by stochastic complexity to an extensive data bas
e of strains of Enterobacteriaceae is described. (C) 1997 Academic Pre
ss.