There is normally a simple choice made in the form of the covariance matrix
to be used with continuous density HMM's. Either a diagonal covariance
matrix is used, with the underlying assumption that elements of the feature
vector are independent, or a full or block-diagonal matrix is used, where all
or some of the correlations are explicitly modeled. Unfortunately, when
using full or block-diagonal covariance matrices there tends to be a dramatic
increase in the number of parameters per Gaussian component, limiting the
number of components which may be robustly estimated. This paper introduces
a new form of covariance matrix which allows a few "full" covariance
matrices to be shared over many distributions, whilst each distribution maintains
its own "diagonal" covariance matrix. In contrast to other schemes which
have hypothesized a similar form, this technique fits within the standard
maximum-likelihood criterion used for training HMM's. The new form of
covariance matrix is evaluated on a large-vocabulary speech-recognition task. In
initial experiments the performance of the standard system was achieved
using approximately half the number of parameters. Moreover, a 10% reduction
in word error rate compared to a standard system can be achieved with less
than a 1% increase in the number of parameters and little increase in
recognition time.
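
The parameter-count argument above can be illustrated with a minimal sketch.
It counts covariance parameters only (means and mixture weights are ignored)
and assumes that each shared "full" matrix is an unconstrained square matrix
of the feature dimension; the abstract does not state this. The function
names and the values 39, 1000 and 1 are hypothetical, chosen for illustration,
and are not taken from the experiments reported here.

```python
def diagonal_params(dim: int) -> int:
    """Free parameters in one diagonal covariance matrix (one variance per feature)."""
    return dim


def full_params(dim: int) -> int:
    """Free parameters in one symmetric full covariance matrix."""
    return dim * (dim + 1) // 2


def shared_full_params(dim: int, n_components: int, n_shared: int) -> int:
    """Covariance parameters when every component keeps its own diagonal
    and n_shared square matrices are shared across components.

    ASSUMPTION: each shared matrix is an unconstrained dim x dim matrix.
    """
    return n_components * diagonal_params(dim) + n_shared * dim * dim


if __name__ == "__main__":
    # Hypothetical sizes, for illustration only.
    dim, n_components, n_shared = 39, 1000, 1

    print("diagonal per component:", diagonal_params(dim))
    print("full per component:    ", full_params(dim))
    print("all diagonal:          ", n_components * diagonal_params(dim))
    print("all full:              ", n_components * full_params(dim))
    print("diagonal + shared full:", shared_full_params(dim, n_components, n_shared))
```

Under these assumed sizes, a full matrix per component costs twenty times as
many covariance parameters as a diagonal one, whereas a single shared matrix
adds only a few percent to the all-diagonal total.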