Mj. Flaherty and Db. Roe, ORTHOGONAL TRANSFORMATIONS OF STACKED FEATURE VECTORS APPLIED TO HMM SPEECH RECOGNITION, IEE proceedings. Part I. Communications, speech and vision, 140(2), 1993, pp. 121-126
The paper reports improvements in speech recognition accuracy by using
more sophisticated time analysis as part of the feature selection
process. The recognition methodology utilises hidden Markov modelling
with continuous density functions. The authors propose using, as speech
features, linear transformations of the vector consisting of successive
time samples of the cepstrum. Taylor series, the Legendre polynomial
transform and the discrete cosine transform share several properties
with principal components analysis. These transforms are expected to
improve speech recognition accuracy by incorporating higher-order time
derivatives (such as the second time derivative) of spectral
information while at the same time producing an essentially diagonal
covariance. In an experimental evaluation of these ideas, accuracy in
speaker-independent recognition of the 'E'-set of the alphabet improved
from 55%, with no time varying information, to 68%, with first-order
time varying information, and 74%, by including second-order time
varying information.
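
The idea of transforming a stacked vector of successive cepstral frames can be illustrated with a minimal sketch. This is not the authors' implementation: the window length (5 frames), the cepstral dimension (12), the number of retained orders (3) and the function names dct_basis, polynomial_basis and transform_stacked are all assumptions chosen for illustration. The polynomial basis is obtained by QR-orthonormalising the monomials 1, t, t^2 on the sampled window, which stands in for a Legendre-style transform rather than reproducing the paper's exact definition. Orders 0, 1 and 2 play the roles of the local average, a first-derivative-like term and a second-derivative-like term.

```python
import numpy as np


def dct_basis(window_len, n_orders):
    """First n_orders rows of the orthonormal DCT-II matrix over window_len points."""
    k = np.arange(n_orders)[:, None]
    n = np.arange(window_len)[None, :]
    basis = np.cos(np.pi * (2 * n + 1) * k / (2 * window_len))
    basis *= np.sqrt(2.0 / window_len)
    basis[0] /= np.sqrt(2.0)  # rescale the constant row so all rows have unit norm
    return basis


def polynomial_basis(window_len, n_orders):
    """Orthonormal discrete polynomials of degree 0..n_orders-1 on the window.

    Assumption: built by QR of a Vandermonde matrix, as a stand-in for a
    Legendre-style transform. Requires n_orders <= window_len.
    """
    t = np.linspace(-1.0, 1.0, window_len)
    vander = np.vander(t, n_orders, increasing=True)  # columns 1, t, t^2, ...
    q, _ = np.linalg.qr(vander)  # reduced QR: q has shape (window_len, n_orders)
    return q.T


def transform_stacked(cepstra, basis):
    """Apply a time-axis transform to every window of successive frames.

    cepstra: array of shape (T, D), one cepstral vector per frame.
    basis:   array of shape (n_orders, K), rows are time-basis functions.
    Returns an array of shape (T - K + 1, n_orders * D).
    """
    n_orders, window_len = basis.shape
    n_frames, dim = cepstra.shape
    n_out = n_frames - window_len + 1
    # windows[i] holds frames i .. i + K - 1, giving shape (n_out, K, D)
    windows = np.stack([cepstra[i:i + window_len] for i in range(n_out)])
    # project the time trajectory of each cepstral coefficient onto the basis
    feats = np.einsum("pk,tkd->tpd", basis, windows)
    return feats.reshape(n_out, n_orders * dim)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cepstra = rng.standard_normal((20, 12))  # synthetic stand-in for real cepstra
    features = transform_stacked(cepstra, dct_basis(5, 3))
    print(features.shape)
```

Because the basis rows are orthonormal, dropping the higher orders is a projection rather than an arbitrary truncation. Whether the resulting features have a near-diagonal covariance on real speech is the paper's empirical claim and is not something this sketch demonstrates.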