K. Aikawa et al., CEPSTRAL REPRESENTATION OF SPEECH MOTIVATED BY TIME-FREQUENCY MASKING: AN APPLICATION TO SPEECH RECOGNITION, The Journal of the Acoustical Society of America, 100(1), 1996, pp. 603-614
A new spectral representation incorporating time-frequency forward masking is proposed. This masked spectral representation is efficiently represented by a quefrency domain parameter called dynamic-cepstrum (DyC). Automatic speech recognition experiments have demonstrated that DyC powerfully improves performance in phoneme classification and phrase recognition. This new spectral representation simulates a perceived spectrum. It enhances formant transition, which provides relevant cues for phoneme perception, while suppressing temporally stationary spectral properties, such as the effect of microphone frequency characteristics or the speaker-dependent time-invariant spectral feature. These features are advantageous for speaker-independent speech recognition. DyC can efficiently represent both the instantaneous and transitional aspects of a running spectrum with a vector of the same size as a conventional cepstrum. DyC is calculated from a cepstrum time sequence using a matrix lifter. Each column vector of the matrix lifter performs spectral smoothing. Smoothing characteristics are a function of the time interval between a masker and a signal. DyC outperformed a conventional cepstrum parameter obtained through linear predictive coding (LPC) analysis for both phoneme classification and phrase recognition by using hidden Markov models (HMMs). Compared with speaker-dependent recognition, an even greater improvement over the cepstrum parameter was found in speaker-independent speech recognition. Furthermore, DyC with only 16 coefficients exhibited higher speech recognition performance than a combination of the cepstrum and a delta-cepstrum with 32 coefficients for the classification experiment of phonemes contaminated by noises. (C) 1996 Acoustical Society of America.
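
The matrix-lifter computation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so the following is an assumption: each output frame is taken to be the current cepstrum minus a liftered sum of the preceding frames, with a Gaussian lifter per time lag whose gain and quefrency width both shrink as the lag grows (a narrower lifter in quefrency means broader smoothing in frequency). The function names `matrix_lifter` and `dynamic_cepstrum` and every parameter value are hypothetical and chosen only for illustration.

```python
import math


def matrix_lifter(num_coeffs, num_lags, gain=0.3, gain_decay=0.7,
                  width=20.0, width_decay=0.8):
    """Build an assumed matrix lifter, stored as one weight list per time lag.

    lifter[n - 1][k] is the weight applied to quefrency index k of the frame
    n steps in the past. The Gaussian shape and all defaults are assumptions.
    """
    lifter = []
    for lag in range(1, num_lags + 1):
        lag_gain = gain * gain_decay ** (lag - 1)      # masking weakens with lag
        lag_width = width * width_decay ** (lag - 1)   # smoothing broadens with lag
        lifter.append([
            lag_gain * math.exp(-(k * k) / (2.0 * lag_width * lag_width))
            for k in range(num_coeffs)
        ])
    return lifter


def dynamic_cepstrum(cepstra, lifter):
    """Subtract liftered preceding frames from each cepstral frame.

    cepstra is a list of frames, each a list of cepstral coefficients.
    The output has the same number of frames and coefficients as the input.
    """
    result = []
    for i, frame in enumerate(cepstra):
        masked = list(frame)  # copy so the input sequence is not modified
        for lag, weights in enumerate(lifter, start=1):
            if i - lag < 0:
                break  # no earlier frames available at the start of the sequence
            previous = cepstra[i - lag]
            for k, weight in enumerate(weights):
                masked[k] -= weight * previous[k]
        result.append(masked)
    return result
```

Under these assumptions, a cepstrum sequence that does not change over time is attenuated once enough past frames are available, while a frame that differs from its predecessors passes through largely intact, and the output vector stays the same size as the input cepstrum. This mirrors the behaviour the abstract describes but is not the authors' implementation.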