CEPSTRAL REPRESENTATION OF SPEECH MOTIVATED BY TIME-FREQUENCY MASKING- AN APPLICATION TO SPEECH RECOGNITION

Citation
K. Aikawa et al., CEPSTRAL REPRESENTATION OF SPEECH MOTIVATED BY TIME-FREQUENCY MASKING- AN APPLICATION TO SPEECH RECOGNITION, The Journal of the Acoustical Society of America, 100(1), 1996, pp. 603-614
Number of citations
12
Subject Categories
Acoustics
ISSN journal
0001-4966
Volume
100
Issue
1
Year of publication
1996
Pages
603 - 614
Database
ISI
SICI code
0001-4966(1996)100:1<603:CROSMB>2.0.ZU;2-W
Abstract
A new spectral representation incorporating time-frequency forward masking is proposed. This masked spectral representation is efficiently represented by a quefrency domain parameter called dynamic-cepstrum (DyC). Automatic speech recognition experiments have demonstrated that DyC powerfully improves performance in phoneme classification and phrase recognition. This new spectral representation simulates a perceived spectrum. It enhances formant transition, which provides relevant cues for phoneme perception, while suppressing temporally stationary spectral properties, such as the effect of microphone frequency characteristics or the speaker-dependent time-invariant spectral feature. These features are advantageous for speaker-independent speech recognition. DyC can efficiently represent both the instantaneous and transitional aspects of a running spectrum with a vector of the same size as a conventional cepstrum. DyC is calculated from a cepstrum time sequence using a matrix lifter. Each column vector of the matrix lifter performs spectral smoothing. Smoothing characteristics are a function of the time interval between a masker and a signal. DyC outperformed a conventional cepstrum parameter obtained through linear predictive coding (LPC) analysis for both phoneme classification and phrase recognition by using hidden Markov models (HMMs). Compared with speaker-dependent recognition, an even greater improvement over the cepstrum parameter was found in speaker-independent speech recognition. Furthermore, DyC with only 16 coefficients exhibited higher speech recognition performance than a combination of the cepstrum and a delta-cepstrum with 32 coefficients for the classification experiment of phonemes contaminated by noises. (C) 1996 Acoustical Society of America.
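Illustrative sketch
The matrix-lifter computation described in the abstract can be pictured roughly as follows. This is a minimal sketch and not the authors' implementation: the abstract does not give the formula, so the subtractive forward-masking form, the Gaussian lifter shape, the function names, and every parameter name and value (alpha, beta, q0, nu, the number of lags) are assumptions chosen only for illustration. Each lag gets its own lifter vector, corresponding to the column vectors mentioned in the abstract, and the vector narrows in quefrency as the lag grows, which corresponds to stronger spectral smoothing.

```python
import math


def make_matrix_lifter(num_coeffs, num_lags, alpha=0.25, beta=0.7, q0=20.0, nu=1.0):
    """Build a hypothetical matrix lifter as one weight vector per time lag.

    All parameters are illustrative assumptions:
      alpha -- masking gain at lag 1
      beta  -- decay of the gain per additional lag
      q0    -- Gaussian lifter width (in quefrency bins) at lag 1
      nu    -- how much the width shrinks per additional lag
    """
    lifter = []
    for lag in range(1, num_lags + 1):
        gain = alpha * beta ** (lag - 1)
        width = q0 - nu * (lag - 1)
        if width <= 0:
            raise ValueError("lifter width must stay positive for every lag")
        # A narrower Gaussian in quefrency means a smoother masking spectrum.
        lifter.append(
            [gain * math.exp(-(k * k) / (2.0 * width * width)) for k in range(num_coeffs)]
        )
    return lifter


def dynamic_cepstrum(cepstra, lifter):
    """Subtract liftered preceding frames from each cepstrum frame.

    cepstra is a list of frames, each a list of cepstral coefficients.
    The output has the same shape as the input.
    """
    out = []
    for i, frame in enumerate(cepstra):
        masked = list(frame)
        for lag, weights in enumerate(lifter, start=1):
            if i - lag < 0:
                break  # no earlier frame available at this lag
            past = cepstra[i - lag]
            for k, w in enumerate(weights):
                masked[k] -= w * past[k]
        out.append(masked)
    return out


# Toy input: three silent frames followed by eight identical (stationary) frames.
frames = [[0.0] * 16 for _ in range(3)] + [[1.0] * 16 for _ in range(8)]
dyc = dynamic_cepstrum(frames, make_matrix_lifter(num_coeffs=16, num_lags=4))
print("onset frame, coefficient 0:", dyc[3][0])
print("steady frame, coefficient 0:", dyc[10][0])
```

In this toy input the onset frame passes through unchanged because the frames before it are silent, while the later stationary frames are attenuated. That mirrors, in a simplified way, the abstract's statement that transitions are enhanced relative to temporally stationary spectral properties, and the output vector has the same size as the input cepstrum.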