CEPSTRAL REPRESENTATION OF SPEECH MOTIVATED BY TIME-FREQUENCY MASKING: AN APPLICATION TO SPEECH RECOGNITION

Authors

AIKAWA K, SINGER H, KAWAHARA H, TOHKURA Y

Citation

K. Aikawa et al., CEPSTRAL REPRESENTATION OF SPEECH MOTIVATED BY TIME-FREQUENCY MASKING: AN APPLICATION TO SPEECH RECOGNITION, The Journal of the Acoustical Society of America, 100(1), 1996, pp. 603-614

Subject categories

Acoustics

Journal title

The Journal of the Acoustical Society of America

ISSN journal

0001-4966

Volume

100

Issue

1

Year of publication

1996

Pages

603 - 614

Database

ISI

SICI code

0001-4966(1996)100:1<603:CROSMB>2.0.ZU;2-W

Abstract

A new spectral representation incorporating time-frequency forward masking is proposed. This masked spectral representation is efficiently represented by a quefrency domain parameter called dynamic-cepstrum (DyC). Automatic speech recognition experiments have demonstrated that DyC powerfully improves performance in phoneme classification and phrase recognition. This new spectral representation simulates a perceived spectrum. It enhances formant transition, which provides relevant cues for phoneme perception, while suppressing temporally stationary spectral properties, such as the effect of microphone frequency characteristics or the speaker-dependent time-invariant spectral feature. These features are advantageous for speaker-independent speech recognition. DyC can efficiently represent both the instantaneous and transitional aspects of a running spectrum with a vector of the same size as a conventional cepstrum. DyC is calculated from a cepstrum time sequence using a matrix lifter. Each column vector of the matrix lifter performs spectral smoothing. Smoothing characteristics are a function of the time interval between a masker and a signal. DyC outperformed a conventional cepstrum parameter obtained through linear predictive coding (LPC) analysis for both phoneme classification and phrase recognition by using hidden Markov models (HMMs). Compared with speaker-dependent recognition, an even greater improvement over the cepstrum parameter was found in speaker-independent speech recognition. Furthermore, DyC with only 16 coefficients exhibited higher speech recognition performance than a combination of the cepstrum and a delta-cepstrum with 32 coefficients for the classification experiment of phonemes contaminated by noise. © 1996 Acoustical Society of America.
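
The matrix-lifter step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the lifter's actual form, so everything specific here is an assumption made for illustration: the function names `make_matrix_lifter` and `dynamic_cepstrum`, the Gaussian shape in quefrency, and the parameters `alpha`, `beta`, `q0` and `nu`. The only ideas taken from the abstract are that each preceding frame (the masker) is smoothed by a lifter vector whose characteristics depend on its time distance from the current frame (the signal), and that the output has the same size as the input cepstrum.

```python
import math


def make_matrix_lifter(num_coeffs, num_lags, alpha=0.2, beta=0.7, q0=20.0, nu=0.9):
    """Build a hypothetical matrix lifter as a list of per-lag weight vectors.

    lifter[n - 1][k] weights quefrency index k of the frame n steps in the past.
    Assumed shape (not from the abstract): a Gaussian window in quefrency, which
    corresponds to smoothing the log spectrum, with gain and width that shrink
    as the masker-to-signal interval n grows.
    """
    lifter = []
    for n in range(1, num_lags + 1):
        gain = alpha * beta ** (n - 1)    # weaker masking from older frames
        width = q0 * nu ** (n - 1)        # smoothing changes with the lag
        lifter.append(
            [gain * math.exp(-(k * k) / (2.0 * width * width))
             for k in range(num_coeffs)]
        )
    return lifter


def dynamic_cepstrum(cepstra, lifter):
    """Subtract liftered past frames from each cepstral frame.

    cepstra: list of frames, each a list of cepstral coefficients.
    Returns a list of frames with the same shape as the input.
    """
    out = []
    for i, frame in enumerate(cepstra):
        masked = list(frame)
        for n, weights in enumerate(lifter, start=1):
            if i - n < 0:
                break  # no earlier frame available at the start of the sequence
            prev = cepstra[i - n]
            for k in range(len(frame)):
                masked[k] -= weights[k] * prev[k]
        out.append(masked)
    return out


# Usage: a perfectly stationary cepstrum sequence is attenuated once past
# frames are available, while the first frame passes through unchanged.
stationary = [[1.0, 0.5] for _ in range(5)]
result = dynamic_cepstrum(stationary, make_matrix_lifter(num_coeffs=2, num_lags=2))
```

Under these assumptions, a time-invariant component such as a fixed microphone response is reduced in every frame after the first, whereas a frame that differs from its predecessors keeps more of its value, which is the qualitative behaviour the abstract attributes to DyC.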