Distortion-class modeling for robust speech recognition under GSM RPE-LTP coding

Citation
J.M. Huerta and R.M. Stern, Distortion-class modeling for robust speech recognition under GSM RPE-LTP coding, SPEECH COMM, 34(1-2), 2001, pp. 213-225
Number of citations
32
Subject categories
Computer Science & Engineering
Journal title
SPEECH COMMUNICATION
ISSN journal
0167-6393
Volume
34
Issue
1-2
Year of publication
2001
Pages
213 - 225
Database
ISI
SICI code
0167-6393(200104)34:1-2<213:DMFRSR>2.0.ZU;2-W
Abstract
We present a method to reduce the degradation in recognition accuracy introduced by full-rate GSM RPE-LTP coding by combining sets of acoustic models trained under different distortion conditions. During recognition, the a posteriori probabilities of an utterance are calculated as a weighted sum of the posteriors corresponding to the individual models. The phonemes used by the system's word pronunciations are grouped into classes according to amount of distortion they undergo in coding. The acoustic model used in the decoding process is a weighted combination of models derived from clean speech and models derived from speech that had been degraded by GSM coding (the source models), with the relative combination of the two sources depending on the extent to which each class of phonemes is degraded by the coding process. To determine the distortion class membership, and hence the weights, we measure the spectral distortion introduced to the quantized long-term residual by the RPE-LTP codec. We discuss how this distortion varies according to phonetic class. The method described reduces the degradation in recognition accuracy introduced by GSM coding of sentences in the TIMIT database by more than 70% relative to the baseline accuracy obtained in matched training and testing conditions with respect to a system using the source acoustic models, and up to 60% relative to the best baseline systems regardless of the number of Gaussians. © 2001 Elsevier Science B.V. All rights reserved.
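The class-dependent weighting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class names, the weight values, the phoneme-to-class table, and the use of a single one-dimensional Gaussian per phoneme are all assumptions made for illustration. In the paper the weights follow from the measured spectral distortion of the quantized long-term residual, which is not modeled here.

```python
import math

# Hypothetical weight given to the GSM-trained model for each distortion class.
# These values are illustrative only; the paper derives weights from measured distortion.
CLASS_WEIGHT_GSM = {"low": 0.2, "medium": 0.5, "high": 0.8}

# Hypothetical assignment of phonemes to distortion classes.
PHONEME_CLASS = {"aa": "low", "m": "medium", "s": "high"}


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def combined_score(x, phoneme, clean_model, gsm_model):
    """Weighted sum of the clean-model and GSM-model scores for one observation.

    clean_model and gsm_model map a phoneme to a (mean, variance) pair.
    The weight depends only on the distortion class of the phoneme.
    """
    w = CLASS_WEIGHT_GSM[PHONEME_CLASS[phoneme]]
    p_clean = gaussian_pdf(x, *clean_model[phoneme])
    p_gsm = gaussian_pdf(x, *gsm_model[phoneme])
    return (1.0 - w) * p_clean + w * p_gsm


if __name__ == "__main__":
    clean = {"s": (0.0, 1.0)}  # toy parameters, not taken from the paper
    gsm = {"s": (1.0, 1.0)}
    print(combined_score(0.0, "s", clean, gsm))
```

Because the weights for each phoneme sum to one, the combined score always lies between the two source scores; a phoneme in a heavily distorted class leans toward the GSM-trained model, and a lightly distorted one leans toward the clean model.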