We present a method to reduce the degradation in recognition accuracy introduced by full-rate GSM RPE-LTP coding by combining sets of acoustic models trained under different distortion conditions. During recognition, the a posteriori probabilities of an utterance are calculated as a weighted sum of the posteriors corresponding to the individual models. The phonemes used in the system's word pronunciations are grouped into classes according to the amount of distortion they undergo in coding. The acoustic model used in the decoding process is a weighted combination of models derived from clean speech and models derived from speech that had been degraded by GSM coding (the source models), with the relative contribution of the two sources depending on the extent to which each class of phonemes is degraded by the coding process. To determine the distortion class membership, and hence the weights, we measure the spectral distortion introduced into the quantized long-term residual by the RPE-LTP codec. We discuss how this distortion varies according to phonetic class. The method described reduces the degradation in recognition accuracy introduced by GSM coding of sentences in the TIMIT database by more than 70% relative to the baseline accuracy obtained under matched training and testing conditions with respect to a system using the source acoustic models, and by up to 60% relative to the best baseline systems, regardless of the number of Gaussians. (C) 2001 Elsevier Science B.V. All rights reserved.