J. Ferreiros and J.M. Pardo, Improving continuous speech recognition in Spanish by phone-class semicontinuous HMMs with pausing and multiple pronunciations, Speech Communication, 29(1), 1999, pp. 65-76
This paper presents a comprehensive study of continuous speech recognition
in Spanish. It shows the use and optimisation of several well-known
techniques, together with the first application to these systems of
language-specific knowledge for Spanish, i.e. the careful selection of the
phone inventory, the phone-classes used, and the alternative pronunciation
rules. We have developed a semicontinuous, phone-class-dependent contextual
modelling approach. Using four phone-classes, we have obtained recognition
error rate reductions roughly equivalent to the percentage increase in the
number of parameters, compared to baseline semicontinuous contextual
modelling. We also show that the use of pausing in the training system and
multiple pronunciations in the vocabulary helps to improve recognition rates
significantly. The actual pausing of the training sentences and the
application of assimilation effects improve the transcription into
context-dependent units. Multiple pronunciation possibilities are generated
using general rules that are easily applied to any Spanish vocabulary. With
all these ideas we have reduced the recognition errors of the baseline
system by more than 30% in a task parallel to DARPA-RM translated into
Spanish, with a vocabulary of 979 words. Our database contains four
speakers, with 600 training sentences and 100 testing sentences each. All
experiments have been carried out with a perplexity of 979, and even
slightly higher in the case of multiple pronunciations, so as to study the
acoustic modelling power of the systems with no grammar constraints.
(C) 1999 Elsevier Science B.V. All rights reserved.
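
The perplexity figure quoted in the abstract follows from the no-grammar setting: if every vocabulary entry is equally likely at each word position, the perplexity equals the number of entries. The Python sketch below only illustrates that arithmetic and is not taken from the paper. The function name `uniform_perplexity` and the count of 1000 entries used for the multiple-pronunciation case are hypothetical, and it assumes each pronunciation variant is counted as a separate entry.

```python
import math


def uniform_perplexity(num_entries: int) -> float:
    """Perplexity of a model that gives every entry the same probability.

    Hypothetical helper for illustration; not part of the cited system.
    """
    if num_entries <= 0:
        raise ValueError("num_entries must be positive")
    p = 1.0 / num_entries
    # Entropy in bits of the uniform distribution over num_entries outcomes.
    entropy_bits = -math.fsum(p * math.log2(p) for _ in range(num_entries))
    # Perplexity is 2 raised to the entropy.
    return 2.0 ** entropy_bits


# Vocabulary size stated in the abstract: 979 words, no grammar constraints.
print(round(uniform_perplexity(979)))

# Assumed figure for illustration only: if pronunciation variants raised the
# number of lexicon entries to 1000, the perplexity would rise accordingly.
print(round(uniform_perplexity(1000)))
```

Under this assumption, adding pronunciation variants enlarges the set of equally likely entries, which is consistent with the slightly higher perplexity the abstract reports for the multiple-pronunciation experiments.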