Improving continuous speech recognition in Spanish by phone-class semicontinuous HMMs with pausing and multiple pronunciations

Authors

Ferreiros, J; Pardo, JM

Citation

J. Ferreiros and J.M. Pardo, Improving continuous speech recognition in Spanish by phone-class semicontinuous HMMs with pausing and multiple pronunciations, SPEECH COMM, 29(1), 1999, pp. 65-76

Citations number

Subject Categories

Computer Science & Engineering

Journal title

SPEECH COMMUNICATION

ISSN journal

0167-6393

Volume

29

Issue

1

Year of publication

1999

Pages

65 - 76

Database

ISI

SICI code

0167-6393(199909)29:1<65:ICSRIS>2.0.ZU;2-Y

Abstract

This paper presents a comprehensive study of continuous speech recognition in Spanish. It shows the use and optimisation of several well-known techniques together with the application for the first time to Spanish of language-specific knowledge to these systems, i.e. the careful selection of the phone inventory, the phone-classes used, and the selection of alternative pronunciation rules. We have developed a semicontinuous phone-class dependent contextual modelling. Using four phone-classes, we have obtained recognition error rate reductions roughly equivalent to the percentage increase of the number of parameters, compared to baseline semicontinuous contextual modelling. We also show that the use of pausing in the training system and multiple pronunciations in the vocabulary help to improve recognition rates significantly. The actual pausing of the training sentences and the application of assimilation effects improve the transcription into context-dependent units. Multiple pronunciation possibilities are generated using general rules that are easily applied to any Spanish vocabulary. With all these ideas we have reduced the recognition errors of the baseline system by more than 30% in a task parallel to DARPA-RM translated into Spanish with a vocabulary of 979 words. Our database contains four speakers with 600 training sentences and 100 testing sentences each. All experiments have been carried out with a perplexity of 979, and even slightly higher in the case of multiple pronunciations, to be able to study the acoustic modelling power of the systems with no grammar constraints. (C) 1999 Elsevier Science B.V. All rights reserved.