An efficient search space representation for large vocabulary continuous speech recognition

Citation
K. Demuynck et al., An efficient search space representation for large vocabulary continuous speech recognition, SPEECH COMM, 30(1), 2000, pp. 37-53
Number of citations
24
Subject categories
Computer Science & Engineering
Journal title
SPEECH COMMUNICATION
Journal ISSN
0167-6393
Volume
30
Issue
1
Year of publication
2000
Pages
37 - 53
Database
ISI
SICI code
0167-6393(200001)30:1<37:AESSRF>2.0.ZU;2-Z
Abstract
In pursuance of better performance, current speech recognition systems tend to use more and more complicated models for both the acoustic and the language component. Cross-word context dependent (CD) phone models and long-span statistical language models (LMs) are now widely used. In this paper, we present a memory-efficient search topology that enables the use of such detailed acoustic and language models in a one pass time-synchronous recognition system. Characteristic of our approach is (1) the decoupling of the two basic knowledge sources, namely pronunciation information and LM information, and (2) the representation of pronunciation information - the lexicon in terms of CD units - by means of a compact static network. The LM information is incorporated into the search at run-time by means of a slightly modified token-passing algorithm. The decoupling of the LM and lexicon allows great flexibility in the choice of LMs, while the static lexicon representation avoids the cost of dynamic tree expansion and facilitates the integration of additional pronunciation information such as assimilation rules. Moreover, the network representation results in a compact structure when words have various pronunciations, and due to its construction, it offers partial LM forwarding at no extra cost. © 2000 Elsevier Science B.V. All rights reserved.
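
The decoupling described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation, and it rests on several simplifying assumptions: the static network is a plain prefix tree over context-independent phone symbols rather than CD units, acoustic scoring is replaced by exact symbol matching, and the LM is a small bigram table. All names (`Node`, `build_lexicon`, `decode`, `bigram_logprob`) are hypothetical. The point it shows is that the lexicon network is built once, while the LM is consulted only when a token reaches a word end.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    arcs: dict = field(default_factory=dict)   # phone symbol -> next Node
    words: list = field(default_factory=list)  # words whose pronunciation ends here


def build_lexicon(pronunciations):
    """Build the static pronunciation network once, before decoding (toy prefix tree)."""
    root = Node()
    for word, phones in pronunciations:
        node = root
        for phone in phones:
            node = node.arcs.setdefault(phone, Node())
        node.words.append(word)
    return root


def decode(phones, root, bigram_logprob):
    """Token passing over the static network; the LM is applied only at word ends."""
    # Tokens are keyed by (network node, last word) so distinct LM histories never merge.
    tokens = {(id(root), "<s>"): (0.0, root, ["<s>"])}
    for phone in phones:
        new_tokens = {}
        for score, node, hist in tokens.values():
            nxt = node.arcs.get(phone)
            if nxt is None:
                continue  # assumption: a mismatching symbol simply kills the token
            candidates = [(score, nxt, hist)]  # token stays inside the current word
            for word in nxt.words:
                # Word end reached: add the LM score and restart at the network root.
                lm = bigram_logprob(hist[-1], word)
                candidates.append((score + lm, root, hist + [word]))
            for cand in candidates:
                key = (id(cand[1]), cand[2][-1])
                if key not in new_tokens or cand[0] > new_tokens[key][0]:
                    new_tokens[key] = cand
        tokens = new_tokens
    finals = [t for t in tokens.values() if t[1] is root]
    if not finals:
        return None
    best = max(finals, key=lambda t: t[0])
    return best[2][1:], best[0]


# Hypothetical toy data: "two" and "too" share one pronunciation path in the network.
LEXICON = build_lexicon([
    ("two", ["t", "uw"]),
    ("too", ["t", "uw"]),
    ("cats", ["k", "ae", "t", "s"]),
])
BIGRAMS = {("<s>", "two"): 0.5, ("<s>", "too"): 0.5,
           ("two", "cats"): 0.6, ("too", "cats"): 0.1}


def bigram_logprob(prev, word):
    # Unseen bigrams get a small floor probability (an assumption of this sketch).
    return math.log(BIGRAMS.get((prev, word), 1e-6))


result = decode(["t", "uw", "k", "ae", "t", "s"], LEXICON, bigram_logprob)
```

Because the network knows nothing about the LM, swapping `bigram_logprob` for a longer-span model would only require changing the token key to carry a longer history. In this toy run the homophones "two" and "too" follow the same network path, and only the LM score at the word end separates them.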