In pursuit of better performance, current speech recognition systems tend
to use increasingly complex models for both the acoustic and the language
component. Cross-word context-dependent (CD) phone models and long-span
statistical language models (LMs) are now widely used. In this paper, we
present a memory-efficient search topology that enables the use of such
detailed acoustic and language models in a one-pass time-synchronous
recognition system. Our approach is characterized by (1) the decoupling of
the two
basic knowledge sources, namely pronunciation information and LM
information, and (2) the representation of pronunciation information - the
lexicon in
terms of CD units - by means of a compact static network. The LM
information is incorporated into the search at run-time by means of a
slightly modified token-passing algorithm. The decoupling of the LM and
lexicon allows great flexibility in the choice of LMs, while the static
lexicon representation avoids the cost of dynamic tree expansion and
facilitates the integration of additional pronunciation information such as
assimilation rules. Moreover, the network representation results in a
compact structure when words have multiple pronunciations, and due to its
construction, it offers partial
LM forwarding at no extra cost. (C) 2000 Elsevier Science B.V. All rights
reserved.
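
The decoupling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it simplifies heavily: it uses context-independent phones instead of CD units, a plain prefix tree instead of the compact network, one state per phone, and a bigram LM. Keeping one token per LM state in each node is an assumption made for this sketch, not a description of the paper's modification to token passing. All names (`build_lexicon_tree`, `decode`, `lm_logprob`) and all scores are hypothetical.

```python
import math
from collections import defaultdict

def build_lexicon_tree(lexicon):
    """Compile (word, phones) pairs into a static prefix tree, built once."""
    children = [{}]    # children[node][phone] -> child node id
    phone_of = [None]  # phone emitted by each node (root emits nothing)
    words_at = [[]]    # words whose pronunciation ends in each node
    for word, phones in lexicon:
        node = 0
        for p in phones:
            if p not in children[node]:
                children.append({})
                phone_of.append(p)
                words_at.append([])
                children[node][p] = len(children) - 1
            node = children[node][p]
        words_at[node].append(word)
    return children, phone_of, words_at

def decode(frames, tree, lm_logprob, start="<s>"):
    """Time-synchronous token passing; the LM is consulted only at word ends."""
    children, phone_of, words_at = tree
    # active[node][lm_state] = (log score, word sequence so far)
    active = {0: {start: (0.0, ())}}
    for frame in frames:
        new_active = defaultdict(dict)
        for node, tokens in active.items():
            # Successors: child nodes, plus a self-loop for emitting nodes.
            succ = list(children[node].values()) + ([node] if node else [])
            for state, (score, words) in tokens.items():
                for nxt in succ:
                    s = score + frame.get(phone_of[nxt], -math.inf)
                    if s > new_active[nxt].get(state, (-math.inf,))[0]:
                        new_active[nxt][state] = (s, words)
        # Word ends: add the LM score and re-enter the root of the tree.
        at_root = {}
        for node, tokens in new_active.items():
            for word in words_at[node]:
                for state, (score, words) in tokens.items():
                    s = score + lm_logprob(state, word)
                    if s > at_root.get(word, (-math.inf,))[0]:
                        at_root[word] = (s, words + (word,))
        if at_root:
            new_active[0] = at_root
        active = new_active
    final = active.get(0, {})
    if not final:
        return None
    best = max(final.values(), key=lambda t: t[0])
    return best[1], best[0]

# Toy data (invented): "to" and "two" are homophones sharing one word-end node.
lexicon = [("to", ["t", "uw"]), ("two", ["t", "uw"]), ("tea", ["t", "iy"])]
bigram = {("<s>", "to"): math.log(0.3), ("<s>", "two"): math.log(0.6),
          ("to", "tea"): math.log(0.5), ("two", "tea"): math.log(0.1)}

def lm_logprob(prev, word):
    # Unseen bigrams get a small floor probability (an arbitrary choice).
    return bigram.get((prev, word), math.log(1e-3))

def make_frame(phone):
    # Hypothetical acoustic log scores: the matching phone scores best.
    return {p: (-0.1 if p == phone else -5.0) for p in ("t", "uw", "iy")}

frames = [make_frame(p) for p in ["t", "uw", "t", "iy"]]
words, score = decode(frames, build_lexicon_tree(lexicon), lm_logprob)
print(words, score)
```

In this sketch the lexicon tree never changes during decoding, and the LM appears only as the `lm_logprob` callable, so a different LM could be swapped in without rebuilding the network. A longer-span LM would additionally need a richer LM state than the last word.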