An approximate determinization algorithm for weighted finite-state automata

Citation
Al. Buchsbaum et al., An approximate determinization algorithm for weighted finite-state automata, ALGORITHMIC, 30(4), 2001, pp. 503-526
Citations number
23
Categorie Soggetti
Engineering Mathematics
Journal title
ALGORITHMICA
ISSN journal
01784617 → ACNP
Volume
30
Issue
4
Year of publication
2001
Pages
503 - 526
Database
ISI
SICI code
0178-4617(200108)30:4<503:AADAFW>2.0.ZU;2-I
Abstract
Nondeterministic weighted finite-stare automata are a key abstraction in au tomatic speech recognition systems. The efficiency of automatic speech reco gnition depends directly on the sizes of these automata and the degree of n ondeterminism present, so recent research has studied ways to determinize a nd minimize them, using analogues of classical automata determinization and minimization. Although, as we describe here, determinization can in the wo rst case cause poly-exponential blowup in the number of stares of a weighte d finite-state automaton, in practice it is remarkably successful. In exten sive experiments in automatic speech recognition systems, deterministic wei ghted finite-state automata tend to be smaller than the corresponding nonde terministic inputs. Our observations show that these size reductions depend critically on the interplay between weights and topology in nondeterminist ic weighted finite-state automata. We exploit these observations to design a new approximate determinization algorithm, which produces a deterministic weighted finite-state automaton that preserves the strings of a weighted l anguage but not necessarily their weights. We apply our algorithm to two di fferent types of weighted finite-state automata that occur in automatic spe ech recognition systems and in each case provide extensive experimental res ults showing that, compared with current techniques, we achieve significant size reductions without affecting performance. In particular, for a standa rd test bed, we can reduce automatic speech recognition memory requirements by 25-35% with negligible effects on recognition time and accuracy.