We study the problem of constructing the deterministic equivalent of a nond
eterministic weighted finite-state automaton (WFA). Determinization of WFAs
has important applications in automatic speech recognition (ASR). We provi
de the first polynomial-time algorithm to test for the twins property, whic
h determines if a WA admits a deterministic equivalent. We also give upper
bounds on the size of the deterministic equivalent; the bound is tight in t
he case of acyclic WFAs. Previously, Mohri presented a superpolynomial-time
algorithm to test for the twins property, and he also gave an algorithm to
determinize WFAs. He showed that the latter runs in time linear in the siz
e of the output when a deterministic equivalent exists; otherwise, it does
not terminate. Our bounds imply an upper bound on the running time of this
algorithm.
Given that WFAs can expand exponentially in size when determinized, we expl
ore why those that occur in ASR tend to shrink when determinized. According
to ASR folklore, this phenomenon is attributable solely to the fact that A
SR WFAs have simple topology, in particular, that they are acyclic and laye
red. We introduce a very simple class of WFAs with this structure, but we s
how that the expansion under determinization depends on the transition weig
hts: some weightings cause them to shrink, while others, including random w
eightings, cause them to expand exponentially. We provide experimental evid
ence that ASR WFAs exhibit this weight dependence. That they shrink when de
terminized, therefore, is a result of favorable weightings in addition to s
pecial topology. These analyses and observations have been used to design a
new, approximate WFA determinization algorithm, reported in a separate pap
er along with experimental results showing that it achieves significant WA
size reduction with negligible impact on ASR performance.