A. Corazza et al., OPTIMAL PROBABILISTIC EVALUATION FUNCTIONS FOR SEARCH CONTROLLED BY STOCHASTIC CONTEXT-FREE GRAMMARS, IEEE transactions on pattern analysis and machine intelligence, 16(10), 1994, pp. 1018-1027
Recently, the possibility of using Stochastic Context-Free Grammars (S
CFG's) in Language Modeling (LM) has been considered. When these gramm
ars are used, search can be directed by evaluation functions based on
the probabilities that a SCFG generates a sentence, given only some wo
rds in it. Expressions for computing the evaluation function have been
proposed by Jelinek and Lafferty for the recognition of word sequence
s in the case in which only the prefix of a sequence is known. Corazza
et al. have proposed methods for probability computation in the more
general case in which partial word sequences interleaved by gaps are k
nown. This computation is too complex in practice unless the lengths o
f the gaps are known. This paper proposes a method for computing the p
robability of the best parse tree that can generate a sentence only pa
rt of which (consisting of islands and gaps) is known. This probabilit
y is the minimum possible, and thus the most informative, upper-bound
that can be used in the evaluation function. The computation of the pr
oposed upper-bound has cubic time complexity even if the lengths of th
e gaps are unknown. This makes possible the practical use of SCFG for
driving interpretations of sentences in natural language processing.