Experiments in spoken document retrieval using phoneme n-grams

Citation
C. Ng et al., Experiments in spoken document retrieval using phoneme n-grams, SPEECH COMM, 32(1-2), 2000, pp. 61-77
Number of citations
28
Subject categories
Computer Science & Engineering
Journal title
SPEECH COMMUNICATION
ISSN journal
0167-6393
Volume
32
Issue
1-2
Year of publication
2000
Pages
61 - 77
Database
ISI
SICI code
0167-6393(200009)32:1-2<61:EISDRU>2.0.ZU;2-Y
Abstract
In spoken document retrieval (SDR), speech recognition is applied to a collection to obtain either words or subword units, such as phonemes, that can be matched against queries. We have explored retrieval based on phoneme n-grams. The use of phonemes addresses the out-of-vocabulary (OOV) problem, while use of n-grams allows approximate matching on inaccurate phoneme transcriptions. Our experiments explored the utility of word boundary information, stopword elimination, query expansion, varying the length of phoneme sequences to be matched and various combinations of n-grams of different lengths. Given word-based recognition (WBR), we can match queries to speech using a phoneme representation of the words, permitting us to test whether it was the recognition or the matching process that was most crucial to retrieval performance. Our experiments show that there is some deterioration in effectiveness, but the particular form of matching is less vital if the sequence of phonemes was correct. When phone sequences are recognised directly, with higher error rates than for words, it was more important to select a good matching approach. Varying gram length trades precision against recall; combination of n-grams of different lengths, in particular 3-grams and 4-grams, can improve retrieval. Overall, phoneme-based retrieval is not as effective as word-based retrieval, but is sufficient for situations in which word-based retrieval is either impractical or undesirable. (C) 2000 Elsevier Science B.V. All rights reserved.
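Illustration
The abstract's central idea, approximate matching of a query against an error-prone phoneme transcription by counting shared n-grams, can be illustrated with a minimal sketch. This is not the authors' system: the phoneme symbols, the function names (phoneme_ngrams, ngram_profile, overlap_score) and the plain overlap count used as a score are all assumptions made for illustration, and the paper's actual ranking function is not stated in this record.

```python
from collections import Counter


def phoneme_ngrams(phonemes, n):
    """Return all overlapping n-grams (as tuples) of a phoneme sequence."""
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


def ngram_profile(phonemes, lengths=(3, 4)):
    """Count the n-grams of every requested length in a single bag."""
    profile = Counter()
    for n in lengths:
        profile.update(phoneme_ngrams(phonemes, n))
    return profile


def overlap_score(query, document, lengths=(3, 4)):
    """Hypothetical score: number of n-gram occurrences shared by both."""
    shared = ngram_profile(query, lengths) & ngram_profile(document, lengths)
    return sum(shared.values())


# Assumed phoneme symbols for the query word "retrieval".
query = ["r", "ih", "t", "r", "iy", "v", "ax", "l"]
# Assumed recogniser output containing one substitution error (v -> b).
document = ["dh", "ax", "r", "ih", "t", "r", "iy", "b", "ax", "l", "s"]

print(overlap_score(query, document, lengths=(3,)))    # 3-grams only
print(overlap_score(query, document, lengths=(4,)))    # 4-grams only
print(overlap_score(query, document, lengths=(3, 4)))  # combined
```

Despite the substitution error, the query still shares several n-grams with the transcription, which is the sense in which n-grams permit approximate matching. Longer grams match less often but more specifically, which is one way to read the abstract's remark that gram length trades precision against recall.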