Spoken document representations for probabilistic retrieval

Citation
P. Jourlin et al., Spoken document representations for probabilistic retrieval, SPEECH COMM, 32(1-2), 2000, pp. 21-36
Number of citations
27
Subject categories
Computer Science & Engineering
Journal title
SPEECH COMMUNICATION
Journal ISSN
0167-6393
Volume
32
Issue
1-2
Year of publication
2000
Pages
21 - 36
Database
ISI
SICI code
0167-6393(200009)32:1-2<21:SDRFPR>2.0.ZU;2-G
Abstract
This paper presents some developments in query expansion and document representation of our spoken document retrieval system and shows how various retrieval techniques affect performance for different sets of transcriptions derived from a common speech source. Modifications of the document representation are used, which combine several techniques for query expansion, knowledge-based on one hand and statistics-based on the other. Taken together, these techniques can improve Average Precision by over 19% relative to a system similar to that which we presented at TREC-7. These new experiments have also confirmed that the degradation of Average Precision due to a word error rate (WER) of 25% is quite small (3.7% relative) and can be reduced to almost zero (0.2% relative). The overall improvement of the retrieval system can also be observed for seven different sets of transcriptions from different recognition engines with a WER ranging from 24.8% to 61.5%. We hope to repeat these experiments when larger document collections become available, in order to evaluate the scalability of these techniques. (C) 2000 Elsevier Science B.V. All rights reserved.
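Note on the measures
The abstract reports results as relative changes in Average Precision and characterises transcriptions by their word error rate. The sketch below illustrates how such quantities are commonly computed. It is not taken from the paper: the function names are hypothetical, the average precision is assumed to be the usual non-interpolated form for a single query, WER is assumed to be word-level edit distance divided by reference length, and the inputs in the usage lines are toy values, not the paper's data.

```python
def average_precision(ranked_ids, relevant_ids):
    """Non-interpolated average precision for one query (assumed definition)."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at the rank of each relevant document retrieved.
            precision_sum += hits / rank
    # Relevant documents never retrieved contribute zero.
    return precision_sum / len(relevant)


def relative_change(baseline, value):
    """Percentage change of `value` relative to `baseline`."""
    return 100.0 * (value - baseline) / baseline


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)


# Toy usage (illustrative values only).
ap = average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
wer = word_error_rate("the cat sat on the mat".split(),
                      "the cat sat on mat".split())
change = relative_change(0.5, 0.6)
```

A statement such as "3.7% relative" corresponds to `relative_change` applied to the Average Precision obtained on reference transcriptions versus recognised transcriptions, usually after averaging over all queries.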