
Spoken document representations for probabilistic retrieval

Authors

Jourlin, P; Johnson, SE; Jones, KS; Woodland, PC

Citation

P. Jourlin et al., Spoken document representations for probabilistic retrieval, SPEECH COMM, 32(1-2), 2000, pp. 21-36

Subject Categories

Computer Science & Engineering

Journal title

SPEECH COMMUNICATION

ISSN journal

0167-6393

Volume

32

Issue

1-2

Year of publication

2000

Pages

21 - 36

Database

ISI

SICI code

0167-6393(200009)32:1-2<21:SDRFPR>2.0.ZU;2-G

Abstract

This paper presents some developments in query expansion and document representation of our spoken document retrieval system and shows how various retrieval techniques affect performance for different sets of transcriptions derived from a common speech source. Modifications of the document representation are used, which combine several techniques for query expansion, knowledge-based on one hand and statistics-based on the other. Taken together, these techniques can improve Average Precision by over 19% relative to a system similar to that which we presented at TREC-7. These new experiments have also confirmed that the degradation of Average Precision due to a word error rate (WER) of 25% is quite small (3.7% relative) and can be reduced to almost zero (0.2% relative). The overall improvement of the retrieval system can also be observed for seven different sets of transcriptions from different recognition engines with a WER ranging from 24.8% to 61.5%. We hope to repeat these experiments when larger document collections become available, in order to evaluate the scalability of these techniques. (C) 2000 Elsevier Science B.V. All rights reserved.
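The abstract reports its results as Average Precision and as relative changes between systems. The sketch below shows how those two quantities are usually computed. It assumes the standard TREC non-interpolated definition of average precision, averaged over queries. The record does not say which variant the authors used. The function names and the example rankings and scores are hypothetical illustrations, not data from the paper.

```python
from typing import Sequence, Set, Tuple


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision for a single query (assumed TREC-style).

    Precision is measured at the rank of each relevant document retrieved,
    summed, and divided by the total number of relevant documents, so
    relevant documents that are never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(queries: Sequence[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of per-query average precision over (ranking, relevant_set) pairs."""
    if not queries:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


def relative_change(new: float, baseline: float) -> float:
    """Relative change of `new` with respect to `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical ranking and relevance judgements for one query.
    ranking = ["d3", "d1", "d7", "d2", "d9"]
    relevant = {"d1", "d2", "d5"}
    # Relevant hits at ranks 2 and 4: (1/2 + 2/4) / 3 relevant documents.
    print(f"AP = {average_precision(ranking, relevant):.4f}")
    # Hypothetical scores: an improved system vs. a baseline system.
    print(f"relative change = {relative_change(0.60, 0.50):+.1f}%")
```

Reading the abstract through these definitions, a "19% relative" improvement means the new score exceeds the baseline by 19% of the baseline's value. This is a different measure from a 19-point absolute difference in Average Precision.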