Robust information extraction from automatically generated speech transcriptions

Citation
D.D. Palmer et al., Robust information extraction from automatically generated speech transcriptions, SPEECH COMM, 32(1-2), 2000, pp. 95-109
Number of citations
32
Subject Categories
Computer Science & Engineering
Journal title
SPEECH COMMUNICATION
ISSN journal
0167-6393
Volume
32
Issue
1-2
Year of publication
2000
Pages
95 - 109
Database
ISI
SICI code
0167-6393(200009)32:1-2<95:RIEFAG>2.0.ZU;2-1
Abstract
This paper describes a robust system for information extraction (IE) from spoken language data. The system extends previous hidden Markov model (HMM) work in IE, using a state topology designed for explicit modeling of variable-length phrases and class-based statistical language model smoothing to produce state-of-the-art performance for a wide range of speech error rates. Experiments on broadcast news data show that the system performs well with temporal and source differences in the data. In addition, strategies for integrating word-level confidence estimates into the model are introduced, showing improved performance by using a generic error token for incorrectly recognized words in the training data and low confidence words in the test data. (C) 2000 Elsevier Science B.V. All rights reserved.
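Illustrative note
The generic error token idea mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the token string `<ERR>`, the function names, the 0.5 threshold, and the assumption that training hypotheses are already word-aligned with their references are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# Hypothetical symbol standing in for the "generic error token".
ERROR_TOKEN = "<ERR>"


def mask_misrecognized(aligned: List[Tuple[str, str]]) -> List[str]:
    """Training side: keep a recognized word only if it matches the reference.

    `aligned` holds (hypothesis_word, reference_word) pairs; the alignment
    itself is assumed to have been computed elsewhere.
    """
    return [hyp if hyp == ref else ERROR_TOKEN for hyp, ref in aligned]


def mask_low_confidence(
    words: List[Tuple[str, float]], threshold: float = 0.5
) -> List[str]:
    """Test side: replace words whose confidence falls below the threshold.

    `words` holds (word, confidence) pairs; the threshold value is an
    assumption chosen only for illustration.
    """
    return [w if conf >= threshold else ERROR_TOKEN for w, conf in words]


if __name__ == "__main__":
    train = [("bill", "bill"), ("clint", "clinton"), ("said", "said")]
    test = [("bill", 0.93), ("clinton", 0.21), ("said", 0.88)]
    print(mask_misrecognized(train))
    print(mask_low_confidence(test))
```

Mapping both recognition errors in training and low-confidence words at test time to the same symbol lets a tagger see a consistent placeholder in both conditions instead of an unreliable word identity.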