Spoken content representation in MPEG-7

Citation
J.P.A. Charlesworth and P.N. Garner, Spoken content representation in MPEG-7, IEEE Trans. Circuits Syst. Video Technol., 11(6), 2001, pp. 730-736
Number of citations
14
Subject categories
Electrical & Electronics Engineering
Journal title
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
Journal ISSN
1051-8215
Volume
11
Issue
6
Year of publication
2001
Pages
730 - 736
Database
ISI
SICI code
1051-8215(200106)11:6<730:SCRIM>2.0.ZU;2-P
Abstract
The words spoken in an audio-visual document form an obvious and intuitive metadata component. This component is essential to ensure comprehensive coverage of audio-visual content by the MPEG-7 standard. With manual transcription prohibitively costly, such metadata will typically be derived from automatic speech recognition systems. The errors inherent in the output of such extraction tools cause particular difficulties for robust retrieval, as well as for interoperability in heterogeneous databases. We describe a structure comprising a probabilistic combined word and phone lattice along with an explanatory metadata header, and detail how this structure avoids or ameliorates these problems.
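The abstract does not give the lattice format itself. As a rough illustration only, the Python sketch below shows why keeping alternative recognition hypotheses in a probabilistic word and phone lattice can help retrieval: a query word is scored by its expected count over all lattice paths rather than matched against a single, possibly wrong, transcript, and a query the recogniser never emitted as a word (for example an out-of-vocabulary name) can still be found through phone links. Everything here is an assumption: the Link and Lattice names, the field layout, the rule that paths may mix word and phone links, and the use of forward-backward posteriors for scoring are hypothetical and are not taken from the MPEG-7 spoken content description or from the authors' paper. The explanatory metadata header is omitted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    start: int      # source node; nodes are numbered in time order
    end: int        # destination node; always greater than start
    label: str      # a word or a phone symbol
    is_phone: bool  # True for a phone link, False for a word link
    prob: float     # link probability (hypothetical, not taken from MPEG-7)


class Lattice:
    """Hypothetical combined word/phone lattice: a DAG from node 0 to `final`.

    Assumption: paths may mix word links and phone links, e.g. phones covering
    a stretch for which the recogniser proposed no confident word.
    """

    def __init__(self, links, final):
        self.links = list(links)
        self.final = final
        self.out = defaultdict(list)  # outgoing links per node
        for link in self.links:
            self.out[link.start].append(link)
        self.alpha = self._forward()
        self.beta = self._backward()
        self.total = self.alpha[final]  # probability mass of all complete paths

    def _forward(self):
        # alpha[n]: summed probability of all partial paths from node 0 to n.
        # Sorting by start is a valid topological order because end > start.
        alpha = defaultdict(float)
        alpha[0] = 1.0
        for link in sorted(self.links, key=lambda l: l.start):
            alpha[link.end] += alpha[link.start] * link.prob
        return alpha

    def _backward(self):
        # beta[n]: summed probability of all partial paths from n to `final`.
        beta = defaultdict(float)
        beta[self.final] = 1.0
        for link in sorted(self.links, key=lambda l: l.end, reverse=True):
            beta[link.start] += link.prob * beta[link.end]
        return beta

    def word_posterior(self, word):
        """Expected number of occurrences of `word` over all lattice paths."""
        mass = sum(
            self.alpha[l.start] * l.prob * self.beta[l.end]
            for l in self.links
            if not l.is_phone and l.label == word
        )
        return mass / self.total

    def phone_posterior(self, phones):
        """Expected occurrences of a phone sequence, e.g. an out-of-vocabulary query."""
        if not phones:
            raise ValueError("phone query must be non-empty")

        def match(node, i):
            # Mass of phone chains from `node` spelling phones[i:], times the
            # backward mass from the node where the chain ends.
            if i == len(phones):
                return self.beta[node]
            return sum(
                l.prob * match(l.end, i + 1)
                for l in self.out[node]
                if l.is_phone and l.label == phones[i]
            )

        starts = list(self.alpha)  # every node reached by the forward pass
        return sum(self.alpha[n] * match(n, 0) for n in starts) / self.total


# Toy lattice: "the cat" as a word path (0.6) competing with
# "the" followed by the phones k ae t (0.4).
lattice = Lattice(
    [
        Link(0, 1, "the", False, 1.0),
        Link(1, 4, "cat", False, 0.6),
        Link(1, 2, "k", True, 0.4),
        Link(2, 3, "ae", True, 1.0),
        Link(3, 4, "t", True, 1.0),
    ],
    final=4,
)
print(lattice.word_posterior("cat"))              # word-level match
print(lattice.phone_posterior(["k", "ae", "t"]))  # match through phone links
```

Scoring by expected count keeps lower-ranked hypotheses searchable: in the toy lattice the phone path k ae t is not on the single best path (the larger share of probability, 0.6, goes to the word "cat"), yet the phone query still receives a non-zero score.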