The words spoken in an audio-visual document form an obvious and intuitive metadata component. This component is essential to ensure comprehensive coverage of audio-visual content by the MPEG-7 standard. Since manual transcription is prohibitively costly, such metadata will typically be derived from automatic speech recognition systems. The errors inherent in the output of such extraction tools cause particular difficulties for robust retrieval, as well as for interoperability in heterogeneous databases. We describe a structure comprising a probabilistic combined word and phone lattice along with an explanatory metadata header, and detail how this structure avoids or ameliorates these problems.
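Purely as an illustration, the combined word and phone lattice with an explanatory header might be sketched as follows. All names, fields, and values here (the `Arc`/`SpokenContentLattice` classes, the header keys, the example labels and probabilities) are assumptions for exposition, not the structure defined in the paper or the MPEG-7 standard.

```python
from dataclasses import dataclass, field

@dataclass
class Arc:
    start: int    # source node id in the lattice
    end: int      # destination node id
    label: str    # word or phone hypothesis
    kind: str     # "word" or "phone"
    prob: float   # posterior probability assigned by the recognizer

@dataclass
class SpokenContentLattice:
    # Hypothetical explanatory header: records how the lattice was
    # produced, so heterogeneous databases can interpret it consistently.
    header: dict = field(default_factory=dict)
    arcs: list = field(default_factory=list)

    def hypotheses(self, kind):
        """Return all labels of a given kind ('word' or 'phone')."""
        return [a.label for a in self.arcs if a.kind == kind]

# Illustrative instance: word arcs for in-vocabulary terms, with
# parallel phone arcs that a retrieval system could fall back on
# when a query term is out of vocabulary or misrecognized.
lat = SpokenContentLattice(
    header={"recognizer": "hypothetical ASR v1",   # assumed field names
            "phone_set": "SAMPA",
            "word_lexicon_size": 20000},
    arcs=[
        Arc(0, 1, "spoken", "word", 0.62),
        Arc(0, 1, "s", "phone", 0.91),
        Arc(1, 2, "content", "word", 0.74),
    ],
)
print(lat.hypotheses("word"))
```

Keeping both hypothesis streams, each weighted by its posterior probability, is what lets a retrieval engine tolerate recognition errors: a term missed at the word level may still be matched against the phone arcs.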