A content-based video parsing and indexing method is presented in this paper, which analyzes both information sources (auditory and visual) and accounts for their inter-relations and synergy to extract high-level semantic information. Both frame- and object-based access to the visual information is employed. The aim of the method is to extract semantically meaningful video scenes and assign semantic label(s) to them. Due to the temporal nature of video, time must be accounted for; thus, time-constrained video representations and indices are generated. The current approach searches for specific types of content information relevant to the presence or absence of speakers or persons. Audio-source parsing and indexing leads to the extraction of a speaker-label mapping of the source over time. Video-source parsing and indexing results in the extraction of a talking-face shot mapping over time. Integration of the audio and visual mappings, constrained by interaction rules, leads to higher levels of video abstraction and even partial detection of its context.
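The integration step described above can be sketched as an intersection of the two time mappings: a segment receives a combined label when a speaker-label interval from the audio source temporally overlaps a talking-face shot from the video source. The function below is a minimal illustration of that idea, not the authors' implementation; the interval representation and the simple overlap rule are assumptions for illustration.

```python
# Illustrative sketch (not the authors' implementation): integrating a
# time-indexed speaker-label mapping (audio) with a talking-face shot
# mapping (visual) by temporal interval intersection.

def intersect_mappings(speaker_map, face_map):
    """Return segments where a labeled speaker co-occurs with a talking face.

    speaker_map: list of (start, end, speaker_label) from audio parsing
    face_map:    list of (start, end) talking-face shots from video parsing
    """
    segments = []
    for s_start, s_end, label in speaker_map:
        for f_start, f_end in face_map:
            start, end = max(s_start, f_start), min(s_end, f_end)
            if start < end:  # non-empty temporal overlap
                segments.append((start, end, label))
    return sorted(segments)

# Example: two consecutive speakers overlapping one talking-face shot
speaker_map = [(0.0, 5.0, "speaker_A"), (5.0, 9.0, "speaker_B")]
face_map = [(2.0, 7.0)]
print(intersect_mappings(speaker_map, face_map))
# → [(2.0, 5.0, 'speaker_A'), (5.0, 7.0, 'speaker_B')]
```

In a full system the interaction rules would be richer than pure overlap (e.g. tolerating small audio/visual misalignments), but the output has the same shape: a time-constrained index of labeled segments supporting higher-level abstraction.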