This work presents the acoustic- and visual-based tracking system in operation at the Harvard Intelligent Multi-Media Environments Laboratory (HIMMEL). The environment is populated with a number of microphones and steerable video cameras. Acoustic source localization, video-based face tracking and pose estimation, and multi-channel speech enhancement methods are applied in combination to detect and track individuals in a practical environment while also providing an improved audio signal to accompany the video stream. The video portion of the system tracks talkers using source motion, contour geometry, color data, and simple facial features. Decisions about which camera to use are based on an estimate of the head's gaze angle. This head pose estimate is obtained with a very general head model that employs hairline features and a learned network classification procedure. Finally, a beamforming and postfiltering microphone-array technique is used to create an enhanced speech waveform to accompany the recorded video signal. The system presented in this paper is robust to both visual clutter (e.g., ovals in the scene of interest that are not faces) and audible noise (e.g., reverberation and background noise).
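
To make the multi-channel enhancement step concrete, the sketch below shows a basic delay-and-sum beamformer steered toward an estimated source position. The array geometry, sampling rate, and function names are hypothetical, and the paper's actual beamformer and postfilter design are not reproduced here; this is only a minimal illustration of how steering delays align the microphone channels before averaging.

```python
# Minimal delay-and-sum beamforming sketch (illustrative only; the geometry,
# sampling rate, and source position below are assumptions, not the paper's).
import numpy as np

def delay_and_sum(signals, mic_positions, source_pos, fs, c=343.0):
    """Steer the array toward source_pos and average the aligned channels.

    signals:       (num_mics, num_samples) time-domain microphone channels
    mic_positions: (num_mics, 3) microphone coordinates in meters
    source_pos:    (3,) estimated source location in meters
    fs:            sampling rate in Hz
    c:             speed of sound in m/s
    """
    num_mics, num_samples = signals.shape
    # Propagation distance from the source to each microphone.
    dists = np.linalg.norm(mic_positions - source_pos, axis=1)
    # Relative delays (seconds), rounded to integer-sample steering shifts.
    delays = (dists - dists.min()) / c
    shifts = np.round(delays * fs).astype(int)

    aligned = np.zeros((num_mics, num_samples))
    for m in range(num_mics):
        # Advance each channel so the source component lines up across mics;
        # incoherent noise then partially cancels in the average below.
        aligned[m, :num_samples - shifts[m]] = signals[m, shifts[m]:]
    return aligned.mean(axis=0)

# Hypothetical usage: a 4-microphone array sampled at 16 kHz.
if __name__ == "__main__":
    fs = 16000
    mics = np.array([[0.0, 0, 0], [0.1, 0, 0], [0.2, 0, 0], [0.3, 0, 0]])
    source = np.array([1.0, 0.5, 0.0])
    x = np.random.randn(4, fs)          # stand-in for recorded channels
    y = delay_and_sum(x, mics, source, fs)
    print(y.shape)                       # (16000,) enhanced single channel
```

In practice a postfilter (e.g., a Wiener-style filter estimated from the aligned channels) would be applied to the beamformer output to further suppress reverberation and background noise, which is the role of the postfiltering stage mentioned in the abstract.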