
ROBUST SENSOR FUSION - ANALYSIS AND APPLICATION TO AUDIO-VISUAL SPEECH RECOGNITION

Authors

MOVELLAN JR; MINEIRO P

Citation

J.R. Movellan and P. Mineiro, ROBUST SENSOR FUSION - ANALYSIS AND APPLICATION TO AUDIO-VISUAL SPEECH RECOGNITION, Machine learning, 32(2), 1998, pp. 85-100

Subject Categories

Computer Science, Artificial Intelligence

Journal title

Machine learning

ISSN journal

0885-6125

Volume

32

Issue

2

Year of publication

1998

Pages

85 - 100

Database

ISI

SICI code

0885-6125(1998)32:2<85:RSF-AA>2.0.ZU;2-5

Abstract

This paper analyzes the issue of catastrophic fusion, a problem that occurs in multimodal recognition systems that integrate the output from several modules while working in non-stationary environments. For concreteness we frame the analysis with regard to the problem of automatic audio visual speech recognition (AVSR), but the issues at hand are very general and arise in multimodal recognition systems which need to work in a wide variety of contexts. Catastrophic fusion is said to have occurred when the performance of a multimodal system is inferior to the performance of some isolated modules, e.g., when the performance of the audio visual speech recognition system is inferior to that of the audio system alone. Catastrophic fusion arises because recognition modules make implicit assumptions and thus operate correctly only within a certain context. Practice shows that when modules are tested in contexts inconsistent with their assumptions, their influence on the fused product tends to increase, with catastrophic results. We propose a principled solution to this problem based upon Bayesian ideas of competitive models and inference robustification. We study the approach analytically on a classic Gaussian discrimination task and then apply it to a realistic problem on audio visual speech recognition (AVSR) with excellent results.