ROBUST SENSOR FUSION - ANALYSIS AND APPLICATION TO AUDIO-VISUAL SPEECH RECOGNITION

Citation
J.R. Movellan and P. Mineiro, ROBUST SENSOR FUSION - ANALYSIS AND APPLICATION TO AUDIO-VISUAL SPEECH RECOGNITION, Machine Learning, 32(2), 1998, pp. 85-100
Number of citations
32
Subject categories
Computer Science, Artificial Intelligence
Journal title
Machine Learning
Journal ISSN
0885-6125
Volume
32
Issue
2
Year of publication
1998
Pages
85 - 100
Database
ISI
SICI code
0885-6125(1998)32:2<85:RSF-AA>2.0.ZU;2-5
Abstract
This paper analyzes the issue of catastrophic fusion, a problem that occurs in multimodal recognition systems that integrate the output from several modules while working in non-stationary environments. For concreteness we frame the analysis with regard to the problem of automatic audio-visual speech recognition (AVSR), but the issues at hand are very general and arise in multimodal recognition systems which need to work in a wide variety of contexts. Catastrophic fusion is said to have occurred when the performance of a multimodal system is inferior to the performance of some isolated modules, e.g., when the performance of the audio-visual speech recognition system is inferior to that of the audio system alone. Catastrophic fusion arises because recognition modules make implicit assumptions and thus operate correctly only within a certain context. Practice shows that when modules are tested in contexts inconsistent with their assumptions, their influence on the fused product tends to increase, with catastrophic results. We propose a principled solution to this problem based upon Bayesian ideas of competitive models and inference robustification. We study the approach analytically on a classic Gaussian discrimination task and then apply it to a realistic problem on audio-visual speech recognition (AVSR) with excellent results.
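
Illustration
The effect described in the abstract can be reproduced on a toy two-class Gaussian discrimination task. The sketch below is not taken from the paper: the class means (+1 and -1), the assumed unit noise level, the corrupted audio noise level of 5, and the names `llr` and `simulate` are all assumptions chosen for illustration. Each module scores an observation with a log-likelihood ratio under its assumed noise level, and fusion adds the two scores. When the audio channel is noisier than its module assumes, its scores grow in magnitude, so the corrupted module dominates the sum and the fused system falls below the video module alone. The "matched" variant uses the true audio noise level as an oracle reference; it is not the robustification procedure proposed by the authors.

```python
import random


def llr(x, mu=1.0, sigma=1.0):
    """Log-likelihood ratio log p(x|c=+1) / p(x|c=-1) for
    class-conditional densities N(+mu, sigma^2) and N(-mu, sigma^2)."""
    return 2.0 * mu * x / sigma ** 2


def simulate(n=20000, audio_true_sigma=5.0, seed=0):
    """Estimate accuracy of video alone and of two fusion variants.

    Both modules ASSUME unit noise. Video really has unit noise;
    audio really has noise `audio_true_sigma` (a hypothetical value).
    """
    rng = random.Random(seed)
    correct = {"video": 0, "fused_mismatched": 0, "fused_matched": 0}
    for _ in range(n):
        c = rng.choice((-1, 1))               # true class label
        x_v = rng.gauss(c, 1.0)               # video observation
        x_a = rng.gauss(c, audio_true_sigma)  # corrupted audio observation
        scores = {
            "video": llr(x_v),
            # Audio module keeps its wrong unit-noise assumption.
            "fused_mismatched": llr(x_a) + llr(x_v),
            # Oracle reference: audio module is told the true noise level.
            "fused_matched": llr(x_a, sigma=audio_true_sigma) + llr(x_v),
        }
        for name, score in scores.items():
            decision = 1 if score > 0 else -1
            if decision == c:
                correct[name] += 1
    return {name: hits / n for name, hits in correct.items()}


acc = simulate()
for name, value in acc.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the mismatched fusion should score well below the video module alone, while the oracle-matched fusion should recover to roughly the video-only level, since a correctly scaled score from a very noisy channel contributes little evidence.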