This paper describes a multimodal approach to speaker verification. The system consists of two classifiers, one using visual features and the other using acoustic features. A lip tracker extracts visual information from the speaking face, providing both shape and intensity features. We describe an approach for normalizing and mapping the different modalities onto a common confidence interval, and a novel method for integrating the scores of multiple classifiers. Verification experiments are reported for the individual modalities and for the combined classifier. The integrated system outperformed each sub-system and reduced the false acceptance rate of the acoustic sub-system from 2.3% to 0.5%. (C) 1997 Elsevier Science B.V.
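The abstract does not specify how scores are normalized onto a common confidence interval or how the two classifiers are integrated. The sketch below is a minimal illustration only, assuming a min-max mapping of each sub-system's raw scores onto [0, 1] and a weighted-sum fusion; the function names, the fusion weight, and the decision threshold are placeholders, not values or methods from the paper.

```python
import numpy as np

def normalize_scores(raw_scores, low=None, high=None):
    """Map raw classifier scores onto a common [0, 1] confidence interval.

    Assumes a simple min-max mapping (an illustrative choice, not the
    paper's method). In practice `low` and `high` would be fixed from
    training data so test scores are mapped consistently.
    """
    raw = np.asarray(raw_scores, dtype=float)
    low = raw.min() if low is None else low
    high = raw.max() if high is None else high
    return np.clip((raw - low) / (high - low), 0.0, 1.0)

def fuse_confidences(acoustic_conf, visual_conf, w_acoustic=0.7):
    """Combine two normalized confidences with a weighted sum.

    The weight 0.7 is an arbitrary placeholder; it would normally be
    tuned on held-out data.
    """
    return w_acoustic * acoustic_conf + (1.0 - w_acoustic) * visual_conf

def verify(acoustic_conf, visual_conf, threshold=0.5):
    """Accept the claimed identity if the fused confidence exceeds a threshold."""
    return fuse_confidences(acoustic_conf, visual_conf) >= threshold

# Example: one claimed identity scored by both sub-systems.
acoustic = normalize_scores([2.1, 5.4, 3.8], low=0.0, high=6.0)
visual = normalize_scores([0.4, 0.9, 0.7], low=0.0, high=1.0)
print(verify(acoustic, visual))
```

A weighted sum is only one of several common fusion rules (product, max, trained combiner); the paper's own integration method should be consulted for the approach actually evaluated.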