VISUAL SPEECH RECOGNITION BY RECURRENT NEURAL NETWORKS

Authors

RABI G LU SW

Citation

G. Rabi and S.W. Lu, VISUAL SPEECH RECOGNITION BY RECURRENT NEURAL NETWORKS, Journal of electronic imaging, 7(1), 1998, pp. 61-69

Citations number

Subject Categories

Engineering, Electrical & Electronic; Optics; Photographic Technology

Journal title

Journal of electronic imaging

ISSN journal

1017-9909

Volume

7

Issue

1

Year of publication

1998

Pages

61 - 69

Database

ISI

SICI code

1017-9909(1998)7:1<61:VSRBRN>2.0.ZU;2-P

Abstract

One of the major drawbacks of current acoustically based speech recognizers is that their performance deteriorates drastically with noise. Our focus is to develop a computer system that performs speech recognition based on visual information concerning the speaker. The system automatically extracts visual speech features through image-processing techniques that operate on facial images taken in a normally illuminated environment. To cope with the dynamic nature of change in speech patterns with respect to time as well as the spatial variations in the individual patterns, the proposed recognition scheme uses a recurrent neural network architecture. By specifying a certain behavior when the network is presented with exemplar sequences, the recurrent network is trained with no more than feedforward complexity. The network's desired behavior is based on characterizing a given word by well-defined segments. Adaptive segmentation is employed to segment the training sequences of a given class. This technique iterates the execution of two steps. First, the sequences are segmented individually. Then, a generalized version of dynamic time warping is used to align the segments of all sequences. At each iteration, the weights of the distance functions used in the two steps are updated in a way that minimizes a segmentation error. The system is implemented and tested on a few words. The results are satisfactory. In particular, the system is able to distinguish between words with common segments. Moreover, it tolerates to a great extent variable-duration words of the same class. (C) 1998 SPIE and IS&T. [S1017-9909(98)00701-6].
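
The alignment step mentioned in the abstract builds on dynamic time warping. As a rough illustration only, the following is a minimal sketch of the classic (not the paper's generalized, weighted) dynamic time warping cost between two one-dimensional sequences. The function name `dtw_distance` and the absolute-difference distance are assumptions made for this example and do not come from the paper.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic dynamic time warping cost between sequences a and b.

    Illustrative sketch: the paper uses a generalized variant with
    trainable distance weights, which is NOT reproduced here.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments:
            # advance in a only, advance in b only, or advance in both.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


# Two sequences with the same shape but different durations align at zero cost.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
```

Because the warping path may repeat elements of either sequence, a stretched copy of a pattern incurs no extra cost, which is the property that makes this kind of alignment tolerant of variable-duration words.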