This article reports results from a program that produces high-quality animation of facial expressions and head movements, as automatically as possible, in conjunction with meaning-based speech synthesis, including spoken intonation. The goal of the research is as much to test and define our theories of the formal semantics for such gestures as to produce convincing animation. Toward this end, we have produced a high-level programming language for three-dimensional (3-D) animation of facial expressions. We have been concerned primarily with expressions conveying information correlated with the intonation of the voice: these include the differences of timing, pitch, and emphasis that are related to such semantic distinctions of discourse as "focus," "topic" and "comment," "theme" and "rheme," or "given" and "new" information. We are also interested in the relation of affect or emotion to facial expression. Until now, systems have not embodied such rule-governed translation from spoken utterance meaning to facial expressions. Our system embodies rules that describe and coordinate these relations: intonation/information, intonation/affect, and facial expressions/affect. A meaning representation includes discourse information: what is contrastive/background information in the given context, and what is the "topic" or "theme" of the discourse? The system maps the meaning representation into how accents and their placement are chosen, how they are conveyed over facial expression, and how speech and facial expressions are coordinated. This determines a sequence of functional groups: lip shapes, conversational signals, punctuators, regulators, and manipulators. Our algorithms then impose synchrony, create coarticulation effects, and determine affectual signals, eye and head movements. The lowest-level representation is the Facial Action Coding System (FACS), which makes the generation system portable to other facial models.
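The rule-governed mapping described above, from discourse information structure to pitch accents and facial action units, can be sketched roughly as follows. This is an illustrative simplification, not the paper's implementation: the rule table, the accent labels, and the choice of FACS action units (AU1: inner brow raiser, AU2: outer brow raiser, AU4: brow lowerer) are assumptions for demonstration only.

```python
# Hypothetical sketch: map each word's discourse role (theme/rheme) and
# information status (given/new/contrastive) to a pitch accent and a set
# of FACS action units acting as conversational signals.
ACCENT_RULES = {
    # (role, status) -> (pitch accent, facial action units)
    ("rheme", "new"): ("H*", ["AU1", "AU2"]),            # accent + brow raise
    ("rheme", "contrastive"): ("L+H*", ["AU1", "AU2", "AU4"]),
    ("theme", "given"): (None, []),                       # deaccented, neutral
}

def annotate(words):
    """Attach an accent and action units to each (word, role, status) triple."""
    out = []
    for word, role, status in words:
        accent, aus = ACCENT_RULES.get((role, status), (None, []))
        out.append({"word": word, "accent": accent, "aus": aus})
    return out

utterance = [
    ("Anna", "theme", "given"),
    ("married", "rheme", "new"),
]
for token in annotate(utterance):
    print(token["word"], token["accent"], token["aus"])
```

In a full system of the kind the abstract describes, a layer like this would feed the later stages that impose synchrony and coarticulation before the FACS-level output drives the face model.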