Multimodal interfaces that combine natural language and graphics exploit both the individual strengths of each communication mode and the fact that several modes can be employed in parallel. The central claim of this paper is that the generation of a multimodal presentation can be considered an incremental planning process that aims to achieve a given communicative goal. We describe the multimodal presentation system WIP, which allows the generation of alternative presentations of the same content, taking various contextual factors into account. We discuss how the plan-based approach to presentation design can be exploited so that graphics generation influences the production of text and vice versa. We show that well-known concepts from the area of natural language processing, such as speech acts, anaphora, and rhetorical relations, take on an extended meaning in the context of multimodal communication. Finally, we discuss two detailed examples that illustrate and reinforce our theoretical claims.