This paper contrasts two ways of forming conceptual descriptions from
images. The first, called "monitoring", just follows the flow of data from
images to interpretation, having little need for top-level control. The second,
called "watching", emphasizes top-level control and actively selects
evidence for task-based descriptions of dynamic scenes. We examine the effect
that this difference in control has on forming conceptual descriptions.
First, we look at how motion verbs and the perception of events contribute to
an effective representational scheme. We then discuss illustrated examples of
computing conceptual descriptions from images in our implementations of the
monitoring and watching systems. Finally, we discuss future plans and related
work. (C) 2000 Elsevier Science B.V. All rights reserved.