This paper presents a knowledge representation framework that provides
an integrated approach to reasoning about vision and language in
spatial domains. Images are stored using a representation that
preserves information such as shape and distance; linguistic
information is represented in a descriptive, semantic network. We
incorporate a model-based formalism as an intermediate representation,
which can be used to transform visual to descriptive representations
and vice versa.