In this paper, we propose a multi-level abstraction mechanism for capturing
the spatial and temporal semantics associated with various objects in an
input image or in a sequence of video frames. This abstraction can manifest
itself effectively in conceptualizing events and views in multimedia data
as perceived by individual users. The objective is to provide an efficient
mechanism for handling content-based queries, with the minimum amount of
processing performed on raw data during query evaluation. We introduce a
multi-level architecture for video data management at different levels of
abstraction. The architecture facilitates a multi-level indexing/searching
mechanism. At the finest level of granularity, video data can be indexed
based on
the mere appearance of objects and faces. For management of information at
higher levels of abstraction, an object-oriented paradigm is proposed that
is capable of supporting domain-specific views.
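
As a purely illustrative sketch, the two levels described above might be
modeled as an appearance index at the finest granularity and a class
hierarchy of views built on top of it. Every name here (AppearanceIndex,
EventView, HandoverView, the labels, and the frame numbers) is a
hypothetical assumption introduced for illustration and is not taken from
the proposed system; the sketch also assumes that object and face labels
have already been extracted, so a query touches only the index and never
the raw video data.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AppearanceIndex:
    """Finest level (assumed design): label -> set of frame numbers."""
    frames_by_label: dict = field(default_factory=lambda: defaultdict(set))

    def add(self, frame_no, labels):
        # Record that each labelled object or face appears in this frame.
        for label in labels:
            self.frames_by_label[label].add(frame_no)

    def frames_with(self, *labels):
        """Return the frames in which all given labels appear together."""
        if not labels:
            return set()
        sets = [self.frames_by_label.get(label, set()) for label in labels]
        return set.intersection(*sets)


class EventView:
    """Higher level (assumed design): a view defined over the index."""
    name = "event"
    required_labels = ()

    def evaluate(self, index):
        # Answer the query from the index alone, without raw-data access.
        return sorted(index.frames_with(*self.required_labels))


class HandoverView(EventView):
    """A hypothetical domain-specific view: two people and a briefcase."""
    name = "handover"
    required_labels = ("person_A", "person_B", "briefcase")


index = AppearanceIndex()
index.add(1, ["person_A"])
index.add(2, ["person_A", "person_B", "briefcase"])
index.add(3, ["person_A", "person_B", "briefcase"])
index.add(4, ["person_B"])

print(HandoverView().evaluate(index))  # prints [2, 3]
```

In this sketch a new domain-specific view is added by subclassing EventView,
which is one possible reading of the object-oriented paradigm mentioned
above; temporal and spatial constraints between objects are deliberately
left out.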