Recently, two approaches to indexing and retrieving videos have been investigated. One approach uses the visual features of individual objects, and the other exploits the spatio-temporal relationships between multiple objects. In this paper, we integrate both approaches into a new video model, called the Visual-Spatio-Temporal (VST) model, to represent videos. The visual features are modeled using a topological approach and integrated with the spatio-temporal relationships. As a result, we define rich sets of VST relationships that support and simplify the formulation of more semantic queries. We also present an intuitive query interface that lets users describe the VST features of video objects by sketching and by feature specification. Our experiments demonstrate the effectiveness of modeling and querying videos by the visual features of individual objects and the VST relationships between multiple objects.