The large amount and ubiquitous availability of multimedia information
(e.g., video, audio, images, and text documents) require efficient,
effective, and automatic annotation and retrieval methods. As videos play
an increasingly important role in multimedia, content-based retrieval of
videos becomes an issue, especially since there should be an integrated
methodology for all types of multimedia documents.
Our approach to the integrated retrieval of videos, images, and text
comprises three necessary steps: first, the detection and extraction of
shots from a video; second, the construction of a still image from the
frames in a shot, achieved by extracting key frames or by a mosaicing
technique; and third, the analysis of the resulting single-image
visualization of the shot by the ImageMiner(TM) system.
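
As a minimal sketch of the first two steps, the following Python fragment
splits a frame sequence into shots and picks one key frame per shot. It is
not the method used by the authors: the histogram-difference criterion, the
threshold of 0.5, the middle-frame heuristic, the frame format (a flat list
of 8-bit gray values), and all function names are assumptions made for
illustration only.

```python
from typing import List, Sequence

# Assumed frame format: a flat sequence of 8-bit grayscale pixel values.
Frame = Sequence[int]


def histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Return the normalized intensity histogram of one frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame) or 1
    return [c / total for c in counts]


def detect_shots(frames: Sequence[Frame], threshold: float = 0.5) -> List[range]:
    """Split a frame sequence into shots at large histogram changes."""
    if not frames:
        return []
    shots: List[range] = []
    start = 0
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        cur = histogram(frames[i])
        # The L1 distance between two normalized histograms lies in [0, 2].
        if sum(abs(a - b) for a, b in zip(prev, cur)) > threshold:
            shots.append(range(start, i))  # close the current shot
            start = i
        prev = cur
    shots.append(range(start, len(frames)))
    return shots


def key_frame_index(shot: range) -> int:
    """Pick the temporal middle of a shot as its key frame (simple heuristic)."""
    return shot[len(shot) // 2]


# Hypothetical toy video: three dark frames followed by two bright frames.
dark = [10] * 64
bright = [240] * 64
video = [dark, dark, dark, bright, bright]

shots = detect_shots(video)
key_frames = [key_frame_index(s) for s in shots]
print(shots, key_frames)
```

In this sketch each key frame stands in for the still image of its shot. A
mosaicing technique would instead combine all frames of the shot into one
image, which is not shown here.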
The ImageMiner system was developed in cooperation with IBM at the
University of Bremen, in the Image Processing Department of the Center for
Computing Technologies. It realizes the content-based retrieval of single
images through a novel combination of techniques and methods from computer
vision and
artificial intelligence. Its output is a textual description of an image
and thus, in our case, of the static elements of a video shot. In this way,
the annotations of a video can be indexed with standard text retrieval
systems, along with text documents or annotations of other multimedia
documents, thus ensuring an integrated interface for all kinds of
multimedia documents. (C) 1999 Elsevier Science Ltd. All rights reserved.
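
The indexing idea can be sketched with a small inverted index in Python.
This is only an illustration of how textual annotations of video shots and
images can share one index with ordinary text documents. It does not
reproduce ImageMiner output or any particular text retrieval system, and
the document identifiers and annotation texts below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_index(annotations: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lower-cased term to the ids of the documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in annotations.items():
        for term in text.lower().split():
            index[term.strip(".,;:")].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the sorted ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


# Hypothetical annotations: a video shot, a still image, and a text document.
annotations = {
    "video1/shot3": "sky above forest and lake",
    "image17": "forest with road",
    "report.txt": "survey of forest management",
}

index = build_index(annotations)
print(search(index, "forest"))
print(search(index, "forest lake"))
```

Because every media type is reduced to text before indexing, a single query
interface covers all of them.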