This paper presents an efficient technique for summarizing stereoscopic
video sequences, which extracts a small but meaningful set of video frames
using a content-based sampling algorithm. The proposed video-content
representation enables browsing of digital stereoscopic video sequences
and supports more efficient content-based queries and indexing.
Each stereoscopic video sequence is first partitioned into shots by
applying a shot-cut detection algorithm so that frames (or stereo pairs) of
similar visual characteristics are gathered together. Each shot is then
analyzed using stereo-imaging techniques, and the disparity field, occluded
areas, and depth map are estimated. A multiresolution implementation of the
Recursive Shortest Spanning Tree (RSST) algorithm is applied for color and
depth segmentation, while fusion of color and depth segments is employed
for reliable video object extraction. In particular, color segments are
projected onto depth segments so that video objects on the same depth plane are
retained, while at the same time accurate object boundaries are extracted.
Feature vectors are then constructed using multidimensional fuzzy
classification of segment features including size, location, color, and depth. Shot
selection is accomplished by clustering similar shots based on the
generalized Lloyd-Max algorithm, while for a given shot, key frames are
extracted using an optimization method for locating frames of minimally
correlated feature vectors. For efficient implementation of the latter
method, a genetic
algorithm is used. Experimental results are presented, which indicate the
reliable performance of the proposed scheme on real-life stereoscopic video
sequences.
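
The key-frame step described above (choosing frames whose feature vectors are minimally correlated, with a genetic algorithm doing the search) can be illustrated with a minimal sketch. The abstract does not specify the correlation measure, the chromosome encoding, or the genetic operators, so everything below is an assumption made for illustration: the Pearson correlation, the mean squared pairwise correlation used as the cost, the elitist selection, the union-based crossover, and the hypothetical names `correlation`, `cost`, and `select_key_frames`.

```python
import math
import random
from itertools import combinations


def correlation(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0.0 or sv == 0.0:
        return 0.0  # assumption: a constant vector counts as uncorrelated
    return cov / (su * sv)


def cost(frames, features):
    """Mean squared pairwise correlation among the chosen frames (assumed cost)."""
    pairs = list(combinations(frames, 2))
    total = sum(correlation(features[i], features[j]) ** 2 for i, j in pairs)
    return total / len(pairs)


def select_key_frames(features, k, pop_size=30, generations=60,
                      mutation_rate=0.2, seed=0):
    """Search for k frame indices with low mutual correlation.

    Requires 2 <= k <= len(features) and pop_size >= 4.
    Each chromosome is a sorted tuple of k distinct frame indices.
    """
    rng = random.Random(seed)
    n = len(features)
    population = [tuple(sorted(rng.sample(range(n), k)))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Elitist selection: the better half survives unchanged.
        population.sort(key=lambda c: cost(c, features))
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: draw k indices from the union of both parents.
            pool = sorted(set(a) | set(b))
            child = set(rng.sample(pool, k))
            # Mutation: swap one chosen index for one not currently chosen.
            if rng.random() < mutation_rate:
                child.discard(rng.choice(sorted(child)))
                child.add(rng.choice([i for i in range(n) if i not in child]))
            children.append(tuple(sorted(child)))
        population = parents + children
    return min(population, key=lambda c: cost(c, features))
```

Calling `select_key_frames(features, k=3)` on a list of per-frame feature vectors returns a sorted tuple of three frame indices. Because the search is stochastic, the result is a good candidate rather than a guaranteed optimum; fixing `seed` makes a run reproducible.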