A fundamental task in video analysis is to extract structure from video to
facilitate user access (browsing and retrieval). Motivated by the important
role that the table of contents (ToC) plays in a book, we introduce the
concept of a ToC in the video domain. Some existing approaches implicitly
use a ToC but are mainly limited to low-level entities (e.g., shots and key
frames). Low-level structures have two drawbacks: (1) they contain too many
entries to be presented to the user efficiently, and (2) they do not
capture the underlying semantic structure of the video, on which the user
may wish to base browsing and retrieval. To address these limitations, we
present an effective semantic-level ToC construction technique based on
intelligent unsupervised clustering. It better models the time locality and
scene structure of video. Experiments on real-world movie videos validate
the effectiveness of the proposed approach. Examples demonstrate how the
scene-based ToC facilitates user access to the video.