We present a method to automatically localize captions in JPEG compressed
images and the I-frames of MPEG compressed videos. Caption text regions are
segmented from background images using their distinguishing texture
characteristics. Unlike previously published methods, which fully decompress
the video sequence before extracting the text regions, this method locates
candidate caption text regions directly in the DCT compressed domain using
the intensity variation information encoded in the DCT domain. Therefore,
only a very small amount of decoding is required. The proposed algorithm
takes about 0.006 seconds to process a 240 x 350 image and achieves a recall
rate of 99.17 percent while falsely accepting about 1.87 percent of nontext
DCT blocks on a variety of MPEG compressed videos containing more than 2,300
I-frames.
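
The idea of flagging candidate text blocks from DCT-domain intensity variation can be illustrated with a minimal sketch. It is an assumption-laden illustration, not the feature, threshold, or post-processing used in this work: the texture measure (sum of absolute AC coefficients per 8 x 8 block), the threshold value, and all function names are hypothetical. A real decoder would read the coefficients from the JPEG or MPEG bitstream; here they are recomputed from pixels with NumPy only so that the example is self-contained.

```python
import numpy as np

BLOCK = 8  # JPEG and MPEG I-frames code luminance in 8 x 8 DCT blocks


def dct_matrix(n=BLOCK):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)   # frequency index
    x = np.arange(n).reshape(1, -1)   # sample index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)        # DC row uses a different scale factor
    return m


def block_dct(image):
    """Per-block 2-D DCT, shape (rows, cols, 8, 8).

    Stand-in for coefficients that would normally come straight from
    the compressed stream (assumption made for this sketch only).
    """
    h, w = image.shape
    h -= h % BLOCK                    # drop partial blocks at the borders
    w -= w % BLOCK
    blocks = (image[:h, :w].astype(float)
              .reshape(h // BLOCK, BLOCK, w // BLOCK, BLOCK)
              .swapaxes(1, 2))
    d = dct_matrix()
    return d @ blocks @ d.T


def texture_energy(coeffs):
    """Hypothetical texture measure: sum of |AC coefficients| per block."""
    total = np.abs(coeffs).sum(axis=(2, 3))
    return total - np.abs(coeffs[:, :, 0, 0])  # exclude the DC term


def candidate_text_blocks(coeffs, threshold):
    """Boolean mask of blocks whose texture energy exceeds the threshold."""
    return texture_energy(coeffs) > threshold


def demo_image():
    """Flat gray 32 x 32 image with a high-contrast striped patch."""
    img = np.full((32, 32), 128, dtype=np.uint8)
    img[8:16, 8:24:2] = 0     # dark stripes in block row 1, block cols 1-2
    img[8:16, 9:24:2] = 255   # bright stripes in the same patch
    return img
```

Calling `candidate_text_blocks(block_dct(demo_image()), threshold=100.0)` yields a 4 x 4 block mask in which only the two striped blocks are marked. Working on block-level coefficients like this is what lets a compressed-domain approach avoid full decompression.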