With the popularity of multimedia acceleration instructions such as MMX,
MPEG decompression is increasingly executed on general purpose processors
instead of dedicated MPEG hardware. The gap between processor speed and
memory access means that a significant amount of time is spent in the
memory system. As processors get faster, both in terms of higher clock
speeds and increased instruction level parallelism, the time spent in the
memory system becomes even more significant.
Data prefetching is a well-known technique for improving cache performance.
While several studies have examined prefetch strategies for scientific and
commercial applications, this paper focuses on video applications. Data is
presented for three hardware prefetching schemes (the stream buffer, the
stride prediction table (SPT), and the stream cache) and for a new
software-directed prefetching technique based on emulation of the hardware
SPT. Up to 90% of the misses that would otherwise occur with no prefetching
are eliminated. The stream cache can cut execution time by more than half
with a relatively small amount of additional hardware. Software prefetching
achieves nearly equal performance with minimal additional hardware. The
techniques presented in this paper can be used to improve performance in a
general-purpose CPU or an embedded MPEG processor. The performance gains
achieved for MPEG benchmarks apply equally well to similar multimedia
applications.
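
To make the stride prediction idea concrete, the following is a minimal sketch of one common formulation of an SPT lookup, not the specific design evaluated in this paper. The table size, the direct-mapped indexing by load address, the two-matching-strides rule, and all names (`spt_entry`, `spt_table`, `spt_access`) are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

#define SPT_ENTRIES 64 /* assumed table size; not taken from the paper */

/* One table entry per (hashed) load instruction. Hypothetical layout. */
typedef struct {
    uintptr_t pc;        /* tag: address of the load instruction */
    uintptr_t last_addr; /* data address of its previous access */
    intptr_t  stride;    /* difference between its last two addresses */
    bool      valid;
} spt_entry;

typedef struct {
    spt_entry e[SPT_ENTRIES];
} spt_table;

/*
 * Record one load (pc, addr). Returns true and writes *prefetch_addr when
 * the newly observed stride is nonzero and equals the stored stride, i.e.
 * the same stride has been seen twice in a row for this load.
 */
bool spt_access(spt_table *t, uintptr_t pc, uintptr_t addr,
                uintptr_t *prefetch_addr)
{
    spt_entry *e = &t->e[pc % SPT_ENTRIES]; /* assumed direct-mapped index */
    bool predict = false;

    if (e->valid && e->pc == pc) {
        intptr_t stride = (intptr_t)(addr - e->last_addr);
        if (stride != 0 && stride == e->stride) {
            *prefetch_addr = addr + (uintptr_t)stride;
            predict = true;
        }
        e->stride = stride;
    } else {
        /* Miss in the table: allocate the entry with no known stride. */
        e->pc = pc;
        e->stride = 0;
        e->valid = true;
    }
    e->last_addr = addr;
    return predict;
}
```

In a hardware scheme the predicted address would be handed to the cache as a prefetch request. A software emulation could instead pass it to a compiler prefetch hint such as GCC's `__builtin_prefetch`; whether the technique in this paper is implemented that way is not stated in the abstract.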