A new technique for reducing load latency is presented. This
technique, named tunneling-load, uses the register specifier buffer to
reduce load latency without speculatively accessing the data cache,
and thus avoids the drawbacks of load address prediction techniques.
As a consequence of the trend toward higher clock frequencies, the
internal cache can no longer bridge the speed gap between the
processor and external memory, and data cache latency degrades
processor performance. To hide this latency, several techniques that
predict the load address have been proposed.
These techniques access the data cache speculatively, which causes an
explosion in memory traffic and pollutes the data cache. The
tunneling-load technique solves these problems. We have evaluated
the effects of tunneling-load and found that, on an in-order-issue
superscalar platform, it increases instruction-level parallelism by
approximately 10%.