We have extended a single-issue pipelined implementation of SPARC with
mechanisms to support non-blocking load instructions and analyzed it
with respect to speed and complexity. We present the functionality of
the non-blocking load scheme as well as a detailed implementation
analysis of it. We find that it is possible to implement the non-blocking
load mechanisms without significantly complicating the pipeline design
and with no increase in the processor cycle time. This is mainly because
the non-blocking load mechanisms can work in parallel with the ALU,
the register file, and the cache memories, the datapath components that
often establish the critical path in a pipelined processor.