R. Collins and G. Steven, AN EXPLICITLY DECLARED DELAYED-BRANCH MECHANISM FOR A SUPERSCALAR ARCHITECTURE, Microprocessing and microprogramming, 40(10-12), 1994, pp. 677-680
One of the main obstacles to exploiting the fine-grained parallelism that is available in general-purpose code is the frequency of branches that cause unpredictable changes in the control flow of a program at run-time. Whenever a branch is taken, a performance penalty may be incurred as the processor waits for instructions to be fetched from the branch target stream. RISC processors introduce a delayed-branch mechanism which defines branch delay slots into which code can be scheduled. This strategy allows the processor to be kept busy executing useful instructions while the change of control flow takes place. While the concept of delayed branches can be readily extended to VLIW architectures, it is less clear how it should be incorporated in a superscalar architecture. This paper proposes a general branch-delay mechanism which is suitable for a range of code-compatible superscalar processors and which completely avoids the need to introduce NOPs into the code. This technique was developed as an integral part of the HSP superscalar project. HSP is a superscalar architecture currently being researched at the University of Hertfordshire with the aim of using compile-time instruction scheduling to achieve an order of magnitude speed-up over traditional RISC architectures for a suite of non-numeric benchmark programs.
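To make the delay-slot idea concrete, the following Python sketch models a classic RISC-style delayed branch: after an unconditional branch, the next `delay_slots` instructions in sequence still execute before control reaches the target. It is an illustration under stated assumptions, not the HSP mechanism proposed in the paper, whose details the abstract does not give. The toy instruction set (`add`, `br`, `halt`), the names `Instr`, `run` and `PROGRAM`, and the rule that no branch is placed inside a delay slot are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Instr:
    # Hypothetical toy ISA: "add" (dst = src + imm), "br" (unconditional), "halt".
    op: str
    dst: str = ""
    src: str = "r0"
    imm: int = 0
    target: int = 0  # branch target index, used only by "br"


def run(program: List[Instr], delay_slots: int = 1) -> Tuple[List[int], Dict[str, int]]:
    """Execute the program; return the executed instruction indices and final registers.

    Assumption: a branch never appears inside another branch's delay slots.
    """
    regs: Dict[str, int] = {}
    trace: List[int] = []
    pc = 0
    pending = None   # branch target waiting to take effect
    remaining = 0    # delay-slot instructions still to execute before the redirect
    while 0 <= pc < len(program):
        ins = program[pc]
        trace.append(pc)
        if ins.op == "halt":
            break
        if ins.op == "add":
            regs[ins.dst] = regs.get(ins.src, 0) + ins.imm
        next_pc = pc + 1
        if ins.op == "br":
            if delay_slots == 0:
                next_pc = ins.target  # no delay slots: ordinary jump
            else:
                pending, remaining = ins.target, delay_slots
        elif pending is not None:
            # This instruction occupied a delay slot; redirect after the last one.
            remaining -= 1
            if remaining == 0:
                next_pc, pending = pending, None
        pc = next_pc
    return trace, regs


PROGRAM = [
    Instr("add", dst="r1", imm=1),  # 0
    Instr("br", target=4),          # 1: branch to index 4
    Instr("add", dst="r2", imm=5),  # 2: delay slot, useful work instead of a NOP
    Instr("add", dst="r3", imm=9),  # 3: skipped by the taken branch
    Instr("halt"),                  # 4
]

if __name__ == "__main__":
    print(run(PROGRAM, delay_slots=1))
    print(run(PROGRAM, delay_slots=0))
```

With one delay slot, the trace is `[0, 1, 2, 4]`: the `add` at index 2 executes while the branch takes effect, which is the slot a compiler would otherwise fill with a NOP. With `delay_slots=0` the trace is `[0, 1, 4]` and the branch behaves like an ordinary jump.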