Y. Shintani et al., HIERARCHICAL EXECUTION TO SPEED-UP PIPELINE INTERLOCK IN MAINFRAME COMPUTERS, I.E.E.E. transactions on computers, 45(5), 1996, pp. 589-599
This paper introduces a methodology, called hierarchical execution, which reduces stalls caused by pipeline interlocks such as data and control dependencies. Since a lot of software has been accumulated in mainframe computer systems as object code, it is important to improve performance without having to recompile the code for optimization. Our methodology consists of a simple pre-ALU that generates results, with shorter latency than the main ALU, asynchronously, which reduces the overhead especially for address generation interlocks and branch instructions. This method was implemented in Hitachi's mainframe processors, M-680 and M-880. In M-680, the pre-ALU, together with the instruction decoder, processes instructions in superpipelined fashion, which further improves performance. The aggregate effect of hierarchical execution on CPU time, for evaluated benchmarks, is 10% on average, with only a 1.6% increase in hardware. Therefore, we can roughly say that the hierarchical execution method improved cost performance by 8%.
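
The closing cost-performance figure can be checked with a short calculation. The sketch below is only one plausible reading of the abstract, not the authors' stated method: it assumes the 10% is a gain in performance, that cost scales in proportion to the amount of hardware, and the function name `cost_performance_gain` is hypothetical.

```python
def cost_performance_gain(perf_gain: float, hw_increase: float) -> float:
    """Relative change in performance per unit of hardware cost.

    Assumption (not stated in the abstract): cost is proportional
    to the amount of hardware, so cost grows by hw_increase.
    """
    return (1.0 + perf_gain) / (1.0 + hw_increase) - 1.0


# Figures quoted in the abstract: 10% performance effect, 1.6% more hardware.
gain = cost_performance_gain(0.10, 0.016)
print(f"{gain:.1%}")  # prints 8.3%
```

Under these assumptions the result is about 8.3%, in line with the "roughly 8%" quoted above. Reading the 10% as a reduction in CPU time instead would give (1 / 0.9) / 1.016 - 1, or about 9.4%, so the first reading appears closer to the abstract's figure.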