RESOURCE-DIRECTED LOOP PIPELINING - EXPOSING JUST ENOUGH PARALLELISM

Citation
S. Novack et A. Nicolau, RESOURCE-DIRECTED LOOP PIPELINING - EXPOSING JUST ENOUGH PARALLELISM, Computer journal, 40(6), 1997, pp. 311-321
Citations number
10
Journal title
ISSN journal
00104620
Volume
40
Issue
6
Year of publication
1997
Pages
311 - 321
Database
ISI
SICI code
0010-4620(1997)40:6<311:RLP-EJ>2.0.ZU;2-X
Abstract
Many techniques have been proposed for exploiting instruction-level pa rallelism, ranging from the optimal and expensive but ignoring resourc e constraints, to various forms of introducing resource constraints, O ne of the most aggressive of these techniques is resource-constrained software pipelining (RCSP), RCSP works by repeatedly scheduling succes sive iterations of a loop in parallel until the data and resource depe ndence structure of the loop causes the process to converge on a repea ting scheduling pattern, This repeating pattern is then used as the ne w loop body, In principle, this process can be made optimal with respe ct to full unrolling and scheduling of the loop. Of course, this is no t the same as absolute optimality; however, given the NP-hard nature o f the problem and the results of Schwiegelshohn et al., this may be th e strongest form possible for general loop pipelining, The main drawba ck of RCSP is that, in practice, its space/time overhead can be fairly expensive, In this paper, we present resource-directed loop pipelinin g (RDLP), a new approach that attempts to retain many of the advantage s of RCSP while minimizing the expense, It does so by allowing the ava ilability of target resources to in some sense guide the application o f parallelism exposing and parallelizing transformations. One of the k ey features of RDLP is the separation of control heuristics from trans formations that allow the loop pipelining to be as general as the unde rlying system of code motion transformations. Results are presented wh ich show that even with very unsophisticated heuristics, RDLP achieves roughly the same performance as RCSP, while providing a fourfold decr ease in the space/time cost; moreover, we show that RDLP exposes 'just enough' parallelism-incurs a minimum of code explosion-to maximally u tilize resources.