ITA
ENG

RESOURCE-DIRECTED LOOP PIPELINING - EXPOSING JUST ENOUGH PARALLELISM

Authors

NOVACK S NICOLAU A

Citation

S. Novack et A. Nicolau, RESOURCE-DIRECTED LOOP PIPELINING - EXPOSING JUST ENOUGH PARALLELISM, Computer journal, 40(6), 1997, pp. 311-321

Citations number

Journal title

Computer journal → ACNP

ISSN journal

00104620

Volume

Issue

Year of publication

1997

Pages

311 - 321

Database

ISI

SICI code

0010-4620(1997)40:6<311:RLP-EJ>2.0.ZU;2-X

Abstract

Many techniques have been proposed for exploiting instruction-level pa rallelism, ranging from the optimal and expensive but ignoring resourc e constraints, to various forms of introducing resource constraints, O ne of the most aggressive of these techniques is resource-constrained software pipelining (RCSP), RCSP works by repeatedly scheduling succes sive iterations of a loop in parallel until the data and resource depe ndence structure of the loop causes the process to converge on a repea ting scheduling pattern, This repeating pattern is then used as the ne w loop body, In principle, this process can be made optimal with respe ct to full unrolling and scheduling of the loop. Of course, this is no t the same as absolute optimality; however, given the NP-hard nature o f the problem and the results of Schwiegelshohn et al., this may be th e strongest form possible for general loop pipelining, The main drawba ck of RCSP is that, in practice, its space/time overhead can be fairly expensive, In this paper, we present resource-directed loop pipelinin g (RDLP), a new approach that attempts to retain many of the advantage s of RCSP while minimizing the expense, It does so by allowing the ava ilability of target resources to in some sense guide the application o f parallelism exposing and parallelizing transformations. One of the k ey features of RDLP is the separation of control heuristics from trans formations that allow the loop pipelining to be as general as the unde rlying system of code motion transformations. Results are presented wh ich show that even with very unsophisticated heuristics, RDLP achieves roughly the same performance as RCSP, while providing a fourfold decr ease in the space/time cost; moreover, we show that RDLP exposes 'just enough' parallelism-incurs a minimum of code explosion-to maximally u tilize resources.