AN EMPIRICAL-STUDY OF DECENTRALIZED ILP EXECUTION MODELS

Citation
N. Ranganathan and M. Franklin, AN EMPIRICAL-STUDY OF DECENTRALIZED ILP EXECUTION MODELS, ACM SIGPLAN NOTICES, 33(11), 1998, pp. 272-281
Number of citations
20
Subject Categories
Computer Science, Software, Graphics, Programming
Journal title
ACM SIGPLAN NOTICES
Volume
33
Issue
11
Year of publication
1998
Supplement
S
Pages
272 - 281
Database
ISI
SICI code
Abstract
Recent fascination for dynamic scheduling as a means for exploiting instruction-level parallelism has introduced significant interest in the scalability aspects of dynamic scheduling hardware. In order to overcome the scalability problems of centralized hardware schedulers, many decentralized execution models are being proposed and investigated recently. The crux of all these models is to split the instruction window across multiple processing elements (PEs) that do independent scheduling of instructions. The decentralized execution models proposed so far can be grouped under three categories, based on the criterion used for assigning an instruction to a particular PE. They are: (i) execution unit dependence based decentralization (EDD), (ii) control dependence based decentralization (CDD), and (iii) data dependence based decentralization (DDD). This paper investigates the performance aspects of these three decentralization approaches. Using a suite of important benchmarks and realistic system parameters, we examine performance differences resulting from the type of partitioning as well as from specific implementation issues such as the type of PE interconnect. We found that with a ring-type PE interconnect, the DDD approach performs the best when the number of PEs is moderate, and that the CDD approach performs best when the number of PEs is large. The currently used approach, EDD, does not perform well for any configuration. With a realistic crossbar, performance does not increase with the number of PEs for any of the partitioning approaches. The results give insight into the best way to use the transistor budget available for implementing the instruction window.
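
The three assignment criteria named in the abstract can be illustrated with a toy sketch. This is not the simulator or the hardware evaluated in the paper. The `Instr` record, its field names, the sample `TRACE`, and the round-robin fallback in `assign_ddd` are all hypothetical, chosen only to show how the PE chosen for an instruction changes with the criterion used.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Instr:
    """Hypothetical record for one dynamic instruction (illustration only)."""
    idx: int             # position in the dynamic instruction stream
    unit: str            # execution unit class, e.g. "int", "fp", "mem"
    block: int           # id of the control-flow region the instruction belongs to
    src: Optional[int]   # idx of the instruction producing a source operand, if any


def assign_edd(instrs: List[Instr], num_pes: int) -> Dict[int, int]:
    """EDD-style: the PE is chosen from the execution unit class the instruction needs."""
    units = sorted({i.unit for i in instrs})
    return {i.idx: units.index(i.unit) % num_pes for i in instrs}


def assign_cdd(instrs: List[Instr], num_pes: int) -> Dict[int, int]:
    """CDD-style: instructions from the same control-flow region share a PE."""
    return {i.idx: i.block % num_pes for i in instrs}


def assign_ddd(instrs: List[Instr], num_pes: int) -> Dict[int, int]:
    """DDD-style: an instruction follows the PE of its operand producer.

    Instructions with no known producer are spread round-robin
    (an assumption made for this sketch).
    """
    pe_of: Dict[int, int] = {}
    next_pe = 0
    for i in instrs:
        if i.src is not None and i.src in pe_of:
            pe_of[i.idx] = pe_of[i.src]
        else:
            pe_of[i.idx] = next_pe % num_pes
            next_pe += 1
    return pe_of


# A made-up five-instruction trace: two regions, two dependence chains.
TRACE = [
    Instr(0, "int", 0, None),
    Instr(1, "fp", 0, None),
    Instr(2, "int", 0, 0),
    Instr(3, "mem", 1, 2),
    Instr(4, "fp", 1, 1),
]

if __name__ == "__main__":
    for name, policy in [("EDD", assign_edd), ("CDD", assign_cdd), ("DDD", assign_ddd)]:
        print(name, policy(TRACE, 3))
```

In this toy trace the data-dependence policy keeps each dependence chain on one PE, while the other two policies may place a producer and its consumer on different PEs, so the value would have to cross the PE interconnect. The sketch says nothing about which policy performs better; that is what the paper measures.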