
AN EMPIRICAL-STUDY OF DECENTRALIZED ILP EXECUTION MODELS

Authors

RANGANATHAN N FRANKLIN M

Citation

N. Ranganathan and M. Franklin, AN EMPIRICAL-STUDY OF DECENTRALIZED ILP EXECUTION MODELS, ACM SIGPLAN NOTICES, 33(11), 1998, pp. 272-281

Citations number

Subject Categories

Computer Science, Software, Graphics, Programming

Journal title

ACM SIGPLAN NOTICES

Volume

33

Issue

11

Year of publication

1998

Supplement

Pages

272 - 281

Database

ISI

SICI code

Abstract

Recent fascination for dynamic scheduling as a means for exploiting instruction-level parallelism has introduced significant interest in the scalability aspects of dynamic scheduling hardware. In order to overcome the scalability problems of centralized hardware schedulers, many decentralized execution models are being proposed and investigated recently. The crux of all these models is to split the instruction window across multiple processing elements (PEs) that do independent scheduling of instructions. The decentralized execution models proposed so far can be grouped under three categories, based on the criterion used for assigning an instruction to a particular PE. They are: (i) execution unit dependence based decentralization (EDD), (ii) control dependence based decentralization (CDD), and (iii) data dependence based decentralization (DDD). This paper investigates the performance aspects of these three decentralization approaches. Using a suite of important benchmarks and realistic system parameters, we examine performance differences resulting from the type of partitioning as well as from specific implementation issues such as the type of PE interconnect. We found that with a ring-type PE interconnect, the DDD approach performs the best when the number of PEs is moderate, and that the CDD approach performs best when the number of PEs is large. The currently used approach, EDD, does not perform well for any configuration. With a realistic crossbar, performance does not increase with the number of PEs for any of the partitioning approaches. The results give insight into the best way to use the transistor budget available for implementing the instruction window.
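
As an illustration of the three assignment criteria named in the abstract, the following minimal Python sketch assigns a toy instruction stream to PEs under each criterion. It is not taken from the paper: the `Instr` record, the function names, the round-robin mapping of units and blocks to PEs, and the least-loaded fallback in the DDD case are all hypothetical simplifications chosen only to make the distinction concrete.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    """Hypothetical toy instruction record (not from the paper)."""
    idx: int                                   # position in program order
    unit: str                                  # execution unit type, e.g. "alu"
    block: int                                 # id of its control-flow region
    srcs: list = field(default_factory=list)   # idx of producer instructions


def assign_edd(instrs, num_pes):
    """EDD sketch: the PE is chosen by the execution unit the instruction needs."""
    units = sorted({i.unit for i in instrs})
    # Assumption: unit types are spread over the PEs round-robin.
    unit_to_pe = {u: k % num_pes for k, u in enumerate(units)}
    return {i.idx: unit_to_pe[i.unit] for i in instrs}


def assign_cdd(instrs, num_pes):
    """CDD sketch: all instructions of one control-flow region share a PE."""
    # Assumption: regions are handed to PEs round-robin by region id.
    return {i.idx: i.block % num_pes for i in instrs}


def assign_ddd(instrs, num_pes):
    """DDD sketch: an instruction follows the PE of its first placed producer."""
    placement = {}
    load = [0] * num_pes
    for i in instrs:  # instructions are visited in program order
        producer_pes = [placement[s] for s in i.srcs if s in placement]
        # Assumption: with no placed producer, pick the least-loaded PE.
        pe = producer_pes[0] if producer_pes else load.index(min(load))
        placement[i.idx] = pe
        load[pe] += 1
    return placement


if __name__ == "__main__":
    program = [
        Instr(0, "alu", 0),
        Instr(1, "mem", 0, [0]),
        Instr(2, "alu", 1),
        Instr(3, "fp", 1, [2]),
        Instr(4, "alu", 2, [1, 3]),
    ]
    for name, policy in [("EDD", assign_edd), ("CDD", assign_cdd), ("DDD", assign_ddd)]:
        print(name, policy(program, num_pes=2))
```

In this sketch each function returns a mapping from instruction index to PE number, so the same toy program can be passed to all three to compare how the partitioning criterion changes which instructions end up sharing a PE. It says nothing about performance or interconnect cost, which are the questions the paper itself studies.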