Cost-conscious strategies to increase performance of numerical programs on aggressive VLIW architectures

Authors

Lopez, D; Llosa, J; Valero, M; Ayguade, E

Citation

D. Lopez et al., Cost-conscious strategies to increase performance of numerical programs on aggressive VLIW architectures, IEEE COMPUT, 50(10), 2001, pp. 1033-1051

Citations number

Subject Categories

Computer Science & Engineering

Journal title

IEEE TRANSACTIONS ON COMPUTERS

ISSN journal

0018-9340

Volume

50

Issue

10

Year of publication

2001

Pages

1033 - 1051

Database

ISI

SICI code

0018-9340(200110)50:10<1033:CSTIPO>2.0.ZU;2-G

Abstract

Loops are the main time-consuming part of numerical applications. The performance of the loops is limited either by the resources offered by the architecture or by recurrences in the computation. To execute more operations per cycle, current processors are designed with growing degrees of resource replication (replication technique) for memory ports and functional units. However, the high cost in terms of area and cycle time of this technique precludes the use of high degrees of replication. High values for the cycle time may clearly offset any gain in terms of number of execution cycles. High values for the area may lead to an unimplementable configuration. An alternative to resource replication is resource widening (widening technique), which has also been used in some recent designs in which the width of the resources is increased (i.e., a single operation is performed over multiple data). Moreover, several general-purpose superscalar microprocessors have been implemented with multiply-add fused floating-point units (fusion technique), which reduces the latency of the combined operation and the number of resources used. In this paper, we evaluate a broad set of VLIW processor design alternatives that combine the three techniques. We perform a technological projection for the next processor generations in order to foresee the possible implementable alternatives. From this study, we conclude that if the cost is taken into account, combining certain degrees of replication and widening in the hardware resources is more effective than applying only replication. Also, we confirm that multiply-add fused units will have a significant impact in raising the performance of future processor architectures with a reasonable increase in cost.
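
The resource and recurrence limits named in the abstract can be illustrated with a small sketch. It is not the paper's evaluation model: the function name `cycles_per_iteration`, its parameters, and the operation counts are hypothetical, and area and cycle-time costs are left out entirely. It assumes that a wide operation covers `width` consecutive iterations and that a recurrence bounds the loop at its total latency divided by the number of iterations it spans.

```python
from fractions import Fraction
from math import ceil


def cycles_per_iteration(ops, units, width=1, rec_latency=0, rec_distance=1):
    """Lower bound on cycles per loop iteration (simplified, hypothetical model).

    ops:   operations per iteration by resource class, e.g. {"mem": 3, "fpu": 2}
    units: number of replicated units per class (replication technique)
    width: data elements handled by one wide operation (widening technique)
    rec_latency, rec_distance: latency of the longest recurrence cycle and
        the number of iterations it spans (latency 0 means no recurrence)
    """
    # Resource bound: each wide operation serves `width` iterations at once.
    resource = max(Fraction(ceil(ops[c] / units[c]), width) for c in ops)
    # Recurrence bound: neither replication nor widening shortens it.
    recurrence = Fraction(rec_latency, rec_distance)
    return max(resource, recurrence)


# y[i] = a * x[i] + y[i]: two loads, one store, one multiply, one add.
axpy = {"mem": 3, "fpu": 2}
# Fusion technique: multiply and add become one fused operation.
axpy_fused = {"mem": 3, "fpu": 1}

replicated = cycles_per_iteration(axpy, {"mem": 2, "fpu": 2})
widened = cycles_per_iteration(axpy, {"mem": 1, "fpu": 1}, width=2)
fused = cycles_per_iteration(axpy_fused, {"mem": 2, "fpu": 1}, width=2)
```

In this toy setting the fused configuration reaches the same or a better bound with one floating-point unit fewer, which is the kind of trade-off the abstract describes. The numbers are illustrative only and say nothing about the results reported in the paper.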