Cost-conscious strategies to increase performance of numerical programs on aggressive VLIW architectures

Citation
D. Lopez et al., Cost-conscious strategies to increase performance of numerical programs on aggressive VLIW architectures, IEEE COMPUT, 50(10), 2001, pp. 1033-1051
Number of citations
38
Subject Categories
Computer Science & Engineering
Journal title
IEEE TRANSACTIONS ON COMPUTERS
Journal ISSN
0018-9340
Volume
50
Issue
10
Year of publication
2001
Pages
1033 - 1051
Database
ISI
SICI code
0018-9340(200110)50:10<1033:CSTIPO>2.0.ZU;2-G
Abstract
Loops are the main time-consuming part of numerical applications. The performance of the loops is limited either by the resources offered by the architecture or by recurrences in the computation. To execute more operations per cycle, current processors are designed with growing degrees of resource replication (replication technique) for memory ports and functional units. However, the high cost in terms of area and cycle time of this technique precludes the use of high degrees of replication. High values for the cycle time may clearly offset any gain in terms of number of execution cycles. High values for the area may lead to an unimplementable configuration. An alternative to resource replication is resource widening (widening technique), which has also been used in some recent designs in which the width of the resources is increased (i.e., a single operation is performed over multiple data). Moreover, several general-purpose superscalar microprocessors have been implemented with multiply-add fused floating-point units (fusion technique), which reduces the latency of the combined operation and the number of resources used. In this paper, we evaluate a broad set of VLIW processor design alternatives that combine the three techniques. We perform a technological projection for the next processor generations in order to foresee the possible implementable alternatives. From this study, we conclude that if the cost is taken into account, combining certain degrees of replication and widening in the hardware resources is more effective than applying only replication. Also, we confirm that multiply-add fused units will have a significant impact in raising the performance of future processor architectures with a reasonable increase in cost.
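
Illustrative sketch (not part of the record or the paper): the abstract says loop performance is bounded either by resources or by recurrences, and that replication, widening, and fusion each relax the resource bound. The Python below is a minimal sketch of that reasoning under a simplified model that is an assumption of this note. The function name `cycles_per_iteration` and all of its parameters are hypothetical. The model assumes that operations from consecutive iterations can always be packed into wide operations, and it ignores the cycle-time and area costs that the paper treats as central.

```python
from fractions import Fraction


def cycles_per_iteration(mem_ops, fp_ops, buses, fpus, width=1,
                         fused_pairs=0, rec_latency=0, rec_distance=1):
    """Lower bound on cycles per loop iteration (simplified, assumed model).

    mem_ops, fp_ops : memory and floating-point operations per iteration
    buses, fpus     : replication degree of memory ports and FP units
    width           : widening degree (iterations served by one wide operation)
    fused_pairs     : multiply-add pairs issued as one fused operation
    rec_latency     : total latency along the critical recurrence (0 if none)
    rec_distance    : iteration distance of that recurrence
    """
    # Fusion: each fused multiply-add pair occupies a single FP issue slot.
    fp_issued = fp_ops - fused_pairs
    # Replication and widening both scale the work a resource class can
    # absorb per cycle in this model.
    resource_bound = max(Fraction(mem_ops, buses * width),
                         Fraction(fp_issued, fpus * width))
    # A recurrence cannot be relaxed by adding or widening resources.
    recurrence_bound = Fraction(rec_latency, rec_distance)
    return max(resource_bound, recurrence_bound)


# Example loop: y[i] = a * x[i] + y[i] (2 loads, 1 store, 1 multiply, 1 add).
baseline = cycles_per_iteration(3, 2, buses=1, fpus=1)
replicated = cycles_per_iteration(3, 2, buses=2, fpus=2)
widened = cycles_per_iteration(3, 2, buses=1, fpus=1, width=2)
fused = cycles_per_iteration(3, 2, buses=2, fpus=1, fused_pairs=1)
recurrent = cycles_per_iteration(3, 2, buses=4, fpus=4, rec_latency=4)
```

In this model, doubling the resources and doubling their width give the same bound for the example loop, so the choice between them rests on cost, which is the comparison the abstract describes. The last case shows a recurrence-bound loop, where neither technique helps.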