D. Lopez et al., Cost-conscious strategies to increase performance of numerical programs on aggressive VLIW architectures, IEEE COMPUT, 50(10), 2001, pp. 1033-1051
Loops are the main time-consuming part of numerical applications. Their performance is limited either by the resources offered by the architecture or by recurrences in the computation. To execute more operations per cycle, current processors are designed with growing degrees of resource replication (the replication technique) for memory ports and functional units. However, the high cost of this technique in terms of area and cycle time precludes high degrees of replication. A longer cycle time may offset any gain in the number of execution cycles, and a larger area may lead to an unimplementable configuration. An alternative to resource replication is resource widening (the widening technique), which has also been used in some recent designs. Widening increases the width of the resources, so that a single operation is performed over multiple data. Moreover, several general-purpose superscalar microprocessors have been implemented with multiply-add fused floating-point units (the fusion technique), which reduce both the latency of the combined operation and the number of resources used. In this paper, we evaluate a broad set of VLIW processor design alternatives that combine the three techniques. We perform a technological projection for the next processor generations in order to foresee which alternatives will be implementable. From this study, we conclude that, when cost is taken into account, combining certain degrees of replication and widening in the hardware resources is more effective than applying replication alone. We also confirm that multiply-add fused units will have a significant impact on the performance of future processor architectures, with a reasonable increase in cost.
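A rough throughput model can make the trade-off described above concrete. The sketch below is our own illustrative assumption and not the model used in the paper. It computes only the resource-bound lower limit on cycles per loop iteration and ignores latencies and recurrences. It makes three further assumptions:

- Widening lets one operation cover `width` consecutive iterations, which requires stride-1 data.
- Fusion turns each multiply that feeds an add into a single operation.
- The paper does not define the names `Config`, `LoopBody`, and `cycles_per_iteration`, or the two example loop bodies. All of them are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Config:
    """Hypothetical VLIW configuration (illustrative only, not from the paper)."""
    mem_ports: int            # replicated memory ports
    fpus: int                 # replicated floating-point units
    width: int = 1            # widening degree: data elements per operation
    fused_madd: bool = False  # whether multiply-add fused units are available


@dataclass(frozen=True)
class LoopBody:
    """Operation counts per original iteration of an innermost loop."""
    mem_ops: int     # loads + stores
    fp_mul: int
    fp_add: int
    madd_pairs: int  # multiplies feeding an add, fusable into one operation


def cycles_per_iteration(loop: LoopBody, cfg: Config) -> Fraction:
    """Resource-bound lower limit on cycles per iteration.

    Ignores operation latencies and loop-carried recurrences.
    """
    fp_ops = loop.fp_mul + loop.fp_add
    if cfg.fused_madd:
        fp_ops -= loop.madd_pairs  # each fused pair issues as one operation
    # Assumption: one wide operation processes `width` consecutive iterations.
    mem_bound = Fraction(loop.mem_ops, cfg.mem_ports * cfg.width)
    fp_bound = Fraction(fp_ops, cfg.fpus * cfg.width)
    return max(mem_bound, fp_bound)


# Hypothetical loop bodies:
# DAXPY-like  y[i] = a*x[i] + y[i]:        2 loads + 1 store, 1 mul, 1 add
# polynomial  y[i] = (a*x[i] + b)*x[i] + c: 1 load + 1 store, 2 mul, 2 add
daxpy = LoopBody(mem_ops=3, fp_mul=1, fp_add=1, madd_pairs=1)
poly = LoopBody(mem_ops=2, fp_mul=2, fp_add=2, madd_pairs=2)

configs = {
    "replication 4x1": Config(mem_ports=4, fpus=4),
    "replication 2 + widening 2": Config(mem_ports=2, fpus=2, width=2),
    "replication 2 + widening 2 + fusion": Config(
        mem_ports=2, fpus=2, width=2, fused_madd=True
    ),
}

for loop_name, loop in (("daxpy", daxpy), ("poly", poly)):
    for cfg_name, cfg in configs.items():
        print(f"{loop_name}, {cfg_name}: "
              f"{cycles_per_iteration(loop, cfg)} cycles/iteration")
```

Under these assumptions, two configurations give the same bound for both loops: four replicated ports and FPUs, and two ports plus two FPUs of width 2. The bound is 1 cycle per iteration for the polynomial loop and 3/4 for the DAXPY-like loop, yet the second configuration uses half as many units. Fusion then halves the polynomial loop's bound to 1/2. It leaves the memory-bound DAXPY-like loop at 3/4. The model says nothing about area or cycle time, which are the costs the paper weighs.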