The multicluster architecture that we introduce is a decentralized,
dynamically scheduled architecture in which the register files, dispatch
queue, and functional units are distributed across multiple clusters, and
each cluster is assigned a subset of the architectural registers. The
motivation for the multicluster architecture is to reduce the clock cycle
time, relative to a single-cluster architecture with the same number of
hardware resources, by reducing the size and complexity of components on
critical timing paths. Resource partitioning, however, introduces
instruction-execution overhead and may reduce the number of concurrently
executing instructions. To counter these two negative by-products of
partitioning, we developed a static instruction scheduling algorithm. We
describe this algorithm and, using trace-driven simulations of SPEC92
benchmarks, evaluate its effectiveness. This evaluation indicates that, for
the configurations considered, the multicluster architecture may have
significant performance advantages at feature sizes below 0.35 μm, and
warrants further investigation.
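
As a rough illustration of where the partitioning overhead can arise, the
sketch below assumes a hypothetical two-cluster machine in which
even-numbered architectural registers belong to cluster 0 and odd-numbered
ones to cluster 1. The register assignment, the instruction tuples, and the
function names are assumptions made for illustration only; they are not the
configurations or the scheduling algorithm evaluated here. An instruction
whose registers all belong to one cluster can be handled within that
cluster, while one whose registers span clusters would need some form of
inter-cluster handling.

```python
# Hypothetical model: registers are split across clusters by register number.
NUM_CLUSTERS = 2  # assumption: two clusters


def cluster_of(reg: int) -> int:
    """Return the cluster that owns an architectural register (assumed even/odd split)."""
    return reg % NUM_CLUSTERS


def clusters_touched(dest: int, srcs: tuple) -> set:
    """Return the set of clusters owning the destination and source registers."""
    return {cluster_of(r) for r in (dest, *srcs)}


def count_cross_cluster(instrs: list) -> int:
    """Count instructions whose registers are spread over more than one cluster."""
    return sum(1 for dest, srcs in instrs if len(clusters_touched(dest, srcs)) > 1)


# Each instruction is (destination register, (source registers...)).
program = [
    (2, (4, 6)),  # all even registers: cluster 0 only
    (3, (1, 5)),  # all odd registers: cluster 1 only
    (2, (1, 4)),  # mixes even and odd registers: spans both clusters
]

print(count_cross_cluster(program))
```

In this toy model, a static scheduler would aim to choose registers so that
fewer instructions span clusters, while still keeping work available in
every cluster.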