Tuning compiler optimizations for simultaneous multithreading

Citation
Jl. Lo et al., Tuning compiler optimizations for simultaneous multithreading, INT J P PRO, 27(6), 1999, pp. 477-503
Number of citations
40
Subject categories
Computer Science & Engineering
Journal title
INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING
Journal ISSN
0885-7458
Volume
27
Issue
6
Year of publication
1999
Pages
477 - 503
Database
ISI
SICI code
0885-7458(199912)27:6<477:TCOFSM>2.0.ZU;2-R
Abstract
Simultaneous Multithreading (SMT) is a processor architectural technique that promises to significantly improve the utilization and performance of modern wide-issue superscalar processors. An SMT processor is capable of issuing multiple instructions from multiple threads to a processor's functional units each cycle. Unlike shared-memory multiprocessors, SMT provides and benefits from fine-grained sharing of processor and memory system resources; unlike current uniprocessors, SMT exposes and benefits from inter-thread instruction-level parallelism when hiding long-latency operations. Compiler optimizations are often driven by specific assumptions about the underlying architecture and implementation of the target machine, particularly for parallel processors. For example, when targeting shared-memory multiprocessors, parallel programs are compiled to minimize sharing, in order to decrease high-cost inter-processor communication. Therefore, optimizations that are appropriate for these conventional machines may be inappropriate for SMT, which can benefit from fine-grained resource sharing within the processor. This paper reexamines several compiler optimizations in the context of simultaneous multithreading. We revisit three optimizations in this light: loop-iteration scheduling, software speculative execution, and loop tiling. Our results show that all three optimizations should be applied differently in the context of SMT architectures: threads should be parallelized with a cyclic, rather than a blocked, algorithm; non-loop programs should not be software speculated; and compilers no longer need to be concerned about precisely sizing tiles to match cache sizes. By following these new guidelines, compilers can generate code that improves the performance of programs executing on SMT machines.
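To make the blocked versus cyclic distinction concrete, here is a minimal Python sketch of the two loop-iteration scheduling policies the abstract contrasts. The function names and the simplified lockstep model (every thread executes one iteration per step) are illustrative assumptions, not the paper's compiler implementation. Under cyclic scheduling, threads sharing one SMT processor tend to work on adjacent iterations at the same time, and so on neighbouring data; under blocked scheduling they work on widely separated parts of the iteration space.

```python
def blocked_schedule(n_iters, n_threads):
    """Blocked: each thread gets one contiguous chunk of iterations."""
    chunk = -(-n_iters // n_threads)  # ceiling division
    return [list(range(t * chunk, min((t + 1) * chunk, n_iters)))
            for t in range(n_threads)]


def cyclic_schedule(n_iters, n_threads):
    """Cyclic: thread t gets iterations t, t + T, t + 2T, ... (round-robin)."""
    return [list(range(t, n_iters, n_threads)) for t in range(n_threads)]


def concurrent_iterations(schedule, step):
    """Iterations running together at `step`, under the hypothetical
    assumption that all threads advance in lockstep, one iteration per step."""
    return [iters[step] for iters in schedule if step < len(iters)]


if __name__ == "__main__":
    blocked = blocked_schedule(16, 4)
    cyclic = cyclic_schedule(16, 4)
    print(concurrent_iterations(blocked, 0))  # [0, 4, 8, 12]: far apart
    print(concurrent_iterations(cyclic, 0))   # [0, 1, 2, 3]: adjacent
```

In an array loop such as `a[i] = b[i] + c[i]`, the adjacent iterations chosen by the cyclic policy would typically touch the same or neighbouring cache lines, which is the kind of fine-grained sharing the abstract says SMT can exploit rather than avoid.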