Tuning compiler optimizations for simultaneous multithreading

Authors

Lo, JL; Eggers, SJ; Levy, HM; Parekh, SS; Tullsen, DM

Citation

J.L. Lo et al., Tuning compiler optimizations for simultaneous multithreading, INT J P PRO, 27(6), 1999, pp. 477-503

Subject categories

Computer Science & Engineering

Journal title

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING

ISSN journal

0885-7458

Volume

27

Issue

6

Year of publication

1999

Pages

477 - 503

Database

ISI

SICI code

0885-7458(199912)27:6<477:TCOFSM>2.0.ZU;2-R

Abstract

Simultaneous Multithreading (SMT) is a processor architectural technique that promises to significantly improve the utilization and performance of modern wide-issue superscalar processors. An SMT processor is capable of issuing multiple instructions from multiple threads to a processor's functional units each cycle. Unlike shared-memory multiprocessors, SMT provides and benefits from fine-grained sharing of processor and memory system resources; unlike current uniprocessors, SMT exposes and benefits from inter-thread instruction-level parallelism when hiding long-latency operations. Compiler optimizations are often driven by specific assumptions about the underlying architecture and implementation of the target machine, particularly for parallel processors. For example, when targeting shared-memory multiprocessors, parallel programs are compiled to minimize sharing in order to decrease high-cost inter-processor communication. Therefore, optimizations that are appropriate for these conventional machines may be inappropriate for SMT, which can benefit from fine-grained resource sharing within the processor. This paper reexamines three compiler optimizations in the context of simultaneous multithreading: loop-iteration scheduling, software speculative execution, and loop tiling. Our results show that all three optimizations should be applied differently in the context of SMT architectures: threads should be parallelized with a cyclic, rather than a blocked, algorithm; non-loop programs should not be software speculated; and compilers no longer need to be concerned about precisely sizing tiles to match cache sizes. By following these new guidelines, compilers can generate code that improves the performance of programs executing on SMT machines.
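To make the first guideline concrete, here is a minimal Python sketch (not taken from the paper) of the two loop-iteration schedules the abstract contrasts. It assumes a loop of `n_iters` independent iterations split across `n_threads` threads; the function names are hypothetical. Blocked scheduling gives each thread one contiguous chunk, while cyclic scheduling deals iterations out round-robin, so iteration i goes to thread i mod T. The `concurrent_iterations` helper assumes, as a simplification, that all threads advance in lockstep.

```python
def blocked_schedule(n_iters: int, n_threads: int) -> list[list[int]]:
    """Blocked: each thread gets one contiguous chunk of iterations."""
    chunk = -(-n_iters // n_threads)  # ceiling division
    return [list(range(t * chunk, min((t + 1) * chunk, n_iters)))
            for t in range(n_threads)]


def cyclic_schedule(n_iters: int, n_threads: int) -> list[list[int]]:
    """Cyclic: iteration i is assigned to thread i % n_threads."""
    return [list(range(t, n_iters, n_threads)) for t in range(n_threads)]


def concurrent_iterations(schedule: list[list[int]], step: int) -> list[int]:
    """Iterations the threads would execute at the same step,
    assuming (hypothetically) that all threads run in lockstep."""
    return [its[step] for its in schedule if step < len(its)]


if __name__ == "__main__":
    b = blocked_schedule(8, 2)
    c = cyclic_schedule(8, 2)
    print("blocked:", b)  # blocked: [[0, 1, 2, 3], [4, 5, 6, 7]]
    print("cyclic: ", c)  # cyclic:  [[0, 2, 4, 6], [1, 3, 5, 7]]
    print("step 0 blocked:", concurrent_iterations(b, 0))  # [0, 4]
    print("step 0 cyclic: ", concurrent_iterations(c, 0))  # [0, 1]
```

In OpenMP terms these roughly correspond to `schedule(static)` and `schedule(static, 1)`. Under the lockstep simplification, the blocked threads work on iterations 0 and 4 at the same time, while the cyclic threads work on neighboring iterations 0 and 1. If the loop walks an array in order, co-scheduled SMT threads then touch adjacent data, which is one plausible reading of the abstract's point that SMT benefits from fine-grained sharing rather than suffering from it.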