Simultaneous Multithreading (SMT) is a processor architectural technique
that promises to significantly improve the utilization and performance of
modern wide-issue superscalar processors. An SMT processor is capable of
issuing multiple instructions from multiple threads to a processor's
functional units each cycle. Unlike shared-memory multiprocessors, SMT
provides and benefits from fine-grained sharing of processor and memory
system resources; unlike current uniprocessors, SMT exposes and benefits
from inter-thread instruction-level parallelism when hiding long-latency
operations. Compiler optimizations are often driven by specific assumptions
about the underlying architecture and implementation of the target machine,
particularly for parallel processors. For example, when targeting
shared-memory multiprocessors, parallel programs are compiled to minimize
sharing, in order to decrease high-cost inter-processor communication.
Optimizations that are appropriate for these conventional machines may
therefore be inappropriate for SMT, which can benefit from fine-grained
resource sharing within the processor. This paper reexamines three compiler
optimizations in the context of simultaneous multithreading: loop-iteration
scheduling, software speculative execution, and loop tiling. Our results
show that all three optimizations should be applied differently on SMT
architectures: loop iterations should be assigned to threads with a cyclic,
rather than a blocked, algorithm; non-loop programs should not be software
speculated; and compilers no longer need to be concerned about precisely
sizing tiles to match cache sizes. By following these new guidelines,
compilers can generate code that improves the performance of programs
executing on SMT machines.
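
To make the loop-iteration scheduling guideline concrete, the following is a minimal C sketch of the two distributions, not code from the paper. The function names (blocked_range, blocked_sum, cyclic_sum) and the array-sum loop body are hypothetical, chosen only for illustration. Under the blocked scheme each thread works on one contiguous chunk of the iteration space; under the cyclic scheme thread t takes iterations t, t + nthreads, t + 2*nthreads, and so on, so threads running at the same time touch neighboring elements. That neighboring access pattern is presumably what lets co-scheduled SMT threads benefit from the fine-grained sharing described above.

```c
#include <assert.h>
#include <stddef.h>

/* Blocked distribution (illustrative): thread t owns one contiguous chunk.
 * Writes the half-open iteration range [*begin, *end) for thread t. */
void blocked_range(size_t n, size_t nthreads, size_t t,
                   size_t *begin, size_t *end)
{
    assert(nthreads > 0 && t < nthreads);
    size_t chunk = (n + nthreads - 1) / nthreads;  /* ceil(n / nthreads) */
    size_t b = t * chunk;
    size_t e = b + chunk;
    *begin = b < n ? b : n;  /* clamp so trailing threads may get less */
    *end = e < n ? e : n;
}

/* Work done by thread t when iterations are distributed in blocks. */
double blocked_sum(const double *a, size_t n, size_t nthreads, size_t t)
{
    size_t begin, end;
    blocked_range(n, nthreads, t, &begin, &end);
    double sum = 0.0;
    for (size_t i = begin; i < end; i++)
        sum += a[i];
    return sum;
}

/* Work done by thread t when iterations are distributed cyclically:
 * thread t executes iterations t, t + nthreads, t + 2*nthreads, ... */
double cyclic_sum(const double *a, size_t n, size_t nthreads, size_t t)
{
    assert(nthreads > 0 && t < nthreads);
    double sum = 0.0;
    for (size_t i = t; i < n; i += nthreads)
        sum += a[i];
    return sum;
}
```

Each function computes the share of a single thread; a real parallelizing compiler or runtime would invoke one such share per hardware context and combine the partial results. Both distributions cover every iteration exactly once, so they differ only in which thread touches which data.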