The performance of large-scale shared-memory multiprocessors can be
greatly improved if they can cache remote shared data in the private
caches of the processors. However, maintaining cache coherence in such
systems remains a challenge. Although hardware directory schemes give
good performance, they may be too complex and expensive for
large-scale multiprocessors. This tutorial article provides a
comprehensive guide to an alternative approach: compiler-directed
cache coherence. Compiler-directed techniques let each processor
maintain the coherence of its own cache locally, eliminating the need
for directory hardware and interprocessor communication. We survey
state-of-the-art software and hardware compiler-directed techniques
and discuss the basic concepts and issues. We also demonstrate the
feasibility and performance of compiler-directed cache coherence by
presenting a case study of the Two-Phase Invalidation scheme.