The overall performance of a shared-memory, common-bus multiprocessor
system can be seriously affected by useless coherence-related actions.
This occurs, in particular, when a private data block of a process
becomes resident in more than one cache as a consequence of the
migration of the owner process. We introduce a hardware solution to
eliminate these useless shared copies, and show how this technique can
be applied to a specific coherence protocol. Two extreme workload
conditions are properly selected to evaluate the performance of a
multiprocessor system.