The synthetic-perturbation screening (SPS) methodology is based on an
empirical approach; SPS introduces artificial perturbations into the
MIMD program and captures the effects of such perturbations by using
the modern branch of statistics called design of experiments. SPS can
provide the basis of a powerful tool for screening MIMD programs for
performance bottlenecks. This technique is portable across machines and
architectures, and scales extremely well on massively parallel
processors. The purpose of this paper is to explain the general approach
and to extend it to address specific features that are the main source
of poor performance on the shared memory programming model. These
include performance degradation due to load imbalance and insufficient
parallelism, and overhead introduced by synchronizations and by
accessing shared data structures. We illustrate the practicality of SPS
by demonstrating its use on two very different case studies: a large
image understanding benchmark and a parallel quicksort.
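
The screening idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a two-level full factorial design, and it replaces a real MIMD program with a hypothetical toy cost model (two threads running in parallel, followed by a serial section). The names `full_factorial`, `main_effects`, `simulated_runtime`, `BASE`, and `DELAY` are all invented for this illustration, and the runtimes are simulated rather than measured.

```python
from itertools import product


def full_factorial(k):
    """All 2**k runs of a two-level design.

    Each factor is one code region: -1 means no synthetic delay is
    inserted there, +1 means the delay is inserted.
    """
    return list(product((-1, 1), repeat=k))


def main_effects(design, responses):
    """Main effect of factor j: mean response at +1 minus mean at -1."""
    k = len(design[0])
    half = len(design) / 2  # number of runs at each level of a factor
    return [
        sum(run[j] * y for run, y in zip(design, responses)) / half
        for j in range(k)
    ]


# Hypothetical toy program (an assumption, not taken from the paper):
# thread A executes region 0, thread B executes region 1 in parallel,
# and region 2 is a serial section that runs after both finish.
BASE = (10.0, 4.0, 3.0)  # assumed unperturbed cost of each region
DELAY = 1.0              # assumed size of the synthetic perturbation


def simulated_runtime(levels):
    """Stand-in for timing one perturbed run of the program."""
    cost = [b + (DELAY if lvl == 1 else 0.0) for b, lvl in zip(BASE, levels)]
    return max(cost[0], cost[1]) + cost[2]


design = full_factorial(3)
responses = [simulated_runtime(run) for run in design]
effects = main_effects(design, responses)
print(effects)  # [1.0, 0.0, 1.0]
```

In this toy model, delaying region 1 has no effect on total runtime because thread B finishes well before thread A, whereas delays in regions 0 and 2 pass straight through to the total. A screening step would therefore flag regions 0 and 2 as candidates for closer study. In a real setting the responses would come from timed runs of the perturbed program, and a fractional design might be preferred when the number of regions is large.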