Ep. Markatos et Tj. Leblanc, USING PROCESSOR AFFINITY IN LOOP SCHEDULING ON SHARED-MEMORY MULTIPROCESSORS, IEEE transactions on parallel and distributed systems, 5(4), 1994, pp. 379-400
Citations number
32
Categorie Soggetti
System Science","Engineering, Eletrical & Electronic","Computer Science Theory & Methods
Loops are the single largest source of parallelism in many application
s. One way to exploit this parallelism is to execute loop iterations i
n parallel on different processors. Previous approaches to loop schedu
ling attempted to achieve the minimum completion time by distributing
the workload as evenly as possible while minimizing the number of sync
hronization operations required. In this paper, we consider a third di
mension to the problem of loop scheduling on shared-memory multiproces
sors: communication overhead caused by accesses to nonlocal data. We s
how that traditional algorithms for loop scheduling, which ignore the
location of data when assigning iterations to processors, incur a sign
ificant performance penalty on modern shared-memory multiprocessors. W
e propose a new loop scheduling algorithm that attempts to simultaneou
sly balance the workload, minimize synchronization, and co-locate loop
iterations with the necessary data. We compare the performance of thi
s new algorithm to other known algorithms by using five representative
kernel programs on a Silicon Graphics multiprocessor work-station, a
BBN Butterfly, a Sequent Symmetry, and a KSR-1, and show that the new
algorithm offers substantial performance improvements, up to a factor
of 4 in some cases. We conclude that loop scheduling algorithms for sh
ared-memory multiprocessors cannot afford to ignore the location of da
ta, particularly in light of the increasing disparity between processo
r and memory speeds.