ITA
ENG

USING PROCESSOR AFFINITY IN LOOP SCHEDULING ON SHARED-MEMORY MULTIPROCESSORS

Authors

MARKATOS EP LEBLANC TJ

Citation

Ep. Markatos et Tj. Leblanc, USING PROCESSOR AFFINITY IN LOOP SCHEDULING ON SHARED-MEMORY MULTIPROCESSORS, IEEE transactions on parallel and distributed systems, 5(4), 1994, pp. 379-400

Citations number

Categorie Soggetti

System Science","Engineering, Eletrical & Electronic","Computer Science Theory & Methods

Journal title

IEEE transactions on parallel and distributed systems → ACNP

ISSN journal

10459219

Volume

Issue

Year of publication

1994

Pages

379 - 400

Database

ISI

SICI code

1045-9219(1994)5:4<379:UPAILS>2.0.ZU;2-F

Abstract

Loops are the single largest source of parallelism in many application s. One way to exploit this parallelism is to execute loop iterations i n parallel on different processors. Previous approaches to loop schedu ling attempted to achieve the minimum completion time by distributing the workload as evenly as possible while minimizing the number of sync hronization operations required. In this paper, we consider a third di mension to the problem of loop scheduling on shared-memory multiproces sors: communication overhead caused by accesses to nonlocal data. We s how that traditional algorithms for loop scheduling, which ignore the location of data when assigning iterations to processors, incur a sign ificant performance penalty on modern shared-memory multiprocessors. W e propose a new loop scheduling algorithm that attempts to simultaneou sly balance the workload, minimize synchronization, and co-locate loop iterations with the necessary data. We compare the performance of thi s new algorithm to other known algorithms by using five representative kernel programs on a Silicon Graphics multiprocessor work-station, a BBN Butterfly, a Sequent Symmetry, and a KSR-1, and show that the new algorithm offers substantial performance improvements, up to a factor of 4 in some cases. We conclude that loop scheduling algorithms for sh ared-memory multiprocessors cannot afford to ignore the location of da ta, particularly in light of the increasing disparity between processo r and memory speeds.