THREADED RUNTIME SUPPORT FOR EXECUTION OF FINE-GRAIN PARALLEL CODE ON COARSE-GRAIN MULTIPROCESSORS

Authors

NEVES R; SCHNABEL RB

Citation

R. Neves and R.B. Schnabel, THREADED RUNTIME SUPPORT FOR EXECUTION OF FINE-GRAIN PARALLEL CODE ON COARSE-GRAIN MULTIPROCESSORS, Journal of parallel and distributed computing, 42(2), 1997, pp. 128-142

Citations number

Subject categories

Computer Sciences; Computer Science Theory & Methods

Journal title

Journal of parallel and distributed computing

ISSN journal

0743-7315

Volume

42

Issue

2

Year of publication

1997

Pages

128 - 142

Database

ISI

SICI code

0743-7315(1997)42:2<128:TRSFEO>2.0.ZU;2-B

Abstract

The goal of this research is to provide systems support that allows fine grain, data parallel code to execute efficiently on much coarser grain multiprocessors. The task of writing parallel applications is simplified by allowing the programmer to assume a number of processors convenient to the algorithm being implemented. This paper describes and evaluates a runtime approach that efficiently manages thousands of virtual processors per actual processor. The limits in using user-level threads as fine grain virtual processors are identified. Key techniques used are tight integration and specialization of scheduling, communication, optimized context switching, and fine-tuned stack management. A prototype of this runtime approach is evaluated by comparing implementations of three problems: a smoothing kernel of a thin-layer Navier-Stokes code, a five point stencil problem, and a block bordered system of linear equations, on an Intel Paragon multiprocessor and on a network of DEC Alpha workstations. The additional cost relative to an efficient manually contracted code can be as low as 15% for granularities of 50 floating point operations per virtual processor and is typically 5-20% for granularities of about 100 floating point operations per virtual processor. The overhead is analyzed in detail to show the costs of scheduling, communication, context switching, reduced memory performance, and ensuring data consistency. The implementation and analysis indicate that fine grain code can be efficiently executed on a coarse grain multiprocessor using very lightweight, specialized threads. (C) 1997 Academic Press.