MODELS AND SCHEDULING ALGORITHMS FOR MIXED DATA AND TASK PARALLEL PROGRAMS

Citation
S. Chakrabarti et al., MODELS AND SCHEDULING ALGORITHMS FOR MIXED DATA AND TASK PARALLEL PROGRAMS, Journal of Parallel and Distributed Computing, 47(2), 1997, pp. 168-184
Citations number
44
ISSN journal
07437315
Volume
47
Issue
2
Year of publication
1997
Pages
168 - 184
Database
ISI
SICI code
0743-7315(1997)47:2<168:MASAFM>2.0.ZU;2-Z
Abstract
An increasing number of scientific programs exhibit two forms of parallelism, often in a nested fashion. At the outer level, the application comprises coarse-grained task parallelism, with dependencies between tasks reflected by an acyclic graph. At the inner level, each node of the graph is a data-parallel operation on arrays. Designers of languages, compilers, and runtime systems are building mechanisms to support such applications by providing processor groups and array remapping capabilities. In this paper we explore how to supplement these mechanisms with policy. What properties of an application, its data size, and the parallel machine determine the maximum potential gains from using both kinds of parallelism? It turns out that large gains can be expected only for specific task graph structures. For such applications, what are practical and effective ways to allocate processors to the nodes of the task graph? In principle one could solve the NP-complete problem of finding the best possible allocation of arbitrary processor subsets to nodes in the task graph. Instead of this, our analysis and simulations show that a simple switched scheduling paradigm, which alternates between pure task and pure data parallelism, provides nearly optimal performance for the task graphs considered here. Furthermore, our scheme is much simpler to implement, has less overhead than the optimal allocation, and would be attractive even if the optimal allocation were free to compute. To evaluate switching in real applications, we implemented a switching task scheduler in the parallel numerical library ScaLAPACK and used it in a nonsymmetric eigenvalue program. Even for fairly large input sizes, the efficiency improves by factors of 1.5 on the Intel Paragon and 2.5 on the IBM SP-2. The remapping and scheduling overhead is negligible, between 0.5 and 5%. (C) 1997 Academic Press.
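To make the switched scheduling idea from the abstract concrete, the following is a minimal illustrative sketch, not the paper's algorithm: it compares a pure data-parallel baseline (every node of the task graph runs on all P processors, one at a time) against a switched schedule that runs wide levels of independent nodes in task-parallel mode on disjoint processor subsets. The runtime model t(p) = work/p + overhead*p, the toy DAG, and all parameter values are assumptions chosen only for illustration.

```python
# Hypothetical sketch of "switched" scheduling: alternate between a pure
# task-parallel phase (independent nodes run side by side on disjoint
# processor subsets) and a pure data-parallel phase (one node on all P
# processors). The cost model and inputs are assumptions, not from the paper.

def node_time(work, p, overhead=0.001):
    """Illustrative data-parallel runtime of one node on p processors."""
    return work / p + overhead * p

def schedule_switched(levels, P):
    """Process the DAG level by level (each level = list of node work sizes).

    Levels with several ready nodes run in task-parallel mode, splitting the
    P processors evenly; levels with a single node run in data-parallel mode
    on all P processors.
    """
    total = 0.0
    for ready in levels:
        if len(ready) > 1:
            # task-parallel phase: give each ready node an equal share of P
            share = max(1, P // len(ready))
            total += max(node_time(w, share) for w in ready)
        else:
            # data-parallel phase: run the node on all P processors
            total += sum(node_time(w, P) for w in ready)
    return total

def schedule_pure_data_parallel(levels, P):
    """Baseline: every node runs on all P processors, one after another."""
    return sum(node_time(w, P) for level in levels for w in level)

if __name__ == "__main__":
    P = 64
    # Toy task graph given as levels of independent nodes (arbitrary work units).
    dag_levels = [[100.0], [10.0] * 16, [10.0] * 16, [100.0]]
    print("pure data parallel:", schedule_pure_data_parallel(dag_levels, P))
    print("switched          :", schedule_switched(dag_levels, P))
```

On this toy DAG the switched schedule wins because the many small nodes in the wide levels cannot use all 64 processors efficiently; running them side by side on subsets avoids the per-node parallel overhead, which is the intuition the abstract attributes to mixing task and data parallelism.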