A PERFORMANCE-MODEL FOR KRYLOV SUBSPACE METHODS ON MESH-BASED PARALLEL COMPUTERS

Authors
Citation
E. Desturler, A PERFORMANCE-MODEL FOR KRYLOV SUBSPACE METHODS ON MESH-BASED PARALLEL COMPUTERS, Parallel computing, 22(1), 1996, pp. 57-74
Citations number
21
Categorie Soggetti
Computer Sciences","Computer Science Theory & Methods
Journal title
ISSN journal
01678191
Volume
22
Issue
1
Year of publication
1996
Pages
57 - 74
Database
ISI
SICI code
0167-8191(1996)22:1<57:APFKSM>2.0.ZU;2-F
Abstract
We develop a performance model for Krylov subspace methods implemented on distributed memory parallel computers for which the underlying com munication network is a two-dimensional mesh. The model is based on th e runtime of a single iteration or cycle of iterations (for methods li ke GMRES(m)), because the iteration count is problem dependent. Moreov er, we intend to use the model only for parallel implementations that do not change the mathematical properties of the method (significantly ). The main purpose of this model is a qualitative analysis of the per formance; the model is not meant for very accurate predictions. We exp ress the efficiency, speed-up, and runtime as functions of the number of processors scaled by the number of processors that gives the minima l runtime for the given problem size (P-max). This provides a natural way to analyze the performance characteristics for the range of the nu mbers of processors that can be used effectively. The approach is part icularly interesting because it turns out that the performance is char acterized completely by the sequential runtime and P-max. The efficien cy as a function of the number of processors relative to P-max is inde pendent of the problem size and parameters describing the machine and solution method. Analogous relations can be obtained for the speed-up and runtime. P-max itself, of course, depends on N and the other param eters, and a simple equation for P-max is given. The performance model is also used to evaluate the improvements in the performance if we re duce the communication as described in [7,9,8]. Although the scope of the performance model is limited by assumptions on the load balance an d the processor grid, there are several obvious generalizations. One i mportant and straightforward generalization is to higher dimensional m eshes. We will discuss such generalizations at the end of this article .