We develop a performance model for Krylov subspace methods implemented
on distributed memory parallel computers for which the underlying com
munication network is a two-dimensional mesh. The model is based on th
e runtime of a single iteration or cycle of iterations (for methods li
ke GMRES(m)), because the iteration count is problem dependent. Moreov
er, we intend to use the model only for parallel implementations that
do not change the mathematical properties of the method (significantly
). The main purpose of this model is a qualitative analysis of the per
formance; the model is not meant for very accurate predictions. We exp
ress the efficiency, speed-up, and runtime as functions of the number
of processors scaled by the number of processors that gives the minima
l runtime for the given problem size (P-max). This provides a natural
way to analyze the performance characteristics for the range of the nu
mbers of processors that can be used effectively. The approach is part
icularly interesting because it turns out that the performance is char
acterized completely by the sequential runtime and P-max. The efficien
cy as a function of the number of processors relative to P-max is inde
pendent of the problem size and parameters describing the machine and
solution method. Analogous relations can be obtained for the speed-up
and runtime. P-max itself, of course, depends on N and the other param
eters, and a simple equation for P-max is given. The performance model
is also used to evaluate the improvements in the performance if we re
duce the communication as described in [7,9,8]. Although the scope of
the performance model is limited by assumptions on the load balance an
d the processor grid, there are several obvious generalizations. One i
mportant and straightforward generalization is to higher dimensional m
eshes. We will discuss such generalizations at the end of this article
.