Sw. Song and Re. Baddour, PARALLEL-PROCESSING FOR BOUNDARY-ELEMENT COMPUTATIONS ON DISTRIBUTED SYSTEMS, Engineering Analysis with Boundary Elements, 19(1), 1997, pp. 73-84
A parallel boundary integral algorithm for solving boundary value
problems on distributed memory computer systems is presented in this
paper. The paper focuses on the parallelization of the influence
coefficients
matrix generation and the solution of the resulting linear system,
which are the two main parts of boundary integral formulations. The
distributed parallel boundary integral algorithm presented in this
paper generates a part of the influence coefficients matrix on each
computation node of a multicomputer platform and stores that part in
the local memory. The distributed influence coefficients matrix is then
used in its partitioned form by the parallel linear system solver to
obtain the
solution. The matrix of coefficients is large, dense and nonsymmetric.
Three parallel linear system solvers are presented. Among them,
Algorithm-1 and Algorithm-2 are based on the conjugate-gradient-squared
(CGS) method, while Algorithm-3 is based on a direct method. It is
found that
distributed memory parallel algorithms are strongly machine dependent.
The performance of a parallel linear system solver depends not only
on the method of solution but also on the multicomputer network
topology and on its ratio of node computation speed to network data
transfer bandwidth. By selecting the algorithm that fits the
characteristics of
a distributed computer system's hardware, one can achieve the maximum
performance attainable on that system. The performance characteristics
of distributed linear
system solvers are studied in order to select an optimum method for
the available hardware at minimum development cost. The performance
characteristics of sequential linear system solvers, which have been
extensively documented in the literature, are not applicable to
distributed memory systems. This paper presents results and discussions
on the selection of simple
robust algorithms that are easily implemented on specific hardware
topologies, and provides information for extending the selected methods
to different parallel machines. The results can also serve as a
reference for parallel computer benchmarking. (C) 1997 Elsevier Science
Ltd.
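
The abstract does not give the details of Algorithm-1 or Algorithm-2, so
the following is only a minimal sketch of the general idea it describes:
each node generates and keeps a block of rows of a dense, nonsymmetric
matrix, and a CGS iteration uses the matrix in that partitioned form. It
assumes a contiguous row-block partition and the textbook,
unpreconditioned CGS recurrence, and it simulates the nodes in a single
Python process. All function names are hypothetical, and
generate_local_rows is a stand-in that does not evaluate any boundary
integral kernel.

```python
import math

def partition_rows(n, num_nodes):
    """Split row indices 0..n-1 into contiguous blocks, one per node."""
    base, extra = divmod(n, num_nodes)
    blocks, start = [], 0
    for k in range(num_nodes):
        size = base + (1 if k < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks

def generate_local_rows(rows, n):
    """Stand-in for influence-coefficient generation (an assumption):
    builds only the rows this node owns. The result is dense,
    nonsymmetric and diagonally dominant."""
    return [[n + 1.0 if i == j else 1.0 / (1 + i + 2 * j) for j in range(n)]
            for i in rows]

def distributed_matvec(local_blocks, x):
    """Each simulated node multiplies its row block by the full vector.
    Concatenating the pieces stands in for an all-gather step."""
    y = []
    for block in local_blocks:
        y.extend(sum(a * xj for a, xj in zip(row, x)) for row in block)
    return y

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cgs(local_blocks, b, tol=1e-10, max_iter=200):
    """Unpreconditioned CGS for A x = b, with A held as row blocks."""
    n = len(b)
    x = [0.0] * n
    b_norm = math.sqrt(dot(b, b))
    if b_norm == 0.0:
        return x, 0
    r = list(b)            # residual for the zero initial guess
    r_tilde = list(r)      # fixed shadow residual
    p = [0.0] * n
    q = [0.0] * n
    rho_old = 1.0
    for it in range(1, max_iter + 1):
        rho = dot(r_tilde, r)
        if rho == 0.0:
            raise ArithmeticError("CGS breakdown: rho is zero")
        if it == 1:
            u = list(r)
            p = list(u)
        else:
            beta = rho / rho_old
            u = [ri + beta * qi for ri, qi in zip(r, q)]
            p = [ui + beta * (qi + beta * pi) for ui, qi, pi in zip(u, q, p)]
        v = distributed_matvec(local_blocks, p)
        sigma = dot(r_tilde, v)
        if sigma == 0.0:
            raise ArithmeticError("CGS breakdown: sigma is zero")
        alpha = rho / sigma
        q = [ui - alpha * vi for ui, vi in zip(u, v)]
        uq = [ui + qi for ui, qi in zip(u, q)]
        x = [xi + alpha * wi for xi, wi in zip(x, uq)]
        a_uq = distributed_matvec(local_blocks, uq)
        r = [ri - alpha * wi for ri, wi in zip(r, a_uq)]
        rho_old = rho
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
    raise ArithmeticError("CGS did not converge within max_iter")

# Usage: 8 unknowns spread over 3 simulated nodes.
n, nodes = 8, 3
blocks = [generate_local_rows(rows, n) for rows in partition_rows(n, nodes)]
x_true = [float(i + 1) for i in range(n)]
b = distributed_matvec(blocks, x_true)
x, iters = cgs(blocks, b)
```

In this sketch the two matrix-vector products per iteration are the only
steps that need the full vector on every node. On a real multicomputer
those steps and the dot products would require communication, which is
where the ratio of node computation speed to network bandwidth noted in
the abstract would come into play.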