The conjugate gradient squared (CGS) algorithm is a Krylov subspace algorit
hm that can be used to obtain fast solutions for linear systems (Ax=b) with
complex nonsymmetric, very large, and very sparse coefficient matrices (A)
. By considering electromagnetic scattering problems as examples, a study o
f the performance and scalability of this algorithm on two MIMD machines is
presented. A modified CGS (MCGS) algorithm, where the synchronization over
head is effectively reduced by a factor of two, is proposed in this paper.
This is achieved by changing the computation sequence in the CGS algorithm.
Both experimental and theoretical analyses are performed to investigate th
e impact of this modification on the overall execution time. From the theor
etical and experimental analysis it is found that CGS is faster than MCGS f
or smaller number of processors and MCGS outperforms CGS as the number of p
rocessors increases. Based on this observation, a set of algorithms approac
h is proposed, where either CGS or MGS is selected depending on the values
of the dimension of the A matrix (N) and number of processors (P). The set
approach provides an algorithm that is more scalable than either the CGS or
MCGS algorithms. The experiments performed on a 128-processor mesh Intel P
aragon and on a 16-processor IBM SP2 with multistage network indicate that
MCGS is approximately 20% faster than CGS.