Aj. Lewis and Ad. Brent, A Comparison of Coarse and Fine-Grain Parallelization Strategies for the SIMPLE Pressure Correction Algorithm, International Journal for Numerical Methods in Fluids, 16(10), 1993, pp. 891-914
The primary aim of this work was to determine the simplest and most effective parallelization strategy for control-volume-based codes solving industrial problems. For certain classes of problems, the coarse-grain functional decomposition strategy offers the potential for significant execution speed-ups while maintaining the inherent structure of traditional serial algorithms. This strategy has been largely ignored because of its limited scaling capability. Functional decomposition requires only minor modification of the existing serial code, so code portability across both concurrent and serial computers is maintained. Fine-grain parallelization strategies at the 'DO loop' level are also easy to implement and largely preserve code portability. Both coarse-grain functional decomposition and fine-grain loop-level parallelization strategies for the SIMPLE pressure correction algorithm are demonstrated on a Silicon Graphics 4D280S eight-CPU shared-memory computer system. The test case is a highly coupled, transient, two-dimensional simulation of the melting of a metal in the presence of thermal-buoyancy-driven laminar convection. Problems requiring the solution of a larger number of transport equations were simulated by including further scalar variables in the calculation. The functional decomposition strategy caused a slight degradation of the convergence rate. Even so, it exhibited higher parallel efficiencies than the loop-level strategy and yielded greater speed-ups relative to the original serial code. Initially, this strategy showed a significant degradation in convergence rate due to an inconsistency in the parallel solution of the pressure correction equation. After correcting for this inconsistency, the maximum speed-up for 16 dependent variables was a factor of 5.28 with eight processors, representing a parallel efficiency of 67%. Peak efficiency of 76% was achieved using five processors to solve for 10 dependent variables.
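The sketch below illustrates coarse-grain functional decomposition. Within one outer iteration, each dependent variable's linearized transport equation is handed to its own worker. The model problem is hypothetical and is not the paper's code:

* Each variable obeys a steady 1D diffusion equation with zero Dirichlet boundaries.
* Coupling between variables enters only through a source term evaluated from the previous outer iteration.
* Each system is solved with the Thomas (TDMA) algorithm.
* The momentum and pressure-correction steps of SIMPLE are omitted.

The names `tdma`, `solve_equation` and `outer_iterations` are illustrative. Python threads do not speed up pure-Python arithmetic because of the GIL, so this sketch shows only how the work is divided, not real parallel speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def tdma(a, b, c, d):
    """Thomas algorithm for a tridiagonal system.

    a: sub-diagonal (a[0] unused), b: diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side.
    """
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward elimination.
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def solve_equation(k, fields, gammas):
    """Assemble and solve the (hypothetical) equation -gamma_k * phi'' = S_k.

    The source S_k depends on the other variables only through values from
    the previous outer iteration, so each variable can be solved independently.
    """
    n = len(fields[k])
    dx = 1.0 / (n + 1)
    g = gammas[k] / dx**2
    others = [f for j, f in enumerate(fields) if j != k]
    rhs = []
    for i in range(n):
        coupling = sum(f[i] for f in others) / max(len(others), 1)
        rhs.append(1.0 + 0.1 * coupling)
    # Central-difference coefficients; zero boundary values drop out.
    a = [-g] * n
    b = [2.0 * g] * n
    c = [-g] * n
    return tdma(a, b, c, rhs)


def outer_iterations(num_vars, n, gammas, iters, workers=None):
    """Run outer iterations, optionally solving the equations concurrently."""
    fields = [[0.0] * n for _ in range(num_vars)]
    for _ in range(iters):
        if workers:
            # Functional decomposition: one task per dependent variable.
            with ThreadPoolExecutor(max_workers=workers) as pool:
                new = list(pool.map(
                    lambda k: solve_equation(k, fields, gammas),
                    range(num_vars)))
        else:
            new = [solve_equation(k, fields, gammas) for k in range(num_vars)]
        # All variables are updated together after every equation is solved.
        fields = new
    return fields
```

For example, `outer_iterations(4, 20, [1.0, 0.5, 2.0, 1.0], 10, workers=4)` gives the same fields as the serial call without `workers`. In both modes every equation sees only the previous iterate of the other variables. A conventional serial sweep would instead use the freshest available values, which is one way the structure of the iteration can change under this kind of decomposition.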