Ba. Hendrickson et De. Womble, THE TORUS-WRAP MAPPING FOR DENSE MATRIX CALCULATIONS ON MASSIVELY-PARALLEL COMPUTERS, SIAM journal on scientific computing, 15(5), 1994, pp. 1201-1226
Dense linear systems of equations are quite common in science and engi
neering, arising in boundary element methods, least squares problems,
and other settings. Massively parallel computers will be necessary to
solve the large systems required by scientists and engineers, and scal
able parallel algorithms for the linear algebra applications must be d
evised for these machines. A critical step in these algorithms is the
mapping of matrix elements to processors. In this paper, the use of th
e torus-wrap mapping in general dense matrix algorithms is studied fro
m both theoretical and practical viewpoints. Under reasonable assumpti
ons, it is proved that this assignment scheme leads to dense matrix al
gorithms that achieve (to within a constant factor) the lower bound on
interprocessor communication. It is also shown that the torus-wrap ma
pping allows algorithms to exhibit less idle time, better load balanci
ng, and less memory overhead than the more common row and column mappi
ngs. Finally, practical implementation issues are discussed, such as c
ompatibility with basic linear algebra subprograms (BLAS) levels 1, 2,
and 3, and the results of implementations of several dense matrix alg
orithms are presented. These theoretical and experimental results are
compared with those obtained from more traditional mappings.