L. S. Blackford et al., Practical Experience in the Numerical Dangers of Heterogeneous Computing, ACM Transactions on Mathematical Software, 23(2), 1997, pp. 133-147
Special challenges exist in writing reliable numerical library
software for heterogeneous computing environments. Although a lot of
software for distributed-memory parallel computers has been written,
porting this software to a network of workstations requires careful
consideration. The symptoms of heterogeneous computing failures can
range from erroneous results without warning to deadlock. Some of the
problems are straightforward to solve, but for others the solutions
are not so obvious, or incur an unacceptable overhead. Making software
robust on heterogeneous systems often requires additional
communication. We describe and illustrate the problems encountered
during the development of ScaLAPACK and the NAG Numerical PVM Library.
Where possible, we suggest ways to avoid potential pitfalls, or if
that is not possible, we recommend that the software not be used on
heterogeneous networks.