Using accurate arithmetics to improve numerical reproducibility and stability in parallel applications

Authors
Y. He, Chq. Ding
Citation
Y. He and Chq. Ding, Using accurate arithmetics to improve numerical reproducibility and stability in parallel applications, J SUPERCOMP, 18(3), 2001, pp. 259-277
Number of citations
26
Subject categories
Computer Science & Engineering
Journal title
JOURNAL OF SUPERCOMPUTING
Journal ISSN
0920-8542
Volume
18
Issue
3
Year of publication
2001
Pages
259 - 277
Database
ISI
SICI code
0920-8542(200103)18:3<259:UAATIN>2.0.ZU;2-S
Abstract
Numerical reproducibility and stability of large scale scientific simulations, especially climate modeling, on distributed memory parallel computers are becoming critical issues. In particular, global summation of distributed arrays is most susceptible to rounding errors, and their propagation and accumulation cause uncertainty in final simulation results. We analyzed several accurate summation methods and found that two methods are particularly effective to improve (ensure) reproducibility and stability: Kahan's self-compensated summation and Bailey's double-double precision summation. We provide an MPI operator MPI_SUMDD to work with MPI collective operations to ensure a scalable implementation on large number of processors. The final methods are particularly simple to adopt in practical codes: not only global summations, but also vector-vector dot products and matrix-vector or matrix-matrix operations.
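
The two summation methods named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (naive_sum, kahan_sum, two_sum, dd_add, dd_partial_sum) are hypothetical, MPI is not used, and the cross-processor reduction is only emulated by combining two partial sums with dd_add, which plays the role that a user-defined reduction operator such as MPI_SUMDD would play in a real parallel code. The input values are powers of two chosen so that every intermediate result can be checked exactly.

```python
def naive_sum(values):
    """Plain left-to-right floating-point summation."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Kahan compensated summation: feed each add's rounding error back in."""
    total = 0.0
    comp = 0.0  # low-order bits lost by previous additions
    for x in values:
        y = x - comp
        t = total + y
        comp = (t - total) - y
        total = t
    return total


def two_sum(a, b):
    """Error-free transformation: returns (s, e) with a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def dd_add(x, y):
    """Add two double-double numbers, each stored as a (hi, lo) pair."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    hi = s + e
    lo = e - (hi - s)  # renormalize so that hi carries the rounded sum
    return hi, lo


def dd_partial_sum(values):
    """Accumulate ordinary doubles into one (hi, lo) double-double pair."""
    acc = (0.0, 0.0)
    for x in values:
        acc = dd_add(acc, (x, 0.0))
    return acc


small = [2.0 ** -60] * 1024          # each term is far below the ulp of 1.0
forward = [1.0] + small              # large term first
backward = small + [1.0]             # same numbers, different order
exact = 1.0 + 2.0 ** -50             # exactly representable true sum

print(naive_sum(forward) == naive_sum(backward))   # order changes the result
print(kahan_sum(forward) == kahan_sum(backward) == exact)

# Emulated two-processor reduction: each "rank" sums its own chunk,
# then the (hi, lo) partial results are combined with dd_add.
rank0 = dd_partial_sum(forward[:101])
rank1 = dd_partial_sum(forward[101:])
print(dd_add(rank0, rank1)[0] == exact)
```

In this sketch the naive sum depends on the order of the terms, while the compensated and double-double sums return the same value for both orderings and for the emulated two-way split, which is the reproducibility property the abstract refers to.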