Using accurate arithmetics to improve numerical reproducibility and stability in parallel applications

Authors

He, Y; Ding, CHQ

Citation

Y. He and C.H.Q. Ding, Using accurate arithmetics to improve numerical reproducibility and stability in parallel applications, J SUPERCOMP, 18(3), 2001, pp. 259-277

Subject categories

Computer Science & Engineering

Journal title

JOURNAL OF SUPERCOMPUTING

ISSN journal

0920-8542

Volume

18

Issue

3

Year of publication

2001

Pages

259 - 277

Database

ISI

SICI code

0920-8542(200103)18:3<259:UAATIN>2.0.ZU;2-S

Abstract

Numerical reproducibility and stability of large-scale scientific simulations, especially climate modeling, on distributed-memory parallel computers are becoming critical issues. In particular, global summation of distributed arrays is most susceptible to rounding errors, whose propagation and accumulation cause uncertainty in the final simulation results. We analyzed several accurate summation methods and found that two are particularly effective at improving (or even ensuring) reproducibility and stability: Kahan's self-compensated summation and Bailey's double-double precision summation. We provide an MPI operator, MPI_SUMDD, that works with MPI collective operations to ensure a scalable implementation on large numbers of processors. The resulting methods are simple to adopt in practical codes and apply not only to global summations but also to vector-vector dot products and matrix-vector or matrix-matrix operations.
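The abstract names two accurate summation schemes and an MPI reduction operator. The following is a minimal sketch, written in Python for brevity and assuming IEEE 754 doubles with round-to-nearest. `kahan_sum` is textbook Kahan compensated summation. `dd_add` is a common double-double addition built on Knuth's TwoSum, in the spirit of Bailey's double-double arithmetic but not necessarily identical to it. `dd_reduce` is a hypothetical stand-in for the combine step an MPI_SUMDD-style operator would apply to per-processor (hi, lo) partial sums. All function names are illustrative assumptions, and this is not the authors' code.

```python
from functools import reduce

# Illustrative sketch only: these helpers are hypothetical, not the paper's
# code. They assume IEEE 754 doubles with round-to-nearest (true for Python
# floats on mainstream platforms).


def naive_sum(xs):
    """Plain left-to-right summation (avoids built-in sum, which is
    compensated for floats since Python 3.12)."""
    total = 0.0
    for x in xs:
        total += x
    return total


def kahan_sum(xs):
    """Kahan's self-compensated summation."""
    total = 0.0
    c = 0.0  # running compensation: low-order bits lost so far (negated)
    for x in xs:
        y = x - c              # re-inject the previously lost bits
        t = total + y          # big + small: low bits of y may be lost
        c = (t - total) - y    # what was actually added minus what we wanted
        total = t
    return total


def two_sum(a, b):
    """Knuth's TwoSum: s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def dd_add(a, b):
    """Add two double-double values (hi, lo) and renormalize the result."""
    s, e = two_sum(a[0], b[0])
    e += a[1] + b[1]           # fold in both low-order parts
    hi = s + e
    lo = e - (hi - s)          # error of the final hi rounding
    return hi, lo


def dd_partial(xs):
    """Accumulate a local array into one double-double (hi, lo) partial sum."""
    acc = (0.0, 0.0)
    for x in xs:
        acc = dd_add(acc, (x, 0.0))
    return acc


def dd_sum(xs):
    """Double-double summation, rounded back to a single double."""
    return dd_partial(xs)[0]


def dd_reduce(partials):
    """Hypothetical stand-in for an MPI_SUMDD-style combine step: merge
    per-processor (hi, lo) partial sums with double-double addition."""
    return reduce(dd_add, partials, (0.0, 0.0))
```

With `xs = [1e16, 1.0, -1e16]`, whose exact sum is 1.0, plain left-to-right summation returns 0.0. Kahan's single compensation term also ends at 0.0 here, because the final correction `x - c` is itself rounded away, while `dd_sum(xs)` returns 1.0. Splitting `xs` into `[1e16]` and `[1.0, -1e16]`, as two processors might, and combining the two (hi, lo) partials with `dd_reduce` also yields 1.0, which is the kind of decomposition-independent result the abstract is after.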