IMPLEMENTATION AND PERFORMANCE ISSUES OF A MASSIVELY-PARALLEL ATMOSPHERIC MODEL

Citation
Sw. Hammond et al., IMPLEMENTATION AND PERFORMANCE ISSUES OF A MASSIVELY-PARALLEL ATMOSPHERIC MODEL, Parallel computing, 21(10), 1995, pp. 1593-1619
Number of citations
22
Subject categories
Computer Sciences; Computer Science Theory & Methods
Journal title
Parallel computing
Journal ISSN
0167-8191
Volume
21
Issue
10
Year of publication
1995
Pages
1593 - 1619
Database
ISI
SICI code
0167-8191(1995)21:10<1593:IAPIOA>2.0.ZU;2-Y
Abstract
We present implementation and performance issues of a data parallel version of the National Center for Atmospheric Research (NCAR) Community Climate Model (CCM2). We describe automatic conversion tools used to aid in converting a production code written for a traditional vector architecture to data parallel code suitable for the Thinking Machines Corporation CM-5. Also, we describe the 3-D transposition method used to parallelize the spherical harmonic transforms in CCM2. This method employs dynamic data mapping techniques to improve data locality and parallel efficiency of these computations. We present performance data for the 3-D transposition method on the CM-5 for machine sizes up to 512 processors. We conclude that the parallel performance of the 3-D transposition method is adversely affected on the CM-5 by short vector lengths and array padding. We also find that the CM-5 spherical harmonic transforms spend about 70% of their execution time in communication. We detail a transposition-based data parallel implementation of the semi-Lagrangian Transport (SLT) algorithm used in CCM2. We analyze two approaches to parallelizing the SLT, called the departure point and arrival point based methods. We develop a performance model for choosing between these methods. We present SLT performance data which show that the localized horizontal interpolation in the SLT takes 70% of the time, while the data remapping itself requires only approximately 16%. We discuss the importance of scalable I/O to CCM2, and present the I/O rates measured on the CM-5. We compare the performance of the data parallel version of CCM2 on a 32-processor CM-5 with the optimized vector code running on a single processor Cray Y-MP. We show that the CM-5 code is 75% faster. We also give the overall performance of CCM2 running at higher resolutions on different numbers of CM-5 processors. We conclude by discussing the significance of these results and their implications for data parallel climate models.