EXPLICIT PARALLEL BLOCK CHOLESKY ALGORITHMS ON THE CRAY APP

Authors
M. Nool
Citation
M. Nool, EXPLICIT PARALLEL BLOCK CHOLESKY ALGORITHMS ON THE CRAY APP, Applied Numerical Mathematics, 19(1-2), 1995, pp. 91-114
Number of citations
13
Subject Categories
Mathematics,Mathematics
Journal ISSN
01689274
Volume
19
Issue
1-2
Year of publication
1995
Pages
91 - 114
Database
ISI
SICI code
0168-9274(1995)19:1-2<91:EPBCAO>2.0.ZU;2-9
Abstract
In this paper we consider the CRAY APP, the Attached Parallel Processor of the CRAY S-MP, which consists of seven buses, each supporting up to 12 processing elements. Processing elements on different buses can communicate simultaneously with the shared main memory, but processing elements sharing the same bus cannot, since only one processing element per bus can access memory at a given time. Applications with a high level of data reuse or a high computational intensity, and applications that are highly parallel, are well suited to the APP. An example of such an application is matrix-matrix multiplication. We illustrate how this restriction on data traffic influences performance, and we discuss a performance model of the bus architecture that accounts for changes in processor speed, data traffic speed and cache contents. Furthermore, two different algorithms for Cholesky factorization are discussed: a block left-looking algorithm and a block right-looking algorithm. The maximum achievable speed on the CRAY APP is mainly determined by the performance of the matrix-matrix multiplication. Parallelism is applied explicitly over the blocks, which makes it possible to concatenate different block operations in cache. The results obtained on CWI's APP (a machine with twenty-eight processing elements) indicate how block algorithms can be parallelized on machines with hundreds or thousands of processors.
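The difference between the two block variants named in the abstract can be illustrated with a minimal sequential sketch in plain Python. This is the textbook formulation, not the paper's APP implementation: the function names, the list-of-lists matrix layout and the block size parameter nb are all assumptions made for illustration, and no parallelism or cache management is modelled. In both variants the innermost sums over p are the matrix-matrix multiplication work that the abstract identifies as dominating the achievable speed.

```python
import math


def _factor_panel(w, k, ke, n):
    """Factor diagonal block w[k:ke][k:ke], then solve for the panel below it."""
    # Unblocked Cholesky of the diagonal block (reads the lower triangle only).
    for j in range(k, ke):
        d = w[j][j] - sum(w[j][p] ** 2 for p in range(k, j))
        w[j][j] = math.sqrt(d)
        for i in range(j + 1, ke):
            s = w[i][j] - sum(w[i][p] * w[j][p] for p in range(k, j))
            w[i][j] = s / w[j][j]
    # Triangular solve: L_ik = A_ik * L_kk^{-T} for every row i below the block.
    for i in range(ke, n):
        for j in range(k, ke):
            s = w[i][j] - sum(w[i][p] * w[j][p] for p in range(k, j))
            w[i][j] = s / w[j][j]


def _lower(w):
    """Return a copy of w with the strict upper triangle set to zero."""
    n = len(w)
    return [[w[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]


def block_cholesky_right_looking(a, nb):
    """Right-looking: after each panel, update the whole trailing submatrix."""
    n = len(a)
    w = [[float(x) for x in row] for row in a]  # working copy
    for k in range(0, n, nb):
        ke = min(k + nb, n)
        _factor_panel(w, k, ke, n)
        # Trailing update A_ij -= L_ik * L_jk^T (lower triangle only).
        for i in range(ke, n):
            for j in range(ke, i + 1):
                w[i][j] -= sum(w[i][p] * w[j][p] for p in range(k, ke))
    return _lower(w)


def block_cholesky_left_looking(a, nb):
    """Left-looking: before each panel, apply all earlier block columns to it."""
    n = len(a)
    w = [[float(x) for x in row] for row in a]  # working copy
    for k in range(0, n, nb):
        ke = min(k + nb, n)
        # Update block column k with the already computed columns 0..k-1.
        for i in range(k, n):
            for j in range(k, min(i + 1, ke)):
                w[i][j] -= sum(w[i][p] * w[j][p] for p in range(k))
        _factor_panel(w, k, ke, n)
    return _lower(w)


if __name__ == "__main__":
    # Small symmetric positive definite example matrix (hypothetical data).
    a = [[4, 12, -16], [12, 37, -43], [-16, -43, 98]]
    for row in block_cholesky_right_looking(a, nb=2):
        print(row)
```

Both functions return the same lower-triangular factor; they differ only in when the multiplication work is done. The right-looking variant writes to the entire trailing submatrix after every panel, whereas the left-looking variant reads earlier columns and writes only to the current block column, a distinction that matters on an architecture where memory access per bus is restricted.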