
A PARALLEL BLOCK IMPLEMENTATION OF LEVEL-3 BLAS FOR MIMD VECTOR PROCESSORS

Authors

DAYDE MJ; DUFF IS; PETITET A

Citation

M.J. Dayde et al., A PARALLEL BLOCK IMPLEMENTATION OF LEVEL-3 BLAS FOR MIMD VECTOR PROCESSORS, ACM transactions on mathematical software, 20(2), 1994, pp. 178-193

Citations number

Subject Categories

Computer Sciences, Mathematics

Journal title

ACM transactions on mathematical software

ISSN journal

0098-3500

Volume

20

Issue

2

Year of publication

1994

Pages

178 - 193

Database

ISI

SICI code

0098-3500(1994)20:2<178:APBIOL>2.0.ZU;2-R

Abstract

We describe an implementation of Level-3 BLAS (Basic Linear Algebra Subprograms) based on the use of the matrix-matrix multiplication kernel (GEMM). Blocking techniques are used to express the BLAS in terms of operations involving triangular blocks and calls to GEMM. A principal advantage of this approach is that most manufacturers provide at least an efficient serial version of GEMM so that our implementation can capture a significant percentage of the computer performance. A parameter which controls the blocking allows an efficient exploitation of the memory hierarchy of the various target computers. Furthermore, this blocked version of Level-3 BLAS is naturally parallel. We present results on the ALLIANT FX/80, the CONVEX C220, the CRAY-2, and the IBM 3090/VF. For GEMM, we always use the manufacturer-supplied versions. For the operations dealing with triangular blocks, we use assembler or tuned Fortran (using loop-unrolling) codes, depending on the efficiency of the available libraries.
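
The blocking idea in the abstract can be illustrated with a minimal sketch for one Level-3 operation, a triangular solve with multiple right-hand sides (the TRSM case). This is not the authors' code, which the abstract describes as Fortran and assembler; it is a plain-Python illustration under stated assumptions. The function names `gemm_update`, `small_trsm_lower`, and `blocked_trsm_lower` and the block-size argument `nb` are hypothetical, and `gemm_update` only stands in for a manufacturer-supplied GEMM.

```python
def gemm_update(C, A, B, alpha=-1.0):
    """C += alpha * A * B on lists of rows (stand-in for a vendor GEMM)."""
    for i in range(len(A)):
        for k in range(len(B)):
            a = alpha * A[i][k]
            for j in range(len(B[0])):
                C[i][j] += a * B[k][j]


def small_trsm_lower(T, X):
    """Solve T * Y = X in place for one small lower-triangular block T."""
    for i in range(len(T)):
        for k in range(i):
            for j in range(len(X[i])):
                X[i][j] -= T[i][k] * X[k][j]
        for j in range(len(X[i])):
            X[i][j] /= T[i][i]


def blocked_trsm_lower(L, B, nb):
    """Return X with L * X = B, L lower triangular, using block size nb."""
    n = len(L)
    X = [row[:] for row in B]  # work on a copy of the right-hand sides
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # Operation on a triangular diagonal block.
        T = [L[i][k0:k1] for i in range(k0, k1)]
        Xk = X[k0:k1]  # slice shares the row objects, so X is updated in place
        small_trsm_lower(T, Xk)
        # All remaining work for this block column is a single GEMM call.
        if k1 < n:
            A = [L[i][k0:k1] for i in range(k1, n)]
            gemm_update(X[k1:], A, Xk)
    return X


if __name__ == "__main__":
    L = [[2, 0, 0], [1, 3, 0], [4, 5, 6]]
    B = [[2, 4], [10, 14], [49, 64]]
    print(blocked_trsm_lower(L, B, nb=2))
```

In this sketch `nb` plays the role of the blocking parameter mentioned in the abstract: a larger `nb` shifts work into the small triangular solves, a smaller one shifts it into the GEMM updates. The column blocks of the right-hand side are independent of one another, which is one place where such a scheme could be run in parallel.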