GEMM-based level 3 BLAS: High-performance model implementations and performance evaluation benchmark

Citation
B. Kagstrom et al., GEMM-based level 3 BLAS: High-performance model implementations and performance evaluation benchmark, ACM T MATH, 24(3), 1998, pp. 268-302
Citations number
26
Subject Categories
Computer Science & Engineering
Journal title
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE
ISSN journal
00983500
Volume
24
Issue
3
Year of publication
1998
Pages
268 - 302
Database
ISI
SICI code
0098-3500(199809)24:3<268:GL3BHM>2.0.ZU;2-O
Abstract
The level 3 Basic Linear Algebra Subprograms (BLAS) are designed to perform various matrix multiply and triangular system solving computations. Due to the complex hardware organization of advanced computer architectures, the development of optimal level 3 BLAS code is costly and time consuming. However, it is possible to develop a portable and high-performance level 3 BLAS library that relies mainly on a highly optimized GEMM, the routine for the general matrix multiply and add operation. With suitable partitioning, all the other level 3 BLAS can be defined in terms of GEMM and a small amount of level 1 and level 2 computations. Our contribution is twofold. First, the Fortran 77 model implementations of the GEMM-based level 3 BLAS are structured to effectively reduce data traffic in a memory hierarchy. Second, the GEMM-based level 3 BLAS performance evaluation benchmark is a tool for evaluating and comparing different implementations of the level 3 BLAS with the GEMM-based model implementations.
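The abstract's central idea, that with suitable partitioning another level 3 operation reduces to GEMM calls plus a small amount of level 2 work, can be illustrated with a minimal sketch. The example below is not the authors' Fortran 77 model implementation. It is a hypothetical Python illustration of a blocked lower-triangular solve L*X = B (a TRSM-like operation). The names `gemm`, `block`, `put`, `trsm_lower` and the block size `nb` are assumptions introduced here, and the naive triple loop in `gemm` merely stands in for a tuned GEMM.

```python
# Hypothetical sketch (not the paper's code): a TRSM-like solve L*X = B
# organized so that most of the arithmetic goes through a GEMM kernel.

def gemm(alpha, A, B, beta, C):
    """C := alpha*A*B + beta*C on list-of-lists matrices.
    Naive stand-in for a highly optimized GEMM."""
    m, k, n = len(A), len(B), len(B[0])
    for i in range(m):
        for j in range(n):
            s = 0.0
            for p in range(k):
                s += A[i][p] * B[p][j]
            C[i][j] = alpha * s + beta * C[i][j]
    return C

def block(M, r0, r1, c0, c1):
    """Return a copy of the submatrix M[r0:r1, c0:c1]."""
    return [row[c0:c1] for row in M[r0:r1]]

def put(M, S, r0, c0):
    """Write submatrix S back into M with its top-left corner at (r0, c0)."""
    for i, row in enumerate(S):
        for j, v in enumerate(row):
            M[r0 + i][c0 + j] = v

def trsm_lower(L, B, nb):
    """Solve L*X = B for lower-triangular n x n L; B is overwritten by X.
    Block rows of size nb are processed top to bottom."""
    n, ncols = len(L), len(B[0])
    for i0 in range(0, n, nb):
        i1 = min(i0 + nb, n)
        Bi = block(B, i0, i1, 0, ncols)
        if i0 > 0:
            # GEMM update using the already solved rows X[0:i0] (stored in B):
            # B_i := B_i - L[i0:i1, 0:i0] * X[0:i0]
            gemm(-1.0, block(L, i0, i1, 0, i0), block(B, 0, i0, 0, ncols), 1.0, Bi)
        # Small triangular solve with the nb x nb diagonal block
        # (forward substitution, i.e. level-2-style work).
        for j in range(ncols):
            for r in range(i1 - i0):
                s = Bi[r][j]
                for c in range(r):
                    s -= L[i0 + r][i0 + c] * Bi[c][j]
                Bi[r][j] = s / L[i0 + r][i0 + r]
        put(B, Bi, i0, 0)
    return B
```

For example, with `L = [[2.0, 0, 0], [1, 3, 0], [4, 5, 6]]`, `B = [[2.0, 4], [10, 14], [49, 64]]` and `nb = 2`, the call `trsm_lower(L, B, 2)` recovers `X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]`. As the matrix grows relative to `nb`, the share of flops done inside `gemm` approaches one, which is why tuning GEMM alone can carry most of the level 3 performance.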