B. Kagstrom et al., GEMM-based level 3 BLAS: High-performance model implementations and performance evaluation benchmark, ACM T MATH, 24(3), 1998, pp. 268-302
The level 3 Basic Linear Algebra Subprograms (BLAS) are designed to perform
various matrix multiply and triangular system solving computations. Due to
the complex hardware organization of advanced computer architectures, the
development of optimal level 3 BLAS code is costly and time consuming.
However, it is possible to develop a portable and high-performance level 3 BLAS
library mainly relying on a highly optimized GEMM, the routine for the
general matrix multiply and add operation. With suitable partitioning, all
the other level 3 BLAS can be defined in terms of GEMM and a small amount of
level 1 and level 2 computations. Our contribution is twofold. First, the
model implementations in Fortran 77 of the GEMM-based level 3 BLAS are
structured to effectively reduce data traffic in a memory hierarchy. Second,
the GEMM-based level 3 BLAS performance evaluation benchmark is a tool for
evaluating and comparing different implementations of the level 3 BLAS with
the GEMM-based model implementations.
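
The idea that a level 3 BLAS operation can be defined in terms of GEMM plus a small amount of lower-level work can be illustrated with a minimal sketch. It is not the Fortran 77 model implementation described in the abstract; it is a pure-Python illustration under stated assumptions. The function names `gemm` and `trmm_lower`, the block size parameter `nb`, and the choice of a triangular matrix multiply (B := L*B with L lower triangular) as the example are all hypothetical and chosen only for clarity. The matrix L is partitioned into block rows: each off-diagonal block is applied with one GEMM call, and only the small diagonal block is handled by a level 2 style triangular loop.

```python
def gemm(alpha, A, B, beta, C):
    """C := alpha*A*B + beta*C on lists of lists.

    Stand-in for a highly optimized GEMM; here just a triple loop.
    """
    for i in range(len(C)):
        for j in range(len(C[0])):
            s = 0.0
            for k in range(len(B)):
                s += A[i][k] * B[k][j]
            C[i][j] = alpha * s + beta * C[i][j]


def trmm_lower(L, B, nb=2):
    """B := L*B in place, with L lower triangular (n x n) and B (n x m).

    Hypothetical GEMM-based formulation: block rows of size nb are
    processed bottom to top, so the rows of B above the current block
    still hold their original values when they are read.
    """
    n = len(L)
    m = len(B[0])
    for i0 in reversed(range(0, n, nb)):
        i1 = min(i0 + nb, n)

        # Diagonal block: small triangular multiply (level 2 style work).
        # Rows are updated bottom to top for the same in-place reason.
        for i in reversed(range(i0, i1)):
            for j in range(m):
                B[i][j] = sum(L[i][k] * B[k][j] for k in range(i0, i + 1))

        # Off-diagonal part: one GEMM call does the bulk of the work,
        # B[i0:i1, :] += L[i0:i1, 0:i0] * B[0:i0, :].
        if i0 > 0:
            L_blk = [row[:i0] for row in L[i0:i1]]
            B_top = B[:i0]
            B_blk = B[i0:i1]  # slices share row objects, so the update is in place
            gemm(1.0, L_blk, B_top, 1.0, B_blk)


L = [[1, 0, 0],
     [2, 3, 0],
     [4, 5, 6]]
B = [[1, 2],
     [3, 4],
     [5, 6]]
trmm_lower(L, B, nb=2)
print(B)
```

In this sketch the fraction of work done inside `gemm` grows with the matrix size relative to `nb`, which is the property that lets a library of this kind inherit most of its performance from a single tuned GEMM routine.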