This companion article discusses portability and optimization issues of the
GEMM-based level 3 BLAS model implementations and the performance evaluation
benchmark. The software is provided in all four data types (single- and
double-precision, real and complex) and is designed to be easy to implement
and use on different platforms. Each of the GEMM-based routines has a few
machine-dependent parameters that specify internal block sizes, cache
characteristics, and branch points for alternative code sections. These
parameters provide a means of adjusting the routines to the characteristics
of a memory hierarchy.
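
To make the role of such machine-dependent parameters concrete, the following
is a minimal sketch, not the article's code. It shows a blocked matrix multiply
in which one parameter sets the internal block size and another sets a branch
point between a blocked and an unblocked code section. The names BLOCK_SIZE and
SMALL_CUTOFF, their values, and both functions are hypothetical and chosen for
illustration only.

```python
# Hypothetical tuning parameters. Names and values are illustrative
# assumptions, not taken from the GEMM-based routines themselves.
BLOCK_SIZE = 32    # edge length of the square blocks intended to fit in cache
SMALL_CUTOFF = 8   # branch point: at or below this size, skip blocking


def matmul_simple(a, b):
    """Unblocked triple loop, used as the code section for small problems."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            aip = a[i][p]
            for j in range(m):
                c[i][j] += aip * b[p][j]
    return c


def matmul_blocked(a, b, block=BLOCK_SIZE, cutoff=SMALL_CUTOFF):
    """Compute C = A * B, choosing a code section by problem size."""
    n, k, m = len(a), len(b), len(b[0])
    # Branch point: small problems do not benefit from blocking.
    if max(n, k, m) <= cutoff:
        return matmul_simple(a, b)
    c = [[0.0] * m for _ in range(n)]
    # Loop over blocks so that each block of A, B and C is reused
    # while it is (ideally) resident in cache.
    for i0 in range(0, n, block):
        for p0 in range(0, k, block):
            for j0 in range(0, m, block):
                for i in range(i0, min(i0 + block, n)):
                    row_c = c[i]
                    for p in range(p0, min(p0 + block, k)):
                        aip = a[i][p]
                        row_b = b[p]
                        for j in range(j0, min(j0 + block, m)):
                            row_c[j] += aip * row_b[j]
    return c
```

In a sketch like this, porting to a new machine would mean changing only the
two constants, for example a larger block size for a larger cache, while the
loop structure stays the same.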