D. Kramer et al., LOCAL BASIC LINEAR ALGEBRA SUBROUTINES (LBLAS) FOR THE CM-5/5E, The international journal of supercomputer applications and high performance computing, 10(4), 1996, pp. 300-335
The Connection Machine Scientific Software Library (CMSSL) is a library
of scientific routines designed for distributed-memory architectures.
The basic linear algebra subroutines (BLAS) of the CMSSL have been
implemented as a two-level structure to exploit optimizations local to
nodes and across nodes. This paper presents the implementation
considerations and performance of the local BLAS, or BLAS local to each
node of the system. A wide variety of loop structures and unrollings
have been implemented in order to achieve uniform and high performance,
irrespective of the data layout in node memory. The CMSSL is the only
existing high-performance library capable of supporting both the
data-parallel and message-passing modes of programming a
distributed-memory computer. The implications of implementing BLAS on
distributed-memory computers are considered in this light.
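
As an illustration of the kind of loop unrolling the abstract refers to, the following is a minimal sketch of a strided AXPY update (y := alpha*x + y) unrolled by four. It is not taken from the CMSSL or from the paper: the function name axpy_unrolled4, the unroll factor, and the element-stride parameters incx and incy (standing in for "data layout in node memory") are all assumptions chosen for the example.

```c
#include <stddef.h>

/*
 * Hypothetical example, not CMSSL code.
 * Computes y[i*incy] += alpha * x[i*incx] for i in [0, n),
 * with the main loop unrolled by four. The strides incx and incy
 * let the same routine walk contiguous or non-contiguous data.
 */
void axpy_unrolled4(size_t n, double alpha,
                    const double *x, size_t incx,
                    double *y, size_t incy)
{
    size_t i = 0;

    /* Main loop: four independent updates per iteration. */
    for (; i + 4 <= n; i += 4) {
        y[(i + 0) * incy] += alpha * x[(i + 0) * incx];
        y[(i + 1) * incy] += alpha * x[(i + 1) * incx];
        y[(i + 2) * incy] += alpha * x[(i + 2) * incx];
        y[(i + 3) * incy] += alpha * x[(i + 3) * incx];
    }

    /* Remainder loop: handles the last n mod 4 elements. */
    for (; i < n; ++i) {
        y[i * incy] += alpha * x[i * incx];
    }
}
```

A library aiming for uniform performance across layouts would typically provide several such variants (for example, a unit-stride version and versions with different unroll factors) and select among them at call time; the single routine above only shows the basic pattern.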