Pb. Vasconcelos and Fd. Dalmeida, COLUMNWISE BLOCK LU FACTORIZATION USING BLAS KERNELS ON VAX 6520/2VP, Computing systems in engineering, 6(4-5), 1995, pp. 423-429
The LU factorization of a matrix A is a widely used algorithm, for
instance in the solution of linear systems Ax = b. The increasing
capacity of high-performance computers allows direct methods to be used
for systems with large, dense matrices. To build portable and efficient
LU codes for vector and parallel computers, the method is rewritten in
block versions, and BLAS (Basic Linear Algebra Subprograms) kernels are
used to mask the architectural details and allow good performance of
codes such as the LAPACK (Linear Algebra PACKage) library. It was shown
in the references that this strategy leads to portable and efficient
codes when tuned BLAS kernels are used. After a short description of
the block versions, we present some results obtained on the VAX
6520/2VP, comparing the block algorithm with the point algorithm, and
vectorized versions with scalar versions. The three columnwise versions
of the block algorithm showed similar performance on this computer for
large matrix dimensions. The block size is a crucial parameter for
these algorithms, and the results show that the best performance is
obtained with block size 64 (for large matrices), which is the vector
register size of the machine used.
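
To illustrate what a block version of LU looks like and where the block size enters, the following is a minimal sketch in plain Python. It is not the authors' code. It assumes a right-looking ordering, which may differ from the three columnwise versions studied in the paper, and it omits pivoting, so it is only safe on matrices with nonzero pivots (for example, diagonally dominant ones). The names `block_lu`, `multiply_lu`, and `nb` are hypothetical. The comments mark the steps that a BLAS-based code would hand to Level 3 kernels such as xTRSM and xGEMM; setting `nb = 1` gives the point algorithm.

```python
def block_lu(a, nb):
    """Factor the n x n matrix a (list of lists) in place into L and U.

    Illustrative sketch only: right-looking block LU WITHOUT pivoting.
    L is stored below the diagonal with an implicit unit diagonal,
    U on and above the diagonal. nb is the block size (assumed >= 1).
    """
    n = len(a)
    for k in range(0, n, nb):
        e = min(k + nb, n)  # one past the last column of this block
        # Step 1: point LU of the panel a[k:n][k:e].
        for j in range(k, e):
            for i in range(j + 1, n):
                a[i][j] /= a[j][j]
                for c in range(j + 1, e):
                    a[i][c] -= a[i][j] * a[j][c]
        # Step 2: U12 = inverse(L11) * A12, a unit lower triangular
        # solve (the role played by xTRSM in a BLAS-based code).
        for j in range(k, e):
            for i in range(j + 1, e):
                for c in range(e, n):
                    a[i][c] -= a[i][j] * a[j][c]
        # Step 3: A22 = A22 - L21 * U12, a matrix-matrix update
        # (the role played by xGEMM in a BLAS-based code).
        for i in range(e, n):
            for j in range(k, e):
                lij = a[i][j]
                for c in range(e, n):
                    a[i][c] -= lij * a[j][c]
    return a


def multiply_lu(lu):
    """Rebuild L * U from the packed factors, to check a factorization."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for p in range(min(i, j) + 1):
                l_ip = 1.0 if p == i else lu[i][p]  # unit diagonal of L
                s += l_ip * lu[p][j]
            out[i][j] = s
    return out
```

In this sketch the block size only changes how the work is grouped, not the factors, so `multiply_lu(block_lu(copy_of_a, nb))` should reproduce the original matrix up to rounding for any `nb`. The performance effect reported in the abstract comes from how well the grouped operations map onto the vector registers, which pure Python cannot show.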