The development of suitable software for exploiting computer capacity
is as important as the advance of computer technology itself. In this
sense, LAPACK (Anderson et al., LAPACK Users' Guide, release 1.0, SIAM,
Philadelphia, 1992) appears to be the most efficient library in the
field of dense linear algebra, obtaining good results on vector and
parallel computers and, in general, on hierarchical memory machines.
However, some deficiencies were detected in the DGETRF subroutine,
which computes the LU decomposition of a general matrix. On
hierarchical memory machines, not only cache and TLB misses have to be
minimized, but also page faults, which cause excessive I/O operations.
The blocking strategy used by DGETRF (and by LAPACK in general) makes
good use of cache memory, but does not seem to be enough to avoid
unnecessary I/O operations. As a result, DGETRF does not provide
satisfactory run times for large matrices. This paper describes a new
code that uses a double blocking strategy and attains better run times
than DGETRF. Copyright (C) 1996.
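
The abstract does not spell out how the double blocking is organized,
so the following is only an illustrative sketch of one way a two-level
blocked LU factorization with partial pivoting could be arranged: an
outer block width (imagined as sized for main memory, to limit paging)
whose panels are in turn factored with a smaller inner block width
(imagined as sized for cache). The function names, the parameters
nb_outer and nb_inner, and the pure-Python list-of-lists storage are
all assumptions made for illustration; this is not the authors' code
and not LAPACK's implementation.

```python
def lu_unblocked(a, piv, lo, hi):
    """Factor columns lo..hi-1 (rows lo..n-1) in place, partial pivoting."""
    n = len(a)
    for j in range(lo, hi):
        # Pick the row with the largest magnitude entry in column j.
        p = max(range(j, n), key=lambda i: abs(a[i][j]))
        if a[p][j] == 0.0:
            raise ValueError("matrix is singular")
        a[j], a[p] = a[p], a[j]  # swap full rows, so all columns follow
        piv[j] = p
        for i in range(j + 1, n):
            a[i][j] /= a[j][j]  # multiplier, stored in place as L
            m = a[i][j]
            for c in range(j + 1, hi):  # update only inside this panel
                a[i][c] -= m * a[j][c]


def lu_blocked(a, piv, lo, hi, nb, panel):
    """Right-looking blocked LU of columns lo..hi-1 with block width nb.

    `panel` factors one block of columns; columns at or beyond hi are
    left untouched here (apart from row swaps) for the caller to update.
    """
    n = len(a)
    for k in range(lo, hi, nb):
        ke = min(k + nb, hi)
        panel(a, piv, k, ke)
        # Rows k..ke-1: unit lower triangular solve giving the U block.
        # Rows ke..n-1: Schur complement update of the trailing columns.
        for i in range(k, n):
            for r in range(k, min(i, ke)):
                m = a[i][r]
                for c in range(ke, hi):
                    a[i][c] -= m * a[r][c]


def lu_double_blocked(a, nb_outer, nb_inner):
    """Two-level blocked LU (hypothetical sketch); returns pivot indices."""
    n = len(a)
    piv = list(range(n))

    def inner(a_, piv_, lo, hi):
        # Each outer panel is itself factored with the smaller block width.
        lu_blocked(a_, piv_, lo, hi, nb_inner, lu_unblocked)

    lu_blocked(a, piv, 0, n, nb_outer, inner)
    return piv
```

On return, the matrix holds the unit lower triangular factor L below
the diagonal and U on and above it, and piv[j] records the row swapped
with row j at step j, so that applying those swaps in order to the
original matrix gives the product L U. A real implementation would
operate on contiguous column-major storage and call tuned BLAS kernels
for the updates; the sketch only shows the nesting of the two block
levels.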