S. Toledo, LOCALITY OF REFERENCE IN LU DECOMPOSITION WITH PARTIAL PIVOTING, SIAM journal on matrix analysis and applications, 18(4), 1997, pp. 1065-1081
This paper presents a new partitioned algorithm for LU decomposition w
ith partial pivoting, The new algorithm, called the recursively partit
ioned algorithm, is based on a recursive partitioning of the matrix. T
he paper analyzes the locality of reference in the new algorithm and t
he locality of reference in a known and widely used partitioned algori
thm for LU decomposition called the right-looking algorithm, The analy
sis reveals that the new algorithm performs a factor of Theta(root M/n
) fewer I/O operations (or cache misses) than the right-looking algori
thm, where n is the order of the matrix and M is the size of primary m
emory. The analysis also determines the optimal block size for the rig
ht-looking algorithm. Experimental comparisons between the new algorit
hm and the right-looking algorithm show that an implementation of the
new algorithm outperforms a similarly coded right-looking algorithm on
six different RISC architectures, that the new algorithm performs few
er cache misses than any other algorithm tested, and that it benefits
more from Strassen's matrix-multiplication algorithm.