COMPILER BLOCKABILITY OF DENSE MATRIX FACTORIZATIONS

Authors
Citation
S. Carr et Rb. Lehoucq, COMPILER BLOCKABILITY OF DENSE MATRIX FACTORIZATIONS, ACM transactions on mathematical software, 23(3), 1997, pp. 336-361
Citations number
38
ISSN journal
00983500
Volume
23
Issue
3
Year of publication
1997
Pages
336 - 361
Database
ISI
SICI code
0098-3500(1997)23:3<336:CBODMF>2.0.ZU;2-E
Abstract
The goal of the LAPACK project is to provide efficient and portable so ftware for dense numerical linear algebra computations. By recasting m any of the fundamental dense matrix computations in terms of calls to an efficient implementation of the BLAS (Basic Linear Algebra Subprogr ams), the LAPACK project has, in large part, achieved its goal. Unfort unately, the efficient implementation of the BLAS results often in mac hine-specific code that is not portable across multiple architectures without a significant loss in performance or a significant effort to r eoptimize them. This article examines whether most of the hand optimiz ations performed on matrix factorization codes are unnecessary because they can (and should) be performed by the compiler. We believe that i t is better for the programmer to express algorithms in a machine-inde pendent form and allow the compiler to handle the machine-dependent de tails. This gives the algorithms portability across architectures and removes the error-prone, expensive, and tedious process of hand optimi zation. Although there currently exist no production compilers that ca n perform all the loop transformations discussed in this article, a de scription of current research in compiler technology is provided that will prove beneficial to the numerical linear algebra community. We sh ow that the Cholesky and optimized automatically by a compiler to be a s efficient as the same hand-optimized version found in LAPACK. We als o show that the QR factorization may be optimized by the compiler to p erform comparably with the hand-optimized LAPACK version on modest mat rix sizes. Our approach allows us to conclude that with the advent of the compiler optimizations discussed in this article, matrix factoriza tions may be efficiently implemented in a BLAS-less form.