The goal of the LAPACK project is to provide efficient and portable software for dense numerical linear algebra computations. By recasting many of the fundamental dense matrix computations in terms of calls to an efficient implementation of the BLAS (Basic Linear Algebra Subprograms), the LAPACK project has, in large part, achieved its goal. Unfortunately, an efficient implementation of the BLAS often results in machine-specific code that is not portable across multiple architectures without a significant loss in performance or a significant reoptimization effort. This article examines whether most of the hand optimizations performed on matrix factorization codes are unnecessary because they can (and should) be performed by the compiler. We believe that it is better for the programmer to express algorithms in a machine-independent form and allow the compiler to handle the machine-dependent details. This gives the algorithms portability across architectures and removes the error-prone, expensive, and tedious process of hand optimization. Although no production compiler currently exists that can perform all of the loop transformations discussed in this article, we describe current research in compiler technology that will prove beneficial to the numerical linear algebra community.
We show that the Cholesky and LU factorizations may be optimized automatically by a compiler to be as efficient as the corresponding hand-optimized versions found in LAPACK. We also show that the QR factorization may be optimized by the compiler to perform comparably with the hand-optimized LAPACK version on matrices of modest size. Our approach allows us to conclude that, with the advent of the compiler optimizations discussed in this article, matrix factorizations may be implemented efficiently in a BLAS-less form.
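
As a concrete illustration of the "BLAS-less form" advocated above, the following C sketch (our own, not drawn from the article) shows an unblocked Cholesky factorization written as a plain, machine-independent loop nest. A restructuring compiler of the kind discussed in this article could tile these loops for the memory hierarchy automatically, rather than the programmer replacing them with calls to hand-tuned BLAS. The routine name cholesky_unblocked, the column-major storage convention, and the error-return convention are assumptions made for this sketch only.

/*
 * Sketch of a machine-independent, BLAS-less Cholesky factorization.
 * Factors the symmetric positive-definite matrix A (column-major,
 * leading dimension lda) in place so that A = L * L^T, referencing
 * only the lower triangle. Returns 0 on success, or k+1 if the
 * leading minor of order k+1 is not positive definite.
 */
#include <math.h>
#include <stddef.h>

int cholesky_unblocked(size_t n, double *A, size_t lda)
{
    for (size_t k = 0; k < n; k++) {
        double d = A[k + k * lda];
        if (d <= 0.0)
            return (int)(k + 1);        /* not positive definite */
        d = sqrt(d);
        A[k + k * lda] = d;
        for (size_t i = k + 1; i < n; i++)   /* scale column k */
            A[i + k * lda] /= d;
        for (size_t j = k + 1; j < n; j++)   /* rank-1 update of the */
            for (size_t i = j; i < n; i++)   /* trailing submatrix   */
                A[i + j * lda] -= A[i + k * lda] * A[j + k * lda];
    }
    return 0;
}

In LAPACK terms, the innermost two loops perform the rank-1 update that the hand-blocked code instead expresses through BLAS calls; the article's thesis is that a compiler can recover the blocked, cache-friendly version from this simple form automatically.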