The goal of the LAPACK project is to provide efficient and portable software for dense numerical linear algebra computations. By recasting many of the fundamental dense matrix computations in terms of calls to an efficient implementation of the BLAS (Basic Linear Algebra Subprograms), the LAPACK project has, in large part, achieved its goal. Unfortunately, an efficient implementation of the BLAS often results in machine-specific code that is not portable across multiple architectures without a significant loss in performance or a significant reoptimization effort. This article examines whether most of the hand optimizations performed on matrix factorization codes are unnecessary because they can (and should) be performed by the compiler. We believe that it is better for the programmer to express algorithms in a machine-independent form and allow the compiler to handle the machine-dependent details. This gives the algorithms portability across architectures and removes the error-prone, expensive, and tedious process of hand optimization. Although no production compiler currently exists that can perform all of the loop transformations discussed in this article, we describe current research in compiler technology that will prove beneficial to the numerical linear algebra community.
We show that the Cholesky and LU factorizations may be optimized automatically by a compiler to be as efficient as the corresponding hand-optimized versions found in LAPACK. We also show that the QR factorization may be optimized by the compiler to perform comparably with the hand-optimized LAPACK version on matrices of modest size. Our approach allows us to conclude that, with the advent of the compiler optimizations discussed in this article, matrix factorizations may be implemented efficiently in a BLAS-less form.
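
As a concrete illustration of the "BLAS-less form" advocated above, the following C sketch (our own, not drawn from the article) shows an unblocked Cholesky factorization written as a plain, machine-independent loop nest. A restructuring compiler of the kind discussed in this article could tile these loops for the memory hierarchy automatically, rather than the programmer replacing them with calls to hand-tuned BLAS. The routine name cholesky_unblocked, the column-major storage convention, and the error-return convention are assumptions made for this sketch only.

/*
 * Sketch of a machine-independent, BLAS-less Cholesky factorization.
 * Factors the symmetric positive-definite matrix A (column-major,
 * leading dimension lda) in place so that A = L * L^T, referencing
 * only the lower triangle. Returns 0 on success, or k+1 if the
 * leading minor of order k+1 is not positive definite.
 */
#include <math.h>
#include <stddef.h>

int cholesky_unblocked(size_t n, double *A, size_t lda)
{
    for (size_t k = 0; k < n; k++) {
        double d = A[k + k * lda];
        if (d <= 0.0)
            return (int)(k + 1);        /* not positive definite */
        d = sqrt(d);
        A[k + k * lda] = d;
        for (size_t i = k + 1; i < n; i++)   /* scale column k */
            A[i + k * lda] /= d;
        for (size_t j = k + 1; j < n; j++)   /* rank-1 update of the */
            for (size_t i = j; i < n; i++)   /* trailing submatrix   */
                A[i + j * lda] -= A[i + k * lda] * A[j + k * lda];
    }
    return 0;
}

In LAPACK terms, the innermost two loops perform the rank-1 update that the hand-blocked code instead expresses through BLAS calls; the article's thesis is that a compiler can recover the blocked, cache-friendly version from this simple form automatically.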