This paper considers key ideas in the design of out-of-core dense LU factorization routines. A left-looking variant of the LU factorization algorithm is shown to require less I/O to disk than the right-looking variant, and is used to develop a parallel, out-of-core implementation. This implementation makes use of a small library of parallel I/O routines, together with ScaLAPACK and PBLAS routines. Results for runs on an Intel Paragon are presented and interpreted using a simple performance model.
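
As a rough illustration of the left-looking ordering mentioned above, the following minimal sketch factors a small in-memory matrix one column at a time. It is not the parallel, out-of-core implementation described here: it is serial, works on single columns rather than block panels, omits pivoting for brevity, and the function name `left_looking_lu` is hypothetical. The point it shows is that column j is only read and modified when its turn comes, at which time all updates from the columns to its left are applied at once, so each column is written exactly once.

```python
def left_looking_lu(a):
    """Left-looking (column-oriented) LU factorization without pivoting.

    Assumption: every leading principal minor of `a` is nonsingular, so no
    zero pivot is met. Returns a new matrix holding the strictly lower part
    of L (unit diagonal implied) and the upper triangle U in packed form.
    """
    n = len(a)
    lu = [list(row) for row in a]  # work on a copy of the input

    for j in range(n):
        # Apply the deferred updates from every column k to the left of j.
        # For i <= j this completes column j of U; for i > j it updates the
        # part of column j that will become column j of L.
        for k in range(j):
            for i in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]

        # Scale the subdiagonal entries by the pivot to form column j of L.
        pivot = lu[j][j]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        for i in range(j + 1, n):
            lu[i][j] /= pivot

    return lu


if __name__ == "__main__":
    A = [[4.0, 3.0, 2.0],
         [8.0, 8.0, 5.0],
         [4.0, 7.0, 9.0]]
    for row in left_looking_lu(A):
        print(row)
```

In an out-of-core setting the column here would correspond to a panel of columns held in memory, with the columns to its left read back from disk as needed; a right-looking variant would instead read and rewrite the whole trailing submatrix after each panel is factored.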