In many scientific computing problems, the overall execution time is dominated by the time to solve very large linear systems. Quite often, the matrices are unsymmetric and ill-conditioned, with an irregular sparsity structure reflecting the irregular refinement in the discretization grid. With increasing problem size and dimension, direct solvers cannot be used because of their huge memory requirements. The performance of preconditioned iterative solvers is largely dominated by memory-related aspects such as memory size, bandwidth, and indirect-addressing speed.
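The role of indirect addressing is easiest to see in the sparse matrix-vector product at the core of every iterative method. The following sketch is illustrative only (it is not PILS code; the CSR layout and the name csr_matvec are assumptions for this example): in a compressed-sparse-row multiply, every entry of x is fetched through an index array, so the loop's speed is set by memory bandwidth and by how quickly the machine resolves indirect loads.

#include <stddef.h>

/* Sketch of a compressed sparse row (CSR) matrix-vector product y = A*x.
 * The load x[col[k]] is the indirect access whose cost, together with
 * memory bandwidth, tends to dominate sparse iterative solver kernels. */
void csr_matvec(size_t n, const size_t *row_ptr, const size_t *col,
                const double *val, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col[k]];   /* indirect addressing */
        y[i] = sum;
    }
}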
This article summarizes the authors' experience with the relationship between memory aspects and performance in real applications in the domain of very large scale integration (VLSI) device simulation. The authors analyze the storage requirements of direct and iterative solvers on a statistical data set, and demonstrate performance variations due to memory-related architectural features on a range of computers, from workstations to Cray, NEC, and Fujitsu supercomputers, for both typical and ill-conditioned linear systems, using different iterative methods and preconditioners.
The experiments were carried out with PILS, a package of iterative linear solvers. PILS implements a large number of iterative methods and preconditioners and allows them to be combined in a flexible way.
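As an illustration of the kind of flexibility described (the sketch below uses hypothetical names and is not the PILS interface), a solver can keep the iterative method independent of both the matrix format and the preconditioner by reaching them only through function pointers; here a preconditioned Richardson iteration, x <- x + M^{-1}(b - A*x), stands in for the more elaborate methods PILS provides.

#include <math.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical interfaces, for illustration only (not the PILS API):
 * the method sees the matrix A and the preconditioner M only through
 * function pointers, so methods and preconditioners combine freely. */
typedef void (*matvec_fn)(const double *x, double *y, size_t n, const void *A);
typedef void (*precond_fn)(const double *r, double *z, size_t n, const void *M);

/* Preconditioned Richardson iteration: x <- x + M^{-1}(b - A*x).
 * Returns the number of iterations performed. */
int richardson(matvec_fn apply_A, const void *A,
               precond_fn apply_M, const void *M,
               const double *b, double *x, size_t n,
               double tol, int max_iter)
{
    double *r = malloc(n * sizeof *r);
    double *z = malloc(n * sizeof *z);
    int it;
    for (it = 0; it < max_iter; ++it) {
        double norm2 = 0.0;
        apply_A(x, r, n, A);                  /* r = A*x     */
        for (size_t i = 0; i < n; ++i) {
            r[i] = b[i] - r[i];               /* r = b - A*x */
            norm2 += r[i] * r[i];
        }
        if (sqrt(norm2) < tol)                /* converged   */
            break;
        apply_M(r, z, n, M);                  /* z = M^{-1}*r */
        for (size_t i = 0; i < n; ++i)
            x[i] += z[i];                     /* x = x + z   */
    }
    free(r);
    free(z);
    return it;
}

Because apply_A and apply_M are interchangeable arguments, any matrix kernel (for example, the CSR product sketched earlier) can be paired with any preconditioner without changing the method itself.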