It is widely acknowledged in high-performance computing circles that
parallel input/output needs substantial improvement in order to make
scalable computers truly usable. We present a data storage model that
allows processors independent access to their own data and a
corresponding compilation strategy that integrates data-parallel
computation with data distribution for out-of-core problems. Our
results compare several communication methods and I/O optimizations
using two out-of-core problems, Jacobi iteration and LU factorization.