I/O in computer systems is prone to becoming a bottleneck. This is a
particularly severe problem in highly parallel machines, where some
applications are fully I/O bound if only one or a few conventional I/O
paths exist. Just as multiprocessor technology is used to increase
processing performance, disk I/O performance can be substantially
improved by employing parallel I/O schemes. Based on a distributed I/O
architecture for parallel computers, we propose to use disk caches on
several architectural levels, and support this proposal with
simulations of various structural options. In this paper, we describe
the cache modelling approach and the I/O load model, which has been
derived from transaction-processing and general-purpose applications.
Then we discuss the results for caches on single and multiple
architecture levels. Large caches on
I/O processors in combination with small caches on processing elements
turn out to be the preferable structure. In addition, hardware caches
can be employed at disk level for further performance improvement.
For write operations, a delayed write strategy is shown to be superior
to other modes.
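
To make the write-mode comparison concrete, the following is a minimal sketch of a toy block cache that counts physical disk writes under write-through and under delayed write. It is not the simulator used in this work: the class `BlockCache`, the helper `count_disk_writes`, the LRU replacement policy, the cache capacity, and the write trace are all assumptions chosen for illustration, and the measured advantage in practice depends on the I/O load model.

```python
from collections import OrderedDict


class BlockCache:
    """Toy LRU block cache (hypothetical; for illustration only)."""

    def __init__(self, capacity, delayed_write):
        self.capacity = capacity
        self.delayed_write = delayed_write
        self.blocks = OrderedDict()  # block number -> dirty flag, LRU first
        self.disk_writes = 0

    def write(self, block):
        # Remove any existing entry so re-insertion marks it most recent.
        self.blocks.pop(block, None)
        if self.delayed_write:
            # Delayed write: only mark the block dirty, defer the disk access.
            self.blocks[block] = True
        else:
            # Write-through: every write goes to disk immediately.
            self.disk_writes += 1
            self.blocks[block] = False
        self._evict()

    def _evict(self):
        # Drop least recently used blocks; dirty ones must reach the disk.
        while len(self.blocks) > self.capacity:
            _, dirty = self.blocks.popitem(last=False)
            if dirty:
                self.disk_writes += 1

    def flush(self):
        # Write back all remaining dirty blocks.
        for block in list(self.blocks):
            if self.blocks[block]:
                self.disk_writes += 1
                self.blocks[block] = False


def count_disk_writes(trace, capacity, delayed_write):
    """Replay a trace of block writes and return the number of disk writes."""
    cache = BlockCache(capacity, delayed_write)
    for block in trace:
        cache.write(block)
    cache.flush()
    return cache.disk_writes


if __name__ == "__main__":
    trace = [1, 2, 1, 2, 1, 2, 3, 1]  # assumed trace with two hot blocks
    print("write-through:", count_disk_writes(trace, 2, delayed_write=False))
    print("delayed write:", count_disk_writes(trace, 2, delayed_write=True))
```

In this toy model, repeated writes to the same cached block are absorbed by the cache under delayed write, so the saving grows with the write locality of the trace.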