We present an analytical model of a parallel computing system. Since the probability of fault occurrence is non-negligible, the model takes fault-tolerance issues into consideration by combining results obtained from a performance model with those of a fault/repair model. To this end, the system performance must be evaluated under several different configurations caused by the occurrence of faults and repairs, which requires efficient solution techniques for the performance model. The model we adopt is based on an extended queueing network. The queueing network includes a finite-capacity fork/join subnetwork and three different blocking models to manage saturation conditions: blocking before service (BBS), repetitive service, and blocking after service. We prove that the underlying Markov process has a particular structure suitable for efficient solution.
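One common way to combine a performance model with a fault/repair model is to weight the performance of each configuration by the steady-state probability of that configuration. The sketch below illustrates this idea under assumptions that are not taken from the abstract: a hypothetical fault/repair model in which each of `n` processors fails independently at rate `fail_rate` and a single repair facility restores one processor at a time at rate `repair_rate`. The per-configuration performance `perf(k)` is a placeholder for whatever measure the queueing-network model would produce with `k` working processors. All function names are illustrative, and this is not presented as the authors' exact method.

```python
from typing import Callable, List


def repair_chain_steady_state(n: int, fail_rate: float, repair_rate: float) -> List[float]:
    """Steady-state probability pi[k] of having k working processors (k = 0..n).

    Hypothetical fault/repair model (an assumption, not from the paper):
    each working processor fails independently at rate fail_rate, and a
    single repair facility restores one processor at a time at rate
    repair_rate. This yields a birth-death chain on k = 0..n.
    """
    # Detailed balance between states k-1 and k:
    #   pi[k-1] * repair_rate == pi[k] * k * fail_rate
    weights = [0.0] * (n + 1)
    weights[n] = 1.0  # unnormalized weight of the fully operational state
    for k in range(n, 0, -1):
        weights[k - 1] = weights[k] * k * fail_rate / repair_rate
    total = sum(weights)
    return [w / total for w in weights]


def performability(n: int, fail_rate: float, repair_rate: float,
                   perf: Callable[[int], float]) -> float:
    """Expected performance: sum over configurations k of pi[k] * perf(k).

    perf(k) stands in for the performance measure obtained by solving the
    performance model with k working processors (a placeholder here).
    """
    pi = repair_chain_steady_state(n, fail_rate, repair_rate)
    return sum(p * perf(k) for k, p in enumerate(pi))


if __name__ == "__main__":
    # Placeholder per-configuration throughput: linear in working processors.
    def throughput(k: int) -> float:
        return 10.0 * k

    print(performability(4, fail_rate=0.01, repair_rate=1.0, perf=throughput))
```

Keeping the performance solution behind the `perf` callback mirrors the separation described above: each configuration requires its own performance evaluation, which is why an efficient solution of the performance model matters.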
To illustrate a possible use of such a model, we present numerical results for a particular maintenance policy, seeking the optimal trade-off between the frequency of service interruptions due to repair operations and the need to avoid excessive performance degradation. (C) 1999 Elsevier Science B.V. All rights reserved.