An analytical model for a parallel fault-tolerant computing system

Citation
Vd. Persone and V. Grassi, An analytical model for a parallel fault-tolerant computing system, PERF EVAL, 38(3-4), 1999, pp. 201-218
Number of citations
20
Subject Categories
Computer Science & Engineering
Journal title
PERFORMANCE EVALUATION
Journal ISSN
0166-5316
Volume
38
Issue
3-4
Year of publication
1999
Pages
201 - 218
Database
ISI
SICI code
0166-5316(199912)38:3-4<201:AAMFAP>2.0.ZU;2-K
Abstract
We present an analytical model of a parallel computing system. Since the probability of fault occurrence is non-negligible, the model takes into consideration fault-tolerance issues, by combining results obtained from a performance model with a fault/repair model. To this purpose, the system performance must be evaluated under several different configurations, caused by the occurrence of faults and repairs. This requires efficient solution techniques of the performance model. The model we adopt is based on an extended queueing network. The queueing network includes a fork/join subnetwork with finite capacity, and three different blocking models to manage saturation condition: Blocking Before Service (BBS), Repetitive Service or Blocking After Service. We prove that the underlying Markov process has a particular structure suitable for efficient solution. To show a possible use of such a model, we present numerical results for a particular maintenance policy, looking for the optimal trade-off between the frequency of service interruption due to repair operations and the need of avoiding excessive performance degradation. (C) 1999 Elsevier Science B.V. All rights reserved.
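
The idea of combining a performance model with a fault/repair model can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the fault/repair model is a simple birth-death Markov chain over the number of working processors, with a per-processor failure rate and a single repair facility, and the per-configuration throughputs are placeholder numbers standing in for the output of a performance model. The paper's own performance model, an extended queueing network with a fork/join subnetwork and blocking, is not reproduced here.

```python
from math import factorial


def steady_state(n, fail_rate, repair_rate):
    """Steady-state distribution over k = 0..n working processors.

    Assumed (hypothetical) fault/repair model: each working processor
    fails independently at rate fail_rate, and one repair facility
    restores one processor at a time at rate repair_rate.
    The balance equations pi[k] * repair_rate = pi[k+1] * (k+1) * fail_rate
    give pi[k] proportional to (repair_rate / fail_rate)**k / k!.
    """
    rho = repair_rate / fail_rate
    weights = [rho**k / factorial(k) for k in range(n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_reward(probs, rewards):
    """Weight the performance of each configuration by its probability."""
    return sum(p * r for p, r in zip(probs, rewards))


# Placeholder inputs, not values from the paper.
n = 4
fail_rate = 0.01
repair_rate = 1.0
# throughput[k]: performance with k working processors (hypothetical).
throughput = [0.0, 0.9, 1.7, 2.4, 3.0]

probs = steady_state(n, fail_rate, repair_rate)
overall = expected_reward(probs, throughput)
print(overall)
```

In this sketch the performance model has to be solved once per configuration (once per entry of `throughput`), which is why the abstract stresses the need for an efficient solution technique.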