This paper describes the fault tolerance features of a multiprocessor
system called SPAX (Scalable Parallel Architecture based on X-bar
network). It aims to provide a cost-effective, reliable multiprocessor system for both
scientific and business applications. The system can be composed of up
to sixteen clusters. Each cluster consists of eight nodes, which can be
any combination of processing nodes, input/output nodes, and
communication nodes. The system is designed to eliminate potential single
points of failure, such as loss of a processor, loss of a network, or loss
of a disk drive. Xcent-Net, a duplicated hierarchical crossbar
interconnection network built into the system, supports dual paths
to every node with high bandwidth and low latency. Each node is
designed to support multi-level fault tolerance, enabling a user to
choose the level of fault tolerance at a possible cost in resources or
performance. The system has been implemented at ETRI. (C) 1997 Elsevier
Science B.V.