Yh. Choi and Ys. Kim, A FAULT-TOLERANT HIERARCHICAL DIAGNOSTIC NETWORK FOR MASSIVELY-PARALLEL PROCESSING SYSTEMS, Computers & Electrical Engineering, 24(5), 1998, pp. 349-361
Massively parallel processing systems consist of a large number of
processing nodes to provide high performance primarily for
data-intensive applications. In a system of such dimensions, high
availability cannot be achieved without relying on redundancy and
reconfiguration. An important aspect of highly available design is
rapid diagnosis and graceful degradation in the event of failures.
This paper presents a hierarchical diagnostic network for locating
faults in parallel processor systems comprised of a large number of
identical processing nodes. In the case of a single fault, the network
can locate the fault at the time it is detected. Even in the case of
multiple faults, it can significantly reduce the test time as compared
to the well-known binary search. Unlike the existing self-diagnostic
circuits, the diagnostic network requires small hardware overhead and
may tolerate a fault in the network itself. (C) 1998 Elsevier Science
Ltd. All rights reserved.
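
For context, the binary-search baseline that the abstract compares against can be sketched roughly as follows. This is an illustrative sketch only and is not the paper's diagnostic network, whose design the abstract does not describe. The names `locate_faults` and `make_group_test`, and the assumption that a contiguous group of nodes can be checked with one pass/fail query, are hypothetical and introduced here purely for illustration.

```python
from typing import Callable, List, Set, Tuple


def locate_faults(
    n: int, group_test: Callable[[int, int], bool]
) -> Tuple[List[int], int]:
    """Locate faulty nodes among nodes 0..n-1 by repeated halving.

    group_test(lo, hi) is an assumed oracle that returns True when at
    least one node with index in [lo, hi) is faulty.
    Returns (faulty node indices, number of group tests performed).
    """
    faulty: List[int] = []
    tests = 0
    stack = [(0, n)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        tests += 1
        if not group_test(lo, hi):
            continue  # whole group is fault-free, nothing to split
        if hi - lo == 1:
            faulty.append(lo)  # group narrowed down to a single node
            continue
        mid = (lo + hi) // 2
        # Push the right half first so the left half is examined first.
        stack.append((mid, hi))
        stack.append((lo, mid))
    return faulty, tests


def make_group_test(faults: Set[int]) -> Callable[[int, int], bool]:
    """Build a simulated pass/fail oracle from a known set of faulty nodes."""
    def group_test(lo: int, hi: int) -> bool:
        return any(lo <= f < hi for f in faults)
    return group_test


if __name__ == "__main__":
    found, used = locate_faults(16, make_group_test({5}))
    print(found, used)
```

In this sketch the number of tests grows with both the logarithm of the node count and the number of faults, which is the test-time cost that the abstract says the hierarchical network reduces.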