A FAULT-TOLERANT HIERARCHICAL DIAGNOSTIC NETWORK FOR MASSIVELY-PARALLEL PROCESSING SYSTEMS

Authors
Yh. Choi, Ys. Kim
Citation
Yh. Choi and Ys. Kim, A FAULT-TOLERANT HIERARCHICAL DIAGNOSTIC NETWORK FOR MASSIVELY-PARALLEL PROCESSING SYSTEMS, Computers & Electrical Engineering, 24(5), 1998, pp. 349-361
Number of citations
6
Subject Categories
Computer Science Interdisciplinary Applications; Computer Science Hardware & Architecture; Engineering, Electrical & Electronic
Journal ISSN
0045-7906
Volume
24
Issue
5
Year of publication
1998
Pages
349 - 361
Database
ISI
SICI code
0045-7906(1998)24:5<349:AFHDNF>2.0.ZU;2-R
Abstract
Massively parallel processing systems consist of a large number of processing nodes to provide high performance, primarily for data-intensive applications. In a system of such dimensions, high availability cannot be achieved without relying on redundancy and reconfiguration. An important aspect of highly available design is rapid diagnosis and graceful degradation in the event of failures. This paper presents a hierarchical diagnostic network for locating faults in parallel processor systems composed of a large number of identical processing nodes. In the case of a single fault, the network can locate the fault at the time it is detected. Even in the case of multiple faults, it can significantly reduce the test time compared with the well-known binary search. Unlike existing self-diagnostic circuits, the diagnostic network requires only a small hardware overhead and may tolerate a fault in the network itself. (C) 1998 Elsevier Science Ltd. All rights reserved.
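The abstract measures the diagnostic network against the well-known binary search. For reference only, here is a minimal sketch of that baseline, not of the paper's hierarchical network. It assumes a hypothetical group-test oracle `test(group)` that reports whether at least one node in a group is faulty. It isolates faults by adaptive halving and counts the group tests used.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical group-test oracle (an assumption, not from the paper):
# returns True if at least one node in the given group is faulty.
GroupTest = Callable[[Sequence[int]], bool]


def locate_faults(nodes: Sequence[int], test: GroupTest) -> Tuple[List[int], int]:
    """Binary-search (adaptive halving) fault location.

    Returns the sorted faulty node ids and the number of group tests used.
    """
    faulty: List[int] = []
    tests = 0
    stack = [list(nodes)]          # groups still to be tested
    while stack:
        group = stack.pop()
        tests += 1
        if not test(group):
            continue               # whole group passes: no fault inside
        if len(group) == 1:
            faulty.append(group[0])  # a failing single node is faulty
            continue
        mid = len(group) // 2      # split a failing group into two halves
        stack.append(group[mid:])
        stack.append(group[:mid])
    return sorted(faulty), tests
```

With one faulty node among 1024, this baseline needs 21 group tests (1 + 2·log2 1024), because each level tests both halves. The abstract states that for a single fault the hierarchical network locates it at detection time.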