A reconfigurable fault tolerant system achieves the attributes of depe
ndability of operations through fault detection, fault isolation and r
econfiguration, typically referred to as the FDIR paradigm. Fault diag
nosis is a key component of this approach, requiring an accurate deter
mination of the health and state of the system. An imprecise state ass
essment can lead to catastrophic failure due to an optimistic diagnosi
s, or conversely, result in underutilization of resources because of a
pessimistic diagnosis. Differing from classical testing and other off
-line diagnostic approaches, we develop procedures for maximal utiliza
tion of the system state information to provide for continual, on-line
diagnosis and reconfiguration capabilities as an integral part of the
system operations. Our diagnosis approach, unlike existing techniques
, does not require administered testing to gather syndrome information
but is based on monitoring the system message traffic among redundant
system functions. We present comprehensive on-line diagnosis algorith
ms capable of handling a continuum of faults of varying severity at th
e node and link level. Not only are the proposed algorithms on-line in
nature, but are themselves tolerant to faults in the diagnostic proce
ss. Formal analysis is presented for all proposed algorithms. These pr
oofs offer both insight into the algorithm operations and facilitate a
rigorous formal verification of the developed algorithms.