As multiprocessor systems become more complex, their reliability will need
to increase as well. In this paper we propose a novel technique which is
applicable to a wide variety of distributed real-time systems, especially
those exhibiting data parallelism. System-level fault tolerance involves
reliability techniques incorporated within the system hardware and software
whereas application-level fault tolerance involves reliability techniques
incorporated within the application software. We assert that, for high
reliability, a combination of system-level fault tolerance and
application-level fault tolerance works best. In many systems,
application-level fault tolerance can be used to bridge the gap when
system-level fault tolerance alone does not provide the required
reliability. We exemplify this with the RTHT target tracking benchmark and
the ABF beamforming benchmark.