LESSONS FROM FTM - AN EXPERIMENT IN THE DESIGN AND IMPLEMENTATION OF A LOW-COST FAULT-TOLERANT SYSTEM

Authors

MULLER G BANATRE M PEYROUZE N ROCHAT B

Citation

G. Muller et al., LESSONS FROM FTM - AN EXPERIMENT IN THE DESIGN AND IMPLEMENTATION OF A LOW-COST FAULT-TOLERANT SYSTEM, IEEE transactions on reliability, 45(2), 1996, pp. 332-340

Citations number

Subject Categories

Computer Sciences; Engineering, Electrical & Electronic; Computer Science, Hardware & Architecture; Computer Science, Software, Graphics, Programming

Journal title

IEEE transactions on reliability

ISSN journal

0018-9529

Volume

45

Issue

2

Year of publication

1996

Pages

332 - 340

Database

ISI

SICI code

0018-9529(1996)45:2<332:LFF-AE>2.0.ZU;2-Y

Abstract

This paper describes an experiment in the design of a general-purpose fault-tolerant system, FTM. The main objective of the FTM design was to implement a low-cost fault-tolerant system that could be used on standard workstations. At the operating system level, our goal was to offer fault-tolerance transparency to user applications. In other words, porting an application to FTM need only require compiling the source code without having to modify it. These objectives were achieved using the Mach micro-kernel and a modular set of reliable servers which implement application checkpoints and provide continuous system functions despite machine crashes. At the architectural level, our approach relies on a high-performance stable storage implementation, called Stable Transactional Memory (STM), which can be implemented either by hardware or software. We first motivate our design choices, then we detail the FTM implementation at both architectural and operating system level. We discuss the reasons for the evolution of our stable memory technology from hardware to software. We evaluate the performance of the FTM prototype. We conclude with lessons learned and give some assessments.