LESSONS FROM FTM - AN EXPERIMENT IN THE DESIGN AND IMPLEMENTATION OF A LOW-COST FAULT-TOLERANT SYSTEM

Citation
G. Muller et al., LESSONS FROM FTM - AN EXPERIMENT IN THE DESIGN AND IMPLEMENTATION OF A LOW-COST FAULT-TOLERANT SYSTEM, IEEE Transactions on Reliability, 45(2), 1996, pp. 332-340
Number of citations
20
Subject Categories
Computer Sciences; Engineering, Electrical & Electronic; Computer Science, Hardware & Architecture; Computer Science, Software, Graphics, Programming
ISSN journal
0018-9529
Volume
45
Issue
2
Year of publication
1996
Pages
332 - 340
Database
ISI
SICI code
0018-9529(1996)45:2<332:LFF-AE>2.0.ZU;2-Y
Abstract
This paper describes an experiment in the design of a general-purpose fault-tolerant system, FTM. The main objective of the FTM design was to implement a low-cost fault-tolerant system that could be used on standard workstations. At the operating system level, our goal was to offer fault-tolerance transparency to user applications. In other words, porting an application to FTM need only require compiling the source code without having to modify it. These objectives were achieved using the Mach micro-kernel and a modular set of reliable servers which implement application checkpoints and provide continuous system functions despite machine crashes. At the architectural level, our approach relies on a high-performance stable storage implementation, called Stable Transactional Memory (STM), which can be implemented either by hardware or software. We first motivate our design choices, then we detail the FTM implementation at both architectural and operating system level. We discuss the reasons for the evolution of our stable memory technology from hardware to software. We evaluate the performance of the FTM prototype. We conclude with lessons learned and give some assessments.