Selective checkpointing and rollbacks in multi-threaded object-oriented environment

Authors

Kasbekar, M; Narayanan, C; Das, CR

Citation

M. Kasbekar et al., Selective checkpointing and rollbacks in multi-threaded object-oriented environment, IEEE RELIAB, 48(4), 1999, pp. 325-337

Citations number

Subject categories

Electrical & Electronics Engineering

Journal title

IEEE TRANSACTIONS ON RELIABILITY

ISSN journal

0018-9529

Volume

48

Issue

4

Year of publication

1999

Pages

325 - 337

Database

ISI

SICI code

0018-9529(199912)48:4<325:SCARIM>2.0.ZU;2-F

Abstract

This paper presents selective checkpointing and rollback schemes for MT-OO (multithreaded, object-oriented) programs. There is a need for checkpointing mechanisms that are more sophisticated than traditional process-level checkpointing. The program model, theoretical foundations, and an implementation of the selective checkpointing and rollback schemes are described. The usefulness of the schemes is demonstrated by using them to implement a higher-level fault-tolerance scheme, conversations. The performance implications are studied on a prototype Internet e-commerce server. Using the selective schemes in the prototype server appreciably reduced the loss of work in the presence of faults, and the benefits are more pronounced at higher levels of concurrency in the server. In the presence of faults, the selective scheme usually outperforms the hypothetical zero-cost global scheme with respect to completion times. The experiments also show the vast difference between the sizes of selective checkpoints and global checkpoints. The concurrent sessions scheme (based on the concept of relaxed conversations) required 160 checkpoints in less than an hour. Traditionally, such a scheme would be considered outrageous, but the selective schemes still improve performance in the presence of faults.

The main contribution of this paper is that it brings forward an OO (object-oriented) approach to checkpointing. Not only does the program model separate program state from process state, but it also allows one to identify the state associated with each individual thread of the MT program. The prototype showed that this abstract knowledge about the program state can be made available at runtime in the form of suitable data structures. The availability of this information at runtime fuels the design of selective schemes.
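
The idea of checkpointing and rolling back only the state tied to one thread can be illustrated with a minimal sketch. This is not the authors' implementation: the `ObjectStore` class, its methods, and the session names are hypothetical, and it assumes each piece of program state is owned by exactly one thread and can be deep-copied.

```python
import copy


class ObjectStore:
    """Hypothetical registry that records which thread owns which objects."""

    def __init__(self):
        self._objects = {}      # thread_id -> {object name: state}
        self._checkpoints = {}  # thread_id -> saved copy of that thread's state

    def put(self, thread_id, name, value):
        self._objects.setdefault(thread_id, {})[name] = value

    def get(self, thread_id, name):
        return self._objects[thread_id][name]

    def checkpoint(self, thread_id):
        # Selective checkpoint: copy only the state owned by one thread,
        # rather than the state of the whole process.
        self._checkpoints[thread_id] = copy.deepcopy(
            self._objects.get(thread_id, {})
        )

    def rollback(self, thread_id):
        # Selective rollback: restore one thread's state and leave the
        # state of every other thread untouched.
        self._objects[thread_id] = copy.deepcopy(self._checkpoints[thread_id])


store = ObjectStore()
store.put("session-A", "cart", ["book"])
store.put("session-B", "cart", ["pen"])

store.checkpoint("session-A")                          # save session A only
store.put("session-A", "cart", ["book", "corrupted"])  # fault hits session A
store.put("session-B", "cart", ["pen", "ink"])         # session B keeps working

store.rollback("session-A")                            # undo session A only
print(store.get("session-A", "cart"))
print(store.get("session-B", "cart"))
```

In this sketch the rollback of session A does not discard the work done by session B after the checkpoint, which is the kind of reduction in lost work the abstract attributes to the selective schemes. A global checkpoint would instead have to copy and restore both sessions.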