Enhancing replica management services to cope with group failures

Authors

Ezhilchelvan, PD; Shrivastava, SK

Citation

P.D. Ezhilchelvan and S.K. Shrivastava, Enhancing replica management services to cope with group failures, LECT N COMP, 1752, 2000, pp. 79-103

Subject categories

Current Book Contents

Journal title

ADVANCES IN DISTRIBUTED SYSTEMS

ISSN journal

03029743

Volume

1752

Year of publication

2000

Pages

79 - 103

Database

ISI

SICI code

0302-9743(2000)1752:<79:ERMSTC>2.0.ZU;2-7

Abstract

In a distributed system, replication of components, such as objects, is a well known way of achieving availability. For increased availability, crashed and disconnected components must be replaced by new components on available spare nodes. This replacement results in the membership of the replicated group 'walking' over a number of machines during system operation. In this context, we address the problem of reconfiguring a group after the group as an entity has failed. Such a failure is termed a group failure which, for example, can be the crash of every component in the group or the group being partitioned into minority islands. The solution assumes crash-proof storage, and eventual recovery of crashed nodes and healing of partitions. It guarantees that (i) the number of groups reconfigured after a group failure is never more than one, and (ii) the reconfigured group contains a majority of the components which were members of the group just before the group failure occurred, so that the loss of state information due to a group failure is minimal. Though the protocol is subject to blocking, it remains efficient in terms of communication rounds and use of stable store, during both normal operations and reconfiguration after a group failure.
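
The abstract does not describe the protocol itself, so the following is only a minimal sketch of guarantee (ii) and of why it implies guarantee (i). It assumes each recovering component can read the last group membership from its crash-proof storage. The function name `can_reconfigure` and the parameters `last_view` and `reachable` are hypothetical and do not come from the paper. Because any two strict majorities of the same membership must share a member, at most one set of mutually reachable components can pass the check.

```python
def can_reconfigure(last_view, reachable):
    """Illustrative majority check (not the paper's protocol).

    last_view: members of the group just before the group failure,
               assumed to be recoverable from crash-proof storage.
    reachable: components that have recovered and can currently
               communicate with each other.
    Returns True only if the reachable members of last_view form a
    strict majority of last_view.
    """
    members = set(last_view)
    survivors = members & set(reachable)
    return 2 * len(survivors) > len(members)


# Hypothetical five-member group split into two islands by a partition.
last_view = {"a", "b", "c", "d", "e"}
island_1 = {"a", "b", "c"}
island_2 = {"d", "e"}

majority_side = can_reconfigure(last_view, island_1)
minority_side = can_reconfigure(last_view, island_2)
```

In this sketch only `island_1` may form the new group; `island_2` must wait until the partition heals or more members recover, which is consistent with the abstract's remark that the protocol is subject to blocking.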