This paper presents a robust and nonblocking group membership protocol for
large-scale distributed systems. This protocol uses the causal relation
between membership-updating messages (i.e., those specifying the addition
and deletion of members) and allows the messages to be executed in a nonblocking
manner. It differs from conventional group membership protocols in two
respects: (1) neither global locking nor global synchronization is
required; (2) membership-updating messages can be issued without being
synchronized with each other, and they can be executed immediately after
their arrival. The proposed protocol is therefore highly scalable and is
more tolerant of node failures, network failures, and network partitions
than conventional protocols. This paper proves that the proposed protocol
works properly as long as messages are eventually received by their destinations.
This paper also discusses some design issues, such as multicast
communication of the regular messages, fault tolerance, and application to
reliable communication protocols (e.g., TCP/IP).
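
To make the idea of nonblocking execution concrete, the following is a
minimal sketch and not the protocol proposed in this paper. It assumes an
observed-remove set, in which each add carries a unique tag and each delete
names the tags of the adds it causally follows. Under that assumption every
update can be applied on arrival, without locks, and replicas that receive
the same updates in different orders reach the same membership. The names
Update, MembershipView, and replay are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Update:
    """A membership-updating message (hypothetical format)."""
    kind: str          # "add" or "delete"
    member: str
    tags: frozenset    # add: its own unique tag; delete: tags of the adds it follows


@dataclass
class MembershipView:
    """Local view of the group, updated without any global lock."""
    added: set = field(default_factory=set)    # (member, tag) pairs from adds
    removed: set = field(default_factory=set)  # (member, tag) pairs cancelled by deletes

    def apply(self, update: Update) -> None:
        # Executed immediately on arrival; both branches are set unions,
        # so the final state does not depend on arrival order.
        pairs = {(update.member, tag) for tag in update.tags}
        if update.kind == "add":
            self.added |= pairs
        elif update.kind == "delete":
            self.removed |= pairs
        else:
            raise ValueError(f"unknown update kind: {update.kind}")

    def members(self) -> set:
        # A member is present if at least one of its adds is not cancelled.
        return {member for (member, _tag) in self.added - self.removed}


def replay(updates) -> set:
    """Apply updates in the given arrival order and return the membership."""
    view = MembershipView()
    for update in updates:
        view.apply(update)
    return view.members()


add_a = Update("add", "a", frozenset({"t1"}))
add_b = Update("add", "b", frozenset({"t2"}))
del_a = Update("delete", "a", frozenset({"t1"}))   # causally follows add_a
readd_a = Update("add", "a", frozenset({"t3"}))    # a rejoins with a new tag

in_order = replay([add_a, add_b, del_a, readd_a])
out_of_order = replay([del_a, readd_a, add_b, add_a])
```

In this sketch a delete that arrives before the add it follows is simply
recorded, so the late add has no effect, and both arrival orders above yield
the same membership. The sketch relies only on eventual delivery of every
update, which mirrors the condition stated above.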