TESTING DISTRIBUTED PROGRAMS CONTAINING RACING MESSAGES

Citation
Rb. Kilgore et Cm. Chase, TESTING DISTRIBUTED PROGRAMS CONTAINING RACING MESSAGES, Computer journal, 40(8), 1997, pp. 489-498
Citations number
22
Journal title
ISSN journal
00104620
Volume
40
Issue
8
Year of publication
1997
Pages
489 - 498
Database
ISI
SICI code
0010-4620(1997)40:8<489:TDPCRM>2.0.ZU;2-#
Abstract
Finding errors in non-deterministic programs is complicated by the fac t that a bug may be revealed only by a particular sequence of program activities. An erroneous program may run correctly hundreds or thousan ds of times, each time avoiding the failure-causing sequence. This pro blem is exacerbated in distributed systems since race conditions on me ssages may not be under the direct control of the programmer. We descr ibe how message delivery ordering can be controlled during execution. Our objective is to provide a practical yet powerful testing environme nt for distributed systems, using re-execution. Previous work in this area was limited to replaying deterministically the same execution rep eatedly. We focus on re-executing the program, under a strictly differ ent message ordering. In this way, latent bugs are more likely to reve al themselves during testing. We show that messages are grouped into w aves, such that any two messages from different waves must always be r eceived in the same order. We provide an algorithm that produces a re- execution that maximizes the number of reordered pairs of message deli very events. We prove a tight lower bound of k - 1 reordered pairs of messages where k is the number of messages in a wave. We also provide an efficient on-line algorithm for detecting racing messages. Previous methods for detecting race conditions were either off-line, or limite d to detecting the races far a single process.