ITA
ENG

TESTING DISTRIBUTED PROGRAMS CONTAINING RACING MESSAGES

Authors

KILGORE RB CHASE CM

Citation

Rb. Kilgore et Cm. Chase, TESTING DISTRIBUTED PROGRAMS CONTAINING RACING MESSAGES, Computer journal, 40(8), 1997, pp. 489-498

Citations number

Journal title

Computer journal → ACNP

ISSN journal

00104620

Volume

Issue

Year of publication

1997

Pages

489 - 498

Database

ISI

SICI code

0010-4620(1997)40:8<489:TDPCRM>2.0.ZU;2-#

Abstract

Finding errors in non-deterministic programs is complicated by the fac t that a bug may be revealed only by a particular sequence of program activities. An erroneous program may run correctly hundreds or thousan ds of times, each time avoiding the failure-causing sequence. This pro blem is exacerbated in distributed systems since race conditions on me ssages may not be under the direct control of the programmer. We descr ibe how message delivery ordering can be controlled during execution. Our objective is to provide a practical yet powerful testing environme nt for distributed systems, using re-execution. Previous work in this area was limited to replaying deterministically the same execution rep eatedly. We focus on re-executing the program, under a strictly differ ent message ordering. In this way, latent bugs are more likely to reve al themselves during testing. We show that messages are grouped into w aves, such that any two messages from different waves must always be r eceived in the same order. We provide an algorithm that produces a re- execution that maximizes the number of reordered pairs of message deli very events. We prove a tight lower bound of k - 1 reordered pairs of messages where k is the number of messages in a wave. We also provide an efficient on-line algorithm for detecting racing messages. Previous methods for detecting race conditions were either off-line, or limite d to detecting the races far a single process.