EVALUATING TESTING METHODS BY DELIVERED RELIABILITY

Authors

FRANKL PG; HAMLET RG; LITTLEWOOD B; STRIGINI L

Citation

P.G. Frankl et al., EVALUATING TESTING METHODS BY DELIVERED RELIABILITY, IEEE transactions on software engineering, 24(8), 1998, pp. 586-601

Citations number

Subject Categories

Computer Science, Software, Graphics, Programming; Engineering, Electrical & Electronic

Journal title

IEEE transactions on software engineering

ISSN journal

0098-5589

Volume

24

Issue

8

Year of publication

1998

Pages

586 - 601

Database

ISI

SICI code

0098-5589(1998)24:8<586:ETMBDR>2.0.ZU;2-M

Abstract

There are two main goals in testing software: 1) to achieve adequate quality (debug testing); the objective is to probe the software for defects so that these can be removed, and 2) to assess existing quality (operational testing); the objective is to gain confidence that the software is reliable. The names are arbitrary, and most testing techniques address both goals to some degree. However, debug methods tend to ignore random selection of test data from an operational profile, while for operational methods this selection is all-important. Debug methods are thought, without any real proof, to be good at uncovering defects so that these can be repaired, but having done so they do not provide a technically defensible assessment of the reliability that results. On the other hand, operational methods provide accurate assessment, but may not be as useful for achieving reliability. This paper examines the relationship between the two testing goals, using a probabilistic analysis. We define simple models of programs and their testing, and try to answer theoretically the question of how to attain program reliability: Is it better to test by probing for defects as in debug testing, or to assess reliability directly as in operational testing, uncovering defects by accident, so to speak? There is no simple answer, of course. Testing methods are compared in a model where program failures are detected and the software changed to eliminate them. The "better" method delivers higher reliability after all test failures have been eliminated. This comparison extends previous work, where the measure was the probability of detecting a failure. Revealing special cases are exhibited in which each kind of testing is superior. Preliminary analysis of the distribution of the delivered reliability indicates that even simple models have unusual statistical properties, suggesting caution in interpreting theoretical comparisons.
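
The comparison described in the abstract can be illustrated with a small sketch. This is not the model from the paper; it is a simplified illustration under stated assumptions: each defect i is triggered by an operational input with probability q[i], a single test reveals it with probability d[i], tests are independent, and every revealed defect is fixed perfectly. The function name and all numbers are hypothetical. Operational testing is represented by setting d equal to q, since tests are drawn from the operational profile.

```python
def expected_delivered_failure_prob(q, d, num_tests):
    """Expected per-input failure probability after testing and fixing.

    q[i]: probability that an operational input triggers defect i (assumed)
    d[i]: probability that one test reveals defect i (assumed)
    Defect i survives num_tests independent tests with probability
    (1 - d[i]) ** num_tests; by linearity of expectation, the expected
    remaining failure probability is the sum of q[i] times that value.
    """
    return sum(qi * (1.0 - di) ** num_tests for qi, di in zip(q, d))


# Hypothetical program with one frequent and one rare defect.
q = [0.01, 0.0001]
num_tests = 100

# Operational testing: detection probabilities equal the operational ones.
operational = expected_delivered_failure_prob(q, q, num_tests)

# Two hypothetical debug methods that are both good at the rare defect
# but differ in how well they expose the frequent one.
debug_weak = expected_delivered_failure_prob(q, [0.001, 0.05], num_tests)
debug_strong = expected_delivered_failure_prob(q, [0.02, 0.05], num_tests)

print(f"operational:  {operational:.6f}")
print(f"debug_weak:   {debug_weak:.6f}")
print(f"debug_strong: {debug_strong:.6f}")
```

With these made-up numbers, operational testing leaves a lower expected failure probability than the first debug method and a higher one than the second, which mirrors the abstract's point that either kind of testing can be superior depending on the case.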