A comprehensive evaluation of capture-recapture models for estimating software defect content

Citation
L.C. Briand et al., A comprehensive evaluation of capture-recapture models for estimating software defect content, IEEE Transactions on Software Engineering, 26(6), 2000, pp. 518-540
Number of citations
42
Subject categories
Computer Science & Engineering
Journal title
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING
ISSN journal
0098-5589
Volume
26
Issue
6
Year of publication
2000
Pages
518 - 540
Database
ISI
SICI code
0098-5589(200006)26:6<518:ACEOCM>2.0.ZU;2-B
Abstract
An important requirement to control the inspection of software artifacts is to be able to decide, based on more objective information, whether the inspection can stop or whether it should continue to achieve a suitable level of artifact quality. A prediction of the number of remaining defects in an inspected artifact can be used for decision making. Several studies in software engineering have considered capture-recapture models, originally proposed by biologists to estimate animal populations, to make a prediction. However, few studies compare the actual number of remaining defects to the one predicted by a capture-recapture model on real software engineering artifacts. Thus, there is little work looking at the robustness of capture-recapture models under realistic software engineering conditions, where it is expected that some of their assumptions will be violated. Simulations have been performed, but no definite conclusions can be drawn regarding the degree of accuracy of such models under realistic inspection conditions and the factors affecting this accuracy. Furthermore, the existing studies focused on a subset of the existing capture-recapture models. Thus, a more exhaustive comparison is still missing. In this study, we focus on traditional inspections and estimate, based on actual inspection data, the degree of accuracy of relevant, state-of-the-art capture-recapture models as they have been proposed in biology and for which statistical estimators exist. In order to assess their robustness, we look at the impact of the number of inspectors and the number of actual defects on the estimators' accuracy based on actual inspection data. Our results show that models are strongly affected by the number of inspectors and, therefore, one must consider this factor before using capture-recapture models. When the number of inspectors is too small, no model is sufficiently accurate and underestimation may be substantial. In addition, some models perform better than others in a large number of conditions and plausible reasons are discussed. Based on our analyses, we recommend using a model taking into account that defects have different probabilities of being detected and the corresponding Jackknife Estimator. Furthermore, we attempt to calibrate the prediction models based on their relative error, as previously computed on other inspections. Although this approach is intuitive and straightforward, we identified theoretical limitations to it, which were then confirmed by the data.
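To make the recommended estimator concrete, the following is a minimal sketch of a first-order jackknife estimate of total defect content for a model in which defects differ in their detection probability. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say which jackknife order is used, so the first-order form D + ((k - 1) / k) * f1 is assumed here, where D is the number of distinct defects found, k the number of inspectors, and f1 the number of defects found by exactly one inspector. The function names, the input layout (one set of defect identifiers per inspector), and the relative error definition (estimate - actual) / actual are likewise hypothetical choices made for this example.

```python
from collections import Counter


def jackknife_first_order(inspections):
    """Estimate total defect content with a first-order jackknife.

    inspections: one collection of defect identifiers per inspector.
    Assumed form (not taken from the abstract): D + ((k - 1) / k) * f1.
    """
    k = len(inspections)
    if k < 2:
        raise ValueError("need at least two inspectors")
    # Count how many inspectors found each defect; set() guards against
    # the same defect being logged twice by one inspector.
    counts = Counter(d for found in inspections for d in set(found))
    distinct = len(counts)                               # D
    singletons = sum(1 for c in counts.values() if c == 1)  # f1
    return distinct + (k - 1) / k * singletons


def relative_error(estimate, actual):
    """Signed relative error; negative values mean underestimation."""
    return (estimate - actual) / actual


# Hypothetical data: three inspectors, defects labelled 1..7.
inspections = [{1, 2, 3, 4}, {2, 3, 5}, {1, 3, 6, 7}]
estimate = jackknife_first_order(inspections)
print(round(estimate, 3))
print(round(relative_error(estimate, 10), 3))  # assuming 10 true defects
```

With few inspectors the singleton count f1 carries most of the correction, which is consistent with the abstract's observation that accuracy depends strongly on the number of inspectors.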