Classification-tree models of software-quality over multiple releases

Citation
T.M. Khoshgoftaar et al., Classification-tree models of software-quality over multiple releases, IEEE Transactions on Reliability, 49(1), 2000, pp. 4-11
Number of citations
40
Subject categories
Electrical & Electronics Engineering
Journal title
IEEE TRANSACTIONS ON RELIABILITY
Journal ISSN
0018-9529
Volume
49
Issue
1
Year of publication
2000
Pages
4 - 11
Database
ISI
SICI code
0018-9529(200003)49:1<4:CMOSOM>2.0.ZU;2-D
Abstract
This paper presents an empirical study that evaluates software-quality models over several releases, to address the question, "How long will a model yield useful predictions?" The Classification And Regression Trees (CART) algorithm is introduced. CART can achieve a preferred balance between the two types of misclassification rates. This is desirable because misclassification of fault-prone modules often has much more severe consequences than misclassification of those that are not fault-prone. The case study developed 2 classification-tree models based on 4 consecutive releases of a very large legacy telecommunication system. Forty-two software product, process, and execution metrics were candidate predictors. Model #1 used measurements of the first release as the training data set; this model had 11 important predictors. Model #2 used measurements of the second release as the training data set; this model had 15 important predictors. Measurements of subsequent releases were evaluation data sets. Analysis of the models' predictors yielded insights into various software development practices. Both models had accuracy that would be useful to developers. One might suppose that software-quality models lose their value very quickly over successive releases due to evolution of the product and the underlying development processes. We found the models remained useful over all the releases studied.