This paper presents an empirical study that evaluates software-quality
models over several releases, to address the question, "How long will a
model yield useful predictions?" The Classification And Regression Trees
(CART) algorithm is introduced. CART can achieve a preferred balance
between the two types of misclassification rates. This is desirable because
misclassification of fault-prone modules often has much more severe
consequences than misclassification of those that are not fault-prone.
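As an illustration of the two misclassification rates and of what a "preferred balance" between them can mean, here is a minimal Python sketch. It is not the CART algorithm and not the procedure used in the study: it assumes each module already has a hypothetical fault-proneness score, and it picks a decision threshold by minimizing a weighted sum of the two rates. The names `misclassification_rates`, `best_threshold`, and `cost_ratio`, and the sample data, are all assumptions made for this example.

```python
def misclassification_rates(actual, predicted):
    """Return (false_alarm_rate, missed_fault_rate).

    actual, predicted: sequences of booleans, True meaning "fault-prone".
    false_alarm_rate: share of not-fault-prone modules flagged as fault-prone.
    missed_fault_rate: share of fault-prone modules flagged as not fault-prone.
    """
    n_fault_prone = sum(1 for a in actual if a)
    n_not_fault_prone = len(actual) - n_fault_prone
    false_alarms = sum(1 for a, p in zip(actual, predicted) if not a and p)
    missed = sum(1 for a, p in zip(actual, predicted) if a and not p)
    return false_alarms / n_not_fault_prone, missed / n_fault_prone


def best_threshold(actual, scores, thresholds, cost_ratio):
    """Pick the threshold that minimizes a weighted sum of the two rates.

    cost_ratio (hypothetical parameter): how many times worse it is to miss
    a fault-prone module than to raise a false alarm.
    """
    def weighted_cost(threshold):
        predicted = [s >= threshold for s in scores]
        false_alarm, missed = misclassification_rates(actual, predicted)
        return false_alarm + cost_ratio * missed

    return min(thresholds, key=weighted_cost)


# Made-up data: six not-fault-prone modules followed by four fault-prone ones.
actual = [False] * 6 + [True] * 4
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.35, 0.5, 0.8, 0.9]
thresholds = [0.3, 0.5, 0.7]

equal_cost_choice = best_threshold(actual, scores, thresholds, cost_ratio=1.0)
costly_miss_choice = best_threshold(actual, scores, thresholds, cost_ratio=5.0)
```

With this made-up data, raising `cost_ratio` moves the chosen threshold lower, trading more false alarms for fewer missed fault-prone modules. That is the kind of trade-off the balance between the two rates refers to.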
The case study developed two classification-tree models based on four
consecutive releases of a very large legacy telecommunication system.
Forty-two software product, process, and execution metrics were candidate
predictors. Model #1 used measurements of the first release as the training
data set; this model had 11 important predictors. Model #2 used
measurements of the second release as the training data set; this model had
15 important predictors. Measurements of subsequent releases were
evaluation data sets. Analysis of the models' predictors yielded insights
into various software development practices.
Both models had accuracy that would be useful to developers. One might
suppose that software-quality models lose their value very quickly over
successive releases due to evolution of the product and the underlying
development processes. We found the models remained useful over all the
releases studied.