Case-based reasoning (CBR) has been proposed for predicting the risk class
of software components. Risky components can be defined as those that are
fault-prone or those that require a large amount of effort to maintain.
Thus far, evaluative studies of CBR classifiers have been promising,
showing that their predictive performance is as good as or better than that
of other types of classifiers. However, a CBR classifier can be
instantiated in different ways by varying its parameters, and it is not
clear which combination of parameters provides the best performance. In
this paper, we evaluate the performance of a CBR classifier with different
parameters, namely: (a) different distance measures, (b) different
standardization techniques, (c) use or non-use of weights, and (d) the
number of nearest neighbors to use for the prediction. In total, we
compared 30 different CBR classifiers. The study was conducted with a data
set from a large real-time system, and the objective was to predict the
fault-proneness of its components. Our results indicate that there is no
difference in prediction performance among the different parameter
combinations. Based on these results, we recommend using a simple CBR
classifier with Euclidean distance, z-score standardization, no weighting
scheme, and selection of the single nearest neighbor for prediction. The
advantage of such a classifier is its intuitive appeal to nonspecialists,
and the fact that it performs as well as more complex classifiers.
© 2001 Elsevier Science Inc. All rights reserved.
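The following is a minimal sketch of the recommended configuration: Euclidean distance, z-score standardization, no feature weights, and prediction from the single nearest neighbor. It is not the authors' implementation. The class and function names (`CBRClassifier`, `zscore_params`, `standardize`, `euclidean`) are hypothetical. Several details are assumptions rather than things the abstract specifies: the use of the population standard deviation, the majority-vote rule used when k > 1, and the way ties are broken. The training metrics and labels in the demo are invented purely for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def zscore_params(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and population standard deviation (assumption) of the case base."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        std = math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
        stds.append(std if std > 0 else 1.0)  # constant feature: avoid division by zero
    return means, stds


def standardize(row: Sequence[float], means: List[float], stds: List[float]) -> List[float]:
    """z-score: subtract the training mean, divide by the training standard deviation."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Unweighted Euclidean distance: every feature counts equally."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class CBRClassifier:
    """Hypothetical unweighted k-nearest-neighbor case-based classifier (k=1 by default)."""

    def __init__(self, k: int = 1):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k

    def fit(self, cases: Sequence[Sequence[float]], labels: Sequence[str]) -> "CBRClassifier":
        if not cases or len(cases) != len(labels):
            raise ValueError("need one label per case and at least one case")
        # Standardization parameters come from the case base only.
        self.means, self.stds = zscore_params(cases)
        self.cases = [standardize(c, self.means, self.stds) for c in cases]
        self.labels = list(labels)
        return self

    def predict(self, query: Sequence[float]) -> str:
        q = standardize(query, self.means, self.stds)
        # Rank stored cases by distance to the standardized query.
        ranked = sorted(range(len(self.cases)), key=lambda i: euclidean(q, self.cases[i]))
        nearest = [self.labels[i] for i in ranked[: self.k]]
        # Majority vote (assumption). most_common keeps first-encountered order
        # among equal counts, so a tie goes to the label of the closer neighbor.
        return Counter(nearest).most_common(1)[0][0]


# Hypothetical case base: [size, coupling] per component, with its risk class.
train = [[120, 3], [900, 14], [150, 2], [1100, 20]]
labels = ["low", "high", "low", "high"]
clf = CBRClassifier().fit(train, labels)
print(clf.predict([1000, 15]))  # → high
```

Standardizing before computing distances keeps a large-valued metric, such as size, from dominating a small-valued one, such as coupling. This matters here because the distance is unweighted.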