R. Jeffery et al., Using public domain metrics to estimate software development effort, SEVENTH INTERNATIONAL SOFTWARE METRICS SYMPOSIUM - METRICS 2001, PROCEEDINGS, 2000, pp. 16-27
In this paper we investigate the accuracy of cost estimates when applying
the most commonly used modeling techniques to a large-scale industrial data
set that is professionally maintained by the International Software
Benchmarking Standards Group (ISBSG). The modeling techniques applied are
ordinary least squares regression (OLS), analogy-based estimation, stepwise
ANOVA, CART, and robust regression.
The questions we address in this study relate to two important issues. The
first is the appropriate selection of a technique in a given context. The
second is the assessment of the feasibility of using multi-organizational
data compared to the benefits from company-specific data collection.
We compare company-specific models with models based on multi-company data.
This is done by using the estimates derived for one company that
contributed to the ISBSG data set and estimates from using carefully
matched data from the rest of the ISBSG data.
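
The comparison described above can be illustrated with a minimal sketch.
It is not the study's procedure or data: the log-log OLS form, the use of
mean magnitude of relative error (MMRE) as the accuracy measure, every
function name, and all project figures below are assumptions invented for
illustration. The toy numbers are constructed so that the two data sources
differ; they show only the mechanics of the comparison.

```python
import math

def fit_log_ols(projects):
    """Fit ln(effort) = a + b * ln(size) by ordinary least squares."""
    xs = [math.log(size) for size, _ in projects]
    ys = [math.log(effort) for _, effort in projects]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict(model, size):
    """Back-transform the log-scale prediction to an effort value."""
    a, b = model
    return math.exp(a + b * math.log(size))

def mmre(model, projects):
    """Mean magnitude of relative error over a set of projects."""
    errors = [abs(effort - predict(model, size)) / effort
              for size, effort in projects]
    return sum(errors) / len(errors)

# Hypothetical (size, effort) pairs, invented for this sketch only.
company_train = [(100, 800), (200, 1600), (400, 3200), (800, 6400)]
other_companies = [(100, 2000), (200, 5000), (400, 12000), (800, 30000)]
company_holdout = [(300, 2400), (600, 4800)]

# One model from the company's own projects, one from the other companies.
own_model = fit_log_ols(company_train)
cross_model = fit_log_ols(other_companies)

# Both models are judged on the same held-out company projects.
own_error = mmre(own_model, company_holdout)
cross_error = mmre(cross_model, company_holdout)
print(f"company-specific MMRE: {own_error:.3f}")
print(f"multi-company MMRE:    {cross_error:.3f}")
```

The point of the design is that both models are evaluated on the same
held-out projects from the one company, so any difference in error comes
from the source of the training data.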
When using the ISBSG data set to derive estimates for the company,
generally poor results were obtained. Robust regression and OLS performed
most accurately. When using the company's own data as the basis for
estimation, OLS, a CART-variant, and Analogy performed best.
In contrast to previous studies, the estimation accuracy when using the
company's data is significantly higher than when using the rest of the
ISBSG data set.
Thus, from these results, the company that contributed to the ISBSG data
set would be better off using its own data for cost estimation.