A PROCEDURE FOR ANALYZING UNBALANCED DATASETS

Authors
Citation
B. Kitchenham, A PROCEDURE FOR ANALYZING UNBALANCED DATASETS, IEEE Transactions on Software Engineering, 24(4), 1998, pp. 278-301
Number of citations
21
Subject Categories
Computer Science, Software, Graphics, Programming; Engineering, Electrical & Electronic
ISSN journal
0098-5589
Volume
24
Issue
4
Year of publication
1998
Pages
278 - 301
Database
ISI
SICI code
0098-5589(1998)24:4<278:APFAUD>2.0.ZU;2-7
Abstract
This paper describes a procedure for analyzing unbalanced datasets that include many nominal- and ordinal-scale factors. Such datasets are often found in company datasets used for benchmarking and productivity assessment. The two major problems caused by lack of balance are that the impact of factors can be concealed and that spurious impacts can be observed. These effects are examined with the help of two small artificial datasets. The paper proposes a method of forward pass residual analysis to analyze such datasets. The analysis procedure is demonstrated on the artificial datasets and then applied to the COCOMO dataset. The paper ends with a discussion of the advantages and limitations of the analysis procedure.