STATLOG - COMPARISON OF CLASSIFICATION ALGORITHMS ON LARGE REAL-WORLD PROBLEMS

Citation
R.D. King et al., STATLOG - COMPARISON OF CLASSIFICATION ALGORITHMS ON LARGE REAL-WORLD PROBLEMS, Applied Artificial Intelligence, 9(3), 1995, pp. 289-333
Number of citations
70
Subject Categories
System Science; Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic
Journal ISSN
0883-9514
Volume
9
Issue
3
Year of publication
1995
Pages
289 - 333
Database
ISI
SICI code
0883-9514(1995)9:3<289:S-COCA>2.0.ZU;2-C
Abstract
This paper describes work in the StatLog project comparing classification algorithms on large real-world problems. The algorithms compared were from symbolic learning (CART, C4.5, NewID, AC2, ITrule, Cal5, CN2), statistics (Naive Bayes, k-nearest neighbor, kernel density, linear discriminant, quadratic discriminant, logistic regression, projection pursuit, Bayesian networks), and neural networks (backpropagation, radial basis functions). Twelve datasets were used: five from image analysis, three from medicine, and two each from engineering and finance. We found that the best-performing algorithm depended critically on the dataset investigated. We therefore developed a set of dataset descriptors to help decide which algorithms are suited to particular datasets. For example, datasets with extreme distributions (skew > 1 and kurtosis > 7) and with many binary/categorical attributes (> 38%) tend to favor symbolic learning algorithms. We suggest several directions in which classification algorithms can be extended.
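As a rough illustration of how descriptors like these might be computed, the Python sketch below summarizes a dataset with three statistics and checks them against the thresholds quoted in the abstract. Several details are assumptions not stated in this abstract: "skew" is taken as the mean absolute moment-based skewness over numeric attributes, "kurtosis" as the mean Pearson (non-excess, normal distribution about 3) kurtosis, the 38% figure as the fraction of all attributes that are binary/categorical, and the rule as requiring all three conditions at once. The function names `describe` and `favors_symbolic` are hypothetical and do not come from the StatLog software.

```python
from statistics import fmean


def skewness(xs):
    """Moment-based skewness m3 / m2**1.5 (population moments, no bias correction)."""
    n = len(xs)
    m = fmean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def kurtosis(xs):
    """Pearson (non-excess) kurtosis m4 / m2**2; a normal distribution gives about 3."""
    n = len(xs)
    m = fmean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / m2 ** 2


def describe(numeric_columns, n_categorical):
    """Hypothetical descriptor summary: (mean |skew|, mean kurtosis, categorical fraction).

    numeric_columns: list of numeric attribute columns (each a list of floats).
    n_categorical: number of binary/categorical attributes.
    Constant columns are skipped because their skew and kurtosis are undefined.
    """
    usable = [c for c in numeric_columns if len(set(c)) > 1]
    mean_abs_skew = fmean(abs(skewness(c)) for c in usable) if usable else 0.0
    mean_kurt = fmean(kurtosis(c) for c in usable) if usable else 0.0
    n_attrs = len(numeric_columns) + n_categorical
    cat_fraction = n_categorical / n_attrs if n_attrs else 0.0
    return mean_abs_skew, mean_kurt, cat_fraction


def favors_symbolic(mean_abs_skew, mean_kurt, cat_fraction):
    """Assumed reading of the quoted rule: all three thresholds must be exceeded."""
    return mean_abs_skew > 1 and mean_kurt > 7 and cat_fraction > 0.38


if __name__ == "__main__":
    # One heavily skewed numeric attribute plus two categorical attributes.
    skewed = [0.0] * 19 + [10.0]
    summary = describe([skewed], n_categorical=2)
    print(summary)
    print(favors_symbolic(*summary))
```

With one skewed numeric column (nineteen zeros and a single 10) and two categorical attributes, the mean absolute skewness is about 4.1, the kurtosis about 18, and the categorical fraction 2/3, so the assumed rule returns `True`. A symmetric column such as `[1, 2, 3, 4, 5]` with no categorical attributes does not meet the thresholds.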