R.D. King et al., STATLOG - COMPARISON OF CLASSIFICATION ALGORITHMS ON LARGE REAL-WORLD PROBLEMS, Applied Artificial Intelligence, 9(3), 1995, pp. 289-333
Number of citations
70
Subject Categories
"System Science", "Computer Science Artificial Intelligence", "Engineering, Electrical & Electronic"
This paper describes work in the StatLog project comparing
classification algorithms on large real-world problems. The algorithms
compared were from symbolic learning (CART, C4.5, NewID, AC2, ITrule,
Cal5, CN2), statistics (Naive Bayes, k-nearest neighbor, kernel
density, linear discriminant, quadratic discriminant, logistic
regression, projection pursuit, Bayesian networks), and neural networks
(backpropagation, radial basis functions). Twelve data sets were used:
five from image analysis, three from medicine, and two each from
engineering and finance. We found that which algorithm performed best
depended critically on the data set investigated. We therefore
developed a set of data set descriptors to help decide which algorithms
are suited to particular data sets. For example, data sets with extreme
distributions (skew > 1 and kurtosis > 7) and with many
binary/categorical attributes (> 38%) tend to favor symbolic learning
algorithms. We suggest how classification algorithms can be extended in
a number of directions.
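
The descriptor rule quoted in the abstract can be illustrated with a short sketch. This is not the StatLog implementation: the function names, the averaging of skew and kurtosis over numeric attributes, and the use of non-excess kurtosis (normal distribution = 3) are all assumptions made here for illustration. Only the three thresholds come from the abstract.

```python
from statistics import fmean


def skewness(xs):
    """Third standardized moment (population form); 0.0 for a constant column."""
    m = fmean(xs)
    var = fmean([(x - m) ** 2 for x in xs])
    if var == 0:
        return 0.0
    return fmean([(x - m) ** 3 for x in xs]) / var ** 1.5


def kurtosis(xs):
    """Fourth standardized moment, non-excess (assumed convention: normal = 3)."""
    m = fmean(xs)
    var = fmean([(x - m) ** 2 for x in xs])
    if var == 0:
        return 0.0
    return fmean([(x - m) ** 4 for x in xs]) / var ** 2


def describe(numeric_columns, n_categorical):
    """Hypothetical data set descriptors: mean |skew|, mean kurtosis,
    and the fraction of attributes that are binary/categorical."""
    n_total = len(numeric_columns) + n_categorical
    return {
        "skew": fmean([abs(skewness(c)) for c in numeric_columns]),
        "kurtosis": fmean([kurtosis(c) for c in numeric_columns]),
        "categorical_fraction": n_categorical / n_total,
    }


def favors_symbolic(d):
    """Thresholds from the abstract: skew > 1, kurtosis > 7, > 38% categorical."""
    return (
        d["skew"] > 1
        and d["kurtosis"] > 7
        and d["categorical_fraction"] > 0.38
    )


# One heavily skewed numeric attribute plus one categorical attribute.
skewed = describe([[0] * 19 + [100]], n_categorical=1)
# One symmetric numeric attribute plus one categorical attribute.
symmetric = describe([[1, 2, 3, 4, 5]], n_categorical=1)
print(favors_symbolic(skewed), favors_symbolic(symmetric))
```

The abstract says such data sets "tend to favor" symbolic learners, so the boolean result is best read as a rough heuristic for shortlisting algorithms rather than a definitive selection rule.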