BIAS IN INFORMATION-BASED MEASURES IN DECISION TREE INDUCTION

Authors
A. P. White, W. Z. Liu
Citation
A. P. White and W. Z. Liu, Bias in Information-Based Measures in Decision Tree Induction, Machine Learning, 15(3), 1994, pp. 321-329
Citations number
11
Subject Categories
Computer Sciences; Computer Science, Artificial Intelligence; Neurosciences
Journal title
Machine Learning
ISSN journal
0885-6125
Volume
15
Issue
3
Year of publication
1994
Pages
321 - 329
Database
ISI
SICI code
0885-6125(1994)15:3<321:BIIMID>2.0.ZU;2-8
Abstract
A fresh look is taken at the problem of bias in information-based attribute selection measures used in the induction of decision trees. The approach uses statistical simulation techniques to demonstrate that the usual measures, such as information gain, gain ratio, and a new measure recently proposed by Lopez de Mantaras (1991), are all biased in favour of attributes with large numbers of values. It is concluded that approaches which utilise the chi-square distribution are preferable because they compensate automatically for differences between attributes in the number of levels they take.
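The kind of simulation the abstract describes can be sketched as follows. This is not the authors' code. The sample size (100), the two equiprobable classes, the uniformly distributed attribute values, the trial count and all function names (`entropy`, `info_gain`, `chi2_sf`, `simulate`) are assumptions made for illustration. Each attribute is generated independently of the class, so any apparent "information" it carries is pure noise. Information gain and gain ratio are computed directly. As one chi-square based approach, the sketch uses the likelihood-ratio (G) statistic, G = 2 N ln 2 · IG (with IG in bits), referred to a chi-square distribution with (v − 1)(c − 1) degrees of freedom. The paper may use a different chi-square statistic, and the Lopez de Mantaras measure is omitted here.

```python
import math
import random
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of a collection of frequency counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def info_gain(attr, cls):
    """Information gain (bits) about the class obtained by splitting on attr."""
    n = len(cls)
    joint = Counter(zip(attr, cls))
    h_cond = 0.0
    for value, n_v in Counter(attr).items():
        row = [cnt for (a, _), cnt in joint.items() if a == value]
        h_cond += n_v / n * entropy(row)
    return entropy(Counter(cls).values()) - h_cond


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with y = x / 2.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        q, a = math.exp(-y), 1.0              # Q(1, y)
    else:
        q, a = math.erfc(math.sqrt(y)), 0.5   # Q(1/2, y)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return q


def simulate(n_values, n_samples=100, n_classes=2, trials=1000, alpha=0.05, seed=0):
    """Return (mean IG, mean gain ratio, chi-square rejection rate) for an
    attribute with n_values levels that is independent of the class."""
    rng = random.Random(seed)
    gains, ratios, rejections = [], [], 0
    for _ in range(trials):
        cls = [rng.randrange(n_classes) for _ in range(n_samples)]
        attr = [rng.randrange(n_values) for _ in range(n_samples)]  # pure noise
        ig = info_gain(attr, cls)
        gains.append(ig)
        split_info = entropy(Counter(attr).values())
        ratios.append(ig / split_info if split_info > 0 else 0.0)
        # G statistic: 2 * N * mutual information in nats.
        g_stat = 2 * n_samples * math.log(2) * ig
        df = (len(set(attr)) - 1) * (len(set(cls)) - 1)
        p_value = chi2_sf(g_stat, df) if df > 0 else 1.0
        rejections += int(p_value < alpha)
    return sum(gains) / trials, sum(ratios) / trials, rejections / trials


if __name__ == "__main__":
    for v in (2, 5, 10):
        mean_ig, mean_gr, reject = simulate(v)
        print(f"values={v:2d}  mean IG={mean_ig:.4f}  "
              f"mean gain ratio={mean_gr:.4f}  rejection rate={reject:.3f}")
```

Under these assumptions, the mean information gain and mean gain ratio of an irrelevant attribute should grow with its number of values. The chi-square rejection rate should stay near the nominal level for every value count, because the degrees of freedom grow with the number of levels. This is the automatic compensation the abstract refers to. With few observations per cell, the chi-square approximation to the G statistic is only approximate.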