A fresh look is taken at the problem of bias in information-based attribute
selection measures, used in the induction of decision trees. The approach uses
statistical simulation techniques to demonstrate that the usual measures such
as information gain, gain ratio, and a new measure recently proposed by Lopez
de Mantaras (1991) are all biased in favour of attributes with large numbers
of values. It is concluded that approaches which utilise the chi-square
distribution are preferable because they compensate automatically for
differences between attributes in the number of levels they take.
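
The kind of simulation described above can be illustrated with a minimal sketch. It is not the authors' experimental design: the function names (`entropy`, `info_gain_and_chi2`, `simulate`), the sample size of 600 cases, the two equiprobable classes, and the 200 trials are all assumptions chosen for illustration. Attribute and class are generated independently, so any measured association is spurious. The sketch reports the mean information gain and the mean Pearson chi-square statistic divided by its degrees of freedom for attributes with different numbers of values.

```python
import math
import random
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (in bits) of a list of category counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def info_gain_and_chi2(attr, cls):
    """Return (information gain, Pearson chi-square, degrees of freedom)."""
    n = len(attr)
    joint = Counter(zip(attr, cls))
    a_counts = Counter(attr)
    c_counts = Counter(cls)

    # Information gain: class entropy minus expected entropy after the split.
    h_class = entropy(c_counts.values(), n)
    h_cond = 0.0
    for a, na in a_counts.items():
        h_cond += (na / n) * entropy([joint[(a, c)] for c in c_counts], na)

    # Pearson chi-square on the attribute-by-class contingency table.
    chi2 = 0.0
    for a, na in a_counts.items():
        for c, nc in c_counts.items():
            expected = na * nc / n
            chi2 += (joint[(a, c)] - expected) ** 2 / expected

    df = (len(a_counts) - 1) * (len(c_counts) - 1)
    return h_class - h_cond, chi2, df


def simulate(num_values, n_cases=600, n_classes=2, trials=200, seed=0):
    """Mean gain and mean chi2/df for an attribute unrelated to the class.

    All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    gains, ratios = [], []
    for _ in range(trials):
        attr = [rng.randrange(num_values) for _ in range(n_cases)]
        cls = [rng.randrange(n_classes) for _ in range(n_cases)]
        gain, chi2, df = info_gain_and_chi2(attr, cls)
        if df > 0:  # skip degenerate tables with a single row or column
            gains.append(gain)
            ratios.append(chi2 / df)
    return sum(gains) / len(gains), sum(ratios) / len(ratios)


if __name__ == "__main__":
    for k in (2, 5, 10, 20):
        mean_gain, mean_ratio = simulate(k)
        print(f"values={k:2d}  mean gain={mean_gain:.4f}  "
              f"mean chi2/df={mean_ratio:.2f}")
```

Under these assumptions the mean information gain should rise with the number of attribute values even though the attribute carries no information about the class, whereas the chi-square statistic divided by its degrees of freedom should stay close to 1, because under independence the expected value of the statistic is approximately its degrees of freedom. The ratio is used here only to keep the sketch free of external dependencies; a fuller treatment would compare the statistic with the chi-square distribution itself.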