ITA
ENG

Partitioning nominal attributes in decision trees

Authors

Coppersmith, D Hong, SJ Hosking, JRM

Citation

D. Coppersmith et al., Partitioning nominal attributes in decision trees, DATA M K D, 3(2), 1999, pp. 197-217

Citations number

Categorie Soggetti

AI Robotics and Automatic Control

Journal title

DATA MINING AND KNOWLEDGE DISCOVERY

ISSN journal

13845810 → ACNP

Volume

Issue

Year of publication

1999

Pages

197 - 217

Database

ISI

SICI code

1384-5810(199906)3:2<197:PNAIDT>2.0.ZU;2-L

Abstract

To find the optimal branching of a nominal attribute at a node in an L-ary decision tree, one is often forced to search over all possible L-ary partit ions for the one that yields the minimum impurity measure. For binary trees (L = 2) when there are just two classes a short-cut search is possible tha t is linear in n, the number of distinct values of the attribute. For the g eneral case in which the number of classes, k, may be greater than two, Bur shtein et al. have shown that the optimal partition satisfies a condition t hat involves the existence of (L\atop 2) hyperplanes in the class probabili ty space. We derive a property of the optimal partition for concave impurit y measures (including in particular the Gini and entropy impurity measures) in terms of the existence of L vectors in the dual of the class probabilit y space, which implies the earlier condition. Unfortunately, these insights still do not offer a practical search method when n and k are large, even for binary trees. We therefore present a new h euristic search algorithm to find a good partition. It is based on ordering the attribute's values according to their principal component scores in th e class probability space, and is linear in n. We demonstrate the effective ness of the new method through Monte Carlo simulation experiments and compa re its performance against other heuristic methods.