AUTOMATED LEARNING OF DECISION RULES FOR TEXT CATEGORIZATION

Citation
C. Apte et al., AUTOMATED LEARNING OF DECISION RULES FOR TEXT CATEGORIZATION, ACM Transactions on Information Systems, 12(3), 1994, pp. 233-251
Number of citations
24
Subject Categories
Information Science & Library Science; Computer Science, Information Systems
Journal ISSN
1046-8188
Volume
12
Issue
3
Year of publication
1994
Pages
233 - 251
Database
ISI
SICI code
1046-8188(1994)12:3<233:ALODRF>2.0.ZU;2-2
Abstract
We describe the results of extensive experiments using optimized rule-based induction methods on large document collections. The goal of these methods is to discover automatically classification patterns that can be used for general document categorization or personalized filtering of free text. Previous reports indicate that human-engineered rule-based systems, requiring many man-years of developmental efforts, have been successfully built to "read" documents and assign topics to them. We show that machine-generated decision rules appear comparable to human performance, while using the identical rule-based representation. In comparison with other machine-learning techniques, results on a key benchmark from the Reuters collection show a large gain in performance, from a previously reported 67% recall/precision breakeven point to 80.5%. In the context of a very high-dimensional feature space, several methodological alternatives are examined, including universal versus local dictionaries, and binary versus frequency-related features.