Text categorization with support vector machines. How to represent texts in input space?

Citation
E. Leopold and J. Kindermann, Text categorization with support vector machines. How to represent texts in input space?, MACH LEARN, 46(1-3), 2002, pp. 423-444
Citations number
22
Subject categories
AI Robotics and Automatic Control
Journal title
MACHINE LEARNING
ISSN journal
0885-6125
Volume
46
Issue
1-3
Year of publication
2002
Pages
423 - 444
Database
ISI
SICI code
0885-6125(2002)46:1-3<423:TCWSVM>2.0.ZU;2-3
Abstract
The choice of the kernel function is crucial to most applications of support vector machines. In this paper, however, we show that in the case of text classification, term-frequency transformations have a larger impact on the performance of SVM than the kernel itself. We discuss the role of importance weights (e.g. document frequency and redundancy), which is not yet fully understood, in the light of model complexity and calculation cost, and we show that time-consuming lemmatization or stemming can be avoided even when classifying a highly inflectional language like German.
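The abstract separates two ingredients of a document vector: a transformation of raw term counts, and a per-term importance weight such as document frequency or redundancy. The following is a minimal sketch of that decomposition under stated assumptions, not the authors' implementation. The three transformations (raw, binary, log(1 + tf)), the inverse document frequency log(N / df_k), and the entropy-based redundancy r_k = log N + sum_i p_ik log p_ik (with p_ik the share of term k's total count falling in document i) are illustrative choices; all function names are hypothetical. Vectors are scaled to unit length, so the linear kernel an SVM would evaluate on them is a cosine similarity.

```python
import math
from collections import Counter


def tf_transform(count: int, kind: str) -> float:
    """Map a raw term count to a transformed term frequency (assumed variants)."""
    if kind == "raw":
        return float(count)
    if kind == "binary":
        return 1.0 if count > 0 else 0.0
    if kind == "log":
        return math.log(1.0 + count)
    raise ValueError(f"unknown transform: {kind}")


def idf_weights(docs):
    """Inverse document frequency log(N / df_k) for every term k in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / d) for term, d in df.items()}


def redundancy_weights(docs):
    """Assumed entropy-based redundancy: log N + sum_i p_ik * log(p_ik).

    A term concentrated in a single document gets log N; a term spread
    evenly over all N documents gets 0.
    """
    n = len(docs)
    counts = [Counter(doc) for doc in docs]
    total = Counter()
    for c in counts:
        total.update(c)
    weights = {}
    for term, f_k in total.items():
        entropy_term = 0.0
        for c in counts:
            f_ik = c[term]
            if f_ik:
                p = f_ik / f_k
                entropy_term += p * math.log(p)
        weights[term] = math.log(n) + entropy_term
    return weights


def vectorize(doc, weights, kind):
    """Transformed tf times importance weight, scaled to unit L2 norm."""
    vec = {t: tf_transform(c, kind) * weights.get(t, 0.0)
           for t, c in Counter(doc).items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm > 0 else vec


def linear_kernel(u, v):
    """Dot product of two sparse vectors (the SVM linear kernel)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


if __name__ == "__main__":
    # Toy, already tokenized corpus; no stemming or lemmatization is applied.
    docs = [["the", "svm", "kernel", "svm"],
            ["the", "text", "kernel"],
            ["the", "text", "text", "german"]]
    red = redundancy_weights(docs)
    vecs = [vectorize(d, red, "log") for d in docs]
    print(red)
    print(linear_kernel(vecs[1], vecs[2]))
```

Keeping the count transformation and the importance weight as separate, swappable pieces makes it easy to vary one while holding the kernel fixed, which mirrors the comparison the abstract describes.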