E. Leopold and J. Kindermann, Text categorization with support vector machines. How to represent texts in input space?, MACH LEARN, 46(1-3), 2002, pp. 423-444
The choice of the kernel function is crucial to most applications of support
vector machines. In this paper, however, we show that in the case of text
classification, term-frequency transformations have a larger impact on the
performance of SVM than the kernel itself. We discuss the role of
importance-weights (e.g. document frequency and redundancy), which is not yet fully
understood in the light of model complexity and calculation cost, and we
show that time-consuming lemmatization or stemming can be avoided even when
classifying a highly inflectional language like German.
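
To make the terms in the abstract concrete, the following is a minimal sketch of one possible input representation: a logarithmic term-frequency transformation, multiplied by an inverse-document-frequency importance weight, followed by L2 normalization. The function name `tf_idf_vectors`, the exact formulas, and the toy German tokens are assumptions chosen for illustration and are not taken from the paper. Tokens are used as raw word forms, with no lemmatization or stemming.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Map tokenized documents to sparse {term: weight} vectors.

    Assumed weighting (illustrative, not the paper's exact choice):
    log(1 + tf) * log(N / df), then L2 normalization.
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    # Document frequency: number of documents that contain each term.
    df = Counter(term for c in counts for term in c)

    vectors = []
    for c in counts:
        # Damped term frequency times the importance weight of the term.
        vec = {
            term: math.log(1 + freq) * math.log(n_docs / df[term])
            for term, freq in c.items()
        }
        # Scale to unit Euclidean length unless every weight is zero.
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            vec = {term: w / norm for term, w in vec.items()}
        vectors.append(vec)
    return vectors


# Raw inflected word forms are kept as separate features.
docs = [
    ["der", "hund", "bellt"],
    ["der", "hund", "schläft"],
    ["die", "katze", "schläft", "schläft"],
]
vectors = tf_idf_vectors(docs)
```

The resulting sparse vectors could then be passed to any SVM implementation. Terms that occur in every document receive weight zero under this particular importance weight.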