We study the use of support vector machines (SVMs) in classifying e-mail as spam or nonspam by comparing them to three other classification algorithms: Ripper, Rocchio, and boosted decision trees. These four algorithms were tested on two different data sets: one in which the number of features was constrained to the 1000 best features, and another in which the dimensionality was over 7000. SVMs performed best when using binary features. For both data sets, boosted trees and SVMs had acceptable test performance in terms of accuracy and speed. However, SVMs required significantly less training time.
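To make "binary features" concrete, here is a minimal sketch of the general setup: each message becomes the set of vocabulary words it contains (1 if present, 0 otherwise, counts ignored), and a linear SVM is trained on those vectors. This is not the authors' implementation. The training procedure is a swapped-in choice, Pegasos-style stochastic subgradient descent on the hinge loss without a bias term. The tokenizer (lower-cased whitespace split), the helper names `build_vocab`, `binary_features`, `train_linear_svm`, and `predict`, the parameters `lam`, `epochs`, and `seed`, and the toy messages are all hypothetical illustrations. Feature selection, such as keeping only the 1000 best features, is not shown.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple


def build_vocab(texts: Sequence[str]) -> Dict[str, int]:
    """Assign an index to every lower-cased whitespace token seen in training (hypothetical tokenizer)."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for tok in text.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def binary_features(text: str, vocab: Dict[str, int]) -> Set[int]:
    """Binary representation: the set of vocabulary indices present (value 1).
    Every other feature is 0, and repeated words are counted only once."""
    return {vocab[tok] for tok in set(text.lower().split()) if tok in vocab}


def train_linear_svm(examples: Sequence[Tuple[Set[int], int]], dim: int,
                     lam: float = 0.01, epochs: int = 50, seed: int = 0) -> List[float]:
    """Pegasos-style SGD for the objective lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)).
    Labels are +1 (spam) and -1 (nonspam); no bias term (homogeneous linear SVM)."""
    rng = random.Random(seed)
    w = [0.0] * dim
    order = list(range(len(examples)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)               # Pegasos step size 1 / (lambda * t)
            x, y = examples[i]
            margin = y * sum(w[j] for j in x)   # with binary x, w.x is a sum of weights
            shrink = 1.0 - eta * lam            # subgradient step on the L2 regularizer
            for j in range(dim):
                w[j] *= shrink
            if margin < 1.0:                    # hinge loss is active: move w toward y * x
                for j in x:
                    w[j] += eta * y
    return w


def predict(w: List[float], x: Set[int]) -> int:
    """Return +1 (spam) if the linear score is positive, else -1 (nonspam)."""
    return 1 if sum(w[j] for j in x) > 0 else -1


# Toy usage on invented messages (not from the paper's data sets).
train_texts = ["win money now", "cheap money offer", "win a prize now",
               "meeting at noon", "project report attached", "lunch at noon tomorrow"]
labels = [1, 1, 1, -1, -1, -1]
vocab = build_vocab(train_texts)
data = [(binary_features(text, vocab), y) for text, y in zip(train_texts, labels)]
w = train_linear_svm(data, dim=len(vocab))
print(predict(w, binary_features("win cheap money", vocab)))     # 1 (spam)
print(predict(w, binary_features("report for meeting", vocab)))  # -1 (nonspam)
```

Because the features are binary, each dot product reduces to summing the weights of the words that are present. This keeps scoring cheap even when the vocabulary is large, as in the data set with over 7000 dimensions.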