Mining e-mail content for author identification forensics

Citation
O. De Vel et al., Mining e-mail content for author identification forensics, SIGMOD RECORD, 30(4), 2001, pp. 55-64
Number of citations
42
Subject Categories
Computer Science & Engineering
Journal title
SIGMOD RECORD
Journal ISSN
0163-5808
Volume
30
Issue
4
Year of publication
2001
Pages
55 - 64
Database
ISI
SICI code
0163-5808(200112)30:4<55:MECFAI>2.0.ZU;2-F
Abstract
We describe an investigation into e-mail content mining for author identification, or authorship attribution, for the purpose of forensic investigation. We focus our discussion on the ability to discriminate between authors for the case of both aggregated e-mail topics as well as across different email topics. An extended set of e-mail document features including structural characteristics and linguistic patterns were derived and, together with a Support Vector Machine learning algorithm, were used for mining the e-mail content. Experiments using a number of e-mail documents generated by different authors on a set of topics gave promising results for both aggregated and multi-topic author categorisation.
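
As a rough illustration of the kind of feature extraction the abstract refers to, the sketch below turns an e-mail body into a small numeric feature dictionary. This record does not list the paper's actual features, so everything here is an assumption made for illustration: the function name `extract_features`, the `FUNCTION_WORDS` list, and the specific structural and linguistic measures. It uses only the Python standard library and stops short of the Support Vector Machine step.

```python
import re
from collections import Counter

# Hypothetical function-word list; the paper's real feature set is not given here.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "it", "for")


def extract_features(email_body: str) -> dict:
    """Map an e-mail body to illustrative structural and linguistic features."""
    lines = email_body.splitlines()
    words = re.findall(r"[A-Za-z']+", email_body.lower())
    n_lines = len(lines) or 1      # guard against division by zero
    n_words = len(words) or 1
    n_chars = len(email_body) or 1
    counts = Counter(words)

    features = {
        # Structural characteristics (assumed examples)
        "n_lines": float(len(lines)),
        "blank_line_ratio": sum(1 for ln in lines if not ln.strip()) / n_lines,
        "has_greeting": float(
            bool(re.match(r"\s*(hi|hello|dear)\b", email_body, re.IGNORECASE))
        ),
        # Linguistic patterns (assumed examples)
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "vocab_richness": len(counts) / n_words,
        "upper_ratio": sum(ch.isupper() for ch in email_body) / n_chars,
    }
    # Relative frequency of each function word
    for fw in FUNCTION_WORDS:
        features[f"fw_{fw}"] = counts[fw] / n_words
    return features
```

Feature dictionaries like these, one per e-mail and labelled by author, could then be converted to fixed-order vectors and passed to any SVM implementation for training and classification.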