Gene selection for cancer classification using support vector machines

Citation
I. Guyon et al., Gene selection for cancer classification using support vector machines, MACH LEARN, 46(1-3), 2002, pp. 389-422
Number of citations
46
Subject categories
AI Robotics and Automatic Control
Journal title
MACHINE LEARNING
ISSN journal
0885-6125
Volume
46
Issue
1-3
Year of publication
2002
Pages
389 - 422
Database
ISI
SICI code
0885-6125(2002)46:1-3<389:GSFCCU>2.0.ZU;2-5
Abstract
DNA micro-arrays now permit scientists to screen thousands of genes simultaneously and determine whether those genes are active, hyperactive or silent in normal or cancerous tissue. Because these new micro-array devices generate bewildering amounts of raw data, new analytical methods must be developed to sort out whether cancer tissues have distinctive signatures of gene expression over normal tissues or other types of cancer tissues. In this paper, we address the problem of selection of a small subset of genes from broad patterns of gene expression data, recorded on DNA micro-arrays. Using available training examples from cancer and normal patients, we build a classifier suitable for genetic diagnosis, as well as drug discovery. Previous attempts to address this problem select genes with correlation techniques. We propose a new method of gene selection utilizing Support Vector Machine methods based on Recursive Feature Elimination (RFE). We demonstrate experimentally that the genes selected by our techniques yield better classification performance and are biologically relevant to cancer. In contrast with the baseline method, our method eliminates gene redundancy automatically and yields better and more compact gene subsets. In patients with leukemia, our method discovered 2 genes that yield zero leave-one-out error, while 64 genes are necessary for the baseline method to get the best result (one leave-one-out error). In the colon cancer database, using only 4 genes our method is 98% accurate, while the baseline method is only 86% accurate.
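Illustrative sketch
The Recursive Feature Elimination idea named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a linear SVM, ranks features by squared weight, and removes one feature per iteration. The solver (`train_linear_svm`, a plain hinge-loss subgradient descent), the helper `make_toy_data`, and every parameter value are hypothetical stand-ins chosen so the example runs with the Python standard library only.

```python
import random


def train_linear_svm(X, y, epochs=100, lr=0.01, C=1.0, seed=0):
    """Fit a linear SVM by subgradient descent on the regularised hinge loss.

    Assumption: this toy solver stands in for a proper SVM optimiser.
    Labels in y must be +1 or -1.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            violated = y[i] * score < 1  # sample inside the margin
            for j in range(n_features):
                # Regularisation always shrinks w; the hinge term only
                # contributes when the margin is violated.
                grad = w[j] - (C * y[i] * X[i][j] if violated else 0.0)
                w[j] -= lr * grad
            if violated:
                b += lr * C * y[i]
    return w, b


def svm_rfe(X, y, n_keep=1):
    """Recursive feature elimination with a linear SVM.

    Repeatedly retrains on the surviving features and drops the one with
    the smallest squared weight (assumed ranking criterion).
    Returns (kept, eliminated), where eliminated is ordered from the
    last feature removed to the first.
    """
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return remaining, eliminated[::-1]


def make_toy_data(n_samples=40, n_features=5, informative=2, seed=1):
    """Hypothetical data: one feature tracks the label, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = 1 if i % 2 == 0 else -1
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[informative] = 2.0 * label + rng.gauss(0.0, 0.3)
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    kept, eliminated = svm_rfe(X, y, n_keep=1)
    print("kept feature indices:", kept)
    print("eliminated (last removed first):", eliminated)
```

Retraining after each removal is what separates this from a one-shot correlation ranking: the weights of the surviving features are re-estimated once a feature is gone, so a feature that only looked useful alongside the removed one can be dropped in a later round.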