Analysis of molecular profile data using generative and discriminative methods

Citation
E.J. Moler et al., Analysis of molecular profile data using generative and discriminative methods, PHYSIOL GEN, 4(2), 2000, pp. 109-126
Number of citations
52
Subject categories
Molecular Biology & Genetics
Journal title
PHYSIOLOGICAL GENOMICS
Journal ISSN
1094-8341
Volume
4
Issue
2
Year of publication
2000
Pages
109 - 126
Database
ISI
SICI code
1094-8341(200012)4:2<109:AOMPDU>2.0.ZU;2-X
Abstract
A modular framework is proposed for modeling and understanding the relationships between molecular profile data and other domain knowledge using a combination of generative (here, graphical models) and discriminative [Support Vector Machines (SVMs)] methods. As illustration, naive Bayes models, simple graphical models, and SVMs were applied to published transcription profile data for 1,988 genes in 62 colon adenocarcinoma tissue specimens labeled as tumor or nontumor. These unsupervised and supervised learning methods identified three classes or subtypes of specimens, assigned tumor or nontumor labels to new specimens and detected six potentially mislabeled specimens. The probability parameters of the three classes were utilized to develop a novel gene relevance, ranking, and selection method. SVMs trained to discriminate nontumor from tumor specimens using only the 50-200 top-ranked genes had the same or better generalization performance than the full repertoire of 1,988 genes. Approximately 90 marker genes were pinpointed for use in understanding the basic biology of colon adenocarcinoma, defining targets for therapeutic intervention and developing diagnostic tools. These potential markers highlight the importance of tissue biology in the etiology of cancer. Comparative analysis of molecular profile data is proposed as a mechanism for predicting the physiological function of genes in instances when comparative sequence analysis proves uninformative, such as with human and yeast translationally controlled tumour protein. Graphical models and SVMs hold promise as the foundations for developing decision support systems for diagnosis, prognosis, and monitoring as well as inferring biological networks.