Analysis of molecular profile data using generative and discriminative methods

Authors

Moler, EJ; Chow, ML; Mian, IS

Citation

E.J. Moler et al., Analysis of molecular profile data using generative and discriminative methods, PHYSIOL GEN, 4(2), 2000, pp. 109-126

Citations number

Subject Categories

Molecular Biology & Genetics

Journal title

PHYSIOLOGICAL GENOMICS

ISSN journal

1094-8341

Volume

4

Issue

2

Year of publication

2000

Pages

109 - 126

Database

ISI

SICI code

1094-8341(200012)4:2<109:AOMPDU>2.0.ZU;2-X

Abstract

A modular framework is proposed for modeling and understanding the relationships between molecular profile data and other domain knowledge using a combination of generative (here, graphical models) and discriminative [Support Vector Machines (SVMs)] methods. As illustration, naive Bayes models, simple graphical models, and SVMs were applied to published transcription profile data for 1,988 genes in 62 colon adenocarcinoma tissue specimens labeled as tumor or nontumor. These unsupervised and supervised learning methods identified three classes or subtypes of specimens, assigned tumor or nontumor labels to new specimens and detected six potentially mislabeled specimens. The probability parameters of the three classes were utilized to develop a novel gene relevance, ranking, and selection method. SVMs trained to discriminate nontumor from tumor specimens using only the 50-200 top-ranked genes had the same or better generalization performance than the full repertoire of 1,988 genes. Approximately 90 marker genes were pinpointed for use in understanding the basic biology of colon adenocarcinoma, defining targets for therapeutic intervention and developing diagnostic tools. These potential markers highlight the importance of tissue biology in the etiology of cancer. Comparative analysis of molecular profile data is proposed as a mechanism for predicting the physiological function of genes in instances when comparative sequence analysis proves uninformative, such as with human and yeast translationally controlled tumour protein. Graphical models and SVMs hold promise as the foundations for developing decision support systems for diagnosis, prognosis, and monitoring as well as inferring biological networks.
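
Illustrative sketch (not from the paper)

The rank-then-classify idea in the abstract (score genes by relevance, keep only the top-ranked ones, then train a classifier on that subset) can be sketched roughly as follows. Everything here is an assumption made for illustration: the data are synthetic, a standardized mean-difference score stands in for the paper's ranking based on class probability parameters, a Gaussian naive Bayes classifier stands in for the SVMs, and the names make_data, rank_genes, fit_gaussian_nb, and predict are hypothetical.

```python
import math
import random
from statistics import mean, pvariance


def make_data(n_per_class=40, n_genes=30, n_informative=5, shift=3.0, seed=0):
    """Synthetic profiles: the first `n_informative` genes are shifted in class 1."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                for g in range(n_informative):
                    row[g] += shift
            X.append(row)
            y.append(label)
    return X, y


def rank_genes(X, y):
    """Rank gene indices by |mean difference| / (sum of class standard deviations).

    This is a stand-in relevance score, not the method described in the paper.
    """
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        spread = math.sqrt(pvariance(a)) + math.sqrt(pvariance(b)) + 1e-9
        scores.append(abs(mean(a) - mean(b)) / spread)
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)


def fit_gaussian_nb(X, y, genes):
    """Estimate a class prior and per-gene (mean, variance) using only `genes`."""
    model = {}
    for c in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == c]
        stats = []
        for g in genes:
            values = [r[g] for r in rows]
            stats.append((mean(values), pvariance(values) + 1e-6))
        model[c] = (len(rows) / len(X), stats)
    return model


def predict(model, genes, row):
    """Return the class with the highest log posterior under the naive Bayes model."""
    best, best_lp = None, -math.inf
    for c, (prior, stats) in model.items():
        lp = math.log(prior)
        for g, (mu, var) in zip(genes, stats):
            lp += -0.5 * math.log(2 * math.pi * var) - (row[g] - mu) ** 2 / (2 * var)
        if lp > best_lp:
            best, best_lp = c, lp
    return best


# Split specimens into training (even index) and held-out (odd index) sets.
X, y = make_data()
train = [i for i in range(len(X)) if i % 2 == 0]
test = [i for i in range(len(X)) if i % 2 == 1]
X_train = [X[i] for i in train]
y_train = [y[i] for i in train]

# Rank on the training set only, keep the top 5 genes, then fit and evaluate.
top = rank_genes(X_train, y_train)[:5]
model = fit_gaussian_nb(X_train, y_train, top)
accuracy = sum(predict(model, top, X[i]) == y[i] for i in test) / len(test)
```

Ranking is done on the training specimens only, so the held-out accuracy is not inflated by gene selection having seen the test labels. Replacing the classifier with an SVM from a library of choice would bring the sketch closer to the setup the abstract describes.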