Identifying marker genes in transcription profiling data using a mixture of feature relevance experts

Authors

Chow, ML; Moler, EJ; Mian, IS

Citation

M.L. Chow et al., Identifying marker genes in transcription profiling data using a mixture of feature relevance experts, PHYSIOL GEN, 5(2), 2001, pp. 99-111

Citations number

Subject categories

Molecular Biology & Genetics

Journal title

PHYSIOLOGICAL GENOMICS

ISSN journal

1094-8341

Volume

5

Issue

2

Year of publication

2001

Pages

99 - 111

Database

ISI

SICI code

1094-8341(20010308)5:2<99:IMGITP>2.0.ZU;2-G

Abstract

Transcription profiling experiments permit the expression levels of many genes to be measured simultaneously. Given profiling data from two types of samples, genes that most distinguish the samples (marker genes) are good candidates for subsequent in-depth experimental studies and developing decision support systems for diagnosis, prognosis, and monitoring. This work proposes a mixture of feature relevance experts as a method for identifying marker genes and illustrates the idea using published data from samples labeled as acute lymphoblastic and myeloid leukemia (ALL, AML). A feature relevance expert implements an algorithm that calculates how well a gene distinguishes samples, reorders genes according to this relevance measure, and uses a supervised learning method [here, support vector machines (SVMs)] to determine the generalization performances of different nested gene subsets. The mixture of three feature relevance experts examined implement two existing and one novel feature relevance measures. For each expert, a gene subset consisting of the top 50 genes distinguished ALL from AML samples as completely as all 7,070 genes. The 125 genes at the union of the top 50s are plausible markers for a prototype decision support system. Chromosomal aberration and other data support the prediction that the three genes at the intersection of the top 50s, cystatin C, azurocidin, and adipsin, are good targets for investigating the basic biology of ALL/AML. The same data were employed to identify markers that distinguish samples based on their labels of T cell/B cell, peripheral blood/bone marrow, and male/female. Selenoprotein W may discriminate T cells from B cells. Results from analysis of transcription profiling data from tumor/nontumor colon adenocarcinoma samples support the general utility of the aforementioned approach.
Theoretical issues such as choosing SVM kernels and their parameters, training and evaluating feature relevance experts, and the impact of potentially mislabeled samples on marker identification (feature selection) are discussed.
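
The workflow the abstract describes (score each gene, rank genes by relevance, evaluate nested subsets of the top-ranked genes, then take the union and intersection of the experts' top lists) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic data, the two relevance measures (`signal_to_noise` and `mean_difference`; the abstract does not name the three measures used), the top-list size `k = 3` in place of 50, and a leave-one-out nearest-centroid classifier standing in for the SVMs so that the example needs only the Python standard library.

```python
import random
from statistics import mean, pstdev

def signal_to_noise(values, labels):
    """Hypothetical relevance measure: |mean_0 - mean_1| / (sd_0 + sd_1)."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    denom = pstdev(a) + pstdev(b)
    return abs(mean(a) - mean(b)) / denom if denom > 0 else 0.0

def mean_difference(values, labels):
    """Hypothetical relevance measure: absolute difference of class means."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    return abs(mean(a) - mean(b))

def rank_genes(X, labels, score):
    """Return gene indices ordered from most to least relevant."""
    n_genes = len(X[0])
    scores = [score([row[g] for row in X], labels) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)

def nearest_centroid_loo(X, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier restricted
    to `genes`. This is a simple stand-in for the SVMs used in the paper."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for c in set(labels):
            rows = [X[j] for j in range(len(X)) if j != i and labels[j] == c]
            centroids[c] = [mean(r[g] for r in rows) for g in genes]

        def sq_dist(c):
            return sum((X[i][g] - m) ** 2 for g, m in zip(genes, centroids[c]))

        correct += min(centroids, key=sq_dist) == labels[i]
    return correct / len(X)

# Synthetic data (assumption): 20 samples, 30 genes, and only genes 0-2
# differ between the two classes.
rng = random.Random(0)
labels = [0] * 10 + [1] * 10
X = [[rng.gauss(0, 1) + (6.0 if (g < 3 and y == 1) else 0.0) for g in range(30)]
     for y in labels]

experts = {"signal_to_noise": signal_to_noise, "mean_difference": mean_difference}
k = 3  # size of each expert's top list (the paper uses 50)

rankings = {name: rank_genes(X, labels, f) for name, f in experts.items()}
tops = {name: set(r[:k]) for name, r in rankings.items()}

consensus = set.intersection(*tops.values())   # genes every expert agrees on
candidates = set.union(*tops.values())         # genes any expert proposes

# Generalization estimate for nested subsets of one expert's ranking.
curve = {n: nearest_centroid_loo(X, labels, rankings["signal_to_noise"][:n])
         for n in (1, 3, 10, 30)}
```

In this sketch `candidates` plays the role of the union of the top lists (plausible markers) and `consensus` the role of their intersection (the genes all experts rank highly), while `curve` maps each nested subset size to an estimated accuracy, which is how one could check whether a small subset performs as well as the full gene set.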