Gene expression studies bridge the gap between DNA information and trait
information by dissecting biochemical pathways into intermediate components
between genotype and phenotype. These studies open new avenues for
identifying complex disease genes and biomarkers for disease diagnosis and
for assessing drug efficacy and toxicity. However, the majority of
analytical methods applied to gene expression data are not efficient for
biomarker identification and disease diagnosis. In this paper, we propose a
general framework that incorporates feature (gene) selection into pattern
recognition in the process of identifying biomarkers. Using this framework,
we develop three feature wrappers, built around linear discriminant
analysis, logistic regression, and support vector machines, that search the
space of feature subsets using the classification error of the wrapped
classifier as the measure of goodness for each candidate subset. To carry
out this computationally intensive search effectively, we employ sequential
forward search and sequential forward floating search algorithms. To
evaluate the performance of feature selection for biomarker identification,
we applied the proposed methods to three data sets. The preliminary results
demonstrate that very high classification accuracy can be attained by the
identified composite classifiers with several biomarkers.
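
The wrapper idea with sequential forward search can be illustrated with a minimal sketch. It is not the implementation used in this work. It assumes a nearest-centroid classifier as a stand-in for the three wrapped classifiers, leave-one-out error as the classification error estimate, and a made-up toy data set. The names `loo_error` and `sequential_forward_search` are hypothetical. The floating variant, which adds conditional backward steps after each inclusion, is not shown.

```python
def loo_error(X, y, features):
    """Leave-one-out error of a nearest-centroid classifier on `features`.

    Assumption: nearest-centroid is only a simple stand-in for the wrapped
    classifier (LDA, logistic regression, or SVM in the text).
    """
    errors = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        # Accumulate per-class feature sums from every sample except sample i.
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1

        def sq_dist(label):
            # Squared Euclidean distance from sample i to the class centroid.
            return sum(
                (X[i][f] - sums[label][k] / counts[label]) ** 2
                for k, f in enumerate(features)
            )

        # Ties go to the smallest label because labels are visited in sorted order.
        predicted = min(sorted(sums), key=sq_dist)
        errors += predicted != y[i]
    return errors / len(X)


def sequential_forward_search(X, y, max_features):
    """Greedily add the feature that gives the lowest wrapped-classifier error."""
    selected, history = [], []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        best = min(remaining, key=lambda f: loo_error(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), loo_error(X, y, selected)))
    return history


# Hypothetical toy data: 8 samples, 3 "genes"; only gene 1 separates the classes.
X = [
    [0.9, 0.1, 0.5], [0.2, 0.2, 0.4], [0.8, 0.0, 0.6], [0.1, 0.3, 0.5],
    [0.3, 1.0, 0.5], [0.9, 0.9, 0.4], [0.1, 1.1, 0.6], [0.7, 0.8, 0.5],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

history = sequential_forward_search(X, y, max_features=2)
for subset, err in history:
    print(subset, err)
```

Each entry of `history` pairs a feature subset with its estimated error, so the subset size can be chosen by inspecting where the error stops improving. Every candidate subset requires re-estimating the classifier's error, which is what makes the search computationally intensive on real expression data.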