J.G. Thomas et al., An efficient and robust statistical modeling approach to discover differentially expressed genes using genomic expression profiles, GENOME RES, 11(7), 2001, pp. 1227-1236
We have developed a statistical regression modeling approach to discover
genes that are differentially expressed between two predefined sample groups
in DNA microarray experiments. Our model is based on well-defined
assumptions, uses rigorous and well-characterized statistical measures, and
accounts for the heterogeneity and genomic complexity of the data. In
contrast to cluster analysis, which attempts to define groups of genes
and/or samples that share common overall expression profiles, our modeling
approach uses known sample group membership to focus on expression profiles
of individual genes in a sensitive and robust manner. Further, this approach
can be used to test statistical hypotheses about gene expression. To
demonstrate this methodology, we compared the expression profiles of 11
acute myeloid leukemia (AML) and 27 acute lymphoblastic leukemia (ALL)
samples from a previous study (Golub et al. 1999) and found 141 genes
differentially expressed between AML and ALL with a 1% significance at the
genomic level. Using this modeling approach to compare different sample
groups within the AML samples, we identified a group of genes whose
expression profiles correlated with that of thrombopoietin and found that
genes whose expression associated with AML treatment outcome lie in
recurrent chromosomal locations. Our results are compared with those
obtained using t-tests or Wilcoxon rank sum statistics.
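
The abstract names Wilcoxon rank sum statistics as one of the baselines and reports results at a 1% significance "at the genomic level". As a rough illustration of that baseline only (not of the authors' regression model, which the abstract does not specify), the following sketch runs a per-gene rank sum test between two sample groups and keeps the genes that pass a genome-wide threshold. The Bonferroni correction, the normal approximation without tie correction, the function names rank_sum_p and genome_wide_hits, and the toy data are all assumptions made for this example.

```python
import math


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank sum p-value via the normal approximation.

    Ties receive average ranks; the variance is not tie-corrected
    (a simplification assumed for this sketch).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    # Rank sum of the first group, then standardize under the null.
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


def genome_wide_hits(expr, idx_a, idx_b, alpha=0.01):
    """Return genes whose p-value passes alpha after Bonferroni correction.

    expr maps a gene name to a list of expression values, one per sample;
    idx_a and idx_b are the sample positions of the two groups.
    """
    threshold = alpha / len(expr)  # assumed genome-wide correction
    hits = []
    for gene, values in expr.items():
        p = rank_sum_p([values[i] for i in idx_a], [values[i] for i in idx_b])
        if p < threshold:
            hits.append(gene)
    return sorted(hits)


# Toy data with the group sizes from the abstract (11 vs 27 samples).
idx_aml = list(range(11))
idx_all = list(range(11, 38))
toy_expr = {
    "separated": list(range(11)) + list(range(100, 127)),
    "shuffled": [(i * 7) % 38 for i in range(38)],
    "constant_shift": [float(i % 5) for i in range(38)],
}
toy_hits = genome_wide_hits(toy_expr, idx_aml, idx_all)
```

With real data, expr would hold one entry per probe on the array, so the Bonferroni threshold becomes much stricter than in this three-gene toy case; less conservative corrections exist and may be closer to what the paper uses.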