Since efficient and relatively inexpensive methods were developed for determining biosequences, large amounts of biosequence data have been generated. Because the main problem in molecular biology is now the analysis of the data rather than its acquisition, part of the study of computational biology is to extract all kinds of meaningful information from the sequences. Computer-assisted methods have therefore become very important in analyzing biosequence data. However, most current computer-assisted methods are limited to finding motifs, whereas genes can be regulated in many ways, including by combinations of regulatory elements. This research aims to develop a new integrated system for genome-wide gene expression analysis. It begins with a new motif-finding method that uses a new objective function combining multiple well-defined components and an improved stochastic iterative sampling strategy. Combinatorial motif analysis is accomplished by constructive induction, which analyzes potential motif combinations. We then apply standard inductive learning algorithms to generate hypotheses for different gene behaviors. A genome-wide gene expression analysis demonstrated the value of this novel integrated system. (C) 2001 Elsevier Science Ireland Ltd. All rights reserved.
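The abstract does not specify the objective function or the sampling schedule of the motif-finding step. As a point of reference only, here is a minimal sketch of the classic Gibbs motif sampler, one well-known stochastic iterative sampling strategy for motif finding. It is not the authors' method: it scores sites with a single log-odds objective instead of the multi-component objective described above, and it assumes a DNA alphabet (`ACGT`), a uniform 0.25 background, exactly one site of fixed width per sequence, and a pseudocount of 1. All function names (`profile`, `site_weight`, `motif_score`, `gibbs_motif_sampler`) are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"  # assumption: uppercase DNA sequences only


def profile(seqs, positions, width, skip, pseudo=1.0):
    """Position weight matrix built from all current sites except sequence `skip`.

    Pass skip=-1 to include every sequence.
    """
    counts = [{b: pseudo for b in ALPHABET} for _ in range(width)]
    for i, (s, p) in enumerate(zip(seqs, positions)):
        if i == skip:
            continue
        for j in range(width):
            counts[j][s[p + j]] += 1
    total = sum(counts[0].values())  # identical total in every column
    return [{b: c[b] / total for b in ALPHABET} for c in counts]


def site_weight(site, pwm, bg=0.25):
    """Likelihood ratio of a site under the motif model vs. a uniform background."""
    w = 1.0
    for j, b in enumerate(site):
        w *= pwm[j][b] / bg
    return w


def motif_score(seqs, positions, width):
    """Single-component objective (hypothetical): summed log-odds of the chosen sites."""
    pwm = profile(seqs, positions, width, skip=-1)
    return sum(math.log(site_weight(s[p:p + width], pwm))
               for s, p in zip(seqs, positions))


def gibbs_motif_sampler(seqs, width, iters=500, seed=0):
    """Iteratively resample one sequence's site given the profile of all the others."""
    rng = random.Random(seed)
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iters):
        i = rng.randrange(len(seqs))                # pick one sequence to update
        pwm = profile(seqs, positions, width, skip=i)
        s = seqs[i]
        weights = [site_weight(s[p:p + width], pwm)
                   for p in range(len(s) - width + 1)]
        # draw a new start position with probability proportional to its weight
        positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions


if __name__ == "__main__":
    seqs = ["ACGTTATAATGC", "GGTATAATCCAA", "TATAATGGGCCC", "CCCGGGTATAAT"]
    pos = gibbs_motif_sampler(seqs, width=6, iters=200, seed=1)
    print(pos, [s[p:p + 6] for s, p in zip(seqs, pos)], motif_score(seqs, pos, 6))
```

Sampling a new site in proportion to its likelihood ratio, rather than always taking the best-scoring one, is what lets this kind of sampler escape poor local optima; the abstract's "improved" sampling strategy presumably refines this step, but its details are not given here.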