Motivation: Computer-assisted methods are essential for the analysis of
biosequences. Gene activity is regulated in part by the binding of regulatory
molecules (transcription factors) to combinations of short motifs. The goal
of our analysis is to develop algorithms that identify regulatory motifs and
predict the activity of combinations of those motifs.
Approach: Our research begins with a new motif-finding method that uses
multiple objective functions and an improved stochastic iterative sampling
strategy. Combinatorial motif analysis is accomplished by constructive
induction, which analyzes potential motif combinations. The hypothesis is then
generated by applying standard inductive learning algorithms.
Results: Tests using 10 previously identified regulons from budding yeast and
14 artificial families of sequences demonstrated the effectiveness of the new
motif-finding method. Motif combination and classification approaches were
used in the analysis of a sample DNA array data set derived from genome-wide
gene expression analysis.