L.J. Jensen and S. Knudsen, Automatic discovery of regulatory patterns in promoter regions based on whole cell expression data and functional annotation, Bioinformatics, 16(4), 2000, pp. 326-333
Motivation: The whole genomes submitted to GenBank contain valuable information about the function of genes as well as the upstream sequences, and whole cell expression provides valuable information on gene regulation. To utilize these large amounts of data for a biological understanding of the regulation of gene expression, new automatic methods for pattern finding are needed.
Results: Two word-analysis algorithms for automatic discovery of regulatory sequence elements have been developed. We show that sequence patterns correlated to whole cell expression data can be found using Kolmogorov-Smirnov tests on the raw data, thereby eliminating the need for clustering co-regulated genes. Regulatory elements have also been identified by systematic calculations of the significance of correlations between words found in the functional annotation of genes and DNA words occurring in their promoter regions. Application of these algorithms to the Saccharomyces cerevisiae genome and publicly available DNA array data sets revealed a highly conserved 9-mer occurring in the upstream regions of genes coding for proteasomal subunits. Several other putative and known regulatory elements were also found.
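
Illustration: the expression-based word analysis described in the Results can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the exact procedure, so the sketch simply compares the raw expression values of genes whose upstream region contains a given DNA word against those of all other genes with a two-sample Kolmogorov-Smirnov test. The function names, the gene identifiers, the toy sequences, the expression values and the word "ACGTACGTA" are all hypothetical, and the p-value uses a standard asymptotic approximation.

```python
import math


def ks_two_sample(xs, ys):
    """Two-sample Kolmogorov-Smirnov test: returns (D, approximate p-value)."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    # Walk both sorted samples and track the largest gap between the
    # two empirical cumulative distribution functions.
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] == v:
            i += 1
        while j < m and ys[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Asymptotic Kolmogorov distribution with a small-sample correction
    # (an approximation, chosen here for illustration).
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        return d, 1.0  # the series converges poorly here; p is essentially 1
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(1.0, max(0.0, p))


def word_expression_test(upstream, expression, word):
    """Compare expression of genes with and without `word` upstream."""
    with_word = [expression[g] for g, s in upstream.items() if word in s]
    without = [expression[g] for g, s in upstream.items() if word not in s]
    if not with_word or not without:
        return 0.0, 1.0  # nothing to compare
    return ks_two_sample(with_word, without)


# Hypothetical toy data: upstream sequences and raw expression values.
upstream = {
    "gene1": "TTTACGTACGTATTT", "gene2": "GGACGTACGTAGGC",
    "gene3": "ACGTACGTACCCCC", "gene4": "CCCCACGTACGTAT",
    "gene5": "TACGTACGTATTTT", "gene6": "TTTTTTTTTTTTTT",
    "gene7": "GGGCCCGGGCCCGG", "gene8": "ATATATATATATAT",
    "gene9": "CCCCCCCCTTTTTT", "gene10": "GAGAGAGAGAGAGA",
}
expression = {
    "gene1": 2.1, "gene2": 1.8, "gene3": 2.5, "gene4": 1.9, "gene5": 2.2,
    "gene6": 0.1, "gene7": -0.2, "gene8": 0.3, "gene9": 0.0, "gene10": -0.1,
}

d, p = word_expression_test(upstream, expression, "ACGTACGTA")
print(f"D = {d:.2f}, p = {p:.4f}")
```

Because the test operates directly on the two groups of raw values, no prior clustering of co-regulated genes is needed; in a genome-wide scan the same call would be repeated for every candidate word, and the resulting p-values would then need correction for multiple testing.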