Motivation: A new approach to the prediction of eukaryotic PolII promoters
from DNA sequence takes advantage of a combination of elements similar to
neural networks and genetic algorithms to recognize a set of discrete
subpatterns with variable separation as one pattern: a promoter. The neural
networks use as input a small window of DNA sequence, as well as the output
of other neural networks. Through the use of genetic algorithms, the weights
in the neural networks are optimized to discriminate maximally between
promoters and non-promoters.
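
The idea described above can be illustrated with a minimal sketch. It is
not the Promoter2.0 implementation: the network shape (two chained sigmoid
units), the one-hot encoding, the fitness function, the selection scheme,
and every name below (one_hot, score, evolve, TOY_EXAMPLES) are assumptions
made for illustration, and the labeled windows are invented toy data rather
than real promoter sequences. The second unit receives the DNA window plus
the first unit's output, and a simple genetic algorithm with Gaussian
mutation adjusts the weights.

```python
import math
import random

BASES = "ACGT"

# Invented toy data (NOT real promoters): 1 = "promoter-like", 0 = background.
TOY_EXAMPLES = [
    ("TATAAA", 1), ("TATATA", 1), ("TATAAG", 1), ("TATAAT", 1),
    ("GCGCGC", 0), ("CCGGCC", 0), ("GGGCCC", 0), ("ACGCGT", 0),
]

def one_hot(window):
    """Encode a DNA window as a flat list of 0/1 values, four per base."""
    return [1.0 if base == b else 0.0 for base in window for b in BASES]

def unit_output(weights, inputs):
    """Single sigmoid unit; the last weight acts as the bias."""
    z = weights[-1] + sum(w * x for w, x in zip(weights, inputs))
    z = max(-50.0, min(50.0, z))  # clamp to keep math.exp from overflowing
    return 1.0 / (1.0 + math.exp(-z))

def score(genome, window):
    """Two chained units: the second sees the window plus the first's output."""
    w1, w2 = genome
    x = one_hot(window)
    h = unit_output(w1, x)
    return unit_output(w2, x + [h])

def fitness(genome, examples):
    """Higher is better: negative squared error over the labeled windows."""
    return -sum((score(genome, w) - label) ** 2 for w, label in examples)

def random_genome(window_len, rng):
    """Random weights for both units (inputs plus one bias each)."""
    n = 4 * window_len
    return ([rng.gauss(0, 1) for _ in range(n + 1)],
            [rng.gauss(0, 1) for _ in range(n + 2)])

def mutate(genome, rng, sigma=0.3):
    """Return a copy of the genome with Gaussian noise added to every weight."""
    return tuple([w + rng.gauss(0, sigma) for w in ws] for ws in genome)

def evolve(examples, window_len, generations=200, pop_size=30, seed=0):
    """Truncation selection with elitism; returns (best genome, best-fitness history)."""
    rng = random.Random(seed)
    population = [random_genome(window_len, rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda g: fitness(g, examples), reverse=True)
        history.append(fitness(population[0], examples))
        parents = population[: pop_size // 5]
        # Parents survive unchanged; the rest of the population is mutated copies.
        population = parents + [mutate(rng.choice(parents), rng)
                                for _ in range(pop_size - len(parents))]
    best = max(population, key=lambda g: fitness(g, examples))
    return best, history

if __name__ == "__main__":
    best, history = evolve(TOY_EXAMPLES, window_len=6)
    print("best fitness, first vs last generation:", history[0], history[-1])
    for window, label in TOY_EXAMPLES:
        print(window, label, round(score(best, window), 3))
```

Because the fittest genomes are carried over unchanged, the best fitness in
this sketch can never decrease from one generation to the next. A real
system would scan long sequences and combine many such units rather than
classify a handful of fixed windows.
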
Results: After several thousand generations of optimization, the algorithm
was able to discriminate between vertebrate promoter and non-promoter
sequences in a test set with a correlation coefficient of 0.63. In addition,
all five known transcription start sites on the plus strand of the complete
adenovirus genome were within 161 bp of 35 predicted transcription start
sites. On standardized test sets consisting of human genomic DNA, the
performance of Promoter2.0 compares well with other software developed for
the same purpose.