Eukaryotic promoters are among the most important functional domains of genomic sequences that have yet to be characterized satisfactorily.
Most current detection methods rely on recognizing individual transcription elements with position-weight matrices (PWM) or consensus sequences. Here, we study a simple promoter detection algorithm based on Markov transition matrices built from sequences upstream of experimentally proven transcription initiation sites. Performance was evaluated
on the training set and on a test set of promoter-containing sequences. The results on the training set are surprisingly good, given that the algorithm incorporates no specific knowledge about promoters. Yet the program exhibits the pathological behaviour typical of all training-set-based methods: a significant decline in performance when confronted with previously unseen sequences. Thus the Markov algorithm, like the others presently available, does not truly capture the essence of eukaryotic promoters. A detection program based on a Markov
model is likely to be blind to categories of promoters that have no close representatives in the training set. (C) 1997 Elsevier Science Ltd.
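The abstract states only that the detector uses Markov transition matrices estimated from sequences upstream of known initiation sites; it does not give the model order, the scoring rule, or how non-promoter sequence is modelled. The sketch below is therefore a hypothetical illustration, not the authors' program. It assumes a first-order chain with pseudocounts, a separate background chain, and a log-odds score over a sliding window; the function names `train_markov`, `log_likelihood`, `log_odds` and `best_window`, and the toy sequences, are all invented for illustration.

```python
import math
from itertools import product

ALPHABET = "ACGT"


def train_markov(seqs, order=1, pseudocount=1.0):
    """Estimate P(next base | preceding `order` bases) from training sequences.

    Assumption (not from the abstract): a pseudocount is added to every cell so
    that transitions never seen in training do not get probability zero.
    """
    counts = {"".join(ctx): {b: pseudocount for b in ALPHABET}
              for ctx in product(ALPHABET, repeat=order)}
    for s in seqs:
        s = s.upper()
        for i in range(order, len(s)):
            ctx, nxt = s[i - order:i], s[i]
            if ctx in counts and nxt in ALPHABET:  # skip N and other ambiguity codes
                counts[ctx][nxt] += 1
    matrix = {}
    for ctx, row in counts.items():
        total = sum(row.values())
        matrix[ctx] = {b: c / total for b, c in row.items()}  # each row sums to 1
    return matrix


def log_likelihood(seq, model, order=1):
    """Sum of log transition probabilities of seq under a Markov model."""
    seq = seq.upper()
    total = 0.0
    for i in range(order, len(seq)):
        ctx, nxt = seq[i - order:i], seq[i]
        if ctx in model and nxt in ALPHABET:
            total += math.log(model[ctx][nxt])
    return total


def log_odds(seq, promoter_model, background_model, order=1):
    """Positive when seq is more probable under the promoter chain than the background chain."""
    return (log_likelihood(seq, promoter_model, order)
            - log_likelihood(seq, background_model, order))


def best_window(seq, promoter_model, background_model, window, order=1):
    """Slide a fixed-length window along seq; return (start, score) of the top scorer."""
    best = None
    for start in range(len(seq) - window + 1):
        score = log_odds(seq[start:start + window],
                         promoter_model, background_model, order)
        if best is None or score > best[1]:
            best = (start, score)
    return best  # None if seq is shorter than the window


# Toy, made-up training data (not from the study).
pm = train_markov(["TATAAATATAAA", "TATATAAATA"])
bg = train_markov(["GCGCGCGCGC", "CGCGGCCGCG"])
query = "GCGCGCTATAAATAGCGCGC"
start, score = best_window(query, pm, bg, window=8)
print(start, round(score, 3), query[start:start + 8])
```

Because the score rewards only transitions that were frequent in the training promoters, a promoter class whose composition differs from every training example would score no better than background under this kind of model, which is one concrete way to read the blindness described above.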