Computational prediction of eukaryotic promoters from the nucleotide sequence is one of the most attractive problems in sequence analysis today, but it is also a very difficult one. Current methods predict on the order of one promoter per kilobase in human DNA, whereas the average distance between functional promoters has been estimated to be in the range of 30-40 kilobases. Although it is conceivable that some of these predicted promoters correspond to cryptic initiation sites that are used in vivo, it is likely that most are false positives. This suggests that it is important to carefully reconsider the biological data that form the basis of current algorithms, and here we present a review of data that may be useful in this regard. The review covers the following topics: (1) basal transcription and core promoters, (2) activated transcription and transcription factor binding sites, (3) CpG islands and DNA methylation, (4) chromosomal structure and nucleosome modification, and (5) chromosomal domains and domain boundaries. We discuss the possible lessons that may be learned, especially with respect to the wealth of information about epigenetic regulation of transcription that has been appearing in recent years. (C) 1999 Elsevier Science Ltd. All rights reserved.