K. Frech et al., A NOVEL METHOD TO DEVELOP HIGHLY SPECIFIC MODELS FOR REGULATORY UNITS DETECTS A NEW LTR IN GENBANK WHICH CONTAINS A FUNCTIONAL PROMOTER, Journal of Molecular Biology, 270(5), 1997, pp. 674-687
Functional promoters are composed of individual modules (e.g. transcription factor binding sites, secondary structure elements, repeats) arranged in distinct patterns. Recognition of such patterns is essential for identification of promoters in non-coding sequences. However, this is difficult due to the absence of overall sequence similarity in promoters, even if they are regulated in a similar way. We implemented simple formal representations of general features of regulatory regions into an algorithm capable of developing complex models reflecting both the element composition and the functional organization of individual elements (ModelGenerator). Although ModelGenerator requires only a very simple initial model (e.g. two modules and their relative order), it generates a much more sophisticated model by analysis of a training set of at least ten sequences. We show that ModelGenerator successfully models different retroviral long terminal repeat (LTR) classes (Lentivirus as well as avian and mammalian C-type) which contain functional promoters. Database searches with the program ModelInspector demonstrated the high specificity of these models, and no apparent false negatives were detected. We also verified one match from GenBank to the mammalian C-type LTR model experimentally and showed this sequence to contain an active promoter. Thus, the concept of modular organization of functional regulatory DNA regions (e.g. promoters) could be successfully implemented into a set of computer tools which might be flexible and specific enough to be suitable for prospective analysis of new genomic DNA sequences. (C) 1997 Academic Press Limited.
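
The "very simple initial model" mentioned in the abstract (two modules and their relative order) can be illustrated with a minimal Python sketch. This is not the ModelGenerator or ModelInspector algorithm, whose internals the abstract does not describe. The `Module` class, the `find_ordered_pairs` function, the use of regular expressions as module descriptions, the gap bounds, and the toy motifs and sequence are all assumptions made here for illustration only.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Hypothetical module description: a name plus a regex standing in
    for a real binding-site or structural-element model."""
    name: str
    pattern: str


def find_ordered_pairs(sequence, first, second, min_gap=0, max_gap=50):
    """Return (first_start, second_start) for each occurrence of `first`
    that is followed by `second` with min_gap <= gap <= max_gap bases
    between the end of `first` and the start of `second`.

    The gap bounds are an assumption of this sketch, not a parameter
    taken from the abstract.
    """
    seq = sequence.upper()
    first_hits = [(m.start(), m.end()) for m in re.finditer(first.pattern, seq)]
    second_hits = [m.start() for m in re.finditer(second.pattern, seq)]
    matches = []
    for f_start, f_end in first_hits:
        for s_start in second_hits:
            gap = s_start - f_end
            # Order is enforced because a negative gap never passes min_gap >= 0.
            if min_gap <= gap <= max_gap:
                matches.append((f_start, s_start))
    return matches


# Toy motifs and toy sequence, chosen only to exercise the function.
caat = Module("CAAT-like", "CCAAT")
tata = Module("TATA-like", "TATAAA")
toy = "GGCCAATCGGATCGTTAGCTATAAAGGC"

print(find_ordered_pairs(toy, caat, tata, max_gap=30))  # -> [(2, 19)]
print(find_ordered_pairs(toy, tata, caat, max_gap=30))  # -> []
```

Swapping the two modules yields no match on the same sequence, which shows how even a two-module model with a fixed order can be more selective than searching for each motif independently.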