A NOVEL METHOD TO DEVELOP HIGHLY SPECIFIC MODELS FOR REGULATORY UNITS DETECTS A NEW LTR IN GENBANK WHICH CONTAINS A FUNCTIONAL PROMOTER

Citation
K. Frech et al., A NOVEL METHOD TO DEVELOP HIGHLY SPECIFIC MODELS FOR REGULATORY UNITS DETECTS A NEW LTR IN GENBANK WHICH CONTAINS A FUNCTIONAL PROMOTER, Journal of Molecular Biology, 270(5), 1997, pp. 674-687
Number of citations
48
Subject categories
Biology
Journal ISSN
00222836
Volume
270
Issue
5
Year of publication
1997
Pages
674 - 687
Database
ISI
SICI code
0022-2836(1997)270:5<674:ANMTDH>2.0.ZU;2-P
Abstract
Functional promoters are composed of individual modules (e.g. transcription factor binding sites, secondary structure elements, repeats) arranged in distinct patterns. Recognition of such patterns is essential for identification of promoters in non-coding sequences. However, this is difficult due to the absence of overall sequence similarity in promoters, even if they are regulated in a similar way. We implemented simple formal representations of general features of regulatory regions in an algorithm (ModelGenerator) capable of developing complex models that reflect both the element composition and the functional organization of individual elements. Although ModelGenerator requires only a very simple initial model (e.g. two modules and their relative order), it generates a much more sophisticated model by analysis of a training set of at least ten sequences. We show that ModelGenerator successfully models different retroviral long terminal repeat (LTR) classes (Lentivirus as well as avian and mammalian C-type) which contain functional promoters. Database searches with the program ModelInspector demonstrated the high specificity of these models, and no apparent false negatives were detected. We also verified one match from GenBank to the mammalian C-type LTR model experimentally and showed this sequence to contain an active promoter. Thus, the concept of modular organization of functional regulatory DNA regions (e.g. promoters) could be successfully implemented in a set of computer tools which might be flexible and specific enough to be suitable for prospective analysis of new genomic DNA sequences. (C) 1997 Academic Press Limited.