Highly specific localization of promoter regions in large genomic sequences by PromoterInspector: A novel context analysis approach

Citation
M. Scherf et al., Highly specific localization of promoter regions in large genomic sequences by PromoterInspector: A novel context analysis approach, J MOL BIOL, 297(3), 2000, pp. 599-606
Number of citations
20
Subject categories
Molecular Biology & Genetics
Journal title
JOURNAL OF MOLECULAR BIOLOGY
ISSN journal
00222836
Volume
297
Issue
3
Year of publication
2000
Pages
599 - 606
Database
ISI
SICI code
0022-2836(20000331)297:3<599:HSLOPR>2.0.ZU;2-K
Abstract
We present a new algorithm called PromoterInspector to locate eukaryotic polymerase II promoter regions in large genomic sequences with a high degree of specificity. PromoterInspector focuses on the genetic context of promoters rather than their exact location. Applying PromoterInspector can serve as a crucial pre-processing step for other methods that locate promoters exactly or analyze them. PromoterInspector does not depend on heuristics, because it is based purely on libraries of IUPAC words extracted from training sequences by an unsupervised learning approach. We compared PromoterInspector to other in silico promoter prediction tools using the sequences from the review by J. W. Fickett. PromoterInspector compared favourably under Fickett's evaluation scheme: it obtained a true positive to false positive ratio of 2.3, surpassing the best ratio of 0.6, reported for TSSG. Applying our method to several large genomic sequences totalling over 1.3 million base-pairs resulted in even more specific predictions. The coverage of annotated promoters was comparable to that of other in silico promoter prediction methods, while the true positive predictions increased by up to 100% of fetal matches. PromoterInspector scans 100 kb in less than one minute on a workstation and is thus especially suited to large-scale genome analysis. The method is available at http://genomatix.gsf.de/cgi-bin/promoterinspector/promoterinspector.pl (C) 2000 Academic Press.
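The abstract says predictions rest on libraries of IUPAC words, but this record does not describe how the words are extracted or scored. As a rough illustration only, the sketch below shows how IUPAC ambiguity-coded words can be matched against a DNA sequence and how hits can be counted in sliding windows. The word list (TATA-, GC- and CCAAT-box-like motifs), the function names, and the `window`/`step` parameters are all hypothetical and are not taken from PromoterInspector.

```python
import re

# Standard IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(word: str) -> re.Pattern:
    """Compile an IUPAC word into a regex; ambiguity codes become character classes."""
    parts = []
    for code in word.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    # A zero-width lookahead lets findall report overlapping occurrences.
    return re.compile("(?=(" + "".join(parts) + "))")


def count_word_hits(sequence: str, words: list[str]) -> dict[str, int]:
    """Count (possibly overlapping) occurrences of each IUPAC word in a sequence."""
    seq = sequence.upper()
    return {w: len(iupac_to_regex(w).findall(seq)) for w in words}


def window_scores(sequence: str, words: list[str],
                  window: int = 100, step: int = 50) -> list[tuple[int, int]]:
    """Return (window start, total word hits) for each sliding window.

    Hypothetical scoring: every hit of every word counts 1; only matches
    lying entirely inside a window are counted for that window.
    """
    patterns = [iupac_to_regex(w) for w in words]
    seq = sequence.upper()
    scores = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        hits = sum(len(p.findall(chunk)) for p in patterns)
        scores.append((start, hits))
    return scores


if __name__ == "__main__":
    # Illustrative motif library; NOT the word libraries used by PromoterInspector.
    words = ["TATAWAW", "GGGCGG", "CCAAT"]
    seq = "GCTATAAAAGGGGCGGAACCAATGG"
    print(count_word_hits(seq, words))
    print(window_scores(seq, words, window=10, step=5))
```

A real promoter-context classifier would weight words learned from training data and compare promoter against non-promoter libraries; the uniform hit count here only shows the matching mechanics.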