M. Scherf et al., Highly specific localization of promoter regions in large genomic sequences by PromoterInspector: A novel context analysis approach, J MOL BIOL, 297(3), 2000, pp. 599-606
We present a new algorithm called PromoterInspector to locate eukaryotic polymerase II promoter regions in large genomic sequences with a high degree of specificity. PromoterInspector focuses on the genetic context of promoters, rather than their exact location. Application of PromoterInspector can serve as a crucial pre-processing step for other methods that locate promoters exactly or analyze them.
PromoterInspector does not depend on heuristics, because it is purely based on libraries of IUPAC words extracted from training sequences by an unsupervised learning approach.
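To make the notion of an IUPAC word concrete, the following minimal sketch shows how a word written in the standard IUPAC nucleotide ambiguity codes can be matched against a DNA sequence. It is an illustration only, not the PromoterInspector algorithm: the abstract does not describe how the word libraries are built or scored, and the names `IUPAC_CODES`, `matches_at`, and `find_word`, as well as the example word and sequence, are hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes: each code stands for a set of bases.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def matches_at(word, sequence, start):
    """Return True if the IUPAC word matches the sequence at position start."""
    window = sequence[start:start + len(word)]
    if len(window) != len(word):
        return False
    # Every base in the window must belong to the set allowed by its code.
    return all(base in IUPAC_CODES[code] for code, base in zip(word, window))


def find_word(word, sequence):
    """Return the 0-based start positions of all matches of an IUPAC word."""
    word = word.upper()
    sequence = sequence.upper()
    last_start = len(sequence) - len(word)
    return [i for i in range(last_start + 1) if matches_at(word, sequence, i)]


# Hypothetical example: W matches A or T, so both TATAAA and TATATA are hits.
positions = find_word("TATAWA", "GGTATAAAGGTATATACC")
```

A word library in this sense would be a collection of such words; how PromoterInspector selects them and combines their matches is left to the paper itself.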
We compared PromoterInspector to in silico promoter prediction tools using the sequences from the review by J. W. Fickett. PromoterInspector compared favourably on Fickett's evaluation scheme. A true positive to false positive ratio of 2.3 was obtained, surpassing the best ratio of 0.6, reported for TSSG. The application of our method to several large genomic sequences of over 1.3 million base-pairs in total resulted in even more specific predictions. The coverage of annotated promoters was comparable to other in silico promoter prediction methods, while the true positive predictions increased by up to 100 % of total matches.
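As a reading aid, a true positive to false positive ratio r can be restated as the fraction of all predictions that are true positives, TP / (TP + FP) = r / (r + 1). The sketch below applies this identity to the two ratios quoted above; the function name `fraction_true_from_ratio` is hypothetical, and the resulting fractions are derived here by arithmetic rather than reported in the abstract.

```python
def fraction_true_from_ratio(tp_to_fp_ratio):
    """Convert a TP:FP ratio r into TP / (TP + FP) = r / (r + 1)."""
    return tp_to_fp_ratio / (tp_to_fp_ratio + 1.0)


# Ratios quoted in the abstract.
promoterinspector_fraction = fraction_true_from_ratio(2.3)  # about 0.70
tssg_fraction = fraction_true_from_ratio(0.6)               # 0.375
```

Under this reading, roughly seven in ten PromoterInspector predictions on the Fickett set were true positives, against fewer than four in ten for the best previously reported ratio.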
PromoterInspector scans 100 kb in less than one minute on a workstation, and thus is especially applicable for large genome analysis. The method is available at http://genomatix.gsf.de/cgi-bin/promoterinspector/promoterinspector.pl (C) 2000 Academic Press.