Interpolated Markov chains for eukaryotic promoter recognition

Citation
U. Ohler et al., Interpolated Markov chains for eukaryotic promoter recognition, BIOINFORMAT, 15(5), 1999, pp. 362-369
Number of citations
20
Subject categories
Multidisciplinary
Journal title
BIOINFORMATICS
ISSN journal
1367-4803
Volume
15
Issue
5
Year of publication
1999
Pages
362 - 369
Database
ISI
SICI code
1367-4803(199905)15:5<362:IMCFEP>2.0.ZU;2-U
Abstract
Motivation: We describe a new content-based approach for the detection of promoter regions of eukaryotic protein encoding genes. Our system is based on three interpolated Markov chains (IMCs) of different order which are trained on coding, non-coding and promoter sequences. It was recently shown that the interpolation of Markov chains leads to stable parameters and improves on the results in microbial gene finding (Salzberg et al., Nucleic Acids Res., 26, 544-548, 1998). Here, we present new methods for an automated estimation of optimal interpolation parameters and show how the IMCs can be applied to detect promoters in contiguous DNA sequences. Our interpolation approach can also be employed to obtain a reliable scoring function for human coding DNA regions, and the trained models can easily be incorporated in the general framework for gene recognition systems.
Results: A 5-fold cross-validation evaluation of our IMC approach on a representative sequence set yielded a mean correlation coefficient of 0.84 (promoter versus coding sequences) and 0.53 (promoter versus non-coding sequences). Applied to the task of eukaryotic promoter region identification in genomic DNA sequences, our classifier identifies 50% of the promoter regions in the sequences used in the most recent review and comparison by Fickett and Hatzigeorgiou (Genome Res., 7, 861-878, 1997), while having a false-positive rate of 1/849 bp.
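The general idea of classifying a sequence with several interpolated Markov chains can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the interpolation parameters are estimated, so the count-based weight `total / (total + confidence)` used here is an assumption, and the names `InterpolatedMarkovChain`, `confidence` and `classify` are hypothetical. One model is trained per sequence class, and a sequence is assigned to the class whose model gives it the highest log-likelihood.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


class InterpolatedMarkovChain:
    """Toy interpolated Markov chain over DNA (illustrative only)."""

    def __init__(self, max_order=3, confidence=10.0):
        self.max_order = max_order
        # Assumed smoothing constant: larger values trust short contexts more.
        self.confidence = confidence
        # counts[k][context][base] = occurrences of base after a length-k context
        self.counts = [defaultdict(lambda: defaultdict(int))
                       for _ in range(max_order + 1)]

    def train(self, sequences):
        for seq in sequences:
            for i, base in enumerate(seq):
                for k in range(min(i, self.max_order) + 1):
                    self.counts[k][seq[i - k:i]][base] += 1

    def prob(self, context, base):
        # Start from a uniform distribution, then blend in each higher order.
        p = 1.0 / len(ALPHABET)
        for k in range(min(len(context), self.max_order) + 1):
            seen = self.counts[k].get(context[len(context) - k:])
            if not seen:
                break  # context never observed: keep the lower-order estimate
            total = sum(seen.values())
            lam = total / (total + self.confidence)  # assumed weighting scheme
            p = lam * seen.get(base, 0) / total + (1.0 - lam) * p
        return p

    def log_likelihood(self, seq):
        return sum(
            math.log(self.prob(seq[max(0, i - self.max_order):i], seq[i]))
            for i in range(len(seq))
        )


def classify(seq, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: models[name].log_likelihood(seq))


# Tiny made-up training data, only to show the call pattern.
models = {
    "promoter": InterpolatedMarkovChain(),
    "coding": InterpolatedMarkovChain(),
}
models["promoter"].train(["TATAAA" * 20])
models["coding"].train(["GCCGCG" * 20])
label = classify("TATAAATATA", models)
```

Because each interpolation step is a convex combination of two probability distributions, the estimate stays strictly positive and sums to one over the four bases, so the log-likelihood is always defined. A third model for non-coding sequences would be added to `models` in the same way.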