AUTOMATED GENE IDENTIFICATION IN LARGE-SCALE GENOMIC SEQUENCES

Citation
Y. Xu and E.C. Uberbacher, AUTOMATED GENE IDENTIFICATION IN LARGE-SCALE GENOMIC SEQUENCES, Journal of Computational Biology, 4(3), 1997, pp. 325-338
Number of citations
24
Subject categories
Mathematical Methods, Biology & Medicine; Mathematics; Biology; Biochemical Research Methods; Biotechnology & Applied Microbiology
Journal ISSN
1066-5277
Volume
4
Issue
3
Year of publication
1997
Pages
325 - 338
Database
ISI
SICI code
1066-5277(1997)4:3<325:AGIILG>2.0.ZU;2-V
Abstract
Computational methods for gene identification in genomic sequences typically have two phases: coding region recognition and gene parsing. While there are a number of effective methods for recognizing coding regions (exons), parsing the recognized exons into proper gene structures remains, to a large extent, an unsolved problem. We have developed a computer program that automatically parses the recognized exons into the gene models most consistent with the available Expressed Sequence Tags (ESTs) and with a set of empirically derived biological heuristics. The gene modeling algorithm used in this program provides a general framework for applying EST information, so that modeling accuracy improves as the amount of available EST information increases. Based on preliminary tests on a number of large DNA sequences using the dbEST database, we have observed that the algorithm can (1) accurately model complicated multiple-gene structures, including embedded genes; (2) identify exons falsely recognized, and locate exons missed, by the initial exon recognition phase; and (3) make more accurate exon boundary predictions, provided the necessary EST information is available. We have extended this EST-based gene modeling algorithm to model genes on unfinished DNA contigs at the end of the shotgun sequencing stage. Before the gene modeling phase, this extended version automatically determines the orientations and the relative order of the DNA contigs (with gaps between them), using the available ESTs as reference models.
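The abstract does not describe how the gene parsing is actually performed. To illustrate what "parsing recognized exons into a gene model" can involve, the Python sketch below uses a generic dynamic-programming exon-chaining approach, adding a bonus to each exon's score in proportion to how much of it is covered by EST alignments. This is not the authors' algorithm. The Exon fields, the est_weight and min_intron parameters, and the coverage bonus are hypothetical assumptions chosen for illustration. The sketch models a single gene on one strand and ignores start/stop codons and splice-site signals.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Exon:
    start: int       # first base of the exon (0-based, inclusive)
    end: int         # last base of the exon (inclusive)
    score: float     # hypothetical score from the exon recognition phase
    phase_in: int    # reading-frame phase at the 5' end (0, 1 or 2)
    phase_out: int   # reading-frame phase at the 3' end (0, 1 or 2)

def est_coverage(exon: Exon, est_blocks: List[Tuple[int, int]]) -> float:
    """Fraction of the exon's bases covered by EST-aligned blocks (inclusive coords)."""
    covered = set()
    for s, e in est_blocks:
        lo, hi = max(s, exon.start), min(e, exon.end)
        covered.update(range(lo, hi + 1))  # empty range when the block misses the exon
    return len(covered) / (exon.end - exon.start + 1)

def best_gene_model(exons: List[Exon], est_blocks: List[Tuple[int, int]],
                    est_weight: float = 1.0, min_intron: int = 20):
    """Chain non-overlapping, frame-compatible exons into the highest-scoring model."""
    order = sorted(exons, key=lambda x: (x.start, x.end))
    n = len(order)
    if n == 0:
        return [], 0.0
    # Each exon's contribution: recognition score plus an EST-support bonus.
    gain = [x.score + est_weight * est_coverage(x, est_blocks) for x in order]
    best = gain[:]      # best[i]: best chain score ending at exon i
    prev = [-1] * n     # back-pointer to the previous exon in that chain
    for i in range(n):
        for j in range(i):
            a, b = order[j], order[i]
            # Require an intron of at least min_intron bases and a consistent reading frame.
            compatible = (b.start - a.end - 1 >= min_intron
                          and a.phase_out == b.phase_in)
            if compatible and best[j] + gain[i] > best[i]:
                best[i] = best[j] + gain[i]
                prev[i] = j
    # Trace back from the best-scoring chain end.
    i = max(range(n), key=lambda k: best[k])
    total = best[i]
    chain = []
    while i != -1:
        chain.append(order[i])
        i = prev[i]
    return chain[::-1], total
```

For example, given candidate exons Exon(100, 199, 2.0, 0, 1), Exon(150, 260, 1.5, 0, 1), Exon(300, 399, 1.0, 1, 0) and Exon(500, 599, 0.5, 0, 0), together with EST blocks covering 300-399 and 500-599, the sketch discards the overlapping second exon and returns a three-exon model starting at 100, 300 and 500 with a total score of 5.5. Because the EST term is additive, more EST evidence shifts the choice toward supported exons, which loosely mirrors the abstract's point that accuracy improves as EST information increases.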