Gene structure prediction by spliced alignment of genomic DNA with protein sequences: Increased accuracy by differential splice site scoring

Citation
J. Usuka and V. Brendel, Gene structure prediction by spliced alignment of genomic DNA with protein sequences: Increased accuracy by differential splice site scoring, J MOL BIOL, 297(5), 2000, pp. 1075-1085
Number of citations
34
Subject categories
Molecular Biology & Genetics
Journal title
JOURNAL OF MOLECULAR BIOLOGY
ISSN journal
00222836
Volume
297
Issue
5
Year of publication
2000
Pages
1075 - 1085
Database
ISI
SICI code
0022-2836(20000414)297:5<1075:GSPBSA>2.0.ZU;2-Y
Abstract
Gene identification in genomic DNA from eukaryotes is complicated by the vast combinatorial possibilities of potential exon assemblies. If the gene encodes a protein that is closely related to known proteins, gene identification is aided by matching similarity of potential translation products to those target proteins. The genomic DNA and protein sequences can be aligned directly by scoring the implied residues of in-frame nucleotide triplets against the protein residues in conventional ways, while allowing for long gaps in the alignment corresponding to introns in the genomic DNA. We describe a novel method for such spliced alignment. The method derives an optimal alignment based on scoring for both sequence similarity of the predicted gene product to the protein sequence and intrinsic splice site strength of the predicted introns. Application of the method to a representative set of 50 known genes from Arabidopsis thaliana showed significant improvement in prediction accuracy compared to previous spliced alignment methods. The method is also more accurate than ab initio gene prediction methods, provided sufficiently close target proteins are available. In view of the fast growth of public sequence repositories, we argue that close targets will be available for the majority of novel genes, making spliced alignment an excellent practical tool for high-throughput automated genome annotation. (C) 2000 Academic Press.
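To make the general idea in the abstract concrete, here is a minimal dynamic-programming sketch of spliced alignment. It scores in-frame triplets against protein residues, allows long intron gaps, and adds a splice-site term to each intron. This is NOT the authors' algorithm, and it does not reproduce their splice site models. Several assumptions apply: the standard genetic code, canonical GT...AG introns, introns placed only between codons (phase 0), arbitrary toy scores, and a hypothetical consensus-matching function `splice_score` standing in for a trained splice-site model. All function names (`spliced_align`, `residue_score`, `splice_score`) are illustrative only.

```python
from typing import List, Optional, Tuple

# Standard genetic code, indexed in TCAG order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {
    a + b + c: AMINO[16 * x + 4 * y + z]
    for x, a in enumerate(BASES)
    for y, b in enumerate(BASES)
    for z, c in enumerate(BASES)
}

# Toy scoring parameters (assumptions, not values from the paper).
MATCH, MISMATCH, STOP, GAP = 3, -1, -20, -4
INTRON, MIN_INTRON = -6, 10


def residue_score(codon: str, aa: str) -> int:
    """Score the residue implied by an in-frame triplet against a protein residue."""
    res = CODON.get(codon, "X")  # triplets with unknown bases translate to X
    if res == "*":
        return STOP
    return MATCH if res == aa else MISMATCH


def splice_score(g: str, donor: int, acceptor: int) -> float:
    """Hypothetical splice-site strength: agreement of the donor with GTAAGT
    plus a pyrimidine just before the acceptor AG (stand-in for a trained model)."""
    d = sum(1 for x, y in zip(g[donor + 2:donor + 6], "AAGT") if x == y)
    a = 1 if acceptor >= 3 and g[acceptor - 3] in "CT" else 0
    return 0.5 * (d + a)


def spliced_align(g: str, p: str) -> Tuple[float, List[Tuple[int, int]]]:
    """Align all of protein p to genomic g, with free genomic flanks.
    Returns the best score and exons as 0-based half-open (start, end) pairs.
    Simplification: introns are GT...AG and fall only between codons."""
    n, m = len(g), len(p)
    neg = float("-inf")
    S = [[neg] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[str, int, int]]]] = [
        [None] * (m + 1) for _ in range(n + 1)
    ]
    for i in range(n + 1):
        S[i][0] = 0.0  # the alignment may start anywhere in g
    donors = [k for k in range(n - 1) if g[k:k + 2] == "GT"]
    for i in range(n + 1):
        for j in range(1, m + 1):
            # Protein residue with no codon (insertion relative to g).
            cands = [(S[i][j - 1] + GAP, ("ins", i, j - 1))]
            if i >= 3:
                # Triplet g[i-3:i] aligned to residue p[j-1].
                sub = residue_score(g[i - 3:i], p[j - 1])
                cands.append((S[i - 3][j - 1] + sub, ("match", i - 3, j - 1)))
                # Codon with no protein residue (deletion relative to g).
                cands.append((S[i - 3][j] + GAP, ("del", i - 3, j)))
            if i >= 2 and g[i - 2:i] == "AG":
                for k in donors:  # candidate intron g[k:i]
                    if i - k < MIN_INTRON:
                        break  # donors are ascending, so later ones are too close
                    score = S[k][j] + INTRON + splice_score(g, k, i)
                    cands.append((score, ("intron", k, j)))
            S[i][j], back[i][j] = max(cands, key=lambda c: c[0])
    end = max(range(n + 1), key=lambda e: S[e][m])  # free genomic suffix
    exons: List[Tuple[int, int]] = []
    i, j, exon_end = end, m, end
    while j > 0:
        kind, pi, pj = back[i][j]
        if kind == "intron":
            exons.append((i, exon_end))  # exon resumes after the acceptor
            exon_end = pi  # preceding exon ends at the donor
        i, j = pi, pj
    exons.append((i, exon_end))
    return S[end][m], exons[::-1]
```

For example, with the toy genomic sequence `"CC" + "ATGGCT" + "GTAAGT" + "T" * 9 + "CAG" + "TGGAAA" + "CC"` and protein `"MAWK"`, the sketch returns a score of 8.5 and exons `[(2, 8), (26, 32)]`, placing the intron at the GT...AG pair with the strongest toy splice score. The intron loop makes this sketch O(n²·m), and handling introns that split a codon would require extra states that track a partial codon; both are left out here for brevity.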