Homology-based annotation yields 1,042 new candidate genes in the Drosophila melanogaster genome

Citation
S. Gopal et al., Homology-based annotation yields 1,042 new candidate genes in the Drosophila melanogaster genome, NAT GENET, 27(3), 2001, pp. 337-340
Number of citations
26
Subject categories
Molecular Biology & Genetics
Journal title
NATURE GENETICS
ISSN journal
10614036
Volume
27
Issue
3
Year of publication
2001
Pages
337 - 340
Database
ISI
SICI code
1061-4036(200103)27:3<337:HAY1NC>2.0.ZU;2-L
Abstract
The approach to annotating a genome critically affects the number and accuracy of genes identified in the genome sequence. Genome annotation based on stringent gene identification is prone to underestimate the complement of genes encoded in a genome. In contrast, over-prediction of putative genes followed by exhaustive computational sequence, motif and structural homology search will find rarely expressed, possibly unique, new genes at the risk of including non-functional genes. We developed a two-stage approach that combines the merits of stringent genome annotation with the benefits of over-prediction. First we identify plausible genes regardless of matches with EST, cDNA or protein sequences from the organism (stage 1). In the second stage, proteins predicted from the plausible genes are compared at the protein level with EST, cDNA and protein sequences, and protein structures from other organisms (stage 2). Remote but biologically meaningful protein sequence or structure homologies provide supporting evidence for genuine genes. The method, applied to the Drosophila melanogaster genome, validated 1,042 novel candidate genes after filtering 19,410 plausible genes, of which 12,124 matched the original 13,601 annotated genes (1). This annotation strategy is applicable to genomes of all organisms, including human.
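
The stage 2 filtering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not state the acceptance criteria, so the `Candidate` record, the `stage2_filter` function, and the E-value cutoff are all hypothetical, chosen only to show the idea of keeping over-predicted genes that have homology support and are absent from the existing annotation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Candidate:
    """A plausible gene from stage 1 (hypothetical record layout)."""
    gene_id: str
    in_original_annotation: bool
    # E-values of homology hits against sequences or structures from
    # other organisms; an empty list means no hit was found.
    hit_evalues: List[float] = field(default_factory=list)


def stage2_filter(candidates, max_evalue=1e-5):
    """Return IDs of candidates that are new and homology-supported.

    The cutoff `max_evalue` is an assumed illustrative threshold,
    not a value taken from the paper.
    """
    novel = []
    for c in candidates:
        if c.in_original_annotation:
            continue  # already annotated, so not a new candidate
        if any(e <= max_evalue for e in c.hit_evalues):
            novel.append(c.gene_id)
    return novel


# Toy input standing in for the over-predicted gene set.
candidates = [
    Candidate("g1", True, [1e-30]),       # matches the existing annotation
    Candidate("g2", False, [1e-8, 0.5]),  # new, with a significant hit
    Candidate("g3", False, []),           # new, no homology evidence
    Candidate("g4", False, [0.2]),        # new, only a weak hit
]
novel = stage2_filter(candidates)
print(novel)
```

In this toy input only `g2` survives the default cutoff. Loosening `max_evalue` admits weaker hits such as `g4`, which mirrors the trade-off the abstract describes between finding rare genes and including non-functional ones.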