Computational methods for automated genome annotation are critical to our community's ability to make full use of the large volume of genomic sequence
being generated and released. To explore the accuracy of these automated feature prediction tools in the genomes of higher organisms, we evaluated their performance on a large, well-characterized sequence contig from the Adh
region of Drosophila melanogaster. This experiment, known as the Genome Annotation Assessment Project (GASP), was launched in May 1999. Twelve groups, applying state-of-the-art tools, contributed predictions for features including gene structure, protein homologies, promoter sites, and repeat elements. We evaluated these predictions using two standards: one based on previously unreleased, high-quality, full-length cDNA sequences, and a second based
on the set of annotations generated as part of an in-depth study of the region by a group of Drosophila experts. Although these standard sets only approximate the unknown distribution of features in this region, we believe that, when taken in context, the results of an evaluation based on them are meaningful. The results were presented as a tutorial at the conference on Intelligent Systems for Molecular Biology (ISMB-99) in August 1999. Over 95% of
the coding nucleotides in the region were correctly identified by the majority of the gene finders, and the correct intron/exon structures were predicted for more than 40% of the genes. Homology-based annotation techniques recognized
and associated functions with almost half of the genes in the region; the
remainder were only identified by the ab initio techniques. This experiment
also presents the first assessment of promoter prediction techniques for a
significant number of genes in a large contiguous region. We found that the high false-positive rates of the promoter predictors make their predictions difficult to use. Integrating gene finding and cDNA/EST alignments with promoter predictions decreases the number of false-positive classifications
but detects fewer than one-third of the promoters in the region. We believe that by establishing standards for evaluating genomic annotations and by assessing the performance of existing automated genome annotation tools, this experiment establishes a baseline that adds value to ongoing large-scale annotation projects and should guide further research in genome informatics.
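The abstract reports accuracy both at the level of coding nucleotides and at the level of complete intron/exon structures, but does not define the measures. The following is a minimal Python sketch, assuming the sensitivity and specificity definitions conventionally used in gene-finder evaluations (where "specificity" means the fraction of predicted coding bases that are correct). The function names, the half-open interval representation, and the toy coordinates are hypothetical illustrations, not GASP's actual evaluation code or data.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical representation: a coding exon as a half-open [start, end)
# interval on the contig.
Interval = Tuple[int, int]


def coding_positions(exons: Iterable[Interval]) -> Set[int]:
    """Expand coding intervals into the set of nucleotide positions they cover."""
    positions: Set[int] = set()
    for start, end in exons:
        positions.update(range(start, end))
    return positions


def nucleotide_accuracy(predicted: Iterable[Interval],
                        reference: Iterable[Interval]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the coding-nucleotide level.

    Assumed definitions:
      sensitivity = TP / (TP + FN): share of reference coding bases predicted
      specificity = TP / (TP + FP): share of predicted coding bases that are
                    also coding in the reference standard
    """
    pred = coding_positions(predicted)
    ref = coding_positions(reference)
    tp = len(pred & ref)  # bases called coding by both prediction and standard
    sensitivity = tp / len(ref) if ref else 0.0
    specificity = tp / len(pred) if pred else 0.0
    return sensitivity, specificity


def same_structure(predicted: List[Interval], reference: List[Interval]) -> bool:
    """True only if every exon boundary matches, i.e. the structure is exact."""
    return sorted(predicted) == sorted(reference)


if __name__ == "__main__":
    # Toy gene: the prediction shifts one exon start and extends another exon.
    reference = [(100, 200), (300, 400)]
    predicted = [(110, 200), (300, 420)]
    sn, sp = nucleotide_accuracy(predicted, reference)
    print(round(sn, 3), round(sp, 3))                 # 0.95 0.905
    print(same_structure(predicted, reference))       # False
```

The toy prediction recovers 95% of the reference coding bases yet still misses the exact intron/exon structure. This illustrates why a high nucleotide-level score can coexist with a much lower rate of fully correct gene structures.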