A simple and general homology-based method for gene finding was applied to
the 2.9-Mb Drosophila melanogaster Adh region, the target sequence of the
Genome Annotation Assessment Project (GASP). Each strand of the entire
sequence was used as a query against the BLOCKS database of conserved regions
of proteins. This led to functional assignments for more than one-third of the genes
and two-thirds of the transposons. Considering the enormous size of the
query, the fact that only two false-positive matches were reported emphasizes
the high selectivity of protein family-based methods for gene finding. We
used the search results to improve BLOCKS+ by identifying compositionally
biased blocks. Our results confirm that protein family databases can be used
effectively in automated sequence annotation efforts.
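
To illustrate what querying a protein block database with each strand of a DNA sequence involves, here is a minimal Python sketch. It assumes the nucleotide query is conceptually translated in three reading frames per strand before comparison with protein blocks; the abstract does not spell out this step, and the function names (reverse_complement, translate, six_frames) are hypothetical, not part of the BLOCKS software. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Translation table for complementing bases; N (unknown) maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str, frame: int) -> str:
    """Translate one reading frame (offset 0, 1, or 2) into amino acids.

    Codons containing ambiguous bases become 'X'; a trailing partial
    codon is ignored.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna: str) -> dict:
    """Return translations keyed by (strand, frame) for both strands."""
    strands = {"+": dna, "-": reverse_complement(dna)}
    return {
        (strand, frame): translate(seq, frame)
        for strand, seq in strands.items()
        for frame in range(3)
    }


if __name__ == "__main__":
    # Toy input only; a real query would be the full genomic sequence.
    for key, protein in six_frames("ATGGCC").items():
        print(key, protein)
```

Each of the six translated strings would then be compared with the blocks by whatever search program is in use; that scoring step is outside the scope of this sketch.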