We describe a tool for analyzing and annotating large genomic
sequences containing introns. The analysis and annotation tool (AAT)
includes two sets of programs, one for comparing the query sequence
with a protein database and the other for comparing the query with a
cDNA database. Each set contains a fast database search program and a
rigorous alignment program. The database search program quickly
identifies regions of the query sequence that are similar to a
database sequence. The alignment program then constructs an optimal
alignment between each region and the database sequence, and reports
the coordinates of exons in the query sequence. Pairwise alignments of
the query sequence with protein and cDNA database sequences are
combined into multiple sequence alignments, which provide a view of
all protein and cDNA sequences matching a query region. On a data set
of 570 DNA sequences, AAT identified 94% of coding nucleotides
correctly and 74% of exons exactly. Results of analyzing a human BAC
sequence with AAT are also presented. AAT reduces the labor-intensive
work of locating the exons of the query sequence and improves the
process of defining intron-exon boundaries by using the wealth of
available protein and cDNA data. (C) 1997 Academic Press.
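
The two accuracy figures quoted above (coding nucleotides identified
correctly, exons identified exactly) can be illustrated with a minimal
Python sketch. The abstract does not define the measures, so the
definitions here are assumptions: nucleotide-level accuracy is taken as
the fraction of annotated coding positions covered by a predicted exon,
and an exon counts as exact only when both of its boundaries match. The
function names and the coordinates are hypothetical and are not AAT
output.

```python
def coding_positions(exons):
    """Return the set of positions covered by inclusive (start, end) exons."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_sensitivity(true_exons, predicted_exons):
    """Fraction of annotated coding positions covered by the prediction.

    Assumed definition; the paper may measure this differently.
    """
    true_pos = coding_positions(true_exons)
    if not true_pos:
        return 0.0
    return len(true_pos & coding_positions(predicted_exons)) / len(true_pos)


def exact_exon_fraction(true_exons, predicted_exons):
    """Fraction of annotated exons predicted with both boundaries exact."""
    true_set = set(true_exons)
    if not true_set:
        return 0.0
    return len(true_set & set(predicted_exons)) / len(true_set)


# Hypothetical annotation and prediction (1-based, inclusive coordinates).
true_exons = [(101, 200), (301, 400), (501, 550)]
predicted_exons = [(101, 200), (311, 400), (501, 550)]

print(f"{nucleotide_sensitivity(true_exons, predicted_exons):.2f}")  # 0.96
print(f"{exact_exon_fraction(true_exons, predicted_exons):.2f}")     # 0.67
```

In this toy case the second predicted exon starts ten bases late, so
most coding nucleotides are still recovered while the exon itself is
not counted as exact. This is one way a nucleotide-level figure can be
much higher than an exon-level figure on the same data.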