Motivation: Sequencing of complete eukaryotic genomes and large syntenic
fragments of genomes makes it possible to apply genomic comparison for gene
recognition.
Results: This paper describes a spliced alignment algorithm that aligns
candidate exon chains of two homologous genomic sequence fragments from
different species. The algorithm is implemented in the Pro-Gen software.
Unlike other algorithms, Pro-Gen does not assume conservation of the
exon-intron structure. Amino acid sequences obtained by the formal
translation of candidate exons are aligned instead of nucleotide sequences,
which allows for distant comparisons. The algorithm was tested on a sample
of human-mammal (mouse), human-vertebrate (Xenopus) and human-invertebrate
(Drosophila) gene pairs.
Surprisingly, the best results, 97-98% correlation between the actual and
predicted genes, were obtained for more distant comparisons, whereas the
correlation on the human-mouse sample was only 93%. The latter value
increases to 95% if conservation of the exon-intron structure is assumed.
The lower human-mouse correlation is caused by a large amount of sequence
conservation in non-coding regions of the human and mouse genes, probably
due to regulatory elements.
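
The idea of comparing formally translated candidate exons rather than their nucleotide sequences can be illustrated with a minimal Python sketch. It is not the Pro-Gen implementation: the function names `translate` and `identity` and the two toy exons are hypothetical, the standard genetic code is assumed, and no spliced alignment is performed. The sketch only shows why synonymous codon changes lower nucleotide identity while leaving the translated sequences identical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (assumption: the sketch uses the standard code only).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(exon, frame=0):
    """Formally translate a candidate exon, starting at offset `frame` (0, 1 or 2).

    Incomplete trailing codons are dropped; unknown codons become 'X'.
    """
    seq = exon.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def identity(a, b):
    """Fraction of identical positions between two equal-length, ungapped strings."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Two hypothetical exons that differ only by synonymous codon changes.
exon_a = "ATGGCTCTGAAAGGT"
exon_b = "ATGGCCTTAAAGGGC"

nucleotide_identity = identity(exon_a, exon_b)
protein_identity = identity(translate(exon_a), translate(exon_b))
print(f"nucleotide identity: {nucleotide_identity:.2f}")
print(f"protein identity:    {protein_identity:.2f}")
```

In this toy case the two exons share only two thirds of their nucleotides but translate to the same peptide, which is the kind of signal that remains detectable in distant comparisons.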