Ib. Rogozin et al., GENE STRUCTURE PREDICTION USING INFORMATION ON HOMOLOGOUS PROTEIN-SEQUENCE, Computer applications in the biosciences, 12(3), 1996, pp. 161-170
This paper describes a new approach to predicting the structure of
protein-coding genes. The prediction scheme has three steps. First, the
exons with the best potential are predicted in a sequence of unknown
function, and a list of the potential amino acid fragments encoded by
these exons is compiled. Second, each amino acid fragment in the list
is tested for homology against the proteins in the SWISS-PROT database
of amino acid sequences, and the single protein with the best homology
is chosen from all the homologous sequences. Third, the exon-intron
structure is reconstructed on the basis of its homology with the chosen
protein sequence. The method was tested on an independent control set
(20 genes), with the following results: 21% of real exons were lost and
3% of non-real exons were found. This system can be used to refine the
results of gene prediction systems, especially when highly homologous
proteins are found in the amino acid sequence database.
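
The three-step scheme described in the abstract can be illustrated with
a toy sketch. This is not the authors' implementation: the function
names, the (start, end, dna) tuple layout, the 0.8 threshold, and the
use of Python's difflib.SequenceMatcher as a stand-in for a real
homology score are all assumptions made for illustration, and the
candidate exons are taken as given rather than predicted.

```python
from difflib import SequenceMatcher

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate DNA in frame 0; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def similarity(fragment, protein):
    """Toy homology score (assumption): share of the fragment that falls
    in blocks matching the protein. A real system would use alignment."""
    if not fragment:
        return 0.0
    matcher = SequenceMatcher(None, fragment, protein, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(fragment)


def predict_structure(candidates, proteins, threshold=0.8):
    """candidates: list of (start, end, dna); proteins: dict id -> sequence.

    Returns (best_protein_id, kept_exons), kept_exons as (start, end).
    """
    # Step 1: list the amino acid fragments encoded by the candidate exons.
    fragments = [(start, end, translate(dna)) for start, end, dna in candidates]

    # Step 2: choose the protein with the best score against any fragment.
    best_id = max(
        proteins,
        key=lambda pid: max(similarity(f, proteins[pid]) for _, _, f in fragments),
    )

    # Step 3: keep only the exons that agree with the chosen protein.
    kept = sorted(
        (start, end)
        for start, end, f in fragments
        if similarity(f, proteins[best_id]) >= threshold
    )
    return best_id, kept


# Hypothetical toy data: two exons encode parts of P1, one is spurious.
toy_candidates = [
    (100, 115, "ATGAAAACTGCTTAT"),
    (200, 215, "TGGTGGTGGTGTTGT"),
    (300, 315, "ATTGCTAAACAACGT"),
]
toy_proteins = {"P1": "MKTAYIAKQR", "P2": "GGGGSSSSPP"}
print(predict_structure(toy_candidates, toy_proteins))
```

In this sketch the spurious candidate is dropped because its translated
fragment shares nothing with the chosen protein. This mirrors, in a very
simplified way, how homology can be used to refine an initial exon list.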