M.T. Laub and D.W. Smith, FINDING INTRON EXON SPLICE JUNCTIONS USING INFO, INTERRUPTION FINDER AND ORGANIZER, Journal of Computational Biology, 5(2), 1998, pp. 307-321
Number of citations
23
Subject Categories
Mathematics,Biology,"Biochemical Research Methods",Mathematics,"Biotechnology & Applied Microbiology"
INFO, INterruption Finder and Organizer, has been used to find coding
sequence intron-exon splice junctions in human and other DNA by
comparing the six conceptual translations of the input DNA sequence
with sequences in protein databanks using a similarity matrix and
windowing algorithm. Similarities detected both delineate position of
the gene and provide clues as to the function of the gene product. In
addition to use of a standard similarity matrix and windowing
algorithm, INFO uses two novel steps, the MiniLibrary and Reverse
Sequence steps, to enhance identification of small exons and to
improve precision of junction nucleotide delineation. Exons as small
as about 30 bases can be reliably found, and >90% of junctions are
precisely identified when canonical splice junction information is
used. With the MiniLibrary and Reverse Sequence steps, INFO parameters
need not be optimized by the user. In comparative test runs using 19
human DNA sequences, INFO found 108 of 111 exons, with 0 reported
false positives, compared with 111 exons and 51 false positives for
BLASTX, 99 exons and 6 false positives for GRAIL II, 77 exons and 24
false positives for GeneMark, 61 exons and 9 false positives for
GeneID, and 105 exons and 6 false positives for PROCRUSTES. The
correlation coefficient for finding and positioning these 111 exons
was greater than 98% for INFO. Comparable results were obtained in
test runs of 13 nonhuman DNA sequences. INFO is applicable to DNA from
any species, will become more robust as sequence databanks expand, and
complements other heuristic approaches.
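
The abstract's starting point, the six conceptual translations of a DNA sequence compared against protein sequences in fixed-size windows, can be illustrated with a minimal Python sketch. This is not INFO's implementation: the function names, the identity-based match/mismatch scoring (a stand-in for a real similarity matrix), and the exhaustive ungapped window scan are all assumptions made for illustration. The MiniLibrary and Reverse Sequence steps are not modeled.

```python
from itertools import product

# Standard genetic code, laid out in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unrecognized codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to conceptual translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def best_window(query, target, window, match=1, mismatch=-1):
    """Best ungapped window pair as (score, query_start, target_start).

    Hypothetical scoring: +match for identical residues, +mismatch
    otherwise. Returns None if either sequence is shorter than the window.
    """
    best = None
    for q in range(len(query) - window + 1):
        for t in range(len(target) - window + 1):
            score = sum(
                match if a == b else mismatch
                for a, b in zip(query[q:q + window], target[t:t + window])
            )
            if best is None or score > best[0]:
                best = (score, q, t)
    return best


frames = six_frame_translations("ATGGCCTAA")
hit = best_window("XXMKVLAAXX", "MKVLAA", window=6)
```

In a fuller pipeline, each of the six frames would be scanned against databank proteins, and high-scoring windows would mark candidate exon regions whose boundaries are then refined.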