FINDING INTRON EXON SPLICE JUNCTIONS USING INFO, INTERRUPTION FINDER AND ORGANIZER

Authors

LAUB MT; SMITH DW

Citation

M.T. Laub and D.W. Smith, FINDING INTRON EXON SPLICE JUNCTIONS USING INFO, INTERRUPTION FINDER AND ORGANIZER, Journal of computational biology, 5(2), 1998, pp. 307-321

Subject categories

Mathematics, Biology, Biochemical Research Methods, Biotechnology & Applied Microbiology

Journal title

Journal of computational biology

ISSN journal

1066-5277

Volume

5

Issue

2

Year of publication

1998

Pages

307 - 321

Database

ISI

SICI code

1066-5277(1998)5:2<307:FIESJU>2.0.ZU;2-V

Abstract

INFO, INterruption Finder and Organizer, has been used to find coding sequence intron-exon splice junctions in human and other DNA by comparing the six conceptual translations of the input DNA sequence with sequences in protein databanks using a similarity matrix and windowing algorithm. Similarities detected both delineate the position of the gene and provide clues as to the function of the gene product. In addition to use of a standard similarity matrix and windowing algorithm, INFO uses two novel steps, the MiniLibrary and Reverse Sequence steps, to enhance identification of small exons and to improve precision of junction nucleotide delineation. Exons as small as about 30 bases can be reliably found, and >90% of junctions are precisely identified when canonical splice junction information is used. With the MiniLibrary and Reverse Sequence steps, INFO parameters need not be optimized by the user. In comparative test runs using 19 human DNA sequences, INFO found 108 of 111 exons, with 0 reported false positives, compared with 111 exons and 51 false positives for BLASTX, 99 exons and 6 false positives for GRAIL II, 77 exons and 24 false positives for GeneMark, 61 exons and 9 false positives for GeneID, and 105 exons and 6 false positives for PROCRUSTES. The correlation coefficient for finding and positioning these 111 exons was greater than 98% for INFO. Comparable results were obtained in test runs of 13 nonhuman DNA sequences. INFO is applicable to DNA from any species, will become more robust as sequence databanks expand, and complements other heuristic approaches.
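
The comparison step the abstract describes (six conceptual translations scored against protein sequences in fixed-length windows) can be illustrated with a minimal Python sketch. This is not the INFO implementation: the function names, the identity-based scoring (a stand-in for a real similarity matrix), and the window and threshold parameters are all assumptions made for illustration, and the MiniLibrary and Reverse Sequence steps are not modeled.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, offset):
    """Translate from a 0-based offset; unrecognized codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(offset, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels +1..+3 and -1..-3 to conceptual translations."""
    dna = dna.upper()
    rev = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna, offset)
        frames[-(offset + 1)] = translate(rev, offset)
    return frames


def window_hits(query, subject, window, threshold):
    """Hypothetical windowed comparison: score every ungapped window pair.

    Scoring here is plain identity (1 per matching residue), used only as
    a placeholder for a real amino acid similarity matrix.
    Returns (query_start, subject_start, score) for windows at or above
    the threshold.
    """
    hits = []
    for q in range(len(query) - window + 1):
        for s in range(len(subject) - window + 1):
            score = sum(
                a == b
                for a, b in zip(query[q:q + window], subject[s:s + window])
            )
            if score >= threshold:
                hits.append((q, s, score))
    return hits
```

Each of the six translations returned by `six_frame_translations` would be passed in turn to `window_hits` together with a databank protein sequence; runs of high-scoring windows in one frame then suggest a candidate exon in that frame.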