V.V. Solovyev et al., Predicting internal exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames, Nucleic Acids Research, 22(24), 1994, pp. 5156-5163
A new method for predicting internal exon sequences in human DNA has been developed. It is based on a splice site prediction algorithm that uses a linear discriminant function to combine information about significant triplet frequencies in various functional parts of splice site regions with the preferences of oligonucleotides in protein-coding and intron regions. The accuracy of our splice site recognition function is 97% for donor splice sites and 96% for acceptor splice sites.
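The way a linear discriminant function combines several per-site scores into one decision can be illustrated with Fisher's two-class linear discriminant analysis. This is a minimal sketch, not the authors' implementation: it assumes each candidate splice site has already been reduced to a numeric feature vector (here two hypothetical features, a triplet-frequency score and an oligonucleotide-preference score), and the training vectors are invented toy numbers. The helper names `fit_lda` and `score` are hypothetical, and placing the threshold midway between the projected class means (equal priors) is a simplifying assumption.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _mean(rows: Sequence[Vector]) -> Vector:
    """Component-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(len(rows[0]))]


def _solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - _dot(m[r][r + 1:n], x[r + 1:n])) / m[r][r]
    return x


def fit_lda(pos: Sequence[Vector], neg: Sequence[Vector]) -> Tuple[Vector, float]:
    """Fisher LDA: w = Sw^-1 (mean_pos - mean_neg); threshold c at the
    midpoint of the projected class means (equal priors assumed)."""
    mp, mn = _mean(pos), _mean(neg)
    d = len(mp)
    sw = [[0.0] * d for _ in range(d)]  # pooled within-class scatter matrix
    for rows, mu in ((pos, mp), (neg, mn)):
        for r in rows:
            diff = [r[k] - mu[k] for k in range(d)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    w = _solve(sw, [mp[k] - mn[k] for k in range(d)])
    c = 0.5 * (_dot(w, mp) + _dot(w, mn))
    return w, c


def score(w: Vector, c: float, x: Vector) -> float:
    """Discriminant score; > 0 means the site looks like a true splice site."""
    return _dot(w, x) - c


# Hypothetical features [triplet_score, oligo_preference]; invented toy numbers.
true_sites = [[2.0, 1.5], [2.5, 2.0], [3.0, 1.8], [2.2, 2.4]]
pseudo_sites = [[0.5, 0.2], [0.8, 0.9], [0.2, 0.6], [1.0, 0.1]]
w, c = fit_lda(true_sites, pseudo_sites)
print([round(score(w, c, x), 2) for x in true_sites + pseudo_sites])
```

The appeal of this design is that each feature keeps its own weight, learned from training data, and the final decision is a single threshold on a weighted sum, so adding another sequence feature only adds one dimension to the vector.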
For exon prediction, we combine in a discriminant function the characteristics describing the 5' intron region, donor splice site, coding region, acceptor splice site and 3' intron region of each open reading frame flanked by GT and AG dinucleotides. On a test set of 451 exon and 246,693 pseudoexon sequences, the accuracy of precise internal exon recognition is 77% with a specificity of 79%. Computed at the level of individual nucleotides, the recognition quality is 89% for exon sequences and 98% for intron sequences, which corresponds to a correlation coefficient for exon prediction of 0.87. This approach is more precise than other methods and has been tested on a larger data set. We have also developed a means of predicting exon-exon junctions in cDNA sequences, which can be useful for selecting optimal PCR primers.
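Two steps in this description can be made concrete: generating the candidate set (open reading frames flanked by splice dinucleotides) and computing nucleotide-level accuracy. The sketch below assumes that a candidate internal exon is a segment immediately preceded by AG (end of the upstream intron) and immediately followed by GT (start of the downstream intron) that has no in-frame stop codon in at least one of its three frames. The `min_len` parameter and all function names are hypothetical, and the sketch does no scoring; in the described method each candidate would then be scored by the discriminant function. For evaluation, it assumes "recognition quality" means the fraction of true exon (respectively intron) nucleotides labelled correctly, and uses the standard per-nucleotide correlation coefficient $CC = (TP \cdot TN - FN \cdot FP)/\sqrt{(TP+FN)(TN+FP)(TP+FP)(TN+FN)}$; the paper's exact definitions may differ.

```python
from math import sqrt
from typing import List, Sequence, Tuple

STOPS = {"TAA", "TAG", "TGA"}


def open_frames(seg: str) -> List[int]:
    """Frames (0, 1, 2) in which seg contains no complete in-frame stop codon."""
    return [f for f in range(3)
            if all(seg[i:i + 3] not in STOPS for i in range(f, len(seg) - 2, 3))]


def candidate_exons(seq: str, min_len: int = 3) -> List[Tuple[int, int]]:
    """Half-open intervals [s, e) with 'AG' just before s (acceptor side),
    'GT' starting at e (donor side), length >= min_len and at least one
    open frame. Every AG/GT pair is tried, so the count grows roughly
    quadratically with sequence length."""
    seq = seq.upper()
    starts = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    ends = [j for j in range(len(seq) - 1) if seq[j:j + 2] == "GT"]
    return [(s, e) for s in starts for e in ends
            if e - s >= min_len and open_frames(seq[s:e])]


def nucleotide_accuracy(pred: Sequence[bool],
                        true: Sequence[bool]) -> Tuple[float, float, float]:
    """Per-nucleotide (exon sensitivity, intron sensitivity, correlation
    coefficient); True marks an exon nucleotide."""
    pairs = list(zip(pred, true))
    tp = sum(1 for p, t in pairs if p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    sn_exon = tp / (tp + fn) if tp + fn else 0.0
    sn_intron = tn / (tn + fp) if tn + fp else 0.0
    denom = sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = (tp * tn - fn * fp) / denom if denom else 0.0
    return sn_exon, sn_intron, cc


print(candidate_exons("CCAGATGGCCGTAAC"))  # [(4, 10)]
```

Because every AG/GT pair with an open frame becomes a candidate, true exons are vastly outnumbered by pseudoexons in any realistic sequence, which is consistent with the test set above containing far more pseudoexon than exon sequences. This sketch also ignores reading-frame consistency between neighbouring exons.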