S. L. Salzberg, A method for identifying splice sites and translational start sites in eukaryotic messenger RNA, Computer Applications in the Biosciences, 13(4), 1997, pp. 365-376
This paper describes a new method for determining the consensus
sequences that signal the start of translation and the boundaries
between exons and introns (donor and acceptor sites) in eukaryotic
mRNA. The method takes into account the dependencies between adjacent
bases, in contrast to the usual technique of considering each position
independently. When coupled with a dynamic program to compute the most
likely sequence, new consensus sequences emerge. The consensus sequence
information is summarized in conditional probability matrices which,
when used to locate signals in uncharacterized genomic DNA, have
greater sensitivity and specificity than conventional matrices.
Species-specific versions of these matrices are especially effective at
distinguishing true and false sites.
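
The idea of conditioning each position on its neighbour can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a first-order model in which each base depends only on the base immediately before it, add-one pseudocounts, and a Viterbi-style dynamic program for the most likely sequence. The function names and the toy aligned sites are hypothetical and chosen only for illustration.

```python
import math

BASES = "ACGT"


def build_conditional_matrices(sites, pseudocount=1.0):
    """Estimate P(x_0) and P(x_i | x_{i-1}) from equal-length aligned sites.

    Assumption: first-order dependence on the preceding base only.
    """
    length = len(sites[0])
    first_counts = {b: pseudocount for b in BASES}
    for s in sites:
        first_counts[s[0]] += 1
    total = sum(first_counts.values())
    first = {b: c / total for b, c in first_counts.items()}

    # cond[i][prev][cur] = P(x_i = cur | x_{i-1} = prev); cond[0] is unused.
    cond = [None]
    for i in range(1, length):
        counts = {p: {c: pseudocount for c in BASES} for p in BASES}
        for s in sites:
            counts[s[i - 1]][s[i]] += 1
        table = {}
        for p in BASES:
            row_total = sum(counts[p].values())
            table[p] = {c: counts[p][c] / row_total for c in BASES}
        cond.append(table)
    return first, cond


def log_score(seq, first, cond):
    """Log-probability of a candidate site under the conditional matrices."""
    score = math.log(first[seq[0]])
    for i in range(1, len(seq)):
        score += math.log(cond[i][seq[i - 1]][seq[i]])
    return score


def most_likely_sequence(first, cond):
    """Dynamic program over positions: best[b] is the highest log-probability
    of any prefix ending in base b at the current position."""
    best = {b: math.log(first[b]) for b in BASES}
    back = []
    for i in range(1, len(cond)):
        new_best, pointers = {}, {}
        for cur in BASES:
            prev = max(BASES, key=lambda p: best[p] + math.log(cond[i][p][cur]))
            new_best[cur] = best[prev] + math.log(cond[i][prev][cur])
            pointers[cur] = prev
        best = new_best
        back.append(pointers)
    last = max(BASES, key=lambda b: best[b])
    seq = [last]
    for pointers in reversed(back):
        seq.append(pointers[seq[-1]])
    return "".join(reversed(seq)), best[last]


# Toy aligned sites, invented for illustration (not data from the paper).
toy_sites = ["AGGT", "AGGT", "CGGT", "AGGA", "TGGT"]
first, cond = build_conditional_matrices(toy_sites)
consensus, consensus_score = most_likely_sequence(first, cond)
print(consensus, round(consensus_score, 3))
```

To locate signals in uncharacterized DNA under this sketch, one would slide a window of the site length along the sequence and call `log_score` on each window. A conventional matrix would instead multiply independent per-position frequencies, which is the baseline the abstract contrasts against.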