E.R.M. Tillier and R.A. Collins, NEIGHBOR JOINING AND MAXIMUM-LIKELIHOOD WITH RNA SEQUENCES - ADDRESSING THE INTERDEPENDENCE OF SITES, Molecular Biology and Evolution, 12(1), 1995, pp. 7-15
Intrastrand base pairings give ribosomal and other RNA molecules characteristic structures that are important for their function. In order to maintain these structures, a substitution at one paired site may have to be compensated for by an appropriate substitution at the complementary site. Thus paired sites do not evolve independently of one another. Most current methods for inferring phylogeny from molecular sequences assume that the sites are independent and will therefore give statistically unreliable and possibly erroneous results when used on structured RNA sequences. We analyze a new probabilistic model for the evolution of double-stranded RNA molecules that considers substitutions of the base pairs rather than of each of the bases independently. The new model, called the double-stranded model, was incorporated into the neighbor-joining distance and maximum likelihood methods. Computer simulations show that maximum likelihood is very robust to the violation of the assumption of the independence of sites. In contrast, the neighbor-joining method is sensitive to such violations: the double-stranded model can provide a significant increase in the chance of obtaining the correct tree topologies with neighbor joining when distances are large and the tree is difficult to obtain. The new model also leads to lower but more realistic estimates for the statistical confidence in the branch lengths and tree topologies.
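
The change of unit described in the abstract, from individual bases to base pairs, can be illustrated with a minimal Python sketch. This is not the double-stranded model itself, which is a probabilistic substitution model; it only counts observed differences under the two views. The function names, the toy hairpin sequences, and the pairing list are all hypothetical and were invented for illustration.

```python
from typing import List, Tuple


def count_site_differences(a: str, b: str, positions: List[int]) -> int:
    """Count differing sites, treating every position as independent."""
    return sum(a[i] != b[i] for i in positions)


def count_pair_differences(a: str, b: str, pairs: List[Tuple[int, int]]) -> int:
    """Count differing base pairs, treating each paired (i, j) as one unit."""
    return sum((a[i], a[j]) != (b[i], b[j]) for i, j in pairs)


# Toy hairpin (invented for illustration): a 4-bp stem around a 4-nt loop.
seq_a = "GGCAUUCGUGCC"
seq_b = "GACAUUCGUGUC"  # pair (1, 10) changed from G-C to A-U

# Assumed secondary structure: position i pairs with position j.
stem_pairs = [(0, 11), (1, 10), (2, 9), (3, 8)]
stem_sites = sorted(i for pair in stem_pairs for i in pair)

site_diffs = count_site_differences(seq_a, seq_b, stem_sites)
pair_diffs = count_pair_differences(seq_a, seq_b, stem_pairs)

print(site_diffs, "of", len(stem_sites), "sites differ")  # 2 of 8 sites differ
print(pair_diffs, "of", len(stem_pairs), "pairs differ")  # 1 of 4 pairs differ
```

In this toy case a single compensatory change is counted as two differences among eight sites when sites are treated independently, but as one difference among four units when base pairs are the unit. The smaller number of independent observations is consistent with the abstract's remark that the pair-based view gives lower but more realistic confidence estimates.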