A dependent-rates model and an MCMC-based methodology for the maximum-likelihood analysis of sequences with overlapping reading frames

Authors

Pedersen, AMK; Jensen, JL

Citation

A.M.K. Pedersen and J.L. Jensen, A dependent-rates model and an MCMC-based methodology for the maximum-likelihood analysis of sequences with overlapping reading frames, MOL BIOL EV, 18(5), 2001, pp. 763-776

Subject Categories

Biology, Experimental Biology

Journal title

MOLECULAR BIOLOGY AND EVOLUTION

ISSN journal

0737-4038

Volume

18

Issue

5

Year of publication

2001

Pages

763 - 776

Database

ISI

SICI code

0737-4038(200105)18:5<763:ADMAAM>2.0.ZU;2-9

Abstract

We present a model and methodology for the maximum-likelihood analysis of pairwise alignments of DNA sequences in which two genes are encoded in overlapping reading frames. In the model for the substitution process, the instantaneous rates of substitution are allowed to depend on the nucleotides occupying the sites in a neighborhood of the site subject to substitution at the instant of the substitution. By defining the neighborhood of a site to extend over all sites in the codons in both reading frames to which a site belongs, constraints imposed by the genetic code in both reading frames can be taken into account. Due to the dependency of the instantaneous rates of substitution on the states at neighboring sites, the transition probability between sequences does not factorize and therefore cannot be obtained directly. We present a Markov chain Monte Carlo procedure for obtaining the ratio of two transition probabilities between two sequences under the model considered, and we describe how maximum-likelihood parameter estimation and likelihood ratio tests can be performed using the procedure. We describe how the expected numbers of different types of substitutions in the shared history of two sequences can be calculated, and we use the described model and methodology in an analysis of a pairwise alignment of two hepatitis B sequences in which two genes are encoded in overlapping frames. Finally, we present an extended model, together with a simpler approximate estimation procedure, and use this to test the adequacy of the former model.
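
As an illustration of the neighborhood definition in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes both reading frames lie on the same strand, that codons are three nucleotides long, and that each frame is described by a 0-based offset; the function names `codon_sites` and `neighborhood` and all parameters are hypothetical.

```python
def codon_sites(site, offset):
    """Return the three site indices of the codon containing `site`
    in a reading frame whose first codon starts at `offset` (assumed 0, 1 or 2)."""
    start = site - ((site - offset) % 3)
    return range(start, start + 3)


def neighborhood(site, offset_a, offset_b, length):
    """Return the sorted sites in the union of the codons, in both frames,
    to which `site` belongs, clipped to the sequence bounds [0, length)."""
    sites = set(codon_sites(site, offset_a)) | set(codon_sites(site, offset_b))
    return sorted(i for i in sites if 0 <= i < length)


# Hypothetical example: frames at offsets 0 and 1 on a 12-nucleotide sequence.
print(neighborhood(4, 0, 1, 12))
```

Under these assumptions, a site's neighborhood spans four or five sites depending on its position within the two codons, which is why the substitution rates at adjacent sites are coupled and the transition probability does not factorize over sites.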