Detecting homogeneous segments in DNA sequences by using hidden Markov models

Citation
R.J. Boys et al., Detecting homogeneous segments in DNA sequences by using hidden Markov models, J ROY STA C, 49, 2000, pp. 269-285
Number of citations
23
Subject categories
Mathematics
Journal title
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES C-APPLIED STATISTICS
ISSN journal
0035-9254
Volume
49
Year of publication
2000
Part
2
Pages
269 - 285
Database
ISI
SICI code
0035-9254(2000)49:<269:DHSIDS>2.0.ZU;2-6
Abstract
In recent years there has been a rapid growth in the amount of DNA being sequenced and in its availability through genetic databases. Statistical techniques which identify structure within these sequences can be of considerable assistance to molecular biologists, particularly when they incorporate the discrete nature of changes caused by evolutionary processes. This paper focuses on the detection of homogeneous segments within heterogeneous DNA sequences. In particular, we study an intron from the chimpanzee alpha-fetoprotein gene; this protein plays an important role in the embryonic development of mammals. We present a Bayesian solution to this segmentation problem using a hidden Markov model implemented by Markov chain Monte Carlo methods. We consider the important practical problem of specifying informative prior knowledge about sequences of this type. Two Gibbs sampling algorithms are contrasted and the sensitivity of the analysis to the prior specification is investigated. Model selection and possible ways to overcome the label switching problem are also addressed. Our analysis of intron 7 identifies three distinct homogeneous segment types, two of which occur in more than one region, and one of which is reversible.