Bayesian mixture modeling using a hybrid sampler with application to protein subfamily identification

Citation
Fong, Youyi et al., Bayesian mixture modeling using a hybrid sampler with application to protein subfamily identification, Biostatistics (Oxford. Print), 11(1), 2010, pp. 18-33
ISSN journal
14654644
Volume
11
Issue
1
Year of publication
2010
Pages
18 - 33
Database
ACNP
Abstract
Predicting protein function is essential to advancing our knowledge of biological processes. This article is focused on discovering the functional diversification within a protein family. A Bayesian mixture approach is proposed to model a protein family as a mixture of profile hidden Markov models. For a given mixture size, a hybrid Markov chain Monte Carlo sampler comprising both Gibbs sampling steps and hierarchical clustering-based split/merge proposals is used to obtain posterior inference. Inference for mixture size concentrates on comparing the integrated likelihoods. The choice of priors is critical with respect to the performance of the procedure. Through simulation studies, we show that 2 priors that are based on independent data sets allow correct identification of the mixture size, both when the data are homogeneous and when the data are generated from a mixture. We illustrate our method using 2 sets of real protein sequences.