Quantifying copy number variations using a hidden Markov model with inhomogeneous emission distributions

Citation
Mccallum, Kenneth Jordan and Wang, Ji-ping, Quantifying copy number variations using a hidden Markov model with inhomogeneous emission distributions, Biostatistics (Oxford. Print), 14(3), 2013, pp. 600-611
ISSN journal
14654644
Volume
14
Issue
3
Year of publication
2013
Pages
600 - 611
Database
ACNP
Abstract
Copy number variations (CNVs) are a significant source of genetic variation and are frequently associated with diseases such as cancers and autism. High-throughput sequencing data are increasingly being used to detect and quantify CNVs; however, the distributional properties of the data are not fully understood. A hidden Markov model (HMM) is proposed that uses inhomogeneous emission distributions based on negative binomial regression to account for sequencing biases. The model is tested on whole-genome sequencing data and on simulated data sets. An algorithm for CNV detection is implemented in the R package CNVfinder. The model based on negative binomial regression is shown to fit the data well and to perform competitively with methods based on normalization of read counts.
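The idea of an HMM with inhomogeneous negative binomial emissions can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce the CNVfinder interface. It assumes that the genome is split into windows, that the hidden state is the copy number, that the expected read count in a window is (copy number / 2) times exp(b0 + b1 * GC content), and that the most likely state path is found by Viterbi decoding. The function names, the single GC covariate, the regression coefficients, the dispersion value and the transition probability are all hypothetical choices made for illustration; in practice they would be estimated from data.

```python
import math


def nb_logpmf(y, mu, size):
    """Log-probability of count y under a negative binomial
    with mean mu and dispersion parameter size (variance = mu + mu**2 / size)."""
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(size / (size + mu))
            + y * math.log(mu / (size + mu)))


def viterbi_cnv(counts, gc, states=(1, 2, 3), beta=(4.0, 1.0),
                size=50.0, stay=0.99):
    """Most likely copy-number path for per-window read counts.

    All defaults are illustrative assumptions, not values from the paper:
    states -- candidate copy numbers (copy number 0 is omitted for simplicity)
    beta   -- (intercept, GC slope) of a log-linear model for the diploid mean
    size   -- negative binomial dispersion
    stay   -- probability of keeping the same copy number in the next window
    """
    n, k = len(counts), len(states)
    log_stay = math.log(stay)
    log_move = math.log((1.0 - stay) / (k - 1))

    def emit(t, c):
        # Inhomogeneous emission: the mean depends on the window's covariate.
        mu = (c / 2.0) * math.exp(beta[0] + beta[1] * gc[t])
        return nb_logpmf(counts[t], mu, size)

    # Uniform initial distribution over states.
    score = [-math.log(k) + emit(0, c) for c in states]
    back = []
    for t in range(1, n):
        new_score, pointers = [], []
        for j, c in enumerate(states):
            best_i = max(
                range(k),
                key=lambda i: score[i] + (log_stay if i == j else log_move),
            )
            trans = log_stay if best_i == j else log_move
            new_score.append(score[best_i] + trans + emit(t, c))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    path = [max(range(k), key=lambda i: score[i])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return [states[i] for i in path]
```

Calling viterbi_cnv with a list of window read counts and a matching list of GC fractions returns one copy-number call per window. Because the emission mean is recomputed per window from the covariate, the same read count can support different copy numbers in windows with different GC content, which is the sense in which the emission distributions are inhomogeneous.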