Latent class models for joint analysis of disease prevalence and high-dimensional semicontinuous biomarker data

Citation
Zhang, Bo et al., Latent class models for joint analysis of disease prevalence and high-dimensional semicontinuous biomarker data, Biostatistics (Oxford. Print), 13(1), 2012, pp. 74-88
ISSN journal
14654644
Volume
13
Issue
1
Year of publication
2012
Pages
74 - 88
Database
ACNP
Abstract
High-dimensional biomarker data are often collected in epidemiological studies when assessing the association between biomarkers and human disease is of interest. We develop a latent class modeling approach for joint analysis of high-dimensional semicontinuous biomarker data and a binary disease outcome. To model the relationship between complex biomarker expression patterns and disease risk, we use latent risk classes to link the 2 modeling components. We characterize complex biomarker-specific differences through biomarker-specific random effects, so that different biomarkers can have different baseline (low-risk) values as well as different between-class differences. The proposed approach also accommodates data features that are common in environmental toxicology and other biomarker exposure data, including a large number of biomarkers, numerous zero values, and a complex mean-variance relationship in the biomarker levels. A Monte Carlo EM (MCEM) algorithm is proposed for parameter estimation. Both the MCEM algorithm and model selection procedures are shown to work well in simulations and applications. In applying the proposed approach to an epidemiological study that examined the relationship between environmental polychlorinated biphenyl (PCB) exposure and the risk of endometriosis, we identified a highly significant overall effect of PCB concentrations on the risk of endometriosis.
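
The data structure described in the abstract can be illustrated with a small simulation. This is only a rough sketch and not the authors' model: it assumes two latent risk classes, a lognormal distribution for the positive biomarker values, and normally distributed biomarker-specific random effects. The function name `simulate` and every parameter value are hypothetical and chosen purely for illustration.

```python
import math
import random


def simulate(n_subjects=200, n_biomarkers=10, seed=0):
    """Simulate (latent class, disease status, biomarker vector) per subject.

    Illustrative only: all distributions and values below are assumptions,
    not taken from the paper.
    """
    rng = random.Random(seed)

    p_high_risk = 0.3               # assumed P(latent class = high risk)
    p_disease = {0: 0.1, 1: 0.6}    # assumed P(disease | latent class)
    p_nonzero = {0: 0.5, 1: 0.8}    # assumed P(biomarker > 0 | latent class)

    # Biomarker-specific random effects: each biomarker gets its own
    # baseline (low-risk) log-level and its own between-class shift.
    baseline = [rng.gauss(0.0, 0.5) for _ in range(n_biomarkers)]
    shift = [rng.gauss(0.5, 0.2) for _ in range(n_biomarkers)]

    data = []
    for _ in range(n_subjects):
        # The latent risk class links the disease and biomarker components.
        c = 1 if rng.random() < p_high_risk else 0
        d = 1 if rng.random() < p_disease[c] else 0

        y = []
        for j in range(n_biomarkers):
            if rng.random() < p_nonzero[c]:
                # Positive part: lognormal with class-dependent log-mean.
                mu = baseline[j] + shift[j] * c
                y.append(math.exp(rng.gauss(mu, 0.4)))
            else:
                # Zero part of the semicontinuous measurement.
                y.append(0.0)
        data.append((c, d, y))
    return data


if __name__ == "__main__":
    sample = simulate(n_subjects=5, n_biomarkers=3, seed=1)
    for c, d, y in sample:
        print(c, d, [round(v, 3) for v in y])
```

In real data the latent class `c` is unobserved, which is why estimation requires an EM-type algorithm such as the MCEM procedure the abstract mentions. The simulation keeps `c` only to make the generating mechanism visible.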