Combining contingency tables with missing dimensions

Authors
Citation
F. Dominici, Combining contingency tables with missing dimensions, BIOMETRICS, 56(2), 2000, pp. 546-553
Citations number
31
Subject categories
Biology, Multidisciplinary
Journal title
BIOMETRICS
ISSN journal
0006341X
Volume
56
Issue
2
Year of publication
2000
Pages
546 - 553
Database
ISI
SICI code
0006-341X(200006)56:2<546:CCTWMD>2.0.ZU;2-V
Abstract
We propose a methodology for estimating the cell probabilities in a multiway contingency table by combining partial information from a number of studies when not all of the variables are recorded in all studies. We jointly model the full set of categorical variables recorded in at least one of the studies, and we treat the variables that are not reported as missing dimensions of the study-specific contingency table. For example, we might be interested in combining several cohort studies in which the incidence in the exposed and nonexposed groups is not reported for all risk factors in all studies, while the overall number of cases and the cohort size are always available. To account for study-to-study variability, we adopt a Bayesian hierarchical model. At the first stage of the model, the observation stage, data are modeled by a multinomial distribution with a fixed total number of observations. At the second stage, we use the logistic normal (LN) distribution to model variability in the study-specific cells' probabilities. Using this model and data augmentation techniques, we reconstruct the contingency table for each study regardless of which dimensions are missing, and we estimate population parameters of interest. Our hierarchical procedure borrows strength from all the studies and accounts for correlations among the cells' probabilities. The main difficulty in combining studies that record different variables is maintaining a consistent interpretation of parameters across studies. The approach proposed here overcomes this difficulty and at the same time addresses the uncertainty arising from the missing dimensions. We apply our modeling strategy to analyze data on air pollution and mortality from 1987 to 1994 for six U.S. cities by combining six cross-classifications of low, medium, and high levels of mortality counts, particulate matter, ozone, and carbon monoxide, with the complication that four of the six cities do not report all the air pollution variables. Our goals are to investigate the association between air pollution and mortality by reconstructing the tables with missing dimensions, to determine the most harmful pollutant combinations, and to make predictions about these key issues for a city other than the six sampled. We find that, for high levels of ozone and carbon monoxide, the number of cases with a high number of deaths increases as the level of particulate matter (PM10) increases, and that the most harmful combinations correspond to high levels of PM10, confirming prior findings that levels of PM10 higher than the NAAQS standard are harmful.
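The two-stage model and the reconstruction of a missing dimension can be illustrated with a minimal Python sketch. It is not the author's implementation: it assumes NumPy, a single study, and known cell probabilities, and the function names (`logistic_normal_cell_probs`, `draw_full_table`, `collapse`, `impute_missing_dimension`), the 3 x 3 x 3 table, the study size, and the prior parameters are all hypothetical. The sketch draws cell probabilities from a logistic normal distribution (second stage), draws a full table from a multinomial with a fixed total (first stage), collapses it over one variable to mimic a study that does not record that variable, and then performs one data-augmentation step. Conditional on the observed margin and the cell probabilities, the counts along the missing dimension are multinomial, with probabilities renormalized within each observed cell.

```python
import numpy as np


def logistic_normal_cell_probs(mu, cov, shape, rng):
    """Second stage (sketch): study-specific cell probabilities from a
    logistic normal distribution. theta ~ N(mu, cov) has length K - 1, where
    K is the number of cells; the last cell is the reference with logit 0."""
    theta = rng.multivariate_normal(mu, cov)
    logits = np.append(theta, 0.0)
    # Additive logistic transform; subtracting the max avoids overflow.
    w = np.exp(logits - logits.max())
    return (w / w.sum()).reshape(shape)


def draw_full_table(n, probs, rng):
    """First stage: multinomial counts with a fixed total n."""
    return rng.multinomial(n, probs.ravel()).reshape(probs.shape)


def collapse(table, missing_axis):
    """What a study that does not record one variable reports:
    the full table summed over that variable."""
    return table.sum(axis=missing_axis)


def impute_missing_dimension(margin, probs, missing_axis, rng):
    """One data-augmentation step (sketch): draw the full table given the
    observed margin and the current cell probabilities. Within each observed
    cell, counts along the missing axis are multinomial with the cell
    probabilities renormalized along that axis."""
    # Put the missing axis last so each observed cell indexes a 1-D slice.
    p = np.moveaxis(probs, missing_axis, -1)
    full = np.zeros(p.shape, dtype=np.int64)
    for idx in np.ndindex(margin.shape):
        cond = p[idx] / p[idx].sum()
        full[idx] = rng.multinomial(margin[idx], cond)
    return np.moveaxis(full, -1, missing_axis)


rng = np.random.default_rng(2000)
shape = (3, 3, 3)  # hypothetical: mortality x PM10 x ozone, each low/medium/high
n_cells = int(np.prod(shape))
mu = np.zeros(n_cells - 1)       # hypothetical prior mean of the logits
cov = 0.5 * np.eye(n_cells - 1)  # hypothetical prior covariance of the logits
n_total = 500                    # hypothetical study size

probs = logistic_normal_cell_probs(mu, cov, shape, rng)
full = draw_full_table(n_total, probs, rng)
missing = 2                      # this hypothetical study does not record ozone
margin = collapse(full, missing)
imputed = impute_missing_dimension(margin, probs, missing, rng)

print(margin.sum(), imputed.sum())  # prints "500 500"
```

The imputed table always reproduces the observed margin exactly, so the uncertainty it adds is confined to the missing dimension. In a full Bayesian sampler this step would alternate with updates of the study-specific cell probabilities and of the hyperparameters `mu` and `cov`; those updates are not shown here.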