The identification of heterogeneity in effects between studies is a key issue in meta-analyses of observational studies, since it is critical for determining whether it is appropriate to pool the individual results into one summary measure. The result of a hypothesis test is often used as the decision criterion. In this paper, the authors use a large simulation study patterned after the key features of five published epidemiologic meta-analyses to investigate the type I error and statistical power of five previously proposed asymptotic homogeneity tests, a parametric bootstrap version of each of the tests, and tau²-bootstrap, a test proposed by the authors.
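As a point of reference for how a parametric bootstrap version of such a test can work, the sketch below simulates studies under the homogeneity null (a common true effect with the observed within-study variances) and recomputes the chosen statistic on each replicate. This is a minimal illustration, not the authors' implementation; the function name, the normal sampling model, and the number of replicates are assumptions.

    import numpy as np

    def parametric_bootstrap_p(theta, var, stat_fn, n_boot=2000, seed=0):
        """Parametric bootstrap p-value for a homogeneity statistic.

        theta   : per-study effect estimates (e.g., log odds ratios)
        var     : estimated within-study variances
        stat_fn : function (theta, var) -> scalar homogeneity statistic
        """
        theta, var = np.asarray(theta, float), np.asarray(var, float)
        rng = np.random.default_rng(seed)

        w = 1.0 / var
        pooled = np.sum(w * theta) / np.sum(w)  # fixed-effect pooled estimate
        observed = stat_fn(theta, var)

        # Under the null, every study shares the pooled effect; only
        # within-study sampling error varies across bootstrap replicates.
        exceed = 0
        for _ in range(n_boot):
            theta_b = rng.normal(pooled, np.sqrt(var))
            if stat_fn(theta_b, var) >= observed:
                exceed += 1
        return (exceed + 1) / (n_boot + 1)

The appeal of the bootstrap in this setting is that it does not rely on the asymptotic null distribution of the statistic, which can be inaccurate when the number of studies is small.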
The results show that the asymptotic DerSimonian and Laird Q statistic and the bootstrap versions of the other tests give the correct type I error under the null hypothesis, but that all of the tests considered have low statistical power, especially when the number of studies included in the meta-analysis is small (<20).
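The Q statistic in question is Cochran's Q as used in the DerSimonian and Laird approach: the weighted sum of squared deviations of the study estimates from the fixed-effect pooled estimate, referred asymptotically to a chi-square distribution with k - 1 degrees of freedom. A minimal sketch is shown below (study estimates and within-study variances as inputs; it could also be passed as stat_fn to the bootstrap sketch above).

    import numpy as np
    from scipy.stats import chi2

    def cochran_q(theta, var):
        """Weighted squared deviations from the fixed-effect pooled estimate."""
        theta, w = np.asarray(theta, float), 1.0 / np.asarray(var, float)
        pooled = np.sum(w * theta) / np.sum(w)
        return np.sum(w * (theta - pooled) ** 2)

    def q_test_asymptotic(theta, var):
        """Asymptotic test: under homogeneity, Q ~ chi-square with k - 1 df."""
        q = cochran_q(theta, var)
        return q, chi2.sf(q, df=len(theta) - 1)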
From the point of view of validity, power, and computational ease, the Q statistic is clearly the best choice. The authors found that the performance of all of the tests considered did not depend appreciably upon the value of the pooled odds ratio, with respect to both size and power. Because tests for heterogeneity will often be underpowered, random-effects models can be used routinely, and heterogeneity can be quantified by means of R_t, the proportion of the total variance of the pooled effect measure that is due to between-study variance, and CV_B, the between-study coefficient of variation.
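The abstract defines R_t and CV_B only in words. One plausible formalization, under the usual DerSimonian and Laird random-effects setup, is sketched below: tau² is the moment estimator of the between-study variance, R_t is taken as the share of the pooled estimate's variance attributable to tau², and CV_B as the between-study standard deviation relative to the pooled effect. The exact estimators used in the paper may differ; treat r_t and cv_b here as illustrative assumptions.

    import numpy as np

    def heterogeneity_summaries(theta, var):
        """DerSimonian-Laird tau^2 with illustrative R_t and CV_B summaries."""
        theta, var = np.asarray(theta, float), np.asarray(var, float)
        k = len(theta)

        w = 1.0 / var
        pooled_fe = np.sum(w * theta) / np.sum(w)
        q = np.sum(w * (theta - pooled_fe) ** 2)

        # DerSimonian-Laird moment estimator of the between-study variance.
        tau2 = max(0.0, (q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

        # Random-effects weights and pooled estimate.
        w_re = 1.0 / (var + tau2)
        pooled_re = np.sum(w_re * theta) / np.sum(w_re)

        # Illustrative definitions (assumptions, not the paper's estimators):
        # R_t  - share of the pooled estimate's variance due to tau^2.
        # CV_B - between-study standard deviation relative to pooled effect.
        r_t = 1.0 - (1.0 / np.sum(w)) / (1.0 / np.sum(w_re))
        cv_b = np.sqrt(tau2) / abs(pooled_re)
        return tau2, r_t, cv_b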