Using data from a meta-analysis of the effects of oestrogen replacement therapy on the development of breast cancer, we compared alternative methods for combining dose-response slopes from epidemiological studies. We evaluated issues related both to summarizing data from single studies and to combining results from multiple studies. Findings related to the analysis of individual dose-response studies include: (1) a method of weighting studies that gives greater influence to dose-response slopes that conform to the linear relation of relative risk to duration can lead to large differences in calculated weights as a function of non-linearity; (2) a regression model using a variable intercept resulted in a mean dose-response slope that increased as much as threefold when compared with the values obtained with a zero-intercept model.
When combining results from multiple studies, we found: (1) calculating standard errors of mean dose-response slopes by methods that allow for both among-study and within-study variability (a random-effects type model) gave values different from a method that assumes homogeneity and equal within-study precision (a fixed-effects model); (2) the random-effects model gives mean and standard error results most similar to a bootstrap resampling method as increasing heterogeneity is observed (however, this model could give biased mean estimates compared with the bootstrap method); (3) a components-of-variance model compares favourably with the bootstrap and is easier to apply than the random-effects model. Based on these findings, we recommend the use of methods which incorporate heterogeneity to guard against underestimating the standard error. However, caution is urged because bias in point estimates can occur if extreme heterogeneity is present. Two other observations affect the interpretation of data combined from multiple studies. First, inclusion into a model of quality scores assigned by blinded reviewers had little effect on the mean dose-response slope and its standard error. Second, the number of studies required to achieve a desired statistical power varies with effect size.
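The three pooling approaches compared above can be sketched as follows, again with hypothetical per-study slopes and standard errors (all numbers illustrative; the random-effects step uses the standard DerSimonian-Laird moment estimator, one common choice for this type of model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-study dose-response slopes and standard errors
# (illustrative only, not from the meta-analysis itself).
slopes = np.array([0.010, 0.025, 0.015, 0.040, 0.005])
se     = np.array([0.006, 0.008, 0.005, 0.010, 0.007])
w_fixed = 1.0 / se**2
k = len(slopes)

# Fixed-effects pooling: inverse-variance weights; assumes all studies
# estimate one common slope (homogeneity).
mu_fixed = np.sum(w_fixed * slopes) / np.sum(w_fixed)
se_fixed = np.sqrt(1.0 / np.sum(w_fixed))

# DerSimonian-Laird random-effects pooling: estimate the among-study
# variance tau^2 from Cochran's Q and fold it into the weights, so the
# pooled standard error reflects heterogeneity as well.
Q = np.sum(w_fixed * (slopes - mu_fixed) ** 2)
c = np.sum(w_fixed) - np.sum(w_fixed**2) / np.sum(w_fixed)
tau2 = max(0.0, (Q - (k - 1)) / c)
w_rand = 1.0 / (se**2 + tau2)
mu_rand = np.sum(w_rand * slopes) / np.sum(w_rand)
se_rand = np.sqrt(1.0 / np.sum(w_rand))

# Bootstrap over studies: resample studies with replacement and recompute
# the pooled slope each time; the spread of the resampled means serves as
# a model-free standard error.
boot = []
for _ in range(2000):
    idx = rng.integers(0, k, size=k)
    wb, sb = w_fixed[idx], slopes[idx]
    boot.append(np.sum(wb * sb) / np.sum(wb))
boot = np.asarray(boot)

print(f"fixed-effects : {mu_fixed:.4f} (SE {se_fixed:.4f})")
print(f"random-effects: {mu_rand:.4f} (SE {se_rand:.4f})")
print(f"bootstrap     : {boot.mean():.4f} (SE {boot.std(ddof=1):.4f})")
```

Because these made-up studies are heterogeneous (Q exceeds its degrees of freedom, so tau^2 > 0), the random-effects standard error comes out larger than the fixed-effects one, which is the underestimation the recommendation above guards against.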