I.R. Dohoo et al., An overview of techniques for dealing with large numbers of independent variables in epidemiologic studies, Preventive Veterinary Medicine, 29(3), 1997, pp. 221-239
Many studies of health and production problems in livestock involve the
simultaneous evaluation of large numbers of risk factors. These analyses
may be complicated by a number of problems including: multicollinearity
(which arises because many of the risk factors may be related (correlated)
to each other), confounding, interaction, problems related to sample size
(and hence the power of the study), and the fact that many associations are
evaluated from a single dataset. This paper focuses primarily on the problem
of multicollinearity and discusses a number of techniques for dealing with
this problem. However, some of the techniques discussed may also help to
deal with the other problems identified above. The first general approach to
dealing with multicollinearity involves reducing the number of independent
variables prior to investigating associations with the disease. Techniques
to accomplish this include: (1) excluding variables after screening for
associations among independent variables; (2) creating indices or scores
which combine data from multiple factors into a single variable; (3)
creating a smaller set of independent variables through the use of
multivariable techniques such as principal components analysis or factor
analysis. The second general approach is to use appropriate steps and
statistical techniques to investigate associations between the independent
variables and the dependent variable. A preliminary screening of these
associations may be performed using simple statistical tests. Subsequently,
multivariable techniques such as linear or logistic regression or
correspondence analysis can be used to identify important associations. The
strengths and limitations of these techniques are discussed and the
techniques are demonstrated using a dataset from a recent study of risk
factors for pneumonia in swine. Emphasis is placed on comparing
correspondence analysis with other techniques as it has been used less in
the epidemiology literature.
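
As a rough illustration of two of the variable-reduction techniques named in the abstract (screening for associations among independent variables, and principal components analysis), the following Python sketch may help. It is not the authors' procedure and does not use their swine pneumonia dataset: the data are synthetic, the variable names (ventilation, humidity, herd_size) are hypothetical, and the 0.8 correlation cut-off is an arbitrary assumption chosen only for the example.

```python
import numpy as np


def correlated_pairs(X, names, threshold=0.8):
    """List pairs of independent variables whose |Pearson r| >= threshold.

    The threshold is an illustrative assumption, not a recommended value.
    """
    corr = np.corrcoef(X, rowvar=False)  # columns of X are variables
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs


def principal_components(X, n_components):
    """Return component scores and the share of variance each explains.

    Variables are standardized first, so this is PCA on the correlation
    matrix, computed through a singular value decomposition.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


# Synthetic data (hypothetical risk factors, not from the paper).
rng = np.random.default_rng(0)
n = 200
ventilation = rng.normal(size=n)
humidity = 0.9 * ventilation + 0.2 * rng.normal(size=n)  # built to be collinear
herd_size = rng.normal(size=n)                           # built to be independent

names = ["ventilation", "humidity", "herd_size"]
X = np.column_stack([ventilation, humidity, herd_size])

# Step 1: screen the independent variables against each other.
pairs = correlated_pairs(X, names)
print("Highly correlated pairs:", pairs)

# Step 2: replace the three variables with two uncorrelated components.
scores, explained = principal_components(X, n_components=2)
print("Share of variance explained:", explained)
```

In this sketch the screening step flags ventilation and humidity as a candidate pair from which one variable might be excluded, while the component scores are mutually uncorrelated by construction and could be used in place of the original variables in a later regression. Which option is preferable depends on how interpretable the resulting variables need to be.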