AN OVERVIEW OF TECHNIQUES FOR DEALING WITH LARGE NUMBERS OF INDEPENDENT VARIABLES IN EPIDEMIOLOGIC STUDIES

Citation
I.R. Dohoo et al., AN OVERVIEW OF TECHNIQUES FOR DEALING WITH LARGE NUMBERS OF INDEPENDENT VARIABLES IN EPIDEMIOLOGIC STUDIES, Preventive Veterinary Medicine, 29(3), 1997, pp. 221-239
Number of citations
24
Subject Categories
Veterinary Sciences
Journal ISSN
0167-5877
Volume
29
Issue
3
Year of publication
1997
Pages
221 - 239
Database
ISI
SICI code
0167-5877(1997)29:3<221:AOOTFD>2.0.ZU;2-M
Abstract
Many studies of health and production problems in livestock involve the simultaneous evaluation of large numbers of risk factors. These analyses may be complicated by a number of problems including: multicollinearity (which arises because many of the risk factors may be related (correlated) to each other), confounding, interaction, problems related to sample size (and hence the power of the study), and the fact that many associations are evaluated from a single dataset. This paper focuses primarily on the problem of multicollinearity and discusses a number of techniques for dealing with this problem. However, some of the techniques discussed may also help to deal with the other problems identified above. The first general approach to dealing with multicollinearity involves reducing the number of independent variables prior to investigating associations with the disease. Techniques to accomplish this include: (1) excluding variables after screening for associations among independent variables; (2) creating indices or scores which combine data from multiple factors into a single variable; (3) creating a smaller set of independent variables through the use of multivariable techniques such as principal components analysis or factor analysis. The second general approach is to use appropriate steps and statistical techniques to investigate associations between the independent variables and the dependent variable. A preliminary screening of these associations may be performed using simple statistical tests. Subsequently, multivariable techniques such as linear or logistic regression or correspondence analysis can be used to identify important associations. The strengths and limitations of these techniques are discussed and the techniques are demonstrated using a dataset from a recent study of risk factors for pneumonia in swine. Emphasis is placed on comparing correspondence analysis with other techniques as it has been used less in the epidemiology literature.
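
Technique (1) in the abstract, excluding variables after screening for associations among the independent variables, might be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in the paper: the greedy rule (drop the later variable of any pair whose absolute Pearson correlation meets a cutoff), the cutoff of 0.8, the function names, and the herd-level values are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def screen_collinear(data, threshold=0.8):
    """Greedily drop the later variable of each pair with |r| >= threshold.

    `data` maps a variable name to its list of values (one per herd).
    Returns (kept, dropped); `dropped` maps each excluded variable to the
    (partner, r) pair that triggered its exclusion.
    The 0.8 default is an assumed cutoff, not a value taken from the paper.
    """
    dropped = {}
    for a, b in combinations(data, 2):
        if a in dropped or b in dropped:
            continue  # a variable already excluded cannot exclude others
        r = pearson(data[a], data[b])
        if abs(r) >= threshold:
            dropped[b] = (a, r)
    kept = [name for name in data if name not in dropped]
    return kept, dropped


# Hypothetical herd-level risk factors (made-up values, not the swine dataset).
herds = {
    "herd_size": [100, 250, 400, 150, 600, 320],
    "pigs_per_pen": [10, 24, 41, 16, 58, 30],  # nearly proportional to herd_size
    "ventilation_score": [3, 1, 4, 2, 2, 5],
}
kept, dropped = screen_collinear(herds)
print("kept:", kept)
print("dropped:", dropped)
```

In this toy data, pigs_per_pen tracks herd_size closely and is excluded, while ventilation_score is retained. In practice, which member of a correlated pair to keep is a subject-matter decision rather than a matter of column order.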