D.A. Greenberg et al., Simulated data for a complex genetic trait (problem 2 for GAW11): How the model was developed, and why, Genet. Epidemiol., 17, 1999, pp. S449-S459
This paper describes a simulated data set created as Problem 2 for GAW11. The generating model for Problem 2 involved two different genetic diseases, or "types," in three separate populations. The two-locus (2L) type results from the epistatic interaction of two genetic loci, and the three-allele type results from a single locus with two disease-causing alleles and one normal allele. Each type has two phenotypic forms, Mild and Severe, and both forms are subject to genetic and environmental influences. The disease occurs in three different hypothetical populations, each with its own disease-allele frequencies and penetrances. In two of the populations there is also a fourth locus with an allele that is associated with the 2L type. Misdiagnosis can occur, but only after a family has already been ascertained through two or more "genetically" affected offspring. Finally, the three populations are studied by four different hypothetical research groups. Each group has its own ideas about how the disease is inherited and has therefore devised an ascertainment scheme based on those beliefs. Each research group collected 100-family data sets, including data on 300 markers on six chromosomes and measurements of disease status and of the two proposed environmental factors. GAW participants were supplied with 25 random replicates of each data set. (C) 1999 Wiley-Liss, Inc.