J. Kowalski, A non-parametric approach to translating gene region heterogeneity associated with phenotype into location heterogeneity, BIOINFORMAT, 17(9), 2001, pp. 775-790
Motivation: The analysis of genetic data poses statistical problems in the
form of high dimensionality with small sample sizes. The construction of a
composite gene region (sequence pair) heterogeneity measure is one technique
for reducing the dimensionality of the problem. This approach, however, is
not without cost, since the contribution of locations to observed gene
region differences between groups becomes entangled in this summary measure.
This is problematic since it is of scientific interest to identify locations
that together depict phenotype.
Results: A method is proposed for relating observed gene region
heterogeneity back to the location level. In the spirit of a factor
analysis-type setting, the approach focuses on identifying a latent variable
structure among locations to explain within- and between-group genetic
differences associated with phenotype. The method is flexible for
identifying either the additive contribution from individual locations or
the additive contribution from a group of locations to observed gene region
heterogeneity, depending upon the weighting scheme used in constructing a
gene region heterogeneity measure. The approach is illustrated with clinical
trial data, where the problem of altered HIV drug susceptibility is examined
through characterizing location contributions to HIV protease gene region
differences associated with a phenotypic treatment response.
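
The idea of an additive location contribution can be illustrated with a
minimal sketch. It is not the paper's method (no latent variable structure is
estimated here); it only assumes that a sequence-pair heterogeneity measure is
a weighted count of mismatches, so that it splits into one term per location.
The function names, the toy sequences, the group labels and the unit weights
are all hypothetical and chosen for illustration only.

```python
from itertools import combinations


def location_contributions(seq_a, seq_b, weights):
    """Per-location weighted mismatch between two aligned sequences."""
    return [w * (a != b) for a, b, w in zip(seq_a, seq_b, weights)]


def pair_heterogeneity(seq_a, seq_b, weights):
    """Composite measure: the sum of the per-location contributions."""
    return sum(location_contributions(seq_a, seq_b, weights))


def mean_contributions(pairs, weights):
    """Average each location's contribution over a list of sequence pairs."""
    totals = [0.0] * len(weights)
    for a, b in pairs:
        for i, c in enumerate(location_contributions(a, b, weights)):
            totals[i] += c
    return [t / len(pairs) for t in totals]


# Made-up aligned sequences for two phenotype groups (assumption).
responders = ["ACGT", "ACGA"]
non_responders = ["TCGT", "TCCA"]
weights = [1.0, 1.0, 1.0, 1.0]  # unit weight per location (assumption)

# Within-group pairs come from the same group, between-group pairs do not.
within = list(combinations(responders, 2)) + list(combinations(non_responders, 2))
between = [(r, n) for r in responders for n in non_responders]

within_mean = mean_contributions(within, weights)
between_mean = mean_contributions(between, weights)

# Locations with a large positive value differ more between groups than within.
excess = [b - w for b, w in zip(between_mean, within_mean)]
print(excess)
```

In this toy setting the first location separates the two groups completely,
so it carries the largest between-group excess. Giving several locations a
shared weight and summing their terms would yield a group-level contribution
instead of a per-location one.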