We address the problem of comparing interindividual genomic sequence
diversity between two populations. Although the methods are general, for
concreteness we focus on comparing two human immunodeficiency virus (HIV) infected
populations. Suppose that, from one or more viral isolates taken from each
individual in a sample of persons from each population, one or multiple
measurements are made on the genetic sequence of a coding region of HIV.
Given a definition
of genetic distance between sequences, the goal is to test whether the
distribution of interindividual distances differs between populations. If distances
between all pairs of sequences within each group are used, then data
dependencies arising from the use of multiple sequences from individuals
invalidate the use of a standard two-sample test such as the t-test. Where this
problem has been recognized, a typical solution has been to apply a standard
test to a reduced dataset consisting of one sequence or a consensus sequence
from each patient. Disadvantages of this procedure are that the conclusion
of the test depends on the choice of sequences used, often an arbitrary
decision, and that exclusion of replicate sequences from the analysis may
needlessly sacrifice statistical power. We present a new test free of these
drawbacks, which is based on a statistic that linearly combines all possible
standard test statistics calculated from independent sequence subsamples. We
describe statistical power advantages of the test and illustrate its use
by application to nucleotide sequence distances measured from HIV-1 infected
populations in southern Africa (GenBank accession numbers AF110959-AF110981)
and North America/Europe. The test makes minimal assumptions, is maximally
efficient and objective, and is broadly applicable.
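
The idea of combining standard test statistics over independent sequence
subsamples can be pictured with a minimal Python sketch. It is an
illustration under stated assumptions, not the procedure of the paper: the
distance is taken to be the proportion of mismatched sites, the standard
test is taken to be Welch's t, a subsample is formed by picking one sequence
per individual and pairing individuals disjointly so that no sequence
contributes to two distances, the linear combination uses equal weights, and
random subsamples stand in for the enumeration of all possible ones. All
function names are hypothetical, and the sketch does not derive a null
distribution or p-value.

```python
import math
import random
from statistics import mean, variance


def hamming(a, b):
    """Proportion of mismatched sites between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def welch_t(x, y):
    """Welch two-sample t statistic; returns None if the standard error is zero."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    if se == 0:
        return None
    return (mean(x) - mean(y)) / se


def subsample_distances(group, rng):
    """One sequence per individual, individuals paired disjointly (assumption).

    `group` is a list of individuals, each a list of replicate sequences.
    Disjoint pairing means each chosen sequence enters exactly one distance.
    """
    picks = [rng.choice(seqs) for seqs in group]
    rng.shuffle(picks)
    return [hamming(picks[i], picks[i + 1]) for i in range(0, len(picks) - 1, 2)]


def combined_statistic(group_a, group_b, n_subsamples=200, seed=0):
    """Equal-weight average of Welch t statistics over random subsamples.

    Needs at least four individuals per group so that each subsample
    yields two or more distances per group.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_subsamples):
        t = welch_t(subsample_distances(group_a, rng),
                    subsample_distances(group_b, rng))
        if t is not None:  # skip degenerate subsamples with zero variance
            stats.append(t)
    if not stats:
        raise ValueError("every subsample had zero variance")
    return mean(stats)
```

Calling `combined_statistic(group_a, group_b)` with each group given as a
list of per-individual sequence lists returns a single number; a positive
value indicates larger interindividual distances in the first group. Every
replicate sequence can enter some subsample, so no sequence is discarded in
advance.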