An efficient test for comparing sequence diversity between two populations

Authors

Gilbert, PB; Novitsky, VA; Montano, MA; Essex, M

Citation

P.B. Gilbert et al., An efficient test for comparing sequence diversity between two populations, Journal of Computational Biology, 8(2), 2001, pp. 123-139

Subject Categories

Biochemistry & Biophysics

Journal title

JOURNAL OF COMPUTATIONAL BIOLOGY

ISSN journal

1066-5277

Volume

8

Issue

2

Year of publication

2001

Pages

123 - 139

Database

ISI

SICI code

1066-5277(2001)8:2<123:AETFCS>2.0.ZU;2-B

Abstract

We address the problem of comparing interindividual genomic sequence diversity between two populations. Although the methods are general, for concreteness we focus on comparing two human immunodeficiency virus (HIV) infected populations. From one or more viral isolates taken from each individual in a sample of persons from each population, suppose one or multiple measurements are made on the genetic sequence of a coding region of HIV. Given a definition of genetic distance between sequences, the goal is to test if the distribution of interindividual distances differs between populations. If distances between all pairs of sequences within each group are used, then data dependencies arising from the use of multiple sequences from individuals invalidate the use of a standard two-sample test such as the t-test. Where this problem has been recognized, a typical solution has been to apply a standard test to a reduced dataset comprising one sequence or a consensus sequence from each patient. Disadvantages of this procedure are that the conclusion of the test depends on the choice of utilized sequences, often an arbitrary decision, and that exclusion of replicate sequences from the analysis may needlessly sacrifice statistical power. We present a new test free of these drawbacks, which is based on a statistic that linearly combines all possible standard test statistics calculated from independent sequence subsamples. We describe statistical power advantages of the test and illustrate its use by application to nucleotide sequence distances measured from HIV-1 infected populations in southern Africa (GenBank accession numbers AF110959-AF110981) and North America/Europe. The test makes minimal assumptions, is maximally efficient and objective, and is broadly applicable.
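
The subsampling idea in the abstract can be illustrated with a small Python sketch. It is not the authors' statistic: it assumes a proportion-of-mismatches (Hamming) distance, uses the difference in mean pairwise distance as the per-subsample statistic, and combines the subsamples with equal weights. The function names and the nested-list data layout (one inner list of aligned sequences per individual) are hypothetical. A p-value would still need a variance estimate or resampling scheme that respects the dependence between pairwise distances, which this sketch omits.

```python
from itertools import combinations, product
from statistics import mean


def hamming(a, b):
    """Proportion of mismatched sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_pairwise_distance(seqs, dist=hamming):
    """Mean distance over all pairs, given one sequence per individual.

    Requires at least two individuals; otherwise there are no pairs.
    """
    return mean(dist(a, b) for a, b in combinations(seqs, 2))


def averaged_diversity_difference(group1, group2, dist=hamming):
    """Equal-weight average, over every one-sequence-per-individual
    subsample, of the difference in mean pairwise distance.

    Each group is a list of individuals; each individual is a list of
    one or more aligned sequences (an assumed layout, not the paper's).
    """
    def average_over_subsamples(group):
        # product(*group) enumerates every way to pick exactly one
        # sequence from each individual, so no subsample contains two
        # sequences from the same person.
        return mean(mean_pairwise_distance(choice, dist)
                    for choice in product(*group))

    # Subsamples are drawn separately per group, so averaging the
    # difference equals the difference of the per-group averages.
    return average_over_subsamples(group1) - average_over_subsamples(group2)


if __name__ == "__main__":
    # Toy data: three individuals in group 1, two in group 2.
    g1 = [["AAAA", "AAAT"], ["AATT"], ["TTTT", "ATTT"]]
    g2 = [["CCCC"], ["CCCC", "CCCG"]]
    print(averaged_diversity_difference(g1, g2))
```

Exhaustive enumeration grows as the product of the per-individual sequence counts, so for realistic datasets one would likely need a closed-form average or random subsampling instead.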