H. Salamon et al., Detection of deleted genomic DNA using a semiautomated computational analysis of GeneChip data, GENOME RES, 10(12), 2000, pp. 2044-2054
Genomic diversity within and between populations is caused by single nucleotide mutations, changes in repetitive DNA systems, recombination mechanisms, and insertion and deletion events. The contribution of these sources to diversity, whether purely genetic or of phenotypic consequence, can only be investigated if we have the means to quantitate and characterize diversity in many samples. With the advent of complete sequence characterization of representative genomes of different species, the possibility of developing protocols to screen for genetic polymorphism across entire genomes is actively being pursued. The large numbers of measurements such approaches yield demand that we pay careful attention to the numerical analysis of data. In this paper we present a novel application of an Affymetrix GeneChip to perform genome-wide screens for deletion polymorphism. A high-density oligonucleotide array formatted for mRNA expression and targeted at a fully sequenced 4.4-million-base-pair Mycobacterium tuberculosis standard strain genome was adapted to compare genomic DNA. Hybridization intensities to 111,000 probe pairs (perfect complement and mismatch complement) were measured for genomic DNA from a clinical strain and from a vaccine organism. Because individual probe-pair hybridization intensities have limited sensitivity and specificity for detecting deletions, a data-analytical methodology was developed to exploit measurements from multiple probes in tandem locations across the genome. The TSTEP (Tandem Set Terminal Extreme Probability) algorithm, designed specifically to analyze the tandem hybridization measurements, was applied and shown to discover genomic deletions with high sensitivity. The TSTEP algorithm provides a foundation for similar efforts to characterize deletions from many hybridization measures in similar-sized and larger genomes. Issues relating to the design of genome content screening experiments and the implications of these methods for studying population genomics and the evolution of genomes are discussed.
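
The idea of pooling probes in tandem genomic positions, rather than trusting any single probe pair, can be illustrated with a minimal sketch. This is not the TSTEP algorithm, whose details the abstract does not give; it is a simple running-median filter over perfect-match minus mismatch differences, offered only to show why adjacent probes help. The function name and the parameters window, threshold and min_run are hypothetical, and the intensity values in the usage note are invented for illustration.

```python
from statistics import median


def flag_deleted_runs(pm, mm, window=5, threshold=0.0, min_run=3):
    """Return (start, end) index ranges, end exclusive, of candidate deletions.

    pm and mm are perfect-match and mismatch intensities for probe pairs
    ordered by genomic position. All parameter defaults are assumptions
    chosen for illustration, not values from the paper.
    """
    if len(pm) != len(mm):
        raise ValueError("pm and mm must have the same length")

    # Per-probe-pair signal: a deleted target should give PM close to MM.
    diffs = [p - m for p, m in zip(pm, mm)]
    n = len(diffs)
    half = window // 2

    # Running median over neighbouring probes suppresses isolated
    # low-signal probes that are not part of a deleted stretch.
    smoothed = [
        median(diffs[max(0, i - half):min(n, i + half + 1)])
        for i in range(n)
    ]

    runs = []
    start = None
    for i, value in enumerate(smoothed):
        low = value <= threshold
        if low and start is None:
            start = i  # a candidate run begins here
        elif not low and start is not None:
            if i - start >= min_run:
                runs.append((start, i))
            start = None
    # Close a run that extends to the last probe.
    if start is not None and n - start >= min_run:
        runs.append((start, n))
    return runs
```

With invented intensities in which probe pairs 10 through 17 show no perfect-match excess, flag_deleted_runs([100] * 10 + [5] * 8 + [100] * 10, [10] * 28) reports the single range (10, 18), whereas one isolated weak probe among strong neighbours is not reported.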