
A QUALITY-CONTROL ALGORITHM FOR DNA-SEQUENCING PROJECTS

Authors

WHITE O DUNNING T SUTTON G ADAMS M VENTER JC FIELDS C

Citation

O. White et al., A QUALITY-CONTROL ALGORITHM FOR DNA-SEQUENCING PROJECTS, Nucleic acids research, 21(16), 1993, pp. 3829-3838

Citations number

Subject categories

Biology

Journal title

Nucleic acids research

ISSN journal

0305-1048

Volume

21

Issue

16

Year of publication

1993

Pages

3829 - 3838

Database

ISI

SICI code

0305-1048(1993)21:16<3829:AQAFDP>2.0.ZU;2-E

Abstract

Heterologous DNA sequences from rearrangements with the genomes of host cells, genomic fragments from hybrid cells, or impure tissue sources can threaten the purity of libraries that are derived from RNA or DNA. Hybridization methods can only detect contaminants from known or suspected heterologous sources, and whole library screening is technically very difficult. Detection of contaminating heterologous clones by sequence alignment is only possible when related sequences are present in a known database. We have developed a statistical test to identify heterologous sequences that is based on the differences in hexamer composition of DNA from different organisms. This test does not require that sequences similar to potential heterologous contaminants are present in the database, and can in principle detect contamination by previously unknown organisms. We have applied this test to the major public expressed sequence tag (EST) data sets to evaluate its utility as a quality control measure and a peer evaluation tool. There is detectable heterogeneity in most human and C. elegans EST data sets but it is not apparently associated with cross-species contamination. However, there is direct evidence for both yeast and bacterial sequence contamination in some public database sequences annotated as human. Results obtained with the hexamer test have been confirmed with similarity searches using sequences from the relevant data sets.
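
The abstract does not give the exact statistic used, so the following is only a minimal sketch of the general idea of comparing hexamer composition between organisms, not the authors' published test. It assumes a simple average log-odds score with add-one pseudocounts; the function names (hexamer_freqs, log_odds_score), the pseudocount value, and the toy sequences are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import product
from math import log

K = 6  # hexamer length


def hexamer_counts(seqs):
    """Count overlapping hexamers made only of A, C, G, T."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - K + 1):
            word = seq[i:i + K]
            if set(word) <= set("ACGT"):
                counts[word] += 1
    return counts


def hexamer_freqs(seqs, pseudocount=1.0):
    """Smoothed frequency for each of the 4**6 possible hexamers.

    The pseudocount is an assumption of this sketch; it keeps every
    frequency non-zero so the log ratio below is always defined.
    """
    counts = hexamer_counts(seqs)
    total = sum(counts.values()) + pseudocount * 4 ** K
    words = ("".join(p) for p in product("ACGT", repeat=K))
    return {w: (counts[w] + pseudocount) / total for w in words}


def log_odds_score(seq, expected, alternative):
    """Average log ratio of hexamer frequencies along one sequence.

    Positive values mean the hexamers look more like the `expected`
    organism; negative values point toward the `alternative` one.
    """
    seq = seq.upper()
    score, n = 0.0, 0
    for i in range(len(seq) - K + 1):
        word = seq[i:i + K]
        if word in expected:  # skips words containing N or other symbols
            score += log(expected[word] / alternative[word])
            n += 1
    return score / n if n else 0.0


# Toy training data (hypothetical): an AT-rich "expected" organism
# and a GC-rich "alternative" organism.
expected = hexamer_freqs(["ATATATATATATTTAAATATATAT"])
alternative = hexamer_freqs(["GCGCGCGGCCGCGCGCCGGCGC"])

at_like = log_odds_score("ATATATATAT", expected, alternative)
gc_like = log_odds_score("GCGCGCGC", expected, alternative)
```

In practice the two frequency tables would be estimated from large collections of sequence from each organism, and a cutoff on the score would have to be calibrated empirically; neither step is specified in the abstract.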