Detecting and analyzing DNA sequencing errors: Toward a higher quality of the Bacillus subtilis genome sequence

Citation
C. Medigue et al., Detecting and analyzing DNA sequencing errors: Toward a higher quality of the Bacillus subtilis genome sequence, GENOME RES, 9(11), 1999, pp. 1116-1127
Number of citations
40
Subject categories
Molecular Biology & Genetics
Journal title
GENOME RESEARCH
ISSN journal
1088-9051
Volume
9
Issue
11
Year of publication
1999
Pages
1116 - 1127
Database
ISI
SICI code
1088-9051(199911)9:11<1116:DAADSE>2.0.ZU;2-W
Abstract
During the determination of a DNA sequence, the introduction of artifactual frameshifts and/or in-frame stop codons in putative genes can lead to misprediction of gene products. Detection of such errors with a method based on protein similarity matching is only possible when related sequences are available in databases. Here, we present a method to detect frameshift errors in DNA sequences that is based on the intrinsic properties of the coding sequences. It combines the results of two analyses, the search for translational initiation/termination sites and the prediction of coding regions. This method was used to screen the complete Bacillus subtilis genome sequence, and the regions flanking putative errors were resequenced for verification. This procedure allowed us to correct the sequence and to analyze in detail the nature of the errors. Interestingly, in several cases in-frame termination codons or frameshifts were not sequencing errors but were confirmed to be present in the chromosome, indicating that the genes are either nonfunctional (pseudogenes) or subject to regulatory processes such as programmed translational frameshifts. The method can be used for checking the quality of the sequences produced by any prokaryotic genome sequencing project.
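
The idea summarized in the abstract can be illustrated with a much simpler heuristic. The Python sketch below is not the authors' method: it uses only stop codons on the forward strand, with no initiation-site search and no coding-potential model. It lists long stop-free stretches in each of the three forward reading frames and flags places where a long stretch in one frame ends close to, or overlapping, a long stretch in a different frame that continues past it. The function names and the thresholds `min_codons` and `max_gap` are hypothetical choices made for illustration.

```python
# Illustrative sketch only, NOT the published method: it looks at stop codons
# in the three forward frames and ignores start sites and coding statistics.
STOPS = {"TAA", "TAG", "TGA"}  # standard stop codons


def stop_free_segments(seq, min_codons=30):
    """Return (frame, start, end) for stop-free stretches of >= min_codons."""
    seq = seq.upper()
    segments = []
    for frame in range(3):
        seg_start = frame
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOPS:
                if (pos - seg_start) // 3 >= min_codons:
                    segments.append((frame, seg_start, pos))
                seg_start = pos + 3  # next stretch begins after the stop
        # Trailing stretch that runs to the last complete codon of the frame.
        end = frame + ((len(seq) - frame) // 3) * 3
        if (end - seg_start) // 3 >= min_codons:
            segments.append((frame, seg_start, end))
    return segments


def frameshift_candidates(seq, min_codons=30, max_gap=30):
    """Pair a long stretch with a later one in another frame that overlaps it
    or starts within max_gap nucleotides of its end and extends past it."""
    segs = sorted(stop_free_segments(seq, min_codons), key=lambda s: s[1])
    candidates = []
    for i, (f_a, s_a, e_a) in enumerate(segs):
        for f_b, s_b, e_b in segs[i + 1:]:
            if f_a != f_b and e_b > e_a and s_b - e_a <= max_gap:
                candidates.append(((f_a, s_a, e_a), (f_b, s_b, e_b)))
    return candidates


if __name__ == "__main__":
    unit = "CTGACTAAC"  # stop-free in frame 0, stops in frames 1 and 2
    shifted = unit * 10 + "G" + unit * 10  # one inserted base mid-sequence
    print(frameshift_candidates(shifted, min_codons=20))
```

A flagged pair only marks a region worth resequencing. As the abstract notes, some such frame changes turn out to be genuine features of the chromosome rather than sequencing errors.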