C. Medigue et al., Detecting and analyzing DNA sequencing errors: Toward a higher quality of the Bacillus subtilis genome sequence, GENOME RES, 9(11), 1999, pp. 1116-1127
During the determination of a DNA sequence, the introduction of artifactual
frameshifts and/or in-frame stop codons in putative genes can lead to
misprediction of gene products. Detection of such errors with a method based
on protein similarity matching is only possible when related sequences are
available in databases. Here, we present a method to detect frameshift errors
in DNA sequences that is based on the intrinsic properties of the coding
sequences. It combines the results of two analyses, the search for
translational initiation/termination sites and the prediction of coding
regions. This method was used to screen the complete Bacillus subtilis genome
sequence, and the regions flanking putative errors were resequenced for
verification. This procedure allowed us to correct the sequence and to analyze
in detail the nature of the errors. Interestingly, in several cases in-frame
termination codons or frameshifts were not sequencing errors but were
confirmed to be present in the chromosome, indicating that the genes are
either nonfunctional (pseudogenes) or subject to regulatory processes such as
programmed translational frameshifts. The method can be used for checking the
quality of the sequences produced by any prokaryotic genome sequencing project.
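
The general idea of screening for frameshifts from the sequence alone can be
illustrated with a short Python sketch. This is not the authors' procedure: it
omits the coding-region prediction step entirely and uses a crude stand-in,
flagging every stop codon that ends a long stop-free stretch in one reading
frame while another frame stays open well past that point. The function names
(open_frames, candidate_frameshifts) and the thresholds (min_codons,
min_extension) are hypothetical, and only the forward strand is scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def open_frames(seq, min_codons=5):
    """Return (frame, start, end) for each stop-free stretch of at least
    min_codons codons on the forward strand; end is exclusive and points
    at the terminating stop codon (or at the last complete codon's end)."""
    seq = seq.upper()
    stretches = []
    for frame in range(3):
        start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    stretches.append((frame, start, i))
                start = i + 3  # next stretch begins after the stop
        # Stretch that runs to the end of the sequence without a stop.
        end = frame + ((len(seq) - frame) // 3) * 3
        if (end - start) // 3 >= min_codons:
            stretches.append((frame, start, end))
    return stretches

def candidate_frameshifts(stretches, min_extension=5):
    """Pair each stretch `a` with any stretch `b` in a different frame that
    starts later than `a`, is already open where `a` terminates, and
    continues at least min_extension codons beyond that point."""
    hits = []
    for a in stretches:
        for b in stretches:
            if (a[0] != b[0] and a[1] < b[1] <= a[2]
                    and b[2] - a[2] >= 3 * min_extension):
                hits.append((a, b))
    return hits

# Toy sequence (assumed for illustration): a T-free repeat cannot contain
# a stop codon, so the only stop is the TAG placed in frame 0.
toy = "GCA" * 8 + "TAG" + "GCA" * 8
stretches = open_frames(toy)
hits = candidate_frameshifts(stretches)
```

Because a genuine gene end also produces a stop with open neighbouring frames,
such a filter yields many false positives on real data. A usable screen would
need a coding-potential score for each frame, as the abstract describes, and
candidates would still require resequencing to confirm.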