Motivation: The underlying error rate for genomic sequencing sometimes
results in the introduction of artificial frameshifts and in-frame stop
codons into putative protein-encoding genes. Severe errors are then
introduced into the inferred transcripts through mis-translation or
premature termination. Results: We describe a system for screening
segments of DNA for frameshift and in-frame stop errors in coding
regions. The method is based on homology matching using blastx to
compare all six reading frames of the query nucleotide sequence against
selected protein sequence databases. Fragments of protein matching
neighbouring regions of the query DNA are united and extended laterally
to define candidate open reading frames, within which frameshifts and
stops are identified. Suitable targets include prokaryotic or other
intron-free genomic sequence and complementary DNAs. As an example of
its use, we report here two frameshifted ORFs that deviate from the
original TIGR sequence annotations for the recently released
Helicobacter pylori genome.