A FRAMESHIFT ERROR-DETECTION ALGORITHM FOR DNA-SEQUENCING PROJECTS

Citation
G.A. Fichant and Y. Quentin, A FRAMESHIFT ERROR-DETECTION ALGORITHM FOR DNA-SEQUENCING PROJECTS, Nucleic acids research, 23(15), 1995, pp. 2900-2908
Number of citations
24
Subject categories
Biology
Journal title
Nucleic acids research
ISSN journal
03051048
Volume
23
Issue
15
Year of publication
1995
Pages
2900 - 2908
Database
ISI
SICI code
0305-1048(1995)23:15<2900:AFEAFD>2.0.ZU;2-U
Abstract
During the determination of DNA sequences, frameshift errors are not the most frequent but they are the most bothersome, as they corrupt the amino acid sequence over several residues. Detection of such errors by sequence alignment is only possible when related sequences are found in the databases. To avoid this limitation, we have developed a new tool based on the distribution of non-overlapping 3-tuples or 6-tuples in the three frames of an ORF. The method relies upon the result of a correspondence analysis. It has been extensively tested on Bacillus subtilis and Saccharomyces cerevisiae sequences and has also been examined with human sequences. The results indicate that it can detect frameshift errors affecting as few as 20 bp with a low rate of false positives (no more than 1.0/1000 bp scanned). The proposed algorithm can be used to scan a large collection of data, but it is mainly intended for laboratory practice as a tool for checking the quality of the sequences produced during a sequencing project.
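The tuple distribution mentioned in the abstract can be illustrated with a minimal Python sketch. It only counts non-overlapping k-tuples in each of the three frames of a sequence; it does not reproduce the correspondence analysis or the error-detection step of the published method. The function name `tuple_counts_by_frame` and the toy sequence are hypothetical, chosen for illustration only.

```python
from collections import Counter


def tuple_counts_by_frame(seq, k=3):
    """Count non-overlapping k-tuples in each of the three frames of seq.

    Illustrative helper (not from the paper): frame f starts at offset f
    and steps through the sequence k bases at a time.
    """
    seq = seq.upper()
    counts = []
    for frame in range(3):
        # Take only complete k-tuples; a trailing partial tuple is dropped.
        tuples = (seq[i:i + k] for i in range(frame, len(seq) - k + 1, k))
        counts.append(Counter(tuples))
    return counts


# Hypothetical toy ORF, not real data.
orf = "ATGGCTGCTAAATAA"
frames = tuple_counts_by_frame(orf)
for f, c in enumerate(frames):
    print(f"frame {f}: {dict(c)}")
```

Passing `k=6` gives the 6-tuple variant. In a setting like the one the abstract describes, such per-frame counts would be the input to a statistical comparison between frames; that comparison is outside the scope of this sketch.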