ALIGNMENTS OF DNA AND PROTEIN SEQUENCES CONTAINING FRAMESHIFT ERRORS

Citation
Xj. Guan et Ec. Uberbacher, ALIGNMENTS OF DNA AND PROTEIN SEQUENCES CONTAINING FRAMESHIFT ERRORS, Computer applications in the biosciences, 12(1), 1996, pp. 31-40
Citations number
10
Categorie Soggetti
Mathematical Methods, Biology & Medicine","Computer Sciences, Special Topics","Computer Science Interdisciplinary Applications","Biology Miscellaneous
ISSN journal
02667061
Volume
12
Issue
1
Year of publication
1996
Pages
31 - 40
Database
ISI
SICI code
0266-7061(1996)12:1<31:AODAPS>2.0.ZU;2-I
Abstract
Molecular sequences, like all experimental data, are subject to error. Many current DNA sequencing protocols have very significant error rat es and often generate artefactual insertions and deletions of bases (i ndels) which corrupt the translation of sequences and compromise the d etection of protein homologies. The impact of these errors on the util ity of molecular sequence data is dependent on the analytic technique used to interpret the data. In the presence of frameshift errors, stan dard algorithms using six-frame translation can miss important homolog ies because only subfragments of the correct translation are available in any given frame. We present a new algorithm which can detect and c orrect frameshift errors in DNA sequences during comparison of transla ted sequences with protein sequences in the databases. This algorithm can recognize homologous proteins sharing 30% identity even in the pre sence of a 7% frameshift error rate. Our algorithm uses dynamic progra mming, producing a guaranteed optimal alignment in the presence of fra meshifts, and has a sensitivity equivalent to Smith-Waterman. The comp utational efficiency of the algorithm is O(nm) where n and m are the s izes of two sequences being compared. The algorithm does not rely on p rior knowledge or heurisitic rules and performs significantly better t han any previously reported method.