ALIGNMENTS OF DNA AND PROTEIN SEQUENCES CONTAINING FRAMESHIFT ERRORS

Authors

GUAN XJ UBERBACHER EC

Citation

X.J. Guan and E.C. Uberbacher, ALIGNMENTS OF DNA AND PROTEIN SEQUENCES CONTAINING FRAMESHIFT ERRORS, Computer applications in the biosciences, 12(1), 1996, pp. 31-40

Citations number

Subject categories

Mathematical Methods, Biology & Medicine; Computer Sciences, Special Topics; Computer Science Interdisciplinary Applications; Biology Miscellaneous

Journal title

Computer applications in the biosciences

ISSN journal

02667061

Volume

12

Issue

1

Year of publication

1996

Pages

31 - 40

Database

ISI

SICI code

0266-7061(1996)12:1<31:AODAPS>2.0.ZU;2-I

Abstract

Molecular sequences, like all experimental data, are subject to error. Many current DNA sequencing protocols have very significant error rates and often generate artefactual insertions and deletions of bases (indels) which corrupt the translation of sequences and compromise the detection of protein homologies. The impact of these errors on the utility of molecular sequence data depends on the analytic technique used to interpret the data. In the presence of frameshift errors, standard algorithms using six-frame translation can miss important homologies because only subfragments of the correct translation are available in any given frame. We present a new algorithm which can detect and correct frameshift errors in DNA sequences during comparison of translated sequences with protein sequences in the databases. This algorithm can recognize homologous proteins sharing 30% identity even in the presence of a 7% frameshift error rate. Our algorithm uses dynamic programming, producing a guaranteed optimal alignment in the presence of frameshifts, and has a sensitivity equivalent to Smith-Waterman. The computational complexity of the algorithm is O(nm), where n and m are the lengths of the two sequences being compared. The algorithm does not rely on prior knowledge or heuristic rules and performs significantly better than any previously reported method.
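The abstract does not give the recurrence itself, so the following is only a minimal sketch of the general idea, not the authors' algorithm: a Smith-Waterman-style local alignment of a DNA sequence against a protein in which the dynamic programme may also skip one or two nucleotides, or accept a two-nucleotide codon, at a penalty. The function names, the identity scoring (a real tool would presumably use a substitution matrix) and all penalty values are assumptions chosen for illustration. Each cell considers a constant number of predecessors, so the running time is O(nm).

```python
from itertools import product

# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate one forward reading frame; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(frame, len(dna) - 2, 3)
    )


def frameshift_align_score(dna, protein, match=2, mismatch=-1, gap=-2, shift=-3):
    """Best local DNA-vs-protein alignment score allowing frameshifts.

    H[i][j] is the best score of an alignment ending after i nucleotides
    and j amino acids. All scoring parameters are hypothetical.
    """
    n, m = len(dna), len(protein)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(m + 1):
            cands = [0]                                  # local alignment floor
            cands.append(H[i - 1][j] + shift)            # skip 1 spurious base
            if i >= 2:
                cands.append(H[i - 2][j] + shift)        # skip 2 spurious bases
            if i >= 3:
                cands.append(H[i - 3][j] + gap)          # whole codon vs gap
            if j >= 1:
                cands.append(H[i][j - 1] + gap)          # amino acid vs gap
                if i >= 2:
                    cands.append(H[i - 2][j - 1] + shift)  # codon missing a base
                if i >= 3:
                    aa = CODON_TABLE.get(dna[i - 3:i], "X")
                    s = match if aa == protein[j - 1] else mismatch
                    cands.append(H[i - 3][j - 1] + s)    # ordinary codon step
            H[i][j] = max(cands)
            best = max(best, H[i][j])
    return best


protein = "MKTAYIAKQR"
dna = "ATGAAAACTGCTTATATTGCAAAGCAACGT"     # encodes the protein exactly
dna_err = dna[:15] + "G" + dna[15:]        # one inserted base after codon 5

print(translate(dna_err, 0))               # MKTAYDCKAT: homology lost after Y
print(translate(dna_err, 1))               # *KLLMIAKQR: remainder in frame 1
print(frameshift_align_score(dna, protein))      # 20
print(frameshift_align_score(dna_err, protein))  # 17
```

In this toy case each single reading frame recovers only half of the protein, which is the six-frame translation problem the abstract describes, while the frameshift-aware score loses only the one shift penalty relative to the error-free sequence.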