ITA
ENG

Fast, optimal alignment of three sequences using linear gap costs

Authors

Powell, DR Allison, L Dix, TI

Citation

Dr. Powell et al., Fast, optimal alignment of three sequences using linear gap costs, J THEOR BIO, 207(3), 2000, pp. 325-336

Citations number

Categorie Soggetti

Multidisciplinary

Journal title

JOURNAL OF THEORETICAL BIOLOGY

ISSN journal

00225193 → ACNP

Volume

207

Issue

Year of publication

2000

Pages

325 - 336

Database

ISI

SICI code

0022-5193(200012)207:3<325:FOAOTS>2.0.ZU;2-A

Abstract

Alignment algorithms can be used to infer a relationship between sequences when the true relationship is unknown. Simple alignment algorithms use a co st function that gives a fixed cost to each possible point mutation-mismatc h, deletion, insertion. These algorithms tend to find optimal alignments th at have many small gaps. It is more biologically plausible to have fewer lo nger gaps rather than many small gaps in an alignment. To address this issu e, linear gap cost algorithms are in common use for aligning biological seq uence data. More reliable inferences are obtained by aligning more than two sequences at a time. The obvious dynamic programming algorithm for optimal ly aligning k sequences of length n runs in O(n(k)) time. This is impractic al if k greater than or equal to 3 and n is of any reasonable length. Thus, for this problem there are many heuristics for aligning k sequences, howev er, they are not guaranteed to find an optimal alignment. In this paper, we present a new algorithm guaranteed to find the optimal alignment for three sequences using linear gap costs. This gives the same results as the dynam ic programming algorithm for three sequences, but typically does so much mo re quickly. It is particularly fast when the (three-way) edit distance is s mall. Our algorithm uses a speed-up technique based on Ukkonen's greedy alg orithm(Ukkonen, 1983) which he presented for two sequences and simple costs . (C) 2000 Academic Press.