T. Ikeda et H. Imai, ENHANCED A-ASTERISK ALGORITHMS FOR MULTIPLE ALIGNMENTS - OPTIMAL ALIGNMENTS FOR SEVERAL SEQUENCES AND K-OPT APPROXIMATE ALIGNMENTS FOR LARGE CASES, Theoretical computer science, 210(2), 1999, pp. 341-374
Citations number
19
Categorie Soggetti
Computer Science Theory & Methods","Computer Science Theory & Methods
The multiple alignment of the sequences of DNA and proteins is applica
ble to various important fields in molecular biology. Although the app
roach based on Dynamic Programming is well-known for this problem, it
requires enormous time and space to obtain the optimal alignment. On t
he other hand, this problem corresponds to the shortest path problem a
nd the A algorithm, which can efficiently find the shortest path with
an estimator, is usable. First, this paper directly applies the A al
gorithm to multiple sequence alignment problem with more powerful esti
mator in more than two-dimensional case and discusses the extensions o
f this approach utilizing an upper bound of the shortest path length a
nd of modification of network structure. The algorithm to provide the
upper bound is also proposed in this paper. The basic part of these re
sults was originally shown in Ikeda and Imai [11]. This part is simila
r to the branch-and-bound techniques implemented in MSA program in Gup
ta ct al. [6]. Our framework is based on the edge length transformatio
n to reduce the problem to the shortest path problem, which is more su
itable to generalizations to enumerating suboptimal alignments and par
ametric analysis as done in Shibuya and Imai [15-17]. By this enhanced
A algorithm, optimal multiple alignments of several long sequences c
an be computed in practice, which is shown by computational results. S
econd, this paper proposes a k-group alignment algorithm for multiple
alignment as a practical method for much larger-size problem of, say m
ultiple alignments of 50-100 sequences. A basic part of these results
were originally presented in Imai and Ikeda [13]. In existing iterativ
e improvement methods for multiple alignment, the so-called group-to-g
roup two-dimensional dynamic programming has been used, and in this re
spect our proposal is to extend the ordinary two-group dynamic program
ming to a k-group alignment programming. This extension is conceptuall
y straightforward, and here our contribution is to demonstrate that th
e k-group alignment can be implemented so as to nm in a reasonable tim
e and space under standard computing environments. This is established
by generalizing the above A search approach. The k-group alignment m
ethod can be directly incorporated in existing methods such as iterati
ve improvement algorithms [2, 5] and tree-based (iterative) algorithms
[9]. This paper performs computational experiments by applying the k-
group method to iterative improvement algorithms, and shows that our a
pproach can find better alignments in reasonable time. For example, th
rough larger-scale computational experiments here, 34 protein sequence
s with very high homology can be optimally 10-group aligned, and 64 se
quences with high homology can be optimally 5-group aligned. (C) 1999-
Elsevier Science B.V. All rights reserved.