ENHANCED A-ASTERISK ALGORITHMS FOR MULTIPLE ALIGNMENTS - OPTIMAL ALIGNMENTS FOR SEVERAL SEQUENCES AND K-OPT APPROXIMATE ALIGNMENTS FOR LARGE CASES

Authors
Citation
T. Ikeda et H. Imai, ENHANCED A-ASTERISK ALGORITHMS FOR MULTIPLE ALIGNMENTS - OPTIMAL ALIGNMENTS FOR SEVERAL SEQUENCES AND K-OPT APPROXIMATE ALIGNMENTS FOR LARGE CASES, Theoretical computer science, 210(2), 1999, pp. 341-374
Citations number
19
Categorie Soggetti
Computer Science Theory & Methods","Computer Science Theory & Methods
ISSN journal
03043975
Volume
210
Issue
2
Year of publication
1999
Pages
341 - 374
Database
ISI
SICI code
0304-3975(1999)210:2<341:EAAFMA>2.0.ZU;2-8
Abstract
The multiple alignment of the sequences of DNA and proteins is applica ble to various important fields in molecular biology. Although the app roach based on Dynamic Programming is well-known for this problem, it requires enormous time and space to obtain the optimal alignment. On t he other hand, this problem corresponds to the shortest path problem a nd the A algorithm, which can efficiently find the shortest path with an estimator, is usable. First, this paper directly applies the A al gorithm to multiple sequence alignment problem with more powerful esti mator in more than two-dimensional case and discusses the extensions o f this approach utilizing an upper bound of the shortest path length a nd of modification of network structure. The algorithm to provide the upper bound is also proposed in this paper. The basic part of these re sults was originally shown in Ikeda and Imai [11]. This part is simila r to the branch-and-bound techniques implemented in MSA program in Gup ta ct al. [6]. Our framework is based on the edge length transformatio n to reduce the problem to the shortest path problem, which is more su itable to generalizations to enumerating suboptimal alignments and par ametric analysis as done in Shibuya and Imai [15-17]. By this enhanced A algorithm, optimal multiple alignments of several long sequences c an be computed in practice, which is shown by computational results. S econd, this paper proposes a k-group alignment algorithm for multiple alignment as a practical method for much larger-size problem of, say m ultiple alignments of 50-100 sequences. A basic part of these results were originally presented in Imai and Ikeda [13]. In existing iterativ e improvement methods for multiple alignment, the so-called group-to-g roup two-dimensional dynamic programming has been used, and in this re spect our proposal is to extend the ordinary two-group dynamic program ming to a k-group alignment programming. This extension is conceptuall y straightforward, and here our contribution is to demonstrate that th e k-group alignment can be implemented so as to nm in a reasonable tim e and space under standard computing environments. This is established by generalizing the above A search approach. The k-group alignment m ethod can be directly incorporated in existing methods such as iterati ve improvement algorithms [2, 5] and tree-based (iterative) algorithms [9]. This paper performs computational experiments by applying the k- group method to iterative improvement algorithms, and shows that our a pproach can find better alignments in reasonable time. For example, th rough larger-scale computational experiments here, 34 protein sequence s with very high homology can be optimally 10-group aligned, and 64 se quences with high homology can be optimally 5-group aligned. (C) 1999- Elsevier Science B.V. All rights reserved.