ITA
ENG

ENHANCED A-ASTERISK ALGORITHMS FOR MULTIPLE ALIGNMENTS - OPTIMAL ALIGNMENTS FOR SEVERAL SEQUENCES AND K-OPT APPROXIMATE ALIGNMENTS FOR LARGE CASES

Authors

IKEDA T IMAI H

Citation

T. Ikeda et H. Imai, ENHANCED A-ASTERISK ALGORITHMS FOR MULTIPLE ALIGNMENTS - OPTIMAL ALIGNMENTS FOR SEVERAL SEQUENCES AND K-OPT APPROXIMATE ALIGNMENTS FOR LARGE CASES, Theoretical computer science, 210(2), 1999, pp. 341-374

Citations number

Categorie Soggetti

Computer Science Theory & Methods","Computer Science Theory & Methods

Journal title

Theoretical computer science → ACNP

ISSN journal

03043975

Volume

210

Issue

Year of publication

1999

Pages

341 - 374

Database

ISI

SICI code

0304-3975(1999)210:2<341:EAAFMA>2.0.ZU;2-8

Abstract

The multiple alignment of the sequences of DNA and proteins is applica ble to various important fields in molecular biology. Although the app roach based on Dynamic Programming is well-known for this problem, it requires enormous time and space to obtain the optimal alignment. On t he other hand, this problem corresponds to the shortest path problem a nd the A algorithm, which can efficiently find the shortest path with an estimator, is usable. First, this paper directly applies the A al gorithm to multiple sequence alignment problem with more powerful esti mator in more than two-dimensional case and discusses the extensions o f this approach utilizing an upper bound of the shortest path length a nd of modification of network structure. The algorithm to provide the upper bound is also proposed in this paper. The basic part of these re sults was originally shown in Ikeda and Imai [11]. This part is simila r to the branch-and-bound techniques implemented in MSA program in Gup ta ct al. [6]. Our framework is based on the edge length transformatio n to reduce the problem to the shortest path problem, which is more su itable to generalizations to enumerating suboptimal alignments and par ametric analysis as done in Shibuya and Imai [15-17]. By this enhanced A algorithm, optimal multiple alignments of several long sequences c an be computed in practice, which is shown by computational results. S econd, this paper proposes a k-group alignment algorithm for multiple alignment as a practical method for much larger-size problem of, say m ultiple alignments of 50-100 sequences. A basic part of these results were originally presented in Imai and Ikeda [13]. In existing iterativ e improvement methods for multiple alignment, the so-called group-to-g roup two-dimensional dynamic programming has been used, and in this re spect our proposal is to extend the ordinary two-group dynamic program ming to a k-group alignment programming. This extension is conceptuall y straightforward, and here our contribution is to demonstrate that th e k-group alignment can be implemented so as to nm in a reasonable tim e and space under standard computing environments. This is established by generalizing the above A search approach. The k-group alignment m ethod can be directly incorporated in existing methods such as iterati ve improvement algorithms [2, 5] and tree-based (iterative) algorithms [9]. This paper performs computational experiments by applying the k- group method to iterative improvement algorithms, and shows that our a pproach can find better alignments in reasonable time. For example, th rough larger-scale computational experiments here, 34 protein sequence s with very high homology can be optimally 10-group aligned, and 64 se quences with high homology can be optimally 5-group aligned. (C) 1999- Elsevier Science B.V. All rights reserved.