Statistical alignment: Computational properties, homology testing and goodness-of-fit

Authors

Hein, J; Wiuf, C; Knudsen, B; Moller, MB; Wibling, G

Citation

J. Hein et al., Statistical alignment: Computational properties, homology testing and goodness-of-fit, J MOL BIOL, 302(1), 2000, pp. 265-279

Subject Categories

Molecular Biology & Genetics

Journal title

JOURNAL OF MOLECULAR BIOLOGY

ISSN journal

0022-2836

Volume

302

Issue

1

Year of publication

2000

Pages

265 - 279

Database

ISI

SICI code

0022-2836(20000908)302:1<265:SACPHT>2.0.ZU;2-5

Abstract

The model of insertions and deletions in biological sequences, first formulated by Thorne, Kishino, and Felsenstein in 1991 (the TKF91 model), provides a basis for performing alignment within a statistical framework. Here we investigate this model.

Firstly, we show how to accelerate the statistical alignment algorithms by several orders of magnitude. The main innovations are to confine likelihood calculations to a band close to the similarity-based alignment, to get good initial guesses of the evolutionary parameters, and to apply an efficient numerical optimisation algorithm for finding the maximum likelihood estimate. In addition, the recursions originally presented by Thorne, Kishino, and Felsenstein can be simplified. Two proteins, about 1500 amino acids long, can be analysed with this method in less than five seconds on a fast desktop computer, which makes this method practical for actual data analysis.

Secondly, we propose a new homology test based on this model, where homology means that an ancestor to a sequence pair can be found finitely far back in time. This test has statistical advantages relative to the traditional shuffle test for proteins.

Finally, we describe a goodness-of-fit test that allows testing the proposed insertion-deletion (indel) process inherent to this model, and find that real sequences (here globins) probably experience indels longer than one, contrary to what is assumed by the model. (C) 2000 Academic Press.