Pairwise sequence alignment below the twilight zone

Authors

Blake, JD; Cohen, FE

Citation

J.D. Blake and F.E. Cohen, Pairwise sequence alignment below the twilight zone, J MOL BIOL, 307(2), 2001, pp. 721-735

Subject categories

Molecular Biology & Genetics

Journal title

JOURNAL OF MOLECULAR BIOLOGY

ISSN journal

0022-2836

Volume

307

Issue

2

Year of publication

2001

Pages

721 - 735

Database

ISI

SICI code

0022-2836(20010323)307:2<721:PSABTT>2.0.ZU;2-W

Abstract

Improved sequence alignment at low pairwise identity is important for identifying potential remote homologues in database searches and for obtaining accurate alignments as a prelude to modeling structures by homology. Our work is motivated by two observations: structural data provide superior training examples for developing techniques to improve the alignment of remote homologues; and general substitution patterns for remote homologues differ from those of closely related proteins. We introduce a new set of amino acid residue interchange matrices built from structural superposition data. These matrices exploit known structural homology as a means of characterizing the effect evolution has on residue-substitution profiles. Given their origin, it is not surprising that the individual residue-residue interchange frequencies are chemically sensible. The structural interchange matrices show a significant increase both in pairwise alignment accuracy and in functional annotation/fold recognition accuracy across distantly related sequences. We demonstrate improved pairwise alignment by using superpositions of homologous domains extracted from a structural database as a gold standard and go on to show an increase in fold recognition accuracy using a database of homologous fold families. This was applied to the unassigned open reading frames from the genome of Helicobacter pylori to identify five matches, two of which are not represented by new annotations in the sequence databases. In addition, we describe a new cyclic permutation strategy to identify distant homologues that experienced gene duplication and subsequent deletions. Using this method, we have identified a potential homologue to one additional previously unassigned open reading frame from the H. pylori genome. (C) 2001 Academic Press.
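
The abstract does not give the procedure used to build the interchange matrices or to perform the cyclic permutation, so the following is only a minimal sketch of the two general ideas, not the authors' method. It assumes the generic log-odds construction (observed pair frequency over the product of background frequencies) applied to residue pairs taken from structurally superposed positions, and it treats cyclic permutation as plain rotation of the query sequence. The function names, the pseudocount, and the toy input are all hypothetical.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs, pseudocount=1.0):
    """Build a symmetric log-odds score table from aligned residue pairs.

    aligned_pairs: iterable of (a, b) residues occupying equivalent
    positions in a structural superposition (assumed input format).
    """
    counts = Counter()
    for a, b in aligned_pairs:
        # Count each pair in both orientations so the table is symmetric.
        counts[(a, b)] += 1
        counts[(b, a)] += 1

    alphabet = sorted({r for pair in counts for r in pair})
    total = sum(counts.values()) + pseudocount * len(alphabet) ** 2

    # Joint frequencies q(a, b), smoothed with a pseudocount (an assumption).
    q = {(a, b): (counts[(a, b)] + pseudocount) / total
         for a in alphabet for b in alphabet}
    # Background frequencies p(a) as marginals of the joint table.
    p = {a: sum(q[(a, b)] for b in alphabet) for a in alphabet}

    # Score = log2 of observed over expected-by-chance pair frequency.
    return {(a, b): math.log2(q[(a, b)] / (p[a] * p[b]))
            for a in alphabet for b in alphabet}


def cyclic_permutations(seq):
    """Return every rotation of seq, starting with seq itself."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]


if __name__ == "__main__":
    # Toy data only; real input would come from superposed domains.
    pairs = [("A", "A"), ("A", "A"), ("L", "L"), ("A", "L")]
    scores = log_odds_matrix(pairs)
    print(scores[("A", "A")] > scores[("A", "L")])
    print(cyclic_permutations("MKVL"))
```

In such a scheme each rotation of the query would be aligned against the target with the chosen matrix and the best-scoring rotation kept; how the paper scores or filters these candidates is not stated in the abstract.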