Rapid automatic detection and alignment of repeats in protein sequences

Authors

Heger, A; Holm, L

Citation

A. Heger and L. Holm, Rapid automatic detection and alignment of repeats in protein sequences, PROTEINS, 41(2), 2000, pp. 224-237

Citations number

Subject categories

Biochemistry & Biophysics

Journal title

PROTEINS-STRUCTURE FUNCTION AND GENETICS

ISSN journal

0887-3585

Volume

41

Issue

2

Year of publication

2000

Pages

224 - 237

Database

ISI

SICI code

0887-3585(20001101)41:2<224:RADAAO>2.0.ZU;2-A

Abstract

Many large proteins have evolved by internal duplication and many internal sequence repeats correspond to functional and structural units. We have developed an automatic algorithm, RADAR, for segmenting a query sequence into repeats. The segmentation procedure has three steps: (i) repeat length is determined by the spacing between suboptimal self-alignment traces; (ii) repeat borders are optimized to yield a maximal integer number of repeats, and (iii) distant repeats are validated by iterative profile alignment. The method identifies short composition-biased as well as gapped approximate repeats and complex repeat architectures involving many different types of repeats in the query sequence. No manual intervention and no prior assumptions on the number and length of repeats are required. Comparison to the Pfam-A database indicates good coverage, accurate alignments, and reasonable repeat borders. Screening the Swissprot database revealed 3,000 repeats not annotated in existing domain databases. A number of these repeats had been described in the literature but most were novel. This illustrates how in times when curated databases grapple with ever increasing backlogs, automatic (re)analysis of sequences provides an efficient way to capture this important information. Proteins 2000;41:224-237. © 2000 Wiley-Liss, Inc.
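
The idea behind step (i) of the abstract, that the spacing between self-alignment traces reveals the repeat length, can be illustrated with a much simpler stand-in. The sketch below is not the RADAR algorithm: instead of suboptimal self-alignment traces it only counts identical residues along each diagonal of the sequence compared with itself, and takes the best-scoring offset as the repeat length. The function names, the `min_offset` parameter, and the toy sequence are hypothetical and chosen for illustration only.

```python
from collections import Counter


def diagonal_scores(seq: str, min_offset: int = 2) -> Counter:
    """Count identical residue pairs seq[i] == seq[i + d] for each offset d.

    Each offset d corresponds to one diagonal of the self-comparison matrix.
    This is an ungapped identity count, a simplification assumed here in
    place of the suboptimal self-alignment traces named in the abstract.
    """
    scores = Counter()
    n = len(seq)
    for d in range(min_offset, n // 2 + 1):
        scores[d] = sum(1 for i in range(n - d) if seq[i] == seq[i + d])
    return scores


def estimate_repeat_length(seq: str, min_offset: int = 2) -> int:
    """Return the smallest offset whose diagonal has the highest match count."""
    scores = diagonal_scores(seq, min_offset)
    if not scores:
        raise ValueError("sequence too short to contain a repeat")
    best = max(scores.values())
    return min(d for d, s in scores.items() if s == best)


if __name__ == "__main__":
    # Toy sequence: four copies of a made-up 7-residue unit.
    toy = "ACDEFGH" * 4
    print(estimate_repeat_length(toy))
```

An ungapped identity count like this would miss the gapped approximate repeats that the abstract says the method handles; it is meant only to show why a periodic sequence produces high-scoring diagonals spaced one repeat length apart.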