A census of protein repeats

Citation
E. M. Marcotte et al., A census of protein repeats, J MOL BIOL, 293(1), 1999, pp. 151-160
Number of citations
29
Subject Categories
Molecular Biology & Genetics
Journal title
JOURNAL OF MOLECULAR BIOLOGY
Journal ISSN
0022-2836
Volume
293
Issue
1
Year of publication
1999
Pages
151 - 160
Database
ISI
SICI code
0022-2836(19991015)293:1<151:ACOPR>2.0.ZU;2-G
Abstract
In this study, we analyzed all known protein sequences for repeating amino acid segments. Although duplicated sequence segments occur in 14% of all proteins, eukaryotic proteins are three times more likely to have internal repeats than prokaryotic proteins. After clustering the repetitive sequence segments into families, we find repeats from eukaryotic proteins have little similarity with prokaryotic repeats, suggesting most repeats arose after the prokaryotic and eukaryotic lineages diverged. Consequently, protein classes with the highest incidence of repetitive sequences perform functions unique to eukaryotes. The frequency distribution of the repeating units shows only weak length dependence, implicating recombination rather than duplex melting or DNA hairpin formation as the limiting mechanism underlying repeat formation. The mechanism favors additional repeats once an initial duplication has been incorporated. Finally, we show that repetitive sequences are favored that contain small and relatively water-soluble residues. We propose that error-prone repeat expansion allows repetitive proteins to evolve more quickly than non-repeat-containing proteins. (C) 1998 Academic Press.