In this study, we analyzed all known protein sequences for repeating amino
acid segments. Although duplicated sequence segments occur in 14% of all
proteins, eukaryotic proteins are three times more likely to have internal
repeats than prokaryotic proteins. After clustering the repetitive sequence
segments into families, we find that repeats from eukaryotic proteins have
little similarity to prokaryotic repeats, suggesting that most repeats
arose after the prokaryotic and eukaryotic lineages diverged. Consequently,
protein classes with the highest incidence of repetitive sequences perform
functions unique to eukaryotes. The frequency distribution of the repeating
units shows only weak length dependence, implicating recombination rather
than duplex melting or DNA hairpin formation as the limiting mechanism
underlying repeat formation. The mechanism favors additional repeats once an
initial duplication has been incorporated. Finally, we show that repetitive
sequences containing small and relatively water-soluble residues are
favored. We propose that error-prone repeat expansion allows repetitive
proteins to evolve more quickly than non-repeat-containing proteins.
© 1998 Academic Press.