Amino acid sequences of very non-random composition ('low-complexity'
segments) are abundant in natural proteins. According to recent
statistical analyses of protein sequence databases, approximately 15%
of the residues occur in segments of extreme compositional bias, and
approximately 34% of proteins have at least one such interspersed
segment. Sequences of many elongated non-globular domains also have
non-random compositional bias, and these regions increase the
proportion of residues in statistically deviant segments to
approximately 25% of the database. In contrast, less than 1% of
residues in known ordered crystal structures
are in segments of reduced complexity. Increasingly, low-complexity
segments have been implicated in crucial biological functions, as
shown by genetic engineering and mutagenesis experiments, by sequence
variation associated with human disease and by the locations of
autoimmune epitopes. However, relatively little is known about their
range of possible molecular structures, dynamics and interactions.