We used a nonredundant set of 621 protein-protein interfaces of known
high-resolution structure to derive residue composition and residue-residue
contact preferences. The residue compositions at the interfaces, in entire
proteins, and in whole genomes correlate well, indicating the statistical
strength of the data set. Differences between amino acid distributions were
observed for interfaces with buried surface area of less than 1,000 Å²
versus interfaces with area of more than 5,000 Å². Hydrophobic residues
were abundant in large interfaces, while polar residues were more abundant
in small interfaces. The largest residue-residue preferences at the
interface were recorded for interactions between pairs of large hydrophobic
residues, such as Trp and Leu, and the smallest preferences for pairs of
small residues, such as Gly and Ala. On average, contacts between pairs of
hydrophobic and polar residues were unfavorable, and the charged residues
tended to pair subject to charge complementarity, in agreement with previous
reports. A bootstrap procedure, absent from previous studies, was used for
error estimation. It showed that the statistical errors in the set of
pairing preferences are generally small; the average standard error is
approximately 0.2, i.e., about 8% of the average value of the pairwise index
(2.9). However, for a few pairs (e.g., Ser-Ser and Glu-Asp) the standard
error is larger in magnitude than the pairing index, which makes it
impossible to tell whether contact formation is favorable or unfavorable.
The results are
interpreted in terms of physicochemical factors, and their implications for
the energetics of complex formation and for protein docking are discussed.
© 2001 Wiley-Liss, Inc.
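
The bootstrap error estimate described above can be illustrated with a minimal sketch. The abstract does not give the formula for the pairing index, so the log-odds definition below (observed pair frequency over the frequency expected from residue composition) is an assumption made for illustration only, as are the function names `pair_index` and `bootstrap_se`, the choice to resample whole interfaces, and the number of bootstrap replicates.

```python
import math
import random
import statistics
from collections import Counter


def pair_index(contacts):
    """Assumed index: log(observed / expected) frequency per unordered residue pair.

    `contacts` is a list of (residue, residue) tuples. The expected frequency
    is derived from the residue composition of the contacts themselves.
    """
    pair_counts = Counter(tuple(sorted(c)) for c in contacts)
    residue_counts = Counter(r for c in contacts for r in c)
    n_pairs = len(contacts)
    n_res = 2 * n_pairs
    index = {}
    for (a, b), n_ab in pair_counts.items():
        fa = residue_counts[a] / n_res
        fb = residue_counts[b] / n_res
        # Unordered pairs of two different residues can arise in two ways.
        expected = fa * fb if a == b else 2 * fa * fb
        index[(a, b)] = math.log((n_ab / n_pairs) / expected)
    return index


def bootstrap_se(interfaces, n_boot=200, seed=0):
    """Return {pair: (index, standard error)}.

    `interfaces` is a list of interfaces, each a list of contacts. Whole
    interfaces are resampled with replacement (an assumption), and the
    standard error is the standard deviation of the index over replicates.
    """
    rng = random.Random(seed)
    samples = {}
    for _ in range(n_boot):
        resampled = [rng.choice(interfaces) for _ in interfaces]
        contacts = [c for iface in resampled for c in iface]
        for pair, value in pair_index(contacts).items():
            samples.setdefault(pair, []).append(value)
    full = pair_index([c for iface in interfaces for c in iface])
    result = {}
    for pair, value in full.items():
        values = samples.get(pair, [])
        # A pair seen in fewer than two replicates has no usable error estimate.
        se = statistics.stdev(values) if len(values) > 1 else float("nan")
        result[pair] = (value, se)
    return result
```

Under this sketch, a pair whose standard error exceeds the magnitude of its index (the situation the abstract reports for Ser-Ser and Glu-Asp) could be flagged with a check such as `se > abs(value)`, meaning the sign of the preference is not resolved by the data.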