DIVERSE INCIDENCES OF INDIVIDUAL OLIGOPEPTIDES (DIPEPTIDIC TO HEXAPEPTIDIC) IN PROTEINS OF HUMAN, BAKERS-YEAST, AND ESCHERICHIA-COLI ORIGIN REGISTERED IN THE SWISS-PROT DATA-BASE
H. Doi et al., DIVERSE INCIDENCES OF INDIVIDUAL OLIGOPEPTIDES (DIPEPTIDIC TO HEXAPEPTIDIC) IN PROTEINS OF HUMAN, BAKERS-YEAST, AND ESCHERICHIA-COLI ORIGIN REGISTERED IN THE SWISS-PROT DATA-BASE, Proceedings of the National Academy of Sciences of the United States of America, 92(7), 1995, pp. 2879-2883
Oligopeptidic permutations of the 20 amino acid residues give rise to
proteins of diverse functions. Our long-term goal is to produce a
lexicon of oligopeptides, classifying them into at least five categories:
(i) ubiquitous, (ii) function specific, (iii) group specific,
(iv) species specific, and (v) nonexistent. To begin with, we report on
the varying frequencies of individual oligopeptides (dipeptidic to
hexapeptidic in length) found among 2862 human proteins, 1942
Saccharomyces cerevisiae proteins, and 2672 Escherichia coli proteins
registered in the
Swiss-Prot data base (version 29.0, released in June 1994). At all
lengths (dipeptides to hexapeptides), homooligopeptides were very
prominent among the most frequently occurring varieties in proteins of
human
and bakers' yeast origins. However, this was not the case with
E. coli. While all of the expected 20^3 varieties of tripeptides were
found
among human proteins, three tripeptides (Cys-Cys-Trp, Trp-Trp-Cys, and
Trp-Trp-His) were missing from the bakers' yeast proteins. Three
tripeptides (Cys-Ile-Trp, Cys-Met-Tyr, and Cys-Trp-Trp) were also absent
from E. coli proteins. Inasmuch as the Swiss-Prot data base already
contained 67% of the expected total of 4000 E. coli proteins, it is
virtually certain that 96,000 varieties of hexapeptides containing at
least one or another of the three missing tripeptides noted above shall
be
nonexistent in E. coli. Furthermore, the observation of missing
tripeptides in the bakers' yeast proteins suggests that nonexistent
hexapeptides shall be highly phylum specific. Because of the sample
size, only a small fraction of the 20^6 varieties of hexapeptides were
expected to be encountered in the present survey. Indeed, only 1.2-1.5%
of the
possible hexapeptides were found, and the average copy number of
observed hexapeptides varied between 1.06 and 1.25. Nevertheless, 33
varieties of hexapeptides occurred in 102-169 copies among human proteins.
Furthermore, 15 of the 33 varieties contained such rarely used residues
as Tyr, His, Cys, and Trp.
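
The survey described in the abstract amounts to sliding a window of length 2 to 6 over each protein sequence and tallying what appears. The following is a minimal sketch of that idea, not the authors' actual procedure: the function names (count_oligopeptides, missing_oligopeptides, is_homooligopeptide) and the toy sequences are hypothetical, and real input would be sequences parsed from a Swiss-Prot release.

```python
from collections import Counter
from itertools import product

# One-letter codes of the 20 standard amino acid residues.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AMINO_SET = set(AMINO_ACIDS)


def count_oligopeptides(sequences, k):
    """Tally every overlapping window of k residues across all sequences.

    Windows containing non-standard symbols (for example X or B) are skipped.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            if all(residue in AMINO_SET for residue in window):
                counts[window] += 1
    return counts


def missing_oligopeptides(counts, k):
    """List the k-residue oligopeptides (out of 20**k) that never occur."""
    every = ("".join(p) for p in product(AMINO_ACIDS, repeat=k))
    return [pep for pep in every if pep not in counts]


def is_homooligopeptide(peptide):
    """True when the peptide is a run of one residue, such as 'AAAA'."""
    return len(set(peptide)) == 1


# Hypothetical toy input, only to show the calls; not real Swiss-Prot data.
toy_sequences = ["MKKKKAW", "AAAAAA"]
dipeptide_counts = count_oligopeptides(toy_sequences, 2)
absent_dipeptides = missing_oligopeptides(dipeptide_counts, 2)
top_dipeptide, top_count = dipeptide_counts.most_common(1)[0]
```

Enumerating all 20^k candidates is cheap for tripeptides (8000) but grows to 64 million for hexapeptides, so for k = 6 it is more practical to work only with the observed counts.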
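
The figure of 96,000 hexapeptides can be read as 3 missing tripeptides x 4 possible positions in a hexapeptide x 20^3 choices for the remaining residues, which counts a few hexapeptides twice. As a hedged cross-check, the sketch below counts hexapeptides that avoid all three tripeptides and subtracts from 20^6. The function name count_containing and the one-letter encodings (CIW, CMY, CWW for Cys-Ile-Trp, Cys-Met-Tyr, Cys-Trp-Trp) are introduced here for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# One-letter forms of the three tripeptides reported absent from E. coli.
E_COLI_MISSING = ["CIW", "CMY", "CWW"]


def count_containing(forbidden, length=6):
    """Count peptides of the given length that contain at least one
    forbidden tripeptide, via the complement (peptides that avoid them all).
    """
    forbidden = set(forbidden)
    # ways[pair]: number of forbidden-free prefixes ending in this residue pair.
    ways = {a + b: 1 for a, b in product(AMINO_ACIDS, repeat=2)}
    for _ in range(length - 2):
        extended = dict.fromkeys(ways, 0)
        for pair, n in ways.items():
            for residue in AMINO_ACIDS:
                # Extend only if the newest three residues are allowed.
                if pair + residue not in forbidden:
                    extended[pair[1] + residue] += n
        ways = extended
    avoiding = sum(ways.values())
    return len(AMINO_ACIDS) ** length - avoiding


upper_bound = len(E_COLI_MISSING) * 4 * 20 ** 3  # 3 x 4 x 8000 = 96,000
exact = count_containing(E_COLI_MISSING)
```

The exact count comes out marginally below the upper bound, because a hexapeptide such as Cys-Ile-Trp-Cys-Met-Tyr holds two of the missing tripeptides and should be counted only once.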