The SWISS-PROT protein sequence data bank contains at present nearly 75000
entries, almost two thirds of which include the potential N-glycosylation
consensus sequence, or sequon, NXS/T (where X can be any amino acid but
proline) and thus may be glycoproteins. The number of proteins filed as
glycoproteins is, however, considerably smaller, 7942, of which 749 have been
characterized with respect to the total number of their carbohydrate units and
sites of attachment of the latter to the protein, as well as the nature of
the carbohydrate-peptide linking group. Of these well-characterized
glycoproteins, about 90% carry either N-linked carbohydrate units alone or both N-
and O-linked ones, attached at 1297 N-glycosylation sites (1.9 per
glycoprotein molecule) and the rest are O-glycosylated only. Since the total number
of sequons in the well-characterized glycoproteins is 1968, their rate of
occupancy is about 2/3 (1297/1968). Assuming that the same number of N-linked units and rate
of sequon occupancy occur in all sequon-containing proteins and that the
proportion of solely O-glycosylated proteins (ca. 10%) will also be the same
as among the well-characterized ones, we conclude that the majority of
sequon-containing proteins will be found to be glycosylated and that more than
half of all proteins are glycoproteins. (C) 1999 Elsevier Science B.V. All
rights reserved.
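
The sequon definition and the occupancy figures quoted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function name `find_sequons`, the constant names and the toy peptide are hypothetical, and the count of N-glycosylated proteins is approximated as 90% of 749 because the abstract gives only the rounded percentage.

```python
import re

# Sequon as defined in the abstract: N, then any residue except proline,
# then S or T. The zero-width lookahead lets overlapping sequons
# (for example in "NNTS") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list[int]:
    """Return the 0-based start position of every NXS/T sequon (X != P)."""
    return [m.start() for m in SEQUON.finditer(sequence.upper())]


# Figures quoted in the abstract.
WELL_CHARACTERIZED = 749   # well-characterized glycoproteins
N_SITES = 1297             # occupied N-glycosylation sites
SEQUONS = 1968             # sequons in the well-characterized set
N_FRACTION = 0.90          # assumption: "about 90%" taken as exactly 0.90

# Fraction of sequons that actually carry a carbohydrate unit.
occupancy = N_SITES / SEQUONS

# Occupied sites per N-glycosylated protein (approximate denominator).
sites_per_n_glycoprotein = N_SITES / (N_FRACTION * WELL_CHARACTERIZED)

if __name__ == "__main__":
    # Hypothetical toy peptide, not a real SWISS-PROT entry.
    print(find_sequons("MNVSANPTNNTS"))
    print(f"sequon occupancy: {occupancy:.3f}")
    print(f"N-linked sites per N-glycoprotein: {sites_per_n_glycoprotein:.2f}")
```

Run on the toy peptide, the scan skips NPT because X is proline and reports both overlapping sequons NNT and NTS. The two ratios agree with the rounded values stated in the abstract, 2/3 and 1.9.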