The specificity of UDP-GalNAc:polypeptide
N-acetylgalactosaminyltransferase (GalNAc-transferase) is consistent
with the existence of an extended site composed of nine subsites,
denoted P-4, P-3, P-2, P-1, P-0, P-1', P-2', P-3', and P-4', where
the acceptor at P-0 is either Ser or Thr. To predict whether a peptide
will react with the enzyme to form a Ser- or Thr-conjugated
glycopeptide, a vector projection method is proposed that uses a
training set of amino acid sequences surrounding 90 Ser and 106 Thr
O-glycosylation sites extracted from the National Biomedical Research
Foundation Protein Database. The model postulates independent
interactions of the nine amino acid moieties with their respective
binding sites. The high ratio of correct predictions to total
predictions for the data in both the training and the testing sets
indicates that the method is self-consistent and efficient. It
provides a rapid means for predicting O-glycosylation and designing
effective inhibitors of GalNAc-transferase. (C) 1995 Wiley-Liss, Inc.
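
The abstract does not give the formula for the vector projection, so the following is only a minimal sketch of the independence assumption it states: each of the nine subsites contributes separately to the score of a candidate window. The additive log-odds score used here is a simple stand-in chosen for illustration, not the paper's vector projection method. The function names, the pseudocount, the uniform background, and the toy peptide windows are all hypothetical and are not taken from the paper or its database.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SITE_LENGTH = 9   # nine subsites, P-4 through P-4'
CENTER = 4        # index of the P-0 acceptor within the window


def subsite_frequencies(windows, pseudocount=1.0):
    """Estimate residue frequencies separately for each subsite.

    The pseudocount (an assumption of this sketch) keeps unseen
    residues from getting a frequency of exactly zero.
    """
    tables = []
    total = len(windows) + pseudocount * len(AMINO_ACIDS)
    for pos in range(SITE_LENGTH):
        counts = Counter(w[pos] for w in windows)
        tables.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return tables


def independent_score(window, tables):
    """Sum per-subsite log-odds against a uniform background.

    Summing over positions encodes the assumption that the nine
    residues interact independently with their binding subsites.
    """
    if len(window) != SITE_LENGTH or window[CENTER] not in "ST":
        raise ValueError("expected a 9-residue window with Ser or Thr at P-0")
    background = 1.0 / len(AMINO_ACIDS)
    return sum(
        math.log(tables[i][aa] / background) for i, aa in enumerate(window)
    )


# Hypothetical toy windows, invented for illustration (not real sites).
toy_training = ["PAPTSAPPT", "APPTTPAPS", "TPAPSPTAP", "PTAPTAPPA"]
tables = subsite_frequencies(toy_training)

score_similar = independent_score("PAPTSAPPA", tables)
score_dissimilar = independent_score("WWWWSWWWW", tables)
```

Under these assumptions, a window that resembles the training windows at each subsite scores higher than one that does not. A real application would need the 90 Ser and 106 Thr training windows and the projection rule described in the full paper.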