Je. Hansen et al., PREDICTION OF O-GLYCOSYLATION OF MAMMALIAN PROTEINS - SPECIFICITY PATTERNS OF UDP-GALNAC-POLYPEPTIDE N-ACETYLGALACTOSAMINYLTRANSFERASE, Biochemical Journal, 308, 1995, pp. 801-813
The specificity of the enzyme(s) catalysing the covalent link between
the hydroxyl side chains of serine or threonine and the sugar moiety
N-acetylgalactosamine (GalNAc) is unknown. Pattern recognition by
artificial neural networks and weight matrix algorithms was performed to
determine the exact position of in vivo O-linked GalNAc-glycosylated
serine and threonine residues from the primary sequence exclusively. The
acceptor sequence context for O-glycosylation of serine was found to
differ from that of threonine and the two types were therefore treated
separately. The context of the sites showed a high abundance of proline,
serine and threonine extending far beyond the previously reported region
covering positions -4 through +4 relative to the glycosylated residue.
The O-glycosylation sites were found to cluster and to have a high
abundance in the N-terminal part of the protein. The sites were also
found to have an increased preference for three different classes of
beta-turns. No simple consensus-like rule could be deduced for the
complex glycosylation sequence acceptor patterns. The neural networks
were trained on the hitherto largest data material consisting of 48
carefully examined mammalian glycoproteins comprising 264
O-glycosylation sites. For detection neural network algorithms were much
more reliable than weight matrices. The networks correctly found 60-95%
of the O-glycosylated serine/threonine residues and 88-97% of the
non-glycosylated residues in two independent test sets of known
glycoproteins. A computer server using E-mail for prediction of
O-glycosylation sites has been implemented and made publicly available.
The Internet address is NetOglyc@cbs.dtu.dk.
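
As an illustration of the weight matrix approach mentioned above, the following is a minimal Python sketch of a generic log-odds position weight matrix built from sequence windows around known sites. It is not the authors' implementation: the toy sequences, the uniform background frequencies, the pseudocount, the window half-width of 4 and all function names are assumptions made for this example only. It handles threonine sites; following the abstract, serine sites would get their own matrix.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # placeholder for positions beyond the sequence ends


def window(seq, pos, flank):
    """Return residues pos-flank .. pos+flank, padded with PAD at the ends."""
    padded = PAD * flank + seq + PAD * flank
    return padded[pos : pos + 2 * flank + 1]


def build_weight_matrix(windows, background, pseudocount=1.0):
    """Return one {residue: log-odds weight} dict per window position."""
    matrix = []
    for i in range(len(windows[0])):
        # Count observed residues in this column, ignoring padding.
        counts = Counter(w[i] for w in windows if w[i] != PAD)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log(freq / background[aa])
        matrix.append(column)
    return matrix


def score(matrix, win):
    """Sum the per-position weights; padding contributes 0."""
    return sum(col.get(aa, 0.0) for col, aa in zip(matrix, win))


def candidate_sites(seq, residue):
    """Indices of every occurrence of the given residue in seq."""
    return [i for i, aa in enumerate(seq) if aa == residue]


# Hypothetical toy data: (sequence, index of a glycosylated threonine).
TOY_SITES = [("APTTPSPTPA", 3), ("GSPTPTSAPG", 3), ("PPSTTPAPSP", 4)]
FLANK = 4  # assumed half-width, matching the -4..+4 region in the abstract
UNIFORM = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}  # assumed background

train = [window(s, p, FLANK) for s, p in TOY_SITES]
thr_matrix = build_weight_matrix(train, UNIFORM)

# Score every threonine in a hypothetical query sequence.
query = "MKTPPSTWWTWWA"
for pos in candidate_sites(query, "T"):
    print(pos, round(score(thr_matrix, window(query, pos, FLANK)), 2))
```

A higher score means the window resembles the training sites more closely; a cutoff would have to be chosen on real data. A matrix of this kind treats each position independently, which may be one reason the abstract reports neural networks as more reliable for detection.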