J.E. Hansen et al., NetOglyc: Prediction of Mucin-Type O-Glycosylation Sites Based on Sequence Context and Surface Accessibility, Glycoconjugate Journal, 15(2), 1998, pp. 115-130
The specificities of the UDP-GalNAc:polypeptide N-acetylgalactosaminyl
transferases, which link the carbohydrate GalNAc to the side chain of
certain serine and threonine residues in mucin-type glycoproteins, are
presently unknown. The specificity seems to be modulated by sequence
context, secondary structure and surface accessibility. The sequence
context of glycosylated threonines was found to differ from that of
glycosylated serines, and the sites were found to cluster. Non-clustered
sites had a sequence context different from that of clustered sites.
Charged residues were disfavoured at positions -1 and +3. A jury of
artificial neural networks was trained to recognize the sequence context
and surface accessibility of 299 known and verified mucin-type
O-glycosylation sites extracted from O-GLYCBASE. In independent test
sets, the cross-validated NetOglyc network system correctly identified
83% of the glycosylated and 90% of the non-glycosylated serine and
threonine residues, and thus proved more accurate than matrix statistics
and vector projection methods. Predictions of O-glycosylation sites in
the envelope glycoprotein gp120 from the primate lentiviruses HIV-1,
HIV-2 and SIV are presented. The most conserved O-glycosylation signals
in these evolutionarily related glycoproteins were found in their first
hypervariable loop, V1. However, the variation between HIV-1 gp120
strains was significant. A computer server, available through the WWW
or by e-mail, has been developed for predicting mucin-type
O-glycosylation sites in proteins from their amino acid sequence. The
server addresses are http://www.cbs.dtu.dk/services/NetOGlyc/ and
netOglyc@cbs.dtu.dk.
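The abstract describes the predictor only at a high level. As a rough illustration of how a window-based jury predictor of this kind can be put together, the sketch below extracts a symmetric window around each Ser/Thr, one-hot encodes it, averages the outputs of a small "jury" and computes the two accuracy figures the abstract reports (fraction of glycosylated and of non-glycosylated residues classified correctly). Everything here is an assumption for illustration, not NetOglyc itself: the window width, the averaging rule, the function names and the toy single-unit "networks". Their hand-set weights only encode the abstract's observation that charged residues at -1 and +3 are disfavoured. Surface accessibility, which NetOglyc also used as input, is omitted.

```python
# Hypothetical sketch of a window-based jury predictor; NOT NetOglyc's
# actual architecture, inputs or trained weights.
import math
from typing import Callable, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # stands in for positions beyond either terminus (and unknown residues)
ALPHABET = AMINO_ACIDS + PAD

Network = Callable[[List[float]], float]


def st_windows(seq: str, half_width: int) -> List[Tuple[int, str]]:
    """Return (0-based position, window) for every Ser/Thr in seq."""
    padded = PAD * half_width + seq + PAD * half_width
    # seq[i] sits at padded[i + half_width], so its window is padded[i : i + 2*half_width + 1].
    return [(i, padded[i:i + 2 * half_width + 1])
            for i, residue in enumerate(seq) if residue in "ST"]


def one_hot(window: str) -> List[float]:
    """Sparse encoding: a 21-long indicator vector per window position, concatenated."""
    vec: List[float] = []
    for residue in window:
        slot = [0.0] * len(ALPHABET)
        slot[ALPHABET.index(residue if residue in ALPHABET else PAD)] = 1.0
        vec.extend(slot)
    return vec


def make_perceptron(weights: Sequence[float], bias: float) -> Network:
    """A single sigmoid unit used as a stand-in for a trained network (hypothetical)."""
    def net(x: List[float]) -> float:
        z = bias + sum(w * xi for w, xi in zip(weights, x))
        return 1.0 / (1.0 + math.exp(-z))
    return net


def charge_penalty_weights(half_width: int, penalty: float) -> List[float]:
    """Toy weights that penalise charged residues (D, E, K, R) at offsets -1 and +3."""
    weights = [0.0] * ((2 * half_width + 1) * len(ALPHABET))
    for offset in (-1, 3):
        slot = half_width + offset  # index of this offset within the window
        for residue in "DEKR":
            weights[slot * len(ALPHABET) + ALPHABET.index(residue)] = penalty
    return weights


def jury_score(jury: Sequence[Network], x: List[float]) -> float:
    """Combine the jury by averaging member outputs (an assumed combination rule)."""
    return sum(net(x) for net in jury) / len(jury)


def predict_sites(seq: str, jury: Sequence[Network], half_width: int,
                  threshold: float = 0.5) -> List[Tuple[int, str, float, bool]]:
    """Score every Ser/Thr window; call a site when the jury average exceeds threshold."""
    results = []
    for pos, window in st_windows(seq, half_width):
        score = jury_score(jury, one_hot(window))
        results.append((pos, window, score, score > threshold))
    return results


def sensitivity_specificity(predicted: Sequence[bool],
                            actual: Sequence[bool]) -> Tuple[float, float]:
    """Fraction of glycosylated sites found, and of non-glycosylated sites rejected."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    sens = tp / (tp + fn) if tp + fn else math.nan  # undefined without positives
    spec = tn / (tn + fp) if tn + fp else math.nan  # undefined without negatives
    return sens, spec


if __name__ == "__main__":
    hw = 3  # 7-residue window; must be >= 3 so that offset +3 fits
    jury = [make_perceptron(charge_penalty_weights(hw, -2.0), 1.0),
            make_perceptron(charge_penalty_weights(hw, -1.0), 0.5)]
    for pos, window, score, is_site in predict_sites("MTPSTPAKSGE", jury, hw):
        print(pos + 1, window, round(score, 3), is_site)
```

Running the module prints, for each Ser/Thr of the made-up sequence, its 1-based position, window, jury score and call. With these toy weights, the two residues with a lysine at -1 or +3 (positions 5 and 9) fall below the 0.5 threshold. A real predictor would learn its weights from verified sites and would add further inputs such as surface accessibility.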