NETOGLYC - PREDICTION OF MUCIN-TYPE O-GLYCOSYLATION SITES BASED ON SEQUENCE CONTEXT AND SURFACE ACCESSIBILITY

Citation
Je. Hansen et al., NETOGLYC - PREDICTION OF MUCIN-TYPE O-GLYCOSYLATION SITES BASED ON SEQUENCE CONTEXT AND SURFACE ACCESSIBILITY, Glycoconjugate journal, 15(2), 1998, pp. 115-130
Number of citations
136
Subject categories
Biology
Journal title
Glycoconjugate Journal
Journal ISSN
0282-0080
Volume
15
Issue
2
Year of publication
1998
Pages
115 - 130
Database
ISI
SICI code
0282-0080(1998)15:2<115:N-POMO>2.0.ZU;2-Q
Abstract
The specificities of the UDP-GalNAc:polypeptide N-acetylgalactosaminyl transferases which link the carbohydrate GalNAc to the side-chain of certain serine and threonine residues in mucin type glycoproteins, are presently unknown. The specificity seems to be modulated by sequence context, secondary structure and surface accessibility. The sequence context of glycosylated threonines was found to differ from that of serine, and the sites were found to cluster. Non-clustered sites had a sequence context different from that of clustered sites. Charged residues were disfavoured at position -1 and +3. A jury of artificial neural networks was trained to recognize the sequence context and surface accessibility of 299 known and verified mucin type O-glycosylation sites extracted from O-GLYCBASE. The cross-validated NetOglyc network system correctly found 83% of the glycosylated and 90% of the non-glycosylated serine and threonine residues in independent test sets, thus proving more accurate than matrix statistics and vector projection methods. Predictions of O-glycosylation sites in the envelope glycoprotein gp120 from the primate lentiviruses HIV-1, HIV-2 and SIV are presented. The most conserved O-glycosylation signals in these evolutionary-related glycoproteins were found in their first hypervariable loop, V1. However, the strain variation for HIV-1 gp120 was significant. A computer server, available through WWW or E-mail, has been developed for prediction of mucin type O-glycosylation sites in proteins based on the amino acid sequence. The server addresses are http://www.cbs.dtu.dk/services/NetOGlyc/ and netOglyc@cbs.dtu.dk.
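
Illustrative sketch
The notion of "sequence context" used in the abstract can be made concrete with a small Python sketch. It is not the NetOglyc method and involves no neural network: it only lists the serine and threonine residues of a sequence with their flanking windows, and reports whether a charged residue sits at position -1 or +3, the two positions the abstract describes as disfavoured. The function names, the window size of four residues, the example peptide and the choice of D, E, K and R as the charged residues are all assumptions made for illustration.

```python
# Assumption: Asp, Glu, Lys and Arg count as charged; His is left out.
CHARGED = set("DEKR")


def candidate_sites(sequence, flank=4):
    """Yield (1-based position, residue, window) for every Ser/Thr.

    The window holds `flank` residues on each side of the site;
    positions beyond the sequence ends are padded with '-'.
    """
    padded = "-" * flank + sequence + "-" * flank
    for i, residue in enumerate(sequence):
        if residue in "ST":
            yield i + 1, residue, padded[i:i + 2 * flank + 1]


def charged_at_disfavoured_positions(sequence, position):
    """Return the offsets among (-1, +3) that hold a charged residue.

    `position` is the 1-based index of the Ser/Thr site.
    """
    hits = []
    for offset in (-1, 3):
        j = position - 1 + offset
        if 0 <= j < len(sequence) and sequence[j] in CHARGED:
            hits.append(offset)
    return hits


if __name__ == "__main__":
    peptide = "APDTRPAPGSTAPPA"  # hypothetical example peptide
    for pos, res, window in candidate_sites(peptide):
        flags = charged_at_disfavoured_positions(peptide, pos)
        print(pos, res, window, flags)
```

A window extracted this way is the kind of input a sequence-based predictor could encode; the charged-residue check is only a crude filter and says nothing about surface accessibility or site clustering.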