GlcNAcylation is a modification in which an N-acetylglucosamine (GlcNAc) group is attached to a hydroxyl-bearing amino acid residue (such as serine or threonine) of a protein or other biomolecule. CD BioGlyco has many years of experience in Glycoinformatics-assisted Structural and Functional Prediction Services, offering professional prediction of GPI-anchored Sites, Mannosylation Sites, Mucin-type Glycosylation Sites, and GlcNAcylation sites.
We predict GlcNAcylation sites by analyzing conserved motifs and specific amino acid patterns in protein sequences. Using glycoinformatics techniques, we compare the sequences around known GlcNAcylation sites with non-site sequences to identify the features that distinguish them.
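As an illustration of how site and non-site sequences can be put on a common footing for comparison, the sketch below cuts a fixed-length window around every candidate residue. It assumes that candidates are serine (S) and threonine (T) residues and that positions beyond the sequence ends are padded with `-`; the function name `extract_windows` and the `flank` parameter are hypothetical and not part of any CD BioGlyco tool.

```python
from typing import List, Tuple


def extract_windows(
    sequence: str, flank: int = 5, residues: str = "ST", pad: str = "-"
) -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every candidate residue.

    Each window has length 2 * flank + 1 and is centered on the candidate.
    """
    seq = sequence.upper()
    # Pad both ends so residues near the termini still get full-length windows.
    padded = pad * flank + seq + pad * flank
    windows = []
    for i, aa in enumerate(seq):
        if aa in residues:
            # Residue i of the original sequence sits at index i + flank in
            # the padded string, so the window starts at index i.
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows


if __name__ == "__main__":
    # Toy sequence for illustration only, not a real protein.
    for position, window in extract_windows("MSTPAVSK", flank=2):
        print(position, window)
```

Windows centered on experimentally confirmed sites would form the positive set, and windows around the remaining S/T residues would form the negative set.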
Our experts extract biophysical and chemical features from the protein sequence. These features include amino acid composition, secondary structure, solvent accessibility, and hydrophobicity. Factors such as protein folding speed, stability, and interactions with other molecules are also taken into account when constructing predictive models.
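A minimal sketch of sequence-derived features follows. It covers only two of the features named above, amino acid composition and hydrophobicity (using the Kyte-Doolittle hydropathy scale); secondary structure and solvent accessibility normally come from separate prediction tools and are not shown. The function names are hypothetical.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standard_residues(window: str) -> List[str]:
    """Keep only the 20 standard residues (drops padding such as '-')."""
    return [aa for aa in window.upper() if aa in KYTE_DOOLITTLE]


def composition(window: str) -> List[float]:
    """Fraction of each standard amino acid in the window (20 values)."""
    residues = _standard_residues(window)
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def mean_hydrophobicity(window: str) -> float:
    """Average Kyte-Doolittle hydropathy over the standard residues."""
    residues = _standard_residues(window)
    if not residues:
        return 0.0
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def featurize(window: str) -> List[float]:
    """Concatenate composition (20 values) and mean hydrophobicity (1 value)."""
    return composition(window) + [mean_hydrophobicity(window)]


if __name__ == "__main__":
    print(featurize("-MSTP"))
```

Each sequence window becomes a fixed-length numeric vector, which is the form most machine learning algorithms expect as input.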
Subsequently, machine learning algorithms are used to build prediction models that are trained on both site (positive) and non-site (negative) datasets so that they can accurately predict new sites. This approach enables fast and accurate discovery of GlcNAcylation sites on proteins, which provides an important reference for further research on protein function and related diseases. Meanwhile, by continually expanding and updating the dataset and optimizing the machine learning algorithms, we keep improving the accuracy and stability of the prediction model.
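The training step can be sketched with a small classifier. The example below uses plain logistic regression fitted by gradient descent, chosen only because it fits in a few lines; it is not the algorithm used in the service. The two-dimensional feature vectors and labels are made-up toy values (1 = site, 0 = non-site), and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Probability that feature vector x belongs to the 'site' class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(
    X: List[Sequence[float]], y: List[int], lr: float = 0.5, epochs: int = 500
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Average the gradients over the training set before updating.
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


# Toy training data for illustration only (not real measurements).
X_train = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.7]]
y_train = [1, 1, 0, 0]
weights, bias = train_logistic(X_train, y_train)

if __name__ == "__main__":
    print(predict_proba([0.85, 0.15], weights, bias))
    print(predict_proba([0.15, 0.80], weights, bias))
```

In practice the inputs would be feature vectors derived from real site and non-site windows, and retraining on an updated dataset reuses the same procedure.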
Finally, to enhance prediction accuracy, we combine multiple prediction methods according to project needs and use experimental validation to confirm the predictions.
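One simple way to combine several predictors is a weighted average of their per-site scores. The sketch below assumes each method outputs a score between 0 and 1 for the same candidate site; the method names, weights, and the function `combine_scores` are hypothetical, and other integration schemes are equally possible.

```python
from typing import Dict, Optional


def combine_scores(
    scores: Dict[str, float], weights: Optional[Dict[str, float]] = None
) -> float:
    """Weighted average of per-method scores for one candidate site."""
    if not scores:
        raise ValueError("at least one score is required")
    if weights is None:
        # Default to an unweighted mean.
        weights = {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


if __name__ == "__main__":
    # Hypothetical scores from two prediction methods for a single site.
    print(combine_scores({"motif_model": 0.8, "feature_model": 0.6}))
```

Sites with a high combined score would then be natural first candidates for experimental validation.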
Technology: Machine learning
Journal: BMC Bioinformatics
IF: 3.242
Published: 2015
Results: This article describes a two-layer machine learning approach for identifying O-GlcNAcylation sites and O-GlcNAc transferase (OGT) substrate motifs in proteins. The researchers manually extracted 410 experimentally confirmed O-GlcNAcylation sites from dbOGAP, OGlycBase, and UniProtKB and detected conserved motifs using maximal dependence decomposition (MDD). A first-layer model was then trained for each identified OGT substrate motif using a profile hidden Markov model (profile HMM). Next, a second-layer model was built with a support vector machine (SVM) that takes the output values of the first-layer profile HMMs as input. This two-layer predictive model was evaluated by five-fold cross-validation, yielding a sensitivity of 85.4%, a specificity of 84.1%, and an accuracy of 84.7%. In addition, an independent test set from PhosphoSitePlus showed that the method achieves promising accuracy (84.05%) and outperforms other O-GlcNAcylation site prediction tools.
Fig.1 Proposed schematic for building a dual-layered predictive model using substrate motifs identified from MDD. (Kao, et al., 2015)
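For readers unfamiliar with the evaluation metrics quoted above, the sketch below shows how sensitivity, specificity, and accuracy are computed from true and predicted labels, and how samples can be split into five folds for cross-validation. It does not reproduce the published profile HMM and SVM models; the function names and the round-robin fold assignment are assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Tuple


def confusion_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy for binary labels (1 = site)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # fraction of true sites recovered
        "specificity": tn / (tn + fp),  # fraction of non-sites rejected
        "accuracy": (tp + tn) / len(pairs),
    }


def k_fold_indices(n: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k round-robin folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


if __name__ == "__main__":
    # Toy labels for illustration only.
    print(confusion_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))
    for train_idx, test_idx in k_fold_indices(10, k=5):
        print(test_idx)
```

In a five-fold evaluation, the model is trained on four folds and scored on the held-out fold, and the metrics are then averaged over the five runs.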
CD BioGlyco is the top choice for glycoinformatics-assisted GlcNAcylation site prediction services. We offer comprehensive support to our clients, from sequence analysis to machine learning. If you are interested in our services, please don't hesitate to contact us with your needs and specifications.
Reference