Glycosylation site prediction uses bioinformatics methods and computational models to predict where glycosylation is likely to occur in a protein sequence. These predictions are usually based on known glycosylation patterns and protein sequence features. CD BioGlyco has refined its glycoinformatics tools over many years and provides clients with Glycoinformatics-assisted Analysis/Prediction Services. Our comprehensive Glycoinformatics-assisted Structural and Functional Prediction Service supports a wide range of glycobiology research. Our glycoinformatics experts use machine learning methods to help you predict N-, O-, and C-glycosylation sites.
Our experts use a two-step feature selection strategy to extract the most relevant features. In the first step, they use mutual information and correlation coefficients to measure how strongly each feature is associated with glycosylation sites, and keep the most relevant features. In the second step, they apply a recursive feature elimination algorithm to filter the remaining features further and improve the predictive performance of the model.
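As a rough illustration of the first, filter-style step only, the sketch below ranks features by mutual information with the site label and also reports a Pearson correlation. It is not CD BioGlyco's actual pipeline: the feature names (`has_sequon`, `hydrophobic_next`, `constant`), the toy data, and the function names are all assumptions made for the example, and the recursive feature elimination step is not shown.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    p_feature = Counter(feature)
    p_label = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((p_feature[x] / n) * (p_label[y] / n)))
    return mi


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, labels, top_k):
    """Return the names of the top_k features by mutual information."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical toy data: 1 = glycosylated site, 0 = non-site.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
columns = {
    "has_sequon": [1, 1, 1, 0, 0, 0, 0, 0],        # perfectly informative
    "hydrophobic_next": [1, 0, 1, 0, 1, 0, 1, 0],  # weakly informative
    "constant": [1, 1, 1, 1, 1, 1, 1, 1],          # carries no information
}
selected = rank_features(columns, labels, top_k=2)
```

In this toy case the constant feature is dropped because it shares no information with the labels. A wrapper method such as recursive feature elimination would then be run on the surviving features.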
After feature selection, we use a random forest algorithm to construct the prediction model. Features derived from protein sequences serve as the input, and known glycosylation sites serve as the training labels. We evaluate the model by cross-validation on the training set and assess its predictive accuracy using receiver operating characteristic (ROC) curves and area under the ROC curve (AUC) values.
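The evaluation described above can be sketched in a few lines. The example below is a minimal illustration, not the service's implementation: it does not build a random forest, and instead takes a hypothetical `fit` callable that trains any classifier and returns a scoring function. It shows how out-of-fold scores from k-fold cross-validation can be pooled and summarised as an AUC, computed here as the fraction of (positive, negative) pairs that the model ranks correctly.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k interleaved folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    for fold in range(k):
        test = indices[fold::k]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


def cross_validated_auc(features, labels, fit, k=5):
    """Pool out-of-fold scores from k-fold cross-validation and return their AUC.

    `fit(train_features, train_labels)` is a hypothetical callable that must
    return a function mapping one feature row to a score.
    """
    scores = [0.0] * len(labels)
    for train, test in k_fold_indices(len(labels), k):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        for i in test:
            scores[i] = model(features[i])
    return roc_auc(labels, scores)
```

Because every sample is scored by a model that never saw it during training, the pooled AUC is a less optimistic estimate than one computed on the training data itself.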
After evaluating the predictive accuracy of the model, we set the specificity level to 99.0% and screen the complete human proteome for potential glycosylated proteins and their corresponding glycosylation sites. We then provide you with a complete list of the predicted glycosylated proteins and their glycosylation sites.
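To make the specificity setting concrete, the sketch below shows one way a score cutoff could be chosen so that at least 99.0% of known non-sites fall at or below it, and how that cutoff could then be applied to candidate sites. It is an assumption-laden illustration: the function names, the `(protein, position, score)` tuple layout, and the rule that a site is called only when its score is strictly above the cutoff are all choices made for this example.

```python
import math


def threshold_for_specificity(negative_scores, specificity=0.99):
    """Return a cutoff such that at least `specificity` of negatives score <= cutoff."""
    if not negative_scores:
        raise ValueError("need at least one negative score")
    ordered = sorted(negative_scores)
    # Number of negatives that must be classified as negative.
    # The small epsilon guards against floating-point noise before rounding up.
    needed = max(1, math.ceil(specificity * len(ordered) - 1e-9))
    return ordered[needed - 1]


def call_sites(candidates, threshold):
    """Keep (protein, position, score) candidates scoring strictly above the cutoff."""
    return [c for c in candidates if c[2] > threshold]
```

With 100 validation non-sites, this rule places the cutoff at the 99th lowest score, so at most one non-site would be called as a glycosylation site.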
Technology: Machine learning
Journal: PLOS ONE
IF: 2.9
Published: 2017
Results: This article presents a computational method for predicting N-linked glycosylation sites. It describes the importance and characteristics of N-linked glycosylation in proteins and points out the problems of existing models. To address these problems, the authors propose a method based on machine learning and comprehensive feature extraction techniques. The method is trained on a dataset from the UniProt database, from which features related to post-translational modification sites are extracted. A neural network is then trained using backpropagation, and the model is validated using a variety of quantitative metrics. The article also describes how statistical moments are calculated and how a one-dimensional protein sequence is transformed into a two-dimensional matrix for calculation.
Fig.1 The verification of the predictive model. (Akmal, et al., 2017)
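The idea of turning a one-dimensional sequence into a two-dimensional matrix and computing statistical moments from it, mentioned in the article summary above, can be sketched as follows. This is a minimal illustration and not the authors' code: the integer encoding of residues (`CODE`), the zero padding, and the row-major fill order are assumptions made for the example.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed encoding: each residue maps to its 1-based position in AMINO_ACIDS.
CODE = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def sequence_to_matrix(seq):
    """Fill the smallest square matrix that holds the sequence, row by row."""
    if not seq:
        raise ValueError("sequence must not be empty")
    side = math.isqrt(len(seq) - 1) + 1  # smallest side with side*side >= len(seq)
    values = [CODE.get(aa, 0) for aa in seq]  # unknown residues are encoded as 0
    values += [0] * (side * side - len(values))  # zero padding
    return [values[r * side:(r + 1) * side] for r in range(side)]


def raw_moment(matrix, i, j):
    """Raw moment M_ij = sum over cells of (row^i) * (col^j) * value."""
    return sum((x ** i) * (y ** j) * v
               for x, row in enumerate(matrix)
               for y, v in enumerate(row))


def central_moment(matrix, i, j):
    """Central moment of order (i, j), taken about the matrix centroid."""
    m00 = raw_moment(matrix, 0, 0)
    if m00 == 0:
        raise ValueError("matrix has zero total mass")
    x_c = raw_moment(matrix, 1, 0) / m00
    y_c = raw_moment(matrix, 0, 1) / m00
    return sum(((x - x_c) ** i) * ((y - y_c) ** j) * v
               for x, row in enumerate(matrix)
               for y, v in enumerate(row))


matrix = sequence_to_matrix("NGTAC")
```

Moments of several orders computed this way give a fixed-length numeric description of a variable-length sequence, which is the kind of input a neural network or other classifier requires.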
CD BioGlyco has a range of tools for predicting the structure and function of glycans, and offers a comprehensive glycoinformatics-based N-, C-, and O-linked glycosylation site prediction service. If you need further details, please feel free to contact us.
References