At CD BioGlyco, our machine learning (ML)-assisted glycoinformatics model development service applies advanced ML techniques to the complexities of glycoscience data analysis. The service provides robust and efficient solutions for analyzing and interpreting glycan-related information, supporting advances in glycoscience research.
Fig. 1 Flowchart of ML model development in the current research. (Dimitrov et al., 2020)
Random forest (RF) is particularly effective at recognizing patterns and relationships within glycan data. It performs classification and regression tasks by constructing multiple decision trees during training and combining their outputs. Support vector machines (SVMs) work by finding the best hyperplane for separating different classes of data points. They are effective for glycosylation site prediction and perform best when the classes, such as different glycan structures, are clearly distinguishable.
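To make the random forest idea concrete, here is a minimal pure-Python sketch of a bagged ensemble of one-split trees (decision stumps), each trained on a bootstrap sample with one randomly chosen feature and combined by majority vote. It is a toy stand-in for a full random forest, not the implementation used by the service. The function names, feature values, and class labels are hypothetical placeholders.

```python
import random
from collections import Counter


def fit_stump(X, y, feature):
    """Fit a one-split tree on a single feature by minimizing training errors."""
    majority = Counter(y).most_common(1)[0][0]
    # Fallback for a constant feature: always predict the majority label.
    best = (len(y) + 1, float("inf"), majority, majority)
    values = sorted({row[feature] for row in X})
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [lab for row, lab in zip(X, y) if row[feature] <= threshold]
        right = [lab for row, lab in zip(X, y) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(lab != left_label for lab in left)
        errors += sum(lab != right_label for lab in right)
        if errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label


def fit_forest(X, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample using one random feature."""
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        feature = rng.randrange(n_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Classify one sample by majority vote over all stumps."""
    votes = Counter(
        left if row[feature] <= threshold else right
        for feature, threshold, left, right in forest
    )
    return votes.most_common(1)[0][0]


# Hypothetical two-feature toy data (not real glycan measurements).
X = [[1.0, 0.10], [2.0, 0.20], [1.5, 0.15], [8.0, 0.90], [9.0, 0.80], [8.5, 0.95]]
y = ["class_a", "class_a", "class_a", "class_b", "class_b", "class_b"]
forest = fit_forest(X, y)
print(predict(forest, [1.2, 0.12]), predict(forest, [8.8, 0.85]))
```

In practice, an established library implementation with full-depth trees would normally replace this sketch. The toy version only shows why averaging many weak, differently sampled trees gives a stable vote.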
Maintain data consistency by normalizing and standardizing the glycan dataset, which enhances the accuracy and performance of ML models. Extract and identify the most important features from the glycan dataset to improve model training and performance.
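As an illustration of these two steps, the sketch below standardizes each feature to zero mean and unit variance (z-scores) and then ranks features with a simple filter score, the gap between the two class means. This is one of many possible feature-selection heuristics and is shown only as an assumption-laden example. The feature names, values, and labels are hypothetical.

```python
from statistics import mean, pstdev


def standardize(columns):
    """Z-score every feature column; constant columns become all zeros."""
    out = {}
    for name, values in columns.items():
        mu, sigma = mean(values), pstdev(values)
        out[name] = [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]
    return out


def rank_features(columns, labels):
    """Rank features by the absolute gap between the two class means."""
    scores = {}
    for name, values in columns.items():
        pos = [v for v, lab in zip(values, labels) if lab == 1]
        neg = [v for v, lab in zip(values, labels) if lab == 0]
        scores[name] = abs(mean(pos) - mean(neg))
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical feature table: one list per feature, one entry per glycan.
columns = {
    "mass_da": [1000.0, 1200.0, 2000.0, 2200.0],
    "sialic_acids": [0.0, 0.0, 2.0, 2.0],
    "batch_id": [1.0, 2.0, 1.0, 2.0],
}
labels = [0, 0, 1, 1]

z = standardize(columns)
ranking = rank_features(z, labels)
print(ranking)  # most informative feature first
```

Standardizing first matters here: without it, the raw mass values would dominate the score simply because they are numerically larger.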
ML models are trained using large glycan datasets, allowing them to recognize patterns and make predictions based on the data. These models undergo rigorous validation through techniques such as cross-validation and independent test sets to ensure their accuracy and reliability.
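The validation scheme described above can be sketched as follows: an independent test set is held out first, and k-fold cross-validation is then run on the remaining samples only. The function names, the 20% test fraction, and the choice of five folds are assumptions made for illustration.

```python
import random


def train_test_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out a fraction as an independent test set."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(n_samples * test_fraction))
    return idx[n_test:], idx[:n_test]


def k_fold(indices, k=5):
    """Yield (train, validation) index lists; each index is validated exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, validation


train_idx, test_idx = train_test_split(50)
for fold_train, fold_val in k_fold(train_idx, k=5):
    # A model would be fitted on fold_train and scored on fold_val here.
    print(len(fold_train), len(fold_val))
# test_idx is used only once, after model selection is finished.
```

Keeping the test indices out of the cross-validation loop is the design point: the folds guide model selection, while the held-out set gives a final estimate that the selection process has not influenced.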
ML models assist in interpreting mass spectrometry data by accurately identifying and characterizing glycan structures. For glycosylation site prediction, predictive models identify potential glycosylation sites on proteins, aiding the understanding of protein-glycan interactions and functions.
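As a small illustration of site prediction, the sketch below scans a protein sequence for the canonical N-linked glycosylation sequon N-X-S/T, where X is any residue except proline. This is only a rule-based candidate filter, shown under the assumption that an ML model would then score each candidate. It is not a predictive model in itself, and the function name and example sequence are hypothetical.

```python
import re

# N followed by any residue except P, then S or T.
# The lookahead keeps overlapping sequons from being skipped.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_n_glyco_sequons(protein_sequence):
    """Return 1-based positions of asparagines that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein_sequence.upper())]


# Hypothetical sequence: N-G-S and N-K-T qualify, N-P-T does not.
print(find_n_glyco_sequons("MNGSANPTNKTQ"))  # [2, 9]
```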
DOI: 10.3390/vaccines8040709
Technology: ML method
Journal: Vaccines
IF: 7.8
Published: 2020
Results: The article begins by introducing immunogenicity prediction and its importance in vaccine design. It then gives an overview of existing tools for immunogenicity prediction, such as VaxiJen, Vaxign, and VacSol. The authors highlight the limitations of these tools and propose their own approach as a potential solution. The approach transforms protein sequences into numerical descriptors called autocovariance and cross-covariance (ACC) descriptors. These descriptors capture the arrangement of amino acids along the protein sequence and serve as input features for the ML models. The authors experiment with several ML algorithms, including partial least squares-based discriminant analysis (PLS-DA), k-nearest neighbor (kNN), SVM, RF, random subspace method (RSM), and extreme gradient boosting (XGBoost). To evaluate their approach, the authors use a dataset of experimentally validated immunogenic and non-immunogenic bacterial proteins. They compare the predictive accuracy of their models with existing methods and report that their approach outperforms the other methods in sensitivity, specificity, and overall accuracy.
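The ACC transformation can be sketched as follows, using the commonly cited definition: for descriptors j and k and a given lag, the products of descriptor j at position i and descriptor k at position i + lag are averaged over the sequence. The result is a fixed-length feature vector regardless of protein length. The two-value descriptor table below is a made-up placeholder rather than the amino acid descriptors used in the article, and the function name is hypothetical.

```python
def acc_descriptors(sequence, table, max_lag):
    """Return {(j, k, lag): value} for all descriptor pairs and lags 1..max_lag.

    Pairs with j == k are auto-covariances; pairs with j != k are cross-covariances.
    """
    n = len(sequence)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    encoded = [table[aa] for aa in sequence]
    n_desc = len(encoded[0])
    acc = {}
    for lag in range(1, max_lag + 1):
        for j in range(n_desc):
            for k in range(n_desc):
                total = sum(
                    encoded[i][j] * encoded[i + lag][k] for i in range(n - lag)
                )
                acc[(j, k, lag)] = total / (n - lag)
    return acc


# Placeholder descriptor values for three residues (illustrative only).
toy_table = {"A": (1.0, 0.0), "G": (0.0, 1.0), "S": (-1.0, 0.5)}

features = acc_descriptors("AGSA", toy_table, max_lag=2)
print(len(features))  # 2 descriptors x 2 descriptors x 2 lags = 8 values
```

Because the output size depends only on the number of descriptors and the maximum lag, proteins of different lengths map to vectors of the same dimension, which is what classifiers such as SVM or RF require.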
By integrating ML techniques, CD BioGlyco provides a cutting-edge solution that enhances the capabilities of glycoinformatics research, enabling scientists to derive deeper insights and make more informed decisions in their studies. Please feel free to contact us for more information.