CD BioGlyco offers a DL-assisted glycoinformatics model development service designed to address the complex challenges of glycoscience with advanced deep learning (DL) techniques. The service comprises several key components that work together to build predictive models with greater accuracy and efficiency in tasks such as Glycosylation Site Prediction and Glycan-binding Specificity Analysis. By integrating DL methods, CD BioGlyco provides researchers with innovative tools to gain deeper insights and drive significant advances in glycoscience.
First, we ensure the accuracy and consistency of the collected data through a meticulous validation and cleaning process. Then, we annotate the dataset with relevant biological and chemical information to enhance its utility for training DL models.
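As a rough illustration of what validation and cleaning can involve, the minimal Python sketch below normalizes glycan strings, discards malformed or incomplete records, and merges duplicates. The record layout (`glycan`, `signal`), the bracket check, and the choice to average duplicate measurements are assumptions made for this example, not a description of the actual pipeline.

```python
import math
from collections import defaultdict
from statistics import mean


def brackets_balanced(sequence):
    """Return True if round and square brackets nest correctly."""
    pairs = {")": "(", "]": "["}
    stack = []
    for ch in sequence:
        if ch in "([":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack


def clean_records(records):
    """Validate, normalize, and de-duplicate (glycan, signal) records.

    Hypothetical record format: {"glycan": str, "signal": float}.
    """
    grouped = defaultdict(list)
    for rec in records:
        seq = (rec.get("glycan") or "").strip()
        signal = rec.get("signal")
        # Drop empty or structurally malformed glycan strings.
        if not seq or not brackets_balanced(seq):
            continue
        # Drop missing or non-numeric measurements.
        if not isinstance(signal, (int, float)) or math.isnan(signal):
            continue
        grouped[seq].append(float(signal))
    # Merge duplicate glycans by averaging their signals (an assumption).
    return [{"glycan": seq, "signal": mean(vals)} for seq, vals in grouped.items()]


raw = [
    {"glycan": " Gal(b1-4)GlcNAc ", "signal": 10.0},
    {"glycan": "Gal(b1-4)GlcNAc", "signal": 20.0},
    {"glycan": "Man(a1-3)[Man(a1-6)Man", "signal": 5.0},  # unbalanced bracket
    {"glycan": "Fuc(a1-2)Gal", "signal": None},           # missing measurement
]
cleaned = clean_records(raw)
```

Here only the two `Gal(b1-4)GlcNAc` records survive, and they are merged into a single entry. Annotation with biological or chemical metadata would be a further step applied to the cleaned records.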
We develop neural network architectures tailored to specific glycoinformatics tasks. We then train these networks on established datasets and optimize them to learn the complex patterns and relationships in glycan data.
We evaluate different neural network architectures, hyperparameters, and training methods to determine the most efficient configuration. We also run training sessions in parallel so that multiple model variants are trained simultaneously, which speeds up optimization and refinement.
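The idea of training several model variants side by side and keeping the best one can be sketched in a few lines. The example below is a dependency-free toy, assuming a one-hidden-layer network trained with plain stochastic gradient descent on made-up data. The function names (`train_mlp`, `search`), the grid values, and the data are all hypothetical, and a real project would more likely use a DL framework and run each variant in its own process or on its own GPU rather than in threads.

```python
import itertools
import math
import random
from concurrent.futures import ThreadPoolExecutor

# Toy data (hypothetical): two input features, target is their sum.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y = [0.0, 1.0, 1.0, 2.0]


def train_mlp(hidden, lr, epochs=300, seed=0):
    """Train a one-hidden-layer tanh network with SGD; return the final MSE."""
    rng = random.Random(seed)
    n_in = len(X[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(w1, b1)]
        return h, sum(w * hj for w, hj in zip(w2, h)) + b2

    for _ in range(epochs):
        for x, target in zip(X, Y):
            h, pred = forward(x)
            err = pred - target
            for j in range(hidden):
                # Backpropagate through tanh before updating the output weight.
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                b1[j] -= lr * grad_h
                for i in range(n_in):
                    w1[j][i] -= lr * grad_h * x[i]
            b2 -= lr * err
    return sum((forward(x)[1] - t) ** 2 for x, t in zip(X, Y)) / len(X)


def search(grid, max_workers=4):
    """Train every configuration in the grid concurrently; return the best."""
    configs = [dict(zip(grid, values))
               for values in itertools.product(*grid.values())]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(lambda cfg: train_mlp(**cfg), configs))
    best = min(range(len(configs)), key=scores.__getitem__)
    return configs[best], scores[best], list(zip(configs, scores))


best_config, best_mse, results = search({"hidden": [2, 4], "lr": [0.01, 0.05]})
```

Threads are used here only to keep the sketch simple and portable. Because this toy training loop is pure Python, threads will not give a real speed-up; the structure (enumerate configurations, train each independently, compare scores) is the point.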
We transform raw glycan data into numerical features suitable for neural network input. This involves encoding glycan sequences, capturing structural attributes, and integrating relevant physicochemical properties. We develop comprehensive data representations that encapsulate the essential characteristics of glycans, enabling the neural networks to learn and generalize from the input data effectively. Additionally, we use techniques such as principal component analysis (PCA) to reduce the dimensionality of the input data, which can improve the efficiency and accuracy of the neural network models.
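One simple way to turn a glycan string into numbers is to count its residues and linkages. The sketch below does this for IUPAC-condensed-style strings; the regular expressions, the function names, and the count-vector representation are assumptions chosen for illustration, and they ignore branching topology and physicochemical properties that a fuller representation would capture. It covers only the encoding step; a dimensionality reduction method such as PCA would be applied afterwards to the resulting feature matrix.

```python
import re
from collections import Counter

# Assumed notation: linkages appear in parentheses, e.g. "(b1-4)";
# residues are tokens such as "Gal", "GlcNAc", or "Neu5Ac".
LINKAGE = re.compile(r"\(([ab]\d-\d)\)")
RESIDUE = re.compile(r"[A-Z][A-Za-z0-9]*")


def tokenize(glycan):
    """Split a glycan string into lists of residue and linkage tokens."""
    linkages = LINKAGE.findall(glycan)
    residues = RESIDUE.findall(LINKAGE.sub(" ", glycan))
    return residues, linkages


def build_vocabulary(glycans):
    """Collect every residue and linkage token seen in the dataset."""
    vocab = set()
    for glycan in glycans:
        residues, linkages = tokenize(glycan)
        vocab.update(residues)
        vocab.update(linkages)
    return sorted(vocab)


def encode(glycan, vocabulary):
    """Return a fixed-length count vector aligned with the vocabulary."""
    residues, linkages = tokenize(glycan)
    counts = Counter(residues + linkages)
    return [counts[token] for token in vocabulary]


glycans = ["Gal(b1-4)GlcNAc", "Fuc(a1-2)Gal(b1-4)GlcNAc", "Neu5Ac(a2-3)Gal"]
vocabulary = build_vocabulary(glycans)
features = [encode(g, vocabulary) for g in glycans]
```

Every glycan is mapped to a vector of the same length, so the rows of `features` can be stacked into a matrix and passed to a model or to a dimensionality reduction step.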
DOI: 10.1039/d1sc05681f
Technology: GlyNet development, Consortium for Functional Glycomics (CFG) glycan array, SweetNet, CCARL
Journal: Chemical Science
IF: 8.4
Published: 2022
Results: This article describes the development of GlyNet, a multi-task neural network designed to predict protein-glycan interactions. The authors assessed GlyNet's performance by comparing its predictions with experimental data from the CFG glycan array. GlyNet's predicted top-20 lists contained, on average, 11 of the actual top-20 proteins, significantly outperforming random selection. The number of entries shared between the predicted and actual top-20 lists correlated only weakly with the mean squared error (MSE) of the glycan, indicating that GlyNet remains effective at predicting the top-20 glycans even for samples with poor MSE. Comparisons with other models showed that GlyNet outperformed SweetNet in MSE and training efficiency, highlighting the general utility of its multi-output design. GlyNet also surpassed CCARL, achieving a higher area under the ROC curve (AUC) for 20 proteins.
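The two kinds of measure mentioned in these results, MSE and the overlap between predicted and measured top-ranked lists, are straightforward to compute. The sketch below shows one plausible way to do so; it is not the authors' evaluation code, and the function names and sample values are invented for illustration.

```python
def mean_squared_error(predicted, actual):
    """Average squared difference between predicted and measured values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def top_k_overlap(predicted, actual, k=20):
    """Count items ranked in the top k by both predicted and measured values."""
    def top(values):
        order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
        return set(order[:k])
    return len(top(predicted) & top(actual))


# Hypothetical binding strengths for five items.
actual = [5.0, 1.0, 4.0, 2.0, 3.0]
predicted = [4.5, 2.5, 1.0, 2.0, 3.5]

mse = mean_squared_error(predicted, actual)
shared = top_k_overlap(predicted, actual, k=2)
```

The two measures capture different things: MSE penalizes every numerical deviation, while the overlap count only asks whether the strongest binders are ranked near the top. That is why a model can do well on the ranking measure even where its MSE is poor.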
Fig. 1 Schematic overview of the learned model described in this paper. (Carpenter et al., 2022)
CD BioGlyco’s DL-assisted glycoinformatics model development service offers a powerful and innovative approach to glycoinformatics, enabling researchers to achieve deeper insights and make significant advancements in the field of glycoscience. Please feel free to contact us for more information about our AI-assisted Glycoinformatics Development Service.
Reference