Data Analysis and Visualization
Overview of DNA-encoded Glycan Library (DEGL)
DEGL is an innovative tool that combines the principles of glycomics and DNA encoding to study complex carbohydrate structures and their interactions with proteins, cells, and other biomolecules. DEGL creates a comprehensive library by attaching unique DNA sequences to individual glycan molecules, which encodes information about the identity of each glycan. This enables high-throughput screening (HTS) and identification of glycan-binding proteins, thereby facilitating the discovery of new biological interactions and potential therapeutic targets. At CD BioGlyco, we provide DEGL Design, Construction, HTS, Next-generation Sequencing, Data Analysis and Visualization, and Hit Validation and Assessment Services. We help clients use the DEGL approach to systematically analyze and decode a large number of glycan structures present in nature, thereby accelerating drug development and glycan-related biological mechanism research.
Unravel Complexity, Discover Lead Compounds: Your Go-to Decoding Service
HTS of DEGL is an advanced approach in glycobiology for evaluating the biological activity or interactions of large numbers of glycan structures. Data analysis and visualization are key to transforming raw sequencing data into meaningful insights, helping to identify relevant glycan molecules and their functional associations. Data analysis techniques are used to sift through large data sets, and visualization tools turn the complex results into intuitive graphics that make trends easier to identify. At CD BioGlyco, our DEGL data analysis and visualization services are as follows:
Construct DEGL
Constructing a DEGL involves three key steps. First, a variety of glycan structures representing different monosaccharide combinations and linkages are synthesized or otherwise obtained. Second, each glycan is conjugated to a unique DNA barcode that serves as its identifier, typically using bioorthogonal chemistry so that the DNA is attached without interfering with the glycan's biological properties. Finally, the glycan-DNA conjugates are pooled to create a comprehensive library for HTS and glycan interaction analysis, helping to elucidate the roles of glycans in various biological processes and diseases.
HTS
The HTS service at CD BioGlyco involves the simultaneous evaluation of a large number of glycan-DNA conjugates to determine specific interactions and biological activities. The process typically begins with incubating the glycan-DNA conjugates with the target (e.g., a protein, cell, or other biomolecule). A washing step then removes unbound and non-specifically bound conjugates, while specifically bound conjugates are retained. The retained complexes are decoded by sequencing the DNA barcodes attached to the glycans, which reveals which glycans are present in the retained fraction. The sequencing data are then analyzed to assess binding affinity, specificity, and related properties. Multiple screening strategies are available.
Data Analysis and Visualization
Our experts process the raw sequencing data to decode the DNA tags coupled to glycans and determine which glycans interact with the target molecule. Computational tools and bioinformatics pipelines are used to filter, align, and quantify these interactions, converting the raw reads into meaningful binding profiles. Statistical analysis helps to discern important binding events from background noise, enabling the identification of high-affinity glycans. Visualization tools such as heat maps, scatter plots, and network diagrams are then used to depict interaction patterns, affinities, and potential correlations, providing a deeper understanding of glycan-target interactions. Analyzing and visualizing HTS data enables researchers to identify promising glycan candidates for further validation and research exploration.
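The decoding and quantification step described above can be illustrated with a minimal Python sketch. It is not CD BioGlyco's pipeline: the barcode table, the fixed barcode length, the exact-match rule, and the function names `count_glycans` and `to_fractions` are all assumptions made for illustration. Real pipelines typically also handle sequencing errors, adapters, and quality filtering.

```python
from collections import Counter

# Hypothetical barcode-to-glycan lookup table (illustrative values only).
BARCODE_TO_GLYCAN = {
    "ACGTAC": "glycan_A",
    "TTGCAA": "glycan_B",
    "GGATCC": "glycan_C",
}
BARCODE_LENGTH = 6  # assumption: a fixed-length barcode at the start of each read


def count_glycans(reads):
    """Count reads per glycan by exact match on the leading barcode.

    Returns (counts, unmatched), where unmatched is the number of reads
    whose barcode is not in the lookup table.
    """
    counts = Counter()
    unmatched = 0
    for read in reads:
        barcode = read[:BARCODE_LENGTH].upper()
        glycan = BARCODE_TO_GLYCAN.get(barcode)
        if glycan is None:
            unmatched += 1  # filtered out: unknown or error-containing barcode
        else:
            counts[glycan] += 1
    return counts, unmatched


def to_fractions(counts):
    """Convert raw counts to fractions of all matched reads."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {glycan: n / total for glycan, n in counts.items()}
```

Calling `count_glycans` on the reads from a selected sample and from an unselected control gives the two count tables that downstream enrichment statistics compare.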
- Negative Binomial (NB) Distribution-assisted Data Analysis
- Fitting an NB distribution helps manage and interpret the variability and overdispersion commonly found in HTS data. The NB distribution is particularly useful when data exhibit greater variability than expected under a simple Poisson distribution, making it more suitable for modeling overdispersed count data. The approach fits the glycan-encoding DNA sequence counts to an NB model to capture their distribution and quantify binding affinity or biological activity. By filtering out statistical noise and providing better estimates of the true signal, it identifies important glycan-target interactions more reliably. Visualizations of these results typically include distribution fit plots, count histograms, and other statistical graphics that illustrate the efficacy and specificity of the identified glycan ligands.
- Data Aggregation Method-assisted Data Analysis
- DEGL screening generates large data sets representing the interactions of various glycans with target molecules. To make sense of these data, we use data aggregation methods, which group and combine raw data points to reveal meaningful patterns and trends. Aggregation reduces complexity and increases the clarity of the results, making it easier to discern which glycan structures interact most significantly with their targets. Aggregated data are analyzed using statistical techniques to identify correlations and potential matches, and are then presented as graphs or heat maps to aid interpretation and inform subsequent experimental steps.
- Z-Score Enrichment Metric-assisted Data Analysis
- Z-score enrichment metric-assisted data analysis normalizes and standardizes the HTS results to identify significant glycan interactions. By calculating a Z-score for each glycan, we quantify how far its binding signal deviates from the expected background distribution. This approach helps filter out noise and identify glycans with truly significant interactions, which in turn helps prioritize candidates.
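As a rough illustration of the aggregation and Z-score steps listed above, the following Python sketch sums counts per glycan and then standardizes a log2 enrichment score. Several choices here are assumptions rather than the service's actual method: the function names, the pseudocount of 1, the use of an unselected control sample as the reference, and the use of the library-wide mean and standard deviation as the "background" distribution.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def aggregate_counts(records):
    """Sum read counts per glycan across replicates or redundant barcodes.

    records is an iterable of (glycan, count) pairs.
    """
    totals = defaultdict(int)
    for glycan, count in records:
        totals[glycan] += count
    return dict(totals)


def enrichment_z_scores(selected, control, pseudocount=1.0):
    """Z-score of each glycan's log2 enrichment (selected vs. control).

    The pseudocount avoids division by zero for glycans absent from a sample.
    """
    glycans = sorted(set(selected) | set(control))
    if len(glycans) < 2:
        raise ValueError("need at least two glycans to compute Z-scores")
    sel_total = sum(selected.get(g, 0) + pseudocount for g in glycans)
    ctl_total = sum(control.get(g, 0) + pseudocount for g in glycans)
    log_enrichment = {}
    for g in glycans:
        sel_frac = (selected.get(g, 0) + pseudocount) / sel_total
        ctl_frac = (control.get(g, 0) + pseudocount) / ctl_total
        log_enrichment[g] = math.log2(sel_frac / ctl_frac)
    mu = mean(log_enrichment.values())
    sigma = stdev(log_enrichment.values())  # sample standard deviation
    if sigma == 0:
        return {g: 0.0 for g in glycans}  # no glycan stands out
    return {g: (e - mu) / sigma for g, e in log_enrichment.items()}
```

Glycans whose Z-score exceeds a chosen cutoff would be carried forward as candidates. The cutoff itself is a design decision that depends on library size and the tolerated false-positive rate.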
Applications
- Drug candidate discovery: Identify glycans that bind to targets relevant to drug development through DEGL screening and data analysis, and use the glycans as potential lead compounds for drug development.
- Functional glycomics: HTS and data analysis help reveal the biological roles of various glycans, advance biological knowledge, and provide insights into complex glycan-mediated processes.
- Biomarker identification: Identify glycan structures associated with specific diseases that can serve as candidates for biomarker development.
- Pathogen research: Study the glycan binding properties of viruses, bacteria, and parasites to develop anti-adhesion therapies and understand pathogen entry mechanisms.
Advantages
- We provide one-stop services from DEGL construction to data analysis and visualization, with multiple strategies available for each step of the process. Our expert team tailors the optimal strategy combination for each step according to the specific needs of clients to ensure satisfactory results.
- Multiple HTS strategies are used to simultaneously screen thousands to millions of glycan structures to obtain target data in a relatively short time, reducing time and labor costs.
- Visualization tools are used to convert complex data into easy-to-interpret visual formats such as heat maps, scatter plots, and 3D structures.
- DEGL is used in various fields such as glycobiology, drug discovery, immunology, and biomarker discovery.
Publication Data
Technology: NB distribution
Journal: PLOS ONE
Published: 2020
IF: 2.9
Results: Sequence count data are often assumed to follow an NB distribution; however, this approach does not always succeed in controlling the false discovery rate (FDR) at its nominal level. The authors propose a new statistical goodness-of-fit (GoF) test for the NB distribution in regression models, which are commonly used in RNA sequencing and microbiome studies. The authors also show that many publicly available RNA-Seq and 16S rRNA microbiome datasets violate the NB assumption.
Fig.1 Boxplots of estimated feature-specific dispersion parameters of the negative binomial distribution per dataset. (Hawinkel, et al., 2020)
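To make the dispersion parameter concrete, here is a small Python sketch that estimates it by the method of moments. It assumes the common NB parameterization Var = mu + alpha * mu^2, so alpha = 0 corresponds to Poisson-like counts. This is not the estimator or the GoF test used by Hawinkel et al., and the function name `nb_dispersion_mom` is hypothetical.

```python
from statistics import mean, variance


def nb_dispersion_mom(counts):
    """Method-of-moments estimate of the NB dispersion alpha.

    Assumes Var = mu + alpha * mu**2. Estimates are truncated at 0,
    so data no more variable than Poisson return 0.0.
    """
    if len(counts) < 2:
        raise ValueError("need at least two counts")
    mu = mean(counts)
    if mu == 0:
        raise ValueError("mean count is zero; dispersion is undefined")
    var = variance(counts)  # sample variance (n - 1 denominator)
    return max((var - mu) / mu ** 2, 0.0)
```

For example, the replicate counts `[2, 4, 6, 8, 30]` have mean 10 and sample variance 130, which gives an estimated dispersion of 1.2. A value well above zero indicates overdispersion relative to Poisson, but it does not by itself show that the NB model fits well.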
Frequently Asked Questions
- How to visualize the screening results?
- We often use visualization techniques such as heatmaps, scatter plots, and bar graphs to represent screening results. Heatmaps show the binding affinity of the various glycan-DNA conjugates, with color gradients representing different degrees of interaction strength. Scatter plots help identify trends or correlations between specific glycan structures and their biological activities. Bar graphs are used to compare the relative frequency or enrichment scores of different glycans. 3D plots can provide further insight by allowing researchers to interactively explore complex relationships in the data.
- How does data analysis enhance DEGL research?
- Data analysis transforms large amounts of raw HTS data into meaningful insights, involving the application of statistical methods (e.g., NB distribution) to normalize and interpret the abundance of DNA coding sequences associated with glycan interactions. Through rigorous data aggregation, noise reduction, and pattern recognition, researchers identify glycan molecules with specific binding affinities and biological activities. Data visualization techniques further facilitate the identification of trends and outliers.
CD BioGlyco effectively analyzes and visualizes the HTS results, making data interpretation and communication more efficient and impactful. Please feel free to contact us to learn about the strategies and processes for data analysis and visualization, and customize your solutions.
Reference
- Hawinkel, S.; et al. Sequence count data are poorly fit by the negative binomial distribution. PLOS ONE. 2020, 15(4): e0224909.
For research use only. Not intended for any clinical use.