CryoSeek - Opening Up A New Direction for Glycobiology Research

May 14, 2025

Application of cryo-EM Technology and AI in Structural Biology

Cryo-electron microscopy (cryo-EM) has brought about a "resolution revolution", making it possible to observe and model protein structures at atomic-level resolution. Artificial intelligence (AI)-based tools such as AlphaFold now predict protein structures rapidly and accurately, and AlphaFold predictions are available for almost all known proteins.

Historically, structural biology has focused on studying known substances. Now, structural biology is undergoing a paradigm shift - from targeted structure determination to structure-guided discovery of previously uncharacterized biological entities. The high-resolution capabilities of cryo-EM and the structural prediction capabilities of artificial intelligence provide unprecedented opportunities for exploring completely unknown biological entities.

Proposal and Application of CryoSeek Strategy

Nieng Yan and colleagues at Tsinghua University recently published a series of papers proposing a new strategy called CryoSeek, which uses cryo-EM as an observation tool, combined with AI-assisted automatic modeling and bioinformatics analysis, to discover previously unknown biological entities in nature.

On December 31, 2024, Nieng Yan et al. published a research paper titled "CryoSeek II: Cryo-EM analysis of glycofibrils from freshwater reveals well-structured glycans coating linear tetrapeptide repeats" in PNAS.

Despite recent breakthroughs in protein structure determination and prediction, the study of carbohydrate structures remains a challenge.

Discovery and Analysis of TLP-4

In this latest study, the research team reported the cryo-EM analysis of glycoprotein fibers found in freshwater from the Tsinghua Lotus Pond. The team named the fiber TLP-4. It is composed of a linear polypeptide chain of tetrapeptide repeats coated with a glycan layer more than 4 nm thick. In each repeat, two glycans are O-linked to a 3,4-dihydroxyproline (diHyp) and a third glycan is linked to an adjacent serine or threonine. The fiber structure is maintained entirely by the packing of the glycans.

Bioinformatics analysis confirmed that the TLP-4 repeat sequence is conserved among species, suggesting that there are still a large number of glycoprotein fibers to be discovered in nature.

In addition, structural studies of TLP-4 and other glycoprotein fibers can establish valuable data sets for training artificial intelligence (AI)-based tools for accurate glycan structure prediction, model building, and binder design.

Overall, this discovery not only provides valuable insights into the structural role of glycans in biological assembly, but also demonstrates the potential of the CryoSeek strategy recently developed by the research team for discovering biological entities and providing prototypes for structural studies of carbohydrates.

Nieng Yan said that the publication of this paper marks the official start of the laboratory's new direction in 2025: using CryoSeek as a starting point to study glycoproteins and glycobiology. Because a new field requires extensive collaboration, the laboratory also plans to release new discoveries first as preprints on bioRxiv, in the hope of establishing broad cooperation on carbohydrate identification, chemical synthesis, biosynthetic pathways, and the functions of glycoprotein fibers.

Two previous research papers on CryoSeek

The first exploration of the CryoSeek strategy

On October 9, 2024, Nieng Yan, Zhangqiang Li and others published a research paper titled "CryoSeek: A strategy for bioentity discovery using cryoelectron microscopy" in PNAS.

The CryoSeek workflow includes the following steps:

  • Collect samples from natural sources;
  • Process samples with simple procedures such as filtration and concentration;
  • Perform standard cryo-sample preparation and cryo-EM data acquisition, or characterize samples through other methods (such as metagenomic sequencing and mass spectrometry);
  • Process the cryo-EM data;
  • Build models automatically with AI-assisted tools;
  • Identify the corresponding biological entities from their structures, combined with other bioinformatics analysis results.
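
The steps above form a linear pipeline. The following Python sketch only illustrates that ordering; it is not the authors' software. Every name in it (`Sample`, `stage`, `run_cryoseek`) is hypothetical, and each stage is a placeholder that records its name instead of doing real processing.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Sample:
    """Hypothetical record of a sample and the stages it has passed through."""
    source: str
    history: List[str] = field(default_factory=list)


def stage(name: str) -> Callable[[Sample], Sample]:
    """Return a placeholder stage that only logs its own name."""
    def run(sample: Sample) -> Sample:
        sample.history.append(name)
        return sample
    return run


# Stage names paraphrase the workflow list; the bodies are stand-ins.
CRYOSEEK_STAGES = [
    stage("collect sample from a natural source"),
    stage("filter and concentrate"),
    stage("cryo-sample preparation and cryo-EM data acquisition"),
    stage("cryo-EM data processing"),
    stage("AI-assisted automatic modeling"),
    stage("structure-based identification with bioinformatics"),
]


def run_cryoseek(source: str) -> Sample:
    """Pass a sample through every stage in order."""
    sample = Sample(source)
    for run in CRYOSEEK_STAGES:
        sample = run(sample)
    return sample


if __name__ == "__main__":
    result = run_cryoseek("Tsinghua Lotus Pond")
    for step_number, name in enumerate(result.history, start=1):
        print(step_number, name)
```

In a real pipeline each placeholder would wrap an actual tool, and the alternative characterization methods mentioned in the third step (metagenomic sequencing, mass spectrometry) would run as a parallel branch.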

First, Nieng Yan's team used cryo-EM to examine filtered water samples from the Tsinghua Lotus Pond and found a rich variety of biomacromolecules, dominated by fibrous structures of varying lengths and thicknesses. The team then obtained high-resolution cryo-EM density maps of several fibrous structures through three-dimensional reconstruction.

Next, the AI-based CryoNet software developed by Qiangfeng Zhang's team at Tsinghua University was used to build models automatically, yielding the three-dimensional structures of two highly similar helical fibrous proteins. Nieng Yan's team named them TLP-1a and TLP-1b; each has a diameter of about 8 nm.

The picture shows a snapshot of the Tsinghua Lotus Pond and a schematic diagram of the CryoSeek strategy.

Fig. 1 Discovery of natural bioentities via the strategy CryoSeek. (Wang, et al., 2024)

Further bioinformatics analysis showed that the two fibrous proteins, TLP-1a and TLP-1b, have unique shapes and thicknesses and come from completely unknown species. The research team believes that they are likely pili, which some bacteria use for material transfer and to aid movement.

Overall, this study demonstrates a paradigm shift in structural biology. Previously, structural biology was applied to known substances. Here, with the protein sequence and origin completely unknown, the identification and functional prediction of biological entities were achieved entirely on the basis of high-resolution structure determination, making structural biology a driving force for exploring unknown substances.

In addition, the CryoSeek strategy proposed in this study can be extended to identify biological entities from rivers, oceans, and raindrops, and from extreme environments such as the deep sea, hydrothermal vents, and even space, thus helping to expand structural biology into "structural X-ology", such as structural pathology, structural ecology, and structural archeology.

Further Research and Discovery

On December 15, 2024, Nieng Yan, Zhangqiang Li and others published a research paper titled "The 8-nm spaghetti: well-structured glycans coating linear tetrapeptide repeats discovered from freshwater with CryoSeek" on the preprint platform bioRxiv.

In this new study, the research team discovered a highly glycosylated protein fiber, TLP-4b, whose molecular mass comes mainly from a thick glycan shell. Because several AI-assisted programs could not build its protein structure automatically, the research team built the model manually. The cryo-EM reconstruction at 3.3 Å resolution revealed that the only protein component of the fiber, which is about 8 nm in diameter, is a linear polypeptide chain of tetrapeptide repeats. Each tetrapeptide repeat consists of one conserved 3,4-dihydroxyproline (diHyp), one serine or threonine, and two less conserved amino acid residues. The 3-OH and 4-OH groups of diHyp are both heavily O-glycosylated, and the serine or threonine is also O-glycosylated.

(A) High-resolution 3D reconstruction of TLP-4b with helical parameters. (B) Three glycan branches of TLP-4b.

Fig. 2 TLP-4b consists of linear tetrapeptide repeats coated with dense glycans. (Wang, et al., 2024)

In the reconstructed three-dimensional segment, the fiber structure is highly regular, and its fold is maintained entirely by interactions between carbohydrates. Because of the high degree of repetition, the assembly of these carbohydrates is also highly ordered. Calculating the ratio of amino acids to carbohydrates showed that carbohydrates account for an astonishing 95% of the fiber's mass.
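
As a rough plausibility check on the 95% figure, the carbohydrate mass fraction can be estimated from residue counts. The sketch below is not the authors' calculation. It assumes an average amino acid residue mass of about 110 Da and treats every sugar unit as a hexose residue (162.14 Da); the function names are hypothetical, and the actual sugar composition of the fiber is not given here.

```python
# Assumed masses, not taken from the paper.
AVG_AMINO_ACID_RESIDUE_DA = 110.0  # common rule-of-thumb average residue mass
HEXOSE_RESIDUE_DA = 162.14         # anhydrohexose unit, C6H10O5


def carbohydrate_mass_fraction(n_amino_acids: int, n_sugars: float) -> float:
    """Fraction of total mass contributed by sugars, under the assumed masses."""
    protein_mass = n_amino_acids * AVG_AMINO_ACID_RESIDUE_DA
    glycan_mass = n_sugars * HEXOSE_RESIDUE_DA
    return glycan_mass / (protein_mass + glycan_mass)


def sugars_for_fraction(n_amino_acids: int, target_fraction: float) -> float:
    """Number of hexose units needed to reach a target carbohydrate mass fraction.

    Solves target = g / (p + g) for the glycan mass g, then converts to units.
    """
    if not 0.0 <= target_fraction < 1.0:
        raise ValueError("target_fraction must be in [0, 1)")
    protein_mass = n_amino_acids * AVG_AMINO_ACID_RESIDUE_DA
    glycan_mass = protein_mass * target_fraction / (1.0 - target_fraction)
    return glycan_mass / HEXOSE_RESIDUE_DA


if __name__ == "__main__":
    # One tetrapeptide repeat: 4 amino acid residues.
    print(round(sugars_for_fraction(4, 0.95), 1))
    print(round(carbohydrate_mass_fraction(4, 52), 3))
```

Under these assumptions, a single tetrapeptide repeat would need to carry roughly 50 hexose-sized sugar units in total, spread across its three glycans, for carbohydrates to make up 95% of the mass.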

Overall, this study reveals the key role of glycans in the folding of glycoconjugate structures and helps to understand the carbon/nitrogen ratio in the biosphere. It also further demonstrates CryoSeek's ability to discover completely unknown biological entities, and is expected to become the starting point for a series of new studies.

References

  1. Wang, T., et al. (2025). CryoSeek II: Cryo-EM analysis of glycofibrils from freshwater reveals well-structured glycans coating linear tetrapeptide repeats. Proceedings of the National Academy of Sciences, 122(1), e2423943122. DOI: 10.1073/pnas.2423943122.
  2. Wang, T., et al. (2024). CryoSeek: A strategy for bioentity discovery using cryoelectron microscopy. Proceedings of the National Academy of Sciences, 121(42), e2417046121. DOI: 10.1073/pnas.2417046121.
  3. Wang, T., et al. (2024). The 8-nm spaghetti: well-structured glycans coating linear tetrapeptide repeats discovered from freshwater with CryoSeek. bioRxiv, 2024-12. DOI: 10.1101/2024.12.15.627649.