Hepatocellular carcinoma (HCC) is among the malignancies with the highest mortality worldwide. Current clinical surveillance relies mainly on abdominal ultrasound combined with alpha-fetoprotein (AFP) testing, but the limited sensitivity of AFP means that many patients with early-stage disease are missed. Liquid biopsy technologies have advanced rapidly in recent years, and serum N-glycomics has drawn particular attention because it systematically reflects hepatic synthetic function.
In January 2026, a team led by Haojie Lu from Fudan University, in collaboration with other researchers, published a study in Nature Communications titled "Large-scale serum N-glycomics tracks N-glycosylation dynamics in hepatocellular carcinoma progression and enables early diagnosis." The study analyzed serum N-glycomic data from 1,074 subjects and integrated glycoproteomics, transcriptomics, and machine learning to address both the early diagnosis of HCC and the mechanisms behind its glycosylation changes.

Fig. 1 Overview of study design and workflow. (Fu, et al. 2026)
The study enrolled a total of 1,074 subjects spanning four groups: healthy controls, chronic hepatitis B, liver cirrhosis, and hepatocellular carcinoma. The primary discovery cohort (GZ-I) comprised 744 samples, and two independent external validation cohorts, GZ-II (186 samples) and XZ (144 samples), were used for verification. The research team used a high-throughput workflow of N-glycan release, derivatization, HILIC enrichment, and MALDI-TOF mass spectrometry to profile the serum N-glycome across all samples. Quality control data showed a median coefficient of variation of 0.08 and median Pearson correlation coefficients of 0.98 both between batches and across cohorts, supporting the reliability of the data.
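The two quality control metrics mentioned above can be illustrated with a minimal sketch. It assumes a small table of relative glycan abundances from repeated QC runs; the numbers and the names `qc_runs`, `coefficient_of_variation`, and `pearson` are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, median, stdev


def coefficient_of_variation(values):
    """CV of one glycan across replicate runs: sample standard deviation / mean."""
    return stdev(values) / mean(values)


def pearson(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical data: rows are replicate QC runs, columns are glycans
# (relative abundances). These values are illustrative only.
qc_runs = [
    [0.30, 0.20, 0.10, 0.40],
    [0.32, 0.19, 0.11, 0.38],
    [0.29, 0.21, 0.10, 0.40],
]

# One CV per glycan (column), then the median across glycans.
per_glycan_cv = [coefficient_of_variation(col) for col in zip(*qc_runs)]
median_cv = median(per_glycan_cv)

# One correlation per pair of runs, then the median across pairs.
pairwise_r = [
    pearson(qc_runs[i], qc_runs[j])
    for i in range(len(qc_runs))
    for j in range(i + 1, len(qc_runs))
]
median_r = median(pairwise_r)

print(f"median CV = {median_cv:.3f}, median Pearson r = {median_r:.3f}")
```

A low median CV indicates that individual glycans are measured reproducibly, while a high median correlation indicates that whole profiles agree between runs; the two are complementary checks.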
Within the GZ-I cohort, the study identified 201 distinct N-glycan compositions; following rigorous screening, 64 structurally defined glycans were selected for downstream analysis. Furthermore, 120 samples underwent additional LC-MS/MS-based glycoproteomic and proteomic profiling, while 186 samples were subjected to derivatization analysis specifically targeting sialic acid linkages.
As the liver serves as the primary site for the synthesis of serum glycoproteins, alterations in the serum N-glycome can sensitively capture even subtle changes in hepatic function. This study systematically evaluated the associations between N-glycomic profiles and conventional liver function markers, as well as the ALBI score and Child-Pugh classification. The findings revealed:
Subgroup analyses confirmed that these associations were particularly robust in patients with cirrhosis and HCC, suggesting that serum N-glycomics serves not merely as a disease biomarker, but also as an effective tool for assessing hepatic synthetic and metabolic functions.
Differential analysis revealed that 48 out of 64 identified glycans were significantly dysregulated during disease progression, exhibiting distinct stage-specific patterns:
The study clustered the 48 differentially expressed glycans into five Glycan Co-expression Modules (GCMs), each exhibiting a distinct expression trajectory:
At the structural level, HCC patients presented a composite phenotype characterized by a decrease in bi-antennary glycans, an increase in tri-/tetra-antennary glycans, reduced galactosylation, and elevated levels of bisecting GlcNAc and fucosylation. Analysis of sialic acid linkages further revealed a specific reduction in α2,6-linked sialylation in patients with cirrhosis.
After controlling for liver function parameters through covariate analysis, 30 glycans remained significantly differentially expressed, demonstrating that these glycosylation alterations are not merely passive concomitants of liver dysfunction but are, rather, intimately linked to specific pathological processes.
Based on unsupervised consensus clustering, HCC patients were classified into three distinct N-glycome molecular subtypes, each with its own clinical features:
While Subtypes 1 and 3 exhibit no significant differences in terms of liver function or tumor stage, their diametrically opposed glycosylation patterns suggest the potential involvement of distinct pathogenic mechanisms. Patients in Subtype 2 are more frequently found in earlier pathological stages, thereby providing a potential basis for clinical risk stratification.
Serum glycomics reflects the collective signal of circulating glycoproteins, but which proteins, and which specific sites, contribute the disease-associated glycans? Through a data-dependent acquisition (DDA)-based glycoproteomic analysis of 120 samples, this study identified 3,057 glycopeptides, 2,824 site-specific glycans, 276 glycosylation sites, and 168 glycoproteins.
The study constructed pseudo-glycome profiles by aggregating the signals of identical glycan compositions across all identified sites and proteins. The average Pearson correlation coefficient between the pseudo-glycome and the measured serum glycome was 0.95, showing that site-specific data can largely account for the observed serum glycome phenotypes. More importantly, different glycans exhibit highly heterogeneous site origins:
Differential analysis distinguished two driving mechanisms:
This finding suggests that serum glycomics data must be interpreted in conjunction with changes in protein abundance to accurately decipher their underlying biological origins.
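The pseudo-glycome construction described above can be sketched as follows. This is a minimal illustration for a single sample, not the authors' pipeline: the protein and site labels, the intensities, the `measured_glycome` values, and the function names are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def build_pseudo_glycome(glycopeptides):
    """Sum intensities of identical glycan compositions over all sites and
    proteins, then normalize to relative abundances."""
    totals = defaultdict(float)
    for _protein, _site, composition, intensity in glycopeptides:
        totals[composition] += intensity
    grand_total = sum(totals.values())
    return {comp: value / grand_total for comp, value in totals.items()}


def pearson(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical site-specific records for one sample:
# (protein, glycosylation site, glycan composition, intensity)
glycopeptides = [
    ("PROT_A", "N1", "H5N4S1", 40.0),
    ("PROT_B", "N2", "H5N4S1", 20.0),
    ("PROT_B", "N2", "H6N5F1S3", 15.0),
    ("PROT_C", "N3", "H5N4F1", 25.0),
]

# Hypothetical released-glycan profile of the same sample (relative abundances).
measured_glycome = {"H5N4S1": 0.55, "H6N5F1S3": 0.18, "H5N4F1": 0.27}

pseudo = build_pseudo_glycome(glycopeptides)

# Compare only the compositions present in both profiles, in a fixed order.
shared = sorted(set(pseudo) & set(measured_glycome))
r = pearson([pseudo[c] for c in shared], [measured_glycome[c] for c in shared])
print(pseudo, round(r, 3))
```

In this toy sample, the same composition (H5N4S1) is carried by two different proteins, so a change in its serum level could come either from a change in protein abundance or from a change in site-specific glycosylation, which is why the two sources need to be separated.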
Utilizing the TCGA-LIHC dataset and seven GEO datasets, this study systematically characterized the expression profiles of 307 glycogenes involved in the N-glycosylation pathway within HCC tumor tissues. Differential analysis revealed that 118 glycosylation-related genes were significantly upregulated, while only two were downregulated, indicating a state of global activation:
Notably, although the β1,4-galactosyltransferases B4GALT2/3/4/6 were all upregulated, B4GALT1, the predominant β1,4-galactosyltransferase in human liver, was not significantly altered in the TCGA-LIHC dataset and has been reported elsewhere to be downregulated in HCC tissues. This may explain why an overall decrease in galactosylation was still observed at the glycomic level. Survival analysis further showed that specific glycosylation-related genes, such as B4GALT2 and B4GALT3, were significantly associated with patient prognosis.
This study selected 26 high-abundance glycans with a missing rate below 1% to construct four diagnostic models using an AutoML framework:
Key performance highlights include:
Furthermore, the Model 4-derived probability score (PP4) showed significant correlations with ALBI staging, Child-Pugh classification, and TNM staging, suggesting that it serves not only to distinguish between benign and malignant conditions but also to reflect hepatic functional reserve and tumor burden.
SHAP interpretability analysis showed that H6N5F1S3, H5N4F1, and H3N4F1 consistently ranked as the three most important features across all models, followed closely by H5N2, H6N5S3, and H5N4S1. A model trained on these six glycans alone showed almost no loss of performance, suggesting that they form the diagnostic core of the system and act through synergistic, non-linear interactions.
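For readers unfamiliar with the composition codes used above, the following minimal sketch parses them into monosaccharide counts. It assumes the common glycomics shorthand (H = hexose, N = N-acetylhexosamine, F = fucose, S = sialic acid), which the article does not spell out; the function names and the simple trait rules are illustrative only and do not reproduce the study's structural annotation.

```python
import re

# Assumed shorthand: H = hexose, N = HexNAc, F = fucose, S = sialic acid.
MONOSACCHARIDES = {"H": "hexose", "N": "HexNAc", "F": "fucose", "S": "sialic_acid"}


def parse_composition(code):
    """Turn a code such as 'H6N5F1S3' into a dict of monosaccharide counts."""
    if not re.fullmatch(r"(?:[HNFS]\d+)+", code):
        raise ValueError(f"unrecognized composition: {code!r}")
    counts = dict.fromkeys(MONOSACCHARIDES.values(), 0)
    for letter, number in re.findall(r"([HNFS])(\d+)", code):
        counts[MONOSACCHARIDES[letter]] += int(number)
    return counts


def glycan_traits(code):
    """Derive coarse traits from the composition alone (a simplification)."""
    c = parse_composition(code)
    return {
        "fucosylated": c["fucose"] > 0,
        "sialylated": c["sialic_acid"] > 0,
        # Two HexNAc, five or more hexoses, no fucose or sialic acid.
        "high_mannose": (
            c["HexNAc"] == 2
            and c["hexose"] >= 5
            and c["fucose"] == 0
            and c["sialic_acid"] == 0
        ),
    }


# The six glycans named in the text as the top-ranked features.
core_six = ["H6N5F1S3", "H5N4F1", "H3N4F1", "H5N2", "H6N5S3", "H5N4S1"]

for code in core_six:
    print(code, parse_composition(code), glycan_traits(code))
```

Under these assumptions, the six top-ranked glycans cover fucosylated, sialylated, and high-mannose compositions. Composition alone does not determine linkage or branching, so such traits are only a first approximation.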
Through the utilization of large-scale cohorts and multi-omics integration, this study systematically elucidated the dynamic patterns of the serum N-glycome during the pathogenesis and progression of HCC. Its core value is manifested across three levels:
First, the study confirms that the serum N-glycome serves as a sensitive indicator for assessing liver function; the changes in galactosylation and sialylation reflected therein are closely correlated with hepatocyte synthetic capacity and Golgi functional integrity, holding promise as a complement to existing liver function scoring systems.
Second, the study elucidates the dual origins of alterations in the serum glycome associated with HCC: these encompass both protein-driven effects—arising from changes in the abundance of acute-phase proteins such as A1AG1—and glycosylation-driven effects, stemming from changes in specific glycoforms at sites such as A2MG N869. This deconstruction of multi-level regulation provides a more precise molecular framework for understanding tumor-associated glycosylation.
Finally, a machine learning model based on 26 specific glycans was validated across multiple external cohorts, demonstrating diagnostic performance superior to that of AFP—particularly among AFP-negative patients—thereby offering a clinically viable screening tool for non-invasive surveillance.
Admittedly, the current study is primarily based on a Chinese cohort of patients with hepatitis B-related HCC; future research will require further validation in Western populations with HCC associated with non-alcoholic fatty liver disease or alcoholic liver disease. Furthermore, the lack of longitudinal follow-up data currently precludes a direct assessment of the model's capacity to predict the risk of progression from cirrhosis to HCC. As the throughput of glycomics assays continues to rise and AI algorithms undergo iterative optimization, serum N-glycome analysis is poised to become an integral component of a precision monitoring system for HCC, driving the evolution of liver disease management from reliance on traditional protein biomarkers toward a multi-dimensional molecular diagnostic paradigm.
Reference
1. Fu, B., et al. (2026). Large-scale serum N-glycomics tracks N-glycosylation dynamics in hepatocellular carcinoma progression and enables early diagnosis. Nature Communications. DOI: 10.1038/s41467-026-68579-x.