Identification of immune infiltration-related genes in HCC patients
According to the criteria for DEGs, we used the DESeq2 algorithm and identified 5010 DEGs between 374 TCGA-HCC samples and 50 normal liver samples. The 5010 identified DEGs and 636 immune infiltration-related genes obtained from the CIBERSORTX database were used for Venn diagram analysis. Through the above analysis, we obtained 89 immune infiltration-related genes in HCC (Fig. 1A). Then, we verified the expression of the 89 immune infiltration-related genes in the GSE25097, GSE87630 and GSE89377 data sets from the GEO database (Fig. 1B, Fig. S1, and Fig. S2). We conducted further enrichment analysis to explore the functions of the selected genes. The genes were significantly enriched in neutrophil chemotaxis, neutrophil migration, the external side of the plasma membrane, tertiary granule lumen, chemokine activity, and chemokine receptor binding (Fig. 1C). Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis showed that viral protein interaction with cytokine and cytokine receptor, cytokine-cytokine receptor interaction, and chemokine signalling pathway were all significantly enriched (Fig. 1D). The complete results of the enrichment analysis are shown in Table S2.
Construction and assessment of the risk scoring system
First, univariate Cox regression analysis was performed to explore the relationship between the expression levels of 89 immune infiltration-related genes and the OS times of patients in the TCGA-HCC cohort. Using the cut-off value of Cox P < 0.05, 27 potential predictive genes related to OS were screened out (Table S3). Then, LASSO regression analysis was used to refine the gene sets (Fig. 2A, B). Seventeen genes were identified as the most valuable predictive genes, and the risk scoring system was established based on the above formula (Table 1). Kaplan–Meier analysis of the 17 genes is shown in Fig. S3.
To observe the expression of these genes in HCC and normal liver tissues, we further conducted research using immunohistochemical data from the Human Protein Atlas (HPA) database. The results are shown in Fig. 3. The immunohistochemical data of some genes were temporarily unavailable from the HPA database.
The risk score of each patient in the TCGA-HCC data set was calculated based on the expression levels and regression coefficients of the 17 immune infiltration-related genes. The distribution of risk scores in the TCGA-HCC data set is shown in Fig. 4A. According to the median risk score, the patients in the TCGA-HCC cohort were divided into high-risk and low-risk groups. In addition, the survival time distribution indicated that the higher the risk score was, the worse the prognosis (Fig. 4A). Figure 4A also shows the corresponding expression levels of the 17 immune infiltration-related genes. The performance of the risk scoring system according to the time ROC curves in terms of 1-year, 3-year, and 5-year prognoses is shown in Fig. 4B. The areas under the time ROC curves (AUCs) were 0.766, 0.757, and 0.773 for the 1-year, 3-year, and 5-year OS times, respectively, in the TCGA-HCC cohort. Kaplan–Meier analysis and the log-rank test showed that the prognosis of the high-risk group was significantly worse than that of the low-risk group (P < 0.001; Fig. 4C).
Correlation between the risk score and clinical features
We also analysed the association between the risk score and the clinical features of patients in the TCGA-HCC cohort. We found significant differences between the risk score and the following clinical features (Fig. 5 A–F): histological grade (G1&2 vs. G3&G4, P < 0.001), T stage (T1&T2 vs. T3&T4, P < 0.01), residual tumour (R0 vs. R1&R2, P < 0.01), pathologic stage (stage 1 & stage 2 vs. stage 3&stage 4, P < 0.01), vascular invasion (no vs. yes, P < 0.05) and AFP (≤400 vs. > 400, P < 0.05).
Construction and verification of the nomogram
First, we performed univariate and multivariate Cox regression analyses of potential predictors, such as T stage, gender, age, residual tumour, histologic grade, AFP, vascular invasion, tumour status, and risk group, that may affect the prognosis of HCC patients (Table 2). The results showed that T stage, tumour status, and risk group were independent risk factors for OS in HCC patients. The independent predictors, including T stage, tumour status, and risk group, which affect the OS of HCC patients, were incorporated into the nomogram model (Fig. 6A). The C-index of the nomogram model we established was 0.692 (0.664–0.720). Then, we calculated the score of each HCC patient based on the nomogram and evaluated the predictive ability of the nomogram through ROC analysis. In the TCGA-HCC cohort, the nomogram AUCs for the 1-year, 3-year, and 5-year OS rates were 0.755, 0.781, and 0.832, respectively (Fig. 6B). Moreover, we used the calibration curve to evaluate the agreement of the nomogram. Compared with the ideal model, the calibration plots of the nomogram model showed good agreement for the 1-year, 3-year, and 5-year OS rates (Fig. 6C).
To reveal the potential impact of immune infiltration-related genes on the occurrence and development of HCC, we performed GSEA on the DEGs between the high-risk group and the low-risk group. GSEA showed that the DEGs between the high-risk group and low-risk group were mainly enriched in several pathways, including disease, matrisome, haemostasis, innate immune system, metabolism of lipids, transport of small molecules, infectious disease, metabolism of amino acids and derivatives, vesicle-mediated transport, and adaptive immune system (Fig. 7). These findings suggested that immune infiltration-related genes may play a potential role in amino acid and lipid metabolism, matrisome and small molecule transportation, immune system regulation, and hepatitis virus infection in HCC.
Immune cell infiltration level analysis
We also calculated the correlation between this prognostic model based on patients in the TCGA-HCC cohort and immune cell infiltration. The results showed that the high-risk group showed lower levels of immune cell infiltration, such as B cells (P < 0.01), CD8 T cells (P < 0.001), neutrophils (P < 0.001), DCs (P < 0.001), Tregs (P < 0.01), and NK cells (P < 0.001) (Fig. 8A). Moreover, the risk score was negatively correlated with infiltrating immune cells, including B cells, CD8 T cells, neutrophils, DCs, Tregs, and NK cells (Fig. 8B-G).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.