DEGs identification, Gene Ontology (GO) and pathway enrichment analysis

The GEO database was used to obtain gene expression profile datasets from the peripheral blood of septic patients. Four datasets (GSE28750, GSE57065, GSE65682 and GSE69528), representing different populations from Australia, France, Malta and the United States, were first obtained. The number of sepsis samples in GSE28750, GSE57065, GSE65682 and GSE69528 was 10, 26, 51 and 83, respectively, and the number of control samples was 20, 25, 42 and 28, respectively.

The limma R package was used to screen out the DEGs. As a result, 1662, 1340, 2603 and 1359 DEGs were identified from the four datasets, respectively. After integrated bioinformatic analysis, a total of 444 common DEGs were identified (Fig. 2A, Supplementary Table 2), including 246 up-regulated and 198 down-regulated genes (Fig. 2B).
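The overlap step can be sketched as follows. This is a minimal illustration, assuming each dataset's limma result has already been reduced to a mapping from gene symbol to log2 fold change for genes that passed the DEG cutoffs. The function name `common_degs`, the placeholder gene `GENE_X` and all toy values are hypothetical and are not taken from the study.

```python
def common_degs(datasets):
    """Return genes called DEG in every dataset with a consistent direction.

    `datasets` maps a dataset ID to {gene: log2 fold change} for genes that
    already passed the DEG cutoffs (an assumed input format).
    """
    tables = list(datasets.values())
    shared = set(tables[0])
    for table in tables[1:]:
        shared &= set(table)  # keep only genes present in every dataset
    up = sorted(g for g in shared if all(t[g] > 0 for t in tables))
    down = sorted(g for g in shared if all(t[g] < 0 for t in tables))
    return up, down


# Toy values for illustration only (not the study's fold changes)
toy = {
    "GSE28750": {"ARG1": 2.1, "CD3E": -1.4, "MPO": 1.8, "GENE_X": 1.2},
    "GSE57065": {"ARG1": 2.5, "CD3E": -1.1, "MPO": 1.5, "GENE_X": -1.3},
    "GSE65682": {"ARG1": 1.9, "CD3E": -1.6, "MPO": 2.0},
    "GSE69528": {"ARG1": 2.2, "CD3E": -1.2, "MPO": 1.7, "GENE_X": 1.1},
}
up, down = common_degs(toy)
print("up:", up, "down:", down)
```

Requiring the same sign in every dataset is one reasonable reading of "consistently changed"; genes that are shared but change direction between datasets are dropped from both lists.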

Fig. 2

Consistent DEG screening, GO enrichment and pathway enrichment analysis. (A and B) Identification of consistently changed DEGs from the four datasets (GSE28750, GSE57065, GSE65682 and GSE69528). The 444 common DEGs can be classified into 246 up-regulated and 198 down-regulated genes. Each colored area represented the corresponding dataset. (C, D and E) The results of GO analysis for the common DEGs were shown in three groups: cellular component (C), molecular function (D), and biological process (E). F The top 10 significant GO terms were shown in a GOCircle plot. The height of the bars in the inner ring indicated the -log10 (P value) of each GO term, with higher bars representing higher significance. The colors of these bars indicated the z-score (standard score), with darker colors representing larger absolute values. The scatter plots in the outer ring showed the regulation of each gene in the corresponding GO terms, with red representing up-regulated and blue representing down-regulated genes. The descriptions of the GO categories were displayed in the table by the side. G The common DEGs and their linked GO terms were shown in a GO chord plot. Different colors corresponding to the genes indicated different fold change levels. H Signaling pathway enrichment analysis for the common DEGs. DEGs, Differentially Expressed Genes. GO, Gene Ontology

To better understand the biological meaning of the common DEGs, GO analysis was conducted with DAVID. Immune response, T cell receptor complex and MHC class II protein receptor activity were the most significant terms in the categories of biological process, cellular component and molecular function, respectively (Fig. 2C-E). GO analysis of up-regulated and down-regulated genes was also performed separately (Supplementary Table 3). Moreover, the GOCircle plot showed the top 10 most significant GO terms (Fig. 2F), and genes involved in the top 5 terms were displayed using a chord plot (Fig. 2G).
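The two quantities encoded in the GOCircle plot can be illustrated with a short sketch. It assumes the convention commonly used for such plots, in which the z-score of a term is (up - down) / sqrt(count) and the bar height is -log10 of the P value. The function names and the example values are hypothetical.

```python
import math


def go_zscore(log_fold_changes):
    """z-score of one GO term: (up - down) / sqrt(count).

    `log_fold_changes` holds the log fold change of every DEG annotated
    to the term (an assumed input format).
    """
    count = len(log_fold_changes)
    if count == 0:
        raise ValueError("a GO term needs at least one annotated gene")
    up = sum(1 for x in log_fold_changes if x > 0)
    down = sum(1 for x in log_fold_changes if x < 0)
    return (up - down) / math.sqrt(count)


def bar_height(p_value):
    """Height of the inner-ring bar: -log10 of the term's P value."""
    return -math.log10(p_value)


# Hypothetical term with three up-regulated genes and one down-regulated gene
z = go_zscore([1.2, 0.8, 2.0, -1.1])
h = bar_height(1e-6)
print(z, h)
```

A positive z-score indicates that most genes in the term are up-regulated and a negative one that most are down-regulated. It is a descriptive summary, not a test statistic.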

Pathway enrichment analysis for the common DEGs was conducted using the “KEGG PATHWAY”, “Reactome”, “BioCyc” and “PANTHER” databases with the KOBAS 3.0 tool. The results showed that the common DEGs were most significantly enriched in neutrophil degranulation and immune system pathways (Fig. 2H). In addition, pathway analysis of up-regulated and down-regulated genes was performed separately (Supplementary Table 4).
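The idea behind over-representation analysis can be sketched with a one-sided hypergeometric test. This is a generic illustration and not necessarily the exact statistic or multiple-testing correction applied by KOBAS or DAVID, which the text does not specify. The function name and all counts other than the 444 common DEGs are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_pathway, n_degs, n_overlap):
    """P(X >= n_overlap) when n_degs genes are drawn without replacement
    from a background of n_background genes, n_in_pathway of which belong
    to the pathway."""
    total = comb(n_background, n_degs)
    upper = min(n_in_pathway, n_degs)
    tail = sum(
        comb(n_in_pathway, k) * comb(n_background - n_in_pathway, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20000 background genes, a 400-gene pathway,
# 444 DEGs of which 40 fall in the pathway (about 9 expected by chance)
p = hypergeom_enrichment_p(20000, 400, 444, 40)
print(p)
```

In practice the raw P values from many pathways would be corrected for multiple testing before ranking the terms.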

Weighted Gene Co-expression Network Analysis (WGCNA) and module detection

The gene expression matrices of GSE28750, GSE57065, GSE65682, and GSE69528 were each clustered using Pearson’s correlation coefficient according to the expression profiles of the 444 common DEGs in these datasets. Clustering trees for each dataset were established and no outliers were found (Fig. 3A-D). Next, the gene modules, which represented groups of genes with similar patterns of expression, were calculated. Four gene modules were finally identified from the hierarchical clustering dendrogram. The gray module represented genes that could not be clustered into any other module (Fig. 3E). Among the modules, the turquoise one was the largest and contained many genes related to hemopoietic stem cell differentiation, such as CD4, ITGAM and IL1R1. The blue module contained many genes, such as TRAT1, ZAP70, CD8A, and CD3E, which were related to T cell activation, differentiation, receptor binding and costimulation. Therefore, this module was likely T-cell specific. A heatmap was constructed to visualize the gene co-expression network (Fig. 3F).
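How a co-expression network is derived from Pearson correlations can be sketched as below. The sketch assumes an unsigned network with a soft-thresholding power of 6 and the standard topological overlap formula. The power, the network type and the toy expression values are assumptions, since the text does not report them.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def soft_adjacency(rows, beta=6):
    """Unsigned adjacency a_ij = |cor(i, j)| ** beta (beta is an assumption).
    The diagonal stays 0 so it does not count toward connectivity."""
    n = len(rows)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            adj[i][j] = adj[j][i] = abs(pearson(rows[i], rows[j])) ** beta
    return adj


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(adj[i][u] * adj[u][j] for u in range(n))
            value = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            tom[i][j] = tom[j][i] = value
    return tom


# Toy genes x samples matrix: genes 0 and 1 are co-expressed, gene 2 is not
expr = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, -1.0, -1.0, 1.0],
]
tom = topological_overlap(soft_adjacency(expr))
print(tom)
```

Modules are then typically obtained by hierarchical clustering of the dissimilarity 1 - TOM, with unassigned genes collected in the gray module.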

Fig. 3

WGCNA analysis and module identification. A-D Sample clustering dendrograms to detect outliers in WGCNA. All samples from the four datasets (GSE28750, GSE57065, GSE65682, and GSE69528) passed the cut-off, and most of the samples with the same disease status were clustered together. E Clustering dendrogram of genes, based on topological overlap, together with assigned module colors. Four co-expression modules were constructed and shown in different colors. F The gene co-expression network was visualized as a heatmap. Light colors represented low co-expression and progressively darker red represented higher co-expression. The darker blocks along the diagonal were the modules. G Module-trait associations. Each row corresponded to a module eigengene, and each column to a trait (diagnosed sepsis or healthy). Each cell contained the corresponding correlation and p-value. The table was color-coded by correlation according to the color legend

Screening for clinically related modules and genes

The module eigengene is the first principal component of a given module and can be considered representative of the gene expression profiles in that module. The correlation between each module eigengene and the clinical phenotype was calculated [26]. The results showed that the turquoise module had the strongest association with sepsis (Fig. 3G). Therefore, for each gene in this module, gene significance (GS) was calculated to evaluate the correlation between gene expression level and sepsis. Fifteen genes were identified according to GS value (Supplementary Table 5). Many of these genes (such as CD177 [27, 28], S100A12 [29, 30], and CLEC4D [31]) have been reported to play a critical role in sepsis pathology.
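The module eigengene and gene significance can be illustrated with a small sketch. It assumes that GS is the absolute Pearson correlation between a gene's expression and a binary trait (0 = control, 1 = sepsis) and computes the first principal component by power iteration instead of calling a PCA library. The gene labels and expression values are hypothetical.

```python
from math import sqrt


def standardize(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def pearson(x, y):
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


def module_eigengene(rows, n_iter=200):
    """First principal component (one value per sample) of a genes x samples
    module, found by power iteration on the standardized expression."""
    z = [standardize(r) for r in rows]
    v = z[0][:]  # start from a real profile; a constant vector would map to 0
    for _ in range(n_iter):
        scores = [sum(a * b for a, b in zip(r, v)) for r in z]
        v = [sum(s * r[j] for s, r in zip(scores, z)) for j in range(len(v))]
        norm = sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    mean_profile = [sum(col) / len(z) for col in zip(*z)]
    if sum(a * b for a, b in zip(v, mean_profile)) < 0:
        v = [-a for a in v]  # orient the eigengene along average expression
    return v


# Hypothetical module: three genes measured in 3 controls and 3 sepsis samples
trait = [0, 0, 0, 1, 1, 1]
module = {
    "GENE_A": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
    "GENE_B": [2.0, 2.1, 1.9, 4.0, 4.2, 3.9],
    "GENE_C": [5.0, 4.8, 5.1, 5.2, 4.9, 5.0],
}
eigengene = module_eigengene(list(module.values()))
module_trait_cor = pearson(eigengene, trait)
gs = {gene: abs(pearson(values, trait)) for gene, values in module.items()}
ranked = sorted(gs.items(), key=lambda item: item[1], reverse=True)
print(module_trait_cor, ranked)
```

Genes at the top of `ranked` are the candidates most strongly associated with the trait within the module.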

Identification of hub genes and blocks using protein-protein interaction (PPI) network

Protein-protein interactions are a central subject of cell biology and a prerequisite for systems biology. Proteins carry out their functions inside a cell by interacting with other proteins, and the information derived from a PPI network improves our understanding of protein function [23]. For these reasons, the proteins corresponding to the common DEGs were constructed into a PPI network using the STRING database (Fig. 4A). The network was composed of 369 nodes (proteins) and 2032 edges (interactions); 75 of the 444 genes were filtered out.

Fig. 4

PPI network construction and significant block screening. A A total of 369 proteins corresponding to the common DEGs were screened out and constructed into the PPI network. The two highlighted circular areas were the most significant protein blocks. B and C Details of the two blocks

Nodes with the most interactions were considered hub genes [23]. Among the 369 nodes, 24 were identified as hub genes using the criterion of node degree > 35 (Supplementary Table 6), meaning that each protein expressed from these genes had more than 35 interactions. It is worth noting that many of these proteins, such as MPO [32] and CD28 [33], have been reported to play a role in sepsis. Other proteins, such as TLR8, could act as potential therapeutic targets [34].
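The degree criterion can be sketched in a few lines. The sketch assumes the PPI network is available as a list of undirected edges; the edges below are invented for illustration and are not STRING interactions, and the threshold of 2 merely stands in for the cutoff of 35 used in the analysis.

```python
from collections import Counter


def node_degrees(edges):
    """Count distinct interaction partners per node in an undirected network,
    ignoring duplicated edges and self-loops."""
    unique = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for pair in unique:
        for node in pair:
            degree[node] += 1
    return degree


def hub_nodes(edges, threshold):
    """Nodes whose degree is strictly greater than `threshold`."""
    return sorted(n for n, d in node_degrees(edges).items() if d > threshold)


# Invented toy edges; ("MPO", "ARG1") duplicates ("ARG1", "MPO")
toy_edges = [
    ("ARG1", "MPO"),
    ("ARG1", "CD28"),
    ("ARG1", "TLR8"),
    ("MPO", "CD28"),
    ("MPO", "ARG1"),
]
degrees = node_degrees(toy_edges)
hubs = hub_nodes(toy_edges, threshold=2)
print(dict(degrees), hubs)
```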

The Molecular Complex Detection (MCODE) plug-in was subsequently applied to select the significant blocks in the PPI network. The two blocks with the highest scores were screened out. Block 1 consisted of 21 nodes and 208 edges, while block 2 was composed of 44 nodes and 371 edges (Fig. 4B-C). Notably, ARG1 was located in the central position of block 1 (Fig. 4B).

Identification of ARG1 as a key gene in sepsis

The genes most relevant to sepsis screened by WGCNA (Supplementary Table 5) were then compared with the hub genes with more than 35 interactions identified from the PPI network (Supplementary Table 6). ARG1 was the only gene present in both results (Fig. 5), indicating that this gene was not only highly correlated with the clinical phenotype of sepsis, but also played a hub role in protein-protein interactions. In addition, ARG1 was located in the central position of block 1 of the PPI network (Fig. 4B). These results showed that ARG1 was a key gene in sepsis.

Fig. 5

Identification of ARG1 as the key gene. The genes screened by WGCNA with the greatest GS values were compared with hub genes with more than 35 interactions identified through the PPI network. ARG1 was the only one that existed in both groups

ARG1 is sharply up-regulated in the whole blood cells of septic patients

To verify the role of ARG1 in sepsis, more GSE datasets were brought into our analysis and validation system. The number of sepsis samples in GSE95233, GSE134347, GSE154918, GSE13015, GSE60424, GSE131761, GSE8121, GSE26378, GSE26440 and GSE145227 was 51, 156, 39, 29, 3, 81, 60, 82, 98 and 10, respectively, and the number of control samples was 22, 83, 40, 5, 4, 15, 15, 21, 32 and 12, respectively. Of these datasets: (i) six were from studies conducted in adults and four in pediatric subjects; (ii) five were from studies that took place in North America, four in Europe and one in Asia; (iii) eight were performed using microarray and two using RNA-seq. Across the 10 datasets, a significant increase in transcript abundance of ARG1 was observed in the peripheral blood of septic patients compared with the control groups (Fig. 6), regardless of ethnicity, age, or experimental settings. In addition, ROC curves generated from these datasets further supported the role of ARG1 in sepsis (Fig. 6). A good biomarker should exhibit high sensitivity (the fraction of true positives that are correctly identified) and high specificity (the fraction of true negatives that are correctly identified), and both are summarized by the area under the ROC curve (AUC). The AUC values for ARG1 in all plots were equal or close to 1, indicating the diagnostic value of this gene (Fig. 6).
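The relationship between sensitivity, specificity and the AUC can be made concrete with a short sketch. It computes the AUC through its rank interpretation, namely the probability that a randomly chosen septic sample has a higher ARG1 value than a randomly chosen control, counting ties as one half. The expression values and the threshold are hypothetical.

```python
def roc_auc(case_values, control_values):
    """AUC as P(case > control) + 0.5 * P(case == control)."""
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


def sensitivity_specificity(case_values, control_values, threshold):
    """Call a sample positive when its value exceeds `threshold`."""
    true_pos = sum(1 for v in case_values if v > threshold)
    true_neg = sum(1 for v in control_values if v <= threshold)
    return true_pos / len(case_values), true_neg / len(control_values)


# Hypothetical ARG1 expression values (arbitrary units)
sepsis = [8.1, 7.4, 9.0]
controls = [5.2, 6.0, 7.4]
auc = roc_auc(sepsis, controls)
sens, spec = sensitivity_specificity(sepsis, controls, threshold=7.0)
print(auc, sens, spec)
```

An AUC of 1 means that every septic sample has a higher value than every control, so some threshold achieves both 100% sensitivity and 100% specificity.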

Fig. 6

The upregulation of ARG1 in septic individuals compared to controls. The plots represented transcript abundance of ARG1 in peripheral blood, as measured by microarray or RNA-seq. The first six were conducted in adults and the remaining four in pediatric subjects. The ROC curve for each dataset was located below the corresponding dot plot. The area under the curve (AUC) of each ROC curve was used to assess diagnostic value

ARG1 helps to make an accurate diagnosis, discriminate the severity and predict the treatment response of sepsis

Considering the high expression of ARG1 in sepsis, we next investigated whether ARG1 played a role in distinguishing sepsis from diseases with similar symptoms. We found only two datasets (GSE131411 from Spain and GSE131761 from Italy) that contained peripheral blood samples from both septic and non-septic shock cases. The number of septic shock cases in GSE131411 and GSE131761 was 63 and 81, respectively, and the number of non-septic shock cases was 33 and 30, respectively. The expression levels of ARG1 were significantly higher in septic shock than in non-septic shock (Fig. 7A-B). Since septic shock is a severe form of sepsis and shares similar signs and symptoms with non-septic shock, ARG1 is of great value as a potential biomarker to distinguish the two conditions in clinical practice.

Fig. 7

ARG1 could play a role in distinguishing sepsis from other similar diseases, predicting the response to treatment, and reflecting the severity of sepsis. A-B ARG1 was upregulated in peripheral blood of patients from Spain (A) and Italy (B) with septic shock compared with non-septic shock. The plots represented the transcript abundance of ARG1, as measured by microarray or RNA-seq. C Dataset from the USA showed that the expression level of ARG1 in severe sepsis and lethal sepsis was significantly higher than that in uncomplicated sepsis. D Dataset from Germany showed that ARG1 was significantly up-regulated in patients with septic shock compared with general sepsis. E Dataset from Italy showed a significant increase of ARG1 expression in non-responders to the early stage of treatment compared with responders

Furthermore, since the GSE63042 dataset contained 28 lethal sepsis cases, 21 severe sepsis cases, and 24 uncomplicated sepsis cases, we examined the role of ARG1 in discriminating the severity of this disease. In this dataset, the expression level of ARG1 in severe sepsis and lethal sepsis was significantly higher than that in uncomplicated sepsis (Fig. 7C). Moreover, ARG1 expression was also found to be up-regulated in patients with septic shock (20 cases) compared with patients with general sepsis (19 cases) in the dataset from Germany (Fig. 7D). These findings indicated that quantification of the expression level of ARG1 may help to identify those at the greatest risk of progression and mortality.

In addition, further investigation found that ARG1 could also act as an indicator of whether a patient is responsive to early supportive therapy. In the dataset from Italy, patients first received a blood check at Intensive Care Unit (ICU) admission, and their responses to early symptomatic treatment were then recorded over the next few days. No significant difference was found between the 32 responders and 24 non-responders regarding the source of infection, circulating markers of inflammation, or leukocyte and lymphocyte counts [35]. Interestingly, ARG1 was more highly expressed in non-responders than in responders among septic patients (P = 0.0017) (Fig. 7E). This finding indicated that ARG1 may play a role in determining the treatment response and may help to predict whether early treatment for sepsis is effective.

Validation of ARG1 as a key biomarker using quantitative real-time PCR

To verify the high expression of ARG1 in sepsis, cecal ligation and puncture (CLP) was performed on mice to induce experimental sepsis. Quantitative real-time PCR showed that the transcript abundance of ARG1 increased dramatically in the peripheral blood of septic mice (Fig. 8), demonstrating that ARG1 is highly correlated with sepsis and has the potential to act as a key biomarker.

Fig. 8

Validation of ARG1 as a key biomarker of sepsis. Real-time PCR showed that ARG1 was sharply up-regulated in CLP-induced septic mice. N = 7 for each group. *** P < 0.001

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. A copy of this licence can be viewed on the Creative Commons website. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

