Identification of SPX domain proteins from algae to legumes

While in several plant species four families of SPX proteins were characterized, much less is known about these proteins in legumes: in soybean and common bean just 10 and 3 members of class 1 were characterized and no SPX proteins in M. truncatula. Therefore, we intended to characterize this protein family in these legume species and set it into evolutionary context by analysis of SPX proteins from algae and basal plants. Sequences of SPX proteins were obtained by BLASTP searches at EnsemblPlants from the legumes (G. max, P. vulgaris, and M. truncatula), moss (P. patens), liverwort (M. polymorpha), lycophyte (S. moellendorffii), basal angiosperms (P. somniferum, A. trichopoda, and N. colorata) rhodophytes (C. merolae, G. sulphuraria, and C. crispus), chlorophytes (C. reinhardtii and O. lucimarinus), and charophytes (C. braunii) protein databases using full-length amino acid sequences of SPXs from Arabidopsis (20 proteins). After removing sequences lacking the SPX domains and redundant and partial sequences, we compiled all SPX proteins in the latest version of protein database in EnsemblPlants for these 15 species. Some proteins were shorter than 200 aa and were excluded from further analyses, including four short proteins of soybean (GLYMA_12G154800, GLYMA_10G097000, GLYMA_09G098200, GLYMA_20G032200), two partial proteins of common bean (PHAVU_010G0720001g, PHAVU_010G0720000g), one protein of M. truncatula (MTR_8g058603). In addition, we excluded one protein of M. truncatula (MTR_0262S0060), where its corresponding gene is located on a scaffold but not chromosomes, and one protein of common bean (PHAVU_007g1245000g), which had different structure from other SPX genes. Finally, 34 SPX proteins in G. max, 19 in M. truncatula, 17 in P. vulgaris, 22 in P. patens, 10 in M. polymorpha, 2 in C. merolae, 4 in G. sulphuraria, 2 in C. crispus, 5 in C. reinhardtii, 2 in O. lucimarinus, 4 in C. braunii, 42 in P. somniferum, 11 in A. trichopoda, 16 in N. colorata, and 31 in S. moellendorffii were identified (Supplemental Table S1). Furthermore, the proteins were classified into the four subfamilies based on their additional domains. Interestingly, in some algae and basal plants, we found extra domains that have not been previously reported (Fig. 1). Totally, among these species, class EXS was with 88 proteins the largest, followed by SPX class with 48 proteins and MFS and RING classes containing 29 and 26 proteins, respectively. Subsequently, the corresponding SPX genes in soybean, M. truncatula and common bean were named in each subfamily based on their chromosomal positions (Supplemental Table S1).

Fig. 1

Evolution and frequency of genes in different SPX classes from algae to current Angiosperms. The species tree was constructed based on protein sequences of identified SPXs. Types of classes are shown in different colored boxes, the numbers in boxes represent the number of identified genes in each class while the total number of identified SPXs in each species is written in red on the branches

As can be seen in the Fig. 1, all basal and current angiosperms possess only the four main classes of SPX proteins. On the other hand, some additional domains were observed in liverwort, lycophyte, and algae based on Pfam and CDD scanning of sequences; SPX-VTC (vacuolar transporter chaperone), EIN3-SPX (Ethylene intensive 3), SPX-CitMHS (Citrate transporter), SPX-Na_sulph_symp (sodium sulphate symporter), SPX-RING-BET (Bromodomain extra-terminal-transcription regulation), S6PP_C-SPX (Sucrose-6F-phosphate phosphohydrolase C-terminal), EIN3-S6PP_C-SPX, Kelch-SPX (Galactose oxidase), SPX-EXS-rve, and SPX-Sugar_tr (Fig. 1). The exact roles of these additional domains in the basal plants and algae are not completely known. It was previously reported that in some SPX proteins, SPX domain was located at C terminal instead of N terminal [33]. Indeed, we observed this structure in 4 different classes in S. moellendorffii, including EIN3-S6PP_C-SPX, Kelch-SPX, EIN3-SPX, and S6PP_C-SPX.

Predicted physiochemical and biochemical parameters of these SPX proteins in legume crops are listed in Supplemental Table S1. Indeed, members of the same subfamily have similar properties. The most variation in physiochemical parameters was observed in EXS class, while MFS class was the most similar. For example, lengths of all SPX-MFS proteins in the three species ranged from 691 to 700 aa, but the corresponding SPX-EXS proteins ranged from 475 to 1570 aa with the MtEXSs having the largest proteins in comparison with soybean and common bean. SPX-EXS and SPX-RING classes have the highest isoelectric point (pI), above 9 and 8, respectively. The calculated values for aliphatic index of SPX proteins show that the SPX-MFS subfamily have most thermostability, with a range of 105 to 111. GRAVY value (grand average of hydropathicity) is the sum of the hydropathy values of all amino acids divided by the protein length. Except for the proteins in the SPX-MFS subfamily, nearly all of the GmSPXs are hydrophilic, with a GRAVY value less than 0. Subcellular localization prediction performed with Wolf PSORT revealed that most of the GmSPX proteins are located in the plasma membrane or endomembrane system, followed by nucleus and chloroplast. In PSORT results, all members of SPX-EXS and SPX-MFS subfamilies were located in the plasma membrane, and all members of SPX-RING were located in nucleus, corresponding to the known functions of representatives of these subfamilies in Arabidopsis.

Phylogenetic tree

Multiple alignment of the SPX protein sequences from soybean, M. truncatula, common bean, Arabidopsis, rice, wheat, rapeseed, A. trichopoda, C. braunii, C. reinhardtii, C. crispus, C. merolae, G. sulphuraria, M. polymorpha, N. colorata, O. lucimarinus, P. somniferum, P. patens, and S. moellendorffii, as well as proteins from mouse, human, and Caenorhabditis elegans as an out-group, followed by phylogenetic analysis revealed four distinct clades of SPX proteins, SPX, EXS, MFS, and RING (Fig. 2). This topology and distinct separation of four classes are consistent with previous studies on SPX gene family [3, 12, 13, 27, 34]. SPX and EXS sequences formed two distinct clades, while MFS and RING along with box. C (OSTLU26654.EXS, CHC T00007225001.SPX.CitMHS, CHLRE 09g251650V5.SPX.Na_Sulph_symp, C5167 020395.NLA.BET, Gsu16460.SPX.NLA, and CMP022C.SPX) and box. D (Gsu35240.SPX.Na_sulph_symp) have diverged from a common ancestor and form the third major clade. SPX clade was divided into three sub-clades; SPX-I, SPX-II, and SPX-III. SPX-II and SPX-III are specific to the basal and current angiosperms and the proteins in these two sub-clades are homologs of AtSPX3 and ATSPX1/2, respectively. On the other hand, SPX-I is comprised from homologs of the basal plants (lycophytes, liverwort, moss) and algae and few proteins from the basal and current angiosperms, all being homologs of AtSPX4. Proteins in box A and in box B could be ancient homologs for SPX-I and SPX-II/III, respectively. Likewise, EXS clade was divided into three sub-clades; EXS-I is specific to lower plants (S. moellendorffii, M. polymorpha, and P. patens), EXS-II is a mixed group from monocots, eudicots, and basal angiosperms, all homologs of AtPHO1 and AtPHO1;H1, and EXS-III contain eudicots and the basal angiosperms without any genes of monocots. The outgroup genes used in this study were grouped in box E clustered with EXS clade. Overall, topology of EXS class is consistent with He et al., (2013), in that basal plants (lycophytes and moss) EXS homologs were grouped separately from the angiosperms, and also with the previous reports on EXS genes that monocots only possess homologs for AtPHO1 and AtPHO1;H1 [6, 24, 35].

Fig. 2

Phylogenetic analysis of 218 SPX containing proteins from 19 plant species. The phylogenetic tree was constructed using the Maximum Likelihood method. The SPX genes of Arabidopsis, rice, wheat, rapeseed, M. truncatula, soybean, and common bean are represented with At, Os, Ta, Bna, Mt, Gm, and Pv abbreviations, respectively. Other species are named based on their Gene IDs and their domains. Four different clades are marked in colors: SPX (green), RING (brown), MFS (pink), and EXS (blue). Sub-clades of each clade are shown with light and dark shades of the respective colors. Five boxes show paraphyletic branches; box E comprises the outgroup species

Box C with ancient genes for both MFS and RING and box D as sister for MFS class together with MFS and RING clades seem to have evolved from a common ancestor. MFS homologs in monocots specifically grouped in MFS-I, while MFS-II contained all MFS orthologs from the other species. This could suggest that differentiation among MFS proteins has occurred after the divergence of monocot and dicots from a common ancestor. Similarly, RING clade was divided into two sub-clades, but both contained RING orthologs from all of species; RING-I was grouped with the ancestor from P. patens, while RING-II included S. moellendorffii orthologs as its sister. The overall tree topology is very similar to results of Wang et all (2021), who investigated SPX gene family in chlorophytes and streptophytes, with focus on algae.

Protein motifs gain and loss in SPX family throughout evolution

Conserved protein motifs were predicted using MEME program for each SPX protein class and all species (Additional file 1: Figs. S1 to S5). This analysis may explain when different classes of SPX proteins have appeared and how motifs were gained or lost in each class during the evolution. The ancestral motifs in SPX domains such as motifs 3, 4, 2, and 1 seem to originate from red algae (Additional file 1: Fig. S1). There is a high fluctuation of motif composition during the evolution. Some motifs are species specific like motifs 13, 14, and 19 that are present only in legumes, probably arising after legume whole-genome duplication event. The most variability in the motif composition was observed in S. moellendorffii with some specific motifs like 8, 15, and 18. The lengths of proteins in angiosperms were very similar but shorter than in the basal plants. The EXS domain was detected only in O. lucimarinus with 9 motifs – 9, 6, 5, 2, 3, 11, 4, 10, and 1 (Additional file 1: Fig. S2). Almost all these motifs have been retained during the evolution as ancestral motifs. In addition, some other motifs appeared in C. braunii such as 15, 7, 16, 20, 12, and 8, suggesting they were present in the common ancestor of Chlorophyta and Streptophyta. Although Wang et al. [24] reported one SPX-MFS in M. polymorpha genome, we could not find an intact SPX-MFS domain, but SPX-Sugar_tr domain with a highly similar motif composition with other MFSs was identified (Additional file 1: Fig. S3). As it has previously been reported, PHT5 genes in B. napus have SPX domain connected to overlapping MFS and Sugar_tr domains [36], however, we only found SPX and Sugar-tr domains in M. polymorpha genome. The first SPX-MFS protein was observed in C. braunii with 18 common motifs with other species. Two newly observed motifs in P. patens, motifs 16 and 13, probably have evolved by dispersed duplication in P. patens and have been retained in all basal and current angiosperms. Interestingly, other five MFSs in P. patens, without the motifs 16 and 13, have been no longer found in angiosperms.

The evolutionary oldest NLA has been detected in G. sulphuraria and it was retained during the course of evolution of current angiosperms, but was not found in other Rhodophytes or Chlorophytes. In fact, the only NLA identified in G. sulphuraria just showed two motifs in common with other species, motifs 2 and 3 (Additional file 1: Fig. S4). Therefore, these motifs could be considered as ancestral motifs of NLA class which then further evolved by dispersed duplication in M. polymorpha, adding motifs 8, 7, 1, and 6 into the ancestral domains. One NLA in P. somniferum underwent dispersed duplication and gained motif 10 that has only been retained in the core eudicots, while two NLAs in S. moellendorffii segmentally duplicated and gained two specific motifs 13 and 19. Motif 16 was just observed in legume genomes that might evolved after legume whole-genome duplication (WGD) event. The most variability in motif composition of NLA class was observed in P. somniferum. Motif compositions in the new identified classes showed a high variation and it was impossible to find their ancestral motif (Additional file 1: Fig. S5). However, it could be concluded that SPX-Na_Sulph_sym and SPX-CitMHS with high similarity in the motif composition, probably have similar origin and function. In summary, during the evolution different duplication events added new motifs to the ancestral motifs and other motifs specifically appeared in individual species to acquire new functions.

Consensus sequences of SPX domains from algae to eudicots

We then predicted conserved motifs among all identified SPXs (Additional file 1: Fig. S6). There are four conserved motifs in SPX class members, among them two motifs, 2 and 4, are common in the almost whole span of SPXs. Therefore, we can hypothesize that these two motifs have an important role for all SPXs. Afterwards, consensus sequences of these two motifs were constructed across all phyla (algae, charophytes, liverwort, bryophytes, lycophytes, basal angiosperms, and current angiosperms) and also across each class (SPX, EXS, MFS, RING, and new identified classes) (Additional file 1: Figs. S7-S10). Motif 4 is 29 aa in length and was present in all SPX proteins except the following ten: C5167_005902.EXS, C5167_032842.EXS, C5167_043562.EXS, C5167_043565.EXS, C5167_003186.NLA, C5167_046257.NLA, SELMODRAFT_419593.SPX, SELMODRAFT_419593.SPX, OsSPX4 and PvPHO1. Five amino acid residues, number 5, 9, 15, 19, and 24, were almost 100% conserved, except the fifth residue in C. braunii (Additional file 1: Fig. S7). Regarding conservation in different classes (Additional file 1: Fig. S8), the leucine (residue 9) was completely conserved in the EXS, MFS, RING, and new identified classes, then the phenylalanine (residue 19) was completely conserved in EXS and MFS classes, but SPX class had some members with different residues in these five positions with a very high overall conservation in this class. In addition, each class had other conserved residues, suggesting special functions.

Motif 2 is 21 aa long and was absent in CHLRE_02g111650v5.SPX, AMTR_s00106p00066860.SPX, NC1G0101580.SPX, C5167_011965.SPX, Gasu_57230.SPX, C5167_043539.EXS, SELMODRAFT_450458.EXS, SELMODRAFT_431864.SPX, SELMODRAFT_419593.SPX, and only one protein from the current angiosperm, PvPHO1;5, which is a partial protein. This motif exhibited more conserved residues at positions 1, 7, 8, 14, 15, 16, 17, 18, 20, and 21. Residue 17 was completely conserved in the all proteins containing motif 2 and residues 14, 18, and 21 were conserved in the all proteins except a few in S. moellendorffii and P. sumniferum showing different residues instead of lysine (Additional file 1: Fig. S9). The lysine residues 14, 17, and 21 form a Lysine Surface Cluster (LSC), and were found to interact with sulfate in the crystal structure of human phosphate transporter XPR1, and to be a part of a larger binding site for PP-InsP [9]. Consequently, in the different classes of the SPX proteins (Additional file 1: Fig. S10), some of the 10 conserved positions were completely conserved such as K1, N8, KILKK (14 to 18) in RING and MFS, K18 in SPX, K21 in RING, and N8, I15, K18, as well as K21 were completely conserved across the new identified classes. Overall, these two motifs were conserved in all but a few proteins from S. moellendorffii and P. sumniferum, PvPHO1;5, and OsSPX4, implying that they might possibly interact with InsP/PP-InsP in a different manner, as previously reported for OsSPX4 [9]. In addition, different conserved residues in different classes could suggest that they may have different phosphate-containing ligand or different levels of Pi in cells.

Expansion pattern of SPX genes and collinearity analysis

To pinpoint the expansion modes in the land plants, we investigated duplication types in basal and current angiosperms, liverwort, hornwort, and S. moellendorffii (Fig. 3 and Supplemental Table S2). Taken together, WGD, segmental, and dispersed duplications contributed most to the SPX gene family expansion. The expansion patterns in soybean, P. somniferum, N. colorota, and S. moellendorffii mostly arose from WGD/segmental duplication type. However, S. moellendorffii did not have any WGD events, therefore, its expansion and unique SPX classes must have arisen through local or segmental gene duplication [37]. WGD/segmental duplication type did not participate in the SPX expansion in A. trichopoda and M. polymorpha genomes and it only resulted in one duplicated block in P. patens genome. In these three species, SPX expansion were affected mostly by dispersed duplication type. The high number of WGD/segmental types of duplication in S. moellendorffii, soybean, and P. somniferum can shed light on the reason of high variation of gene family sizes in the closely related plants.

Fig. 3

SPX gene family expansion from algae to the current Angiosperms. Duplication event types were predicted in the P. patens, S. moellendorffii, M. polymorpha, A. trichopoda, P. somniferum, N. colorota, O. sativa, A. thaliana, P. vulgaris, M. truncatula, G. max

To get more information about evolutionary process of genes, collinearity analysis can provide information about conserved genomic regions of genes in different species [38]. Synteny relationship among two or a set of genes from two species means that they located in the same chromosome [39], but collinearity is a specific form of synteny with conserved gene order [40]. Collinearity analysis was conducted in three steps; 1. across P. somniferum, N. colorota, rice, Arabidopsis, and three legumes 2. Among P. somniferum and N. colorota, P. patens, and S. moellendorffii and 3. Among legumes.

Collinearity analysis among legume crops, Arabidopsis, rice, and two basal angiosperms; P. somniferum and N. colorata discovered 121 collinear blocks (Fig. 4, Supplemental Table S3); 30 blocks in Gm/Pv, 23 blocks in Gm/Mt, 15 blocks in Gm/Gm, 14 blocks in Ps/Ps, 10 blocks in Gm/At, 6 blocks in Nc/Nc and Pv/Mt, 3 blocks in Ps/Nc, Mt/At, Pv/At, and Os/Os, 2 blocks in Ps/Gm, and 1 block in At/At, Pv/Pv, Ps/Mt, and Mt/Mt. Rice as the only monocot in this analysis did not show any collinearity relationship for SPX gene family with other species.

Fig. 4

Circular collinearity plot of SPX gene family members among G. max (blue), M. truncatula (pink), P. vulgaris (green), A. thaliana (grey), O. sativa (orange), P. somniferum (yellow), and N. colorota (red). Collinear genes are linked by lines and boxes are representing chromosomes

Collinear SPX genes among P. somniferum, S. moellendorffii, N. colorata, and A. trichopoda were predicted (Supplemental Table S3). S. moellendorffii did not show any collinearity relationship with other species, while N. colorata and P. somniferum had the most inter species collinear relationships [14]. The most intra-genome collinear relationships were found in P. somniferum [14] and S. moellendorffii [10]. The collinear analysis was performed also for the three legume crops (Fig. 5, Supplemental Table S3). Of the 34, 19, and 17 SPX genes in soybean, M. truncatula, and common bean 32, 14, and 15 genes participated in collinear blocks. In total, 78 collinearity blocks between these plant species were discovered. A high level of collinearity relationships was found at 27/30 SPX genes in soybean/common bean and 19/23 SPX genes in soybean/M. truncatula, while the corresponding figure for M. truncatula/common bean was 6/7. However, just 15, 7, and 2 collinearity blocks were found in soybean/soybean, M. truncatula /M. truncatula, and common bean/common bean groups. All in all, after these three collinearity analyses, we concluded that inter-species collinearity patterns among basal angiosperms and among current angiosperms have changed. Across basal angiosperms, SPX class had the least inter-species collinearity, while among Arabidopsis and legumes, SPX showed the most inter-collinearity relationships. It can be concluded that except in SPX class, collinearity in the other classes has been lost.

Fig. 5

Circular collinearity plot of SPX gene family members among G. max, M. truncatula, P. vulgaris. Chromosomes of G. max, M. truncatula and P. vulgaris are respectively in green, red and blue. Links between G. max and M. truncatula are colored red, G. max and P. vulgaris in blue, M. truncatula and P. vulgaris in yellow as well as links within G. max, M. truncatula and P. vulgaris are colored in green, black and pink

Evolution of Cis-acting elements from algae to eudicots

Transcription factors bind to the cis-acting elements (CREs) in the promoter and regulate the transcription of corresponding genes [41]. Therefore, genes with similar expression patterns may contain the same regulatory elements in their promoters [27]. To explore whether transcription factor biding sites have evolved together with the coding regions of SPX genes, 1.5 kb upstream of the transcriptional start sites of all identified SPXs were downloaded and analyzed using PlantCARE database. In total, 124 CREs were detected (Supplemental Table S4) that can be classified in three major groups: responsive to abiotic stresses (drought, low temperature, hypoxia, wounding, defense, and stress), hormones (gibberellin, abscisic acid (ABA), salicylic acid (SA), ethylene, methyl jasmonate (MeJA), and auxin), and development-related elements (endosperm, meristem, MYB, and zein metabolism regulation). After the essential elements in promoter like TATA-box and CAAT-box, the most highly represented cis-acting elements were those involved in response to MeJA (CGTCA-motif and TGAG-motif) and ABA (ABRE and ARE). Since SPX genes play role in P homeostasis we were interested in a presence of P1BS motif (PHR1-binding site important in P starvation response) in the respective promoters. However, the P1BS motif is not included in the PlantCare database. Therefore, additionally, we performed promoter analysis using PlantPan3 for C. reinhardtii and G. max to figure out if this motif existed in both ancient and current species or it just appeared in the new angiosperms. The P1BS motif was found in all five SPX genes in C. reinhardtii and in almost all GmSPXs except GmSPX.PHO1;3 and GmSPX.PHO1;13 (Supplemental Table S4).

Looking for evolutionary pattern in these cis-acting elements, we performed hierarchical clustering on principal components (HCPC) using FactMineR-package. The HCPC grouped the genes into three clusters (Additional file 1: Fig. S11).

Almost all SPXs from the current angiosperms fell into cluster 1 along with 2 SPXs of G. sulphuraria and few SPXs from basal angiosperms (Table 1, Additional file 1: Fig. S11). Cluster 2 comprised mostly genes from basal angiosperms and few members of the current angiosperms, as well as all SPXs of C. reinhardtii and two SPXs from G. sulphuraria. Cluster 3, the smallest cluster, had 12 genes mostly from P. patens and just one SPX of the current angiosperms, AtPHO1;H5. Trying to find an evolutionary pattern across these clusters, we found out that they showed different frequencies of two MeJA responsive elements, TGACG and CGTGA motifs, that in cluster 3 all genes, in cluster 2 around 93%, and in cluster 1 only around 62% of genes possessed these two elements (Table 1). Besides, we extracted the most enriched CREs in each cluster to visualize frequencies of these elements across clusters. As can be seen in the Additional file 1: Fig. S12, CREs involved in the developmental processes (CCGTCC motif, CCGTCC box, A-box) and stress response (DRE core, MYB recognition site, CCAT box) were significantly higher in cluster 2 than in the other clusters. Cluster 1 had higher frequency of two hormone responsive elements, TCA (salicylic acid responsive elements) and ERE (Ethylene-responsive elements) in comparison to the other clusters. Overall, it seems that during the evolution of angiosperms, SPX promoters were enriched by stress responsive elements and hormonal responsive elements, especially ERE and TCA.

Table 1 Number of genes having MeJA and ABA responsiveness elements in their promoter sequence

Finding cis-elements responsive to stresses and hormones triggered two major questions, how SPX genes are regulated by phytohormones and whether SPX genes and hormone and stress responsive genes show overlaps in response to phosphate deficiency. To address these questions we reanalyzed three datasets from publicly available databases related to these conditions. The first data was for Arabidopsis response to ABA, salicylic acid (SA), jasmonic acid (JA) and their combination at 3 and 24 h after treatment (GSE28600). Among these three phytohormones, ABA caused changes in of almost all SPX expressions, especially for members of SPX class (Fig. 6A). However, ABA in combination with (SA) and (JA) showed less induction. The second dataset analyzed how GmSPX genes are correlated with hormonal and stress responsive genes in response to phosphate deficiency. For this purpose, we performed correlation network analysis using TPMs calculated from 40 samples (root and leaf) of a RNA-seq dataset (PRJNA544698). Correlation network was separately constructed for root and leaf using stress and hormonal responsive genes (Fig. 6B and C). This analysis showed that in root and leaf, different SPX genes are correlated with different stress and hormone responsive genes. The third dataset was aimed to determine association of SPX genes with phytohormones in response to dehydration in different soybean organs (leaves, stems, and root). We conducted weighted co-expression network analysis (WGCNA) using microarray and phytohormone data published by Maruyama et al. (2020). WGCNA clustered 12,106 genes into 16 modules and SPXs were distributed in 8 modules (Fig. 6D). Afterwards, the module eigengenes were calculated and then used to create module-trait relationship (Fig. 6E). Among 8 modules that contained SPXs, dark grey, brown, and darkgreen modules showed significant correlation with four phytohormones; trans-zeatin (tZ), cis-zeatin (cZ), cis-zeatin riboside (cZR), cis-zeatin riboside phosphate (cZRPs), and isopenthyl adenine (iP). It has been shown that cytokinin can repress PSR genes by modulating nitrate-triggered phosphate signaling [11, 42]. Module dark magenta contained the main players of phosphate starvation responses, including GmSPX1/3/6/7, GmMFS1/4, and GmNLA3 that consistently showed negative correlations with trans-zeatin riboside (tZR), trans-zeatin riboside phosphatase (tZRPs), and iP riboside phosphate (iPRPs) forms of cytokinin. Moreover, orange, darkslateblue, and darkturquiose modules, contain GmSPX4, GmPHO1;6, GmSPX9, GmSPX10 that showed a significant correlation with ABA. Jasmonic acid (JA) showed positive correlations with modules orange, palevioletred, and brown, but salicylic acid (SA) accumulation did not show any correlation with any 16 obtained modules. All in all, results from these analyses showed that while many SPX genes are indeed regulated by phytohormones or part of the same transcriptional networks like hormone regulated genes, as expected from the presence of corresponding cis elements, there is not a clear one-to-one relationship between cis-elements in promoters and the expression of corresponding genes. Previous studies also reported only weak associations between the presence of specific cis-elements and the observed expression profile [43]. Thus, other environmental and developmental factors might modulate responses of SPX genes to phytohormones and stress [44].

Fig. 6

A Expression profile of AtSPXs in response to different phytohormones. B and C GmSPX orrelation network with hormone and stress responsive genes in response to phosphate deficiency in root and leaf, respectively. GmSPX genes are shown in pink color. D Module-trait correlation and corresponding P-values. Modules are shown in the left panel and color scale for module-trait correlation from 1 to − 1 is shown in the right panel. tZ = trans-zeatin, tZR = trans-zeatin riboside, tZRPs = trans-zeatin riboside phosphatase, cZ = cis-zeatin, cZR = cis-zeatin riboside, cZRPs = cis-zeatin riboside phosphatase, iP = isopenthyl adenine, iPR = iP riboside, iPRPs = ip riboside phosphatase, SA = salicylic acid, JA = jasmonic acid, IAAsp = indole-3-acetylaspartic acid, ABA = abscisic acid. E The distribution of SPX genes in 8 modules

Selective pressure and SPX history model in legumes

The Ks (number of synonymous substitutions per synonymous site) and Ka (number of nonsynonymous substitutions per nonsynonymous site) values of pairs of segmental duplicated SPX genes in soybean, M. truncatula and common bean were retrieved from Plant Genome Duplication Database (PGDD) (Supplemental Table S5). The Ka/Ks ratios < 1 indicate purifying selection and Ka/Ks values > 1 indicate positive selection [45, 46]. The Ka/Ks values for all pairs of segmental duplicated genes were < 0.3 implying an intense purifying selection on these gene pairs (Supplemental Table S5). In addition, the Ka/Ks ratio of duplicated gene pairs between soybean and M. truncatula, soybean and common bean, and M. truncatula and common bean were retrieved (Supplemental Table S5). The mean Ka/Ks values of 0.18, 0.16, and 0.14, respectively, suggest that the genetic pairs between species were subjected to purifying selection.

Based on the Ks values of duplication blocks retrieved from PGDD, the divergence times were estimated. In total, 36, 7, and 3 duplication blocks were retrieved for soybean, M. truncatula, and common bean, respectively (Supplemental Table S5). All duplication blocks related to MFS and RING class have Ks < 1.5, and the most recent duplication events belonged to MFS members in soybean. Evolutionary process of GmSPX genes was modeled based on Ks of duplication blocks (Fig. 7). The duplicated SPX genes in SPX, EXS, MFS, and RING were classified into 3, 2, 2, and 1 groups, respectively. GmSPX-A firstly generated three copies after the Gamma WGT event, followed by loss of one copy. The two retained copies were further doubled after Legume WGD event, and after losing one copy, the rest three copies duplicated after Glycine WGD event, resulting in genes, GmSPX8, GmSPX7, GmSPX3, GmSPX4, and GmSPX2. GmSPX3 lost its linked duplicated gene (Fig. 7). Unexpectedly, all three generated copies of GmSPX-EXS-A in Gamma WGT event were retained but their duplicated genes after Legume WGD were lost. Therefore, Glycine WGD resulted in generation of five genes (GmSPX-PHO1;10, GmSPX-PHO1;5, GmSPX-PHO1;3, GmSPX-PHO1;9, and GmSPX-PHO1;6) after a loss of one of the linked genes. However, GmSPX-EXS-B lost one copy in the first and second round of duplication events and lastly generated six genes (GmSPX-PHO1;1, GmSPX-PHO1;4, GmSPX-PHO1;14, GmSPX-PHO1;8, GmSPX-PHO1;7, GmSPX-PHO1;2). GmSPX-B and –C as well as GmSPX-MFS-B shared the same evolutionary trajectory and generated two duplicated genes in the same way after three rounds of the evolution processes. In addition, GmSPX-MFS-A and GmSPX-RING were somewhat similar as both produced two duplicated blocks, although one copy was lost in GmSPX-RING, resulting finally in three and four genes, respectively.

Fig. 7

The evolutionary history of GmSPX genes. The reserved and lost blocks in the corresponding evolution are displayed by solid and empty blocks, respectively

Functional characterization of orthologous genes in legumes

Orthologs and orthogroups among seven current angiosperms were determined with OrthoFinder. Altogether, from 218 genes, 216 genes could be classified in seven orthogroups and just two genes of rapeseed (BnaA6.PHO1;H3c and BnaA9.PHO1;H3b) were not grouped, maybe suggesting a brassica-specific function for these proteins. All members of SPX, SPX-MFS, and SPX-RING were assigned into one group; 1, 3, and 4, respectively. On the other hand, members of EXS family were divided into four distinct groups: group 2 that was dicot-specific; group 7, brassicaceae-specific; as well as groups 5 and 6 that contained genes from all species (Table 2). All genes in an orthogroup are descended from a single ancestral gene.

Table 2 Ortholog groups among soybean, common bean, Medicago, Arabidopsis, rice, wheat, and brassica

Orthologous genes across Arabidopsis and the three legume crops are presented in Table 3. Some genes showed a simple one-to-one orthology relationship, such as GmSPX6, PvSPX2, and MtSPX5 with AtSPX4; GmPHO1;3, PvPHO1;1, and MtPHO1;4 with AtPHO1;H10; and GmNLA3, PvNLA1, and MtNLA2 with AtNLA2. Others showed one-to-many and many-to-many orthology relationships. Interestingly, the pattern of AtSPXs orthology relationships were the same among three legumes, and each SPX gene has the same evolutionary trajectories. To overcome the difficulty of one-to-many and many-to-many orthology inference, expressologs of AtSPXs with soybean and Medicago were retrieved from the Expression Tree Viewer [32]. Expression Tree Viewer allows to visualize expressologs depending on both sequence similarity and expression pattern similarity. Implementing this web tool resulted in postulating expressologs between Arabidopsis and soybean and Medicago (Supplemental Table S6). Generally, the results were in very good agreement with previous results from phylogenetic tree and OrthoFinder. Based on the Expression Tree Viewer results, we could designate GmPHO1;2/7 and MtPHO1;1/2 as the functional orthologs of AtPHO1 and AtPHO1;H1 with the function of long-distance Pi transport. However, it was difficult to find expressologs for other SPXs. Consistently, the function of GmSPX1 [31] and GmSPX3 [29] were characterized with negative and positive regulatory roles in phosphate deficiency that are the same for AtSPX1/2 and AtSPX3 [6].

Table 3 Ortholog genes between legumes and Arabidopsis

Expression analysis of SPXs in Arabidopsis and soybean

SPX genes are involved in various physiological process but they are specifically known for their role in phosphate signaling and phosphate homeostasis. To get insight into the potential developmental roles and preferential tissue expression, we analyzed a raw RNA-seq dataset from different developmental stages of different soybean tissues (PRJNA238493). We profiled the GmSPXs expression across 17 different samples (Additional file 1: Fig. S13). Overall, we observed different expression patterns of GmSPXs in various developmental stages of different tissues, indicating a functional divergence in each class of GmSPXs [27, 47]. For example, GmMFS2/5 and GmPHO1;2/7 showed the same expression in almost all samples but were preferentially expressed in leaf and root, respectively. It can be concluded that they are not involved in the developmental processes. On the other hand, duplicated gene pairs arising from Glycine-specific WGD showed very similar expression patterns across all the samples, especially the GmMFS2/5 gene pair, but except GmSPX5/9 and GmPHO1;5/10 pairs. Taking together, both groups of duplicated genes with the same or different expression pattern showed the evidence of sub-functionalization during the soybean evolution [47].

In order to gain insight how individual SPX genes are regulated by Pi deficiency, we analysed publicly available RNAseq dataset (PRJNA544698) [48] and used DPGP software to cluster genes with similar response patterns. DPGP clustering revealed 6 and 4 clusters for root (Additional file 1: Fig. S14) and leaf (Additional file 1: Fig. S15), respectively. We designated names for each cluster based on their patterns; up-reg-fast (cluster 3 in root and cluster 1 in leaf), down-reg-fast (cluster 2 in root and cluster 3 in leaf), the lowest-peak-T1 (cluster 6 in root), the lowest-peak-T2 (cluster 5 in root), the highest-peak-T1 (cluster 1 in root), up-reg-slow (cluster 4 in leaf), and the highest-peak-T2 (cluster 4 in root and cluster 2 in leaf). As can be seen in the Table 4, some genes have opposite pattern of regulation in different tissues. To exemplify, GmSPX1 was placed in down-reg-fast in root and up-reg-fast in leaf, GmSPX-PHO1;10 is found in the highest-peak-T1 in root and the highest-peak-T2 in the leaf, while GmSPX6, GmSPX-NLA1, and GmSPX-NLA3 were in the lowest-peak-T2 cluster in root and the highest-peak-T2 in leaf. The homologs of AtPHO1 and AtPHO1;H1(PHO1;2/7/14) showed an up-reg-fast pattern of cluster 4 in root and the highest-pick-T2 in clusters 2 leaf. Supporting these patterns, He et al. (2013) reported similar expression pattern for these genes, however, there is no clear association between increasing mRNA level of these genes in leaves during phosphate deficiency and growth or shoot Pi content [15]. Overall, for the genes which show tissue-specific expression, we observed different patterns in root and shoot in response to phosphate deficiency.

Table 4 Different patterns of clusters in root and leaf in the time series dataset of soybean

Finally, after investigating developmental and dynamical expression patterns of GmSPX, we used another RNA-seq dataset from Arabidopsis and soybean to examine the expression of SPXs in three different zones of root [49]. The original data were generated in multiple species, however, we only used RPKM values from Arabidopsis and soybean. A general comparison showed that almost all SPX tended to group species-based rather than orthology-based, except AtPHO1 which clustered with their orthologs, GmPHO1;2 and GmPHO1;7 (Additional file 1: Fig. S16). Thus, we can conclude that the tissue-specific genes pose difficulty to identify functional orthologs because of probable tissue inequivalences among species.

GmSPX gene expression in response to nitrogen, phosphorus, and combined deficiency

Cross-talk between P and N signaling has been reported in previous studies [11, 50,51,52,53], and some molecular mechanisms involved in N and P cross-talk have been identified, such as the nitrate-NRT1.1B-SPX4 cascade in rice [11] and NIGT1-SPX-PHR cascade in Arabidopsis [53]. Clearly, SPX gene family seems to have a vital role in this cross-talk. To assess whether GmSPX genes are regulated during N and P deficiency and test the validity of our in silico analyses to identify functional orthologs in soybean, qRT-PCR was performed for several GmSPX genes. GmSPXs showed different regulation in leaf and root in response to N, P, and NP deficiencies (Fig. 8). Except GmSPX6, transcript levels of other members of SPX class significantly increased in response to P and, vice versa, decreased in N deficiency. This antagonistic behavior was reported previously in maize [50, 54]. The transcript levels of GmSPX6 were not much affected by the treatment which is consistent with post-translational regulation of it rice orthologue OsSPX4 [11]. GmPHO1;2 and 1;7 are involved in long-distance P transportation, however, whereas GmPHO1;2 was upregulated in both N and P deficiency in root but not in leaf, the GmPHO1;7 was not significantly affected. GmNLA3 was only induced by the combined deficiency (NP) and GmMFS1 did not show significantly different expression in root but was induced in N and NP deficiency in leaf. Thus, the expression pattern of many of the tested GmSPX genes is consistent with their involvement in P and/or N homeostasis.

Fig. 8

Relative expression of GmSPX genes in response to nitrogen (N), phosphorus (P), and the combined deficiency (NP). Expression levels were determined by qRT-PCR using 3 biological and 2 technical replicates. The data are shown as means ± SD, the values in controls were set to 1. Yellow and blue bars are representing of root and leaf, respectively. Asterisks mark values different from controls (P < 0.05; T-test)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.


This article is autogenerated using RSS feeds and has not been created or edited by OA JF.

Click here for Source link (