Leaf anthocyanin levels
Leaf anthocyanin content was 0.52 mg/g in RedR3 and 0.68 mg/g in PurpleR2, both higher than in the control variety, Shepody (Fig. 1A, B).
We detected 758 metabolites (Table S1), normalized their levels, and generated a heatmap (Fig. 2A). Clustering in the heatmap revealed significant differences in flavonoids among the varieties, with four main clusters. The metabolites in clusters 1 and 4 were most abundant in RedR3, those in cluster 2 were most abundant in PurpleR2, and those in cluster 3 were most abundant in Shepody and relatively scarce in the colored varieties. For each sample, the three biological replicates clustered together, indicating that the replicates were homogeneous and the data reliable. Differences in flavonoid metabolite content were closely related to leaf color.
Relative to those detected in Shepody, 346 and 362 metabolites were detected in RedR3 and PurpleR2, respectively. More than 130 flavonoid metabolites, including apigenin, chrysin, hesperetin, naringenin, luteolin, and their glycosides, were detected (Fig. 2B). Of the anthocyanins, 13 were detected in RedR3, with the contents of cyanidin, delphinidin, pelargonin, and their corresponding glycosides being significantly increased; 17 were detected in PurpleR2, with the contents of cyanidin, malvidin, peonidin, petunidin, and their corresponding glycosides being significantly increased.
The top 20 most significantly differential metabolites (based on |Log2 FC| ≥ 1 and variable importance in projection [VIP] > 1) are shown in Fig. 2. Selgin 5-O-hexoside content was significantly increased in the colored varieties. Among the anthocyanin metabolites, the contents of malvidin 3-O-galactoside, petunidin 3-O-glucoside, and malvidin 3-O-glucoside (oenin) were significantly decreased in RedR3 (Fig. 2C); in PurpleR2, the contents of pelargonidin 3-O-beta-D-glucoside (callistephin chloride) and cyanidin 3-O-galactoside were significantly decreased, whereas those of peonidin 3-sophoroside-5-glucoside, cyanidin 3-O-glucoside (kuromanin), and petunidin 3,5-diglucoside were significantly increased (Fig. 2D).
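The screening thresholds quoted above (|Log2 FC| ≥ 1 and VIP > 1) can be illustrated with a minimal Python sketch. The `Metabolite` record, the `select_differential` helper, the ranking by |Log2 FC|, and all demo values are hypothetical and chosen only for illustration; the text does not state how the top 20 were ranked or which software was used.

```python
from dataclasses import dataclass


@dataclass
class Metabolite:
    name: str
    log2_fc: float  # log2 fold change versus the control variety
    vip: float      # variable importance in projection


def select_differential(metabolites, fc_cutoff=1.0, vip_cutoff=1.0, top_n=20):
    """Keep metabolites with |log2 FC| >= fc_cutoff and VIP > vip_cutoff.

    Survivors are ranked by |log2 FC| (an assumption made for this sketch)
    and truncated to the top_n strongest changes.
    """
    passed = [
        m for m in metabolites
        if abs(m.log2_fc) >= fc_cutoff and m.vip > vip_cutoff
    ]
    passed.sort(key=lambda m: abs(m.log2_fc), reverse=True)
    return passed[:top_n]


# Hypothetical values for illustration only; not data from this study.
demo = [
    Metabolite("metabolite_A", 3.2, 1.4),
    Metabolite("metabolite_B", -2.1, 1.1),
    Metabolite("metabolite_C", 0.6, 1.8),  # fails the fold-change cutoff
    Metabolite("metabolite_D", 1.5, 0.9),  # fails the VIP cutoff
]
selected = select_differential(demo)
print([m.name for m in selected])
```

Both criteria must hold at once, so a metabolite with a large fold change but a low VIP score (or the reverse) is excluded.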
Full-length transcriptome sequencing
To explore the molecular basis of flavonoid synthesis in the colored variety leaves, we analyzed the leaf transcriptome via RNA-seq to identify differentially expressed genes (DEGs), and conducted nanopore transcriptome sequencing (RNA sequence integrity results shown in Fig. 3A). Leaves from the three varieties were subjected to full-length transcriptome sequencing, each generating 7.94 Gb of clean data. We combined the full-length transcriptome sequencing data for the samples and removed redundancy after comparison with the reference genome, obtaining 43,575 full-length potato transcript sequences. Shepody and RedR3 had similar gene expression patterns (Fig. 3B).
Pairwise comparison of samples (Fig. 3C–E, Table S2) revealed that the DEGs were distributed on all chromosomes, with many occurring on chromosome 1. Relative to Shepody (the control), PurpleR2 had 6145 significantly differentially expressed transcripts (2949 upregulated and 3196 downregulated), and RedR3 had 5789 (2819 upregulated and 2970 downregulated). Relative to RedR3, PurpleR2 had 4947 significantly differentially expressed transcripts (2694 upregulated and 2253 downregulated). The two colored varieties thus had similar numbers of differentially expressed transcripts relative to the control, and gene expression also differed between the colored varieties themselves.
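The counts reported above can be cross-checked with a few lines of Python. This is only an arithmetic sanity check on the numbers in the text; the `summarize` helper and the dictionary layout are illustrative choices, not part of the study's pipeline.

```python
# Counts as reported in the text: (total, upregulated, downregulated).
comparisons = {
    "PurpleR2 vs Shepody": (6145, 2949, 3196),
    "RedR3 vs Shepody": (5789, 2819, 2970),
    "PurpleR2 vs RedR3": (4947, 2694, 2253),
}


def summarize(total, up, down):
    """Check that up + down equals the total; return the upregulated share in %."""
    if up + down != total:
        raise ValueError(f"inconsistent counts: {up} + {down} != {total}")
    return round(100 * up / total, 1)


up_share = {name: summarize(*counts) for name, counts in comparisons.items()}
for name, share in up_share.items():
    print(f"{name}: {share}% of differential transcripts upregulated")
```

In all three comparisons the upregulated and downregulated counts sum to the stated total.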
The limitations of second-generation high-throughput sequencing technology prevented us from obtaining sufficiently accurate reference genome annotations. Therefore, to optimize the original genome annotations, we used nanopore full-length transcriptome sequencing, which can accurately identify transcript structures. This revealed 3543 additional gene loci (chromosomal distribution shown in Fig. 3F, G) and optimized 7321 sites (Table S3).
From the full-length transcriptome sequencing data, we identified 1072 long noncoding RNA (lncRNA) transcripts (Table S4). Based on the reference genome annotation of the loci at which these lncRNAs are located, they can be divided into four categories: large intergenic noncoding RNA (lincRNA), anti-sense lncRNA, intronic lncRNA, and sense lncRNA. Sense lncRNA includes gene promoter–related lncRNA and UTR-region lncRNA. Transcripts of lincRNA, sense lncRNA, anti-sense lncRNA, and intronic lncRNA were present in proportions of 60.4, 24.2, 14.2, and 1.2%, respectively (chromosomal lncRNA distribution shown in Fig. 3H–K). Gene annotation revealed that eight of these lncRNAs regulate PAL, F3H, and CHS expression in the potato anthocyanin synthesis pathway (Figs. 3L, 4C). PAL was the target gene of lncRNA1a (PONTK.13936.1), lncRNA1b (PONTK.13936.3), lncRNA2 (PONTK.13937.1), lncRNA3 (PONTK.13930.2), and lncRNA4 (PONTK.13938.1); F3H was the target gene of lncRNA6 (PONTK.3920.2); and CHS was the target gene of lncRNA5a (PONTK.2668.13) and lncRNA5b (PONTK.2668.15). LncRNA1a and lncRNA1b are anti-sense lncRNAs; lncRNA2, lncRNA3, lncRNA4, and lncRNA6 are lincRNAs; and lncRNA5a and lncRNA5b are sense lncRNAs.
Differential gene expression
The full-length transcriptome sequencing results were analyzed using Gene Ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment. Gene expression was highly correlated with leaf color for Shepody and PurpleR2 (Pearson correlation coefficient, 0.441) but not for Shepody and RedR3 (Pearson correlation coefficient, 0.235) (Fig. 4A). These findings indicate that PurpleR2 and Shepody have more DEGs than RedR3 and Shepody. In summary, the number of anthocyanin synthesis–related DEGs was positively correlated with changes in leaf color from light to dark.
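For readers unfamiliar with the statistic, a Pearson correlation coefficient such as those quoted above is computed between two paired numeric profiles. The sketch below shows the standard formula in plain Python; the `pearson_r` name and the demo vectors are hypothetical, and the text does not specify which input vectors or software produced the reported values of 0.441 and 0.235.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance and standard deviations share the same 1/n factor,
    # so it cancels and is omitted here.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Hypothetical expression profiles for illustration only.
profile_a = [1.0, 2.0, 3.0, 4.0]
profile_b = [2.1, 3.9, 6.2, 7.8]
print(round(pearson_r(profile_a, profile_b), 3))
```

The coefficient ranges from -1 to 1, with values near 0 indicating little linear relationship between the two profiles.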
We then compared the transcript expression of the varieties in pairs (Fig. 4B). In total, 114 transcripts were differentially expressed among the varieties. These differentially expressed transcripts have important functions in regulating potato anthocyanin biosynthesis and color. Based on KEGG enrichment analysis of the significantly differentially expressed transcripts from the RedR3 and PurpleR2 leaves, many of the DEGs were enriched in the flavonoid biosynthetic pathway (KEGG pathway ko00941) (Fig. 4D). This indicates that differential gene expression in this pathway is an important driver of potato leaf color. Figure 4C shows the expression of significant DEGs related to potato anthocyanin biosynthesis and color differences; these include three forms of DFRa (PGSC0003DMT400009287, PONTK.3988.2, and PONTK.3988.12) and four of DFRb (PONTK.3988.3, PONTK.3988.7, PONTK.3988.8, and PONTK.3988.11). Relative to that in Shepody, DFR transcript expression was significantly upregulated in RedR3 and PurpleR2.
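Pathway enrichment of the kind described above is commonly assessed with a hypergeometric (one-sided Fisher) test. The text does not state which test or tool was used, so the sketch below is an assumption about the general approach rather than the authors' method; the function name and the demo numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_deg, n_overlap):
    """P(X >= n_overlap) for a hypergeometric draw.

    n_background: annotated genes in the background set
    n_pathway:    background genes annotated to the pathway (e.g. ko00941)
    n_deg:        differentially expressed genes drawn from the background
    n_overlap:    DEGs that fall in the pathway
    """
    upper = min(n_pathway, n_deg)
    total = comb(n_background, n_deg)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_deg - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers for illustration only; not data from this study.
p_value = hypergeom_enrichment_p(
    n_background=1000, n_pathway=40, n_deg=100, n_overlap=12
)
print(f"enrichment p-value: {p_value:.3g}")
```

A small p-value means the pathway contains more DEGs than expected if DEGs were scattered at random across the background; in practice such p-values are usually corrected for multiple testing across pathways.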
Combined transcriptome and metabolomic analysis
Figure 5A lists some of the anthocyanin-related metabolites retained after data quality screening. Compared with those in Shepody, naringenin chalcone and aromadendrin contents were significantly increased in the colored varieties, with cyanidin and delphinidin contents increasing more significantly in PurpleR2; petunidin 3-O-glucoside and malvidin 3-O-glucoside contents were significantly increased in PurpleR2 but significantly decreased in RedR3. In the phenylpropanoid synthesis pathway, coumaric acid is converted by a series of enzymes into both lignin and anthocyanins (Fig. 5B). In the colored varieties, however, the expression of genes encoding C3H, CCR, and other enzymes in the lignin synthesis pathway was downregulated (Fig. 5C), and caffeic acid content was also reduced, thereby limiting the production of the lignin precursors coumarin, coniferyl alcohol, and sinapal. In contrast, the expression of genes encoding CHS, CHI, DFR, ANS, and other enzymes in the anthocyanin synthesis pathway was upregulated in the colored varieties, and their anthocyanin content was higher. These findings indicate that gene upregulation in the flavonoid metabolic pathway has a key role in promoting anthocyanin accumulation and in producing color differences.
Relative to Shepody, RedR3 contained more cyanidin and pelargonidin 3-O-glucoside, and PurpleR2 contained more cyanidin, delphinidin, petunidin 3-O-glucoside, and malvidin 3-O-glucoside. Delphinidin, which accumulates in the form of glycosides, is the key reason for the red/purple color difference. This indicates that anthocyanin biosynthesis regulation occurs mostly downstream of anthocyanin synthesis during, for instance, flavonoid biosynthesis (ko00941).
Transcriptomic data verification via quantitative reverse-transcription polymerase chain reaction (qRT-PCR)
We used qRT-PCR to verify the transcriptomic regulation of anthocyanin synthesis revealed via full-length transcriptome sequencing. For the six selected lncRNA and key functional gene transcripts (PAL, lncRNA1a, lncRNA5a, lncRNA6, F3′5′H, and ANS), the qRT-PCR and RNA-seq results were consistent (Figs. 4C, 6A). RedR3 and PurpleR2 had opposite expression patterns for PAL and lncRNA1a, suggesting that lncRNA1a may negatively regulate PAL expression in the colored varieties. F3′5′H gene expression was significantly upregulated (by 5.59-fold) only in RedR3.
The analysis results (Supplementary Fig. S6) for BGLU11-like fusion transcript expression in RedR3 were consistent with those of the transcriptome RNA-seq analysis (Fig. 6B). The F3′5′H fusion transcript was expressed in PurpleR2, was absent from Shepody (Fig. 6B), and was expressed at extremely low levels in RedR3 (Fig. 6B). Based on the gray value of the target band, F3′5′H fusion transcript expression was 8.57 times greater in PurpleR2 than that in RedR3. This indicates that F3′5′H plays a key role in anthocyanin synthesis and accumulation in the colored varieties but more so in PurpleR2 than in RedR3.
To verify DFR alternative splicing, we used primers on both sides of the DFR transcript intron-insertion site. We refer to the original annotated transcript without alternative splicing as DFRa; the alternatively spliced transcript (hereafter DFRb) retains a 105 bp intron sequence between exons 3 and 4 (Fig. 6C). qRT-PCR revealed that intron retention in DFRb caused its expression to differ from that of DFRa. In RedR3, DFRa expression was 1.67 times greater than that of DFRb, and the intron-retaining alternative splicing was less likely to occur. In PurpleR2, DFRa expression was almost undetectable, with DFRb being predominant. These qRT-PCR results validate the DFR alternative splicing revealed by the full-length transcriptome sequencing results.
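One consequence of the 105 bp figure is worth spelling out: 105 is a multiple of 3, so retaining this intron would add 35 codons without shifting the downstream reading frame, provided the retained sequence contains no in-frame stop codon. The text does not report such an analysis, and the intron sequence is not given here, so the sketch below uses hypothetical helper names and a made-up demo sequence purely to illustrate the check.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def frame_effect(intron_len):
    """Describe how retaining an intron of intron_len bases affects the frame."""
    extra_codons, remainder = divmod(intron_len, 3)
    return {
        "in_frame": remainder == 0,    # True if downstream codons are unchanged
        "extra_codons": extra_codons,  # whole codons added by the retained intron
        "shift": remainder,            # leftover bases that would shift the frame
    }


def has_in_frame_stop(intron_seq, phase=0):
    """Return True if a stop codon occurs in frame within the retained intron.

    phase is the offset (0, 1, or 2) of the first complete codon inside the
    intron; it depends on where the upstream exon ends and is an input here.
    """
    seq = intron_seq.upper()
    return any(seq[i:i + 3] in STOP_CODONS for i in range(phase, len(seq) - 2, 3))


print(frame_effect(105))

# Hypothetical 9-base sequence for illustration only; not the DFR intron.
print(has_in_frame_stop("ATGTAAGGG"))
```

An in-frame retention without a stop codon would yield a longer protein with an inserted peptide segment, whereas a frameshift or premature stop would typically truncate or disrupt it.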
Anthocyanin 1 (AN1) cloning and overexpression
Based on GO annotation, 23 DEGs were found to be associated with DNA binding (GO:0003677). One of these, PGSC0003DMG400013965, on chromosome 10, is the R2R3-MYB transcription factor AN1, whose expression was significantly upregulated in the colored varieties. Software prediction revealed that, in the anthocyanin synthesis pathway, MYB regulatory elements or binding sites are present within the 2000 bp upstream of the CDS of PAL, C3H, 4CL, CHS, CHI, F3H, DFR, and ANS. Searching the Potato Genome Sequencing Consortium (PGSC) database (http://solanaceae.plantbiology.msu.edu/pgsc_download.shtml) revealed two existing annotated transcripts of this gene, PGSC0003DMT400036281 and PGSC0003DMT400036283. By aligning our nanopore full-length transcriptome sequencing results against these sequences, we identified an additional AN1 transcript (hereafter StAN1n). Transcript PGSC0003DMT400036281 contains exons a and c, and PGSC0003DMT400036283 contains exons a and b, whereas StAN1n contains all three exons, a, b, and c. Relative to the known AN1 transcript sequences, we observed alternative splicing at the 5′ end of exon a of StAN1n (Fig. S5); this also affected its CDS. We therefore cloned this transcript for further analysis.
We then used qRT-PCR of the coding sequences corresponding to the StAN1n transcript in RedR3 and PurpleR2 to verify these results. Transgenic tobacco overexpressing StAN1n from the colored varieties (OEStAN1) was obtained via Agrobacterium transformation (Fig. 7A, B). After transformation, the tobacco leaf callus turned purple. After strict selfing, the StAN1n-positive rate of the T2 transgenic tobacco was 81% (Supplementary Fig. S6). Using StAN1n-positive plants (Fig. 7C), we determined the anthocyanin content of plants with high StAN1n expression. Wild-type (WT) tobacco has white flowers and green leaves and pods, whereas OEStAN1 plants had purple leaves, flowers, and pods. These findings indicate that StAN1n plays an important role in regulating plant color.
We evaluated anthocyanin content in the WT and OEStAN1 tobacco leaves: it was lower in WT green leaves than in OEStAN1 leaves (Fig. 7D). This reveals that StAN1n overexpression promotes anthocyanin synthesis and accumulation in OEStAN1 transgenic tobacco, causing it to turn purple.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.