A high-quality genome assembly

We selected a wild Actinidia eriantha (Ae) plant for sequencing. The estimated genome size of this sample is ~689 Mb based on k-mer analysis (Supplementary Fig. 1) and ~750 Mb based on flow cytometry. The level of genomic heterozygosity is 0.98% (Supplementary Fig. 1 and Supplementary Table 1). We generated 60.13 Gb of PacBio single-molecule long reads (N50 = 14.7 kb), 79.54 Gb of Illumina paired-end short reads (library insert size of 350 bp), and 105.30 Gb of 10× Genomics linked reads (Supplementary Fig. 2 and Supplementary Table 2). We first performed a PacBio-only assembly with an additional contig-phasing step. The resulting assembly was polished with both PacBio long reads and Illumina short reads. We then assembled the contigs into scaffolds using the linked reads. The resulting contigs totalled 655 Mb with an N50 of 2.00 Mb, and the scaffolds totalled 657 Mb with an N50 of 5.07 Mb (Supplementary Table 3).
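The N50 values quoted for the reads, contigs and scaffolds follow the standard definition: the length of the sequence at which the cumulative length, counted from the longest sequence downwards, first reaches half of the total. A minimal sketch of that calculation is given below; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the assembly itself.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of the total bases.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
toy_contigs = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_contigs)
print(toy_n50)
```

In practice the lengths would be read from the assembly FASTA index, and the same function applies unchanged to reads, contigs or scaffolds.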

The assembly was further improved using a high-throughput chromatin conformation capture (Hi-C) map, resulting in a final reference assembly comprising 29 unambiguous chromosome-scale pseudomolecules (hereafter referred to as chromosomes) that cover 95.85% (~629 Mb) of the genome assembly; the shortest chromosome is longer than 14.9 Mb (Fig. 1a, Supplementary Fig. 3 and Supplementary Table 4). We assessed the accuracy and completeness of the assembly by aligning the Illumina short reads back to it, which gave a mapping rate of 97.8%, with 97.1% of the assembly covered by at least three reads. Furthermore, >97.7% of the de novo assembled transcripts could be mapped back to the assembly, and 236 of the 248 genes in the Core Eukaryotic Genes Mapping Approach (CEGMA) set are complete in the assembly. We also assessed the assembly with Benchmarking Universal Single-Copy Orthologs (BUSCO): 93% of the BUSCOs are complete and 1.5% are fragmented (Supplementary Table 5). Together, these results indicate that the quality of our genome assembly is high, comparable to that of the reported Ac v3 and Ae cultivar genomes (Tang et al. 2019; Wu et al. 2019) and better than that of older versions of the Ac genome (Supplementary Table 3).

Fig. 1

A high-quality genome assembly of Actinidia eriantha (Ae). a Circular representation of the chromosome-scale pseudomolecules. b The percentage of transposable-element coverage in sliding windows of 0.5 Mb. c Gene density in sliding windows of 0.5 Mb. d Length distribution of the presence/absence-variation (PAV) genes between Ae and Actinidia chinensis (Ac). e Genes associated with the Ad-α (light green) and Ad-β (light blue) whole-genome duplication (WGD) events, respectively. f Syntenic regions (only those larger than the N50 size are shown) involved in the Ad-α (light green) and Ad-β (light blue) WGDs, respectively

Genomic content and recent burst of LTR retrotransposons

Both homology-based and de novo approaches were used to identify repetitive DNA elements in the Ae genome. In total, 41.3% (271 Mb) of the Ae genome assembly was identified as repetitive sequence (Fig. 1b, Supplementary Table 6), similar to the proportions reported for the Ac (~36.0–43.4%) and Ae cultivar (43.3%) genomes. Long terminal repeat (LTR) retrotransposons are the predominant repetitive elements (196 Mb, 29.9% of the assembly), followed by about 40 Mb (6.1%) of DNA transposons; the remainder was assigned to other transposable elements (TEs) or tandem repeat families, or could not be assigned (Supplementary Table 7). The composition of the different classes of repetitive DNA in Ae is also similar to that in the Ac v3 genome. We further identified 4005 intact Gypsy and 3839 intact Copia retrotransposons, which belong to ten Gypsy and seven Copia families, respectively (Supplementary Table 8). On the basis of these intact LTRs, we estimate that the bursts of both Gypsy and Copia retrotransposons occurred very recently (within the last one million years; Supplementary Fig. 4).
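The text does not state how the insertion times of the intact LTR retrotransposons were dated. A commonly used approach, sketched here purely as an illustration, compares the 5′ and 3′ LTRs of each element (identical at insertion), corrects their divergence with the Jukes-Cantor model, and converts it to time with T = K / (2r). The substitution rate used below (1.3e-8 per site per year) and the 2% divergence are placeholder assumptions, not values reported for Ae.

```python
import math


def jukes_cantor_distance(p):
    """Correct an observed proportion of differing sites (p) for multiple hits."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def insertion_age_mya(p, rate_per_site_per_year):
    """Estimate LTR insertion age in million years as T = K / (2r).

    p is the observed divergence between the two LTRs of one element;
    the factor 2 reflects that both LTRs accumulate substitutions.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    k = jukes_cantor_distance(p)
    return k / (2 * rate_per_site_per_year) / 1e6


# ASSUMPTION: placeholder substitution rate, not taken from the article.
ASSUMED_RATE = 1.3e-8

# Hypothetical element whose two LTRs differ at 2% of aligned sites.
example_age = insertion_age_mya(0.02, ASSUMED_RATE)
print(f"{example_age:.2f} Mya")
```

Under these assumed values, an element with 2% LTR divergence would be dated to within the last million years; a different rate would rescale every age proportionally.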

We predicted a total of 41,521 high-confidence protein-coding genes in the Ae genome using an integrated strategy combining ab initio, transcript-based and homology-based predictions, with an average coding-sequence length of 1.1 kb and an average of 5.0 exons per gene (Fig. 1c). This gene number is close to the 40,464 genes annotated in the Ac v3.0 genome and the 42,988 genes predicted in the Ae cultivar genome. Of the predicted genes, 78.3% (32,509) were expressed in vegetative and reproductive tissues and 99.4% (41,270) had substantial homology with known proteins or functional domains. Moreover, the gene features in Ae, including mRNA length and the distributions of CDS, exons and introns, are comparable to those of Ac and five other representative plants, including tea (Camellia sinensis) in the Ericales, sunflower (Helianthus annuus) as a representative of asterids II, coffee (Coffea canephora) as a representative of asterids I, and grape (Vitis vinifera) (Supplementary Fig. 5). A total of 2845 genes were identified as transcription factor (TF) genes, including 225 bHLH and 181 MYB genes (Supplementary Table 9). We also annotated noncoding RNA (ncRNA) genes, yielding 662 transfer RNA (tRNA) genes, 253 ribosomal RNA (rRNA) genes, 1411 small nuclear RNA (snRNA) genes and 1820 microRNA (miRNA) genes.

Structural variations between Ae and Ac genomes

We aligned the Ae chromosomes to the Ac v3 genome and to the Ae cultivar genome. Approximately 60.0% of the Ae genome forms one-to-one syntenic blocks with 60.3% of the Ac genome sequence, while 83.3% of the Ae genome can be mapped to 79.4% of the Ae cultivar genome in a one-to-one syntenic pattern (Supplementary Fig. 6 and Supplementary Table 10). The nonsyntenic sequences were mostly DNA repeats, including transposable elements and dispersed genes. For the aligned one-to-one syntenic blocks between the Ae and Ac genomes, we identified 15,628,085 single nucleotide polymorphisms (SNPs) and 3,766,293 small insertion and deletion polymorphisms (indels), with an average of 24 SNPs and six indels per kilobase (Supplementary Table 10). For the longer syntenic blocks between the two Ae genomes, 8,181,896 SNPs and 6,617,383 indels were found, with an average of 12 SNPs and 10 indels per kilobase (Supplementary Table 10). Comparative analysis further revealed 23,409 Ae and 29,947 Ac genes with corresponding orthologous genes or gene fragments in their syntenic blocks, of which 13,942 Ae and Ac genes had no amino acid changes. Moreover, 31,499 genes in our Ae genome are orthologs or gene fragments of 24,361 genes in the Ae cultivar genome, of which 17,273 genes are conserved in their amino acid sequences.

By further comparing the Ae and Ac v3 genomes, we identified 36,697 Ae-specific genomic segments (~27.7 Mb) and 64,815 Ac-specific genomic segments (~53.2 Mb) longer than 500 bp, which represent the presence/absence variation (PAV) between them. Of these, 146 (1,112,455 bp in total) and 401 (3,457,203 bp) PAV sequences in the Ae and Ac genomes, respectively, were longer than 5 kb, and they were unevenly distributed across both genomes, with some clusters (Fig. 1d and Supplementary Fig. 7). We also identified 282 Ae-specific and 427 Ac-specific PAV genes. Similarly, 10,854 Ae-specific (~9.4 Mb) and 24,165 Ae cultivar-specific (~19.8 Mb) genomic segments (> 500 bp) were identified, of which 77 (566,023 bp) and 165 (1,257,547 bp) PAV sequences (> 5 kb) were found in the Ae and Ae cultivar genomes, respectively, including 170 Ae-specific and 187 Ae cultivar-specific genes. A total of 28,863 orthogroups were further identified among the three genomes, of which 18,454 were shared by all three, while 695 (1756 genes), 397 (873 genes) and 829 (2053 genes) occurred specifically in the Ac, Ae and Ae cultivar genomes, respectively.
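The PAV counts above rest on two length thresholds: segments longer than 500 bp are counted as PAV segments, and those longer than 5 kb are reported separately. The sketch below shows how such a two-level tally could be computed from a list of genome-specific segment lengths; the function name `summarize_segments` and the toy lengths are hypothetical, and the upstream step that detects genome-specific segments is not reproduced here.

```python
def summarize_segments(lengths_bp, min_len=500, long_len=5000):
    """Tally genome-specific segments above two length thresholds (in bp)."""
    kept = [n for n in lengths_bp if n > min_len]
    long_ones = [n for n in kept if n > long_len]
    return {
        "segments": len(kept),
        "total_bp": sum(kept),
        "long_segments": len(long_ones),
        "long_total_bp": sum(long_ones),
    }


# Hypothetical segment lengths in bp, for illustration only.
toy_lengths = [300, 600, 1200, 5000, 7500, 12000]
toy_summary = summarize_segments(toy_lengths)
print(toy_summary)
```

Both thresholds are treated as strict inequalities, matching the "> 500 bp" and "> 5 kb" wording of the text.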

Functional annotation showed that some of the Ae-specific PAV genes were related to specific functions. For example, three PAV genes in our Ae genome (scaf_25.6, scaf_75.500 and scaf_75.501) are significantly enriched in the Gene Ontology (GO) category defense response to other organism (GO:0098542), and seven genes (scaf_125.331, scaf_28.166, scaf_49.693, scaf_78.468, scaf_84.195, scaf_86.149, scaf_86.150) are related to cell wall organization or biogenesis (GO:0071554) (P < 0.05). Twelve orthologous genes specific to the Ae genome were also enriched in the GO term response to biotic stimulus (GO:0009607) (Supplementary Table 11). To trace the possible origin of these PAV genes, we aligned genome-wide resequencing reads of 10 kiwifruit backbone taxa to both the Ae and Ac genomes. These backbone taxa were previously identified by a phylogenomic analysis and are thought to represent the core diversity of the genus Actinidia (Liu et al. 2017). The majority of hits (~90.7% of PAV genes) corresponded to homologs present in at least one of the relatives of Ae or Ac (Supplementary Fig. 8), suggesting retention of ancestral polymorphism and/or extensive introgression or hybridization across diversified kiwifruit taxa in the wild.

Genome evolution as an early diverging asterid lineage

We compared our Ae genome and the Ac v3 genome with those of three other representative asterids (tea, sunflower and coffee), with grape as an outgroup. Based on the proteomes of these plants, we identified 23,204 orthologous gene families consisting of 167,770 genes (Supplementary Fig. 9). For clarity, we compared the gene families of the five asterid plants only: a core set of 83,978 genes belonging to 8094 gene families is shared among these asterid species, whereas 919 genes from 497 gene families are unique to Ae (Fig. 2a). Both Ae and Ac share the most gene families with tea, consistent with their close relationship within the Ericales (Fig. 2b). Further analysis of gene family evolution showed that 2671 gene families of Ae underwent expansion and 1106 underwent contraction, of which 49 expanded (137 genes) and 94 contracted (576 genes) gene families were rapidly evolving (Fig. 2b). Functional annotation showed that the rapidly expanding gene families were enriched in GO categories such as response to biotic stimulus (GO:0009607) and response to freezing (GO:0050826), suggesting possible roles for these genes in the adaptation of Ae to harsh environments. We generated and dated a phylogenetic tree based on 1366 single-copy orthologous genes. The estimated divergence time of Ae and Ac was ~11 Mya, consistent with our recent estimate based on resequencing data from multiple kiwifruit taxa (Liu et al. 2017). Moreover, the most recent common ancestor (MRCA) of Ae and Ac diverged from tea about 81 Mya (Fig. 2b).

Fig. 2

Evolution and comparative genomic analysis of Actinidia eriantha (Ae). a Number of gene families shared between Ae, A. chinensis (Ac) and three other asterid plants. b Phylogenomic analysis of Ae and other representative asterid plants. The potential whole genome duplication (WGD) events and expansion and contraction of gene families among these species are shown on the tree. The divergence time was estimated for each node. c Distribution of synonymous substitution rates (Ks) for pairs of syntenic paralogs in Ae and three other genomes. d A typical collinearity pattern between grape, tea and Ae genomes. Rectangles represent predicted gene models with color showing relative orientations (blue, same strand; green, opposite strand). Gray wedges connect matching gene pairs with one highlighted in red

Based on the orthologous genes identified among Ae, Ac, tea and grape, as well as paralogous genes within each genome, we found that, as expected, the palaeohistory of Ae is consistent with that reported for Ac: three WGD events were identified, including the two kiwifruit-specific events (Ad-α and Ad-β) and the γ event common to core eudicots (Fig. 2b, c). Moreover, Ad-β occurred before kiwifruit diverged from tea (Fig. 2c). The typical synteny pattern further reflected a 1:2:4 relationship among genomic regions of grape, tea and Ae (Fig. 2d). We characterized in detail the retention of duplicated genes from both kiwifruit-specific WGD events on the basis of pairwise synonymous substitution rates (Ks values) of paralogs. Ks values of 0–0.335 corresponded to the Ad-α event and values of 0.335–1 to the Ad-β event (Supplementary Fig. 10). We found that about 22,932 and 9415 genes are associated with the Ad-α and Ad-β events, respectively (Fig. 1e, f), and that they are significantly enriched in 37 and 19 GO terms, respectively (corrected P < 0.05; Supplementary Table 12).
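The assignment of paralog pairs to the two WGD events by Ks range can be illustrated with a short sketch. The cut-offs 0.335 and 1 are taken from the text; how the shared boundary at exactly 0.335 is handled is not stated, so assigning it to Ad-α here is an assumption, as are the function name `assign_wgd_event` and the toy Ks values.

```python
from collections import Counter

AD_ALPHA_MAX_KS = 0.335  # upper Ks bound for Ad-alpha, from the text
AD_BETA_MAX_KS = 1.0     # upper Ks bound for Ad-beta, from the text


def assign_wgd_event(ks):
    """Assign a paralog pair to a WGD event from its Ks value.

    ASSUMPTION: a Ks of exactly 0.335 is placed in Ad-alpha; pairs with
    Ks above 1 are left unassigned (None).
    """
    if ks < 0:
        raise ValueError("Ks cannot be negative")
    if ks <= AD_ALPHA_MAX_KS:
        return "Ad-alpha"
    if ks <= AD_BETA_MAX_KS:
        return "Ad-beta"
    return None


# Hypothetical Ks values for syntenic paralog pairs, for illustration only.
toy_ks = [0.12, 0.20, 0.31, 0.48, 0.75, 1.40]
event_counts = Counter(assign_wgd_event(ks) for ks in toy_ks)
print(event_counts)
```

Pairs left unassigned (Ks above 1) would include older duplicates, such as those dating from the γ event.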

Genes involved in ascorbic acid biosynthesis

Kiwifruit is a rich dietary source of vitamin C, and the vitamin C content of Ae is three to four times that of Ac (Huang 2014). We investigated and compared the genes involved in the ascorbic acid biosynthesis and regeneration pathways (Bulley et al. 2009) in both kiwifruit taxa. Although gene expansion in the L-galactose, L-glucose and glucuronate/myo-inositol biosynthesis pathways did not differ significantly between the two plants, gene families involved in the D-galacturonate pathway first described in strawberry (Agius et al. 2003; Rigano et al. 2018), including PGT (polygalacturonate 4-α-galacturonosyltransferase), PME (pectin methylesterase), PG (endopolygalacturonase) and GalUR (galacturonic acid reductase), were significantly expanded in Ae (Fig. 3a; Supplementary Table 13). Most of these expanded genes were expressed, with particularly high expression in Ae fruits (Fig. 3a), and a randomly selected subset was further validated using quantitative real-time polymerase chain reaction analysis (Supplementary Fig. 11 and Supplementary Table 14). In the Ae genome, we further identified a significant increase in copies of AMR1 (ascorbic acid mannose pathway regulator 1), which negatively regulates the L-galactose biosynthetic pathway in Arabidopsis (Zhang et al. 2009), and a decreased number of copies of ERF98 (ethylene response factor subfamily b-3 of the ERF/AP2 transcription factor family), a regulatory gene contributing to ascorbic acid biosynthesis in Arabidopsis (Zhang et al. 2012). The high vitamin C accumulation in Ae fruit is thus possibly reinforced largely by the D-galacturonate pathway, similar to that reported in strawberry and tomato (Rigano et al. 2018), although the contribution of the L-galactose pathway could be equally important.

Fig. 3

Evolution and expression of genes related to ascorbic acid biosynthesis and disease resistance. a Evolutionary expansion of genes involved in the D-galacturonate pathway of ascorbic acid biosynthesis. Expression of these genes in five tissues (from left to right: fruit, flower, leaf, stem and root) is shown. b Differential expression of the R genes common to Ae and Ac. Previously published RNA-seq data were used, and two sampling replicates at each of the three stages of Psa infection are shown

Evolution and differential expression of disease-resistance genes of Ae

We investigated the disease-resistance (R) genes related to both pathogen-associated molecular pattern-triggered immunity (PTI) and effector-triggered immunity (ETI) in the Ae genome. We identified 224 putative pattern-recognition receptor genes, which encode receptor-like kinases with a leucine-rich repeat domain (RLK-LRR, RLK for short) and potentially contribute to PTI. This number is somewhat lower than the 263 RLK genes identified in Ac. We further found 95 nucleotide-binding site-LRR (NBS-LRR, NBS for short) genes, which are possibly involved in the ETI of kiwifruit. This number is likewise lower than the 139 NBS genes found in the Ac genome. Many NBS genes in both the Ae and Ac genomes are truncated, and the distribution of genes across the different classes differs between the two genomes (Supplementary Table 15).

To examine the expression of the 75 R genes shared by Ae and Ac, we used available RNA-seq data derived from three stages of Psa infection of leaf tissues of both Ae and Ac: day 0 (without inoculation), and 2 and 14 days post inoculation (DPI) (Wang et al. 2017). We found that the expression of most of these R genes differed between Ae and Ac (Fig. 3b). In particular, we identified five genes that were specifically expressed in Ae and six genes that were uniquely expressed in Ac (Supplementary Table 16). These genes are potentially important resistance genes in plants (Meyers et al. 2005) and are possibly related to the distinct resistance/susceptibility of Ae and Ac to biotic and abiotic stresses such as Psa infection. Take as an example an RPS2-like gene (scaf_105.262) specifically expressed in Ae. In Arabidopsis, RPS2 controls specific recognition of P. syringae strains expressing the avirulence gene avrRpt2 (Kunkel et al. 1993). A similar ETI-layer mechanism may therefore confer resistance to Psa in Ae.

The genetic relationship between Ae and other kiwifruit taxa

Diverse kiwifruit taxa can be distinguished by their fruit skin types: soft and hairless skins (SHS), rough and warty skins (RWS) or rough and hairy skins (RHS). Ae is a member of the RHS group and has distinctive milky white or sometimes pale brown fruit hairs, which are clearly different from those of many other kiwifruit taxa (Huang 2014). To assess the genetic relationships between Ae and other kiwifruit taxa, we performed transcriptome sequencing of 21 kiwifruit samples belonging to 15 representative Actinidia taxa (Supplementary Table 17). We generated ~150 Gb of RNA-seq reads, with an average of 50 million reads per sample. Mapping these RNA-seq reads to the Ae reference genome identified 3,414,917 SNPs and 328,013 small indels.

We examined genetic admixture among these samples using both neighbor-joining (NJ) tree and STRUCTURE analyses based on SNPs derived from the transcriptomic data. In general, samples clustered according to the three defined fruit skin types (Supplementary Table 17), with a further subdivision of the SHS group into two subgroups (Fig. 4a). However, the two RWS taxa A. cylindrica (CYL) and A. callosa var. henryi (HEN) clustered with the RHS samples despite their rough and warty fruit skins (Fig. 4a). Conversely, the RHS taxon A. chinensis var. deliciosa (DEL) grouped with the RWS samples. With the best estimate of K = 4 for the STRUCTURE analysis (Supplementary Fig. 12), we found a similar relationship among the kiwifruit samples investigated (Fig. 4b). Notably, we identified widespread genomic admixture in many kiwifruit taxa that show phylogenetic-phenotypic discordance, including CYL, HEN and DEL mentioned above. DEL and ERI (Ae) both have fruit hairs, but their genomic compositions are distinctly different. The genomic content of DEL, a variety of A. chinensis (CHI), is mainly similar to that of CHI, despite possible partial introgression from Ae. By comparison, the genomic components of Ae include introgressed content from the SHS group (Fig. 4b).

Fig. 4

Evolutionary relationships of Actinidia taxa with diverse fruit skin types. a A neighbor-joining tree showing the genetic relationships of the species investigated. Two subgroups, I and II, of SHS are identified. b Results of STRUCTURE analyses revealing widespread genomic admixture in many Actinidia taxa. See Supplementary Table 17 for the full name of each taxon abbreviation. RHS: rough and hairy skins; RWS: rough and warty skins; SHS: soft and hairless skins

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
