The first use case focuses on a methylation-sensitive RNA restriction enzyme (MazF) assay to identify m6A modified residues in mRNA of HEK293 cells [9, 10]. The second use case complements the first by demonstrating a genetic approach to map m6A sites in HEK293 cells using DART-seq [11]. While all of the aforementioned use cases utilized Illumina sequencing data, the third use case analyzes Nanopore direct mRNA sequencing data from HEK293 wild-type and knockout cells (METTL3 -/-) [12].

Use case 1: m6A mapping by MazF treatment and sequencing

Lately, we have seen a whole wave of antibody-free methods to map m6A RNA modifications transcriptome-wide based on high-throughput sequencing data. A need for antibody-free complementary approaches has been clearly identified by a rigorous assessment of published MeRIP data sets. McIntyre et al. show that m6A peak overlap varies from ≈30 to 60% between studies, even in the same cell type [13]. An antibody-independent solution to map m6A sites is provided by an alternative biochemical approach, namely m6A methylation-sensitive enzymatic RNA digestions [9, 10]. We have tested the substrate specificity of the MazF enzyme on custom RNA oligos with a centrally modified A position. Figure 1A gives an overview on the taken approach to assess MazF substrate specificity. Briefly, the central A/m6A position is flanked by 7 random nucleotides on each site and a constant prefix and suffix. Total sequence length per oligo is 34bp. These RNA oligos were subjected to enzymatic digestion. As previously reported, MazF cuts 5’ to an ACA motif. Cutting efficiency is reduced by the presence of an m6A modification (A*CA motif, asterisk denotes modification). Importantly, the enzymatic reaction does not differ for cutting 5’ of ACA or ACA*, but only differs in the case of A*CA. We have reanalyzed the 6 Illumina libraries from Zhang et al. [9] using the new mode rt-arrest. Briefly, 3 replicates were treated with an Alpha-ketoglutarate-dependent dioxygenase (FTO), which converts m6A to A on mRNA [14], and 3 replicates were mock-treated. MazF cleavage efficiency (or arrest rate) does increase in the absence of m6A. Figure 1B shows the observed arrest rates in the data stratified by the presence or absence of an ACA motif. Arrest rates are significantly elevated in FTO-treated samples as compared to mock-treated samples. This characteristic read signature is used to predict m6A modified sites in mRNA in a head-to-head comparison using JACUSA2 (FTO vs mock treated samples). Figure 1C shows the number of predicted m6A gene targets (≥1 predicted site) and their respective overlap with consensus miCLIP gene targets, which are shared 3 independent experiments [1517] depending on the pvalue from the beta-binomial test in JACUSA2 rt-arrest. Our R package JACUSA2helper provides a vignette for MazF assay data analysis. Finally, a comparison to the MAZTER-mine workflow [10] can be found in the Additional file 1 on page 5. Briefly, JACUSA2 outperforms MAZTER-mine in terms of running time by orders of magnitude.

Fig. 1
figure 1

Summary of JACUSA2 results on MazF restriction enzyme assay and DARTseq assay in HEK293 cells. A Synthetic RNA Oligo digestion and sequencing. Position 8 hosts the central A/m6A. Position 1-7 and 9-16 are random. We have sequenced 2 libraries, which differ in position 8 (A/m6A). The MazF recognition motif becomes apparent when we consider cleaved fragments (3 ′ 8bp fragments). B Arrest rate analysis across replicates in published data from HEK-293 cells [9]. Results are shown for a set of 2792 well covered sites with robust and significant read arrest changes. One-sided Wilcoxon signed-rank test: Maz-FTO-ACA vs. MAZ-mock-ACA; one-sided Wilcoxon rank sum test pvalue <10−16. C Comparison of JACUSA2 rt-arrest predictions for [9] to a high-confident set of miCLIP sites (intersection of 3 miCLIP experiments [1517]) on gene set level. D Comparison of JACUSA2 call-2 predictions for [11] to a high-confident set of miCLIP sites (intersection of 3 miCLIP experiments [1517]) on gene set level. E cDNA distance of closest miCLIP site (intersection of 3 miCLIP experiments) to the respective MazF cleavage site (red) or DART C →U site (turquoise). Wilcoxon rank sum test pvalue: 5.957e−13

Use case 2: DART-seq

A complementary genetic approach is an extension of the TRIBE technique called DART-seq [11]. Meyer applied DART-seq on HEK293 cells where the APOBEC domain was fused to the YTH domain from human YTHDF2 (WT and mutated). In essence, new C →U editing events that are significantly enriched in the YTHDF2-WT, but not in the binding domain mutant are bona fide candidates for m6A RNA modification. Elevated C → U deamination on RNA level for the active domain over the mutated domain are used to predict m6A modifications on mRNA. Figure 1D shows the number of predicted m6A target genes (≥1 predicted site) and their respective overlap with consensus miCLIP gene targets as explained before. Generally, DART-seq has the advantage that it can be extended to 3rd generation long read sequences and requires only very little RNA input amounts (in vivo experiments) yet relies on genetic modification of the target system / organism. Additional file 1: Fig. S3 summarizes the analysis results as Venn diagram. Figure 1E shows the cDNA distance of either a MazF or a DART-seq target site to its closest miCLIP site. The majority of detected sites is less than 200bp away from the closest miCLIP peak. Our R package JACUSA2helper provides a vignette for DARTseq assay data analysis. Additional file 1: Fig. S4 summarizes the agreement between miCLIP, DART-seq and MazF experiments.

Use case 3: Nanopore direct mRNA sequencing

Direct RNA sequencing without involving any cDNA library preparation or PCR-based amplification steps is a unique feature of the ONT sequencing platform. Briefly, native poly-adenylated RNA is threaded through a nanopore in 3’ → 5’ orientation (Fig. 2A). Some ground-breaking work on m6A detection by direct RNA sequencing has been performed by Liu et al. [18] in yeast and Jenjaroenpun et al. [7] in mouse embryonic stem cells. We have identified a novel published direct RNA sequencing data set from HEK293 cells [12], which facilitates a direct comparison to the aforementioned Illumina-based approaches. Moreover, there are 3 miCLIP data sets available for HEK293 cells [1517], which we use as an antibody-based reference set.

Fig. 2
figure 2

Nanopore sequencing of HEK293 cell lines: Mettl3 KO and WT. A Direct RNA sequencing on Nanopore. B Signature detection with JACUSA2 comparing WT over KO cells. Three principal events are detected: base substitutions, insertions and deletions. All 5mers with a central A (NNANN) are considered. C Non-negative matrix factorization to identify characteristic patterns (matrix H) that are indicative of the m6A modification. D Sequence logo of miCLIP training data (2916 sites (5mers), red 3-overlap in G). E Profile of the strongest signal from the NMF pattern matrix (H, 4th row aka NMF4) across two independent Nanopore experiments. F Sequence enrichment logo for sites with Score(NMF4 >>other patterns). G: miCLIP subsets: red 3-overlap was used for NMF training, others (shown in light blue) are used for testing. H: JACUSA2 m6A predictions: total number of predicted m6A sites in green and exact overlap with miCLIP site in brown (test data, 27,355 sites). PPV = #true CLIP sites / (#true + #false predictions). xPore performance is indicated with dashed lines

We applied JACUSA2 (enhanced call-2 mode) to identify read error profile differences between the respective paired samples (see Fig. 2B). We are considering all positions with sufficient read coverage (> 4) across all replicates and focus on positions with a specific 5mer context, which is basically a central A nucleotide flanked by 2 adjacent random nucleotides (NNANN). Our initial unsupervised learning approach (Non-negative matrix factorization, NMF) is shown in Fig. 2C. We use 2916 previously reported consensus miCLIP m6A sites from [1517] as training set to identify characteristic patterns. The sequence composition of the 5-mer training data is shown as sequence logo in Fig. 2D.

Only non-overlapping 5mers outside of homo-polymer regions (JACUSA filter: Y) are subsequently used in training. The respective feature set consists of deletion, insertion, and substitution scores for every considered 5mer. This amounts to a feature matrix of 3 event scores ×5 motif positions ×2 experiments = 30 features. We subjected this feature matrix to a non-negative matrix factorization (NMF) analysis (Fig. 2C). Through robustness and concordance assessment, we could identify an optimal factorization rank of 5. The pattern matrix (H) highlights the importance of pos 3 (and 4) for the most relevant pattern (Fig. 2E). The loading of this factor correlates with the presence of the well known DRACH motif and can be reproduced across two independent experiments in HEK293 cells (Fig. 2D, data from [12]). The most relevant pattern is now applied to previously unseen miCLIP-predicted m6A sites (Fig. 2G). Figure 2G uses all light blue miCLIP sets as ground truth annotation and shows the overlap of JACUSA2 predictions with the corresponding miCLIP sets as well as the total number of predictions. The published performance of xPore is denoted with dashed lines (taken from SData 1 in [12]). A comparative assessment of JACUSA2 with Epinano 1.2 [18], ELIGOS2 [7] and xPore [12] is presented as part of the text in Additional file 1 on page 6ff. In short, only xPore produced competitive results in a direct comparison with JACUSA2 (see also Additional file 1: Fig. S5). Of note, the learned m6A pattern in Fig. 2E is represented by the sequence context in Fig. 2F and can be transferred to other species and cell types. To demonstrate this transfer, we use a murine model system, mouse embryonic stem cells (mESC). Jenjaroenpun et al. [7] has sequenced mESC mRNA on the Nanopore and Köertel et al. [16] mapped m6A residues via a novel miCLIP approach (antibody-based reference set). Additional file 1: Fig. S6A describes the covered adenosine residues of interest (NNANN) and their overlap with the miCLIP m6A sites. Additional file 1: Fig. S6B describes the concept and shows the top scoring predicted m6A sites from the Nanopore data, which are all supported by miCLIP-derived predictions. The global score distribution for mouse predictions is indistinguishable from CLIP-supported human prediction scores (Additional file 1: Fig. S6C).

Use case 4: Nanopore direct rRNA sequencing

Complementarily, we show case the utility of JACUSA2 on another RNA species: human 18S rRNA. We have established custom adapters for rRNA sequencing (Naarmann-de Vries et al. in preparation) and sequenced both, HEK293 wild-type 18S rRNA molecules and unmodified in vitro transcribed 18S rRNA molecules. In this particular use case, we employ JACUSA2 to identify uridine modifications. Taoka et al. [19] define the human rRNA modification landscape and estimated modification levels. We use this information to validate JACUSA2 scores for uridine modification predictions. Briefly, 42 Ψ and 12 2’-O-methyl uridines have been mapped on human 18S rRNA along with 339 unmodified uridines. We compute single nucleotide base substitution, deletion and insertion scores with JACUSA2 for every U residue. The UMAP projection of these scores is shown in Fig. 3A. A distinct “cluster” of modified residues exists, which is well separated from the rest of uridine residues in this respective projection. The size and color of the dots represent annotated modification levels and modification type, respectively. Figure 3B summarizes the performance of JACUSA2 by showing the number of predictions based on JACUSA2 call-2 scores in green and the overlap with annotated uridine modifications in brown. A comparison to a very recent solution for pseudouridine prediction, NanoPsu [20], is summarized in Additional file 1 on page 10 (see also Additional file 1: Fig. S7). Taken together JACUSA2 is able to predict uridine modifications using Nanopore sequencing data. An orthogonal show case using conventional Illumina sequencing employs a chemical approach to modify pseudouridines with carbodiimide and was analyzed with JACUSA2 as well (see Additional file 1 – Additional Use Case: Pseudouridylation site mapping in human rRNAs and Additional file 1: Fig. S8).

Fig. 3
figure 3

Nanopore sequencing of human 18S rRNA from HEK293 cells. A UMAP projection of JACUSA2 base substitution, insertion, and deletion scores for uridine positions. psU stands for Ψ modified positions, Um stands for 2 ′-O-methyl ribose modifications and m1acp3psU stands for 1-methyl-3-(3-amino-3-carboxypropyl)pseudouridine-5 ′-monophosphate. B Line plot showing by ascending sum of JACUSA2 call-2 scores the number of predictions in green and the overlap with annotated uridine modifications in brown. Modified positions as defined by Taoka et al. [19]

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Disclaimer:

This article is autogenerated using RSS feeds and has not been created or edited by OA JF.

Click here for Source link (https://www.biomedcentral.com/)