Abstract
N6-methyladenosine (mA) is a widely-studied methylation to messenger RNAs, which has been linked to diverse cellular processes and human diseases. Numerous databases that collate mA profiles of distinct cell types have been created to facilitate quick and easy mining of mA signatures associated with cell-specific phenotypes. However, these databases contain inherent complexities that have not been explicitly reported, which may lead to inaccurate identification and interpretation of mA-associated biology by end-users who are unaware of them. Here, we review various mA-related databases, and highlight several critical matters. In particular, differences in peak-calling pipelines across databases drive substantial variability in both peak number and coordinates with only moderate reproducibility, and the inclusion of peak calls from early mA sequencing protocols may lead to the reporting of false positives or negatives. The awareness of these matters will help end-users avoid the inclusion of potentially unreliable data in their studies and better utilize mA databases to derive biologically meaningful results.
Introduction
N6-methyladenosine (mA) is the modification to adenosine (A) in RNAs via the addition of a methyl group at the nitrogen-6 position. It is the most prevalent, abundant, and conserved modification to messenger RNA (mRNA) molecules [1–5], but has also been found in other RNA species including long non-coding RNAs and circular RNAs [6–12]. In mature mRNAs, mA is typically enriched in the 3 UTRs and near the stop codons [13] and within a consensus motif, DRACH (D=A, G or U; R=G or A; H=A, C or U) [2, 13, 14]. mA has been implicated in myriad cellular processes including mRNA transcription, stability, splicing, export, and translation [15–18]. Not surprisingly, aberrant levels of mA have been associated with the development of diverse human diseases [19–26].
To understand the roles of mA in human diseases and biological processes, various techniques have been developed to profile mA. Two widely used techniques are specific antibody-based high-throughput mA sequencing and crossing linking-assisted mA sequencing (Fig. 1). Antibody-based RNA immunoprecipitation (IP) uses high-throughput sequencing with peak-calling [2, 13, 27]. mA- and meRIP-seq [2, 13, 27, 28] are the earliest and most commonly used methods under this class of techniques. They profile mA methylated regions as peaks in transcript coverage from immunoprecipitated RNA relative to input RNA. The early version of this protocol [29] typically requires 300 g of total RNA, which limits its application to cell lines. Subsequently, a refined protocol was developed by reducing the amount of total RNA to as low as 0.5 g and has been applied to low-input RNA samples, including those obtained from primary human tumors [27]. Both early and refined methods map mA enrichment within 200-nt peak regions, and cannot precisely identify methylated adenosine at a single-base resolution [13]. To overcome this issue, novel techniques, such as crossing linking-assisted mA sequencing, were developed to accurately detect mA at single-nucleotide resolution. These methods include miCLIP [30, 31], PA-mA-seq [32], mA-REF-seq [33], MAZTER-seq [34], DART-seq [35], miCLIP2 [36], and scDART-seq [37]. Among them, miCLIP/mA-CLIP is well known for single-nucleotide detection of mA methylation sites by crosslinking at the antibody binding site to induce mutations of methylated bases during reverse transcription. However, these techniques may give rise to a considerable level of background signal due to the limited specificity of antibodies.

Figure 1
Experimental methods used to profile mA and databases that catalogue the resulting data; databases incorporate mA peaks from publicly available mA-seq and RIP-seq data sets or mA sites from publicly available miCLIP and miCLIP2 data sets. Parts of the figure were generated using Bio-Render.
Nevertheless, consequent to implementing these mA detection methods, a large amount of mA modification data is now publicly available. To collate these data, several bioinformatics web servers and databases have been designed to collect, organize, integrate, and annotate mA data sets (Fig. 1) [38–50]. These databases can be categorized into three main groups. The first group of databases provides the status of RNA modifications based only on enriched peaks over input control obtained from immunoprecipitation sequencing experiments using antibodies against specific RNA modification marks. MeT-DB is the first epitranscriptome database that compiled the mA profiles for seven species in 26 different studies with a total of 74 meRIP-seq samples from 22 different experimental conditions [42, 43]. It contains additional functional data such as microRNA (miRNA) target sites, single nucleotide polymorphisms (SNPs), binding sites of splicing factors and RNA-binding proteins (RBPs), and information related to cancer genes. RMBase [40, 41] is another comprehensive epitranscriptome sequencing database that covers more than 100 types of RNA modifications in 13 species from 47 independent studies. These RNA modifications include pseudouridine () modifications, 5-methylcytosines (mC), mA, and 2-O-methylations (2-O-Me). RMBase provides novel web-based visualization tools for metagene profiles, logos of motifs associated with specific modifications, and various types of genomic features. In addition, it integrates epitranscriptome sequencing data to explore post-transcriptional modifications of RNAs and their relationships with miRNA binding events, RBP binding sites, disease-related SNPs, and genome-wide association study (GWAS) data. Recently, REPIC was released to record 10 million peaks from 672 samples of 49 independent studies, covering 61 cell lines or tissues in 11 organisms [44]. It also integrates 1418 histone ChIP-seq and 118 DNase-seq data to present a comprehensive atlas of mA methylation sites, histone modification sites, and chromatin accessibility regions. Accordingly, it allows users to explore cell/tissue-specific mA modifications and to investigate potential interactions between mA modifications and histone marks or chromatin accessibility.
The second group of databases collects mA both with peak-calling and at a single-base resolution from crossing linking-assisted mA sequencing. For instance, CVm6A compiles both mA-seq and miCLIP-seq/PA-mA-seq datasets in 23 human and eight mouse cell lines, and provides a visualization interface for searching and comparing the mA patterns in different cell lines [45]. Another database, m6A-Atlas, assembles over 400 000 high-confidence mA sites, and investigates cross-species conservation of mA sites among several species: seven vertebrate species (including human, mouse, and chimpanzee), 10 virus species (including HIV, KSHV, and DENV), and their host cells [46]. More recently, an updated version of mA-Atlas (v2.0) was generated to include nearly 800 000 reliable mA sites from 13 high-resolution methods and over 16 million mA peaks from meRIP-seq experiments [50].
Based on high confidence mA sites and mA peaks from datasets, some databases have included genetic variants that affect mA modification sites. Thus, the third group of databases provides information concerning the effects of genetic variants on mA to understand how they alter RNA methylation status to impact biological functions. For example, m6ASNP [47], m6AVar [48], and RMVar [51] adopted a random forest model to predict whether the methylation status of a mA site is altered by the disease-associated genetic variants within and flanking mA sites in human and mouse. m6AVar [48] and RMVar [51] collect a large number of mA-associated variants derived from millions of germline and somatic variants from miCLIP, PA-mA-seq, and meRIP-Seq experiments. The mA-associated variants are determined based on whether they localize to RBP-binding regions, splicing, and miRNA target sites. RMVar also explores the underlying relationship between the mA machinery and diseases by integrating the disease-association data from GWAS [52] and ClinVar [53] databases. m6A-TSHub presents a comprehensive platform for context-specific mA methylation and mA-affecting mutations in 23 human tissues in four key components (m6A-TSDB, m6A-TSFinder, m6A-TSVar, and m6A-CAVar) [49]. Inspired by m6AVar, m6A-Atlas [46] and m6A-Atlas v2.0 [50] inferred whether disease-associated genetic mutations could directly destroy the mA forming motif DRACH. A detailed summary of these databases is documented in Table 1.
Table 1A detailed summary of mA databases
Databases
. | Functions
. | Modifications
. | Samples
. |
---|
MeT-DB [42] Website not available | binding sites of miRNA, splicing factor (SF), RBPs | mA | 74 meRIP-seq from 22 different conditions |
MeT-DB v2.0 [43] www.xjtlu.edu.cn/metdb2 | miRNA target sites, SF binding sites,RBPs,Cancer related genes | mA | 185 samples for seven species from 26 independent studies |
RMBase [40] Website not available | protein genes, miRNA target sites, disease-related SNPs, regulatory ncRNAs | , mC, mA and 2-O-Me | 18 independent studies |
RMBase v2.0 [41] https://rna.sysu.edu.cn/rmbase | miRNA targets, RBP binding sites, SNVs and GWAS data | mA, mA, , mC, 2-O-Me | 47 studies among 13 species |
CVm6A [45] http://gb.whu.edu.cn:8080/CVm6A | regulation of cell-dependent mA modification in disease and development | mA | 23 human and eight mouse cell lines |
REPIC [44] https://repicmod.uchicago.edu/repic | cell- or tissue-specific mA modifications, histone marks or chromatin accessibility | mA | 672 samples of 49 studies, 61 cell lines in 11 organisms |
m6ASNP [47] http://m6asnp.renlab.org | if methylation status of an mA site is altered by variants around the site | mA | 59 234 human mA sites from [31] and [30] |
m6AVar [48] http://m6avar.renlab.org | RBPs, miRNA-targets and splicing sites associated with variants, GWAS and ClinVar | mA | 2 PA-mA-seq,7 miCLIP, 244 meRIP-Seq and a prediction based on Random Forest algorithm |
RMVar[51] http://rmvar.renlab.org | RBPs, miRNA targets, splicing events, circRNAs and isease-related information from ClinVar and GWAS | mA/m, mA, mC, , mU, 2-O-Me, A-to-I and mG | 150 independent studies in human and mouse |
m6A-Atlas [46] http://180.208.58.19/m6A-Atlas | RBPs, miRNA targets and splicing sites | mA | 67 datasets from seven base-resolution methods and 1363 mA-seq |
m6A-Atlas v2.0 [54] http://rnamd.org/m6a/ | RBPs, miRNA targets, SNPs, and splicing sites | mA | 2712 MeRIP-seq and 109 base-resolution samples |
m6A-TSHub [49] http://180.208.58.19/tshub | RBPs, miRNA targets, and splicing sites, along with their known disease and phenotype linkage integrated from GWAS and ClinVar | mA | 23 healthy human tissues and 25 tumour conditions |
Databases
. | Functions
. | Modifications
. | Samples
. |
---|
MeT-DB [42] Website not available | binding sites of miRNA, splicing factor (SF), RBPs | mA | 74 meRIP-seq from 22 different conditions |
MeT-DB v2.0 [43] www.xjtlu.edu.cn/metdb2 | miRNA target sites, SF binding sites,RBPs,Cancer related genes | mA | 185 samples for seven species from 26 independent studies |
RMBase [40] Website not available | protein genes, miRNA target sites, disease-related SNPs, regulatory ncRNAs | , mC, mA and 2-O-Me | 18 independent studies |
RMBase v2.0 [41] https://rna.sysu.edu.cn/rmbase | miRNA targets, RBP binding sites, SNVs and GWAS data | mA, mA, , mC, 2-O-Me | 47 studies among 13 species |
CVm6A [45] http://gb.whu.edu.cn:8080/CVm6A | regulation of cell-dependent mA modification in disease and development | mA | 23 human and eight mouse cell lines |
REPIC [44] https://repicmod.uchicago.edu/repic | cell- or tissue-specific mA modifications, histone marks or chromatin accessibility | mA | 672 samples of 49 studies, 61 cell lines in 11 organisms |
m6ASNP [47] http://m6asnp.renlab.org | if methylation status of an mA site is altered by variants around the site | mA | 59 234 human mA sites from [31] and [30] |
m6AVar [48] http://m6avar.renlab.org | RBPs, miRNA-targets and splicing sites associated with variants, GWAS and ClinVar | mA | 2 PA-mA-seq,7 miCLIP, 244 meRIP-Seq and a prediction based on Random Forest algorithm |
RMVar[51] http://rmvar.renlab.org | RBPs, miRNA targets, splicing events, circRNAs and isease-related information from ClinVar and GWAS | mA/m, mA, mC, , mU, 2-O-Me, A-to-I and mG | 150 independent studies in human and mouse |
m6A-Atlas [46] http://180.208.58.19/m6A-Atlas | RBPs, miRNA targets and splicing sites | mA | 67 datasets from seven base-resolution methods and 1363 mA-seq |
m6A-Atlas v2.0 [54] http://rnamd.org/m6a/ | RBPs, miRNA targets, SNPs, and splicing sites | mA | 2712 MeRIP-seq and 109 base-resolution samples |
m6A-TSHub [49] http://180.208.58.19/tshub | RBPs, miRNA targets, and splicing sites, along with their known disease and phenotype linkage integrated from GWAS and ClinVar | mA | 23 healthy human tissues and 25 tumour conditions |
Table 1A detailed summary of mA databases
Databases
. | Functions
. | Modifications
. | Samples
. |
---|
MeT-DB [42] Website not available | binding sites of miRNA, splicing factor (SF), RBPs | mA | 74 meRIP-seq from 22 different conditions |
MeT-DB v2.0 [43] www.xjtlu.edu.cn/metdb2 | miRNA target sites, SF binding sites,RBPs,Cancer related genes | mA | 185 samples for seven species from 26 independent studies |
RMBase [40] Website not available | protein genes, miRNA target sites, disease-related SNPs, regulatory ncRNAs | , mC, mA and 2-O-Me | 18 independent studies |
RMBase v2.0 [41] https://rna.sysu.edu.cn/rmbase | miRNA targets, RBP binding sites, SNVs and GWAS data | mA, mA, , mC, 2-O-Me | 47 studies among 13 species |
CVm6A [45] http://gb.whu.edu.cn:8080/CVm6A | regulation of cell-dependent mA modification in disease and development | mA | 23 human and eight mouse cell lines |
REPIC [44] https://repicmod.uchicago.edu/repic | cell- or tissue-specific mA modifications, histone marks or chromatin accessibility | mA | 672 samples of 49 studies, 61 cell lines in 11 organisms |
m6ASNP [47] http://m6asnp.renlab.org | if methylation status of an mA site is altered by variants around the site | mA | 59 234 human mA sites from [31] and [30] |
m6AVar [48] http://m6avar.renlab.org | RBPs, miRNA-targets and splicing sites associated with variants, GWAS and ClinVar | mA | 2 PA-mA-seq,7 miCLIP, 244 meRIP-Seq and a prediction based on Random Forest algorithm |
RMVar[51] http://rmvar.renlab.org | RBPs, miRNA targets, splicing events, circRNAs and isease-related information from ClinVar and GWAS | mA/m, mA, mC, , mU, 2-O-Me, A-to-I and mG | 150 independent studies in human and mouse |
m6A-Atlas [46] http://180.208.58.19/m6A-Atlas | RBPs, miRNA targets and splicing sites | mA | 67 datasets from seven base-resolution methods and 1363 mA-seq |
m6A-Atlas v2.0 [54] http://rnamd.org/m6a/ | RBPs, miRNA targets, SNPs, and splicing sites | mA | 2712 MeRIP-seq and 109 base-resolution samples |
m6A-TSHub [49] http://180.208.58.19/tshub | RBPs, miRNA targets, and splicing sites, along with their known disease and phenotype linkage integrated from GWAS and ClinVar | mA | 23 healthy human tissues and 25 tumour conditions |
Databases
. | Functions
. | Modifications
. | Samples
. |
---|
MeT-DB [42] Website not available | binding sites of miRNA, splicing factor (SF), RBPs | mA | 74 meRIP-seq from 22 different conditions |
MeT-DB v2.0 [43] www.xjtlu.edu.cn/metdb2 | miRNA target sites, SF binding sites,RBPs,Cancer related genes | mA | 185 samples for seven species from 26 independent studies |
RMBase [40] Website not available | protein genes, miRNA target sites, disease-related SNPs, regulatory ncRNAs | , mC, mA and 2-O-Me | 18 independent studies |
RMBase v2.0 [41] https://rna.sysu.edu.cn/rmbase | miRNA targets, RBP binding sites, SNVs and GWAS data | mA, mA, , mC, 2-O-Me | 47 studies among 13 species |
CVm6A [45] http://gb.whu.edu.cn:8080/CVm6A | regulation of cell-dependent mA modification in disease and development | mA | 23 human and eight mouse cell lines |
REPIC [44] https://repicmod.uchicago.edu/repic | cell- or tissue-specific mA modifications, histone marks or chromatin accessibility | mA | 672 samples of 49 studies, 61 cell lines in 11 organisms |
m6ASNP [47] http://m6asnp.renlab.org | if methylation status of an mA site is altered by variants around the site | mA | 59 234 human mA sites from [31] and [30] |
m6AVar [48] http://m6avar.renlab.org | RBPs, miRNA-targets and splicing sites associated with variants, GWAS and ClinVar | mA | 2 PA-mA-seq,7 miCLIP, 244 meRIP-Seq and a prediction based on Random Forest algorithm |
RMVar[51] http://rmvar.renlab.org | RBPs, miRNA targets, splicing events, circRNAs and isease-related information from ClinVar and GWAS | mA/m, mA, mC, , mU, 2-O-Me, A-to-I and mG | 150 independent studies in human and mouse |
m6A-Atlas [46] http://180.208.58.19/m6A-Atlas | RBPs, miRNA targets and splicing sites | mA | 67 datasets from seven base-resolution methods and 1363 mA-seq |
m6A-Atlas v2.0 [54] http://rnamd.org/m6a/ | RBPs, miRNA targets, SNPs, and splicing sites | mA | 2712 MeRIP-seq and 109 base-resolution samples |
m6A-TSHub [49] http://180.208.58.19/tshub | RBPs, miRNA targets, and splicing sites, along with their known disease and phenotype linkage integrated from GWAS and ClinVar | mA | 23 healthy human tissues and 25 tumour conditions |
All these mA-related databases have been created with the aim of providing useful information to study mA-associated functions, but the lack of understanding of what these data present may lead to misinterpretation of mA. First, databases include meRIP-seq experiments that lacked reproducibility even when data were generated in the same cell types [55]. meRIP-seq experiments are also unable to determine the stoichiometry of mA. The sensitivity of mA calling is highly variable between different cells and tissues, making the interpretation of mA challenging. In addition, databases include datasets generated using both early and refined mA profiling methods that vary in sensitivity and specificity. Second, cell/tissue-specific mA profiles are not available in several of these databases, hindering mA investigations for specific cells or tissues. Indeed, certain mA modifications are tissue-dependent and are dynamically altered in response to different stimuli [56–60]. Third, reliable mA sites that can be detected using improved technologies including GLORI [61] and eTAM-seq [62] on new datasets are not yet incorporated into databases. Base-resolution datasets, often generated using technologies like miCLIP [30, 31], remain the minority in databases due to the comparative difficulty to generate them and consequent rarity in the literature (e.g. only 109/2821 samples (3.6%) in m6A-Atlas v2.0). Fourth, main analysis methods, such as MACS2 and exomePeak used to analyse mA datasets in databases, do not model the variation of methylation within transcripts and across biological replicates. Fifth, databases detailing mA-related SNPs have included genetic variants in sequences that flank the DRACH motifs, but these variants are not known to affect mA modification. Sixth, databases have included mA-related SNPs that contribute to non-synonymous variants. Therefore, it is unclear whether disease pathogenicity is consequent to altered amino acid sequence or mA function(s).
In this study, we explore currently available mA databases and describe their characteristics and limitations. We start by determining the number of m6A peaks and sites in specific cell lines reported in databases, highlighting a huge variability in the reported numbers. Next, as a significant portion of datasets compiled in databases was generated using the early mA-RIP-seq methods [2, 28, 29], we compared an example of the dataset obtained with the early method to the one obtained using the refined mA-RIP-seq protocol [27] in the same cell-line. We pinpoint potential errors in the early mA profiling method, which implies the need for additional filtering to minimize erroneous calling of mA. We describe the challenges of linking mA-associated SNPs to disease pathogenicity when these SNPs also constitute non-synonymous variants. Finally, we propose future directions towards generating better mA-related databases from an end-user perspective.
Materials and methods
Evaluation of peak variability within accessions across databases
To explore how peak calls varied in response to differences in the preprocessing pipelines of databases, we compared database peak coordinates derived from two public accessions: NOMO-1 m6A-seq from Su et al. 2018 [63] (GSE87190, with input in GSM2324291, and immunoprecipitation in GSM2324292); and H1299 meRIP-seq from Lin et al. 2016 [64] (GSE76367, with input in GSM1982262, immunoprecipitation in GSM1982263). We note these two cell-types were chosen as each has only a single m6A-seq / meRIP-seq sequencing run in the literature; this controls for one source of variability across databases, which is that when a cell-type has multiple sequencing runs, some databases call peaks separately per run (e.g. REPIC), while others derive a combined peak list from processing all runs together and thus uses more information (e.g. m6A-TSHub). For m6A-Atlas and m6A-TSHub, coordinates were converted from hg19 to hg38 using rTrackLayer [65]. For database pipelines using MACS2, intergenic peaks were excluded from analyses. Peaks coordinates were standardized to a width of 125bp centred around the midpoint, and intersected using intersectBed from the BedTools suite [66], requiring a minimum overlap of 1bp.
Comparison of mA peaks detected in the human A549 cell line using early and refined mA-seq
To demonstrate variable calling of mA generated by widely used mA-seq protocols in existing databases, we compared the mA peaks identified using the early mA-seq protocol [67] and the refined RIP-seq protocol [27] in human A549 cell line. Data generated using either methods are publicly available for this cell line, and have been incorporated into mA-related databases. To call mA peaks, adaptor sequences, and low-quality reads were first trimmed from raw reads using Trimmomatic v0.38 [68] under default settings. FastQC v0.11.8 [69] was used to assess read quality, and STAR v2.5.2a [70] aligner was used to align clean reads to the human reference genome hg38 (ENSEMBL version 86). Samtools v1.6 [71] was used to select uniquely mapped reads to minimize the rate of false positives. MACS2 v2.1.0 [72] was used to call peaks enriched in immunoprecipitated over corresponding input samples. Peaks with enrichment scores >4 were selected for further analysis.
Validation of A549 cell line-associated peaks
To identify potential false positive mA, a total of 10 704 high confidence mA sites in the human A549 cell line quantified by the mA-CLIP were downloaded from the m6A-Atlas database [46]. Bedtools was used to validate the mA peaks identified using the early mA-seq protocol and refined RIP-seq protocol by intersecting them with the existing high confidence mA sites [66]. A pie chart was used to visualize the percentage of verified mA sites in each method.
Calculation of mA scores by an intelligent m6A (iM6A) model for mA sites in A549 cell lines
The intelligent m6A (iM6A) model was further used to show the potential erroneous mA sites generated by the early mA-seq protocol [67] and the refined RIP-seq protocol. To do this, the narrow peak output of MACS2 was expanded to +/- 250bp from its peak position. A 501bp sequence is used as input of iM6A (a deep learning model) to calculate the probability of the site being a mA site, consistent with the original article describing this model [73]. Each position in the peak sequence was assigned a probability ranging from 0 to 1: the larger the score, the higher the probability of a site being mA-methylated. The maximum probability of the sequence is used as mA probability of the called peaks. The line and box plots were used to visualize the difference in the mA probability between the early mA-seq protocol and the refined RIP-seq protocol.
Results and discussion
Existing databases are affected by highly variable calling of mA peaks
Although existing mA databases have been useful in providing a wealth of mA profiles of different cell-types and tissues, there are inherent complexities that may impact the meaningful derivation of data. Non-specialist end-users usually pay little attention to these complexities and neither do they realize that these complexities affect the reliability of mA profiles reported in these databases. The awareness of these complexities will help end-users to be more cautious when extracting data directly from these databases and understand how to make the most of these invaluable resources.
In most databases, mA enrichment in RNA transcripts is determined by peak calling which usually involves five steps. (1) QC: check reads’ quality; (2) trimming: remove adapters and low-quality reads; (3) mapping: align reads to the genome; (4) filtering: remove rRNAs and PCR duplicates; (5) peak calling: identify the regions enriched in IP RNA relative to input RNA. The first three are standard steps employed by all databases. However, steps (4) and (5) are executed differently by various databases as summarized in Table 2.
Table 2Comparison between settings applied in databases to call mA
Databases
. | Filtering
. | Peak-calling
. | Reported-peaks
. | Customizable setting of parameters for peak calling?
. | Reported mA sites
. |
---|
MeT-DB/ MeT-DBv2.0 | | exomePeak | FDR<0.05, P <0.05, FC>1 and MAPQ>30 | unavailable | Searching the RRACH motif in the identified peaks |
RMBase/ RMBasev2.0 | | exomePeak | FDR<0.05, P <0.01, MAPQ>30 and FC>2 | not allowed | Searching the RRACH motif in the identified peaks |
CVm6A | Picard (remove PCR duplicates) | MeTPeak | FDR<0.05, FE>1 and MAPQ>30 | not allowed | miCLIP-seq/PA-mA-seq |
REPIC | Removed rRNAs, FastQ Screen | exomePeak, MeTPeak, MACS2 | FDR<0.05, MAPQ>20, FE>2 | not allowed | mA-seq and meRIP-seq |
m6ASNP/ m6AVar/RMVar | | MACS2, MeTPeak, Meyer’s method | Consensus peak by MSPC | not allowed | miCLIP/MAZTER-seq/mA-REF-seq/mACE-seq/ DART-seq/PA-mA-Seq/mA-Label-seq (High confidence); meRIP-Seq (Medium confidence); Transcriptome-wide prediction (Low confidence) |
m6A-Atlas/ m6A-Atlas v2.0 | | exomePeak2, MACS2 (v2.0), TRESS (v2.0) | Fold-enrichment>1 | not allowed | mA-REF-seq/MAZTER-seq/miCLIP-seq/ mA-CLIP-seq/PA-mA-seq/mA-seq with improved protocol/mA-CLIP-seq |
m6A-TSHub | | exomePeak2 | P <1e-10, contain at least one DRACH | not allowed | miCLIP/mA-CLIP-seq, mA-REF-seq and DART-seq |
Databases
. | Filtering
. | Peak-calling
. | Reported-peaks
. | Customizable setting of parameters for peak calling?
. | Reported mA sites
. |
---|
MeT-DB/ MeT-DBv2.0 | | exomePeak | FDR<0.05, P <0.05, FC>1 and MAPQ>30 | unavailable | Searching the RRACH motif in the identified peaks |
RMBase/ RMBasev2.0 | | exomePeak | FDR<0.05, P <0.01, MAPQ>30 and FC>2 | not allowed | Searching the RRACH motif in the identified peaks |
CVm6A | Picard (remove PCR duplicates) | MeTPeak | FDR<0.05, FE>1 and MAPQ>30 | not allowed | miCLIP-seq/PA-mA-seq |
REPIC | Removed rRNAs, FastQ Screen | exomePeak, MeTPeak, MACS2 | FDR<0.05, MAPQ>20, FE>2 | not allowed | mA-seq and meRIP-seq |
m6ASNP/ m6AVar/RMVar | | MACS2, MeTPeak, Meyer’s method | Consensus peak by MSPC | not allowed | miCLIP/MAZTER-seq/mA-REF-seq/mACE-seq/ DART-seq/PA-mA-Seq/mA-Label-seq (High confidence); meRIP-Seq (Medium confidence); Transcriptome-wide prediction (Low confidence) |
m6A-Atlas/ m6A-Atlas v2.0 | | exomePeak2, MACS2 (v2.0), TRESS (v2.0) | Fold-enrichment>1 | not allowed | mA-REF-seq/MAZTER-seq/miCLIP-seq/ mA-CLIP-seq/PA-mA-seq/mA-seq with improved protocol/mA-CLIP-seq |
m6A-TSHub | | exomePeak2 | P <1e-10, contain at least one DRACH | not allowed | miCLIP/mA-CLIP-seq, mA-REF-seq and DART-seq |
Table 2Comparison between settings applied in databases to call mA
Databases
. | Filtering
. | Peak-calling
. | Reported-peaks
. | Customizable setting of parameters for peak calling?
. | Reported mA sites
. |
---|
MeT-DB/ MeT-DBv2.0 | | exomePeak | FDR<0.05, P <0.05, FC>1 and MAPQ>30 | unavailable | Searching the RRACH motif in the identified peaks |
RMBase/ RMBasev2.0 | | exomePeak | FDR<0.05, P <0.01, MAPQ>30 and FC>2 | not allowed | Searching the RRACH motif in the identified peaks |
CVm6A | Picard (remove PCR duplicates) | MeTPeak | FDR<0.05, FE>1 and MAPQ>30 | not allowed | miCLIP-seq/PA-mA-seq |
REPIC | Removed rRNAs, FastQ Screen | exomePeak, MeTPeak, MACS2 | FDR<0.05, MAPQ>20, FE>2 | not allowed | mA-seq and meRIP-seq |
m6ASNP/ m6AVar/RMVar | | MACS2, MeTPeak, Meyer’s method | Consensus peak by MSPC | not allowed | miCLIP/MAZTER-seq/mA-REF-seq/mACE-seq/ DART-seq/PA-mA-Seq/mA-Label-seq (High confidence); meRIP-Seq (Medium confidence); Transcriptome-wide prediction (Low confidence) |
m6A-Atlas/ m6A-Atlas v2.0 | | exomePeak2, MACS2 (v2.0), TRESS (v2.0) | Fold-enrichment>1 | not allowed | mA-REF-seq/MAZTER-seq/miCLIP-seq/ mA-CLIP-seq/PA-mA-seq/mA-seq with improved protocol/mA-CLIP-seq |
m6A-TSHub | | exomePeak2 | P <1e-10, contain at least one DRACH | not allowed | miCLIP/mA-CLIP-seq, mA-REF-seq and DART-seq |
Databases
. | Filtering
. | Peak-calling
. | Reported-peaks
. | Customizable setting of parameters for peak calling?
. | Reported mA sites
. |
---|
MeT-DB/ MeT-DBv2.0 | | exomePeak | FDR<0.05, P <0.05, FC>1 and MAPQ>30 | unavailable | Searching the RRACH motif in the identified peaks |
RMBase/ RMBasev2.0 | | exomePeak | FDR<0.05, P <0.01, MAPQ>30 and FC>2 | not allowed | Searching the RRACH motif in the identified peaks |
CVm6A | Picard (remove PCR duplicates) | MeTPeak | FDR<0.05, FE>1 and MAPQ>30 | not allowed | miCLIP-seq/PA-mA-seq |
REPIC | Removed rRNAs, FastQ Screen | exomePeak, MeTPeak, MACS2 | FDR<0.05, MAPQ>20, FE>2 | not allowed | mA-seq and meRIP-seq |
m6ASNP/ m6AVar/RMVar | | MACS2, MeTPeak, Meyer’s method | Consensus peak by MSPC | not allowed | miCLIP/MAZTER-seq/mA-REF-seq/mACE-seq/ DART-seq/PA-mA-Seq/mA-Label-seq (High confidence); meRIP-Seq (Medium confidence); Transcriptome-wide prediction (Low confidence) |
m6A-Atlas/ m6A-Atlas v2.0 | | exomePeak2, MACS2 (v2.0), TRESS (v2.0) | Fold-enrichment>1 | not allowed | mA-REF-seq/MAZTER-seq/miCLIP-seq/ mA-CLIP-seq/PA-mA-seq/mA-seq with improved protocol/mA-CLIP-seq |
m6A-TSHub | | exomePeak2 | P <1e-10, contain at least one DRACH | not allowed | miCLIP/mA-CLIP-seq, mA-REF-seq and DART-seq |
The levels of stringency and filtering can affect the confidence of true mA peaks being included in these databases. For example, some databases define RNA regions as being enriched for mA when peaks show fold-change (FC) > 1 against input (Table 2). However, low fold-change (FC < 1.5) of signal over input may lead to false positives. On the other hand, higher stringency of FC settings may lead to false negatives in some cell types that may have generally lower levels of mA compared with others. Ideally, end-users should be allowed to alter the default settings to obtain more reliable results, but this is not possible with most mA databases (Table 2). While some databases have included base-resolution methods to validate the presence of mA sites in enriched mA peaks, others have performed such validation based on the presence of the consensus mA motif, DRACH or RRACH (Table 2). It is important for end-users to recognize that not all DRACH or RRACH motifs are methylated and mA sites defined using such criteria require further verification.
We thus compared how the same input data can produce variable output in peak calls using an exemplar m6A-seq dataset from NOMO-1 cells (GSE87190) [63]. Peak calls derived from this accession were selectable in five databases (CVm6A, m6A-Atlas, m6A-Atlas v2.0, m6A-TSHub, and REPIC), but were either absent or unavailable for download from the other databases described in this review. A total of nine different processing pipelines were compared, as two databases reported results from multiple peak calling algorithms (m6A-Atlas v2.0 and REPIC). Database pipeline stringency varied greatly, with the number of peaks ranging from 13 770 in CVm6A to approximately 40 000 in m6A-Atlas v2.0 (when using the MACS2 caller) and m6A-TSHub, nearly a three-fold difference (Fig. 2A). CVm6a may have called the fewest peaks as it filtered out PCR duplicate reads using PICARD. Indeed, within each of the nine pipelines, the percent of peak calls that failed to intersect any of the other pipelines’ averaged 30.5%, but was highly variable (range: 0.8%–53.4%) (Fig. 2B). In contrast, only an average of 24.2% of peak calls per pipeline were highly reproducible, i.e. intersected with a peak from at least half the other pipelines (Fig. 2B). The database with the fewest unique peak calls was TRESS output from m6A-Atlas v2.0 (0.8%), while ExomePeak2 output from REPIC had the highest percentage of its peaks being highly reproducible (32.4%) (Fig. 2B).

Figure 2
Comparison of peak calls from NOMO-1 cells (GSE87190) across nine database pipelines from five databases; (A) barplot of the total number of peaks reported by each database pipeline, with colours representing the number of other database pipelines each peak has an intersection with; colours are per the legend of (B); (B) barplot of the percentage of each database pipeline’s peak set that intersects a peak in none, one, two to three, or of the eight other database pipelines; (C, D) clustered heatmap of the Jaccard Index (C) and Simpson Index (D) for the pairwise intersections between peak coordinates reported by each database pipeline; EP: ExomePeak; EP2: ExomePeak2.
We next evaluated the pairwise concordance between each of the database pipeline peak calls. Using the Jaccard Index (number of intersecting peaks / number of peaks in set union), we observed very low similarity overall, with similarity above 0.5 only observed within databases when multiple peak callers were used (Fig. 2C). However, as Jaccard penalizes set size imbalances, we also controlled for this using Simpson Similarity (number of intersecting peaks / number of peaks in smaller set), which further highlighted that the strongest concordance was within databases rather than across them (Fig. 2C). Interestingly, the highest across-database similarity was between m6A-TSHub and m6A-Atlas, where both used ExomePeak2 as their peak caller, suggesting that algorithm choice also drives similarity. These numbers are also broadly consistent with REPIC’s statistics comparing its three peak callers, where REPIC MACS2 calls rarely reached a Jaccard similarity of >0.5 to REPIC exomePeak or MeTPeak calls within the same accession (13.6% and 3%, respectively).
We also repeated these analyses using a second accession (H1299 meRIP-seq from GSE76367 [64]), which broadly replicated these trends (Fig. S1).
Overall, we found that a single accession can produce vastly different peak calls depending on the database a user queries, owing to different (re)processing pipelines. This variability is likely in addition to that present across experiments even when controlling for the pipeline; while replicates reach approximately 80% peak overlap, this figure falls to 45% when considered across studies in a single tissue [55]. Newer databases like REPIC and m6A-Atlas v2.0 have begun to incorporate peak calls from multiple algorithms, and while this added transparency may help users prioritize reproducible results, our results show that peak calls cluster most strongly according to databases rather than algorithms, and thus that robustness should be evaluated across rather than within databases.
We note that the above example demonstrates substantial cross-database variability in the simplest case, where, for a given cell type, there exists only one accession with no replicates. However, for more common model cell types, multiple different accessions may be available, each of which may have several replicates and/or perturbations. Using HepG2 as an example, CVm6A has re-processed one accession (GSE37005), REPIC uses two (GSE37002, GSE37003), m6A-TSHub takes three (GSE90642, GSE102336, and GSE110320), and m6A-Atlas calls from four (GSE37002, GSE90642, GSE102336, and GSE110320). Given that peaks called across m6A-seq experiments already show poor concordance [55], this may serve as an additional factor reducing agreement across databases, one which end-users should be cognizant of.
Differences in the sensitivity of the early mA-seq protocol and the refined RIP-seq protocol impact the accuracy of mA calling in databases
mA-seq protocol was originally reported in 2012 and has been widely used to profile mA in diverse cells and tissues [2]. However, to circumvent the requirement of over 300 g of total RNA [29] or 5 g of mRNAs, the RIP-seq protocol [27] was developed and requires as low as 0.5 g of total RNA. For the purpose of this discussion, we termed the original mA-seq protocol the ‘early protocol’ (EP) and RIP-seq as the ‘refined protocol’ (RP). Most mA databases only collected the mA-seq data generated using EP [40–43, 45] while few others included data generated using RP [44, 46, 51]. Given that the sensitivity of mA detection using EP and RP varies considerably, the sensitivity of mA calling varies between databases.
We compared the mA profile of a human lung cancer cell line, A549, previously generated using EP and RP protocols, and illustrated the erroneous mA peak calling in databases. Both EP and RP protocols utilized single-end sequencing and immunoprecipitation were performed using the same mA-antibody (Table 3). The read length and sequencing depth of the different protocols were similar, thereby minimizing any biases in the quality of data analysed (Table 3) [27, 29]. We applied our pipeline in Fig. 3A to reanalyse mA data generated from EP and RP, and used MACS2 to call narrow peaks. We selected peaks with fold enrichment > 4 over input control, consistent with the recommended calling of reliable peaks in chromatin immunoprecipitation sequencing (ChIP-seq) data [74].
Table 3Comparison of the sequencing parameters, output, and antibodies used in two studies that profiled mA in A549 using the early (EP) and refined (RP) protocols
Sample
. | SRA
. | RLbp
. | RDm
. | Platform
. | CL
. | Genome
. | Antibody
. |
---|
ma-seq (EP) INP | SRX1503161 | 50 | 30.9 | HiSeq2000 (S) | A549 | Human | |
ma-seq (EP) IP | SRX1503162 | 50 | 29.9 | HiSeq2000 (S) | A549 | Human | Synaptic Systems, 202003 |
RIP-seq (RP) INP | SRX4239819 | 51 | 42.9 | HiSeq2500 (S) | A549 | Human | |
RIP-seq (RP) IP | SRX4239820 | 51 | 33.8 | HiSeq2500 (S) | A549 | Human | Synaptic Systems, 202003 |
Sample
. | SRA
. | RLbp
. | RDm
. | Platform
. | CL
. | Genome
. | Antibody
. |
---|
ma-seq (EP) INP | SRX1503161 | 50 | 30.9 | HiSeq2000 (S) | A549 | Human | |
ma-seq (EP) IP | SRX1503162 | 50 | 29.9 | HiSeq2000 (S) | A549 | Human | Synaptic Systems, 202003 |
RIP-seq (RP) INP | SRX4239819 | 51 | 42.9 | HiSeq2500 (S) | A549 | Human | |
RIP-seq (RP) IP | SRX4239820 | 51 | 33.8 | HiSeq2500 (S) | A549 | Human | Synaptic Systems, 202003 |
Table 3Comparison of the sequencing parameters, output, and antibodies used in two studies that profiled mA in A549 using the early (EP) and refined (RP) protocols
Sample
. | SRA
. | RLbp
. | RDm
. | Platform
. | CL
. | Genome
. | Antibody
. |
---|
ma-seq (EP) INP | SRX1503161 | 50 | 30.9 | HiSeq2000 (S) | A549 | Human | |
ma-seq (EP) IP | SRX1503162 | 50 | 29.9 | HiSeq2000 (S) | A549 | Human | Synaptic Systems, 202003 |
RIP-seq (RP) INP | SRX4239819 | 51 | 42.9 | HiSeq2500 (S) | A549 | Human | |
RIP-seq (RP) IP | SRX4239820 | 51 | 33.8 | HiSeq2500 (S) | A549 | Human | Synaptic Systems, 202003 |
Sample
. | SRA
. | RLbp
. | RDm
. | Platform
. | CL
. | Genome
. | Antibody
. |
---|
ma-seq (EP) INP | SRX1503161 | 50 | 30.9 | HiSeq2000 (S) | A549 | Human | |
ma-seq (EP) IP | SRX1503162 | 50 | 29.9 | HiSeq2000 (S) | A549 | Human | Synaptic Systems, 202003 |
RIP-seq (RP) INP | SRX4239819 | 51 | 42.9 | HiSeq2500 (S) | A549 | Human | |
RIP-seq (RP) IP | SRX4239820 | 51 | 33.8 | HiSeq2500 (S) | A549 | Human | Synaptic Systems, 202003 |
![Comparison of the m$^{6}$A peaks in A549 identified using early and refined m$^{6}$A-seq protocols; (A) a workflow to process and compare data generated using the early and refined m$^{6}$A-profiling protocols; m$^{6}$A profiling data sets used in the work were generated from early m$^{6}$A-seq protocol (EP) [29] and refined RIP-seq protocol (RP) [27] for the human A549 cell line; MACS2 was used to call narrow peaks; iM6A was implemented to calculate the methylation probability of m$^{6}$A peaks (iM6A score); (B) overlap of m$^{6}$A peaks detected in the early m$^{6}$A-seq protocol (EP) and the refined RIP-seq protocol (RP); (C) IGV plots of FBXW4 and MMP19 showing m$^{6}$A peaks that are detectable by both EP and RP; (D) the number of overlapping m$^{6}$A peaks called by both methods in the top 4000 peaks ranked by the peaks′ fold-enrichment; Venn diagram showing top 500 m$^{6}$A peaks detected in EP that overlap with all peaks in RP and vice versa; (E) m$^{6}$A peak signals in NRG2, MAGI2, ZNF827 and PTH1R transcripts detected using RP but not EP. Parts of the figure were generated using Bio-Render.](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/bib/25/5/10.1093_bib_bbae434/7/m_bbae434f3.jpeg?Expires=1748013596&Signature=qKrIQz-i6IIFTnHHW9eVbmUk9svUXcZ33yacMZNNdeZ4HVHhbYbFkHNzIx57jNXJZoFFm7rxNFi3yQGvlS9JOAj1wCV4OB9UooXY~iirgDKc3rOhHzboSNRN-oyNdxpFGzmH6EZpgigHg57jhos~jBFjL6CkZu6ZgZM4flhbLsIK-tp~daM6GIH0qv0~2u2-3umNbkEJ7EdzLEdXWVxWA4KmEkRMFBPdNqg~F4JvIWoAZeCdYtLjR1QS-kK8BKM9v5SDHwVmwE636mNfHEs3SwDs-i1t8DaBYtrTbDesES6LwXp8n4nT7AAOhjIRDlMu1iQ512STk7jXzxshwdIvCw__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Figure 3
Comparison of the mA peaks in A549 identified using early and refined mA-seq protocols; (A) a workflow to process and compare data generated using the early and refined mA-profiling protocols; mA profiling data sets used in the work were generated from early mA-seq protocol (EP) [29] and refined RIP-seq protocol (RP) [27] for the human A549 cell line; MACS2 was used to call narrow peaks; iM6A was implemented to calculate the methylation probability of mA peaks (iM6A score); (B) overlap of mA peaks detected in the early mA-seq protocol (EP) and the refined RIP-seq protocol (RP); (C) IGV plots of FBXW4 and MMP19 showing mA peaks that are detectable by both EP and RP; (D) the number of overlapping mA peaks called by both methods in the top 4000 peaks ranked by the peaks′ fold-enrichment; Venn diagram showing top 500 mA peaks detected in EP that overlap with all peaks in RP and vice versa; (E) mA peak signals in NRG2, MAGI2, ZNF827 and PTH1R transcripts detected using RP but not EP. Parts of the figure were generated using Bio-Render.
First, we checked the percentage overlap of peaks detected by the two protocols (Fig. 3B). In A549, the RP protocol detected three times more mA peaks (17 888) than those detected using EP (7708). 3962 mA peaks that account for 20 of peaks detected using RP overlapped with 50 of peaks detected using EP (Fig. 3B). Among the overlapping genes detected by EP and RP, we observed key oncogenes and tumour suppressor genes including FBXW4, MMP17, RBM15, SETDB1, PAX8, IRS1, ABL2, HDAC4, MST1R, GATA2, FOXL2, SOX2, TET3, TMEM127, VHL, XPC, TRAF5, RASA1, SPRTN, and CDKN2C (Figs 3C and S2), indicating the consistency in detecting mA genes relevant to a cancer cell line. Notably, nearly 78 of peaks detected in RP are not detected using EP, indicating that RP is more sensitive than EP (Fig. 3B).
Second, we ranked the peak signals from high to low and determined whether the top 500 peaks detected in EP are also detectable using RP. We then performed the opposite by overlapping the top 500 peaks in RP with all peaks detected using EP. RP mA peaks that overlapped with EP (274/500) were two times more than top EP peaks that were detectable using RP (154/500) (Fig. 3D), reaching a statistical significance of P < 0.03 (Kolmogorov–Smirnov test). Our analysis further suggests that peak signals detected using EP may be weaker and potentially lead to false negatives. Examples of mA peaks in cancer-associated genes (NGR2, MAGI2, ZNF827, and PTH1R) [75–78] detected using RP but not EP are shown in Fig. 3E. Our analysis indicates that the contribution of mA to these functionally essential genes may be missed if the profiling is based solely on EP.
In addition, some of the peaks detected exclusively by EP need to be interpreted with caution because the mA profile indicates enriched peaks that mapped perfectly to all exons end-to-end, similar to the mRNA-seq profile. Illustrations of these types of peaks (generated using the EP protocol and documented in the REPIC database) are shown in Fig. 4A and 4B. The peaks (indicated by purple bars) were not identified as being enriched over input in the RP data (Fig. 4B). Thus, data generated by EP require additional filters to remove these potential false-positive peaks, which may be caused by the non-specific binding of the mA antibody during library preparation.

Figure 4
Inconsistencies of the mA peaks in A549 identified using early and refined mA-seq protocols; (A) a screenshot from REPIC that indicates mA sites in ARF1, TGM2, NDUFA2, and TKT in human A549 cell lines; (B) IGV plots showing mA peaks in ARF1, TGM2, NDUFA2, and TKT detected in EP- but not RP-generated data.
Third, we checked the overlap of the high confidence mA sites in the human A549 cell line among all detected mA sites (Fig. 5A and 5B). As shown in Fig. 5B, mA peaks detected by the RP protocol reported a higher overlap percentage (18.79, 3361) than the EP protocol (13.01, 1003) in Fig. 5A. Furthermore, we ranked the mA peaks detected via EP and RP protocols separately using the scores generated by a recently published predictor of mA sites based on deep-learning (iM6A) [73]. We found that for sites with high confidence iM6A scores (probability value > 0.1), RP detected twice as many mA sites as EP (Fig. 5C and 5D). We further examined 1000 mA sites with top-ranked iM6A scores and found that mA sites generated using RP have significantly higher iM6A scores with a P value of 1e-230 (Fig. 5E and 5F). While in some cases both EP and RP protocols can discover peaks with equally high iM6A scores (Fig. 5G and 5H, Fig. S3), in many cases, the iM6A score imputed by the RP method was higher than that of EP. This result indicates that the RP method of mA profiling is typically more sensitive. Collectively, understanding the protocols used to generate the mA data is essential to determine whether mA are likely to be correctly identified in mRNA transcripts presented in mA-related databases.

Figure 5
High confidence mA peaks in A549 called in data generated using early and refined mA-seq protocols via miCLIP-seq validation and iM6A scores; (A, B): the distribution of peaks containing a high confidence mA site detected using the early mA-seq protocol, EP (A), and the refined RIP-seq protocol, RP (B); (C) the distribution of mA scores for all identified sites; (D) histogram showing the number of mA sites in EP and RP, plotted against mA score; (E) the distribution of mA scores for all identified top 1000 sites (cut-off at 0.8 for x-axis); (F) boxplot of mA scores for top 1000 sites identified using EP and RP; (G, H): IGV plot of PCED1A (G) and ZNF593; (H) containing a peak with a high score of iM6A in data generated using EP and RP; mA sites with a high iM6A score are highlighted in orange; the table above the plots shows iM6A scores.
The association between mA and SNPs reported in databases
The presence of SNPs within the DRACH motifs can lead to the loss or gain of mA sites, thereby affecting the expression, stability, and translation of mRNAs. Most databases employed a random forest model to predict whether the methylation status of a mA site is altered by variants that are within the flanking regions 30 nucleotides (both upstream and downstream) of a given mA residue in the DRACH motif. One example is shown in Fig. 6A where PPKAG2 has a SNP (highlighted in red rectangle) which is 21bp apart from the DRACH motif.

Figure 6
Issues of mA-related SNP databases; (A) screenshot showing examples of SNPs outside mA motifs; red rectangles are used to highlight the SNP variant; (B) screenshot from RMVar and m6A-TSHub showing high-confidence mutations that altered mA motifs; selected SNPs are underlined; blue rectangles are used to highlight the example of genes; (C) mutation information from COSMIC for cases highlighted in blue rectangles in (B).
In addition, databases do not include information on the status of SNPs within mA sites regarding whether they are synonymous or non-synonymous SNPs. If an SNP within a given mA site is non-synonymous and located in a coding exon, the SNP will lead to amino acid substitution. Consequently, the pathogenicity of the SNP cannot be attributed to a change in mA alone. Four examples to illustrate this scenario extracted from RMVar and m6A-TSHub are shown in blue rectangles in Fig. 6B and 6C. According to the Catalogue of Somatic Mutations in Cancer (COSMIC) database, all these four high-confidence somatic variants are described as missense and at least three of them may lead to disease pathogenicity regardless of whether mA is present at these sites (Fig. 6C).
Towards overcoming the pitfalls in mA-related databases generated based on peak-calling experiments
Several improvements can be performed to generate reliable mA-related databases that will benefit non-specialist end-users. First, mA-related databases that incorporate data from peak-calling experiments should disclose information regarding the mA-profiling methods used to obtain the data and allow end-users to select data that they desire based on analysis methods. Second, end-users should be allowed to alter the stringency of the analysis method used to minimize false positive and negative results. Third, datasets included in databases should be reanalysed using superior tools such as TRESS rather than MACS2 or exomePeak to correct for biological variance between replicates and improve the accuracy [79]. Indeed, the analysis using TRESS in m6A-Atlas v2.0 revealed fewer but more reproducible peaks compared with other methods, which indicates that additional peaks detected by other methods may not be as robust. Where available, METTL3 depletion experiments performed in parallel should be included as negative controls to identify true mA peaks in databases. Fourth, the iM6A model can be incorporated into databases to predict the probability of mA peaks being real. This feature was not previously possible in most mA-related databases as they were created prior to the publication of iM6A. Finally, databases reporting mA-associated SNPs should contain a filter for synonymous and non-synonymous SNPs and focus on SNPs in the DRACH motifs that are confirmed to be methylated.
Conclusion and future work
In conclusion, while mA databases provide a rich source to infer biological roles of mA in different cellular contexts, the understanding of how the data were generated and parameters that have been set to call mA are critical to derive biologically meaningful conclusions. By discussing the variability and undisclosed information related to mA databases we hope that end-users will become aware of ‘traps’ that may lead to inaccurate interpretation of data. Some of these ‘traps’ arise consequent to issues associated with older mA-seq technologies such as non-specific binding of antibodies and arbitrary cut-off used to call mA, and not due to variability in the data processing parameters per se. Recently, long-read techniques, such as Oxford nanopore sequencing, provided a novel way to measure mA levels for each adenosine at a transcriptomic scale, but few large-scale studies are available due to higher sequencing costs to acquire sufficient sequencing depth [80]. Similarly, base-resolution mA profiling methods analogous to bisulfite-sequencing of DNA methylation have recently been developed including GLORI [61], mA-SAC-seq [81], and eTAM-seq [62] but very few studies are currently available. Future generation of databases based on these base-resolution methods of mA profiling may provide more reliable resources to mine mA datasets with higher accuracy. However, until that becomes possible, databases generated based on antibody-based techniques are likely to be utilized and continue to be created. Our recommendations should give future creators of mA-related databases insights into creating more useful databases that will benefit non-specialist end-users. As antibody-based methods have also been used to profile other mRNA modifications including mA, mAm, and 5hmC, lessons learned from mA-related databases would be applicable to interpreting the data in current and future databases created for these emerging RNA modifications.
Key Points
Current mA databases re-call peaks from sequencing data. Differences in processing pipelines drive variability in peak number and coordinates, with only moderate reproducibility.
Databases that report mA peaks do not explicitly disclose the protocols used for generating the mA-immunoprecipitation sequencing data for each sample. We reveal that a more recent refined RIP-seq protocol is more sensitive and reliable than the mA-seq protocol developed prior. The inclusion of data generated using the early mA-seq protocol needs further filtering to avoid false positives.
Several mA-related databases reported mA-associated SNPs that are located over 20 nucleotides from mA sites, raising the question of whether they lead to altered mA. Moreover, databases do not include information related to synonymous and non-synonymous SNPs. Non-synomymous SNPs may lead to functional consequences independently of mA status.
Acknowledgments
The authors thank the anonymous reviewers for their valuable suggestions.
Author contributions
J.J.-L.W. and Q.L. conceived the study, R.S., G.J.S., and F.L. analysed the results. R.S., G.J.S., F.L., Q.L., and J.J.-L.W. wrote and reviewed the manuscript.
Funding
J.J.-L.W. was supported by the National Health and Medical Research Council of Australia (No. 1128175 & 2010647). F.L. was supported by the National Natural Scientific Foundation of China (No. 62202388). Q.L. was supported by the National Institute of General Medical Sciences (grant number P20GM121325).
Competing Interests
No competing interest is declared.
Data availability
Raw data files for the two independent datasets used to compare the mA profile of the human lung cancer cell line, A549, previously generated using EP and RP protocols are freely available for download via their GEO accession number GSE76367 and GSE116002, respectively. For NOMO-1 and H1299 peak calls accessed from databases, links can be found in Supplementary Tables S1 and S2, respectively.
References
1.
Desrosiers
R
,
Friderici
K
,
Rottman
F
.
Identification of methylated nucleosides in messenger RNA from Novikoff hepatoma cells
.
Proc Natl Acad Sci U S A
1974-10
;
71
:
3971
–
5
.
.
2.
Dominissini
D
,
Moshitch-Moshkovitz
S
,
Schwartz
S
. et al.
Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq
.
Nature
2012
;
485
:
201
–
6
.
.
3.
Shi
H
,
Wei
J
,
He
C
.
Where, when, and how: Context-dependent functions of RNA methylation writers, readers, and erasers
.
Mol Cell
2019
;
74
:
640
–
50
.
.
4.
Yang
Y
,
Hsu
PJ
,
Chen
Y-S
. et al.
Dynamic transcriptomic m6A decoration: Writers, erasers, readers and functions in RNA metabolism
.
Cell Res
2018
;
28
:
616
–
24
.
.
5.
He
PC
,
He
C
.
m6a rna methylation: From mechanisms to therapeutic potential
.
EMBO J
2021
;
40
:
e105977
.
.
6.
Liu
N
,
Pan
T
.
N 6-methyladenosine–encoded epitranscriptomics
.
Nat Struct Mol Biol
2016
;
23
:
98
–
102
.
.
7.
Zhang
C
,
Jinrong
F
,
Zhou
Y
.
A review in research progress concerning m6A methylation and immunoregulation
.
Front Immunol
2019
;
10
:
922
.
.
8.
Wei
W
,
Ji
X
,
Guo
X
. et al.
Regulatory role of N6–methyladenosine (m6A) methylation in RNA processing and human diseases
.
J Cell Biochem
2017
;
118
:
2534
–
43
.
.
9.
Liu
N
,
Parisien
M
,
Dai
Q
. et al.
Probing N6-methyladenosine RNA modification status at single nucleotide resolution in mRNA and long noncoding RNA
.
RNA
2013
;
19
:
1848
–
56
.
.
10.
Alarcón
CR
,
Lee
H
,
Goodarzi
H
. et al.
N6-methyladenosine marks primary microRNAs for processing
.
Nature
2015
;
519
:
482
–
5
.
.
11.
Wan
Y
,
Tang
K
,
Zhang
D
. et al.
Transcriptome-wide high-throughput deep m6A-seq reveals unique differential m6A methylation patterns between three organs in Arabidopsis thaliana
.
Genome Biol
2015
;
16
:
1
–
26
.
.
12.
Dominissini
D
,
Nachtergaele
S
,
Moshitch-Moshkovitz
S
. et al.
The dynamic N1-methyladenosine methylome in eukaryotic messenger RNA
.
Nature
2016
;
530
:
441
–
6
.
.
13.
Meyer
KD
,
Saletore
Y
,
Zumbo
P
. et al.
Comprehensive analysis of mRNA methylation reveals enrichment in 3 UTRs and near stop codons
.
Cell
2012
;
149
:
1635
–
46
.
14.
Schwartz
S
,
Mumbach
MR
,
Jovanovic
M
. et al.
Perturbation of m6A writers reveals two distinct classes of mRNA methylation at internal and 5 sites
.
Cell Rep
2014
;
8
:
284
–
96
.
15.
Roignant
J-Y
,
Soller
M
.
m6A in mRNA: An ancient mechanism for fine-tuning gene expression
.
Trends Genet
2017
;
33
:
380
–
90
.
.
16.
Shafik
A
,
Schumann
U
,
Evers
M
. et al.
The emerging epitranscriptomics of long noncoding RNAs
.
Biochim Biophys Acta Gene Regul Mech
2016
;
1859
:
59
–
70
.
.
17.
Yang
Y
,
Fan
X
,
Mao
M
. et al.
Extensive translation of circular RNAs driven by N6-methyladenosine
.
Cell Res
2017
;
27
:
626
–
41
.
.
18.
Louloupi
A
,
Ntini
E
,
Conrad
T
. et al.
Transient N-6-methyladenosine transcriptome sequencing reveals a regulatory role of m6A in splicing efficiency
.
Cell Rep
2018
;
23
:
3429
–
37
.
.
19.
Cui
Q
,
Shi
H
,
Ye
P
. et al.
m6A RNA methylation regulates the self-renewal and tumorigenesis of glioblastoma stem cells
.
Cell Rep
2017
;
18
:
2622
–
34
.
.
20.
Zhang
S
,
Zhao
BS
,
Zhou
A
. et al.
m6A demethylase ALKBH5 maintains tumorigenicity of glioblastoma stem-like cells by sustaining FOXM1 expression and cell proliferation program
.
Cancer Cell
2017
;
31
:
591
–
606.e6
.
.
21.
Li
Z
,
Weng
H
,
Rui
S
. et al.
FTO plays an oncogenic role in acute myeloid leukemia as a N6-methyladenosine RNA demethylase
.
Cancer Cell
2017
;
31
:
127
–
41
.
.
22.
Zhang
C
,
Samanta
D
,
Lu
H
. et al.
Hypoxia induces the breast cancer stem cell phenotype by HIF-dependent and ALKBH5-mediated m6A-demethylation of NANOG mRNA
.
Proc Natl Acad Sci
2016
;
113
:
E2047
–
56
.
.
23.
Chen
X-Y
,
Zhang
J
,
Zhu
J-S
.
The role of m6A RNA methylation in human cancer
.
Mol Cancer
2019
;
18
:
1
–
9
.
.
24.
Jiang
X
,
Liu
B
,
Nie
Z
. et al.
The role of m6A modification in the biological functions and diseases
.
Signal Transduct Target Ther
2021
;
6
:
1
–
16
.
.
25.
Zhang
Y
,
Ge
F
,
Li
F
. et al.
Prediction of multiple types of RNA modifications via biological language model
.
IEEE/ACM Trans Comput Biol Bioinform
2023
;
20
:
3205
–
14
.
.
26.
Chen
Z
,
Zhao
P
,
Li
F
. et al.
Comprehensive review and assessment of computational methods for predicting RNA post-transcriptional modification sites from RNA sequences
.
Brief Bioinform
2020
;
21
:
1676
–
96
.
.
27.
Zeng
Y
,
Wang
S
,
Gao
S
. et al.
Refined RIP-seq protocol for epitranscriptome analysis with low input materials
.
PLoS Biol
2018
;
16
:
e2006092
.
.
28.
Dominissini
D
,
Moshitch-Moshkovitz
S
,
Amariglio
N
. et al. .
Transcriptome-wide mapping of N6-methyladenosine by m6A-seq
. In:
Methods in Enzymology
.
Elsevier
,
2015
, vol.
560
, pp.
131
–
47
,
.
29.
Dominissini
D
,
Moshitch-Moshkovitz
S
,
Salmon-Divon
M
. et al.
Transcriptome-wide mapping of N 6-methyladenosine by m 6 A-seq based on immunocapturing and massively parallel sequencing
.
Nat Protoc
2013
;
8
:
176
–
89
.
.
30.
Ke
S
,
Alemu
EA
,
Mertens
C
. et al.
A majority of m6A residues are in the last exons, allowing the potential for 3 UTR regulation
.
Genes Dev
2015
;
29
:
2037
–
53
.
.
31.
Linder
B
,
Grozhik
AV
,
Olarerin-George
AO
. et al.
Single-nucleotide-resolution mapping of m6A and m6Am throughout the transcriptome
.
Nat Methods
2015
;
12
:
767
–
72
.
.
32.
Chen
K
,
Lu
Z
,
Wang
X
. et al.
High–resolution N6–methyladenosine (m6A) map using photo–crosslinking–assisted m6A sequencing
.
Angew Chem
2015
;
127
:
1607
–
10
.
.
33.
Zhang
Z
,
Chen
L-Q
,
Zhao
Y-L
. et al.
Single-base mapping of m6A by an antibody-independent method
.
Sci Adv
2019
;
5
:
eaax0250
.
34.
Garcia-Campos
MA
,
Edelheit
S
,
Toth
U
. et al.
Deciphering the ‘m6A code’ via antibody-independent quantitative profiling
.
Cell
2019
;
178
:
731
–
747. e16
.
35.
Meyer
KD
.
DART-seq: An antibody-free method for global m6A detection
.
Nat Methods
2019
;
16
:
1275
–
80
.
.
36.
Körtel
N
,
Zhou
CRY
,
Busch
A
. et al.
FX Reymond Sutandy, Jacob Haase, Mihika Pradhan, Michael Musheev, and Dirk Ostareck. Deep and accurate detection of m6A RNA modifications using miCLIP2 and m6Aboost machine learning
.
Nucleic Acids Res
2021
;
49
:
e92
–
2
.
.
37.
Tegowski
M
,
Flamand
MN
,
Meyer
KD
.
scDART-seq reveals distinct m6A signatures and mRNA methylation heterogeneity in single cells
.
Mol Cell
2022
.
38.
Boccaletto
P
,
Machnicka
MA
,
Purta
E
.
MODOMICS: A database of RNA modification pathways. 2017 upyear
.
Nucleic Acids Res
2018
;
46
:
D303
–
7
.
39.
Liu
Q
,
Gregory
RI
.
RNAmod: An integrated system for the annotation of mRNA modifications
.
Nucleic Acids Res
2019
;
47
:
W548
–
55
.
.
40.
Sun
W-J
,
Li
J-H
,
Liu
S
. et al.
RMBase: A resource for decoding the landscape of RNA modifications from high-throughput sequencing data
.
Nucleic Acids Res
2016
;
44
:
D259
–
65
.
.
41.
Xuan
J-J
,
Sun
W-J
,
Lin
P-H
. et al.
RMBase v2. 0: Deciphering the map of RNA modifications from epitranscriptome sequencing data
.
Nucleic Acids Res
2018
;
46
:
D327
–
34
.
42.
Liu
H
,
Flores
MA
,
Meng
J
. et al.
MeT-DB: A database of transcriptome methylation in mammalian cells
.
Nucleic Acids Res
2015
;
43
:
D197
–
203
.
.
43.
Liu
H
,
Wang
H
,
Wei
Z
. et al.
MeT-DB V2. 0: Elucidating context-specific functions of N 6-methyl-adenosine methyltranscriptome
.
Nucleic Acids Res
2018
;
46
:
D281
–
7
.
44.
Liu
S
,
Zhu
A
,
He
C
. et al.
REPIC: A database for exploring the N6-methyladenosine methylome
.
Genome Biol
2020
;
21
:
1
–
13
.
45.
Han
Y
,
Feng
J
,
Xia
L
. et al.
CVm6A: A visualization and exploration database for m6As in cell lines
.
Cells
2019
;
8
:
168
.
46.
Tang
Y
,
Chen
K
,
Song
B
. et al.
m6A-atlas: A comprehensive knowledgebase for unraveling the N 6-methyladenosine (m6A) epitranscriptome
.
Nucleic Acids Res
2021
;
49
:
D134
–
43
.
.
47.
Jiang
S
,
Xie
Y
,
He
Z
. et al.
m6ASNP: A tool for annotating genetic variants by m6A function
.
GigaScience
2018
;
7
:
giy035
.
48.
Zheng
Y
,
Nie
P
,
Peng
D
. et al.
m6AVar: A database of functional variants involved in m6A modification
.
Nucleic Acids Res
2018
;
46
:
D139
–
45
.
.
49.
Song
B
,
Huang
D
,
Zhang
Y
. et al.
m6A-TSHub: Unveiling the context-specific m6A methylation and m6A-affecting mutations in 23 human tissues
.
Genomics Proteomics Bioinf
2023
;
21
:
678
–
94
.
50.
Liang
Z
,
Ye
H
,
Ma
J
. et al.
m6a-atlas v2.0: Updated resources for unraveling the n6-methyladenosine (m6a) epitranscriptome among multiple species
.
Nucleic Acids Res
2024
;
52
:
D194
–
202
.
51.
Luo
X
,
Li
H
,
Liang
J
. et al.
RMVar: An updated database of functional variants involved in RNA modifications
.
Nucleic Acids Res
2021
;
49
:
D1405
–
12
.
.
52.
Buniello
A
,
MacArthur
JAL
,
Cerezo
M
. et al.
The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019
.
Nucleic Acids Res
47
:
D1005
–
12
.
.
53.
Landrum
MJ
,
Lee
JM
,
Benson
M
. et al.
ClinVar: Public archive of interpretations of clinically relevant variants
.
Nucleic Acids Res
2016
;
44
:
D862
–
8
.
.
54.
Liang
Z
,
Ye
H
,
Ma
J
. et al.
m6A-atlas v2.0: Updated resources for unraveling the N6-methyladenosine (m6A) epitranscriptome among multiple species
.
Nucleic Acids Res
2023
;
52
:
D194
–
202
.
55.
McIntyre
ABR
,
Gokhale
NS
,
Cerchietti
L
. et al.
Limits in the detection of m6a changes using merip/m6a-seq
.
Scientfic Reports
2020
;
10
:
6590
.
56.
Yue
Y
,
Liu
J
,
He
C
.
RNA N6-methyladenosine methylation in post-transcriptional gene expression regulation
.
Genes Dev
2015
;
29
:
1343
–
55
.
.
57.
Begik
O
,
Lucas
MC
,
Liu
H
. et al.
Integrative analyses of the RNA modification machinery reveal tissue-and cancer-specific signatures
.
Genome Biol
2020
;
21
:
1
–
24
.
.
58.
Zhang
H
,
Shi
X
,
Huang
T
. et al.
Dynamic landscape and evolution of m6A methylation in human
.
Nucleic Acids Res
2020
;
48
:
6251
–
64
.
.
59.
Li
K
,
Cai
J
,
Zhang
M
. et al.
Landscape and regulation of m6A and m6Am methylome across human and mouse tissues
.
Mol Cell
2020
;
77
:
426
–
440. e6
.
60.
An
S
,
Huang
W
,
Huang
X
. et al.
Integrative network analysis identifies cell-specific trans regulators of m6A
.
Nucleic Acids Res
2020
;
48
:
1715
–
29
.
.
61.
Liu
C
,
Sun
H
,
Yi
Y
. et al.
Absolute quantification of single-base m6A methylation in the mammalian transcriptome using GLORI
.
Nat Biotechnol
2023
;
41
:
355
–
66
.
.
62.
Xiao
Y-L
,
Liu
S
,
Ge
R
. et al.
Transcriptome-wide profiling and quantification of N 6-methyladenosine by enzyme-assisted adenosine deamination
.
Nat Biotechnol
2023
;
41
:
993
–
1003
.
.
63.
Rui
S
,
Dong
L
,
Li
C
. et al.
R-2hg exhibits anti-tumor activity by targeting fto/m6a/myc/cebpa signaling
.
Cell
2018
;
172
:
90
–
105
.
64.
Lin
S
,
Choe
J
,
Peng
D
. et al.
The m6a methyltransferase mettl3 promotes translation in human cancer cells
.
Mol Cell
2016
;
62
:
335
–
45
.
65.
Lawrence
M
,
Gentleman
R
,
Carey
V
.
Rtracklayer: An r package for interfacing with genome browsers
.
Bioinformatics
2009
;
25
:
1841
.
66.
Aaron
R
.
Quinlan and Ira M
.
BEDTools: A flexible suite of utilities for comparing genomic features Bioinformatics
2010
;
26
:
841
–
2
.
.
67.
Lin
S
,
Choe
J
,
Peng
D
. et al.
The m6A methyltransferase METTL3 promotes translation in human cancer cells
.
Mol Cell
2016
;
62
:
335
–
45
.
.
68.
Anthony
M
.
Bolger, Marc Lohse, and Bjoern Usadel
.
Trimmomatic: A flexible trimmer for Illumina sequence data Bioinformatics
2014
;
30
:
2114
–
20
.
.
69.
Andrews
S
.
FastQC: A Quality Control Tool for High Throughput Sequence Data
.
Cambridge, United Kingdom
:
Babraham Bioinformatics, Babraham Institute
,
2010
.
70.
Dobin
A
,
Davis
CA
,
Schlesinger
F
. et al.
STAR: Ultrafast universal RNA-seq aligner
.
Bioinformatics
2013
;
29
:
15
–
21
.
.
71.
Li
H
,
Handsaker
B
,
Wysoker
A
. et al.
The sequence alignment/map format and SAMtools
.
Bioinformatics
2009
;
25
:
2078
–
9
.
.
72.
Zhang
Y
,
Liu
T
,
Meyer
CA
. et al.
Model-based analysis of ChIP-Seq (MACS)
.
Genome Biol
2008
;
9
:
1
–
9
.
73.
Luo
Z
,
Zhang
J
,
Fei
J
. et al.
Deep learning modeling m6A deposition reveals the importance of downstream cis-element sequences
.
Nat Commun
2022
;
13
:
1
–
16
.
.
74.
Qiu
C
,
Yu
F
,
Deng
H-W
. et al.
Clinical epigenetics and epigenomics
.
Transl Bioinf Appl Clinical Bioinf
2016
;
269
–
93
.
.
75.
Trombetta
D
,
Sparaneo
A
,
Fabrizio
FP
. et al.
NRG1 and NRG2 fusions in non-small cell lung cancer (NSCLC): Seven years between lights and shadows
.
Expert Opin Ther Targets
2021
;
25
:
865
–
75
.
.
76.
David
SN
,
Arnold Egloff
SA
,
Goyal
R
. et al.
MAGI2 is an independent predictor of biochemical recurrence in prostate cancer
.
Prostate
2018
;
78
:
616
–
22
.
.
77.
Conomos
D
,
Reddel
RR
,
Pickett
HA
.
NuRD–ZNF827 recruitment to telomeres creates a molecular scaffold for homologous recombination
.
Nat Struct Mol Biol
2014
;
21
:
760
–
70
.
.
78.
Liang
H
,
Zhong
Y
,
Yu
H
. et al.
Type 1 receptor parathyroid hormone (PTH1R) influences breast cancer cell proliferation and apoptosis induced by high levels of glucose
.
Med Oncol
2012
;
29
:
439
–
45
.
.
79.
Guo
Z
,
Shafik
AM
,
Jin
P
. et al.
Detecting m6a methylation regions from methylated rna immunoprecipitation sequencing
.
Bioinformatics
2021
;
37
:
2818
–
24
.
80.
Acera Mateos
P
,
Sethi
AJ
,
Ravindran
A
. et al.
Prediction of m6a and m5c at single-molecule resolution reveals a transcriptome-wide co-occurrence of rna modifications
.
Nat Commun
2024
;
15
:
3899
.
81.
Ge
R
,
Ye
C
,
Peng
Y
. et al.
m6A-SAC-seq for quantitative whole transcriptome m6A profiling
.
Nat Protoc
2023
;
18
:
626
–
57
.
.
© The Author(s) 2024. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.