Abstract

Gene set analysis (GSA) is one of the methods of choice for analyzing the results of current omics studies; however, it has been mainly developed to analyze mRNA (microarray, RNA-Seq) data. The following review includes an update regarding general methods and resources for GSA and then emphasizes GSA methods and tools for non-mRNA omics datasets, specifically genomic range data (ChIP-Seq, SNP and methylation) and ncRNA data (miRNAs, lncRNAs and others). In the end, the state of the GSA field for non-mRNA datasets is discussed, and some current challenges and trends are highlighted, especially the use of network approaches to face complexity issues.

Introduction

In brief, the goal of all gene set analysis (GSA) methods is to take some experimental results [a list or a rank of differentially expressed (DE) genes, for example] and compare them to a database with some annotation that can increase our knowledge of the phenomenon under study. In other words, the input of a GSA is as follows: a query gene list or rank; a pathway, gene set or annotated ontology database; and a statistical procedure to find significant overlap, enrichment, de-regulation or interaction, between the two datasets. The output is usually a rank of significant pathways, gene sets or ontology terms, with their associated statistical significance.

There are multiple statistical methods, tools and web platforms to perform GSA, as well as numerous reviews of the field. We have collected references to 307 papers introducing GSA methods or tools and built a `GSA Reference Database’ with them (see Supplementary Material for the version of the database used in this manuscript and (1) for the current version). Table 1 shows the most popular methods during the last year, according to their citation number. We have also collected 57 papers with GSA reviews or method comparison studies. By far, the most popular GSA review was written in 2008, when many of the currently used GSA software and methods did not exist. Besides that, we found a significantly smaller number of method, software and review papers for most non-mRNA omics datasets, which is the primary motivation of the present study.

Table 1

Top 25 most cited GSA papers during the last year.

RankMethodYear, authorTitleCitations
1GSEA2005, SubramanianGene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles2476
2GOseq2010, YoungGene ontology analysis for RNA-seq: accounting for selection bias524
3GSEA2003, MoothaPGC-1 alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes438
4Enrichr2016, KuleshovEnrichr: a comprehensive gene set enrichment analysis web server 2016 update434
5DAVID2003, DennisDAVID: Database for Annotation, Visualization, and Integrated Discovery430
6ClueGO2009, BindeaClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks409
7Enrichr2013, ChenEnrichr: interactive and collaborative HTML5 gene list enrichment analysis tool364
8GOrilla2009, EdenGOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists330
9KOBAS2011, XieKOBAS 2.0: a web server for annotation and identification of enriched pathways and diseases295
10BiNGO2005, MaereBiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks286
11WEGO2006, YeWEGO: a web tool for plotting GO annotations248
12ToppGene2009, ChenToppGene Suite for gene list enrichment analysis and candidate gene prioritization243
13KOBAS2005, MaoAutomated genome annotation and pathway identification using the KEGG Orthology (KO) as a controlled vocabulary234
14agriGO2010, DuagriGO: a GO analysis toolkit for the agricultural community197
15WebGestalt2013, WangWEB-based gene set analysis toolkit (WebGestalt): update 2013190
16DAVID2007, HuangThe DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists187
17GSVA2013, HanzelmannGSVA: gene set variation analysis for microarray and RNA-seq data173
18SSGSEA2009, BarbieSystematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1165
19GOstats2006, FalconUsing GOstats to test gene lists for GO term association150
20DAVID2007, HuangDAVID Bioinformatics Resources: expanded annotation database and novel algorithms to better extract biology from large gene lists147
21FunRich2015, PathanFunRich: An open access standalone functional enrichment and interaction network analysis tool142
22topGO2006, AlexaImproved scoring of functional groups from gene expression data by decorrelating GO graph structure130
23agriGO2017, TianagriGO v2.0: a GO analysis toolkit for the agricultural community, 2017 update126
24WebGestalt2017, WangWebGestalt 2017: a more comprehensive, powerful, flexible and interactive gene set enrichment analysis toolkit119
25WebGestalt2005, ZhangWebGestalt: an integrated system for exploring gene sets in various biological contexts116
RankMethodYear, authorTitleCitations
1GSEA2005, SubramanianGene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles2476
2GOseq2010, YoungGene ontology analysis for RNA-seq: accounting for selection bias524
3GSEA2003, MoothaPGC-1 alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes438
4Enrichr2016, KuleshovEnrichr: a comprehensive gene set enrichment analysis web server 2016 update434
5DAVID2003, DennisDAVID: Database for Annotation, Visualization, and Integrated Discovery430
6ClueGO2009, BindeaClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks409
7Enrichr2013, ChenEnrichr: interactive and collaborative HTML5 gene list enrichment analysis tool364
8GOrilla2009, EdenGOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists330
9KOBAS2011, XieKOBAS 2.0: a web server for annotation and identification of enriched pathways and diseases295
10BiNGO2005, MaereBiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks286
11WEGO2006, YeWEGO: a web tool for plotting GO annotations248
12ToppGene2009, ChenToppGene Suite for gene list enrichment analysis and candidate gene prioritization243
13KOBAS2005, MaoAutomated genome annotation and pathway identification using the KEGG Orthology (KO) as a controlled vocabulary234
14agriGO2010, DuagriGO: a GO analysis toolkit for the agricultural community197
15WebGestalt2013, WangWEB-based gene set analysis toolkit (WebGestalt): update 2013190
16DAVID2007, HuangThe DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists187
17GSVA2013, HanzelmannGSVA: gene set variation analysis for microarray and RNA-seq data173
18SSGSEA2009, BarbieSystematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1165
19GOstats2006, FalconUsing GOstats to test gene lists for GO term association150
20DAVID2007, HuangDAVID Bioinformatics Resources: expanded annotation database and novel algorithms to better extract biology from large gene lists147
21FunRich2015, PathanFunRich: An open access standalone functional enrichment and interaction network analysis tool142
22topGO2006, AlexaImproved scoring of functional groups from gene expression data by decorrelating GO graph structure130
23agriGO2017, TianagriGO v2.0: a GO analysis toolkit for the agricultural community, 2017 update126
24WebGestalt2017, WangWebGestalt 2017: a more comprehensive, powerful, flexible and interactive gene set enrichment analysis toolkit119
25WebGestalt2005, ZhangWebGestalt: an integrated system for exploring gene sets in various biological contexts116

Top 25 most cited GSA methods or tools papers in the GSA field from May 2018 to April 2019, according to google scholar citations. The total number of GSA papers included in the analysis was 307 (for the full dataset, see: https://gsa-central.github.io/gsarefdb.html).

Table 1

Top 25 most cited GSA papers during the last year.

RankMethodYear, authorTitleCitations
1GSEA2005, SubramanianGene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles2476
2GOseq2010, YoungGene ontology analysis for RNA-seq: accounting for selection bias524
3GSEA2003, MoothaPGC-1 alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes438
4Enrichr2016, KuleshovEnrichr: a comprehensive gene set enrichment analysis web server 2016 update434
5DAVID2003, DennisDAVID: Database for Annotation, Visualization, and Integrated Discovery430
6ClueGO2009, BindeaClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks409
7Enrichr2013, ChenEnrichr: interactive and collaborative HTML5 gene list enrichment analysis tool364
8GOrilla2009, EdenGOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists330
9KOBAS2011, XieKOBAS 2.0: a web server for annotation and identification of enriched pathways and diseases295
10BiNGO2005, MaereBiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks286
11WEGO2006, YeWEGO: a web tool for plotting GO annotations248
12ToppGene2009, ChenToppGene Suite for gene list enrichment analysis and candidate gene prioritization243
13KOBAS2005, MaoAutomated genome annotation and pathway identification using the KEGG Orthology (KO) as a controlled vocabulary234
14agriGO2010, DuagriGO: a GO analysis toolkit for the agricultural community197
15WebGestalt2013, WangWEB-based gene set analysis toolkit (WebGestalt): update 2013190
16DAVID2007, HuangThe DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists187
17GSVA2013, HanzelmannGSVA: gene set variation analysis for microarray and RNA-seq data173
18SSGSEA2009, BarbieSystematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1165
19GOstats2006, FalconUsing GOstats to test gene lists for GO term association150
20DAVID2007, HuangDAVID Bioinformatics Resources: expanded annotation database and novel algorithms to better extract biology from large gene lists147
21FunRich2015, PathanFunRich: An open access standalone functional enrichment and interaction network analysis tool142
22topGO2006, AlexaImproved scoring of functional groups from gene expression data by decorrelating GO graph structure130
23agriGO2017, TianagriGO v2.0: a GO analysis toolkit for the agricultural community, 2017 update126
24WebGestalt2017, WangWebGestalt 2017: a more comprehensive, powerful, flexible and interactive gene set enrichment analysis toolkit119
25WebGestalt2005, ZhangWebGestalt: an integrated system for exploring gene sets in various biological contexts116
RankMethodYear, authorTitleCitations
1GSEA2005, SubramanianGene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles2476
2GOseq2010, YoungGene ontology analysis for RNA-seq: accounting for selection bias524
3GSEA2003, MoothaPGC-1 alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes438
4Enrichr2016, KuleshovEnrichr: a comprehensive gene set enrichment analysis web server 2016 update434
5DAVID2003, DennisDAVID: Database for Annotation, Visualization, and Integrated Discovery430
6ClueGO2009, BindeaClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks409
7Enrichr2013, ChenEnrichr: interactive and collaborative HTML5 gene list enrichment analysis tool364
8GOrilla2009, EdenGOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists330
9KOBAS2011, XieKOBAS 2.0: a web server for annotation and identification of enriched pathways and diseases295
10BiNGO2005, MaereBiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks286
11WEGO2006, YeWEGO: a web tool for plotting GO annotations248
12ToppGene2009, ChenToppGene Suite for gene list enrichment analysis and candidate gene prioritization243
13KOBAS2005, MaoAutomated genome annotation and pathway identification using the KEGG Orthology (KO) as a controlled vocabulary234
14agriGO2010, DuagriGO: a GO analysis toolkit for the agricultural community197
15WebGestalt2013, WangWEB-based gene set analysis toolkit (WebGestalt): update 2013190
16DAVID2007, HuangThe DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists187
17GSVA2013, HanzelmannGSVA: gene set variation analysis for microarray and RNA-seq data173
18SSGSEA2009, BarbieSystematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1165
19GOstats2006, FalconUsing GOstats to test gene lists for GO term association150
20DAVID2007, HuangDAVID Bioinformatics Resources: expanded annotation database and novel algorithms to better extract biology from large gene lists147
21FunRich2015, PathanFunRich: An open access standalone functional enrichment and interaction network analysis tool142
22topGO2006, AlexaImproved scoring of functional groups from gene expression data by decorrelating GO graph structure130
23agriGO2017, TianagriGO v2.0: a GO analysis toolkit for the agricultural community, 2017 update126
24WebGestalt2017, WangWebGestalt 2017: a more comprehensive, powerful, flexible and interactive gene set enrichment analysis toolkit119
25WebGestalt2005, ZhangWebGestalt: an integrated system for exploring gene sets in various biological contexts116

Top 25 most cited GSA methods or tools papers in the GSA field from May 2018 to April 2019, according to google scholar citations. The total number of GSA papers included in the analysis was 307 (for the full dataset, see: https://gsa-central.github.io/gsarefdb.html).

Genomic range data (from ChIP-x, SNP and methylation experiments) and ncRNA data (from miRNA, lncRNA and circRNA experiments) are the end products of a considerable portion of current omics experiments. However, after reviewing 53 papers with GSA methods and software tools applied to genomic range data and 28 papers with applications to ncRNA data, we found that there are fewer non-mRNA GSA methods and tools than their mRNA GSA counterparts, and such methods are also comparatively less cited (see Supplementary Material). Besides that, we found that, in most cases, such non-mRNA GSA methods are based on the most traditional GSA strategies (such as hypergeometric and GSEA) instead of the most recent developments. All of this shows how, despite its relevance, GSA for non-mRNA data has been less discussed and developed.

After a brief update on current mRNA-based methods and tools, we will review the current GSA approaches for genomic range and ncRNA data. We aim to categorize the existing methods according to their core approaches, as well as to describe them. Also, we will introduce a measure of each method’s popularity (in terms of citations) together with recommendations coming from benchmark studies (when available), so the reader may get a general idea of the methods that have been utilized and recommended. Finally, we will briefly discuss their limitations and some research trends.

GSA: an update

Khatri et al. (2) have reviewed many GSA tools/methods and classified them into three categories (or generations): over-representation analysis (ORA), functional class scoring (FCS) and pathway-topology-based (PT) methods. Such a classification has been followed by subsequent reviews, although alternative classification schemes can also be found (3–5). In recent years, network interaction (NI) and other types of methods have emerged and, consequently, they need to be added to the classifications mentioned above.

ORA evaluates the significance of the overlap between a query gene list and a reference dataset using a statistical distribution such as the hypergeometric, chi-squared or binomial distributions. A typical ORA workflow includes choosing up-regulated and/or down-regulated genes, choosing a pathway or gene set database, selecting a background, testing the query set against every pathway or gene set using the above-mentioned statistical tests, ranking the results according to p-values and applying multiple testing correction. The DAVID Functional Annotation Tool (6) is the most popular ORA tool in use, as it has a friendly website with multiple gene sets available, multiple test correction and ID conversion tools incorporated. Other popular ORA approaches are GOseq (7) and topGO (8) (see Table 1). However, numerous papers have pointed out a few important limitations of the ORA approach. First, it cannot be applied if there are no DE genes. Second, it will offer different results depending on the threshold used to select DE genes. Third, it does assume statistical independence between genes, which is not realistic due to the existence of gene–gene correlations. Fourth, it does assume that pathways are independent of each other and do not overlap.

FCS is based on the idea that thresholds for up- and down-regulated genes are arbitrary, and that not only large changes in gene expression may have significant effects on a pathway but also the contributions of multiple genes more modestly DE. Therefore, this approach uses all the genes ranked according to a measure of expression. The FCS approach has been summarized in three steps. The first step is to compute a specific statistic for each gene (such as signal-to-noise ratio, t-test and others). The second step is to combine all gene statistics into a gene set statistic (such as Kolmogorov–Smirnov, Wilcoxon sum rank or max–mean). The third and last step is to assess the significance of the gene set statistic (2, 9). Alternatively, some methods use multivariate statistics, while some others aggregate the values of gene univariate statistics. The first FCS method was Gene Set Enrichment Analysis (GSEA), which is still the most popular GSA method. Other FCS currently popular methods include GAGE (10) and globaltest (11). A 2013 comparison study (12) reported the best-performing FCS methods, including globaltest and PADOG (13). It has also been reported that the choice of gene-level statistics has a minimal effect on the results (2, 4), while the most critical component of an FCS method is the choice of the pathway or gene set-level statistic (14). Same as with ORA methods, FCS methods do not take into account pathway topological information.

PT methods work on the assumption that the specific organization of the reactions or interactions inside a pathway contains additional information that is useful to understand function. The most popular PT methods are Pathway-Express (15) and SPIA (16). Pathway-Express is a tool that works by computing two factors: a `gene perturbation factor’, which takes into account the fold change of a gene and the fold change of the genes upstream, and a `pathway impact factor’, which accounts for the gene perturbation factors of all the genes in a pathway. SPIA combines two types of factors: the over-representation of DE genes (using the hypergeometric approach) and a perturbation factor similar to the one in Pathway-Express, which includes the normalized expression change of a gene and the sum of perturbation factors of the genes directly upstream of the target gene, normalized by the number of downstream genes. Later developments have reported that many pathways can considerably affect each other's p-values through cross-talk both in ORA, FCS and PT methods, and the consequence is that different sets of significant pathways are found after a cross-talk correction (17). A new generation of PT methods, such as PathNet (18), PANA (19), PET (20) and SPATIAL (21), acknowledge the existence of pathway cross-talk. Bayerlova et al. (22) examined one ORA method (Fisher's), two FCS methods (Wilcoxon rank sum and Kolmogorov–Smirnov) and four PT methods (SPIA, CePa ORA, CePa FCS and PathNet) and concluded that PT-based methods only outperform FCS methods when there are no overlapping pathways, with PathNet being the best performer. A final group of related methods tries to find activated sub-pathways. They include SubpathwayMiner (23), sub-pathway-based approach (24), TEAK (25) and DEsubs (26). PT GSA has its own set of limitations. First, it is only for pathways (not applicable to general gene sets); second, it is restricted to pathways with known topology, and such information is only available for some organisms and common pathways; and third, pathway topology may change from cell type to cell type, and such information is scarce (2).

The most recent GSA group of methods addresses the problem of query genes that show no overlap with the gene set but interact with the gene set members through different types of relationships. This way, it replaces enrichment (or significant overlap) with significant interactions between the two sets of genes when located on a protein interaction network or a functional annotation network. Such significant interactions (also called `cross-talk’ in the literature) occur if the number of links between query and gene set nodes in the network is larger than the number of links expected by chance alone. Network Interaction (NI) methods are an interesting alternative because databases are highly incomplete; therefore, looking after interactions between gene sets in a network context can increase the GSA sensitivity. Examples of NI software are NEA (27), EnrichNet (28), CrossTalkZ (29), NetPEA (30) and BinoX (31). BinoX, for example, finds the number of observed links between a query list and a pathway, and then randomizes a functional annotation network a certain number of times and generates a distribution that can be used to estimate an empirical p-value, which is then corrected by multiple hypothesis testing. However, BinoX shares two limitations with ORA: First, gene lists are still generated using a subjective threshold value; and second, pathway topology is not taken into account.

Other types of GSA methods include Bayesian methods to join multiple gene sets, such as MAPE_I (32) and MetaOmics/MetaPath (33); dynamic methods for time-series data, such as STEM (34), TcGSA (35), timeClip (36), FUNNEL-GSEA (37) and Phantom (38); and single-sample (SS) methods. SS methods deal with sample-specific enrichment statistics, to find subject-specific (or patient-specific) responses. Examples include SSGSEA (39), PLAGE (40), PARADIGM (41), Pathifier (42) and GSVA (43). PARADIGM, for example, doesn't output a list of enriched pathways but a score for each sample-pathway pair. Such sample-pathway matrix can be subject to cluster analysis to group samples according to regulated pathways.

The two most popular ORA and FCS tools (DAVID and GSEA, respectively) are available on their websites (GSEA, for download). WebGestalt (44), Enrichr (45) and EnrichNet (28) are other websites with increasing popularity (see Table 1). There are also several species-specific websites (see some examples of plants, fungi, algae and prokaryotes, at the Supplementary Material and (1)) and even cross-species GSA websites (46). Other widely used tools are available as Cytoscape apps, with the most used being BiNGO (47) and ClueGO (48) (see Table 1). However, most GSA software exists as R packages. One R package (EGSEA) (49) gives us an ensemble of 12 methods, while another R package called ToPASeq (50) includes 7 PT methods. The GSVA package (43) consists of the GSVA, PLAGE, z-score and SSGSEA methods. Both DAVID and GSEA offer R packages as well. A complete list of published tools can be found at the Supplementary Material and reference (1). Figure 1 summarizes the GSA process.

GSA in a nutshell. The goal of all GSA methods is to interpret some experimental results (DE genes, for example) by comparing them to a database with biological annotation that can increase our knowledge of the phenomenon under study. (a) The type of questions that give origin to the main GSA approaches. (b) A flow diagram of the general GSA process.
Figure 1

GSA in a nutshell. The goal of all GSA methods is to interpret some experimental results (DE genes, for example) by comparing them to a database with biological annotation that can increase our knowledge of the phenomenon under study. (a) The type of questions that give origin to the main GSA approaches. (b) A flow diagram of the general GSA process.

A review of GSA tools for non-mRNA datasets

Several tools have been developed to extend GSA from expression enrichment to other omics domains (Figure 2a).

GSA for non-mRNA datasets. (a) The Past (a historical overview of the main GSA methods for non-mRNA datasets): the figure includes the main methods for both genomic range GSA and ncRNA GSA published until 2016. (b) The Future (network approaches to GSA): for genomic data, networks can be used either as links between chromatin regions related to transcription or as linkage disequilibrium clusters, which may redefine the mapping from peaks or SNPs to genes (non-depicted). For ncRNA data, multipartite ncRNA–mRNA correlation networks in tandem with community detection algorithms may become an avenue to understand the different correlation structures between ncRNAs and mRNAs and choose the right gene sets for GSA. Depicted: DE genes (inside a diamond) generate communities on an imaginary network, which may be used as gene sets for integrative GSA of RNA data.
Figure 2

GSA for non-mRNA datasets. (a) The Past (a historical overview of the main GSA methods for non-mRNA datasets): the figure includes the main methods for both genomic range GSA and ncRNA GSA published until 2016. (b) The Future (network approaches to GSA): for genomic data, networks can be used either as links between chromatin regions related to transcription or as linkage disequilibrium clusters, which may redefine the mapping from peaks or SNPs to genes (non-depicted). For ncRNA data, multipartite ncRNA–mRNA correlation networks in tandem with community detection algorithms may become an avenue to understand the different correlation structures between ncRNAs and mRNAs and choose the right gene sets for GSA. Depicted: DE genes (inside a diamond) generate communities on an imaginary network, which may be used as gene sets for integrative GSA of RNA data.

Table 2

Top 15 most cited GSA papers applied to genomic range data.

RankMethodYear, authorTitledoiCitations
1GREAT2010, McLeanGREAT improves functional interpretation of cis-regulatory regions10.1038/nbt.16301743
2-2007, WangPathway-based approaches for analysis of genomewide association studies10.1086/522374824
3MAGMA2015, de LeeuwMAGMA: Generalized gene-set analysis of GWAS data10.1371/journal.pcbi.1004219402
4MAGENTA2010, SegreCommon inherited variation in mitochondrial genes is not enriched for associations with type 2 diabetes or related glycemic traits10.1371/journal.pgen.1001058372
5ALIGATOR2009, HolmansGene ontology analysis of GWA study data sets provides insights into the biology of bipolar disorder10.1016/j.ajhg.2009.05.011366
6GenGen2009, WangDiverse genome-wide association studies associate the IL12/IL23 pathway with Crohn Disease10.1016/j.ajhg.2009.01.026266
7-2009, YuPathway analysis by adaptive combination of P-values10.1002/gepi.20422259
8-2010, ZhongIntegrating pathway analysis and genetics of gene expression for genome-wide association studies10.1016/j.ajhg.2010.02.020246
9-2010, PengGene and pathway-based second-wave analysis of genome-wide association studies10.1038/ejhg.2009.115233
10-2007, GaudermanTesting association between disease and multiple SNPs in a candidate gene10.1002/gepi.20219201
11INRICH2012, LeeINRICH: Interval-based enrichment analysis for genome-wide association studies10.1093/bioinformatics/bts191167
12iGSEA4GWAS2010, ZhangI-GSEA4GWAS: A web server for identification of pathways/gene sets associated with traits by applying an improved gene set enrichment analysis to genome-wide association study10.1093/nar/gkq324165
13GSEA-SNP2008, HoldenGSEA-SNP: Applying gene set 0enrichment analysis to SNP data from genome-wide association studies10.1093/bioinformatics/btn516163
14GRASS2010, ChenInsights into colon cancer etiology via a regularized approach to gene set analysis of GWAS data10.1016/j.ajhg.2010.04.014147
15SNP Ratio2009, O’DushlaineThe SNP ratio test: pathway analysis of genome-wide association datasets10.1093/bioinformatics/btp448139
RankMethodYear, authorTitledoiCitations
1GREAT2010, McLeanGREAT improves functional interpretation of cis-regulatory regions10.1038/nbt.16301743
2-2007, WangPathway-based approaches for analysis of genomewide association studies10.1086/522374824
3MAGMA2015, de LeeuwMAGMA: Generalized gene-set analysis of GWAS data10.1371/journal.pcbi.1004219402
4MAGENTA2010, SegreCommon inherited variation in mitochondrial genes is not enriched for associations with type 2 diabetes or related glycemic traits10.1371/journal.pgen.1001058372
5ALIGATOR2009, HolmansGene ontology analysis of GWA study data sets provides insights into the biology of bipolar disorder10.1016/j.ajhg.2009.05.011366
6GenGen2009, WangDiverse genome-wide association studies associate the IL12/IL23 pathway with Crohn Disease10.1016/j.ajhg.2009.01.026266
7-2009, YuPathway analysis by adaptive combination of P-values10.1002/gepi.20422259
8-2010, ZhongIntegrating pathway analysis and genetics of gene expression for genome-wide association studies10.1016/j.ajhg.2010.02.020246
9-2010, PengGene and pathway-based second-wave analysis of genome-wide association studies10.1038/ejhg.2009.115233
10-2007, GaudermanTesting association between disease and multiple SNPs in a candidate gene10.1002/gepi.20219201
11INRICH2012, LeeINRICH: Interval-based enrichment analysis for genome-wide association studies10.1093/bioinformatics/bts191167
12iGSEA4GWAS2010, ZhangI-GSEA4GWAS: A web server for identification of pathways/gene sets associated with traits by applying an improved gene set enrichment analysis to genome-wide association study10.1093/nar/gkq324165
13GSEA-SNP2008, HoldenGSEA-SNP: Applying gene set 0enrichment analysis to SNP data from genome-wide association studies10.1093/bioinformatics/btn516163
14GRASS2010, ChenInsights into colon cancer etiology via a regularized approach to gene set analysis of GWAS data10.1016/j.ajhg.2010.04.014147
15SNP Ratio2009, O’DushlaineThe SNP ratio test: pathway analysis of genome-wide association datasets10.1093/bioinformatics/btp448139

Top 15 most cited papers introducing GSA tools or platforms related to genomic data (ChIP-Seq, SNP and methylation data), written between 2007 and 2018, according to google scholar citations. The total number of papers included in the category was 53 (for the full dataset, see: https://gsa-central.github.io/gsarefdb.html).

Table 2

Top 15 most cited GSA papers applied to genomic range data.

RankMethodYear, authorTitledoiCitations
1GREAT2010, McLeanGREAT improves functional interpretation of cis-regulatory regions10.1038/nbt.16301743
2-2007, WangPathway-based approaches for analysis of genomewide association studies10.1086/522374824
3MAGMA2015, de LeeuwMAGMA: Generalized gene-set analysis of GWAS data10.1371/journal.pcbi.1004219402
4MAGENTA2010, SegreCommon inherited variation in mitochondrial genes is not enriched for associations with type 2 diabetes or related glycemic traits10.1371/journal.pgen.1001058372
5ALIGATOR2009, HolmansGene ontology analysis of GWA study data sets provides insights into the biology of bipolar disorder10.1016/j.ajhg.2009.05.011366
6GenGen2009, WangDiverse genome-wide association studies associate the IL12/IL23 pathway with Crohn Disease10.1016/j.ajhg.2009.01.026266
7-2009, YuPathway analysis by adaptive combination of P-values10.1002/gepi.20422259
8-2010, ZhongIntegrating pathway analysis and genetics of gene expression for genome-wide association studies10.1016/j.ajhg.2010.02.020246
9-2010, PengGene and pathway-based second-wave analysis of genome-wide association studies10.1038/ejhg.2009.115233
10-2007, GaudermanTesting association between disease and multiple SNPs in a candidate gene10.1002/gepi.20219201
11INRICH2012, LeeINRICH: Interval-based enrichment analysis for genome-wide association studies10.1093/bioinformatics/bts191167
12iGSEA4GWAS2010, ZhangI-GSEA4GWAS: A web server for identification of pathways/gene sets associated with traits by applying an improved gene set enrichment analysis to genome-wide association study10.1093/nar/gkq324165
13GSEA-SNP2008, HoldenGSEA-SNP: Applying gene set 0enrichment analysis to SNP data from genome-wide association studies10.1093/bioinformatics/btn516163
14GRASS2010, ChenInsights into colon cancer etiology via a regularized approach to gene set analysis of GWAS data10.1016/j.ajhg.2010.04.014147
15SNP Ratio2009, O’DushlaineThe SNP ratio test: pathway analysis of genome-wide association datasets10.1093/bioinformatics/btp448139
RankMethodYear, authorTitledoiCitations
1GREAT2010, McLeanGREAT improves functional interpretation of cis-regulatory regions10.1038/nbt.16301743
2-2007, WangPathway-based approaches for analysis of genomewide association studies10.1086/522374824
3MAGMA2015, de LeeuwMAGMA: Generalized gene-set analysis of GWAS data10.1371/journal.pcbi.1004219402
4MAGENTA2010, SegreCommon inherited variation in mitochondrial genes is not enriched for associations with type 2 diabetes or related glycemic traits10.1371/journal.pgen.1001058372
5ALIGATOR2009, HolmansGene ontology analysis of GWA study data sets provides insights into the biology of bipolar disorder10.1016/j.ajhg.2009.05.011366
6GenGen2009, WangDiverse genome-wide association studies associate the IL12/IL23 pathway with Crohn Disease10.1016/j.ajhg.2009.01.026266
7-2009, YuPathway analysis by adaptive combination of P-values10.1002/gepi.20422259
8-2010, ZhongIntegrating pathway analysis and genetics of gene expression for genome-wide association studies10.1016/j.ajhg.2010.02.020246
9-2010, PengGene and pathway-based second-wave analysis of genome-wide association studies10.1038/ejhg.2009.115233
10-2007, GaudermanTesting association between disease and multiple SNPs in a candidate gene10.1002/gepi.20219201
11INRICH2012, LeeINRICH: Interval-based enrichment analysis for genome-wide association studies10.1093/bioinformatics/bts191167
12iGSEA4GWAS2010, ZhangI-GSEA4GWAS: A web server for identification of pathways/gene sets associated with traits by applying an improved gene set enrichment analysis to genome-wide association study10.1093/nar/gkq324165
13GSEA-SNP2008, HoldenGSEA-SNP: Applying gene set 0enrichment analysis to SNP data from genome-wide association studies10.1093/bioinformatics/btn516163
14GRASS2010, ChenInsights into colon cancer etiology via a regularized approach to gene set analysis of GWAS data10.1016/j.ajhg.2010.04.014147
15SNP Ratio2009, O’DushlaineThe SNP ratio test: pathway analysis of genome-wide association datasets10.1093/bioinformatics/btp448139

Top 15 most cited papers introducing GSA tools or platforms related to genomic data (ChIP-Seq, SNP and methylation data), written between 2007 and 2018, according to google scholar citations. The total number of papers included in the category was 53 (for the full dataset, see: https://gsa-central.github.io/gsarefdb.html).

GSA of genomic ranges data

ChIP-x data

GSA has been applied to the functional interpretation of enrichment on DNA/chromatin binding or modifications, that is, ChIP-Seq, ChIP-exo, ChIP-chip, DamID, ChIP-PET and similar datasets (collectively called ChIP-x methods by some authors), but especially to ChIP-seq data, which provides information of transcription factor binding data, histone mark data or histone variant data. In essence, such `genomic GSA tools’ associate binding or modification on a genomic region to an annotated coding gene (mapping) and then perform GSA for that gene.

The most intuitive approach to the mapping problem is the `Nearest Gene Approach’, that is, to associate each genomic range to its nearest gene. Enrichr (45), for example, includes the option to upload either gene lists or bed files; in the second case, the genomic coordinates are mapped to the nearest coding gene. A second approach is to use a `window’ around each gene: GREAT (51) is a web tool that defines a `regulatory domain’ for each gene, building a `basal domain’ of 5 kb upstream and 1 kb downstream of the TSS and an `extended domain’ up to the basal regulatory domain of the nearest genes within 1 Mb. Then, it uses a binomial test (ORA) for the enrichment of each regulatory domain, to find out if the total number of loci within the domain is higher than expected.

ChIP-Enrich (52) is another window-based method, which uses a logistic regression approach and empirically adjusts gene locus length (length of the gene body and surrounding non-coding sequence). In summary, the method starts by assigning peaks to genes: either assign each peak to the nearest gene (either nearest TSS or nearest TES) or to the gene with the nearest TSS. Then, it does perform GSA using a logistic regression model (the variable is a binary vector: 1 if there are peaks assigned to the locus, 0 if no peaks are assigned) adjusted for locus length. ChIP-Enrich can be found both as a website and as an R package. Broad-Enrich (53) is a version of ChIP-Enrich for broad peaks (such as in some histone marks).

Other tools include CompGO and Seq2Pathway. CompGO (54) is a tool limited to coding regions and GO analysis, which uses ORA from the DAVID platform followed by the log odds ratio to determine GO enrichment. Seq2Pathway (55) is a window-based method that links sequences to genes in a radius of 100 kb; then, applies the FAIME method (Functional Analysis of Individual Microarray/RNA-Seq Expression) to detect pathways. CompGO and Seq2Pathway can be found as R packages as well.

A Venn diagram comparison of methods, performed by the authors of Seq2Pathway, states that ChIP-Enrich and Seq2Pathway make considerably more predictions than GREAT and share many predictions not found by GREAT. However, GREAT is still the most popular tool in the field (Table 2). We have calculated that an `agreement set’ built with all the results coming from at least two out of the three methods in the above-mentioned comparison, contains around 45% of the hits (the other 55% are cases found by only one method), which is a call for rigorous comparison studies of such tools. To our knowledge, there are no comparisons of the entire group of methods against benchmark data.

A group of related methods focuses on the enrichment of a particular group of genes inside the gene list generated from the ChIP-Seq track. TF2LncRNA (56) is a web tool that maps TF peaks to lncRNA's genomic coordinates and then computes if the TF has a statistically significant number of peaks within a window around the lncRNA gene, calculated using the hypergeometric test. Another method, focused on finding enrichment of miRNA targets regulated by a given TF, is mBISON (57). mBISON receives up to three ChIP-Seq datasets and maps peaks to genes choosing only the genes reported at least two times. miRNA targets are identified from computational predictions, and enrichment of miRNA targets in the gene list is then computed.

One of the problems with the current DNA-binding enrichment tools is that they make little or no use of the chromatin interactions that govern the 3D structure of the genome, such as topologically associated domains (TADs) and long-range interactions that link regulatory regions not necessarily to the closest gene. Such information should play a more critical role in the future (58).

SNP data

For SNP data, the main problem is to map SNPs (instead of peaks) to the corresponding genes. Different from ChIP-Seq GSA, SNP GSA must take into account Linkage Disequilibrium or LD (when alleles at two different loci are inherited together more often than it would be expected by chance).

Similar to ChIP-Seq, the mapping of SNPs to genes has been frequently done by either discarding SNPs in non-coding regions, mapping all SNPs to the nearest gene or, more generally, assigning the SNP to a gene if the SNP falls inside a given window around the gene. ALIGATOR, for example, is a method that assigns significant SNPs to genes when the significant SNPs fall within 20 kbp before the start of the first exon and after the end of the last exon, counting each gene only once no matter the number of SNPs (59). Other authors recommend using a 20–50 kbp window (60). Besides being arbitrary, such windows may bring the problem of overlapping regions that make SNPs belong to more than one gene and, therefore, additional methods are needed to deal with the multiple counting problem.

After the mapping step, different GSA methods have been used. A simple Fisher's exact test under the `competitive hypothesis’ would include four values: the number of significant and non-significant SNPs in a gene set and the number of significant and non-significant SNPs outside that gene set. The same Fisher's test under the `self-contained hypothesis’ would also include four values: the number of observed significant and non-significant SNPs in a gene set and the number of expected significant and non-significant SNPs in that gene set. Such tests have been implemented in methods such as GLOSSI and software such as GeSBAP (60). GeSBAP selects the p-value of the SNP with the minimum p-value per gene and then applies Fisher's exact test to the gene lists (61). The minimum p-value approach has been criticized because a larger gene is more likely to contain smaller p-values; therefore, alternatives have appeared either (a) summarizing all SNPs in a gene or (b) modeling the effects on the phenotype of all the SNPs in the gene. For the summarizing approach, methods such as Fisher’s method for combining p-values, the Gamma method, and the adaptive rank truncated product method have been proposed (62). Peng et al. (63), for example, start by combining the p-values of the SNPs in a gene (using Fisher's combination test and others) into a significance level for the gene, and then combining the genes in a pathway into a p-value for the pathway (using the hypergeometric test and others). Examples of joint modeling (using linear or logistic regression) will be discussed later.

A similar evolution of GSA methods (from minimum P-values to summary P-values) occurred to FCS methods as well. For example, Wang et al. (64) published a method using the SNP with the minimum P-value at the gene level, followed by GSA using the Kolmogorov–Smirnov test. Such strategy was modified by software such as GSEA-SNP (which introduced the max-test statistics and all the SNPs in a gene) (65) and GSA-SNP (which allows GSA using the Z-score statistic, maxmean statistic or GSEA) (66, 67). Also SSEA (68), which is a method that uses the adaptive truncated product statistic to identify all representative SNPs of each gene, rank such SNPs according to the significance of their association to a given trait and performs GSA using a weighted Kolmogorov–Smirnov test.

More sophisticated methods include principal component regressions and permutation of sample labels to control size and other confounding effects. ALIGATOR, for example, assigns significant SNPs to genes as explained above, maps each gene to GO terms and assesses the significance of each GO term by simulations, to correct for LD and variable gene size (59). Regression-based methods include GRASS, MAGMA, PCgamma, PAGWAS, SGL-BCGD and SPCA (62). For each gene set, GRASS starts by using PCA to define a group of `eigenSNPs’ that summarize the SNPs in a gene and account for correlations due to LD. Then, it finds representative eigenSNPs per gene and evaluates their joint association with phenotype (disease outcome), using group ridge logistic regression between the eigenSNP matrix and the phenotype. Phenotype permutations are also performed to compute observed and null association statistics, which lead to a P-value (69). SPCA, on the other hand, maps SNPs to gene sets, then selects the subset of SNPs most associated to phenotype, estimates a `latent variable’ using PCA and, finally, estimates the gene sets (latent variables) associated with phenotype using a linear model (70). MAGMA starts getting the principal components of the SNPs and performing multiple linear regression of principal components, then transforming the gene's P-values into z-values and building a second linear regression model for GSA under either the self-contained or the competitive hypotheses (71). MAGENTA is a method that assigns SNPs to genes according to a window and chooses the most significant SNP/P-value per gene. Then, the gene P-values are corrected for six specific confounding effects without using permutation analysis but applying a step-wise multiple linear regression to the z-scores obtained from the P-values. Such a regression method starts regressing out just the most significant confounding variable, and then adds the next significant variable, one by one, evaluating each time if the added variable should be kept in the model. After selecting only the P-values smaller than a specific cutoff, a GSEA-like method is applied (72).

Other interesting approaches, such as MRPEA (Mendelian randomization-based pathway enrichment analysis), do correct for the effect of environmental exposures, using both GWAS data from a target disease and GWAS data from an environmental exposure (73). Zhang et al. (74) identified expression-associated SNPs (eSNPs) from two eQTL databases and assessed the association of such SNPs with basal cell carcinoma's GWAS data, before applying a GSEA-like procedure; the authors claim that such an approach improves the detection of relevant pathways.

Contrary to other types of non-mRNA data, several comparison studies have been done between SNP GSA methods. Ballard et al. (75) compared seven methods and suggested that principal component regression is the best of them. Other studies suggest that using a subset of the most significant SNPs, which could be based on either a fixed truncation point or an adaptive threshold, is better than using the best SNP or all the SNPs (76). According to Fridley et al. (60), principal component methods perform better for a small number of markers, while the `truncated Fisher's method’ (using only P-values smaller than a threshold) outperforms PCA for a large number of markers. It has also been suggested that effects of SNPs are smaller than those of gene expression and, due to that, methods such as GSEA, Fisher's exact test and others lack statistical power (77). Finally, Leeuw et al. (78) have reported that self-contained GSA is not adequate to draw biologically meaningful conclusions, while competitive GSA is vulnerable to LD. Among competitive tests, the authors report that only INRICH and MAGMA display a good statistical performance.

Given the problems of mapping using the nearest gene or a window around the gene, it is interesting to explore alternative approaches that use information of the SNPs in LD, such as ProxyGeneLD (79) and INRICH (80). ProxyGeneLD uses a list of SNPs from a given study and the list of all SNPs from HapMap. The program detects all HapMap SNPs that are in LD to form `proxy clusters’, which are assigned to the nearest gene. A significance level for each gene is defined as the lowest P-value among the single SNPs from the study that are not present in any proxy cluster and those belonging to proxy clusters that include one or more study SNP assigned to that gene. In the end, the authors use GSEA, DAVID and IPA, to perform GSA. INRICH starts identifying the intervals containing both the SNPs and the SNPs in LD. Then merge overlapping intervals and overlapping genes inside a gene set, to avoid multiple counting. INRICH is a method that explicitly suggests its use for either SNPs or any other genomic regions (which they call `Intervals’) such as deletions or duplications. For SNPs, the enrichment statistic of each gene set is the number of intervals that overlap a gene in the gene set; for CNVs (which span large genomic regions), the statistic is the number of genes in the gene set that overlap an interval. In the end, INRICH uses a permutation approach to compute empirical P-values for each gene set.

Methylation data

Several GSA methods were applied to early methylation data to identify differentially methylated gene sets. They included topGO, GOstats and IPA. However, it has been now highlighted that such methods are biased, as genes with more probes (in microarrays) or more CpG sites (in sequencing) are more likely to appear as over-represented (81, 82). Geeleher et al. (7) suggested that the best strategy to solve such technological bias is the one followed by GOseq. GOseq faced the problem of RNA-Seq technologies' bias towards long genes and highly expressed genes, which leads to the fact that gene sets containing either long genes or highly expressed genes are more likely to appear. To attack this bias, they computed the likelihood of differential expression as a function of transcript length and incorporated it into the statistical test. Such incorporation was done by using Wallenius non-central hypergeometric distribution, which is an extension of the hypergeometric distribution to the case where the probabilities of success and failure differ. Such a strategy has been later adapted to methylation data by computing the differential methylation as a function of the number of CpG probes or CpG sites. It has been used in software such as the missMethyl R package (83) whose `gometh’ function is a modification of the GOseq method for the Illumina 450K array, which computes the probability of a gene being differentially methylated given the number of associated CpGs and applies a Wallenius-based test for each GO term or KEGG pathway. An R workflow for methylation array data analysis has recently been published (84). Such workflow also uses missMethyl for GSA.

For the analysis of high- and low-resolution methylomes, including bisulfite sequencing data, an alternative is using the methylPipe and compEpiTools R packages (85). compEpiTools, in particular, may start from the data processed by methylPipe and find both the annotation of the methylated regions and their GO terms, using topGO.

Table 2 shows a summary of the most cited genomic GSA tools.

GSA of ncRNA data

miRNA data

Several strategies have been developed to perform enrichment analysis of miRNA data. The first strategy is starting with a list of miRNAs from our experimental results, finding the mRNA targets of those miRNAs and then treating such targets as if they were DE genes, that is, using a GSA method/tool to find the over-represented pathways.

The second strategy is to use a software that already performs all the previously mentioned steps, such as SigTerms (86), CORNA (87), Gomir (88), miTALOS (89), miRSystem (90) or DIANA-miRPath (91). The DAVID web tool accepts miRNA lists as well (miRbase IDs). SigTerms, for example, includes the target predictions from miRanda (92) and other tools and computes the one-sided Fisher's exact test (ORA) for a set of targets for each miRNA. miTALOS also uses Fisher's exact test. The most advanced available tools add another level of complexity to target identification: as a set of miRNAs can cooperatively regulate a group of functionally related genes instead of a single gene, they search after gene sets enriched in miRNA clusters. These strategies have been reviewed by (93). In any case, target prediction is a crucial step in this approach, given that even a single difference in nucleic acid sequence can affect miRNA-target pairing (93). It is also worth noting that the most commonly used statistical enrichment methods are usually the oldest and simplest. One exception to this is miRNA target enrichment analysis (miTEA) (94), which develops an FCS method called minimum mHG (mmHG) that checks for mutual enrichment in two ranked lists.

There are several ways to find miRNA targets. Recent R packages include: targetscan. Hs.eg.db, RmiR.Hs.miRNA, CROME (http://www.maths.usyd.edu.au/u/vivek/), CORNA (87) and multiMiR (95). A recently published web tool, which allows tissue-specific miRNA-target studies in 23 different tissues, is IMOTA (96). Some other tools, such as multiMiR (95), miRnalyze (97) and miRwayDB (98), annotate miRNAs with associated gene sets but do not include proper enrichment analysis tools. multiMiR includes various sources of predicted and validated miRNA-target interactions, as well as links to three disease- and drug-related DBs: miR2Disease, Pharmaco-miR and PhenomiR. miRnalyze links miRNA-target predictions with KEGG pathway information, while miRwayDB links published experimental information of expression of 232 miRNAs with 122 associated pathways under 76 different disease conditions.

As a third option, instead of using KEGG, MSigDB and similar databases with little or no miRNA information, it is possible to use a database with miRNA annotation and skip the use of targets. One tool for that is TAM (99, 100): Here, miRNAs are stored in categories or `miRNA sets’ according to the miRNA family, genome locations, functions, associated diseases and tissue specificity. Information comes from databases such as miRBase (101) and the Human MicroRNA Disease Database (HMDD) (102), a database for miRNA–disease associations, while tissue specificity was obtained from one paper (102). TAM evaluates over-representation of each miRNA set (257 sets) among a `miRNA list’ using the hypergeometric test. Those miRNA sets are available for download, and therefore they could be used with different methods than the hypergeometric one. A second tool is miSEA (103). Here, most miRNA sets are targets from miRTarBase (104) and miRecords (105), followed by diseases (from HMDD) and transcription factors from TransmiR (106). miSEA uses GSEA instead of ORA (uses ranked lists of miRNA expression) and allows the user to use its own miRNA sets as well. Finally, there is miEAA (107), a web tool that provides more than 14 000 miRNA sets and offers both ORA and GSEA methods. In general, the disadvantage of the database approach is that there aren’t as many miRNA database resources as for protein-coding genes.

The fourth option is to use a platform with a full pipeline for miRNA analysis, such as miARma-Seq (108): this tool offers the identification of miRNA, mRNA and circRNAs, differential expression, miRNA–mRNA target prediction and functional analysis. The target prediction uses multiple algorithms, while the GSA uses the GOseq tool (7), with Gene Ontology (GO) and KEGG databases.

Godard et al. (109) have shown that usual enrichment analysis using miRNA targets and the hypergeometric test is biased for a subset of biological functions such as cell cycle and others. Particularly striking is the bias towards cancer compared to protein-coding genes. The authors suggest two alternative procedures: the first one is to modify the pathways and only keep the genes that are known to have at least one interaction with a miRNA and then search after target enrichment. The second one is to map the pathways to miRNA lists according to miRNA-mRNA interactions and perform enrichment analysis from miRNA query list to miRNA set; this way, a miRNA is only represented once in a pathway independently of the number of its target genes in this pathway. Besides that, the authors show that many pathways share a significant number of miRNAs, which leads to pathway co-identification. To avoid that (and decrease the amount of multiple hypothesis testing), such similar pathways are recommended to be aggregated before the enrichment analysis.

lncRNA data

The main current strategy for enrichment analysis of lncRNA data is to find correlations between lncRNA and mRNA expression data. To do that, we will review two alternatives.

The first option is to start with a lncRNA list and an mRNA list from the same experiment, find all the mRNAs correlated to the lncRNAs and then apply ORA to the found mRNAs (110). The second option would be to start with a lncRNA list and a database of mRNA–lncRNA correlated pairs in our specific tissue and then apply ORA. One resource that supports the second approach is LncRNA2Function (111), which stores expression correlation data between lncRNAs and protein-coding genes across 19 normal human tissues. It does use the hypergeometric test, with GO and 12 pathway databases (in total, 9625 human lncRNA mapped to GO terms and pathways). A significant correlation is defined as the one with an absolute value of the Pearson correlation coefficient of >0.9 and adjusted P-value of <0.05. Another resource is the LncRNAtor platform, which contains 208 RNA-Seq datasets (6295 samples) for six species, with lncRNA–mRNA co-expression data, protein–lncRNA interaction data and results of GSA analysis (GO and KEGG) (112).

Some recent improvements include Co-LncRNA (113) and lncRNAs2Pathways (114). Both of these tools take into account the combinatorial effects of a list of lncRNAs on biological functions. Co-lncRNA is a web tool that contains 241 human RNA-Seq datasets, lncRNA–mRNA co-expression and enrichment using GO and KEGG. Co-lncRNA explores the combinatorial effects of a group of lncRNAs; however, it does analyze the directly co-expressed pairs, while lncRNAs2Pathways explores the effects on the downstream genes in a lncRNA–mRNA network. lncRNAs2Pathways is a method and R package that starts by building a coding-non-coding network, whose edges were either expression correlation or protein–protein interactions. The query set of DE lncRNAs is mapped to the network as source nodes, and a global network propagation algorithm (random walk with restart) computes propagation scores for the coding genes, in order to find the closer coding genes to the lncRNA list. In the end, the procedure generates a protein-coding gene rank according to propagation scores, and such a rank is then subject to GSA by using KEGG pathways and a Kolmogorov–Smirnov-like statistical measure weighted by the propagation scores.

An interesting alternative is Linc2GO (115). Linc2GO is a web tool with functional annotation of long intergenic non-coding RNA (lincRNA), which employs a different approach to protein-coding co-expression. This tool is based on the competing endogenous RNA hypothesis (ceRNA hypothesis), which states that lincRNAs interact directly with miRNAs to prevent them from binding mRNA. Therefore, a lincRNA is predicted to have the same function as an mRNA if both of them interact with the same miRNA. This way, they can build a database of 7202 lincRNAs with GO biological process annotation. As a complement, a recently published database, LncCeRBase (116) collects 432 experimentally verified human lncRNA–miRNA–mRNA interactions.

Table 3

Top 15 most cited GSA papers applied to ncRNA data.

RankMethodYear, authorTitledoiCitations
1miRNApath2008, CogswellIdentification of miRNA changes in Alzheimer's disease brain and CSF yields putative biomarkers and insights into disease pathways10.3233/JAD-2008-14103685
2DIANA miRPath2012, VlachosDIANA miRPath v. 2.0: investigating the combinatorial effect of microRNAs in pathways10.1093/nar/gks494450
3DIANA miRPath2015, VlachosDIANA-miRPath v3. 0: deciphering microRNA function with experimental support10.1093/nar/gkv403418
4DIANA miRPath2009, PapadopoulosDIANA-mirPath: Integrating human and mouse microRNAs in pathways10.1093/bioinformatics/btp299280
5miRSystem2012, LumiRSystem: an integrated system for characterizing enriched functions and pathways of microRNA targets10.1371/journal.pone.0042390160
6SigTerms2008, CreightonA bioinformatics tool for linking gene expression profiling results with public databases of microRNA target10.1261/rna.1188208136
7TAM2010, LuTAM: a method for enrichment and depletion analysis of a microRNA category in a list of microRNAs10.1186/1471-2105-11-419108
8lncRNAtor2014, ParklncRNAtor: a comprehensive resource for functional investigation of long non-coding RNAs10.1093/bioinformatics/btu32587
9LncRNA2Function2015, JiangLncRNA2Function: a comprehensive resource for functional investigation of human lncRNAs based on RNA-Seq data10.1186/1471-2164-16-S3-S275
10multiMiR2014, RuThe multiMiR R package and database: integration of microRNA–target interactions along with their disease and drug associations10.1093/nar/gku63173
11LncRNA2Target2015, JiangLncRNA2Target: a database for differentially expressed genes after lncRNA knockdown or overexpression10.1093/nar/gku117372
12Linc2GO2013, LiuLinc2GO: a human LincRNA function annotation resource based on ceRNA hypothesis10.1093/bioinformatics/btt36169
13Co-LncRNA2015, ZhaoCo-LncRNA: investigating the lncRNA combinatorial effects in GO annotations and KEGG pathways based on human RNA-Seq data10.1093/database/bav08262
14CORNA2009, WuCORNA: testing gene lists for regulation by microRNAs10.1093/bioinformatics/btp05962
15GOmir2009, RoubelakisHuman microRNA target analysis and gene ontology clustering by GOmir, a novel stand-alone application10.1186/1471-2105-10-S6-S2055
RankMethodYear, authorTitledoiCitations
1miRNApath2008, CogswellIdentification of miRNA changes in Alzheimer's disease brain and CSF yields putative biomarkers and insights into disease pathways10.3233/JAD-2008-14103685
2DIANA miRPath2012, VlachosDIANA miRPath v. 2.0: investigating the combinatorial effect of microRNAs in pathways10.1093/nar/gks494450
3DIANA miRPath2015, VlachosDIANA-miRPath v3. 0: deciphering microRNA function with experimental support10.1093/nar/gkv403418
4DIANA miRPath2009, PapadopoulosDIANA-mirPath: Integrating human and mouse microRNAs in pathways10.1093/bioinformatics/btp299280
5miRSystem2012, LumiRSystem: an integrated system for characterizing enriched functions and pathways of microRNA targets10.1371/journal.pone.0042390160
6SigTerms2008, CreightonA bioinformatics tool for linking gene expression profiling results with public databases of microRNA target10.1261/rna.1188208136
7TAM2010, LuTAM: a method for enrichment and depletion analysis of a microRNA category in a list of microRNAs10.1186/1471-2105-11-419108
8lncRNAtor2014, ParklncRNAtor: a comprehensive resource for functional investigation of long non-coding RNAs10.1093/bioinformatics/btu32587
9LncRNA2Function2015, JiangLncRNA2Function: a comprehensive resource for functional investigation of human lncRNAs based on RNA-Seq data10.1186/1471-2164-16-S3-S275
10multiMiR2014, RuThe multiMiR R package and database: integration of microRNA–target interactions along with their disease and drug associations10.1093/nar/gku63173
11LncRNA2Target2015, JiangLncRNA2Target: a database for differentially expressed genes after lncRNA knockdown or overexpression10.1093/nar/gku117372
12Linc2GO2013, LiuLinc2GO: a human LincRNA function annotation resource based on ceRNA hypothesis10.1093/bioinformatics/btt36169
13Co-LncRNA2015, ZhaoCo-LncRNA: investigating the lncRNA combinatorial effects in GO annotations and KEGG pathways based on human RNA-Seq data10.1093/database/bav08262
14CORNA2009, WuCORNA: testing gene lists for regulation by microRNAs10.1093/bioinformatics/btp05962
15GOmir2009, RoubelakisHuman microRNA target analysis and gene ontology clustering by GOmir, a novel stand-alone application10.1186/1471-2105-10-S6-S2055

Top 15 most cited papers introducing GSA tools or platforms related to ncRNA data (miRNA, lncRNA and circRNA data), written between 2008 and 2018, according to google scholar citations. The total number of papers included in this category was 28 (for the full dataset, see: https://gsa-central.github.io/gsarefdb.html).

Table 3

Top 15 most cited GSA papers applied to ncRNA data.

RankMethodYear, authorTitledoiCitations
1miRNApath2008, CogswellIdentification of miRNA changes in Alzheimer's disease brain and CSF yields putative biomarkers and insights into disease pathways10.3233/JAD-2008-14103685
2DIANA miRPath2012, VlachosDIANA miRPath v. 2.0: investigating the combinatorial effect of microRNAs in pathways10.1093/nar/gks494450
3DIANA miRPath2015, VlachosDIANA-miRPath v3. 0: deciphering microRNA function with experimental support10.1093/nar/gkv403418
4DIANA miRPath2009, PapadopoulosDIANA-mirPath: Integrating human and mouse microRNAs in pathways10.1093/bioinformatics/btp299280
5miRSystem2012, LumiRSystem: an integrated system for characterizing enriched functions and pathways of microRNA targets10.1371/journal.pone.0042390160
6SigTerms2008, CreightonA bioinformatics tool for linking gene expression profiling results with public databases of microRNA target10.1261/rna.1188208136
7TAM2010, LuTAM: a method for enrichment and depletion analysis of a microRNA category in a list of microRNAs10.1186/1471-2105-11-419108
8lncRNAtor2014, ParklncRNAtor: a comprehensive resource for functional investigation of long non-coding RNAs10.1093/bioinformatics/btu32587
9LncRNA2Function2015, JiangLncRNA2Function: a comprehensive resource for functional investigation of human lncRNAs based on RNA-Seq data10.1186/1471-2164-16-S3-S275
10multiMiR2014, RuThe multiMiR R package and database: integration of microRNA–target interactions along with their disease and drug associations10.1093/nar/gku63173
11LncRNA2Target2015, JiangLncRNA2Target: a database for differentially expressed genes after lncRNA knockdown or overexpression10.1093/nar/gku117372
12Linc2GO2013, LiuLinc2GO: a human LincRNA function annotation resource based on ceRNA hypothesis10.1093/bioinformatics/btt36169
13Co-LncRNA2015, ZhaoCo-LncRNA: investigating the lncRNA combinatorial effects in GO annotations and KEGG pathways based on human RNA-Seq data10.1093/database/bav08262
14CORNA2009, WuCORNA: testing gene lists for regulation by microRNAs10.1093/bioinformatics/btp05962
15GOmir2009, RoubelakisHuman microRNA target analysis and gene ontology clustering by GOmir, a novel stand-alone application10.1186/1471-2105-10-S6-S2055
RankMethodYear, authorTitledoiCitations
1miRNApath2008, CogswellIdentification of miRNA changes in Alzheimer's disease brain and CSF yields putative biomarkers and insights into disease pathways10.3233/JAD-2008-14103685
2DIANA miRPath2012, VlachosDIANA miRPath v. 2.0: investigating the combinatorial effect of microRNAs in pathways10.1093/nar/gks494450
3DIANA miRPath2015, VlachosDIANA-miRPath v3. 0: deciphering microRNA function with experimental support10.1093/nar/gkv403418
4DIANA miRPath2009, PapadopoulosDIANA-mirPath: Integrating human and mouse microRNAs in pathways10.1093/bioinformatics/btp299280
5miRSystem2012, LumiRSystem: an integrated system for characterizing enriched functions and pathways of microRNA targets10.1371/journal.pone.0042390160
6SigTerms2008, CreightonA bioinformatics tool for linking gene expression profiling results with public databases of microRNA target10.1261/rna.1188208136
7TAM2010, LuTAM: a method for enrichment and depletion analysis of a microRNA category in a list of microRNAs10.1186/1471-2105-11-419108
8lncRNAtor2014, ParklncRNAtor: a comprehensive resource for functional investigation of long non-coding RNAs10.1093/bioinformatics/btu32587
9LncRNA2Function2015, JiangLncRNA2Function: a comprehensive resource for functional investigation of human lncRNAs based on RNA-Seq data10.1186/1471-2164-16-S3-S275
10multiMiR2014, RuThe multiMiR R package and database: integration of microRNA–target interactions along with their disease and drug associations10.1093/nar/gku63173
11LncRNA2Target2015, JiangLncRNA2Target: a database for differentially expressed genes after lncRNA knockdown or overexpression10.1093/nar/gku117372
12Linc2GO2013, LiuLinc2GO: a human LincRNA function annotation resource based on ceRNA hypothesis10.1093/bioinformatics/btt36169
13Co-LncRNA2015, ZhaoCo-LncRNA: investigating the lncRNA combinatorial effects in GO annotations and KEGG pathways based on human RNA-Seq data10.1093/database/bav08262
14CORNA2009, WuCORNA: testing gene lists for regulation by microRNAs10.1093/bioinformatics/btp05962
15GOmir2009, RoubelakisHuman microRNA target analysis and gene ontology clustering by GOmir, a novel stand-alone application10.1186/1471-2105-10-S6-S2055

Top 15 most cited papers introducing GSA tools or platforms related to ncRNA data (miRNA, lncRNA and circRNA data), written between 2008 and 2018, according to google scholar citations. The total number of papers included in this category was 28 (for the full dataset, see: https://gsa-central.github.io/gsarefdb.html).

It is also known that lncRNAs display several other molecular functions, such as DNA binding. Several studies have explored the gene expression changes after a given lncRNA knockdown or overexpression, and databases such as LncRNA2Target (117, 118) collect such lncRNA-target information. LncRNA2Target v.2.0 includes 72 102 of such interactions for human, plus 1465 interactions from immunoprecipitation assays, RNA pull-down assays and luciferase report assays. There are also multiple tools to predict lncRNA interactions with both DNA and other RNAs, which have been recently reviewed (119). Such information can also be used for GSA purposes.

More recent resources include the following: NeuraNetL2GO (120), which is a software that predicts lncRNA's GO function based on neural networks trained on the GO; LncFunNet (121), which infers lncRNA function from an integrative network of ChIP-Seq, CLIP-Seq and RNA-Seq data; and decodeRNA (122), which is a database providing inferred function for both lncRNAs and miRNAs in human cancer and normal cell types. A related non-GSA network-based method is lncFunTK (123), which starts from a lncRNA regulatory network built from ChIP-Seq, CLIP-Seq and RNA-Seq data, and computes a `Functional Information Score’ for each lncRNA to infer function from GO terms of neighbor genes.

Other ncRNA data

When studying other types of ncRNAs, such as circRNAs, similar procedures could be followed in principle. For example, the starBase database (124) uses RNA–RNA interactome data (such as LIGR-Seq, PARIS or SPLASH methods) to find all mRNAs interacting with ncRNAs; such mRNAs can be subject to GSA. In another example, Su et al. (125) start from the assumption that circRNAs are `miRNA sponges’, then proceed to computationally detect circRNA–miRNA interactions for all DE circRNAs and then to detect function from the corresponding miRNA targets. A similar analysis uses DIANA-miRTaR to find the miRNA target function, which they assume is the function of `the circRNA-miRNA axis’ (126). In contrast, it has also been suggested that circRNAs act as molecular scaffolds for RNA-binding proteins (RBPs) that regulate transcription (127), which would point to a different type of correlation. As we learn more about ncRNAs, it seems that circRNA-miRNA or circRNA-RBP correlations alone may be an over-simplification and we might expect very complex interaction networks of different types of RNAs.

Just as with other ncRNAs, it is also possible to avoid correlation studies and directly use specialized circRNA databases with functional annotation such as CircFunBase (128).

According to data collected for this review, the most popular ncRNA GSA methods are miRNApath, DIANA miRPath and miRSystem for miRNAs and lncRNAtor, LncRNA2Function and LncRNA2Target for lncRNAs (see Table 3 and Supplementary Material).

Discussion

It is evident that genomic range GSA methods have been mainly based on their simpler mRNA counterparts (ORA and FCS methods), with a few differences due to the needs of combining multiple peaks/SNPs in the same gene and mapping peaks/SNPs falling in the non-coding regions. It is also noteworthy that the development of ChIP-x, SNP and methylation GSA methods seems mostly independent from each other, even though all genomic range-based GSA software may be closely related (as suggested by the authors of INRICH). Indeed, the clustering approach used by ProxyGeneLD and INRICH to join regions in LD is analogous to the approach we have suggested for linking peaks in chromatin interactions for ChIP-x data, and therefore network analysis could be a useful framework for unifying genomic GSA methods. However, it is important to keep in mind that chromatin interaction domains and linkage disequilibrium blocks do not seem to correlate, suggesting that the networks of physical (regulatory) interactions and genetic interactions are independent (129).

Regarding ncRNA GSA, we want to highlight the possibility of building complex interaction networks involving all types of RNAs, as we mentioned in the previous section. The starBase database, for example, reports interactions between miRNA–mRNA, miRNA–lncRNA, miRNA–circRNA, RBP–mRNA, RBP–lncRNA, RBP–circRNA and others, from which we could expect a network containing RBPs, mRNAs, miRNAs, lncRNAs, circRNAs, sncRNAs and other RNAs. Therefore, network methodologies (such as those used by lncRNAs2Pathways) seem adequate and promising for functional interpretation studies under the correlation approach. Correlation network methodologies seem even more relevant when we acknowledge the existence of resources to link both genomic range and ncRNA data. For example, Lnc2Meth (130), a database for regulatory relationships between lncRNAs and DNA methylation and lncRNASNP2 (131), which is a database with the effects of SNPs on lncRNA structure and function. Figure 2b illustrates the idea of using integrative correlation networks on ncRNA GSA.

Conclusion

A landscape of the existing GSA methods and tools outside of the mRNA realm has been portrayed. On the positive side, it raises awareness of the existence of multiple less-known alternatives for functional interpretation of omics data and how they are evolving, which is useful to the biomedical researcher. On the negative side, it shows that the non-mRNA tools follow a few steps behind the mRNA-based tools, and many of the new and the currently popular methods still wait to be used. Therefore, it is necessary to work on the application of the new PT cross-talk, SS and NI methodologies (beyond ORA, FCS and SPIA) to genomic range and ncRNA datasets. We have also identified and highlighted a few desired future directions, such as the use of chromatin interaction data for genomic range GSA and the use of integrative interaction networks (instead of pairwise correlations/targets) on ncRNA GSA (Figure 2b) and also the need of a comprehensive functional annotation of ncRNAs to be used in databases.

In addition, non-mRNA-based tools share some additional limitations and open problems with their mRNA counterparts. For example, the lack of databases that annotate pathways, gene sets and ontologies with the different experimental conditions in which those datasets were generated, moving from a general to a more personalized approach. Also, the need for benchmark data and comprehensive comparative studies between tools, to identify the best performing tools, which currently is a difficult task.

Due to lack of space, we have failed to discuss GSA of metabolomics, metagenomics and integrative (multi-omics) data. All those fields report similar levels of progress to the ones discussed above, and we plan to address them in the future.

The discussion of the different available functional interpretation methods and their limitations is essential because it does give arguments to researchers, reviewers and the general public to be more critical regarding the functional interpretation analyses performed inside any biomedical studies and the validity of their conclusions. The author hopes that this paper helps in that direction.

Key Points

  • Based on a sample of 307 papers related to GSA methods and software (coming from the mRNA realm), we present an updated review of current GSA methods. We also review 53 papers with methods and tools applying GSA to genomic range data (ChIP-x, SNP and methylation data) and 28 papers with applications to ncRNA data (miRNA and lncRNA). In comparison to mRNA methods, we find that non-mRNA methods have been less discussed and developed.

  • GSA of genomic range data is based on mapping genomic ranges to genes. Methodologies to achieve this goal include mapping ranges to the nearest gene and mapping ranges inside a window around the gene. Such methodologies need to be enriched with biological knowledge regarding chromatin organization (for ChIP-x data) and linkage disequilibrium (for SNP data), among others.

  • GSA of miRNAs is mainly based on either determining miRNA targets or using databases of miRNA sets, while GSA of lncRNAs is based on either lncRNA–mRNA expression correlation or other pairwise correlations. Given the complex structure of correlations between RNAs and RBPs, current methodologies need to be enriched with efforts such as integrative correlation networks.

  • According to our data, GREAT is the most popular tool for GSA of ChIP-x data, while MAGMA is the most popular tool for SNP data in recent years. Regarding ncRNAs, DIANA miRPath is the most popular tool for miRNA data, while lncRNAtor and LncRNA2Function are the most popular tools for lncRNA data.

Data Availability

All datasets generated for this study are included in the manuscript and the supplementary files. Updated versions of the datasets will be available at https://gsa-central.github.io/gsarefdb.html

Conflict of interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Author contributions

A.M. wrote the full review and built the reference database.

Acknowledgments

The author thanks the support of both colleagues and students at the Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health (Chinese Academy of Sciences).

Funding

This work was funded by the Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health (Chinese Academy of Sciences).

Antonio Mora is an associate professor at the Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health–Chinese Academy of Sciences. His research focuses on methods and software for the functional interpretation of omics data.

References

1.

Mora
 
A.
 GSARefDB (The Gene Set Analysis References Database),
2019
. .

2.

Khatri
 
P
,
Sirota
 
M
,
Butte
 
AJ
.
Ten years of pathway analysis: current approaches and outstanding challenges
.
PLoS Comput Biol
 
2012
;
8
(
2
):
e1002375
.

3.

Goeman
 
JJ
,
Buhlmann
 
P
.
Analyzing gene expression data in terms of gene sets: methodological issues
.
Bioinformatics
 
2007
;
23
(
8
):
980
7
.

4.

Ackermann
 
M
,
Strimmer
 
K
.
A general modular framework for gene set enrichment analysis
.
BMC Bioinformatics
 
2009
;
10
:
47
.

5.

Huang da
 
W
,
Sherman
 
BT
,
Lempicki
 
RA
.
Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists
.
Nucleic Acids Res
 
2009
;
37
(
1
):
1
13
.

6.

Huang
 
DW
,
Sherman
 
BT
,
Tan
 
Q
, et al.  
DAVID bioinformatics resources: expanded annotation database and novel algorithms to better extract biology from large gene lists
.
Nucleic Acids Res
 
2007
;
35
:
W169
75
.

7.

Young
 
MD
,
Wakefield
 
MJ
,
Smyth
 
GK
, et al.  
Gene ontology analysis for RNA-seq: accounting for selection bias
.
Genome Biol
 
2010
;
11
(
2
):
R14
.

8.

Alexa
 
A
,
Rahnenfuhrer
 
J
,
Lengauer
 
T
.
Improved scoring of functional groups from gene expression data by decorrelating GO graph structure
.
Bioinformatics
 
2006
;
22
(
13
):
1600
7
.

9.

Garcia-Campos
 
MA
,
Espinal-Enriquez
 
J
,
Hernandez-Lemus
 
E
.
Pathway analysis: state of the art
.
Front Physiol
 
2015
;
6
:
383
.

10.

Luo
 
W
,
Friedman
 
MS
,
Shedden
 
K
, et al.  
GAGE: generally applicable gene set enrichment for pathway analysis
.
BMC Bioinformatics
 
2009
;
10
:
161
.

11.

Goeman
 
JJ
,
van de Geer
 
SA
,
de Kort
 
F
, et al.  
A global test for groups of genes: testing association with a clinical outcome
.
Bioinformatics
 
2004
;
20
(
1
):
93
9
.

12.

Tarca
 
AL
,
Bhatti
 
G
,
Romero
 
R
.
A comparison of gene set analysis methods in terms of sensitivity, prioritization and specificity
.
PLoS One
 
2013
;
8
(
11
):
e79217
.

13.

Tarca
 
AL
,
Draghici
 
S
,
Bhatti
 
G
, et al.  
Down-weighting overlapping genes improves gene set analysis
.
BMC Bioinformatics
 
2012
;
13
:
136
.

14.

Hung
 
JH
,
Yang
 
TH
,
Hu
 
Z
, et al.  
Gene set enrichment analysis: performance evaluation and usage guidelines
.
Brief Bioinform
 
2012
;
13
(
3
):
281
91
.

15.

Draghici
 
S
,
Khatri
 
P
,
Tarca
 
AL
, et al.  
A systems biology approach for pathway level analysis
.
Genome Res
 
2007
;
17
(
10
):
1537
45
.

16.

Tarca
 
AL
,
Draghici
 
S
,
Khatri
 
P
, et al.  
A novel signaling pathway impact analysis
.
Bioinformatics
 
2009
;
25
(
1
):
75
82
.

17.

Donato
 
M
,
Xu
 
Z
,
Tomoiaga
 
A
, et al.  
Analysis and correction of crosstalk effects in pathway analysis
.
Genome Res
 
2013
;
23
(
11
):
1885
93
.

18.

Dutta
 
B
,
Wallqvist
 
A
,
Reifman
 
J
.
PathNet: a tool for pathway analysis using topological information
.
Source Code Biol Med
 
2012
;
7
(
1
):
10
.

19.

Ponzoni
 
I
,
Nueda
 
M
,
Tarazona
 
S
, et al.  
Pathway network inference from gene expression data
.
BMC Syst Biol
 
2014
;
8
(
Suppl 2
):
S7
.

20.

Dussaut
 
JS
,
Gallo
 
CA
,
Cecchini
 
RL
, et al.  
Crosstalk pathway inference using topological information and biclustering of gene expression data
.
Biosystems
 
2016
;
150
:
1
12
.

21.

Bokanizad
 
B
,
Tagett
 
R
,
Ansari
 
S
, et al.  
SPATIAL: A System-level PAThway Impact AnaLysis approach
.
Nucleic Acids Res
 
2016
;
44
(
11
):
5034
44
.

22.

Bayerlova
 
M
,
Jung
 
K
,
Kramer
 
F
, et al.  
Comparative study on gene set and pathway topology-based enrichment methods
.
BMC Bioinformatics
 
2015
;
16
:
334
.

23.

Li
 
C
,
Li
 
X
,
Miao
 
Y
, et al.  
SubpathwayMiner: a software package for flexible identification of pathways
.
Nucleic Acids Res
 
2009
;
37
(
19
):
e131
.

24.

Chen
 
X
,
Xu
 
J
,
Huang
 
B
, et al.  
A sub-pathway-based approach for identifying drug response principal network
.
Bioinformatics
 
2011
;
27
(
5
):
649
54
.

25.

Judeh
 
T
,
Johnson
 
C
,
Kumar
 
A
, et al.  
TEAK: topology enrichment analysis framework for detecting activated biological subpathways
.
Nucleic Acids Res
 
2013
;
41
(
3
):
1425
37
.

26.

Vrahatis
 
AG
,
Balomenos
 
P
,
Tsakalidis
 
AK
, et al.  
DEsubs: an R package for flexible identification of differentially expressed subpathways using RNA-seq experiments
.
Bioinformatics
 
2016
;
32
(
24
):
3844
6
.

27.

Alexeyenko
 
A
,
Lee
 
W
,
Pernemalm
 
M
, et al.  
Network enrichment analysis: extension of gene-set enrichment analysis to gene networks
.
BMC Bioinformatics
 
2012
;
13
:
226
.

28.

Glaab
 
E
,
Baudot
 
A
,
Krasnogor
 
N
, et al.  
EnrichNet: network-based gene set enrichment analysis
.
Bioinformatics
 
2012
;
28
(
18
):
i451
i7
.

29.

McCormack
 
T
,
Frings
 
O
,
Alexeyenko
 
A
, et al.  
Statistical assessment of crosstalk enrichment between gene groups in biological networks
.
PLoS One
 
2013
;
8
(
1
):
e54945
.

30.

Liu
 
L
,
Ruan
 
J
.
Network-based pathway enrichment analysis
.
Proceedings (IEEE Int Conf Bioinformatics Biomed)
 
2013
;
218
21
.

31.

Ogris
 
C
,
Guala
 
D
,
Helleday
 
T
, et al.  
A novel method for crosstalk analysis of biological networks: improving accuracy of pathway annotation
.
Nucleic Acids Res
 
2017
;
45
(
2
):
e8
.

32.

Shen
 
K
,
Tseng
 
GC
.
Meta-analysis for pathway enrichment analysis when combining multiple genomic studies
.
Bioinformatics
 
2010
;
26
(
10
):
1316
23
.

33.

Wang
 
X
,
Kang
 
DD
,
Shen
 
K
, et al.  
An R package suite for microarray meta-analysis in quality control, differentially expressed gene analysis and pathway enrichment detection
.
Bioinformatics
 
2012
;
28
(
19
):
2534
6
.

34.

Ernst
 
J
,
Bar-Joseph
 
Z
.
STEM: a tool for the analysis of short time series gene expression data
.
BMC Bioinformatics
 
2006
;
7
:
191
.

35.

Hejblum
 
BP
,
Skinner
 
J
,
Thiebaut
 
R
.
Time-course gene set analysis for longitudinal gene expression data
.
PLoS Comput Biol
 
2015
;
11
(
6
):
e1004310
.

36.

Martini
 
P
,
Sales
 
G
,
Calura
 
E
, et al.  
timeClip: pathway analysis for time course data without replicates
.
BMC Bioinformatics
 
2014
;
15
(
Suppl 5
):
S3
.

37.

Zhang
 
Y
,
Topham
 
DJ
,
Thakar
 
J
, et al.  
FUNNEL-GSEA: FUNctioNal ELastic-net regression in time-course gene set enrichment analysis
.
Bioinformatics
 
2017
;
33
(
13
):
1944
52
.

38.

Gu
 
J
,
Wang
 
X
,
Chan
 
J
, et al.  
Phantom: investigating heterogeneous gene sets in time-course data
.
Bioinformatics
 
2017
;
33
(
18
):
2957
9
.

39.

Barbie
 
DA
,
Tamayo
 
P
,
Boehm
 
JS
, et al.  
Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1
.
Nature
 
2009
;
462
(
7269
):
108
12
.

40.

Tomfohr
 
J
,
Lu
 
J
,
Kepler
 
TB
.
Pathway level analysis of gene expression using singular value decomposition
.
BMC Bioinformatics
 
2005
;
6
:
225
.

41.

Vaske
 
CJ
,
Benz
 
SC
,
Sanborn
 
JZ
, et al.  
Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM
.
Bioinformatics
 
2010
;
26
(
12
):
i237
45
.

42.

Drier
 
Y
,
Sheffer
 
M
,
Domany
 
E
.
Pathway-based personalized analysis of cancer
.
Proc Natl Acad Sci U S A
 
2013
;
110
(
16
):
6388
93
.

43.

Hanzelmann
 
S
,
Castelo
 
R
,
Guinney
 
J
.
GSVA: gene set variation analysis for microarray and RNA-seq data
.
BMC Bioinformatics
 
2013
;
14
:
7
.

44.

Wang
 
J
,
Duncan
 
D
,
Shi
 
Z
, et al.  
WEB-based GEne SeT AnaLysis toolkit (WebGestalt): update 2013
.
Nucleic Acids Res
 
2013
;
41
:
W77
83
.

45.

Kuleshov
 
MV
,
Jones
 
MR
,
Rouillard
 
AD
, et al.  
Enrichr: a comprehensive gene set enrichment analysis web server 2016 update
.
Nucleic Acids Res
 
2016
;
44
(
W1
):
W90
7
.

46.

Kang
 
H
,
Choi
 
I
,
Cho
 
S
, et al.  
gsGator: an integrated web platform for cross-species gene set analysis
.
BMC Bioinformatics
 
2014
;
15
:
13
.

47.

Maere
 
S
,
Heymans
 
K
,
Kuiper
 
M
.
BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks
.
Bioinformatics
 
2005
;
21
(
16
):
3448
9
.

48.

Bindea
 
G
,
Mlecnik
 
B
,
Hackl
 
H
, et al.  
ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks
.
Bioinformatics
 
2009
;
25
(
8
):
1091
3
.

49.

Alhamdoosh
 
M
,
Law
 
C
,
Tian
 
L
, et al.  
Easy and efficient ensemble gene set testing with EGSEA
.
F1000Res
 
2017
;
6
:
2010
.

50.

Ihnatova
 
I
,
Budinska
 
E
.
ToPASeq: an R package for topology-based pathway analysis of microarray and RNA-seq data
.
BMC Bioinformatics
 
2015
;
16
:
350
.

51.

McLean
 
CY
,
Bristor
 
D
,
Hiller
 
M
, et al.  
GREAT improves functional interpretation of cis-regulatory regions
.
Nat Biotechnol
 
2010
;
28
(
5
):
495
501
.

52.

Welch
 
RP
,
Lee
 
C
,
Imbriano
 
PM
, et al.  
ChIP-enrich: gene set enrichment testing for ChIP-seq data
.
Nucleic Acids Res
 
2014
;
42
(
13
):
e105
.

53.

Cavalcante
 
RG
,
Lee
 
C
,
Welch
 
RP
, et al.  
Broad-enrich: functional interpretation of large sets of broad genomic regions
.
Bioinformatics
 
2014
;
30
(
17
):
i393
400
.

54.

Waardenberg
 
AJ
,
Basset
 
SD
,
Bouveret
 
R
, et al.  
CompGO: an R package for comparing and visualizing gene ontology enrichment differences between DNA binding experiments
.
BMC Bioinformatics
 
2015
;
16
:
275
.

55.

Wang
 
B
,
Cunningham
 
JM
,
Yang
 
XH
.
Seq2pathway: an R/bioconductor package for pathway analysis of next-generation sequencing data
.
Bioinformatics
 
2015
;
31
(
18
):
3043
5
.

56.

Jiang
 
Q
,
Wang
 
J
,
Wang
 
Y
, et al.  
TF2LncRNA: identifying common transcription factors for a list of lncRNA genes from ChIP-seq data
.
Biomed Res Int
 
2014
;
2014
:
317642
.

57.

Gebhardt
 
ML
,
Mer
 
AS
,
Andrade-Navarro
 
MA
.
mBISON: finding miRNA target over-representation in gene lists from ChIP-sequencing data
.
BMC Res Notes
 
2015
;
8
:
157
.

58.

Mora
 
A
,
Sandve
 
GK
,
Gabrielsen
 
OS
, et al.  
In the loop: promoter-enhancer interactions and bioinformatics
.
Brief Bioinform
 
2016
;
17
(
6
):
980
95
.

59.

Holmans
 
P
,
Green
 
EK
,
Pahwa
 
JS
, et al.  
Gene ontology analysis of GWA study data sets provides insights into the biology of bipolar disorder
.
Am J Hum Genet
 
2009
;
85
(
1
):
13
24
.

60.

Fridley
 
BL
,
Biernacka
 
JM
.
Gene set analysis of SNP data: benefits, challenges, and future directions
.
Eur J Hum Genet
 
2011
;
19
(
8
):
837
43
.

61.

Medina
 
I
,
Montaner
 
D
,
Bonifaci
 
N
, et al.  
Gene set-based analysis of polymorphisms: finding pathways or biological processes associated to traits in genome-wide association studies
.
Nucleic Acids Res
 
2009
;
37
:
W340
4
.

62.

Mooney
 
M
,
Wilmot
 
B
.
Gene set analysis: a step-by-step guide
.
Am J Med Genet B Neuropsychiatr Genet
 
2015
;
168
(
7
):
517
27
.

63.

Peng
 
G
,
Luo
 
L
,
Siu
 
H
, et al.  
Gene and pathway-based second-wave analysis of genome-wide association studies
.
Eur J Hum Genet
 
2010
;
18
(
1
):
111
7
.

64.

Wang
 
K
,
Li
 
M
,
Bucan
 
M
.
Pathway-based approaches for analysis of genomewide association studies
.
Am J Hum Genet
 
2007
;
81
(
6
):
1278
83
.

65.

Holden
 
M
,
Deng
 
S
,
Wojnowski
 
L
, et al.  
GSEA-SNP: applying gene set enrichment analysis to SNP data from genome-wide association studies
.
Bioinformatics
 
2008
;
24
(
23
):
2784
5
.

66.

Nam
 
D
,
Kim
 
J
,
Kim
 
SY
, et al.  
GSA-SNP: a general approach for gene set analysis of polymorphisms
.
Nucleic Acids Res
 
2010
;
38
:
W749
54
.

67.

Yoon
 
S
,
Nguyen
 
HCT
,
Yoo
 
YJ
, et al.  
Efficient pathway enrichment and network analysis of GWAS summary data using GSA-SNP2
.
Nucleic Acids Res
 
2018
;
46
(
10
):
e60
.

68.

Weng
 
L
,
Macciardi
 
F
,
Subramanian
 
A
, et al.  
SNP-based pathway enrichment analysis for genome-wide association studies
.
BMC Bioinformatics
 
2011
;
12
:
99
.

69.

Chen
 
LS
,
Hutter
 
CM
,
Potter
 
JD
, et al.  
Insights into colon cancer etiology via a regularized approach to gene set analysis of GWAS data
.
Am J Hum Genet
 
2010
;
86
(
6
):
860
71
.

70.

Chen
 
X
,
Wang
 
L
,
Hu
 
B
, et al.  
Pathway-based analysis for genome-wide association studies using supervised principal components
.
Genet Epidemiol
 
2010
;
34
(
7
):
716
24
.

71.

de
 
CA
,
Mooij
 
JM
,
Heskes
 
T
, et al.  
MAGMA: generalized gene-set analysis of GWAS data
.
PLoS Comput Biol
 
2015
;
11
(
4
):
e1004219
.

72.

Segre
 
AV
,
Consortium
 
D
,
investigators
 
M
, et al.  
Common inherited variation in mitochondrial genes is not enriched for associations with type 2 diabetes or related glycemic traits
.
PLoS Genet
 
2010
;
6
(
8
).

73.

Fan
 
Q
,
Zhang
 
F
,
Wang
 
W
, et al.  
GWAS summary-based pathway analysis correcting for the genetic confounding impact of environmental exposures
.
Brief Bioinform
 
2018
;
19
(
5
):
725
30
.

74.

Zhang
 
M
,
Liang
 
L
,
Morar
 
N
, et al.  
Integrating pathway analysis and genetics of gene expression for genome-wide association study of basal cell carcinoma
.
Hum Genet
 
2012
;
131
(
4
):
615
23
.

75.

Ballard
 
DH
,
Cho
 
J
,
Zhao
 
H
.
Comparisons of multi-marker association methods to detect association between a candidate region and disease
.
Genet Epidemiol
 
2010
;
34
(
3
):
201
12
.

76.

Wang
 
L
,
Jia
 
P
,
Wolfinger
 
RD
, et al.  
An efficient hierarchical generalized linear mixed model for pathway analysis of genome-wide association studies
.
Bioinformatics
 
2011
;
27
(
5
):
686
92
.

77.

Jia
 
P
,
Wang
 
L
,
Meltzer
 
HY
, et al.  
Pathway-based analysis of GWAS datasets: effective but caution required
.
Int J Neuropsychopharmacol
 
2011
;
14
(
4
):
567
72
.

78.

de
 
CA
,
Neale
 
BM
,
Heskes
 
T
, et al.  
The statistical properties of gene-set analysis
.
Nat Rev Genet
 
2016
;
17
(
6
):
353
64
.

79.

Hong
 
MG
,
Pawitan
 
Y
,
Magnusson
 
PK
, et al.  
Strategies and issues in the detection of pathway enrichment in genome-wide association studies
.
Hum Genet
 
2009
;
126
(
2
):
289
301
.

80.

Lee
 
PH
,
O'Dushlaine
 
C
,
Thomas
 
B
, et al.  
INRICH: interval-based enrichment analysis for genome-wide association studies
.
Bioinformatics
 
2012
;
28
(
13
):
1797
9
.

81.

Geeleher
 
P
,
Hartnett
 
L
,
Egan
 
LJ
, et al.  
Gene-set analysis is severely biased when applied to genome-wide methylation data
.
Bioinformatics
 
2013
;
29
(
15
):
1851
7
.

82.

Harper
 
KN
,
Peters
 
BA
,
Gamble
 
MV
.
Batch effects and pathway analysis: two potential perils in cancer studies involving DNA methylation array analysis
.
Cancer Epidemiol Biomarkers Prev
 
2013
;
22
(
6
):
1052
60
.

83.

Phipson
 
B
,
Maksimovic
 
J
,
Oshlack
 
A
.
missMethyl: an R package for analyzing data from Illumina's HumanMethylation450 platform
.
Bioinformatics
 
2016
;
32
(
2
):
286
8
.

84.

Maksimovic
 
J
,
Phipson
 
B
,
Oshlack
 
A
.
A cross-package bioconductor workflow for analysing methylation array data
.
F1000Res
 
2017
;
5
(
1281
):
52
.

85.

Kishore
 
K
,
de
 
S
,
Lister
 
R
, et al.  
methylPipe and compEpiTools: a suite of R packages for the integrative analysis of epigenomics data
.
BMC Bioinformatics
 
2015
;
16
:
313
.

86.

Creighton
 
CJ
,
Nagaraja
 
AK
,
Hanash
 
SM
, et al.  
A bioinformatics tool for linking gene expression profiling results with public databases of microRNA target predictions
.
RNA
 
2008
;
14
(
11
):
2290
6
.

87.

Wu
 
X
,
Watson
 
M
.
CORNA: testing gene lists for regulation by microRNAs
.
Bioinformatics
 
2009
;
25
(
6
):
832
3
.

88.

Roubelakis
 
MG
,
Zotos
 
P
,
Papachristoudis
 
G
, et al.  
Human microRNA target analysis and gene ontology clustering by GOmir, a novel stand-alone application
.
BMC Bioinformatics
 
2009
;
10
(
Suppl 6
):
S20
.

89.

Kowarsch
 
A
,
Preusse
 
M
,
Marr
 
C
, et al.  
miTALOS: analyzing the tissue-specific regulation of signaling pathways by human and mouse microRNAs
.
RNA
 
2011
;
17
(
5
):
809
19
.

90.

Lu
 
TP
,
Lee
 
CY
,
Tsai
 
MH
, et al.  
miRSystem: an integrated system for characterizing enriched functions and pathways of microRNA targets
.
PLoS One
 
2012
;
7
(
8
):
e42390
.

91.

Vlachos
 
IS
,
Zagganas
 
K
,
Paraskevopoulou
 
MD
, et al.  
DIANA-miRPath v3.0: deciphering microRNA function with experimental support
.
Nucleic Acids Res
 
2015
;
43
(
W1
):
W460
6
.

92.

Betel
 
D
,
Koppal
 
A
,
Agius
 
P
, et al.  
Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites
.
Genome Biol
 
2010
;
11
(
8
):
R90
.

93.

Xu
 
J
,
Wong
 
CW
.
Enrichment analysis of miRNA targets
.
Methods Mol Biol
 
2013
;
936
:
91
103
.

94.

Steinfeld
 
I
,
Navon
 
R
,
Ach
 
R
, et al.  
miRNA target enrichment analysis reveals directly active miRNAs in health and disease
.
Nucleic Acids Res
 
2013
;
41
(
3
):
e45
.

95.

Ru
 
Y
,
Kechris
 
KJ
,
Tabakoff
 
B
, et al.  
The multiMiR R package and database: integration of microRNA-target interactions along with their disease and drug associations
.
Nucleic Acids Res
 
2014
;
42
(
17
):
e133
.

96.

Palmieri
 
V
,
Backes
 
C
,
Ludwig
 
N
, et al.  
IMOTA: an interactive multi-omics tissue atlas for the analysis of human miRNA-target interactions
.
Nucleic Acids Res
 
2018
;
46
(
D1
):
D770
D5
.

97.

Subhra Das
 
S
,
James
 
M
,
Paul
 
S
, et al.  
miRnalyze: an interactive database linking tool to unlock intuitive microRNA regulation of cell signaling pathways
.
Database (Oxford)
 
2017
;
2017
(
1
).

98.

Das
 
SS
,
Saha
 
P
,
Chakravorty
 
N
.
miRwayDB: a database for experimentally validated microRNA-pathway associations in pathophysiological conditions
.
Database (Oxford)
 
2018
;
2018
.

99.

Lu
 
M
,
Shi
 
B
,
Wang
 
J
, et al.  
TAM: a method for enrichment and depletion analysis of a microRNA category in a list of microRNAs
.
BMC Bioinformatics
 
2010
;
11
:
419
.

100.

Li
 
J
,
Han
 
X
,
Wan
 
Y
, et al.  
TAM 2.0: tool for MicroRNA set analysis
.
Nucleic Acids Res
 
2018
;
46
(
W1
):
W180
W5
.

101.

Kozomara
 
A
,
Griffiths-Jones
 
S
.
miRBase: annotating high confidence microRNAs using deep sequencing data
.
Nucleic Acids Res
 
2014
;
42
:
D68
73
.

102.

Lu
 
M
,
Zhang
 
Q
,
Deng
 
M
, et al.  
An analysis of human microRNA and disease associations
.
PLoS One
 
2008
;
3
(
10
):
e3420
.

103.

Corapcioglu
 
ME
,
Ogul
 
H
.
miSEA: microRNA set enrichment analysis
.
Biosystems
 
2015
;
134
:
37
42
.

104.

Hsu
 
SD
,
Tseng
 
YT
,
Shrestha
 
S
, et al.  
miRTarBase update 2014: an information resource for experimentally validated miRNA-target interactions
.
Nucleic Acids Res
 
2014
;
42
:
D78
85
.

105.

Xiao
 
F
,
Zuo
 
Z
,
Cai
 
G
, et al.  
miRecords: an integrated resource for microRNA-target interactions
.
Nucleic Acids Res
 
2009
;
37
:
D105
10
.

106.

Wang
 
J
,
Lu
 
M
,
Qiu
 
C
, et al.  
TransmiR: a transcription factor-microRNA regulation database
.
Nucleic Acids Res
 
2010
;
38
:
D119
22
.

107.

Backes
 
C
,
Khaleeq
 
QT
,
Meese
 
E
, et al.  
miEAA: microRNA enrichment analysis and annotation
.
Nucleic Acids Res
 
2016
;
44
(
W1
):
W110
6
.

108.

Andres-Leon
 
E
,
Nunez-Torres
 
R
,
Rojas
 
AM
.
miARma-seq: a comprehensive tool for miRNA, mRNA and circRNA analysis
.
Sci Rep
 
2016
;
6
:
25749
.

109.

Godard
 
P
,
van
 
J
.
Pathway analysis from lists of microRNAs: common pitfalls and alternative strategy
.
Nucleic Acids Res
 
2015
;
43
(
7
):
3490
7
.

110.

Guttman
 
M
,
Amit
 
I
,
Garber
 
M
, et al.  
Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals
.
Nature
 
2009
;
458
(
7235
):
223
7
.

111.

Jiang
 
Q
,
Ma
 
R
,
Wang
 
J
, et al.  
LncRNA2Function: a comprehensive resource for functional investigation of human lncRNAs based on RNA-seq data
.
BMC Genomics
 
2015
;
16
(
Suppl 3
):
S2
.

112.

Park
 
C
,
Yu
 
N
,
Choi
 
I
, et al.  
lncRNAtor: a comprehensive resource for functional investigation of long non-coding RNAs
.
Bioinformatics
 
2014
;
30
(
17
):
2480
5
.

113.

Zhao
 
Z
,
Bai
 
J
,
Wu
 
A
, et al.  
Co-LncRNA: investigating the lncRNA combinatorial effects in GO annotations and KEGG pathways based on human RNA-seq data
.
Database (Oxford)
 
2015
;
2015
.

114.

Han
 
J
,
Liu
 
S
,
Sun
 
Z
, et al.  
LncRNAs2 pathways: identifying the pathways influenced by a set of lncRNAs of interest based on a global network propagation method
.
Sci Rep
 
2017
;
7
:
46566
.

115.

Liu
 
K
,
Yan
 
Z
,
Li
 
Y
, et al.  
Linc2GO: a human LincRNA function annotation resource based on ceRNA hypothesis
.
Bioinformatics
 
2013
;
29
(
17
):
2221
2
.

116.

Pian
 
C
,
Zhang
 
G
,
Tu
 
T
, et al.  
LncCeRBase: a database of experimentally validated human competing endogenous long non-coding RNAs
.
Database (Oxford)
 
2018
;
2018
.

117.

Jiang
 
Q
,
Wang
 
J
,
Wu
 
X
, et al.  
LncRNA2Target: a database for differentially expressed genes after lncRNA knockdown or overexpression
.
Nucleic Acids Res
 
2015
;
43
:
D193
6
.

118.

Cheng
 
L
,
Wang
 
P
,
Tian
 
R
, et al.  
LncRNA2Target v2.0: a comprehensive database for target genes of lncRNAs in human and mouse
.
Nucleic Acids Res
 
2018
;
47
(
D1
):
D140
44
.

119.

Antonov
 
IV
,
Mazurov
 
E
,
Borodovsky
 
M
, et al.  
Prediction of lncRNAs and their interactions with nucleic acids: benchmarking bioinformatics tools
.
Brief Bioinform
 
2019
;
20
(
2
):
551
64
.

120.

Zhang
 
J
,
Zhang
 
Z
,
Wang
 
Z
, et al.  
Ontological function annotation of long non-coding RNAs through hierarchical multi-label classification
.
Bioinformatics
 
2018
;
34
(
10
):
1750
7
.

121.

Zhou
 
J
,
Zhang
 
S
,
Wang
 
H
, et al.  
LncFunNet: an integrated computational framework for identification of functional long noncoding RNAs in mouse skeletal muscle cells
.
Nucleic Acids Res
 
2017
;
45
(
12
):
e108
.

122.

Ghent University
. decodeRNA 1.0.
2019
.
http://www.decoderna.org (accessed 15 March 2019)
.

123.

Zhou
 
J
,
Huang
 
Y
,
Ding
 
Y
, et al.  
lncFunTK: a toolkit for functional annotation of long noncoding RNAs
.
Bioinformatics
 
2018
;
34
(
19
):
3415
6
.

124.

SYSU
. StarBase database.
2019
 
http://starbase.sysu.edu.cn (accessed 15 March 2019)
.

125.

Su
 
H
,
Lin
 
F
,
Deng
 
X
, et al.  
Profiling and bioinformatics analyses reveal differential circular RNA expression in radioresistant esophageal cancer cells
.
J Transl Med
 
2016
;
14
(
1
):
225
.

126.

Cheng
 
J
,
Zhuo
 
H
,
Xu
 
M
, et al.  
Regulatory network of circRNA-miRNA-mRNA contributes to the histological classification and disease progression in gastric cancer
.
J Transl Med
 
2018
;
16
(
1
):
216
.

127.

Barrett
 
SP
,
Salzman
 
J
.
Circular RNAs: analysis, expression and potential functions
.
Development
 
2016
;
143
(
11
):
1838
47
.

128.

Meng
 
X
,
Hu
 
D
,
Zhang
 
P
, et al.  
CircFunBase: a database for functional circular RNAs
.
Database (Oxford)
 
2019
;
2019
.

129.

Whalen
 
S
,
Pollard
 
KS
.
Most chromatin interactions are not in linkage disequilibrium
.
Genome Res
 
2019
;
29
:
334
43
.

130.

Zhi
 
H
,
Li
 
X
,
Wang
 
P
, et al.  
Lnc2Meth: a manually curated database of regulatory relationships between long non-coding RNAs and DNA methylation associated with human disease
.
Nucleic Acids Res
 
2018
;
46
(
D1
):
D133
D8
.

131.

Miao
 
YR
,
Liu
 
W
,
Zhang
 
Q
, et al.  
lncRNASNP2: an updated database of functional SNPs and mutations in human and mouse lncRNAs
.
Nucleic Acids Res
 
2018
;
46
(
D1
):
D276
D80
.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)