Abstract

Biological pathways reflect the key cellular mechanisms that dictate disease states, drug response and altered cellular function. The local areas of pathways are defined as subpathways (SPs), whose dysfunction has been reported to be associated with the occurrence and development of cancer. With the development of high-throughput sequencing technology, identifying dysfunctional SPs by using multi-omics data has become possible. Moreover, the SPs are not isolated in the biological system but interact with each other. Here, we propose a network-based calculated method, CNA2Subpathway, to identify dysfunctional SPs is driven by somatic copy number alterations (CNAs) in cancer through integrating pathway topology information, multi-omics data and SP crosstalk. This provides a novel way of SP analysis by using the SP interactions in the system biological level. Using data sets from breast cancer and head and neck cancer, we validate the effectiveness of CNA2Subpathway in identifying cancer-relevant SPs driven by the somatic CNAs, which are also shown to be associated with cancer immune and prognosis of patients. We further compare our results with five pathway or SP analysis methods based on CNA and gene expression data without considering SP crosstalk. With these analyses, we show that CNA2Subpathway could help to uncover dysfunctional SPs underlying cancer via the use of SP crosstalk. CNA2Subpathway is developed as an R-based tool, which is freely available on GitHub (https://github.com/hanjunwei-lab/CNA2Subpathway).

Introduction

Cancer is a complex disease caused by the interactions of multiple dysfunctional genes and pathways [1]. Biological pathways are models that contain structural information between genes, such as interaction, regulation, modification and binding properties, which reflect the key cellular mechanisms that dictate cancer states, drug response and altered cellular function [2]. Pathway analysis is a powerful tool for understanding the biology system and helps to understand the pathogenesis and progression of cancer. Generally, genes in the same pathway usually coordinate to achieve some specific biological functions. Many pathway analysis methods, such as gene set enrichment analysis (GSEA) [3] and signaling pathway impact analysis (SPIA) [4], have been proven to be effective for revealing complex traits and dysregulated pathways behind human diseases. Edge set enrichment analysis [5] was developed to identify dysregulated pathways by investigating the changes of biological relationships of pathways in the context of gene expression (GE). LncRNAs2Pathways [6] found the functional pathways influenced by the combinatorial effects of a set of lncRNAs of interest based on a global network propagation algorithm. Although these methods have successfully identified dysregulated pathways, they did not consider the interaction between pathways. To do this, PAGI [7] was designed to identify dysregulated pathways by considering both within pathway effects and crosstalk between pathways. Lisa et al. proposed a latent pathway identification analysis (LPIA) [8], which constructs a pathway network that implicitly links pathways through their common function in the cells, to looks for statistically significant dysregulated pathways.

Recently, the concept of subpathways (SPs) was proposed, which were defined as the local areas of pathways. Compared with the original pathways, the SPs contain fewer components and reflect more specific biological functions. Several studies have reported that the dysregulated SPs were associated with the occurrence and development of cancer. SubpathwayMiner [9] facilitated the SP identification of metabolic pathways by using pathway topological structure information. It identified a ‘norepinephrine metabolism’ SP, which belongs to a local region of the ‘Tyrosine metabolism’ pathway, and which is highly associated with cancer initiation and progression [10]. However, the entire ‘Tyrosine metabolism’ pathway may not be found to be significant by the traditional pathway enrichment analysis method. This indicates that the SPs may tend to perform certain type-specific functions (e.g. norepinephrine metabolism) and may be more associated with certain phenotypes (e.g. cancer). SP-GM [11] was proposed to identify metabolic SPs by integrating the information of genes and metabolites, and their positions in a given pathway. This method identified 36 SPs associated with the progression of lung cancer, enriched by differentially expressed genes. MIDAS [12] determined condition-specific SPs, via GE data and network centrality information across multiple phenotypes. A novel systems biology R-based software package psSubpathway [13] identified phenotype-specific SPs in breast cancer dataset with multiple categories. It was also found that the SPs could be used as prognostic biomarkers of patients with breast cancer [14]. These SP analysis methods mainly identified SPs by using only the GE data between normal and tumor tissues, but considering multi-omics data will find more reliable SPs associated with cancer.

With the development of high-throughput sequencing technology, such as next-generation sequencing, the pathogenic mechanism of cancer will be better understood at the molecular level [15]. Multi-omics data integration allows the joint analysis of multiple omics data types to provide a global view of the biological system [16]. Somatic copy number alterations (CNAs) are nearly ubiquitous in cancer and alter a greater portion of the cancer genome than any other type of somatic genetic alteration [17, 18]. Finding cancer-related somatic CNA genes is an important challenge in identifying cancer molecular markers. Several recent methods of integrating CNA and GE data have been proposed. For example, the iGC package was effective and useful for identifying CNA-driven genes [19]. Oncodrive-CIS measured the cis effect of copy number changes that may be useful to identify genes involved in tumorigenesis due to CNAs [20]. These methods demonstrated that CNAs are both, directly and indirectly, correlated with changes in their expression [21]. To better study the function mechanism of the CNAs in cancer, the SP analysis by integrating CNA and GE data will be a promising strategy. Thus, it is essential to develop a calculated method to detect dysfunctional SPs driven by the CNAs at the system biology level. This will help to clearly understand the biological mechanism of cancer and develop new drug targets.

Here, we propose a network-based calculated method, CNA2Subpathway, to identify dysfunctional SPs with respect to somatic CNAs in cancer by integrating CNA data, GE data and SP crosstalk network. CNA2Subpathway provides a novel way of SP analysis by using the SP interactions. In the method, we first extract the SPs from the canonical biological pathways and construct a weighted SP crosstalk network. The weights of the network consider both the crosstalk relationship of the biological functions performed among those SPs and the differential expression extent of genes influenced by the CNAs. Next, we calculate the centrality scores of SPs by a random walk with a restart algorithm on the network, which reflects the extent of SPs influenced by the somatic CNAs and SPs crosstalk. Finally, a bootstrap-based randomization method is used to calculate the statistical significance of the scores. We analyze The Cancer Genome Atlas (TCGA) breast invasive carcinoma (BRCA) and head and neck squamous cell carcinoma (HNSC) data sets to illustrate the performance of CNA2Subpathway. We further compare our results with the GSEA and hypergeometric test methods based on somatic CNA and GE data without considering SP crosstalk (defined as CNA-GSEA and CNA-hypergeometric test). With these analyses, we demonstrate that the CNA2Subpathway method is able to identify the cancer-relevant SPs driven by somatic CNAs.

Materials and methods

CNA2Subpathway is a network-based calculated method to identify dysregulated SP driven by somatic CNAs. In this section, we describe the schematic overview of the CNA2Subpathway method in Figure 1. In steps1–3, a crosstalk network of SPs is constructed that reflects the interactions among SPs in the system biological level, through their participation in common biological functions simultaneously augmented with measurements of transcriptional dysregulation associated with CNAs. In step 4, we calculate the centrality score for each SP by using the random walk algorithm, which reflects the extent of SPs visited by a random walker on the network. Finally, we use a bootstrap-based randomization method to calculate the statistical significance of SP centrality scores. The SPs with significant centralities are reported as potentially dysfunctional on the cascade of altered SPs activated by somatic CNAs.

Schematic overview of the CNA2Subpathway method.
Figure 1

Schematic overview of the CNA2Subpathway method.

Data preprocessing

During constructing the SP network, we use three diverse but interrelated biological data: (1) biological pathways obtained from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, (2) biological molecular functions compiled in the gene ontology (GO) database, (3) CNA and GE data downloaded from TCGA.

SP extraction

We download the biological pathways from the KEGG database [2], which provides pathway structure information (such as interaction, regulation, modification, binding, etc., between genes) and is widely used in pathway analysis. Based on the structure information, each pathway is converted into an undirected graph. We extract SPs from each pathway graph based on the SubpathwayMiner method [9]. This method uses a k-clique algorithm to identify the SPs based on pathway topology information embedded in the graph, of which the genes in them have a highly similar function. The SubpathwayMiner method has been successfully applied in the SPs analysis method [13, 22]. Thus, 689 biological SPs are obtained.

Biological functions data

We use the GO [23] molecular function data obtained from MSigDB (version 7.0) [3], which C5 GO gene sets that contain genes annotated by the same GO term. The biological function terms with more than 5 genes or less than 100 genes are selected, which will avoid the terms from being either too specific or too vague. A total of 1441 biological function terms are retained.

Gene expression profile and CNA data preprocessing

We download gene expression profiles (GEPs) and CNA data with the same samples from the GDC TCGA portal (https://portal.gdc.cancer.gov/). In the study, BRCA and HNSC data sets are used. For the GEP data, we choose TCGA RNA-seq data, and the GE values are normalized by the Fragments per Kilobase per Million (FPKM). Then, the FPKM values for each gene are normalized with z-score. CNA data use the Gene-level GISTIC 2.0 [24] segmented and thresholded CNAs downloaded from the GDC database. While the segmented data contain the copy number estimate, the thresholded copy number data contain the thresholded estimate, i.e. −2, −1,0,1,2, representing homozygous deletion, single copy deletion, diploid normal copy, low-level copy number amplification and high-level copy number amplification.

Calculation of gene CNA-DEscore by integrating GE and somatic CNA data

In step 1, we define a gene CNA-DEscore to reflect the extent of GE influenced by its somatic CNA (Figure 1). Specifically, for each gene, the samples are classified into two different groups based on the CNA status derived from GISTIC2 analysis. The samples with copy deletion genes (CNA estimate values with −2 and − 1) are assigned with label ‘1’ and others are assigned with ‘0,’ or the samples with copy number amplification genes (CNA estimate values with 1 and 2) are assigned with label ‘1’ and others are assigned with ‘0.’ Thus, the samples obtain a binary label vector for each gene. Then, for each gene, we use Student’s t-test method to calculate the gene differentially expressed the extent between CNA labels. To compare the differentially expressed extent of genes influenced by CNAs at a common level, we convert the t-test P-value of each gene to z-score using |$z={\phi}^{-1}(1-p)$|⁠, where |${\phi}^{-1}$| is the inverse normal cumulative density function. The z-score of the gene is defined as CNA-DEscore. If the expression of a gene is strongly influenced by its CNA, the CNA-DEscore will be relatively large.

Construction of a GO/SP bipartite network

In step 2, to constructing our SPs crosstalk network, we construct a bipartite network, where the two sets of nodes are SPs and GO terms. We think two SPs are functionally related if there are some genes in two SPs that share at least more than one common biological function (GO term). To do this, we first construct the bipartite network by defining an edge between each pair of SP and GO term (G), if the intersection of SP and G is nonempty. The weight of the edge in this network reflects both the relationship between G and SP and the transcriptional dysregulation of genes driven by the CNAs. We define this weight as follows:
(1)
where |${J}_{\mathrm{SP}\cdotp G}=\frac{\mid \mathrm{SP}\cap G\mid }{\mid \mathrm{SP}\cup G\mid }$| is the Jaccard index between SP and G, and |$\mathrm{med}\{\mathrm{CNA}-{\mathrm{DE}}_x|x\in \mathrm{SP}\cap G\}$| is the median CNA-DEscore over all genes in SP∩G. The |${J}_{\mathrm{SP}\cdotp G}$|⁠, in measuring the relative overlap of SP and G, is a standard measure of similarity among sets and reflects the degree to which SP participates in the G in our method. Besides, the weight is also modulated by a measure of the CNA-DEscores of genes common to both SP and G, which integrated CNA and GE information to combine the network. Therefore, both participation of SP in the biological function G and CNAs bring about high differential expression genes that common to SP and G are necessary for the edge in the GO/SP bipartite network to be heavily weighted.

Construction of the SP crosstalk network based on the bipartite network

In step 3, the SP crosstalk network is constructed based on the GO/SP bipartite network. The bipartite network can be represented as an incidence matrix |$W=[{W}_{i,j}]$|⁠, where the rows of the matrix represent SPs and the columns represent GO terms. In the bipartite network, two SPs share more neighbor GO nodes; they tend to participate in similar biological functions and implement crosstalk with each other.

To further establish the SP network, we define a weighted adjacency matrix:
(2)
We assign a value |${A}_{i{i}^{\prime }}$| with each edge weight in our SP network:
(3)

The |${N}_G$| denotes all number of GO terms. That means the weight of an edge between two SPs is the sum of the contributions, concerning the CNAs induced transcriptional dysregulation of all biological functions (GO terms) shared between them. Two SPs have an edge in this network when and only when they share at least one biological function induced by CNAs and the edge weights will be increased with GO terms to both SPs. Thus, a weighted SP crosstalk network is constructed, where self-interactions are removed. This process has been used by the LPIA algorithm for constructing the pathway network previously [8], we use it to construct the SP crosstalk network in the context of somatic CNAs.

Evaluation of dysfunctional SPs with eigenvector centrality

In step 4, we evaluate the dysfunctional SPs driven by the somatic CNAs. The construction of a weighted SP network adopts the functional similarity between SPs in the context of the effect of CNAs on transcriptionally altered. In the weighted SP network, the weighted edges reflect the evidence that an SP may be potentially altered can be expected to be reinforced by the evidence of its neighbors. We calculate the eigenvector centrality score (EC score) of SP in the network to determine the significance of an SP that is influenced by the somatic CNAs. Eigenvector centrality is a measure of the influence of a node in a network, and a node is inclined to be influenced if it is linked to many other neighbor nodes and the edges have great weight. The random walk with restart algorithm reflects this notion of importance and is particularly used to calculate the EC scores of SPs in the weighted network. In the algorithm, nodes are more central the more likely they are to be visited in the random walk. In the network, we define a probability transition matrix for a random walk, |$T$|⁠, by row-normalizing the adjacency matrix |$A$|⁠. The formula is as follows:
(4)
where |${N}_{\mathrm{SP}}$| is the number of SPs in our network, |${T}_{i{i}^{\prime }}$| is the probability that, from the subpathway |${\mathrm{SP}}_i$| to the next step will be to subpathway |${\mathrm{SP}}_{i^{\prime }}$|⁠. The formula for the random walk with restart algorithm is given as:
(5)
where |$T$| is the row-normalized adjacency matrix of |$A$|⁠; |${\pi}^t$|is the vector of nodes at time step t, and its ith element |${\pi}_i^t$| holds the probability of being at node i at time step t; the initial probability vector |${\pi}^0$| is constructed by assigning to each node with the same value and makes their sum to be 1. The random walk with restart algorithm that we used here is defined as the limiting distribution of a random walk, i.e. the relative frequency at which you would find a random walker at each of the nodes in the weighted network, in the limit of walking for a very long time. As such, the starting values are not important here. What is important is the use of random restarts, because if the network has a strong community structure, the random walking can be slower than one might hope as the walker stays in one community or another for a long time.

The parameter r is the restart probability, which has been demonstrated to have only a slight effect on the results when it fluctuated between 0.1 and 0.9 [25]. The default parameters r is set at 0.9. After an infinite number of steps, the probability |${\pi}^t$| will converge to a unique steady state |$\pi$|⁠, whose ith element |${\pi}_i$| is defined as the EC score of the subpathway |${SP}_i$|⁠. The EC score of SP will be influenced not only by the numbers of its neighbors but the values of the weights on the edges linked to it. A larger EC score |${\pi}_i$| indicates the subpathway |${SP}_i$| may be more inclined to be influenced by the cascade of altered SPs activated by somatic CNAs.

We use a bootstrap-based randomization procedure to estimate the statistical significance of SP EC scores. The EC scores are calculated in the SP network, which is constructed based on the intersection of each pair of SP and GO term and their CNA-DEscores. The intersection is constant, and the CNA-DEscores of genes are variable, which are determined by transcriptional dysregulation of genes driven by the CNAs. Thus, we performed a bootstrap resampling from the original CNA-DEscores over all genes and repeated the above algorithm to find statistical significance dysregulated SPs driven by CNAs. Thus, we obtained a vector of random EC scores,|${\pi}^{\ast }$|⁠. This process was repeated 1000 times, which obtain a set of random EC score vectors |$\{{\pi}^{\ast 1},\cdots, {\pi}^{\ast 1000}\}$|⁠. The EC scores for the original data indicated as |$\pi$| are compared with the set of random EC score vectors. We computed a P-value for each SP |${SP}_i$| as follows:
(6)
where I is the indicator function. To correct for multiple comparisons, the P-values were adjusted using the false discovery rate (FDR) method proposed by Benjamin and Hochberg [26]. The SPs with FDR < 0.2 are deemed as potential dysregulation driven by the somatic CNAs in cancer. CNA2Subpathway has been developed as a freely available R package on the GitHub repository (https://github.com/hanjunwei-lab/CNA2Subpathway).

Results

SP analysis of BRCA

In the study, CNA2Subpathway was developed to identify cancer-relevant SPs driven by CNAs in cancer through integrating pathway topology information, multi-omics data and SP crosstalk. Specifically, we first converted each pathway into an undirected graph based on pathway structure information and extracted SPs from each pathway graph based on the SubpathwayMiner method [9], which uses a k-clique algorithm to identify the SPs based on pathway topology embedded in the graph. Then, we defined a gene CNA-DEscore by integrating genomic and transcriptome data to reflect the extent of GE influenced by its somatic CNA. Thirdly, we constructed a weighted GO/SP bipartite network. The weight of the edge in this network reflects both the relationship with SPs and GO terms and the CNA-DEscores. Finally, the SP crosstalk network is constructed based on the GO/SP bipartite network. The EC score of SP was calculated with a random walk with a restart algorithm in the network to determine the significance of an SP that is influenced by the somatic CNAs.

It has been reported in the literature that somatic CNAs offer new insight into the underlying of cancer [27]. CNA2Subpathway was designed to identify the SPs driven by the somatic CNAs. In the method, all the SPs will be evaluated, and then ranked with their eigenvector centrality score and FDR. The default threshold value of FDR was set at 0.2 in the study. However, the researchers could select the FDR threshold value as needed by using our ‘CNA2Subpathway’ package. To demonstrate the CNA2Subpathway method, we downloaded the BRCA dataset from the GDC TCGA database, which includes 1059 samples with both GE and somatic CNA data. We applied CNA2Subpathway to BRCA datasets, and the top 10 significant SPs are shown in Table 1, of which their original pathways include the estrogen signaling pathway, Citrate cycle (TCA cycle), fluid shear stress and atherosclerosis, etc. Most of these pathways have been reported to be related to tumor occurrence, development and alteration [28–31].

Table 1

Top 10 subpathways identified by CNA2Subpathway in the BRCA dataset

RankSubpathIDPathwaySizeaP-valueFDR
1path:00020_6Citrate cycle (TCA cycle)12<0.001<0.001
2path:05418_16Fluid shear stress and atherosclerosis21<0.001<0.001
3path:04915_13Estrogen signaling pathway280.0010.23
4path:05212_1Pancreatic cancer210.0030.517
5path:05166_11HTLV-I infection230.0070.758
6path:04066_3HIF-1 signaling pathway240.0080.758
7path:05167_21Kaposi’s sarcoma-associated herpesvirus infection260.0090.758
8path:05418_12Fluid shear stress and atherosclerosis150.010.758
9path:01524_4Platinum drug resistance250.0110.758
10path:04623_4Cytosolic DNA-sensing pathway90.0110.758
RankSubpathIDPathwaySizeaP-valueFDR
1path:00020_6Citrate cycle (TCA cycle)12<0.001<0.001
2path:05418_16Fluid shear stress and atherosclerosis21<0.001<0.001
3path:04915_13Estrogen signaling pathway280.0010.23
4path:05212_1Pancreatic cancer210.0030.517
5path:05166_11HTLV-I infection230.0070.758
6path:04066_3HIF-1 signaling pathway240.0080.758
7path:05167_21Kaposi’s sarcoma-associated herpesvirus infection260.0090.758
8path:05418_12Fluid shear stress and atherosclerosis150.010.758
9path:01524_4Platinum drug resistance250.0110.758
10path:04623_4Cytosolic DNA-sensing pathway90.0110.758

aThe number of genes contained in the subpathway.

Table 1

Top 10 subpathways identified by CNA2Subpathway in the BRCA dataset

RankSubpathIDPathwaySizeaP-valueFDR
1path:00020_6Citrate cycle (TCA cycle)12<0.001<0.001
2path:05418_16Fluid shear stress and atherosclerosis21<0.001<0.001
3path:04915_13Estrogen signaling pathway280.0010.23
4path:05212_1Pancreatic cancer210.0030.517
5path:05166_11HTLV-I infection230.0070.758
6path:04066_3HIF-1 signaling pathway240.0080.758
7path:05167_21Kaposi’s sarcoma-associated herpesvirus infection260.0090.758
8path:05418_12Fluid shear stress and atherosclerosis150.010.758
9path:01524_4Platinum drug resistance250.0110.758
10path:04623_4Cytosolic DNA-sensing pathway90.0110.758
RankSubpathIDPathwaySizeaP-valueFDR
1path:00020_6Citrate cycle (TCA cycle)12<0.001<0.001
2path:05418_16Fluid shear stress and atherosclerosis21<0.001<0.001
3path:04915_13Estrogen signaling pathway280.0010.23
4path:05212_1Pancreatic cancer210.0030.517
5path:05166_11HTLV-I infection230.0070.758
6path:04066_3HIF-1 signaling pathway240.0080.758
7path:05167_21Kaposi’s sarcoma-associated herpesvirus infection260.0090.758
8path:05418_12Fluid shear stress and atherosclerosis150.010.758
9path:01524_4Platinum drug resistance250.0110.758
10path:04623_4Cytosolic DNA-sensing pathway90.0110.758

aThe number of genes contained in the subpathway.

With the default FDR < 0.2, CNA2Subpathway identified two statistically significant SPs. The most significant SP was path:00020_6 in the TCA cycle pathway, which contains 12 genes (Figure 2A). In the SP, two genes, fumarate hydratase (FH) and succinate dehydrogenase complex flavoprotein subunit A (SDHA) are copy number amplificated (labeled with pink in Figure 2A), and three genes, dihydrolipoamide S-succinyltransferase (DLST), succinate dehydrogenase complex iron–sulfur subunit B (SDHB), succinate dehydrogenase complex subunit D (SDHD), are copy number deleted (labeled with blue in Figure 2A). To study the biological function of the SP, we mapped the 12 genes to the original TCA cycle pathway. Interestingly, these SP genes are mostly concentrated in a local area of the pathway (marked with a red rectangle in Figure 2B), which including the continuous metabolic reaction of fumarate, succinate, succinyl-CoA, and S-succinyldihydrolipoamide-E. The CNAs of these genes such as FH, SDHA, DLST, SDHB and SDHD may result in enzymes abnormal of their encoded. And the abnormal metabolic enzymes such as FH and succinate dehydrogenase may cause metabolic dysregulation of this local pathway area (SP) [32]. Moreover, several literature have shown that the genetic and acquired changes of three TCA cycle enzymes, FH, succinate dehydrogenase, and isocitrate dehydrogenase, play a causal role in cancer progression, indicating that the metabolic changes may be potential signs of cancer [33, 34]. Especially, growing evidence suggest that FH mutations may also be involved in the pathogenesis of breast and bladder tumors [35].

(A) Dysregulation subpathway (path:00020_6) of the TCA cycle pathway in BRCA. The t-test is used to calculate the gene differentially expressed the extent between CNA labels. A gene with a smaller P-value corresponding to a larger circle, indicating that it may be strongly influenced by its CNA. The copy number amplification (CN-AMP) genes are labeled with pink, the copy number deletion (CN-DEL) genes are labeled with blue and the non-CNA genes are labeled with gray. (B) TCA cycle pathway in the KEGG database. The genes in the subpathway are mapped to the pathway. The subpathway region is marked with a red rectangle.
Figure 2

(A) Dysregulation subpathway (path:00020_6) of the TCA cycle pathway in BRCA. The t-test is used to calculate the gene differentially expressed the extent between CNA labels. A gene with a smaller P-value corresponding to a larger circle, indicating that it may be strongly influenced by its CNA. The copy number amplification (CN-AMP) genes are labeled with pink, the copy number deletion (CN-DEL) genes are labeled with blue and the non-CNA genes are labeled with gray. (B) TCA cycle pathway in the KEGG database. The genes in the subpathway are mapped to the pathway. The subpathway region is marked with a red rectangle.

We tested if the dysregulated SPs driven by the CNAs were associated with the prognosis of breast cancer patients. For the SP, the risk score of the SP for each sample was calculated by the sum of the weight expression value across all the genes in the SP, where the gene weight was the univariate Cox proportional hazards regression coefficient estimated on the GE value and the overall survival data. According to the median of SP risk scores, patients were classified into high-risk and low-risk groups (Figure 3A; log-rank P = 1.4e-3 in Kaplan–Meier survival analysis). With another significant SP path:05418_16 in fluid shear stress and atherosclerosis pathway, patients were also classified into two different groups (Figure 3C; log-rank P = 1.2e-4). These results indicate that the SPs identified by the CNA2Subpathway method may be used as potential prognostic biomarkers.

(A) Kaplan–Meier survival curves of patients classified into high-risk and low-risk groups using subpathway path:00020_6 in the TCA cycle pathway. P-value was estimated using the log-rank test. (B) Box plot to show the stromal and immune scores for the patients in high-risk and low-risk groups classified by the subpathway path:00020_6. (C) Kaplan–Meier survival curves of patients classified into high-risk and low-risk groups using subpathway path:05418_16 in the fluid shear stress and atherosclerosis pathway. (D) Box plot to show the stromal and immune scores for the patients in high-risk and low-risk groups classified by the subpathway path:05418_16. (E) Heat map of the normalized expression values of the genes in the subpathway path:00020_6.
Figure 3

(A) Kaplan–Meier survival curves of patients classified into high-risk and low-risk groups using subpathway path:00020_6 in the TCA cycle pathway. P-value was estimated using the log-rank test. (B) Box plot to show the stromal and immune scores for the patients in high-risk and low-risk groups classified by the subpathway path:00020_6. (C) Kaplan–Meier survival curves of patients classified into high-risk and low-risk groups using subpathway path:05418_16 in the fluid shear stress and atherosclerosis pathway. (D) Box plot to show the stromal and immune scores for the patients in high-risk and low-risk groups classified by the subpathway path:05418_16. (E) Heat map of the normalized expression values of the genes in the subpathway path:00020_6.

Moreover, there was a recent study report that somatic CNAs play important roles in cancer immune [36]. Thus, we tested the difference of cancer immune between high-risk and low-risk groups classified by the significant SPs. We used the ESTIMATE method [37] to infer the stromal and immune scores in tumor patients. This method calculates stromal and immune scores to predict the level of infiltrating stromal and immune cells by performing single-sample GSEA on GE data. We applied the R-based ‘ESTIMATE’ package to implement this method (https://sourceforge.net/projects/estimateproject/). For the patients in high-risk and low-risk groups classified by the SP path:00020_6, the stromal and immune scores presented a significant difference (ANOVA analysis, P < 1.5e-15 and P < 2.1e-7,) (Figure 3B). Similar results were also found in patient groups classified by the SP path:05418_16 (ANOVA analysis, P < 4e-9 and P < 1.6e-6, Figure 3D). Interestingly, the original TCA cycle pathway of path:00020_6 has been reported to be related to the immunity of transformed cells and malignant tumors [38]. Furthermore, some genes, include DLST, FH, SDHB, SDHD, which occurred copy number variation in this SP, were used as marker genes for an estimate of the endothelial cells and macrophages in the study of Alejandro et al. [39]. These results indicate that the significant SPs may be associated with the tumor immune microenvironment.

We then tested the expression changes of genes in the significant SPs. For the SP path:00020_6 in the TCA cycle pathway, the heatmap of normalized GE values shows obvious differences between patients in high-risk and low-risk groups (Figure 3E). This might be the result of the CNAs of the genes in the SP, which may alter the activity of SP and further influence the progress of cancer.

SP analysis of HNSC

The HNSC dataset includes 526 samples with both CNAs and GE data. In the dataset, CNA2Subpathway identified two significant SPs (FDR < 0.2) that are SP path:04140_8 in the autophagy pathway and path:05418_16 in the fluid shear stress and atherosclerosis pathway. Some studies proposed that these two pathways play important roles in the initiation and progression of cancer [40, 41]. For example, fluid shear stress and atherosclerosis pathway have been reported to be associated with altering the cancer cell response to receptor-mediated apoptosis [42, 43]. Moreover, most of the top 10 significant SPs (listed in Supplementary Table S1) are also associated with cancer, such as path:04150_16 in the mTOR signaling pathway and path:04066_3 in the HIF-1 signaling pathway, etc. To further explore their effect on the cancer process, the SP risk score is calculated in the same manner in the previous case. We then performed a Cox proportional hazard regression model estimated the risk score of each SP and the overall survival data. The results showed that these SPs are all risk factors [hazard ratios (HRs) > 1, P < 0.001] (Figure 4A), which indicates that these SPs may play important roles in the process of HNSC. We also calculated the SP risk scores of patients by using genes with CNAs and all genes in the SP, respectively. For the SP path:04140_8 in the autophagy pathway, the results showed that the risk score based on all genes in the SP had a better-classifying effect than those of the genes with CNAs (Figure 4B). This indicates that the CNAs may not only influence the states of their genes but also the neighboring genes in the SP, and further alter the activity of the SP.

(A) Forest plot shows HR, the 95% confidence interval (CI), and P-values of the top 10 significant subpathways in HNSC calculated by the univariate Cox proportional hazards regression model. (B) Kaplan–Meier plots show the prognostic signal related to the subpathway path:04140_8 in the autophagy pathway. The upper shows all genes in the path:04140_8 as signature; the lower shows only CNA genes in the path:04140_8 as a signature.
Figure 4

(A) Forest plot shows HR, the 95% confidence interval (CI), and P-values of the top 10 significant subpathways in HNSC calculated by the univariate Cox proportional hazards regression model. (B) Kaplan–Meier plots show the prognostic signal related to the subpathway path:04140_8 in the autophagy pathway. The upper shows all genes in the path:04140_8 as signature; the lower shows only CNA genes in the path:04140_8 as a signature.

We then tested genes with CNAs in the SPs, which may alter the SP activity. For the SP path:04140_8 in the autophagy pathway, three genes occur copy number amplification (IGF1R, PRKAA1 and IRS2) and nine genes occur copy number deletion (AKT1, MTOR, HRAS, INS, IRS1, NRAS, PIK3CD, PTEN and RHEB) (Supplementary Figure S1A). We use the ‘maftools’ package [44] to provide visualization of those 12 genes across the HNSC samples. The waterfall plot shows that out of 526, 425 (80.8%) samples are copy number altered for the genes (Figure 5A). Then, Fisher’s exact test was used to perform co-occurrence and mutual exclusivity analysis between each pair of CNAs genes in the SP. The results showed that more than 60% of the gene pair are co-occurrence (P < 0.05) (Figure 5B). This indicates that the CNAs genes in the SP may tend to jointly implement the biological function.

(A) Waterfall plot of 12 genes with CNAs in the subpathway path:04140_8 in the autophagy pathway. (B) Co-occurrence and mutual exclusivity plots between genes with CNAs. Fisher’s exact test was used to calculate the statistical significance between each pair of genes.
Figure 5

(A) Waterfall plot of 12 genes with CNAs in the subpathway path:04140_8 in the autophagy pathway. (B) Co-occurrence and mutual exclusivity plots between genes with CNAs. Fisher’s exact test was used to calculate the statistical significance between each pair of genes.

We further tested if the significant SPs were associated with the tumor immune microenvironment in HNSC. For the significant SP path:04140_8 in the autophagy pathway and SP path:05418_16 in the fluid shear stress and atherosclerosis pathway, the patients were classified into high-risk and low-risk groups according to median SP risk scores (Supplementary Figure S2A and C). For the patients in high-risk and low-risk groups classified by the SP path:04140_8, the stromal and immune scores were, respectively, calculated with ESTIMATE [37]. These two scores were shown to be a significant difference between the two groups (ANOVA analysis, P < 0.012 and P < 1.3e-7, Supplementary Figure S2B). For the patients classified by the SP path:05418_16, the immune scores also presented a significant difference (ANOVA analysis, P < 1.6e-4, Supplementary Figure S2D). This indicates that the dysregulated SPs may influence the tumor immune microenvironment in HNSC.

To detect the biological function of the SP, we mapped the genes in the SP path:04140_8 to its original pathway (Supplementary Figure S1A), and a local region of the autophagy pathway was identified (Supplementary Figure S1B). It has been proposed that the autophagy pathway is implicated in the pathogenesis of various diseases, like cancer, infectious diseases and neurodegenerative disorders [45]. According to this pathway map, we found that the SP region has crosstalk with the mTOR signaling pathway and the insulin signaling pathway (Supplementary Figure S1B).

SP analysis of multi-cancers

We selected eight cancers in TCGA, including BRCA, cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), colon adenocarcinoma (COAD), HNSC, lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), pancreatic adenocarcinomas (PAAD) and stomach adenocarcinoma (STAD), to quantitatively determine the similarity between different tumor types at the SP level. To provide a general comparison, the top 10 most significant SPs for each cancer identified by CNA2Subpathway were enrolled. Through comparison, 38 SPs only appear in a single cancer type and three SPs are overlapped in at least four (1/2) cancer types (Figure 6A). The three highly overlapped SPs are path:05418_16 in fluid shear stress and atherosclerosis, path:04066_3 in the HIF-1 signaling pathway, and path:04915_13 in the estrogen signaling pathway.

(A) Comparison of the top 10 most significant subpathways for each of eight cancers cancer identified by CNA2Subpathway. (B) Dot plot of univariate HRs and P-values for the three highly overlapped subpathways associated with patient outcome (HR > 1.3, P < 0.01) in their respective cancers but STAD. (C) Forest plot showing HR and 95% CI for the three overlapped subpathways across seven cancers except for STAD. HR, 95% CI, and P-values are calculated by using the univariate Cox proportional hazards regression model.
Figure 6

(A) Comparison of the top 10 most significant subpathways for each of eight cancers cancer identified by CNA2Subpathway. (B) Dot plot of univariate HRs and P-values for the three highly overlapped subpathways associated with patient outcome (HR > 1.3, P < 0.01) in their respective cancers but STAD. (C) Forest plot showing HR and 95% CI for the three overlapped subpathways across seven cancers except for STAD. HR, 95% CI, and P-values are calculated by using the univariate Cox proportional hazards regression model.

Through univariate Cox proportional hazards regression analysis, the three highly overlapped SPs were all found to be associated with patient prognosis in their respective cancers but STAD (HR > 1.35, P < 0.01) (Figure 6B). The SP path:05418_16 are implicated in seven cancer types, and SP path:04066_3 and path:04915_13 are implicated in four cancer types. The detailed HRs information for these SPs across seven cancers except for STAD was shown with a forest plot (Figure 6C). The highly overlapped SPs driven by CNAs across multiple cancers may play some consistent biological function. These overlapped SPs thereby highlight the repurposing potential of anti-cancer drugs targeting the SPs.

Comparison of CNA2Subpathway with other pathway analysis methods

To determine whether CNA2Subpathway could provide new biological insights in identifying important SPs driven by somatic CNAs, we compared our results with the widely used pathway or SP analysis methods, which are the hypergeometric test, GSEA [2], SPIA [4], SP-GM [11] and SubpathwayMiner [9]. The summary of these methods was listed in Supplementary Table S2. These methods are mainly designed to identify dysregulated pathways based on GE data; however, CNA2Subpathway identifies dysregulated pathways driven by CNAs by using SP crosstalk. To provide a fair comparison, we improved these methods to make them apply CNAs and GE data. Specifically, for GSEA, the CNA-DEscores of genes are calculated by integrating CNAs and GE data (see Method section), and a ranked list of genes is constructed according to decreasing CNA-DEscores. The genes in each pathway are mapped to the ranked gene list, and pathway enrichment scores were calculated to identify the statistically significant pathways. For the hypergeometric test, SPIA, SP-GM and subpathwayMiner methods, the Student’s t-test was used to compute the gene differentially expressed level between CNA labels (CNAs versus non-CNAs). The genes with FDR < 0.05 in the t-test were inputted into these methods to identify statistically significant pathways or SPs.

We applied the above six methods to the HNSC datasets. As these methods use different strategies to calculate the statistically significant level of pathways, we took the top 10 most significant pathways to provide a general comparison and used the pathway level for analysis. Through comparison, CNA2Subpathway uniquely identified four pathways, fluid shear stress and atherosclerosis, HIF-1 signaling pathway, cGMP-PKG signaling pathway and pathways in cancers, which were simultaneously missed by the other five methods (Table 2). These pathways have been reported by the literature to be associated with HNSC. For example, Traci et al. have shown the effects of cGMP pathway activators on the growth and survival of HNSC cells [46]. Additionally, Hanneke et al. and Makoto et al. have shown that the HIF-1 signaling pathway was used as a novel molecular therapeutic target in HNSC [47, 48]. We further compared the results of CNA2Subpathway with the other five methods in BRCA, CESC, COAD, LUAD, LUSC, PAAD and STAD. We also found that several cancer-related pathways were uniquely identified by CNA2Subpathway in these cancers. The detailed results of the comparison for the above cancers are listed in Supplementary Table S3–S10. These results indicate that CNA2Subpathway which applies the SP crosstalk may reveal some new biological dysregulated SPs driven by CNAs.

Table 2

Comparison of top 10 pathways identified by CNA2Subpathway, GSEA, hypergeometric test, SPIA, subpathway-GM, SubpathwayMiner in HNSC

PathwayCNA2SubpathwayHypergeometric testGSEASPIASubpathway-GMSubpathwayMiner
Autophagy animal
Fluid shear stress and atherosclerosis
Adipocytokine signaling pathway
mTOR signaling pathway
Longevity regulating pathway
Apelin signaling pathway
HIF-1 signaling pathway
Pancreatic cancer
cGMP-PKG signaling pathway
Pathways in cancer
PathwayCNA2SubpathwayHypergeometric testGSEASPIASubpathway-GMSubpathwayMiner
Autophagy animal
Fluid shear stress and atherosclerosis
Adipocytokine signaling pathway
mTOR signaling pathway
Longevity regulating pathway
Apelin signaling pathway
HIF-1 signaling pathway
Pancreatic cancer
cGMP-PKG signaling pathway
Pathways in cancer
Table 2

Comparison of top 10 pathways identified by CNA2Subpathway, GSEA, hypergeometric test, SPIA, subpathway-GM, SubpathwayMiner in HNSC

PathwayCNA2SubpathwayHypergeometric testGSEASPIASubpathway-GMSubpathwayMiner
Autophagy animal
Fluid shear stress and atherosclerosis
Adipocytokine signaling pathway
mTOR signaling pathway
Longevity regulating pathway
Apelin signaling pathway
HIF-1 signaling pathway
Pancreatic cancer
cGMP-PKG signaling pathway
Pathways in cancer
PathwayCNA2SubpathwayHypergeometric testGSEASPIASubpathway-GMSubpathwayMiner
Autophagy animal
Fluid shear stress and atherosclerosis
Adipocytokine signaling pathway
mTOR signaling pathway
Longevity regulating pathway
Apelin signaling pathway
HIF-1 signaling pathway
Pancreatic cancer
cGMP-PKG signaling pathway
Pathways in cancer

Discussion

CNAs drive tumor growth and evolution; however, the functional roles of CNAs across the genome are still poorly understood [49]. Many specific cancer-related genes with CNAs have been proposed to be associated with cancer outcomes [50]. Biological pathways reflect the key cellular mechanisms and biological functions, whose dysfunction plays some important roles in tumor occurrence, development and prognosis [2]. SPs are defined as local gene subregions within biological pathways. Compared with the original pathways, the SPs contain fewer components and reflect more specific biological functions. Therefore, determining the dysregulated SPs driven by CNAs in cancer is important to explore the pathogenesis of cancer and develop new drug targets. In the study, we propose a new network-based method, CNA2Subpathway, to identify dysregulated SPs by integrating CNAs, GE data and SP crosstalk network. CNA2Subpathway applied the network topology to calculate network centrality for each SP, which reflects the degree of SP abnormality. This method provides a novel way of SP analysis by using the SP interactions, which may uncover something new insight into the biological system.

We applied CNA2Subpathway to BRCA and HNSC data to evaluate the performance of the method. For the BRCA data, the SP path:00020_6 and path:05418_16 were the most significant dysregulated SPs, belong to the TCA cycle and fluid shear stress and atherosclerosis pathway, respectively. These two pathways are basic and important in the human body, and they have been regarded as breast cancer-related pathways recently [51, 52]. Moreover, the two SPs could be used as prognostic markers and are also found to be associated with cancer immune (Figure 3A–D). For the HNSC data, through comparison of survival analysis with all genes and genes with CNAs in the significant SP, an interesting discovery was that all genes in the SP are altered but not only the genes with CNAs. This indicates that the CNAs may not only influence their corresponding genes but also influence other genes in the SP through gene interaction and SP crosstalk. We also tested the proportional hazards assumption using the scaled Schoenfeld residuals when performing the Cox proportional hazards regression model. For the significant SPs in BRCA and HNSC data (SP path:00020_6 and path:05418_16 in BRCA, and SP path:04140_8 and path:05418_16 in HNSC), the risk scores of these SPs are mostly not statistically significant (P > 0.05) in Schoenfeld residuals test (Supplementary Figure S3A–D). Therefore, the Cox proportional hazards regression model can be applied to the risk scores of these SPs.

To assess whether the survival prediction ability of the SP risk scores independent of the other clinical characteristics (e.g. sex, age and clinical stage), multivariate Cox proportional hazard regression analysis was performed. The genes in the SP and the clinical characteristics, such as sex, age and clinical stage, were used as covariates. Thus, the gene weight was adjusted with the multivariate Cox proportional hazards regression coefficient. In the BRCA dataset, we recalculated the risks score for the SP path:00020_6 in the TCA cycle pathway and path:05418_16 in the fluid shear stress and atherosclerosis pathway with the adjusted gene weights. The results showed that the patients were classified into high-risk and low-risk groups (Supplementary Figure S4A-B; log-rank P = 2e-11 and 9e-16, respectively). In the HNSC dataset, the risk scores of the SP path:04140_8 in the autophagy pathway and path:05418_16 in the fluid shear stress and atherosclerosis pathway were also recalculated. Similar results were obtained (Supplementary Figure S4C-D; log-rank P = 1e-07 and 6e-11, respectively). These results indicate that the SP risk score was independent of the clinical characteristics of patients (e.g. sex, age and clinical stage).

We then applied CNA2Subpathway to other 18 cancers in TCGA, which include BLCA, CESC, COAD, ESCA GBM, KIRC, KIRP, LIHC, LUAD, LUSC, OV, PAAD, READ, SARC, SKCM, STAD, THYM and UCEC. The detailed information about the top 10 most significant SPs for each of the above cancer was shown in Supplementary Table S11. For each cancer, some cancer-related pathways have been identified. Through comparing the top 10 most significant SPs across the above cancers, we found that five SPs were overlapped in at least five cancers, including SP path:04066_3 in the HIF-1 signaling pathway, path:04140_8 in autophagy, path:04380_15 in Osteoclast differentiation, path:04915_13 in the estrogen signaling pathway and path:05418_16 in fluid shear stress and atherosclerosis (Supplementary Table S12). These SPs have been reported to play important roles in the progress of cancer in the published literature. For example, the HIF-1 signaling pathway has been found to be associated with breast cancer [53], head and neck squamous cell carcinoma [47] and lung squamous carcinoma [54]; the autophagy pathway are associated with bladder urothelial carcinoma [55], ovarian cancer [56], stomach adenocarcinoma [57], etc. The recurrent SPs across multiple cancers could be used as potential anti-cancer drug targets for drug repurposing.

To further validate the CNA2Subpathway method, we compared our results with five other pathway or SP analysis methods based on CNA and GE data. CNA2Subpathway uniquely identified four SPs that were simultaneously missed by the other five methods, which have been reported to be associated with the cancer process [46–48]. This indicates that the CNA2Subpathway method which applies SP crosstalk may uncover something new biological insights into underlying cancer.

Although the CNA2Subpathway method has realized the identification of specific dysregulated SPs driven by copy number variation in cancer, our method can be improved from the following aspects. The study of the impact of CNAs on GE in cancer is novel but not comprehensive and should be combined with other omics data, such as epigenomics. In the follow-up, we will use network information and other omics data to further improve our method. To provide the users with convenient analytical tools, CNA2Subpathway was developed into an available R package on GitHub (https://github.com/hanjunwei-lab/CNA2Subpathway). In the future, the CNA2Subpathway method will be developed as an online tool. In summary, our CNA2Subpathway method used SP crosstalk provides new ideas for pathway research in cancer and will be a new tool for the development of potential drug targets for cancer therapy.

Key Points
  • Biological pathways are models containing the gene interaction networks that dictate disease states, drug response and altered cellular function. Subpathways defined as local gene subregions within a biological pathway reflect more specific and precision biological functions.

  • In the system biology level, the pathways are not isolated but interact with each other. In the study, CNA2Subpathway provides a novel way of subpathway analysis by applying the subpathway crosstalk.

  • CNA2Subpathway is developed to identify dysfunctional subpathways driven by somatic copy number alterations (CNAs) in cancer through integrating pathway topology information, multi-omics data and subpathway crosstalk network.

Funding

This work was supported in part by the National Natural Science Foundation of China (grant nos. 62072145, 81804158), the Natural Science Foundation of Heilongjiang Province (grant no. LH2019C042), the China Postdoctoral Science Foundation (grant nos. 2016M591566).

Conflict of Interest

None declared.

Yuqi Sheng is a master at the College of Bioinformatics Science and Technology, Harbin Medical University. Her research interest focuses on bioinformatics.

Ying Jiang is an associate professor at the College of Basic Medical Science, Heilongjiang University of Chinese Medicine. She is also a postdoctoral fellow at the traditional Chinese medicine research station, Heilongjiang University of Chinese Medicine. Her research interests focus on the molecular biology and cell biology of cancer.

Yang Yang is a master at the College of Bioinformatics Science and Technology, Harbin Medical University. Her research interest focuses on bioinformatics.

Xiangmei Li is a master at the College of Bioinformatics Science and Technology, Harbin Medical University. Her research interest focuses on bioinformatics.

Jiayue Qiu is a master at the College of Bioinformatics Science and Technology, Harbin Medical University. Her research interest focuses on computational system biology.

Jiashuo Wu is a master at the College of Bioinformatics Science and Technology, Harbin Medical University. His research interest focuses on computational system biology.

Liang Cheng is a professor and principal investigator at the College of Bioinformatics Science and Technology, Harbin Medical University. His research interest focuses on bioinformatics.

Junwei Han is a professor and principal investigator at the College of Bioinformatics Science and Technology, Harbin Medical University. His research interests focus on bioinformatics and computational system biology.

References

1.

Cancer Genome Atlas Research Network
.
Comprehensive molecular characterization of clear cell renal cell carcinoma
.
Nature
2013
;
499
:
43
9
.

2.

Kanehisa
M
,
Sato
Y
,
Kawashima
M
, et al.
KEGG as a reference resource for gene and protein annotation
.
Nucleic Acids Res
2016
;
44
:
D457
62
.

3.

Subramanian
A
,
Tamayo
P
,
Mootha
VK
, et al.
Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles
.
Proc Natl Acad Sci USA
2005
;
102
:
15545
50
.

4.

Tarca
AL
,
Draghici
S
,
Khatri
P
, et al.
A novel signaling pathway impact analysis
.
Bioinformatics
2009
;
25
:
75
82
.

5.

Han
J
,
Shi
X
,
Zhang
Y
, et al.
ESEA: discovering the dysregulated pathways based on edge set enrichment analysis
.
Sci Rep
2015
;
5
:13044.

6.

Han
J
,
Liu
S
,
Sun
Z
, et al.
LncRNAs2Pathways: identifying the pathways influenced by a set of lncRNAs of interest based on a global network propagation method
.
Sci Rep
2017
;
7
:46566.

7.

Han
J
,
Li
C
,
Yang
H
, et al.
A novel dysregulated pathway-identification analysis based on global influence of within-pathway effects and crosstalk between pathways
.
J R Soc Interface
2015
;
12
:20140937.

8.

Pham
L
,
Christadore
L
,
Schaus
S
, et al.
Network-based prediction for sources of transcriptional dysregulation using latent pathway identification analysis
.
Proc Natl Acad Sci USA
2011
;
108
:
13347
52
.

9.

Li
C
,
Li
X
,
Miao
Y
, et al.
SubpathwayMiner: a software package for flexible identification of pathways
.
Nucleic Acids Res
2009
;
37
:
e131
.

10.

Yang
EV
,
Sood
AK
,
Chen
M
, et al.
Norepinephrine up-regulates the expression of vascular endothelial growth factor, matrix metalloproteinase (MMP)-2, and MMP-9 in nasopharyngeal carcinoma tumor cells
.
Cancer Res
2006
;
66
:
10357
64
.

11.

Li
C
,
Han
J
,
Yao
Q
, et al.
Subpathway-GM: identification of metabolic subpathways via joint power of interesting genes and metabolites and their topologies within pathways
.
Nucleic Acids Res
2013
;
41
:
e101
.

12.

Lee
S
,
Park
Y
,
Kim
S
.
MIDAS: mining differentially activated subpaths of KEGG pathways from multi-class RNA-seq data
.
Methods
2017
;
124
:
13
24
.

13.

Han
J
,
Han
X
,
Kong
Q
, et al.
psSubpathway: a software package for flexible identification of phenotype-specific subpathways in cancer progression
.
Bioinformatics
2020
;
36
:
2303
5
.

14.

Han
J
,
Liu
S
,
Jiang
Y
, et al.
Inference of patient-specific subpathway activities reveals a functional signature associated with the prognosis of patients with breast cancer
.
J Cell Mol Med
2018
;
22
:
4304
16
.

15.

Zhang
T
,
Tan
P
,
Wang
L
, et al.
RNALocate: a resource for RNA subcellular localizations
.
Nucleic Acids Res
2017
;
45
:
D135
8
.

16.

Sathyanarayanan
A
,
Gupta
R
,
Thompson
EW
, et al.
A comparative study of multi-omics integration tools for cancer driver gene identification and tumour subtyping
.
Brief Bioinform
2019
;
21
:
1920
36
.

17.

Zhang
L
,
Feizi
N
,
Chi
C
, et al.
Association analysis of somatic copy number alteration burden with breast cancer survival
.
Front Genet
2018
;
9
:
421
.

18.

Heitzer
E
,
Ulz
P
,
Geigl
JB
, et al.
Non-invasive detection of genome-wide somatic copy number alterations by liquid biopsies
.
Mol Oncol
2016
;
10
:
494
502
.

19.

Lai
YP
,
Wang
LB
,
Wang
WA
, et al.
iGC-an integrated analysis package of gene expression and copy number alteration
.
BMC Bioinformatics
2017
;
18
:35.

20.

Lee
J-S
,
Tamborero
D
,
Lopez-Bigas
N
, et al.
Oncodrive-CIS: a method to reveal likely driver genes based on the impact of their copy number changes on expression
.
PLoS One
2013
;
8
:e55489.

21.

Lee
H
,
Kong
SW
,
Park
PJ
.
Integrative analysis reveals the direct and indirect interactions between DNA copy number aberrations and gene expression changes
.
Bioinformatics
2008
;
24
:
889
96
.

22.

Liu
S
,
Zheng
B
,
Sheng
Y
, et al.
Identification of cancer dysfunctional subpathways by integrating DNA methylation, copy number variation, and gene-expression data
.
Front Genet
2019
;
10
:
441
.

23.

Ashburner
M
,
Ball
CA
,
Blake
JA
, et al.
Gene ontology: tool for the unification of biology. The Gene Ontology Consortium
.
Nat Genet
2000
;
25
:
25
9
.

24.

Mermel
CH
,
Schumacher
SE
,
Hill
B
, et al.
GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers
.
Genome Biol
2011
;
12
:R41.

25.

Kohler
S
,
Bauer
S
,
Horn
D
, et al.
Walking the interactome for prioritization of candidate disease genes
.
Am J Hum Genet
2008
;
82
:
949
58
.

26.

Benjamini
Y
,
Hochberg
Y
.
Controlling the false discovery rate: a practical and powerful approach to multiple testing
.
J R Stat Soc B Methodol
1995
;
57
(
1
):
289
300
.

27.

Chi
C
,
Murphy
LC
,
Hu
P
.
Recurrent copy number alterations in young women with breast cancer
.
Oncotarget
2018
;
9
:
11541
58
.

28.

Jack
N
,
Edwards
J
,
Dhanessar
W
, et al.
A study of HTLV-I infection and breast cancers in Trinidad and Tobago
.
Int J Cancer
2000
;
85
:
298
9
.

29.

Manavathi
B
,
Dey
O
,
Gajulapalli
VN
, et al.
Derailed estrogen signaling and breast cancer: an authentic couple
.
Endocr Rev
2013
;
34
:
1
32
.

30.

Choi
HY
,
Yang
GM
,
Dayem
AA
, et al.
Hydrodynamic shear stress promotes epithelial-mesenchymal transition by downregulating ERK and GSK3beta activities
.
Breast Cancer Res
2019
;
21
:6.

31.

Ho
JY
,
Chang
FW
,
Huang
FS
, et al.
Estrogen enhances the cell viability and motility of breast cancer cells through the ERalpha-DeltaNp63-integrin beta4 signaling pathway
.
PLoS One
2016
;
11
:e0148301.

32.

Schmidt
C
,
Sciacovelli
M
,
Frezza
C
.
Fumarate hydratase in cancer: a multifaceted tumour suppressor
.
Semin Cell Dev Biol
2019
;
98
:
15
25
.

33.

Cardaci
S
,
Ciriolo
MR
.
TCA cycle defects and cancer: when metabolism tunes redox state
.
Int J Cell Biol
2012
;
2012
:
1
9
.

34.

Ryan
DG
,
Murphy
MP
,
Frezza
C
, et al.
Coupling Krebs cycle metabolites to signalling in immunity and cancer
.
Nat Metab
2019
;
1
:
16
33
.

35.

Lehtonen
HJ
,
Kiuru
M
,
Ylisaukko-Oja
SK
, et al.
Increased risk of cancer in patients with fumarate hydratase germline mutation
.
J Med Genet
2006
;
43
:
523
6
.

36.

Davoli
T
,
Uno
H
,
Wooten
EC
, et al.
Tumor aneuploidy correlates with markers of immune evasion and with reduced response to immunotherapy
.
Science
2017
;
355
:eaaf8399.

37.

Yoshihara
K
,
Shahmoradgoli
M
,
Martinez
E
, et al.
Inferring tumour purity and stromal and immune cell admixture from expression data
.
Nat Commun
2013
;
4
:2612.

38.

Scagliola
A
,
Mainini
F
,
Cardaci
S
.
The tricarboxylic acid cycle at the crossroad between cancer and immunity
.
Antioxid Redox Signal
2020
;
32
:
834
52
.

39.

Jimenez-Sanchez
A
,
Cast
O
,
Miller
ML
.
Comprehensive benchmarking and integration of tumor microenvironment cell estimation methods
.
Cancer Res
2019
;
79
:
6238
46
.

40.

Gerada
C
,
Ryan
KM
.
Autophagy, the innate immune response and cancer
.
Mol Oncol
2020
;
14
:
1913
29
.

41.

Wang
X
,
Zhang
Y
,
Feng
T
, et al.
Fluid shear stress promotes autophagy in hepatocellular carcinoma cells
.
Int J Biol Sci
2018
;
14
:
1277
90
.

42.

Rodrigues
JA
.
Isogeometric analysis for fluid shear stress in cancer cells
.
Math Comput Appl
2020
;
25
:
19
.

43.

Mitchell
MJ
,
King
MR
.
Fluid shear stress sensitizes cancer cells to receptor-mediated apoptosis via trimeric death receptors
.
New J Phys
2013
;
15
:015008.

44.

Mayakonda
A
,
Lin
DC
,
Assenov
Y
, et al.
Maftools: efficient and comprehensive analysis of somatic variants in cancer
.
Genome Res
2018
;
28
:
1747
56
.

45.

Moreau
K
,
Luo
S
,
Rubinsztein
DC
.
Cytoprotective roles for autophagy
.
Curr Opin Cell Biol
2010
;
22
:
206
11
.

46.

Tuttle
TR
,
Mierzwa
ML
,
Wells
SI
, et al.
The cyclic GMP/protein kinase G pathway as a therapeutic target in head and neck squamous cell carcinoma
.
Cancer Lett
2016
;
370
:
279
85
.

47.

Stegeman
H
,
Span
PN
,
Peeters
WJ
, et al.
Interaction between hypoxia, AKT and HIF-1 signaling in HNSCC and NSCLC: implications for future treatment strategies
.
Future Sci OA
2016
;
2
:FSO84.

48.

Adachi
M
,
Cui
C
,
Dodge
CT
, et al.
Targeting STAT3 inhibits growth and enhances radiosensitivity in head and neck squamous cell carcinoma
.
Oral Oncol
2012
;
48
:
1220
6
.

49.

Bhattacharya
A
,
Bense
RD
,
Urzua-Traslavina
CG
, et al.
Transcriptional effects of copy number alterations in a large set of human cancers
.
Nat Commun
2020
;
11
:715.

50.

Smith
JC
,
Sheltzer
JM
.
Systematic identification of mutations and copy number alterations associated with cancer patient prognosis
.
Elife
2018
;
7
:e39217.

51.

Tang
K
,
Yu
Y
,
Zhu
L
, et al.
Hypoxia-reprogrammed tricarboxylic acid cycle promotes the growth of human breast tumorigenic cells
.
Oncogene
2019
;
38
:
6970
84
.

52.

Novak
CM
,
Horst
EN
,
Taylor
CC
, et al.
Fluid shear stress stimulates breast cancer cells to display invasive and chemoresistant phenotypes while upregulating PLAU in a 3D bioreactor
.
Biotechnol Bioeng
2019
;
116
:
3084
97
.

53.

Wong
CC
,
Gilkes
DM
,
Zhang
H
, et al.
Hypoxia-inducible factor 1 is a master regulator of breast cancer metastatic niche formation
.
Proc Natl Acad Sci USA
2011
;
108
:
16369
74
.

54.

Liu
X
,
Chen
H
,
Xu
X
, et al.
Insulin-like growth factor-1 receptor knockdown enhances radiosensitivity via the HIF-1alpha pathway and attenuates ATM/H2AX/53BP1 DNA repair activation in human lung squamous carcinoma cells
.
Oncol Lett
2018
;
16
:
1332
40
.

55.

Chandrasekar
T
,
Evans
CP
.
Autophagy and urothelial carcinoma of the bladder: a review
.
Investig Clin Urol
2016
;
57
(
Suppl 1
):
S89
97
.

56.

Xia
H
,
Li
S
,
Li
X
, et al.
Autophagic adaptation to oxidative stress alters peritoneal residential macrophage survival and ovarian cancer metastasis
.
JCI Insight
2020
;
5
:e141115.

57.

Rao
SV
,
Solum
G
,
Niederdorfer
B
, et al.
Gastrin activates autophagy and increases migration and survival of gastric adenocarcinoma cells
.
BMC Cancer
2017
;
17
:68.

Author notes

Yuqi Sheng, Ying Jiang and Yang Yang authors contributed equally to this work.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)