-
PDF
- Split View
-
Views
-
Cite
Cite
Yu-e Huang, Shunheng Zhou, Haizhou Liu, Xu Zhou, Mengqin Yuan, Fei Hou, Sina Chen, Jiahao Chen, Lihong Wang, Wei Jiang, DRdriver: identifying drug resistance driver genes using individual-specific gene regulatory network, Briefings in Bioinformatics, Volume 24, Issue 2, March 2023, bbad066, https://doi.org/10.1093/bib/bbad066
- Share Icon Share
Abstract
Drug resistance is one of principal limiting factors for cancer treatment. Several mechanisms, especially mutation, have been validated to implicate in drug resistance. In addition, drug resistance is heterogeneous, which makes an urgent need to explore the personalized driver genes of drug resistance. Here, we proposed an approach DRdriver to identify drug resistance driver genes in individual-specific network of resistant patients. First, we identified the differential mutations for each resistant patient. Next, the individual-specific network, which included the genes with differential mutations and their targets, was constructed. Then, the genetic algorithm was utilized to identify the drug resistance driver genes, which regulated the most differentially expressed genes and the least non-differentially expressed genes. In total, we identified 1202 drug resistance driver genes for 8 cancer types and 10 drugs. We also demonstrated that the identified driver genes were mutated more frequently than other genes and tended to be associated with the development of cancer and drug resistance. Based on the mutational signatures of all driver genes and enriched pathways of driver genes in brain lower grade glioma treated by temozolomide, the drug resistance subtypes were identified. Additionally, the subtypes showed great diversity in epithelial–mesenchyme transition, DNA damage repair and tumor mutation burden. In summary, this study developed a method DRdriver for identifying personalized drug resistance driver genes, which provides a framework for unlocking the molecular mechanism and heterogeneity of drug resistance.
Introduction
Chemotherapy and molecular targeted therapy are principal ways for cancer treatment. However, there are various mechanisms that enable the survival of cancer cells during the treatment, such as drug target alteration, epithelial–mesenchyme transition (EMT) and DNA damage repair (DDR) [1]. Therefore, the therapeutic effects of drugs are evaded, leading to the emergence of drug resistance.
Drug resistance is one of the main reasons for cancer treatment failure. Current studies have shown that mutation is crucial in the development of drug resistance, especially the mutations of drug target [1, 2]. For example, imatinib is a molecular targeted drug that inhibits the fusion protein BCR-ABL1 in chronic myelogenous leukemia. The T315I mutation in BCR-ABL1 can change the structure of the protein and prevent the binding of imatinib to the target, leading to the development of drug resistance [3]. Therefore, distinguishing the mutations driving the development of drug resistance from the massive data can facilitate the understanding of the process of drug resistance, which provides new insights for the treatment of drug-resistant patients.
Previous researches have shown that drug resistance is heterogeneous [4]. For example, Lee et al. analyzed the sequencing data of 19 non-small cell lung cancer patients who were resistant to EGFR tyrosine kinase inhibitors. They identified EGFR T790M as the most common mechanism of resistance (63.2%). However, TP53 mutation predominated in the T790M-negative tumors [5]. In addition, among 45 BRAF V600-mutant metastatic melanoma patients with vemurafenib or dabrafenib resistance, 44.4% resistance were caused by alterations in MAPK pathway or downstream effectors. The changes of PI3K pathway and HOXD8 or RAC1 lead to the resistance of the remaining patients [6]. The emergence of resistance heterogeneity makes the treatment of cancer more difficult. Therefore, identifying the mutations, which drive drug resistance at the individual level, is an important way to deeply understand the molecular mechanism of personalized drug resistance and accurately predict the resistance subtypes of patients.
Extensive studies have been devoted to identifying personalized cancer driver genes. Different ways were utilized to measure the influence of mutated genes on differentially expressed genes (DEGs). A subset of mutated genes with best influence were screened as the personalized driver genes [7–11]. Currently, there were several researches adding the effect on non-differentially expressed genes (nDEGs), which could minimize the irrelevant influence [12]. In addition, most previous studies for identifying drug resistance gene were still limited to the DEGs between sensitive and resistant groups [13, 14], instead of personalized level.
To address the existing limitations, we developed an approach called DRdriver to identify drug resistance driver genes based on individual-specific network. We defined a cancer type treated by a drug as a condition. First, by comparing the mutation of the resistant patient and the corresponding sensitive patients in each condition, we identified the genes with differential mutations as candidates. Mapping the candidates into the known gene regulatory network (GRN), we extracted an individual-specific network for each resistant patient. Then, we utilized genetic algorithm (GA) to identify the genes from the candidates that regulated the most DEGs and the least nDEGs as personalized drug resistance driver genes. Next, the driver genes of each condition were verified by the mutation frequency, functional enrichment and mutational signature. To further analyze the resistance heterogeneity, all mutational signatures, and the enriched pathways in condition brain lower grade glioma (LGG) treated by temozolomide (LGG_Temozolomide) were used to cluster the patients into different subtypes (Figure 1). Taken together, DRdriver highlights the potential of drug resistance driver genes for uncovering the molecular mechanism and heterogeneity of drug resistance, which were benefit to the stratification and personalized therapy of patients with drug resistance.

Methods
Data collection and processing
In this study, we downloaded the gene expression profiles, mutation data and clinical information from Genomic Data Commons (GDC) of The Cancer Genome Atlas (TCGA) (Version March 2021). The clinical information included treating drugs, drug responses and survival data. According to the drug responses, we considered the patients who were complete response (CR) and partial response (PR) after treatment as sensitive patients. The patients who were stable disease (SD) and clinical progressive disease (PD) after treatment were considered as resistant patients [15, 16]. We retained the condition which the number of patients in sensitive and resistant groups was greater than 10. In total, 13 conditions were remained, including 8 cancer types and 10 drugs. The number of sensitive and resistant patients for each treatment condition was shown in Supplementary Table S1.
Focusing on the non-silent mutations in sensitive and resistant groups of each condition, we compared mutations between each resistant patient and all patients in the corresponding sensitive group to identify two types of mutations: (i) the mutations that only existed in the resistant patient; and (ii) the mutations which had higher variant allele frequency (fold change (FC) > 2) in the resistant patient than the patients in sensitive group. The union of the above two types of mutations were considered as the differential mutations. The genes with differential mutations were defined as candidate genes.
Based on expression profiles (read counts) of sensitive and resistant patients in each condition, we utilized R package edgeR (V3.32.1) [17] to calculate the statistical significance of differential expression between each resistant patient and all patients in sensitive group. edgeR uses the negative binomial (NB) distribution to model the expression of each gene. The Bayesian method and the quasi-likelihood (QL) method are used to estimate the variability of gene expression, even when there are small replicates. Next, the NB generalized linear model is constructed, and the QL F-test is used to compute the statistical significance. Here, we defined the DEGs and nDEGs using the threshold of P < 0.05.
Construction of individual-specific network
We collected the known GRN from Hu et al. [12], which integrated annotations from three high-quality pathway databases: Reactome, Kyoto Encyclopedia of Genes and Genomes (KEGG) and NCI-Nature Pathway Interaction Database. The undirected, redundant and small molecule-associated interactions were removed. Next, we unified the nodes into Ensembl ID, and obtained 5956 nodes and 108 193 edges (Supplementary Table S2). For each condition, we deleted the regulations whose targets had no expression. Next, we extracted the subnetwork which only included the candidate genes and their regulatory targets for each resistant patient. Based on the results of differential expression analysis, we labeled the targets as DEGs and nDEGs. Thus, the individual-specific network was constructed.
Identification of drug resistance driver genes
Based on the individual-specific network, GA was utilized to identify an optimal subset of genes which regulated the most DEGs and the least nDEGs as drug resistance driver genes for each resistant patient [12]. First, we encoded initial population as 1 or 0 (1 means that the gene was selected, and 0 indicates that the gene was not selected). The length of 1–0 string is the number of candidate genes. We set parameter k to control the proportion of genes retained from the old population. In the initialization, k was set as 0.5. It means half of genes in each old individual were retained. m indicated the number of individuals in the population, which means m candidate solutions were produced in each iteration. Next, the fitness was calculated as follows:
IRT means the targets regulated by the genes in the individual. ART indicates all the regulatory targets. We then sorted the individuals in descending order according to the fitness, and selected the top m/2 individuals as elite population. In addition, we used the roulette rule to screen out two individuals as parents. Furthermore, crossover and mutation (variation) were repeated 10 times to produce 20 offspring. Then the child with the largest fitness was selected. Repeating the process of parents’ selection and variation m/2 times, m/2 individuals in offspring were produced. Next, we combined the elite population and the offspring into new population. We then computed the difference (|$\delta$|) of the best fitness between new population and old population
|$\delta \le 0$| means that new population could not produce better solution. When |$\delta \le 0$| appeared in five consecutive iterations, it meant halving the number of genes could not improve the performance of the model. Therefore, we set the parameter k as 1 (it means that the number of gene was no longer halved from the old population). The population with larger best fitness was retained in the next iteration to ensure the better solution. We defined the |$\delta \le 0$| appeared in 10 consecutive iterations, which meant that the optimal solution might be included in current population, as the terminated threshold (Figure 2). In this study, m was set to 1000. Crossover rate was 0.9. Mutation rate was 0.3. Finally, the genes of the best individual were predicted as drug resistance driver genes.

Workflow of GA for identification of drug resistance driver genes.
Identification and characterization of resistance subtypes
The ability for resistance subtyping of drug resistance driver genes was explored. For all resistant patients, we utilized hierarchical clustering to cluster the patients using the mutational signatures from all driver genes. Next, taking the condition of LGG_Temozolomide as an example, we also clustered the patients based on driver pathway profiles, which were constructed by pathway enrichment of driver genes in each resistant patient. The clustering of the driver pathway profiles was computed by non-negative matrix factorization (NMF) based on consensus clustering using the R package NMF (V0.23.0) [18]. In order to produce robust clustering, the consensus clustering was obtained using 100 random runs of NMF optimization algorithm. In addition, in order to validate the accuracy of subtypes, we employed the R package IOBR (V0.99.9) [19] to calculate the EMT and DDR scores of different subtypes. IOBR collects several published signature gene sets, including signatures of EMT and DDR, and adopts three computational methods to calculate the signature score. Here, we utilized calculate_sig_score function to calculate the EMT and DDR scores using single-sample Gene Set Enrichment Analysis (ssGSEA) method.
Statistical analysis
Hypergeometric test was utilized for functional gene set enrichment analysis. In survival analysis, we compared the overall survival (OS) of patients separated by differential mutations in drug resistance driver genes. Kaplan–Meier curves were drawn and log rank P-values were computed using R package survival [20]. Kolmogorov–Smirnov test was used for comparing the density of differential mutations between driver and non-driver genes. Other statistical significances were calculated by Mann–Whitney U test and Kruskal–Wallis test. P < 0.05 was considered statistically significant.
Results
Global view of non-silent mutations
According to the clinical information, we classified patients into sensitive and resistant groups. Ensuring the number of patients in both groups was more than 10, we retained 13 conditions, involving 8 cancer types and 10 drugs. Next, all non-silent mutations of the 13 conditions were extracted. The top 20 frequently mutated genes of all conditions are shown in Figure 3A. Some known cancer driver genes, such as TP53, KRAS, MUC16 and IDH1 [21], were top mutated in the selected conditions. In addition, we compared the fractions of variant class between sensitive and resistant groups. The results showed that there was no obvious difference. The proportion of missense mutation in both groups was the largest, which were more than 80% (Figure 3B). For each condition, the missense mutations were also the most non-silent mutations in resistant group (Supplementary Table S3). For single nucleotide variants, we counted the fraction of six possible base-pair substitutions, and found that resistant patients had more C > T transition than sensitive patients (P < 0.01) (Figure 3C). Previous studies reported that some known resistant mutations, such as EGFR T790M and KRAS G12C, were C > T transitions [22, 23]. In addition, tumor mutation burden (TMB) reflects tumor mutation quantity. Thereby, we also calculated the TMB of each patient. The result showed that the prevalence of non-silent mutation was highly variable between and within the conditions, ranging from about 0 per megabase (Mb) to more than 100 per Mb (Figure 3D), which indicated that there was mutation heterogeneity across different patients. Therefore, it is more necessary to analyze the molecular mechanism of drug resistance from a personalized perspective.

Data landscape of non-silent mutations. (A) Oncoplot of non-silent mutations in 13 conditions. (B) Statistic of the proportion of variant class in sensitive and resistant groups. (C) Fraction of six possible base-pair substitutions in both groups. (D) TMB of each patient in different conditions. Every dot represents a patient, whereas the black horizontal lines are the median log10 TMB.
Identification and overview of drug resistance driver genes
DRdriver was designed for identification of drug resistance driver genes based on individual-specific network. The detailed framework is shown in Figure 1. For each condition, comparing each resistant patient with the corresponding sensitive patients, we identified the genes with differential mutations as candidates. Next, we mapped the candidates into the known GRN. Then a subnetwork which only included candidates and their regulatory targets was extracted as individual-specific network. Furthermore, we used GA to identify the genes which have the greatest influence as personalized drug resistance driver genes (details in Methods section). In total, we identified 1202 drug resistance driver genes in 13 conditions (Figure 4A and Supplementary Table S4). Through counting the number of conditions of each driver gene, we found that most driver genes were identified in only a few conditions (Figure 4A). In addition, the density of differential mutations in driver and non-driver genes showed that the driver genes had more differential mutations (Kolmogorov–Smirnov test, P < 2.2|$\times$|1016) (Figure 4B). The number of driver genes in each condition was also counted. There were different number of driver genes in different conditions, which were ranged from 47 to 281 (Figure 4C). We then only focused on missense mutations, and found that there were large overlaps (79–93%) in the identified driver genes based on missense mutations and all non-silent mutations for all conditions (Supplementary Table S5). Collectively, DRdriver could identify the drug resistance driver genes at the individual level, which was benefit to the exploration and treatment of drug resistance.

Overview of the drug resistance driver genes. (A) Circos of drug resistance driver genes in all conditions. Heatmap displays whether the gene was predicted as driver in each condition. Blue represents the gene was driver gene in the condition. Green indicates the gene was not driver. Different circular track of heatmap shows different condition. The conditions from inside to outside are BLCA_Cisplatin, BLCA_Gemcitabine, COAD_Fluorouracil, COAD_Leucovorin, LGG_Temozolomide, LUAD_Carboplatin, LUAD_Paclitaxel, LUAD_Pemetrexed, PAAD_Gemcitabine, SARC_Docetaxel, SKCM_Dacarbazine, STAD_Cisplatin and STAD_Fluorouracil. The orange line indicates the number of conditions, in which the gene was identified as driver gene. The genes marked in red are multidrug-resistant genes. (B) Density of differential mutations between driver and non-driver genes. (C) The number of drug resistance driver genes in different conditions.
Validation of the identified drug resistance driver genes
Many studies have been devoted to identifying cancer driver gene through mutation frequency, such as MutsigCV [24]. It was supposed that genes with higher mutation frequency were more likely to drive the development of cancer. Therefore, we compared the mutation frequency between the predicted driver genes and non-driver genes. We found that the driver genes were more frequently mutated (Figure 5A). Furthermore, we downloaded the Cancer Gene Census (CGC) genes, drug targets, known resistance genes and EMT genes, which had already been confirmed to be related to the development of cancer and drug resistance, from Catalogue of Somatic Mutations in Cancer (COSMIC) [21], DrugBank [25], Lau et al. [26] and dbEMT 2.0 [27], respectively. We found that most driver genes identified in all conditions or each condition were significantly enriched in these above functional genes (P < 0.05) (Figure 5B and C).

Validation of drug resistance driver genes. (A) Compare the mutation frequency between driver genes and non-driver genes in different conditions (*: P < 0.05, **: P < 0.01, ***: P < 0.001, ns: P > 0.05). (B) Enrichment of all driver genes in the four functional gene sets. CGC: CGC genes; DTG: drug targets; KRG: known resistance genes; EMT: EMT genes. (C) Enrichment of driver genes of each condition in the four functional gene sets. The size of point indicates the statistical significance. The blank represents the intersection of these two gene sets is not significant. (D) Functional enrichment of driver genes in all conditions. Here only displays the top 10. (E) Functional enrichment of driver genes in each condition. The color represents the significance of functional analysis. The functions listed on the left are shared functions. The LGG_Temozolomide specific functions are listed on the right.
Next, the functional enrichment of driver genes in all conditions and each condition were performed separately. For all conditions, some biological processes associated with drug resistance and cancer development were significantly enriched (Figure 5D). For example, purine nucleotides are fundamental and necessary for tumor cell proliferation. The abnormalities of purine nucleotides metabolic process might affect the growth, invasiveness and metastasis of cancer cells and induce the drug resistance [28, 29]. Based on the results of different conditions, we defined the functions which were identified in more than half of all conditions as shared functions, which might be involved in multidrug-resistance (MDR) (Figure 5E). For example, downregulating the permeable channels to change the calcium ion transmembrane transport promoted the MDR [30]. Furthermore, the condition specific functions were also noted. Taking LGG_Temozolomide as an example, we also identified some processes which had been reported to be related to temozolomide resistance, such as Notch and epidermal growth factor receptor signaling pathways [31, 32].
In addition, we further identified the mutational signatures of differential mutations in drug resistance driver genes by using the sig_auto_extract function in R package sigminer (V2.1.3), which utilized Bayesian NMF to obtain the mutational signatures and determine a proper signature number by maximizing the posterior probability [33, 34]. For driver genes of all conditions, we identified five mutational signatures (Figure 6A). COSMIC revealed the compendium of mutational signatures and their etiology by analyzing large-scale genome data. In order to link the etiology to the mutational signatures extracted by driver genes, we calculated the cosine similarity of our identified mutational signatures and the mutational signatures in COSMIC. Setting 0.65 as a threshold [35], the etiology of the similar mutational signatures in COSMIC was regarded as the etiology of ours. Among five mutational signatures, we found that four signatures (Sig1, Sig2, Sig3 and Sig4) had clear etiology, and were associated with drug resistance (Figure 6B). For example, SBS6 and SBS15 were reported to be associated with defective DNA mismatch repair [36], which was participated in drug resistance [1]. Sig5 was similar to SBS17b in COSMIC. Although it has not been clearly described, previous studies had associated SBS17b to fluorouracil chemotherapy [37]. For mutational signature in each condition (Supplementary Figure S1), we also calculated the cosine similarity with the signatures in COSMIC. According to their etiology, several of them were also implicated in drug resistance (Figure 6C). Moreover, we found some cancer-specific mutational signatures. For example, two tobacco-related signatures (SBS4: tobacco smoking, SBS29: tobacco chewing) were similar to the extracted signatures in lung adenocarcinoma (LUAD). Tobacco was the most established cause of lung cancer [38]. Ultraviolet was also a known etiology of skin cancer [39]. The signatures about ultraviolet (SBS7a: ultraviolet light exposure, SBS7d: ultraviolet light exposure) were similar to the extracted signature in skin cutaneous melanoma (SKCM). We also compared the similarity between mutational signatures in all conditions and single condition, and found that each mutational signature in all conditions tended to be similar to the signatures extracted in the specific cancer type (Figure 6D). For example, Sig3 was similar to the signatures extracted in LUAD.

Mutational signatures of drug resistance driver genes. (A) Five mutational signatures of driver genes in all conditions. (B) Four mutational signatures identified by driver genes in all conditions and their corresponding COSMIC mutational signatures with the etiology. The size of point indicates the degree. (C) Sankey plot of the mutational signatures identified in each condition and their corresponding COSMIC mutational signatures. The line thickness indicates the cosine similarity. (D) Heatmap of the cosine similarity between the mutational signatures identified by all and single condition’s driver genes. The shade of color represents the similarity.
To explore whether the driver genes were associated with prognosis, we compared the survival time between the patients with and without differential mutations in driver genes. For each condition, the patients with survival information of corresponding cancer type were divided into two groups by differential mutations in driver genes. Then, the OS between the two groups were compared, and the results suggested that patients with differential mutations in driver genes had poorer prognosis in most conditions (log rank P < 0.05) (Supplementary Figure S2).
In total, the driver genes were more frequently mutated and validated to be associated with the development of cancer and drug resistance from several perspectives, which indicated the accuracy of the predicted drug resistance driver genes.
Explanation of MDR genes and patients
We calculated the overlap of driver genes in different conditions, and found that they shared few driver genes (Figure 7A). Next, we counted the number of driver genes identified in different number of conditions, and found that most driver genes tended to be present in only a few conditions (Figure 7B). However, five driver genes (TP53, RYR2, CACNA1C, KRAS and KCNA5) (Figure 4A) were predicted in more than half conditions (Figure 7C), which demonstrated that they might be multidrug-resistant genes. TP53 and KRAS are known cancer driver genes. Many studies have shown their roles in the process of resistance to many drugs, such as cisplatin, temozolomide, doxorubicin, gemcitabine [40, 41]. RYR2 and CACNA1C encode the components of a calcium channel. Both of them play key roles in the transportation of calcium ions. Previous studies have shown that downregulating calcium ions channels to reduce calcium ions influx could promote the MDR [30, 42]. KCNA5 encodes a member of potassium channel, voltage-gated, shaker-related subfamily. The potassium channel activity had already confirmed to be related in MDR [30]. In addition, we also found that the shared driver genes in the same cancer type were more than that in different cancer types (Figure 7A). We hypothesized whether the result was caused by multidrug-resistant patients (MDRP) in the same cancer type. To validate the hypothesis, we computed the intersection of patients in different conditions, and found that there were several shared patients in the same cancer type indeed (Figure 7D). To explore the characteristic of MDRP, we compared the number of mutations, mutated genes, differential mutations, differentially mutated genes and driver genes between MDRP and single drug-resistant patients (SDRP) (Figure 7E and F, Supplementary Figure S3). The results demonstrated that MDRP carried more mutations and driver genes, which indicated the complex mechanisms of MDR.

Statistic of multidrug-resistant genes and patients. (A) Overlap of driver genes in different conditions. The size and color of point indicates the Jaccard coefficients. (B) Statistic of the number of driver genes identified in different number of conditions. (C) Multidrug-resistant gene distribution in different conditions. The above bar chart represents the number of conditions of each gene. The right bar chart indicates the number of genes of each condition. (D) Overlap of patients in different conditions. Comparison of the number of mutated genes (E) and driver genes (F) between SDRP and MDRP patients. The red point indicates the average.
Cluster and validation of drug resistance subtypes
Recently, some studies have shown that drug resistance is heterogeneous [4]. However, few studies explored the heterogeneity from drug resistance driver genes. Mutational signatures are the distinctive patterns of mutations from specific mutagenesis process [43]. Therefore, among five mutational signatures which were identified by all differential mutations in driver genes of all conditions, four with clear etiology were retained to cluster all resistant patients into four subtypes (C1, C2, C3 and C4) by hierarchical clustering (Figure 8A). Different subtypes might be driven by different mutational signatures. We found that no cancer type or drug bias within the same subtype. Many researches have shown that EMT and DDR participate in drug resistance [1]. Therefore, we calculated the EMT and DDR scores of different subtypes based on R package IOBR [19]. We found that C1 had the highest EMT and DDR scores (Figure 8B and C), which suggested that C1 might be more malignant than other subtypes. Moreover, previous studies suggested that high TMB could be used as a predictive marker for response to immune checkpoint blockade (ICB) therapy [44]. We thus computed the TMB of each patient in different subtypes, and found that C4 had the lowest TMB, which demonstrated that the patients in C4 might not be suitable for ICB therapy (Figure 8D).

Resistance subtypes and the characteristics. (A) Heatmap based on four mutational signatures identified by all differential mutations in driver genes of all conditions. Multi_condition means the patients belonged to multiple conditions. The difference of EMT score (B), DDR score (C) and TMB (D) among the resistance subtypes. (E) Consensus map based on NMF of pathways enriched by drug resistance driver genes in LGG_Temozolomide. The difference of log10 IC50 (F), age (G), EMT score (H), DDR score (I) and TMB (J) between LGG_Temozolomide resistance subtypes.
Furthermore, we also turned our attention to a single condition to further explore the resistance heterogeneity of single drug. Larger sample size may provide more opportunity to analyze the heterogeneity. We thus took LGG_Temozolomide as an example (Supplementary Table S1). Because only a mutational signature was identified, we performed the pathway enrichment based on the driver genes of each patient, and constructed the driver pathway profiles. We then used unsupervised consensus clustering using NMF to cluster the patients, and identified two robust subtypes (L1 and L2) (Figure 8E). Next, we compared the predicted half maximal inhibitory concentration (IC50) of temozolomide between the two subtypes, and found that the patients in L2 were more resistant (Figure 8F). Here, the predicted IC50 was calculated by the R package oncoPredict, which is a useful tool to estimate the drug response [45]. Using the gene expression profiles and drug response data from Genomics of Drug Sensitivity in Cancer (GDSC) V2 as training set, oncoPredict builds ridge regression models to predict the IC50 of 198 drugs for newly inputting expression profiles. Moreover, age has been reported as a malignant factor in LGG [46]. Here, the age of the patients in L2 was significantly higher than L1 (Figure 8G). We also calculated the EMT and DDR scores of the patients in the two subtypes, and found that L2 had higher scores, too (Figure 8H and I), which indicated that L2 was a more malignant subtype. Furthermore, TMB of L2 was significantly higher than L1 (Figure 8J).
In order to explore the precision treatment of resistant patients, we predicted the candidate drugs for each subtype. We compared the IC50 of drugs predicted by oncoPredict [45] between the two subtypes, and screened the more sensitive (P < 0.05) drugs as candidates. In total, nine drugs were screened, including seven drugs for L1 and two drugs for L2 (Supplementary Table S6). For example, the class I HDAC (histone deacetylases) inhibitor entinostat was a candidate drug in L1, which meant that this drug had lower predicted IC50 in L1 than in L2. Previous study reported that class I HDAC overexpression promotes temozolomide resistance in glioma [47]. Entinostat could inhibit the function of HDAC to sensitize the tumor cells to temozolomide treatment [47, 48]. AUY922 (HSP90 inhibitor) is the candidate drug in L2, which has been validated to reduce cell viability and colony formation of glioma [49]. In addition, E-cadherin is a key component of the adherens junctions that are integral in cell adhesion and maintaining epithelial phenotype of cells [50]. As previous study reported, AUY922 could increase the expression of E-cadherin to gain a more migratory mesenchymal phenotype, which inhibited EMT [51]. Additionally, Chk1 represents a core component central to the entire DDR [52]. However, increasing dose of AUY922 could reduce Chk1 expression, which indicated AUY922 could inhibit DDR [53]. Therefore, AUY922 may be more suitable to treat the patients in L2.
Collectively, the resistant patients can be accurately divided into different subtypes based on the identified drug resistance driver genes. The comparison of many perspectives, such as EMT, DDR and TMB, demonstrated that the subtypes showed great diversity. In addition, several candidate drugs were also predicted for the treatment of specific subtype, which provided a new insight for precision treatment of resistant patients.
Discussion
Drug resistance continues to be one of the principal limiting factors to cancer therapy. There are several mechanisms contributed to the resistance, including mutation of targets, EMT and DDR [1]. Among them, mutation plays an important role. Therefore, systematically identifying the mutations that drive the development of drug resistance may be an important way to understand the molecular mechanism, and provide new idea for tackling drug resistance.
In addition, previous studies have shown that drug resistance is heterogeneous [4]. It means that different patients are resistant to the same drug with different ways. Traditional methods to identify drug resistance genes were always at the population level, which relied on statistical power obtained by large cohorts [13, 14]. By doing so, they inevitably underestimate the importance of rare drivers that occurred only a handful of patients and ignored the differences between different individuals.
Numerous studies have focused on distinguishing cancer driver genes from a large amount of passenger genes. Except the methods at population [54, 55], there were also many methods for personalized driver gene detection [7–11]. Most of these methods were devoted to calculating the effect of mutated genes on DEGs. However, Hu et al. [12] also considered the effect on nDEGs, which could minimize the irrelevant influence.
In this study, we proposed a method called DRdriver to predict the drug resistance driver genes based on individual-specific network. First, based on the clinical information, the patients were divided into sensitive and resistant groups. Comparing the mutations of each resistant patient and corresponding all patients in sensitive group, the differential mutations were identified. The genes with differential mutations were considered as candidates. After mapping the candidates into the GRN, the individual-specific network with candidates and their targets were constructed. We then utilized GA to identify the genes, which had the greatest impact on DEGs and the least impact on nDEGs, as drug resistance driver genes in each patient. We found that the driver genes were more frequently mutated, and enriched in many drug resistance-related functional gene sets. In addition, the driver genes were associated with several drug resistance-related functions and mutational signatures. Survival analysis demonstrated that the driver genes were poor prognosis factors. Moreover, we screened five multidrug-resistance genes which had already confirmed in previous studies. The characteristics of MDRPs were also analyzed. The results suggested there were more mutations and drivers in MDRPs than SDRPs. Finally, we divided the patients into different subtypes based on mutational signatures and enriched driver pathways, and validated these subtypes by different resistant mechanisms.
However, the current study still had several limitations. First, with the abundance of the corresponding data resources, we will further validate DRdriver using independent dataset and known drug resistance driver genes. Further, multi-omics data, such as methylation, should be considered to facilitate the understanding of the development of drug resistance. In addition, functional experiments should be conducted to further elucidate the molecular mechanisms underlying the effects of driver genes. Finally, the broader ranges of parameters of GA should be tested to select better parameters, even if we have tested the parameters in several patients based on exhaustive algorithm.
In summary, we proposed an approach called DRdriver to predict drug resistance driver genes, which may be benefit to understand the molecule mechanisms of drug resistance and analyze resistance heterogeneity. Continued investigation of the drug resistance driver genes identified here will aid in the development of more precision treatment for drug-resistant patients.
We developed a network control approach DRdriver to identify drug resistance driver genes at individual level.
1202 drug resistance driver genes for 8 cancer types and 10 drugs were predicted by DRdriver.
The driver genes predicted by DRdriver were mutated frequently, and tended to be associated with development of cancer and drug resistance.
The drug resistance subtypes could be accurately clustered based on the mutational signatures and the enriched pathways of driver genes.
Data availability statement
All expression profiles, mutation data and clinical information in this study were obtained from GDC (https://gdc-portal.nci.nih.gov/). The scripts and the corresponding input data for identifying drug resistance driver genes are available at https://github.com/Bioccjw/DRdriver.
Funding
National Natural Science Foundation of China (62172213, 81972478 and 61872183).
Author Biographies
Yu-e Huang is a PhD student in the Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics. Her research includes the areas of bioinformatics and computational system biology.
Shunheng Zhou is a PhD student in the Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics. His research includes the areas of bioinformatics and computational system biology.
Haizhou Liu is a PhD student in the Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics. His research includes the areas of bioinformatics and computational system biology.
Xu Zhou is a PhD student in the Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics. His research includes the areas of bioinformatics and computational system biology.
Mengqin Yuan is a PhD student in the Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics. Her research includes the areas of bioinformatics and computational system biology.
Fei Hou is a PhD student in the Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics. His research includes the areas of bioinformatics and computational system biology.
Sina Chen is a master student in the Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics. Her research includes the areas of bioinformatics and computational system biology.
Jiahao Chen is a master student in the Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics. His research includes the areas of bioinformatics and computational system biology.
Lihong Wang is a professor in the Department of Pathophysiology, School of Medicine, Southeast University. Her research includes the areas of bioinformatics and cancer biology.
Wei Jiang is a professor in the Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics. His research includes the areas of bioinformatics and computational system biology.
References
Author notes
Yu-e Huang and Shunheng Zhou have contributed equally to this work.