Jiannan Yang, Zhen Li, William Ka Kei Wu, Shi Yu, Zhongzhi Xu, Qian Chu, Qingpeng Zhang, Deep learning identifies explainable reasoning paths of mechanism of action for drug repurposing from multilayer biological network, Briefings in Bioinformatics, Volume 23, Issue 6, November 2022, bbac469, https://doi.org/10.1093/bib/bbac469
Abstract
The discovery and repurposing of drugs require a deep understanding of the mechanism of drug action (MODA). Existing computational methods mainly model MODA with the protein–protein interaction (PPI) network. However, the molecular interactions of drugs in the human body are far beyond PPIs. Additionally, the lack of interpretability of these models hinders their practicability. We propose an interpretable deep learning-based path-reasoning framework (iDPath) for drug discovery and repurposing by capturing MODA on by far the most comprehensive multilayer biological network consisting of the complex high-dimensional molecular interactions between genes, proteins and chemicals. Experiments show that iDPath outperforms state-of-the-art machine learning methods on a general drug repurposing task. Further investigations demonstrate that iDPath can identify explicit critical paths that are consistent with clinical evidence. To demonstrate the practical value of iDPath, we apply it to the identification of potential drugs for treating prostate cancer and hypertension. Results show that iDPath can discover new FDA-approved drugs. This research provides a novel interpretable artificial intelligence perspective on drug discovery.
Introduction
Artificial intelligence has recently shown great potential to transform the conventional drug discovery process [1]. Scientists are using deep learning to discover candidate drugs for the treatment of COVID-19 [2–4], Alzheimer’s disease [5], cancers [6] and other conditions. Among these applications, drug repurposing, the identification of new uses for approved or investigational drugs outside their original medical indication, can shorten drug development time while ensuring safety, and has therefore attracted attention from the drug discovery community and industry [7, 8]. Existing computational approaches mainly study this problem from biological and clinical perspectives [9]. Common clinical approaches use electronic health records to assess the efficacy of drugs in a specific population [10] or emulate clinical trials on real-world patient data [11]. Among biological computational approaches, molecular docking [12], genetic association [13] and related techniques [14] are commonly used to identify repurposing candidates.
With the development of high-throughput omics technologies, the detailed characterization of the molecular interactions of drugs in the human body has become possible [15]. The protein–protein interaction (PPI) network serves as a ‘skeleton’ for the body’s signaling circuitry [16] and shows tremendous power in guiding drug discovery [17–20]. A series of studies explored the potential of mining the network properties of drugs in the PPI network for synergistic drug combination identification [17] and drug repurposing [18]. A recent study introduced advanced graph-based deep learning approaches to the identification of anti-cancer drug combinations by learning graph representations of the PPI network [21]. However, the molecular network in the human body is not limited to the PPI network: gene regulatory mechanisms [22], the binding of proteins and chemicals [23] and the interactions among chemicals [24, 25] also play a role in the mechanism of drug action (MODA). These processes rely on the drug’s interactions with various proteins and chemicals in the human body [26, 27]. Usually, the MODA is described by biological pathways, each a series of biochemical and molecular steps that achieves a specific function or produces a certain product [28]. Such biological pathways can be naturally denoted as a series of paths in the biological network. Furthermore, instead of targeting specific proteins directly, some drugs must undergo further chemical reactions to be effective [29]. For example, cytarabine, an important drug in the treatment of acute myeloid leukemia [30], must be phosphorylated intracellularly to a nucleotide (cytarabine 5′-triphosphate, Ara-CTP) before it can exert its cytotoxic effect [31].
Previous machine learning approaches [32–37] have introduced multilayer information to drug repurposing. For example, Napolitano et al. [32] integrated the drug’s chemical information, PPI network and correlation of gene expression patterns after treatment together. By integrating multiple layers of information, these studies enhanced the drug repurposing prediction performance [32–34, 36, 37] and investigated the robustness of the system [38]. However, such multilayer information has not been fully utilized to characterize the MODAs. Furthermore, nearly all of these machine learning models are black-box. The lack of model interpretability hinders machine learning’s potential in practical drug discovery tasks. The need for explainable machine learning models led to the development of a series of novel neural network architectures, such as attribution methods [39, 40] and knowledge-graph-based models [41, 42]. These models have been further applied in healthcare [43–46], such as using biological-informed neural networks to identify anti-cancer drug combinations [45] and predict disease risk based on comorbidity network [46]. However, whether these explainable modules can accelerate drug discovery and further enhance the knowledge of the MODA is unknown.
To fill the aforementioned research gaps, based on our previous research on interpretable machine learning [4, 21, 46, 47], we propose the interpretable Deep learning-based Path-reasoning framework for drug repurposing (iDPath), which captures the MODA by identifying the critical paths from drugs to diseases in the human body. To accurately characterize the MODA, we build a comprehensive multilayer biological network instead of using the PPI network alone. The multilayer biological network is the integration of a gene regulatory layer, a PPI layer, a protein–chemical interaction (PCI) layer and a chemical–chemical interaction (CCI) layer, integrated with the drug and diseases-related information. Starting from this multilayer biological network, iDPath utilizes a graph convolutional network (GCN) module to capture the global connectivity information of the human molecular network and a long–short-term memory (LSTM) neural network module to capture the detailed mechanisms of drug action based on the shortest paths between drugs and diseases. Furthermore, iDPath introduces two attention modules, namely the path attention and the node attention, to enhance model interpretability. Experiments with drug screen data demonstrate the superior performance of iDPath in a general drug repurposing task featuring 1993 drugs and 2794 diseases. Further investigations demonstrate that iDPath can identify explicit critical paths that are consistent with clinical evidence. To demonstrate the practical value of iDPath, we apply it to identify potential drugs for the treatment of prostate cancer and hypertension. Results show that iDPath can successfully discover new FDA-approved drugs. These results indicate that iDPath can facilitate drug discovery and repurposing and has the potential to address other computational chemistry and biology tasks involving the understanding of the molecular interactions in the human body.
Methods
In this section, we describe the datasets and the proposed iDPath model, as well as baseline models for drug repurposing, including DeepWalk, GCN, LSTM networks and knowledge-aware path recurrent network (KPRN).
Data
Multilayer biological network
Gene regulatory network (GRN) layer
The GRN is adopted from RegNetwork [48], which collects the experimentally validated and the predicted regulations based on the transcription factor (TF) binding sites. The edges in RegNetwork start from TF and microRNA (miRNA) and target the regulated genes. In total, RegNetwork provides us with 369 277 gene regulations between 1456 TFs, 1904 miRNAs and 19 719 genes.
PPI layer
The PPI network combines information from two sources. The first, the STRING dataset [49], is the most comprehensive database of known and predicted PPIs to date, with more than 1380 million PPIs among over 9 million proteins. We keep only the human (Homo sapiens) PPIs at high confidence or better (confidence level > 0.7). The second is the human interactome built by Cheng et al. [18], which is assembled from multiple databases with experimental evidence. After preprocessing, our PPI network contains 614 970 interactions among 13 758 proteins.
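The confidence filter described above can be illustrated with a minimal sketch. It assumes each interaction carries a confidence value already rescaled to the range 0 to 1; the `Interaction` type, the function name and the toy protein labels are hypothetical and are not taken from the STRING or STITCH file formats.

```python
from typing import List, NamedTuple


class Interaction(NamedTuple):
    source: str
    target: str
    confidence: float  # assumed to be rescaled to [0, 1]


def keep_high_confidence(edges: List[Interaction],
                         threshold: float = 0.7) -> List[Interaction]:
    """Keep only interactions whose confidence is strictly above the threshold."""
    return [edge for edge in edges if edge.confidence > threshold]


# Toy edges with made-up scores, for illustration only.
toy_edges = [
    Interaction("P1", "P2", 0.95),
    Interaction("P1", "P3", 0.70),  # exactly 0.7 is dropped by the strict ">"
    Interaction("P2", "P4", 0.40),
]
high_confidence = keep_high_confidence(toy_edges)
```

The same filter would apply to the PCI and CCI layers, which use the same threshold.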
PCI layer
We obtain the PCI network from the STITCH database [50], which is the most comprehensive database of known and predicted interactions between chemicals and proteins to date. We select human (H. sapiens) PCIs at high confidence or better (confidence level > 0.7). The processed PCI network consists of 203 551 interactions among 9393 proteins and 73 199 chemicals.
CCI layer
The CCI network is also curated from the STITCH database [50] and further processed by selecting human (H. sapiens) CCIs at high confidence or better (confidence level > 0.7). The processed CCI network has 396 284 interactions among 107 055 chemicals.
Constructing the multilayer biological network
We construct a multilayer biological network by mapping all the entities in GRN, PPI, PCI and CCI to the same nomenclature. The proteins are named by their encoded genes and the miRNAs are mapped to their corresponding genes by BioMart [51]. All the genes are encoded to their Entrez IDs [52]. All the chemicals are denoted by their PubChem CIDs (Compound ID number) [53].
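As a rough illustration of this merging step, the sketch below combines edge lists from the four layers into one adjacency structure once all nodes share a common identifier scheme. The `gene:<Entrez ID>` and `chem:<PubChem CID>` prefixes, the function name and the toy edges are assumptions made for the example; the edge directions follow the Figure 1 caption (GRN one-way, the other layers two-way).

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def build_multilayer(layers: Dict[str, Tuple[bool, Iterable[Edge]]]):
    """Merge per-layer edge lists into one adjacency map.

    `layers` maps a layer name to (directed, edges). Returns the adjacency
    map and, for every directed pair, the set of layers that contain it.
    """
    adjacency: Dict[str, Set[str]] = {}
    edge_layers: Dict[Edge, Set[str]] = {}
    for layer_name, (directed, edges) in layers.items():
        for u, v in edges:
            # Undirected layers contribute both directions.
            pairs = [(u, v)] if directed else [(u, v), (v, u)]
            for a, b in pairs:
                adjacency.setdefault(a, set()).add(b)
                adjacency.setdefault(b, set())  # make sure sinks are present
                edge_layers.setdefault((a, b), set()).add(layer_name)
    return adjacency, edge_layers


# Toy layers with hypothetical identifiers.
toy_layers = {
    "GRN": (True, [("gene:1", "gene:2")]),
    "PPI": (False, [("gene:2", "gene:3")]),
    "PCI": (False, [("gene:3", "chem:10")]),
    "CCI": (False, [("chem:10", "chem:11")]),
}
adjacency, edge_layers = build_multilayer(toy_layers)
```

Because proteins are named by their encoding genes, a gene in the GRN layer and the corresponding protein in the PPI layer collapse onto the same key, which is what links the layers together in this sketch.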
Therapeutic drug–disease pairs
For the drug repurposing task, we collect therapeutic drug–disease pairs from the Therapeutic Target Database (TTD) [54], which provides the known and explored therapeutic protein and nucleic acid targets, the targeted disease and the pathway information of tens of thousands of drugs. We only keep the FDA-approved drugs in TTD and map them to PubChem CID to be consistent with the chemicals in the multilayer biological network. The diseases in TTD are in the ICD-11 coding system and are mapped to their corresponding ICD-10 codes. The cleaned dataset of therapeutic drug–disease pairs includes 1993 drugs and 2794 diseases and constitutes 19 500 pairs.
Drug–Protein associations and drug–chemical associations
We collect drug–protein associations from four datasets: the PCI network from STITCH [50], the drug–protein associations built by Cheng et al. [18], the TTD [54] and DrugBank [55]. The drug–chemical associations are extracted from STITCH by selecting the compounds that are drugs. DrugBank is a commonly used database containing comprehensive molecular information about drugs, their mechanisms, interactions and targets. The aggregated dataset contains 85 305 drug–protein associations between 20 405 drugs and 7796 proteins, 83 271 drug–chemical associations between 4630 drugs and 12 042 chemicals. All the drugs and chemicals are denoted by their PubChem CIDs, and all the proteins are represented by their encoded genes using Entrez ID.
Disease–Gene associations and disease–miRNA associations
The disease–gene associations include genes and variants associated with human diseases, curated from DisGeNET [56] by selecting expert-curated repositories. The miRNAs associated with human diseases come from the Human microRNA Disease Database [57], a curated database of experimentally supported associations between human miRNAs and diseases. All the genes, variants and miRNAs are mapped to Entrez IDs, and diseases are mapped to ICD-10 codes. After processing, we have 230 837 associations among 7559 genes, 6830 variants and 705 miRNAs with 5602 diseases.
Overall architecture of iDPath
The iDPath framework for drug repurposing is presented in Figure 1. The MODA-related biological paths are identified by the shortest paths between the targets of drugs and diseases (Figure 1B) in the multilayer biological network (Figure 1A). To learn the global connectivity information of the multilayer biological network, iDPath first utilizes a three-layer GCN to learn the embeddings of associated nodes. Then, to capture the detailed MODA patterns, the embeddings of the nodes along the shortest paths between a drug and a disease are fed into an LSTM module to model their sequential dependencies. iDPath also introduces two attention modules to aggregate the embeddings of nodes and paths—path attention and node attention. These two attention modules are capable of discriminating the contribution of different nodes to one MODA-related biological path as well as the contribution of different paths to the final prediction.

The framework of iDPath on drug repurposing tasks. (A) The multilayer biological network consists of four layers: GRN layer (one-way red arrows), PPI layer (two-way black arrows), PCI layer (two-way purple arrows) and CCI layer (two-way orange arrows). The blue two-way dashed arrows represent that the two corresponding nodes in different layers are identical. The nodes associated with the drugs and diseases are marked by green dashed lines. (B) The schematic representation of the MODA-related biological paths. The MODA-related biological paths are identified by the shortest paths between drug and disease generated in the multilayer biological network. Since the targets of drugs and diseases are proteins, all the shortest paths have the form of <drug–protein–…–protein–disease>. (C) The schematic representation of the algorithm: the multilayer biological network is fed into three-layer GCN to learn the embeddings of all nodes. The GCN embeddings of nodes along the shortest path between one drug and one disease are fed into an LSTM module to learn the sequential dependencies. Node attention and path attention modules are introduced to aggregate the embeddings of the nodes and paths. The final prediction is the probability that one drug is effective for one disease.
GCN to capture the global connectivity information of the multilayer biological network
MODA-related biological paths
The MODA depends on the interactions of drugs with molecules in the human body, which can be represented as a series of paths in the multilayer biological network [28]. To model the effects of drugs accurately and efficiently, we need to identify informative paths that represent the MODA. We prioritize the shortest paths because a shorter distance between a drug and a disease has been found to be associated with a higher chance of a therapeutic effect [17, 18, 59]. We adopted the GPU-accelerated single-source shortest path (SSSP) algorithm implemented in NVIDIA’s cuGraph package to identify the shortest paths [60]. For a drug and a disease, the shortest paths between them are connected through their associated nodes in the multilayer biological network. Given a drug and a disease, the shortest paths between this pair form a set
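The path-extraction idea can be sketched on the CPU with a breadth-first search that enumerates every shortest path between two nodes. This is not the cuGraph routine used in the paper; it is a plain-Python stand-in on an unweighted toy graph, and the node labels and function name are hypothetical.

```python
from collections import deque
from typing import Dict, List


def all_shortest_paths(adj: Dict[str, List[str]],
                       source: str, target: str) -> List[List[str]]:
    """Enumerate all shortest paths from source to target in an unweighted graph."""
    dist = {source: 0}
    preds: Dict[str, List[str]] = {source: []}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            break  # every predecessor of target has already been processed
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                preds[v].append(u)  # another equally short way to reach v
    if target not in dist:
        return []

    paths: List[List[str]] = []

    def backtrack(node: str, suffix: List[str]) -> None:
        if node == source:
            paths.append([source] + suffix)
            return
        for p in preds[node]:
            backtrack(p, [node] + suffix)

    backtrack(target, [])
    return paths


# Toy graph: the drug reaches the disease through P1 or P2; P5 is a detour.
toy_adj = {
    "drug": ["P1", "P2"],
    "P1": ["P3"],
    "P2": ["P3", "P5"],
    "P5": ["P3"],
    "P3": ["disease"],
    "disease": [],
}
toy_paths = all_shortest_paths(toy_adj, "drug", "disease")
```

Here the drug and disease are modeled as ordinary nodes attached to their associated proteins, so each returned path has the form drug, protein, ..., protein, disease described in the Figure 1 caption.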
LSTM layer
Node attention and path attention
The attention mechanism [63] is widely used in deep learning tasks to enhance model intelligibility [21]. In this study, we introduce two attention modules that separately learn the importance of different nodes within one MODA-related biological path and the importance of different paths to the final prediction.
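The two-level aggregation can be illustrated with a generic softmax attention pooling. The exact scoring functions used by iDPath are not given in this section, so the dot-product score against a query vector, the function names and the toy vectors below are all assumptions; the sketch also skips the LSTM step that precedes attention in the full model.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors: List[List[float]],
                   query: List[float]) -> Tuple[List[float], List[float]]:
    """Weight each vector by softmax(dot(vector, query)) and sum them."""
    scores = [sum(x * q for x, q in zip(vec, query)) for vec in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * vec[i] for w, vec in zip(weights, vectors))
              for i in range(dim)]
    return pooled, weights


# Node attention: pool the node vectors of each path into one path vector.
toy_paths_nodes = [
    [[1.0, 0.0], [0.0, 1.0]],  # path 1, two nodes
    [[0.5, 0.5], [0.5, 0.5]],  # path 2, two nodes
]
node_query = [2.0, 0.0]  # stands in for learned parameters
path_vectors = [attention_pool(nodes, node_query)[0] for nodes in toy_paths_nodes]

# Path attention: pool the path vectors into one drug-disease representation.
path_query = [1.0, -1.0]  # stands in for learned parameters
pair_vector, path_weights = attention_pool(path_vectors, path_query)
```

The returned weights are what make the model inspectable: sorting paths by `path_weights` gives a ranking of the kind used later to select critical paths.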
Node attention
Path attention
Objective function
Baselines
In this section, we describe a set of baseline models to compare with, including DeepWalk, GCN, LSTM and KPRN. We fed baseline models as much information as possible to have a fair comparison. The two path-based models (KPRN and LSTM) utilized the same input as iDPath. For GCN and DeepWalk, we used the drug targets and disease-related genes as input to train both models, but not the paths because these two models cannot handle sequential data naturally.
DeepWalk
DeepWalk [64] is a widely used graph embedding approach that learns node representations from a stream of short random walks. It has already been applied to several drug-related tasks, such as drug–target identification [65].
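The random-walk stage of DeepWalk can be sketched as follows. Only walk generation is shown; in DeepWalk the walks are then treated as sentences for a skip-gram model, which is omitted here. The function name, parameters and toy graph are hypothetical.

```python
import random
from typing import Dict, List


def random_walks(adj: Dict[str, List[str]], walk_length: int,
                 walks_per_node: int, seed: int = 0) -> List[List[str]]:
    """Generate truncated uniform random walks starting from every node."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks: List[List[str]] = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adj[walk[-1]]
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Toy undirected graph given as symmetric adjacency lists.
toy_graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
toy_walks = random_walks(toy_graph, walk_length=4, walks_per_node=2, seed=1)
```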
GCN
GCN [58] is widely applied to drug-related tasks, such as drug discovery using the drug’s SMILES features [66] and anti-cancer drug combination identification [21]. Because GCN is a component of iDPath, we also apply it on its own to test its performance on the drug repurposing task. Specifically, besides the basic graph convolutional layers, we introduce a fully connected layer that combines the embeddings of the drug and the disease for the final prediction.
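For readers unfamiliar with graph convolution, the sketch below implements one layer of the widely used symmetric-normalized form, H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W), in plain Python. Whether iDPath uses exactly this normalization is an assumption, and the tiny matrices are toy values.

```python
import math
from typing import List

Matrix = List[List[float]]


def gcn_layer(adj: Matrix, features: Matrix, weight: Matrix) -> Matrix:
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]
    n_in, n_out = len(weight), len(weight[0])
    # Aggregate neighbour features, then apply the linear map and ReLU.
    aggregated = [[sum(norm[i][k] * features[k][f] for k in range(n))
                   for f in range(n_in)] for i in range(n)]
    return [[max(0.0, sum(aggregated[i][f] * weight[f][o] for f in range(n_in)))
             for o in range(n_out)] for i in range(n)]


# Two connected nodes with one-hot features and an identity weight matrix.
toy_adj_matrix = [[0.0, 1.0], [1.0, 0.0]]
toy_features = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
toy_embeddings = gcn_layer(toy_adj_matrix, toy_features, identity)
```

Stacking such layers lets information from nodes several hops away reach each node's embedding, which is how a multi-layer GCN captures broader connectivity.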
LSTM
LSTM has been applied to drug discovery [67, 68]. We apply a vanilla LSTM network and use the last hidden state of each path as its representation. Two fully connected layers following the LSTM layer generate the final prediction.
KPRN
KPRN [42] is an advanced path-based model that performs reasoning over a knowledge graph for recommendation. We apply KPRN to the drug repurposing task by feeding it the same input as iDPath.
Performance evaluation and experiment setup
The training of iDPath is a binary classification task: given one drug–disease pair, we feed all the shortest paths between this drug and disease into iDPath, and iDPath generates one value indicating the probability that this drug has a therapeutic effect on this disease. All models are trained on this binary classification task, and we use common metrics to evaluate and fine-tune the models. The metrics used in the binary classification task include accuracy, recall, the area under the receiver operating characteristic curve (AUROC) and the area under the precision–recall curve (AUPRC). The drug repurposing task can also be viewed as a recommendation task: for each disease, we go through all the available drugs in our dataset, use the model to calculate the probability that each drug is effective in treating the disease, and then rank all the drugs by this probability. We introduce two commonly used metrics in the recommendation system,
Results and discussions
iDPath consistently outperforms baselines in drug repurposing
In general, baseline models can be classified into two categories: graph-embedding-based models (GCN and DeepWalk) and path-based models (LSTM and KPRN). iDPath presents a modeling framework that combines graph-embedding-based and path-based approaches. We compare the performance of iDPath with baselines in the drug repurposing problem. As shown in Figure 2A and C, iDPath outperforms all the baselines with an AUPRC of 0.97. In detail, iDPath achieves a 91.51% true-negative (TN) rate and a 91.23% true-positive (TP) rate on the test dataset, indicating that fewer than 10% of drug–disease pairs are misclassified (Figure 2B).
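The four rates reported in Figure 2B follow directly from the confusion matrix. A minimal sketch, assuming binary labels (1 for a therapeutic pair, 0 otherwise) and that both classes occur in the data; the toy labels are invented for illustration.

```python
from typing import Dict, List


def confusion_rates(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Return TP, FN, TN and FP rates for binary labels (both classes present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return {
        "TPR": tp / (tp + fn),  # share of true pairs that are recovered
        "FNR": fn / (tp + fn),
        "TNR": tn / (tn + fp),  # share of non-pairs that are rejected
        "FPR": fp / (tn + fp),
    }


toy_rates = confusion_rates([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Since the TP and FN rates sum to one, as do the TN and FP rates, a TP rate of 91.23% and a TN rate of 91.51% correspond to error rates of 8.77% and 8.49% within the positive and negative classes, respectively.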

Performance of iDPath. (A) The precision–recall (PR) curves of all the models on the testing set; the values in brackets denote the AUPRC. (B) The TN rate, false-negative rate, false-positive rate and TP rate of iDPath on the testing set. (C) The performance (NDCG@K) of all the models on the drug recommendation task on the testing set. These models are trained on the binary classification task and used to generate the repurposing probabilities of all the drugs on different diseases in the testing set. (D) The performance of iDPath with different biological network layers. Here GRN–PPI–PCI–CCI denotes the multilayer biological network generated by these four networks, GRN denotes using the gene regulatory network alone, and the same goes for PPI and PCI. The K values in C and D denote the top
In addition, the poor performance of the shuffled random model (AUPRC 0.76) demonstrates the importance of learning on the multi-layer biological network with correct biological interactions. The utilization of the shortest paths can significantly improve the performance, as demonstrated by the superior performance of iDPath and path-based models over graph-embedding-based models (Figure 2A and C). These results indicate that the extracted MODA-related biological paths have pharmacological relevance.
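The ranking metric shown in Figure 2C, NDCG@K, can be computed as in the sketch below. It assumes binary relevance (1 if a ranked drug is a known treatment for the disease, 0 otherwise) and the standard logarithmic discount; the paper's exact variant is not stated in this section, and the function names are hypothetical.

```python
import math
from typing import List


def dcg_at_k(relevances: List[float], k: int) -> float:
    """Discounted cumulative gain over the first k ranked items."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: List[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal (sorted) ranking."""
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant item exists for this disease
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# Drugs ranked by predicted probability for one disease; 1 marks a known treatment.
toy_ndcg = ndcg_at_k([1, 0, 1], k=3)
```

A model that places every known treatment ahead of all other drugs scores 1.0, and the score falls as known treatments slip down the ranking.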
Incorporating multiple biological network layers improves the prediction performance
We further investigate the performance of iDPath with different biological network layers. Existing studies mainly make predictions using the PPI network alone [17, 18]. Here, we evaluate the performance of iDPath with only one layer (PPI, GRN or PCI). Note that CCI cannot be directly linked to diseases, so we do not evaluate the model with only the CCI layer. As shown in Figure 2D, the full multilayer biological network can improve the performance of iDPath. Comparing individual networks, PPI performs the best, followed by GRN, and finally PCI. Note that the iDPath variants combining two or three layers cannot compete with the model with all four layers. Then, we investigate the proportion of nodes and interactions at each network layer in the identified MODA-related biological paths (Figure 3), to examine the impact of different network layers on iDPath performance. We find that nodes and interactions in the PPI layer are not the most prevalent in the identified paths. Instead, GRN nodes and PCI interactions are more common. Combining these results with the dominating role of the PPI network in prediction performance, we find that although the connectivity in the PPI network can capture the key relationships between drugs and diseases, it requires additional information at the GRN, PCI and CCI layers to further reveal the hidden biological paths related to MODA. By revealing these hidden paths, iDPath achieves higher prediction accuracy. The full GRN–PPI–PCI–CCI network had the best performance (Figure 2D and Supplementary Figure S5 in supplementary information) because the additional PCI and CCI layers provide a more comprehensive characterization of the signaling circuitry. However, adding only one layer of either PCI or CCI will introduce bias toward the corresponding biochemical processes.

The proportion of nodes (A) and interactions (B) at each network layer in the identified MODA-related biological paths.

Interpretation of the MODA-related paths connecting abiraterone and prostate cancer and those connecting penbutolol and hypertension. (A) The Sankey diagram of the critical paths connecting abiraterone and prostate cancer identified by iDPath. (B) The Sankey diagram of the critical paths connecting penbutolol and hypertension identified by iDPath. The density of edge colors is determined by the weights generated by the path attention module. Edges with darker colors are more important. The density of node colors is determined by the weights generated by the node attention module. Nodes with darker colors are more important.

KEGG pathway enrichment analysis of the paths between abiraterone and prostate cancer. (A) and (B) are the dot plots of the KEGG pathway enrichment analysis for the proteins present in the top-50 paths and bottom-50 paths between abiraterone and prostate cancer, respectively.

(A) The distribution of the role of drug-related proteins in top-k critical paths. (B) The relationship between the similarity of drugs and the similarity of critical paths. (C) The relationship between the similarity of diseases and the similarity of critical paths.
iDPath identified the critical paths related to MODA
To investigate whether the identified critical paths are representative of MODA, we visualize the critical paths of correctly classified drugs for prostate cancer and hypertension. Figure 4A and B show one example of abiraterone (anti-prostate cancer drug) and penbutolol (anti-hypertension drug). Here we define the critical paths as the top 50 paths ranked by their weights identified by the path attention module. The top 15 paths are presented in Figure 4.
As shown in Figure 4A, among the 256 shortest paths between abiraterone and prostate cancer, iDPath prefers paths that traverse genes associated with both abiraterone and prostate cancer, such as abiraterone → CYP3A4 → prostate cancer and abiraterone → AR → prostate cancer. Previous studies have shown that abiraterone is a moderate inhibitor of CYP3A4 [69], and CYP3A4 is associated with the oxidative deactivation of testosterone, which is implicated in the etiology of prostate cancer [70]. Androgen receptor (AR) is highly relevant to the growth and differentiation of prostate cancer [71], and abiraterone inhibits androgen biosynthesis to control the progression of prostate cancer [72]. Abiraterone is also known to be an inhibitor of CYP17A1 [73], a target that iDPath identifies as well. Specifically, the path abiraterone → CYP17A1 → Hydrogen → DL-Pyroglutamic Acid → MSLN → prostate cancer contributes the most to the prediction among all the CYP17A1-related paths, which is also consistent with previous biological studies [74]. In conclusion, the critical paths identified by iDPath correspond to biological pathways, that is, cascades of molecular interactions that trigger the drug action. They do not exclude other MODAs exerted by the drug, but they point to the more probable mechanisms.
As shown in Figure 4B, iDPath identified critical paths between penbutolol and hypertension, such as penbutolol
To further validate that the paths with higher weights are more relevant to the progression of prostate cancer, we perform KEGG pathway enrichment analysis [77] on the proteins of the top 50 paths and bottom 50 paths for the abiraterone–prostate cancer pair. As shown in Figure 5A and B, the paths with higher weights focus on the PI3K–Akt signaling pathway [78], regulation of actin cytoskeleton [79], the prostate cancer pathway and others that are highly related to the progression of prostate cancer. For example, the activation of the PI3K–Akt signaling pathway appears to be characteristic of many aggressive prostate cancers and is more frequently observed as prostate cancer progresses toward a resistant and metastatic disease [78]. In contrast, the paths with lower weights (Figure 5B) are more enriched in pathways related to other cancers or to more general cancer progression, and are not specific to prostate cancer.
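Pathway enrichment of this kind is commonly scored with a one-sided hypergeometric (over-representation) test. The sketch below shows that test in plain Python; it is an assumption that the analysis here relies on this statistic, multiple-testing adjustment is omitted, and the counts in the example are invented.

```python
from math import comb


def hypergeom_enrichment_p(population: int, pathway_size: int,
                           sample_size: int, overlap: int) -> float:
    """P(X >= overlap) when drawing sample_size genes without replacement
    from a population that contains pathway_size pathway genes."""
    upper = min(pathway_size, sample_size)
    total = comb(population, sample_size)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, sample_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Invented counts: 20 000 background genes, a 100-gene pathway,
# 50 proteins from the top-ranked paths, 5 of which fall in the pathway.
toy_p_value = hypergeom_enrichment_p(20000, 100, 50, 5)
```

A small p-value means the proteins on the high-weight paths fall in the pathway more often than a random draw of the same size would.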
We investigated the roles of drug-related proteins in the identified critical paths by counting the frequency of proteins with different roles in the top-k critical paths. As shown in Figure 6A, the proteins in the identified critical paths are mainly disease targets, followed by enzymes, transporters and carriers, which is consistent with the principles of drug design and discovery [80]. We also investigated the roles of drug-related proteins for abiraterone (Figure 4A) and penbutolol (Figure 4B), and the results are consistent with Figure 6A. For example, for abiraterone, CYP3A4 and AR both appear in the high-weight paths, and both are commonly used targets for prostate cancer [70, 71]. We further investigated the relationship between the similarity of drugs (diseases) and the similarity of critical paths (see supplementary information for more details). As shown in Figure 6B and C, similar drugs or diseases have similar critical paths, indicating that similar drugs or diseases have similar MODAs.
The top-3 critical paths and top-3 KEGG pathways of the potential drugs for the treatment of prostate cancer
Drug | Critical paths (Top-3) | KEGG pathways (Top-3) |
---|---|---|
Dutasteride | SRD5A2 ORM1 SRD5A1 | PI3K-Akt signaling pathway; Lipid and atherosclerosis; Hepatitis B |
Aspirin | PLAUR FASLG TGFB1 | Proteoglycans in cancer; Hepatitis B; Lipid and atherosclerosis |
Erlotinib | STAT3 CYP3A5 STAT3 | EGFR tyrosine kinase inhibitor resistance; Chemical carcinogenesis - receptor activation; Prostate cancer |
Nicergoline | ADRA1A ARRA1A ADRA1A | Steroid hormone biosynthesis; Prostate cancer; Cysteine and methionine metabolism |
Acetohydroxamic acid | MMP13 MMP8 MMP13 | Human T-cell leukemia virus 1 infection; Prostate cancer; Human cytomegalovirus infection |
Midostaurin | RET AURKB CYP3A5 | PI3K-Akt signaling pathway; Chemical carcinogenesis - receptor activation; Chemical carcinogenesis - DNA adducts |
Apalutamide | Enzalutamide ABCB1 Abiraterone | Prostate cancer; Chemical carcinogenesis - DNA adducts; Metabolism of xenobiotics by cytochrome P450 |
Atorvastatin | ABCC4 CYP3A5 CYP2C19 | Chemical carcinogenesis - DNA adducts; Drug metabolism - cytochrome P450; Metabolism of xenobiotics by cytochrome P450 |
Carisoprodol | CYP2C19 CYP2C19 Oxicone | Arachidonic acid metabolism; Chemical carcinogenesis - DNA adducts; Drug metabolism - cytochrome P450 |
Oxcarbazepine | AKR1C3 CYP2C19 ABCB1 | Steroid hormone biosynthesis; Chemical carcinogenesis - DNA adducts; Chemical carcinogenesis - reactive oxygen species |
These drugs are ranked in the top 10 by iDPath among all the FDA-approved drugs used in this study. The head (drug) and tail (prostate cancer) of each critical path are omitted to save space. The top-3 critical paths are determined by the weights generated by the path attention module. The KEGG pathways are identified by KEGG enrichment analysis on the proteins present in the top-50 critical paths and ranked by adjusted P-values.
iDPath identified the critical paths to uncover the synergistic effect of drug combinations
iDPath takes a multilayer network approach to understanding the MODA. The interactions between proteins and chemicals and the interactions among chemicals can reveal more detailed therapeutic effects of individual drugs and potential drug combinations. Among the 16 chemicals that interact with abiraterone in our dataset, docetaxel and cabozantinib are identified as the most relevant contributors to the positive treatment effects of abiraterone on prostate cancer (Figure 4A). We notice that both docetaxel and cabozantinib show distinct molecular interactions targeting prostate cancer when used together with abiraterone. The combination of docetaxel and abiraterone can significantly improve radiographic progression-free survival for patients with metastatic castration-sensitive prostate cancer [81]. Cabozantinib enhances the anti-prostate cancer activity of abiraterone by inhibiting abiraterone’s upregulation of IGFIR phosphorylation [82]. The identification of these combinations shows that iDPath can point to synergistic drug combinations even though it is not explicitly trained to perform this task.
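One plausible way to read "most relevant contributors" off path attention weights is to sum, for each interacting chemical, the weights of all paths that pass through it. The sketch below illustrates that idea only. The function name, the data layout (paths as lists of node names with one scalar weight each) and the toy weights are assumptions made for illustration; they are not taken from the iDPath source code.

```python
from collections import defaultdict


def rank_intermediate_nodes(paths, weights, candidates):
    """Sum the attention weight of every path that passes through each candidate node.

    paths      -- list of paths, each a list of node names (head ... tail)
    weights    -- one attention weight per path (assumed non-negative)
    candidates -- node names of interest, e.g. chemicals that interact with the drug
    Returns (node, total_weight) pairs sorted from most to least relevant.
    """
    if len(paths) != len(weights):
        raise ValueError("paths and weights must have the same length")
    totals = defaultdict(float)
    for path, weight in zip(paths, weights):
        # Skip the head and tail; count each intermediate node once per path.
        for node in set(path[1:-1]):
            if node in candidates:
                totals[node] += weight
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Toy input with placeholder node names; the weights are made up.
toy_paths = [
    ["drug", "chem_A", "protein_1", "disease"],
    ["drug", "chem_B", "protein_2", "disease"],
    ["drug", "protein_3", "protein_4", "disease"],
    ["drug", "chem_A", "protein_5", "disease"],
]
toy_weights = [0.5, 0.25, 0.125, 0.125]
ranking = rank_intermediate_nodes(toy_paths, toy_weights, {"chem_A", "chem_B"})
print(ranking)
```

Summing favors chemicals that appear on many paths; taking the maximum weight per chemical instead would favor a single dominant path. Which aggregation matches the analysis behind Figure 4A is not stated in this section.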
Drug repurposing for prostate cancer
To demonstrate iDPath’s utility in a real-world setting, we apply it to the discovery of potential drugs for treating prostate cancer among 1080 FDA-approved drugs that have not been labeled as therapeutic drugs for prostate cancer in our dataset. We find that, compared to the bottom-ranked drugs, the top-ranked drugs are more similar to the drugs that are FDA-approved for prostate cancer (Supplementary Figure S7), indicating that iDPath identifies plausible candidates for treating prostate cancer. The 10 drugs with the highest scores, together with their top-3 critical paths and top-3 KEGG pathways, are shown in Table 1. Among these 10 drugs, six have already been shown to be effective in previous studies: dutasteride [83], aspirin [84], erlotinib [85], midostaurin [86], apalutamide [87] and atorvastatin [88]. The critical paths identified by iDPath and shown in Table 1 are also consistent with the drugs’ MODA. For example, dutasteride, a medication primarily used to treat the symptoms of an enlarged prostate, shows therapeutic effects on prostate cancer by inhibiting both 5α-reductase isoenzymes (SRD5A1 and SRD5A2) [89]. Aspirin is found to trigger cancer cell apoptosis by inducing the secretion of TGF-β1 (TGFB1) [90]. Apalutamide has recently been approved for the treatment of prostate cancer [91] but is not labeled as such in our dataset. Specifically, iDPath finds that the most relevant paths for the efficacy of apalutamide run through enzalutamide or abiraterone (both are FDA-approved drugs for the treatment of prostate cancer and labeled as therapeutic in our dataset), and the combination with abiraterone has already been shown to be synergistic in a recent study [92]. For the other drugs identified as therapeutic but not officially approved, the KEGG pathway enrichment analysis shows that the proteins in their critical paths are enriched in prostate cancer-related pathways, such as the PI3K–Akt signaling pathway [93] and the prostate cancer pathway.
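The enrichment step can be illustrated with a standard over-representation analysis: a one-sided hypergeometric test per pathway followed by Benjamini-Hochberg adjustment. This section does not say which tool or test the authors used, so the sketch below is an assumption about a common approach, not a description of their pipeline. All function names, the toy gene sets and the background size are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the pathway."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def enrich(path_genes, pathways, background_size):
    """Rank pathways by adjusted p-value for the genes found on critical paths.

    Assumes every gene in path_genes and in each pathway is part of the
    background of size background_size.
    """
    genes = set(path_genes)
    names = list(pathways)
    pvals = [
        hypergeom_pvalue(
            len(genes & set(pathways[name])), len(genes),
            len(pathways[name]), background_size,
        )
        for name in names
    ]
    padj = benjamini_hochberg(pvals)
    return sorted(zip(names, pvals, padj), key=lambda row: row[2])


# Toy example: 20 background genes g0..g19 and two made-up pathways.
toy_pathways = {
    "pathway_X": [f"g{i}" for i in range(0, 5)],
    "pathway_Y": [f"g{i}" for i in range(10, 15)],
}
toy_path_genes = ["g0", "g1", "g2", "g3", "g10"]
results = enrich(toy_path_genes, toy_pathways, background_size=20)
for name, p, q in results:
    print(f"{name}\tp={p:.4g}\tp.adjust={q:.4g}")
```

In practice the background would be the set of annotated genes in the network and the pathway gene sets would come from KEGG; the toy sets above only make the arithmetic easy to check by hand.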
Conclusion
In this study, we propose iDPath, a deep learning framework that identifies explainable biological paths to characterize MODAs and predict drugs that can be repurposed for treating given diseases. iDPath is built on a multilayer biological network consisting of GRN, PPI, PCI and CCI networks. The proposed model achieves superior prediction performance compared with state-of-the-art models on a general drug repurposing task. Furthermore, we find that extending the PPI network to a multilayer biological network of the human body can significantly improve the prediction performance in drug repurposing. We investigate the identified critical paths of drugs for treating prostate cancer and hypertension and find that the critical paths are consistent with the known mechanisms of drug action. We then apply iDPath to the challenging problem of identifying potential drugs for the treatment of prostate cancer. Results show that iDPath can effectively identify newly approved drugs that are not recorded in the database. We believe iDPath can inform the development of explainable deep learning technologies for drug discovery. As a deep learning approach, iDPath is limited to in silico study; in vitro and in vivo experiments could extend it in future studies to further validate its practical value and consistency with clinical evidence. In addition, the identified paths may contain rich biological knowledge beyond this study. For example, frequently occurring paths may be associated with common mechanisms of action in a class of diseases, which is worth further study.
A comprehensive multilayer biological network beyond protein–protein interactions is introduced to accurately characterize the mechanism of drug action.
We propose an interpretable deep learning framework, iDPath, to model the pathways of drugs by identifying explainable biological paths from drug targets to disease targets in the multilayer biological network of the human body.
The superior performance of iDPath is verified by experiments on a general drug repurposing task.
The interpretability and credibility of iDPath are further validated on drugs for treating prostate cancer and hypertension.
Data availability
The data used to train iDPath, its source code and usage instructions are available on GitHub (https://github.com/JasonJYang/iDPath).
Author contributions statement
J.Y. and Q.Z.: study concept and design, development of methodology, writing of the manuscript; J.Y.: acquisition of samples and data, analysis and interpretation of data; Z.L., S.Y., Z.X., W.K.K.W. and Q.C.: interpretation of data; Q.Z.: study supervision and funding acquisition.
Funding
This work was supported by the National Natural Science Foundation of China (71972164, 71672163, 62131009 and 82071889); the Innovation and Technology Fund of the Innovation and Technology Commission of Hong Kong (MHP/081/19); and the National Key Research and Development Program of China, Ministry of Science and Technology of China (2019YFE0198600).
Author Biographies
Jiannan Yang is a data scientist and a PhD candidate at City University of Hong Kong. His research interests are interpretable deep learning, drug design and disease prediction.
Zhen Li is a radiologist and a professor at Tongji Hospital, Huazhong University of Science and Technology. He has extensive experience in abdominal imaging and interventional radiology research.
William Ka Kei Wu is an associate professor at Chinese University of Hong Kong. He has extensive experience in evolutionary genomics, computational modeling of gene regulation and analysis of high-throughput genomic datasets.
Shi Yu is a biologist at the University of Southern California. He has extensive experience in plant circadian clock research.
Zhongzhi Xu is a data scientist with extensive experience in medical data analytics.
Qian Chu is an oncologist and a professor at Tongji Hospital, Huazhong University of Science and Technology. She has extensive experience in breast cancer research.
Qingpeng Zhang is an associate professor at City University of Hong Kong. He has extensive experience in healthcare data analytics, medical informatics, AI in drug discovery and network science.
References
Wang X, Wang DX, Xu CR, et al. Explainable reasoning over knowledge graphs for recommendation.