Bo Yang, Hailin Chen, Predicting circRNA-drug sensitivity associations by learning multimodal networks using graph auto-encoders and attention mechanism, Briefings in Bioinformatics, Volume 24, Issue 1, January 2023, bbac596, https://doi.org/10.1093/bib/bbac596
Abstract
Recent studies have shown that the expression of circRNAs affects the drug sensitivity of cells and thus significantly influences the efficacy of drugs. Traditional biomedical experiments to validate such relationships are time-consuming and costly. Therefore, developing effective computational methods to predict potential associations between circRNAs and drug sensitivity is an important and urgent task. In this study, we propose a novel method, called MNGACDA, to predict possible circRNA–drug sensitivity associations for further biomedical screening. First, MNGACDA uses multiple sources of information from circRNAs and drugs to construct multimodal networks. It then employs node-level attention graph auto-encoders to obtain low-dimensional embeddings for circRNAs and drugs from the multimodal networks. Finally, an inner product decoder predicts the association scores between circRNAs and drug sensitivity from the embedding representations of circRNAs and drugs. Extensive cross-validation experiments show that MNGACDA outperforms six other state-of-the-art methods. Furthermore, excellent performance in case studies demonstrates that MNGACDA is an effective tool for predicting circRNA–drug sensitivity associations in real situations. These results confirm the reliable prediction ability of MNGACDA in revealing circRNA–drug sensitivity associations.
Introduction
As a family of noncoding RNA molecules with covalently closed circular structures, circRNAs have recently been discovered to be transcribed in eukaryotic organisms [1]. With advances in high-throughput technologies, biological functions of circRNAs have continuously been identified [2]. For example, Hansen et al. [3] reported that the human circRNA ciRS-7 functions as a regulator by acting as a miRNA sponge. Other significant roles, such as modulating alternative splicing or transcription and interacting with RNA-binding proteins, have also been detected for circRNAs [4]. Meanwhile, as circRNAs play significant roles in physiological and pathological processes, their dysregulation is closely related to complex human diseases [5]. These validated biological functions suggest that circRNAs are a new category of potential clinical diagnostic markers.
More recently, studies have found that circRNAs significantly affect the drug sensitivity of cells. For example, Huang et al. [6] discovered that two circRNAs (hsa_circ_0004350 and hsa_circ_0092857) transcribed from EIF3a affect cisplatin resistance in lung cancer cells. Xia et al. [7] found that a circRNA named circTNPO3 contributes to paclitaxel (PTX) resistance in ovarian cancer cells by upregulating NEK2 expression through sponging miR-1299. These studies provide valuable resources for investigating drug modes of action and carry therapeutic implications for the biomedical research community. To systematically reveal the effects of circRNAs on drug sensitivity, Ruan et al. [8] applied four identification algorithms to characterize the expression landscape of circRNAs across ~1000 human cancer cell lines, and observed strong associations between circRNA expression and drug responses. It should be noted that our understanding of the associations between circRNAs and drug sensitivity is far from complete.
As traditional biomedical experiments are expensive and time-consuming, developing efficient and accurate computational methods to predict circRNA–drug sensitivity associations could greatly reduce cost and time. Deng et al. [9] proposed a deep learning-based computational framework, GATECDA, to predict associations between circRNAs and drug sensitivity, in which a graph attention auto-encoder (GATE) was applied to extract low-dimensional representations of circRNAs and drugs. Comprehensive experiments showed the effectiveness of GATECDA in inferring circRNA–drug sensitivity associations. As mentioned in their study, current computational efforts in this direction are limited; to the best of our knowledge, GATECDA is the first computational method for inferring associations between circRNAs and drug sensitivity. It should further be noted that known circRNA–drug sensitivity associations are incomplete, and many remain undetected. Therefore, more accurate computational methods are urgently needed for more reliable circRNA–drug sensitivity association predictions.
In this study, we propose a novel computational framework termed MNGACDA to predict circRNA–drug sensitivity associations. First, MNGACDA uses multiple sources of data from circRNAs and drugs to construct integrated circRNA–circRNA and drug–drug similarity networks, as well as a circRNA–drug sensitivity association network. It then embeds node-level attention layers into a deep graph neural network framework to adaptively capture the internal information between nodes in the multimodal networks through an attention mechanism, and uses a convolutional neural network (CNN) combiner module to fuse the embedded representations of the layers. Finally, the embedding representations of circRNAs and drugs are fed to an inner product decoder to predict circRNA–drug sensitivity associations. To evaluate the effectiveness of MNGACDA, we compare it with six state-of-the-art methods on a benchmark data set under 5-fold and 10-fold cross-validation (5-CV and 10-CV). Experimental results demonstrate that MNGACDA is superior to the existing methods. In addition, we perform an ablation study and compare the results of our method under different views. Finally, case studies illustrate the usefulness of MNGACDA for predicting circRNA–drug sensitivity associations in real situations.
Materials and methods
Data sets
We download the data sets from Ref. [9] for our study. In Ref. [9], Deng et al. collected circRNA–drug sensitivity associations from the circRic database [8], in which drug sensitivity data were obtained from the GDSC database [10]. Following Wilcoxon tests with a false discovery rate < 0.05, we extract the significant circRNA–drug sensitivity associations as a benchmark data set, which contains 4134 associations involving 271 circRNAs and 218 drugs. Based on these associations, we construct an association matrix $A \in \mathbb{R}^{271\times 218}$ to denote the relationships between circRNAs and drug sensitivity: $A_{ij}=1$ indicates that circRNA $i$ and the sensitivity of drug $j$ are associated, and $A_{ij}=0$ otherwise. In addition to the circRNA–drug sensitivity associations, we download the host gene sequences of circRNAs from the National Center for Biotechnology Information (NCBI) Gene database [11] and the drug structure data from NCBI's PubChem database [12] for similarity calculation.
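The construction of the binary association matrix $A$ can be sketched as follows; the pair list shown is hypothetical and stands in for the 4134 extracted associations.

```python
import numpy as np

def build_association_matrix(pairs, num_circrnas=271, num_drugs=218):
    """Build the binary circRNA-drug sensitivity association matrix A.

    `pairs` is a list of (circRNA_index, drug_index) tuples for the
    experimentally supported associations; A[i, j] = 1 iff circRNA i
    is associated with the sensitivity of drug j, and 0 otherwise.
    """
    A = np.zeros((num_circrnas, num_drugs), dtype=np.float32)
    for i, j in pairs:
        A[i, j] = 1.0
    return A

# Hypothetical example pairs, for illustration only.
A = build_association_matrix([(0, 3), (5, 10)])
```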
Similarity measures
Sequence similarity of host genes of circRNAs
Similar to Ref. [9], the sequence similarity between the host genes of circRNAs is taken as the similarity between circRNAs. It is calculated from the Levenshtein distance of the sequences through the ratio function of Python's Levenshtein package. We use a matrix $\mathrm{CSS} \in \mathbb{R}^{M\times M}$ to represent the circRNA sequence similarity, where $M$ is the number of circRNAs.
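A minimal pure-Python sketch of this similarity, assuming the Levenshtein package's standard definition of `ratio`: substitutions are weighted as an insertion plus a deletion (cost 2), and the distance $d$ is normalized as $(|a|+|b|-d)/(|a|+|b|)$.

```python
def levenshtein_ratio(a: str, b: str) -> float:
    """Normalized sequence similarity in [0, 1], mirroring the `ratio`
    function of Python's Levenshtein package (substitution cost 2)."""
    if not a and not b:
        return 1.0
    # One-row dynamic program over the edit-distance table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            sub = prev[j - 1] + (0 if ca == cb else 2)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, sub))
        prev = curr
    d = prev[-1]
    total = len(a) + len(b)
    return (total - d) / total
```

Applying this function to every pair of host-gene sequences fills the $\mathrm{CSS}$ matrix.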
Structural similarity of drugs
Based on the structural data of drugs from the PubChem database (https://pubchem.ncbi.nlm.nih.gov/), we use RDKit [13] and the Tanimoto coefficient to calculate the structural similarity of drugs. We first compute the topological fingerprint of each drug using RDKit, and then calculate the structural similarity between drugs with the Tanimoto method. The resulting structural similarity matrix of the drugs is denoted as $\mathrm{DSS} \in \mathbb{R}^{N\times N}$, where $N$ is the number of drugs.
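The Tanimoto coefficient itself reduces to a set overlap between the "on" bits of two fingerprints; RDKit's `DataStructs.TanimotoSimilarity` performs this computation directly on its fingerprint objects. A dependency-free sketch on bit-position sets:

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient between two molecular fingerprints
    represented as sets of 'on' bit positions: |A & B| / |A | B|."""
    if not fp_a and not fp_b:
        return 0.0
    inter = len(fp_a & fp_b)
    # |A | B| = |A| + |B| - |A & B|
    return inter / (len(fp_a) + len(fp_b) - inter)
```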
Gaussian interaction profile kernel similarity of circRNAs and drugs
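The GIP kernel, in the formulation commonly used for association prediction, computes a Gaussian similarity between interaction profiles (rows of the association matrix $A$), with the bandwidth normalized by the mean squared profile norm. A numpy sketch under that standard formulation (details of MNGACDA's exact parameterization are not reproduced here):

```python
import numpy as np

def gip_kernel(A: np.ndarray) -> np.ndarray:
    """Gaussian interaction profile (GIP) kernel similarity over the
    rows of the association matrix A (pass A.T for the drug side).

    K[i, j] = exp(-gamma * ||A_i - A_j||^2), with gamma set to the
    reciprocal of the mean squared norm of the interaction profiles.
    """
    sq_norms = (A ** 2).sum(axis=1)
    gamma = 1.0 / max(sq_norms.mean(), 1e-12)
    # Pairwise squared distances via ||x||^2 + ||y||^2 - 2 x.y
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * A @ A.T
    return np.exp(-gamma * np.clip(d2, 0.0, None))
```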
Similarity fusion
MNGACDA
Graph attention networks (GAT) [17] use attention mechanisms in their aggregation process to prioritize neighbors with more relevant information. Several studies in bioinformatics using GAT, such as HGATMDA [18] and MKGAT [19], have achieved impressive predictive performance. Inspired by these achievements, we propose a node-level attention graph auto-encoder model named MNGACDA for predicting potential circRNA–drug sensitivity associations. The proposed model MNGACDA, shown in Figure 1, consists of three main steps:
Figure 1. The flow chart of MNGACDA for predicting circRNA–drug sensitivity associations.
Step 1: using multiple sources of information to construct integrated similarity networks of circRNAs and drugs. In the integrated circRNA (or drug) similarity network, we link only the 25 most similar neighbors of each circRNA (or drug). In addition, we construct a circRNA–drug sensitivity association network based on the known relationships between circRNAs and drug sensitivities.
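The top-25 sparsification in Step 1 can be sketched as follows; whether the resulting graph is symmetrized afterwards is an assumption of this sketch.

```python
import numpy as np

def knn_sparsify(S: np.ndarray, k: int = 25) -> np.ndarray:
    """Keep, for each node, only the edges to its k most similar
    neighbors in the similarity matrix S (self excluded), then
    symmetrize so the resulting graph is undirected (an assumption)."""
    n = S.shape[0]
    S = S.copy()
    np.fill_diagonal(S, -np.inf)          # exclude self-loops from the top-k
    keep = np.zeros_like(S, dtype=bool)
    idx = np.argsort(-S, axis=1)[:, :k]   # indices of the k largest per row
    rows = np.repeat(np.arange(n), k)
    keep[rows, idx.ravel()] = True
    out = np.where(keep, S, 0.0)
    np.fill_diagonal(out, 0.0)
    return np.maximum(out, out.T)         # union of the two directions
```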
Step 2: learning and fusing multimodal embedding representations of circRNAs and drugs. Since graph convolutional networks [20, 21] and GAT [22, 23] are widely used for representation learning, we apply a node-level attention auto-encoder that fuses first-order neighborhood information from the integrated similarity networks and the circRNA–drug association network to learn the embedding representations of circRNAs and drugs.
Step 3: predicting novel circRNA–drug sensitivity associations. The final circRNA–drug sensitivity scoring matrix is calculated by using inner product operation on the embedding representations of circRNAs and drugs. The predicted circRNA–drug sensitivity associations are prioritized according to the final scores.
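Step 3 amounts to an inner product between the two embedding matrices, passed through a sigmoid so that scores fall in (0, 1); a minimal sketch:

```python
import numpy as np

def inner_product_decoder(Zc: np.ndarray, Zd: np.ndarray) -> np.ndarray:
    """Score every circRNA-drug pair from the learned embeddings:
    A_hat[i, j] = sigmoid(z_ci . z_dj).

    Zc: (M, F) circRNA embeddings; Zd: (N, F) drug embeddings.
    Returns the (M, N) predicted association score matrix.
    """
    logits = Zc @ Zd.T
    return 1.0 / (1.0 + np.exp(-logits))
```

Candidate associations are then ranked by sorting the entries of the returned score matrix in descending order.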
Node-level attention graph auto-encoder
Let $C_N$, $D_N$ and $A_N$ denote the integrated circRNA similarity network, the integrated drug similarity network and the circRNA–drug association network, and let $\mathrm{CS}$, $\mathrm{DS}$ and $A$ denote the matrices of $C_N$, $D_N$ and $A_N$, respectively. Since learning embedding representations on $C_N$, $D_N$ and $A_N$ is a similar process, we take $A_N$ as an example to introduce how MNGACDA learns circRNA embeddings.
Finally, we obtain the embedding of the $l$th layer, $H^{(l)}=\{h_1^{(l)}, h_2^{(l)}, \dots, h_{M+N}^{(l)}\}$, where $h_i^{(l)} \in \mathbb{R}^{F^{(l)}}$.
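The node-level attention layer follows the standard GAT formulation of Veličković et al. [17]: attention logits $e_{ij} = \mathrm{LeakyReLU}(a^\top [Wh_i \,\|\, Wh_j])$ are computed for each edge, softmax-normalized over each node's neighborhood, and used to aggregate neighbor features. A single-head numpy sketch (the multi-head, residual and CNN-combiner details of MNGACDA are omitted; the adjacency is assumed to include self-loops):

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def gat_layer(H, adj, W, a_src, a_dst):
    """One single-head graph attention layer (Velickovic et al.).

    H            : (n, f_in) input node embeddings
    adj          : (n, n) binary adjacency, self-loops included
    W            : (f_in, f_out) shared linear transform
    a_src, a_dst : (f_out,) halves of the attention vector a = [a_src || a_dst]
    Returns the new (n, f_out) embeddings and the (n, n) attention matrix.
    """
    Wh = H @ W                                                    # (n, f_out)
    # e[i, j] = LeakyReLU(a_src . Wh_i + a_dst . Wh_j), via broadcasting
    e = leaky_relu(Wh @ a_src[:, None] + (Wh @ a_dst[:, None]).T)
    e = np.where(adj > 0, e, -np.inf)            # mask non-neighbors
    e = e - e.max(axis=1, keepdims=True)         # numerical stability
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)  # softmax over neighbors
    return alpha @ Wh, alpha
```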
Residual module
CNN combiner
Inner product decoder
Results
Evaluation metrics
In order to comprehensively evaluate the performance of MNGACDA, we conduct 5-CV and 10-CV experiments on the benchmark data set. In the 5-CV experiments, we first randomly sample the same number of negative circRNA–drug pairs as there are experimentally confirmed positive samples, and then divide the combined set into five equally sized subsets. Each subset is used in turn as the test set, with the remaining four subsets used for training. We repeat this procedure five times to obtain reliable results. The same evaluation steps are implemented in the 10-CV experiments.
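The sampling-and-splitting procedure above can be sketched as follows; the seed and the helper's interface are illustrative, not from the original implementation.

```python
import numpy as np

def cv_folds(A: np.ndarray, n_folds: int = 5, seed: int = 0):
    """Yield (train, test) splits of a balanced circRNA-drug pair set.

    Samples as many random negative (unassociated) pairs as there are
    known positives, then partitions the balanced set into `n_folds`
    subsets, each used once as the test set.
    """
    rng = np.random.default_rng(seed)
    pos = np.argwhere(A == 1)
    neg_all = np.argwhere(A == 0)
    neg = neg_all[rng.choice(len(neg_all), size=len(pos), replace=False)]
    pairs = np.vstack([pos, neg])
    labels = np.r_[np.ones(len(pos)), np.zeros(len(neg))]
    order = rng.permutation(len(pairs))
    folds = np.array_split(order, n_folds)
    for i in range(n_folds):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield (pairs[train], labels[train]), (pairs[test], labels[test])
```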
Performance comparison with other methods under 5-CV and 10-CV experiments
As stated in Ref. [9], current computational methods for circRNA–drug sensitivity association predictions are limited. In order to validate the performance of MNGACDA, we compare it with six state-of-the-art models, i.e. GATECDA [9], MINIMDA [25], LAGCN [26], MMGCN [27], GANLDA [28] and CRPGCN [29]. It should be noted that except GATECDA, the other five well-known methods have been used in other association prediction fields, such as miRNA–disease associations and drug–disease associations.
GATECDA [9]: a computational framework for predicting circRNA–drug sensitivity associations based on a graph attentional auto-encoder.
MINIMDA [25]: predicting miRNA–disease associations using GCN in a multimodal network by fusing higher-order neighborhood information of miRNAs and diseases.
LAGCN [26]: a method that integrates known drug–disease associations, drug–drug similarities and disease–disease similarities into a heterogeneous network, and applies graph convolution operations for drug–disease association prediction.
MMGCN [27]: predicting potential miRNA–disease associations using multi-view multichannel attentional graph convolutional networks.
GANLDA [28]: an end-to-end computational model based on GAT to predict associations between lncRNAs and diseases.
CRPGCN [29]: a novel algorithm that is based on GCN constructed with random walk with restart and principal component analysis to predict associations between circRNAs and diseases.
We conduct 5-CV and 10-CV experiments on the data set for prediction performance evaluation. All methods are compared under the same experimental settings with the optimal parameters recommended in their respective studies. For the 5-CV experiments, as shown in Figure 2A, the average AUC value of MNGACDA is 0.9139, which is higher than that of the other methods by 2.66% (GATECDA), 5.77% (MINIMDA), 6.34% (LAGCN), 3.73% (MMGCN), 6.22% (GANLDA) and 6.52% (CRPGCN). The AUPR results are shown in Figure 2B: the average AUPR value of MNGACDA is 0.9209, higher than that of the other methods by 2.94% (GATECDA), 6.75% (MINIMDA), 7.31% (LAGCN), 5.45% (MMGCN), 7.41% (GANLDA) and 5.25% (CRPGCN). In addition, the values of other performance metrics, including Accuracy, Precision, Recall, Specificity and F1-Score, are shown in Table 1, where MNGACDA obtains the best results of 0.8472 (F1-Score), 0.8424 (Accuracy), 0.8723 (Recall), 0.8155 (Specificity) and 0.8247 (Precision).

Table 1. Performance comparison of different methods under 5-CV

| Method | F1-Score | Accuracy | Recall | Specificity | Precision |
|---|---|---|---|---|---|
| MNGACDA | **0.8472** | **0.8424** | **0.8723** | **0.8155** | **0.8247** |
| GATECDA [9] | 0.8224 | 0.8186 | 0.8404 | 0.7966 | 0.8054 |
| MINIMDA [25] | 0.7988 | 0.7901 | 0.8331 | 0.7472 | 0.7684 |
| LAGCN [26] | 0.7900 | 0.7786 | 0.8338 | 0.7233 | 0.7516 |
| MMGCN [27] | 0.8190 | 0.8183 | 0.8231 | 0.8135 | 0.8156 |
| GANLDA [28] | 0.7936 | 0.7822 | 0.8384 | 0.7259 | 0.7542 |
| CRPGCN [29] | 0.7899 | 0.7937 | 0.7738 | 0.8135 | 0.8081 |
The bold result indicates the best one in each column.
For the 10-CV experiments, as shown in Figure 3A, MNGACDA achieves an average AUC value of 0.9182, which is higher than that of the other methods by 2.48% (GATECDA), 4.71% (MINIMDA), 6.15% (LAGCN), 3.70% (MMGCN), 5.59% (GANLDA) and 7.08% (CRPGCN). The AUPR results are shown in Figure 3B: the average AUPR value of MNGACDA is 0.9249, higher than that of the other methods by 2.27% (GATECDA), 5.24% (MINIMDA), 6.99% (LAGCN), 5.37% (MMGCN), 6.06% (GANLDA) and 7.48% (CRPGCN). The values of other performance metrics, including Accuracy, Precision, Recall, Specificity and F1-Score, are shown in Table 2, where MNGACDA obtains the best results of 0.8519 (F1-Score), 0.8498 (Accuracy), 0.8646 (Recall), 0.8350 (Specificity) and 0.8401 (Precision). These comprehensive results show that MNGACDA outperforms the six state-of-the-art methods.

Table 2. Performance comparison of different methods under 10-CV

| Method | F1-Score | Accuracy | Recall | Specificity | Precision |
|---|---|---|---|---|---|
| MNGACDA | **0.8519** | **0.8498** | **0.8646** | **0.8350** | **0.8401** |
| GATECDA [9] | 0.8286 | 0.8273 | 0.8367 | 0.8172 | 0.8227 |
| MINIMDA [25] | 0.8129 | 0.8031 | 0.8537 | 0.7526 | 0.7767 |
| LAGCN [26] | 0.7994 | 0.7861 | 0.8522 | 0.7199 | 0.7542 |
| MMGCN [27] | 0.8241 | 0.8182 | 0.8510 | 0.7855 | 0.7999 |
| GANLDA [28] | 0.8065 | 0.7956 | 0.8505 | 0.7407 | 0.7679 |
| CRPGCN [29] | 0.7993 | 0.7987 | 0.8007 | 0.7966 | 0.7998 |
Parameter sensitivity analysis
The following parameters exist in MNGACDA: (1) the number of heads T of the multiheaded attention mechanism, (2) the hidden layer embedding dimension F and (3) the number of layers n of the GAT. We implement experiments on the benchmark data set and evaluate the prediction performance under 5-CV for parametric analysis.
First, MNGACDA uses a multiheaded attention mechanism to strengthen its representation learning capability. As shown in Figure 4, the model achieves the best performance when the number of attention heads reaches 3. The results show that increasing the number of attention heads can improve the performance of the GAT model within a certain range.

Second, we analyze the effect of varying the hidden layer embedding dimension F. As shown in Figure 5, the best performance in both AUC and AUPR is achieved when F = 128.

Finally, we analyze the effect of the number of graph attention layers on the prediction performance of MNGACDA. As shown in Figure 6, the best AUC and AUPR values are obtained when n is 2. We also observe that the performance of the GAT encoder decreases as the number of graph attention layers increases. Performance remains good when n increases to 4, because the residual module alleviates over-smoothing, but it drops sharply when n increases further.

Based on the above evaluation, we set T to 3, F to 128 and n to 2 to obtain the best performance. In addition, MNGACDA uses Xavier initialization [30] for the model parameters and Adam [31] as the optimizer (learning rate = 0.001, weight decay = 0.005), and is trained for 1000 epochs.
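Xavier (Glorot) initialization draws weights from $U(-a, a)$ with $a = \sqrt{6/(\mathrm{fan\_in} + \mathrm{fan\_out})}$, which keeps activation variance roughly constant across layers. A numpy sketch of the uniform variant:

```python
import numpy as np

def xavier_uniform(fan_in: int, fan_out: int, rng=None) -> np.ndarray:
    """Xavier/Glorot uniform initialization for a (fan_in, fan_out)
    weight matrix: W ~ U(-a, a), a = sqrt(6 / (fan_in + fan_out))."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-a, a, size=(fan_in, fan_out))
```

In PyTorch, the reported optimizer settings correspond to `torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-3)`.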
Ablation tests
Since MNGACDA utilizes information from both the similarity networks and the association networks to learn embeddings for circRNAs and drugs, we propose two additional embedding extraction models, called MNGACDA-sim and MNGACDA-asso, to evaluate the impact of node embeddings from the different modal networks on prediction performance. Specifically, MNGACDA-sim learns the embeddings of circRNAs and drugs only from their integrated similarity networks, whereas MNGACDA-asso learns them only from the circRNA–drug association network. We compare the prediction performance of MNGACDA, MNGACDA-asso and MNGACDA-sim under different embedding sizes (F = {32, 64, 128, 256}). Moreover, MNGACDA includes three key modules: the node-level attention mechanism, the residual module and the CNN combiner. We remove each component in turn and run 5-CV experiments on the benchmark data set to investigate the impact of each component on prediction ability. The following four models are tested and compared:
MNGACDA-RES model: preserves the backbone graph attention layer and CNN combiner module, and removes the residual structure from the original framework.
MNGACDA-CNN model: retains the GAT backbone and residual modules, and replaces the CNN combiner module with a simple concatenation operation.
MNGACDA-GAT model: preserves the residual and CNN combiner module, and replaces the GAT backbone with a common GCN backbone.
MNGACDA-LA model: retains the GAT backbone and residual modules, and replaces the CNN combiner module with the layer attention module of LAGCN [26], which computes an attention score for each layer embedding and takes the final embedding as their linearly weighted sum. This variant further tests whether the CNN combiner module helps learn complex nonlinear relationships in node features.

Figure 7. Effect of node embedding extracted from different networks on prediction.
Table 3. Ablation results of MNGACDA under 5-CV

| Method | AUC | AUPR | F1-Score | Accuracy | Recall | Specificity | Precision |
|---|---|---|---|---|---|---|---|
| MNGACDA | 0.9139 | 0.9209 | 0.8472 | 0.8424 | 0.8723 | 0.8155 | 0.8247 |
| MNGACDA-LA | 0.9016 | 0.9106 | 0.8374 | 0.8350 | 0.8495 | 0.8205 | 0.8269 |
| MNGACDA-CNN | 0.8864 | 0.8971 | 0.8356 | 0.8303 | 0.8626 | 0.7980 | 0.8107 |
| MNGACDA-GAT | 0.9055 | 0.9160 | 0.8399 | 0.8377 | 0.8508 | 0.8246 | 0.8307 |
| MNGACDA-RES | 0.9120 | 0.9193 | 0.8412 | 0.8384 | 0.8541 | 0.8227 | 0.8302 |
Table 4. The Top 20 predicted circRNAs associated with the drug Vorinostat

| Ranking | circRNA | Evidence | Ranking | circRNA | Evidence |
|---|---|---|---|---|---|
| 1 | CALD* | CTRP | 11 | FKBP10* | CTRP |
| 2 | ANP32B* | CTRP | 12 | ZFP36L1* | CTRP |
| 3 | JUP | Nonsignificant | 13 | ARID1A* | CTRP |
| 4 | NOP53* | CTRP | 14 | HSP90B1 | Nonsignificant |
| 5 | FN1* | CTRP | 15 | LGALS3BP* | CTRP |
| 6 | FLNA* | CTRP | 16 | PYGB* | CTRP |
| 7 | PLEC* | CTRP | 17 | CXCL1* | CTRP |
| 8 | ANKRD36C* | CTRP | 18 | PRRC2A* | CTRP |
| 9 | CTSB* | CTRP | 19 | GRN* | CTRP |
| 10 | ALDH3A2* | CTRP | 20 | ESRP2 | Nonsignificant |
Note: circRNAs marked with ‘*’ are verified.
As shown in Figure 7, the AUC and AUPR values of MNGACDA with different embedding sizes are consistently higher than those of MNGACDA-asso and MNGACDA-sim, which demonstrates the effectiveness of integrating the different modal networks to improve model performance. As shown in Table 3, MNGACDA-RES exceeds MNGACDA only slightly in Specificity and Precision and is worse on all other metrics, which indicates that the residual module helps train a more effective deep neural network. The performance of MNGACDA-CNN is also worse than that of MNGACDA, which indicates that the CNN combiner module can effectively learn the complex nonlinear relationships in the node features. MNGACDA-LA is only slightly better than MNGACDA in Specificity and Precision, because the CNN combiner module accounts for the connections between layer embeddings rather than computing an attention score for each layer independently. MNGACDA-GAT also performs worse than MNGACDA, which implies the effectiveness of the node-level attention mechanism: attention coefficients adjust the priority of messages sent by different neighboring nodes when aggregating and updating node embeddings, enabling GAT to achieve better prediction performance than a standard GCN.
Case studies
In this section, we first apply MNGACDA to all the known information used in this study to predict new associations. Because the known associations come from the GDSC database [10], we search the independent CTRP database [32] to validate the newly predicted associations. We select the Top 20 predictions with the highest scores for two drugs (Vorinostat and PAC-1) for validation.
Vorinostat, a synthetic hydroxamic acid derivative with antitumor activity, is approved for refractory or relapsed cutaneous T-cell lymphoma [33]. As shown in Table 4, 17 of the Top 20 circRNAs predicted to be associated with Vorinostat have been confirmed in CTRP. Meanwhile, PAC-1 is a potent procaspase-3 activator that induces apoptosis in cancer cells and non-cancer cells depending on procaspase-3 concentration [34]. We list the Top 20 predicted PAC-1-related circRNAs in Table 5, and find that 15 of them have been identified in CTRP.
Table 5. The Top 20 predicted circRNAs associated with the drug PAC-1

| Ranking | circRNA | Evidence | Ranking | circRNA | Evidence |
|---|---|---|---|---|---|
| 1 | POLR2A* | CTRP | 11 | FKBP10* | CTRP |
| 2 | VIM* | CTRP | 12 | EHBP1L1 | Nonsignificant |
| 3 | TCOF1 | Nonsignificant | 13 | EFEMP1* | CTRP |
| 4 | THBS1* | CTRP | 14 | COL1A1* | CTRP |
| 5 | ENO2 | Nonsignificant | 15 | AATF | Nonsignificant |
| 6 | MEF2D* | CTRP | 16 | CRIM1* | CTRP |
| 7 | FBLN1* | CTRP | 17 | CRIM1* | CTRP |
| 8 | NCL | Nonsignificant | 18 | ANKRD36* | CTRP |
| 9 | COL1A2* | CTRP | 19 | CTSB* | CTRP |
| 10 | ANP32B* | CTRP | 20 | COL6A2* | CTRP |
Note: circRNAs marked with ‘*’ are verified.
To further evaluate the performance of MNGACDA in predicting circRNAs associated with new drugs, we choose two drugs (Crizotinib and Bortezomib), each with only one known circRNA–drug association in the data set, for de novo testing. For each drug, we remove its only known association and treat it as a new drug, while all other known associations are used as training samples. We then prioritize the candidate circRNAs according to their final association scores.
Crizotinib is an orally available aminopyridine-based inhibitor of the receptor tyrosine kinase anaplastic lymphoma kinase and the c-Met/hepatocyte growth factor receptor with antineoplastic activity [35]. Bortezomib is a proteasome inhibitor and antineoplastic agent used in the treatment of refractory multiple myeloma and certain lymphomas [36]. As shown in Table 6, 8 of the Top 10 predicted circRNAs associated with Crizotinib, and 5 of the Top 10 associated with Bortezomib, have been identified in CTRP.
The Top 10 predicted circRNAs associated with the two new drugs Crizotinib and Bortezomib
Ranking | circRNA (Crizotinib) | Evidence | Ranking | circRNA (Bortezomib) | Evidence
---|---|---|---|---|---
1 | POLR2A* | CTRP | 1 | POLR2A | Nonsignificant
2 | VIM* | CTRP | 2 | THBS1* | CTRP
3 | THBS1* | CTRP | 3 | ANP32B | Nonsignificant
4 | ANP32B* | CTRP | 4 | ENO2 | Nonsignificant
5 | ENO2 | Nonsignificant | 5 | ANKRD36 | Nonsignificant
6 | ANKRD36* | CTRP | 6 | SPINT2* | CTRP
7 | SPINT2* | CTRP | 7 | FBLN1* | CTRP
8 | TCOF1 | Nonsignificant | 8 | TCOF1 | Nonsignificant
9 | FBLN1* | CTRP | 9 | EFEMP1* | CTRP
10 | EFEMP1* | CTRP | 10 | MEF2D* | CTRP
Note: circRNAs marked with ‘*’ are verified.
Conclusion
Recent studies have shown that circRNAs play critical roles in modulating drug sensitivity. Predicting circRNA–drug sensitivity associations can therefore facilitate drug discovery and contribute to the treatment of diseases. In this study, we propose a deep learning-based approach, MNGACDA, to discover potential circRNA–drug sensitivity associations. To validate the effectiveness of our model, we compare MNGACDA with six state-of-the-art methods on a benchmark data set under 5-fold and 10-fold cross-validation (5-CV and 10-CV), where MNGACDA achieves the best prediction results. Furthermore, most of the top-ranked predictions in the case studies are validated by an independent database, suggesting that MNGACDA is an effective tool for predicting new circRNA–drug sensitivity associations.
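The k-fold cross-validation protocol (5-CV/10-CV) mentioned above can be illustrated with a small self-contained sketch. The helper names are hypothetical, and a simple rank-based AUC stands in for the paper's exact evaluation pipeline:

```python
import numpy as np

def auc(labels, scores):
    """Rank-based AUC: probability a positive sample outranks a negative one."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def k_fold_cv(pairs, labels, score_fn, k=5, seed=0):
    """Evaluate any scoring model over k folds of circRNA-drug pairs.

    score_fn(train_pairs, train_labels, test_pairs) -> test scores.
    seed=None keeps the original sample order (no shuffling).
    """
    n = len(labels)
    idx = np.arange(n) if seed is None else np.random.default_rng(seed).permutation(n)
    aucs = []
    for fold in np.array_split(idx, k):
        test = np.zeros(n, dtype=bool)
        test[fold] = True
        scores = score_fn(pairs[~test], labels[~test], pairs[test])
        aucs.append(auc(labels[test], scores))
    return float(np.mean(aucs))
```

In the paper's setting, `score_fn` would wrap training MNGACDA on the fold's training associations and scoring the held-out pairs with the inner-product decoder.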
It should be noted that the number of circRNA–drug sensitivity associations validated by biological experiments is small, which might produce biased prediction results. Integrating more experimentally supported circRNA–drug sensitivity associations would make the predictions more reliable. Meanwhile, the similarity measures are a key factor determining model accuracy in this study; in future work, we plan to incorporate more sources of biomedical data to construct more comprehensive similarities. Finally, computational methods for investigating the relationships between circRNAs and drug sensitivity remain limited, and more efforts are needed in this research field.
We propose a computational method MNGACDA for circRNA–drug sensitivity association inference, in which embedded representations of circRNAs and drugs are learnt from multimodal networks.
MNGACDA adaptively captures the internal information between nodes in the multimodal networks through a node-level attention graph auto-encoder.
Experimental results on the benchmark data set show that MNGACDA outperforms other state-of-the-art methods. In addition, case studies suggest that it is an effective tool for predicting potential circRNA–drug sensitivity associations.
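As an illustration of the two components named in the key points, below is a minimal NumPy sketch of a single node-level attention layer (GAT-style, simplified) followed by an inner-product decoder. The weight matrix `W` and attention vector `a` are placeholder parameters, not the trained model, and the exact scoring function differs from the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_encode(X, A, W, a):
    """One node-level attention layer over a network.

    X: node features; A: binary adjacency with self-loops;
    W, a: learnable projection matrix and attention vector (fixed here).
    """
    H = X @ W
    n = H.shape[0]
    scores = np.full((n, n), -np.inf)           # -inf masks non-neighbours
    for i in range(n):
        for j in range(n):
            if A[i, j]:
                scores[i, j] = np.tanh(np.concatenate([H[i], H[j]]) @ a)
    alpha = softmax(scores, axis=1)             # attention weights per node
    return alpha @ H                            # weighted neighbour aggregation

def inner_product_decode(Z_circ, Z_drug):
    """Sigmoid of inner products between circRNA and drug embeddings."""
    return 1.0 / (1.0 + np.exp(-(Z_circ @ Z_drug.T)))
```

In MNGACDA, embeddings learned this way from the multimodal networks are fed to the inner-product decoder to score every circRNA–drug pair.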
Authors’ contribution
H.C. conceived and designed this study. B.Y. implemented the experiments. B.Y. and H.C. analyzed the results. B.Y. and H.C. wrote the manuscript. Both authors read and approved the final manuscript.
Data availability
The data sets and source codes used in this study are freely available at https://github.com/youngbo9i/MNGACDA.
Acknowledgements
We would like to thank Zixuan Liu at School of Software, Xinjiang University for useful discussion.
Funding
National Natural Science Foundation of China (61862026).
Bo Yang is a graduate student at School of Software, East China Jiaotong University. His research interest is deep learning and bioinformatics.
Hailin Chen, PhD, is an associate professor at School of Software, East China Jiaotong University. His research interest includes data mining and bioinformatics.