Yang Li, Xue-Gang Hu, Lei Wang, Pei-Pei Li, Zhu-Hong You, MNMDCDA: prediction of circRNA–disease associations by learning mixed neighborhood information from multiple distances, Briefings in Bioinformatics, Volume 23, Issue 6, November 2022, bbac479, https://doi.org/10.1093/bib/bbac479
Abstract
Emerging evidence suggests that circular RNA (circRNA) is an important regulator of a variety of pathological processes and serves as a promising biomarker for many complex human diseases. Nevertheless, there are relatively few known circRNA–disease associations, and uncovering new circRNA–disease associations by wet-lab methods is time consuming and costly. Considering the limitations of existing computational methods, we propose a novel approach named MNMDCDA, which combines high-order graph convolutional networks (high-order GCNs) and deep neural networks to infer associations between circRNAs and diseases. Firstly, we computed different biological attribute information of circRNA and disease separately and used them to construct multiple multi-source similarity networks. Then, we used the high-order GCN algorithm to learn feature embedding representations with high-order mixed neighborhood information of circRNA and disease from the constructed multi-source similarity networks, respectively. Finally, the deep neural network classifier was implemented to predict associations of circRNAs with diseases. The MNMDCDA model obtained AUC scores of 95.16%, 94.53%, 89.80% and 91.83% on four benchmark datasets, i.e., CircR2Disease, CircAtlas v2.0, Circ2Disease and CircRNADisease, respectively, using the 5-fold cross-validation approach. Furthermore, 25 of the top 30 circRNA–disease pairs with the best scores of MNMDCDA in the case study were validated by recent literature. Numerous experimental results indicate that MNMDCDA can be used as an effective computational tool to predict circRNA–disease associations and can provide the most promising candidates for biological experiments.
Introduction
Circular RNA (circRNA), a novel member of the noncoding cancer genome, is a single-stranded endogenous non-coding RNA (ncRNA) molecule with a continuous loop structure [1, 2]. It is generated by a back-splicing event between the downstream 5′ splice site and the upstream 3′ splice site [3, 4]. CircRNA was first discovered by Sanger and colleagues in the 1970s, who observed circRNA in Sendai viruses and plant-infecting viruses by electron microscopy [5, 6]. Subsequently, in the 1990s, endogenous circRNAs were first identified in human cells, and large and abundant circRNAs were also found in the Sry gene of mouse [7, 8]. Despite this early discovery, circRNAs were long dismissed as a by-product of 'splicing noise' or aberrant splicing and attracted little attention from scholars. It was not until 2013, when two research papers on circRNAs were published in Nature, that the mystery of circRNAs was unveiled and people began to really understand them [3, 9]. These studies demonstrated that circRNA is not a 'splicing by-product' of messenger RNA but a class of RNA molecules with significant biological functions in cells. Since then, the development of computational biology methods and deep sequencing technologies has made circRNA a frontier of research in the RNA field, which is crucial for revealing new functions and potential roles of circRNAs [10, 11].
Recently, computational approaches have emerged as an effective strategy to infer circRNA–disease associations to overcome the inherent shortcomings of wet-lab approaches [12]. Moreover, prioritizing the most promising circRNAs for candidate diseases by computational methods will also help to discover their molecular behavior in the identification of viral circRNAs as well as in carcinogenesis [13]. For instance, Xiao et al. [14] proposed a computational model called iCDA-CMG (identifying circRNA-disease associations-collective matrix completion with graph learning) based on collective matrix completion and graph learning algorithm to identify circRNA–disease associations. Zheng et al. [15] developed an approach with chaos game representation and support vector machine (SVM) to infer unobserved associations between circRNAs and diseases. Yang et al. [16] employed accelerated attribute network embedding and stacked auto-encoder algorithms to obtain feature representations of circRNA and disease and then used XGBoost classifier to obtain prediction results. Lu et al. [17] designed a CDASOR model, which adopted a convolutional neural network coupled with a bidirectional long short-term memory network to discover the underlying circRNAs with target diseases.
Although the above models have their own advantages and have achieved encouraging results, they still have some problems: (1) Existing models are based on incompletely correlated biological information, with redundancy and noise in the data, and they do not sufficiently fuse circRNA similarity and disease similarity. (2) The circRNA–disease association data currently validated by wet-lab experiments are limited. The constructed circRNA–disease association networks are therefore relatively sparse, with many false positive and false negative associations among their descriptors, so the prediction models cannot be fully trained. (3) Most existing models were proposed on the basis of a single dataset, and their generalization performance was not verified on other circRNA–disease association datasets.
To overcome these problems, we present a novel computational model called MNMDCDA that fuses mixed neighborhood information of circRNAs and diseases from multi-source similarity networks, which combines a high-order graph convolutional network (high-order GCN) and a deep neural network (DNN) to predict circRNA–disease associations. This model can use high-order GCN to overcome the problem that the original GCN cannot obtain high-order neighborhood information of each node from its neighbors at different distances in the network. It can learn the high-order mixed neighborhood embedding representations of circRNAs and diseases in a specific way.
Specifically, in the first step, we calculated the similarities for circRNA pairs and disease pairs and constructed 12 multi-source similarity networks by integrating various biological data information from four gold standard datasets. In the second step, based on each multi-source similarity network, we extracted the low-dimensional feature representations of circRNAs and diseases using the high-order GCN algorithm to learn the higher-order mixed neighborhood information of each node. In the third step, we introduced a DNN as a binary classifier to accurately identify the potential associations between circRNAs and diseases. The framework of MNMDCDA is shown in Figure 1.

The framework of MNMDCDA model for predicting the potential circRNA–disease associations.
In brief, the main contributions of the MNMDCDA model are as follows:
(i) We comprehensively utilize more attribute information of circRNAs and diseases and construct 12 multi-source similarity networks. The model fuses these attributes from different perspectives to better describe the biological information of circRNAs and diseases.
(ii) We obtain higher-order neighborhood embedding representations of this attribute information from the networks using the high-order GCN algorithm and extract advanced features. In this way, the hidden information contained in these networks is mined as fully as possible to train the MNMDCDA model.
(iii) The MNMDCDA model was tested on three other benchmark datasets with the same experiments to further verify its generalization performance. Furthermore, 25 of the top 30 disease-related circRNAs predicted by the MNMDCDA model in case studies have been validated by recently published literature.
Materials
Gold standard dataset
Disease feature construction
Gaussian interaction profile kernel-based disease similarity
Medical subject heading-based disease semantic similarity
In this study, we used the Medical Subject Headings (MeSH) database [20, 21] to construct the semantic similarity of diseases. MeSH is an authoritative, extensible system of biomedical subject headings that provides a rigorous classification of diseases, which makes it suitable for calculating disease semantic similarity. MeSH is available at https://www.nlm.nih.gov/. Following previous studies, we use the semantic information in MeSH to construct a directed acyclic graph (DAG) that reflects the relationships among diseases. In the DAG, the nodes represent diseases and the directed edges indicate the relationships between diseases.
Disease Ontology-based disease semantic similarity
The Disease Ontology (DO) [22] can be organized as a DAG, so the semantic similarities among diseases can be computed from their corresponding DO terms. The DO term for each disease is retrieved from http://disease-ontology.org/. We then measure the semantic similarity between two diseases following Wang's method; the detailed calculation steps are described in the literature [23]. To distinguish it from the MeSH-based measure, we use DSDO(d(i),d(j)) to represent the DO-based semantic similarity between two diseases d(i) and d(j).
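To make the DAG-based calculation more concrete, the following is a minimal sketch of a Wang-style semantic similarity. The source defers the exact steps to the literature [23], so the details here are assumptions: the decay factor `delta = 0.5` is a commonly used value rather than one stated in this section, the function names are hypothetical, and the small hierarchy is a toy example, not actual MeSH or DO content.

```python
def semantic_contributions(term, parents, delta=0.5):
    """Contribution of every ancestor of `term` (and of `term` itself).

    `parents` maps a term to the list of its direct parents in the DAG.
    The term contributes 1.0 to itself; each step towards the root
    multiplies the contribution by `delta`, keeping the maximum over paths.
    """
    contrib = {term: 1.0}
    frontier = [term]
    while frontier:
        node = frontier.pop()
        for parent in parents.get(node, []):
            value = delta * contrib[node]
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                frontier.append(parent)  # re-propagate the improved value
    return contrib


def semantic_similarity(a, b, parents, delta=0.5):
    """Shared contributions divided by the total semantic value of both terms."""
    ca = semantic_contributions(a, parents, delta)
    cb = semantic_contributions(b, parents, delta)
    shared = set(ca) & set(cb)
    numerator = sum(ca[t] + cb[t] for t in shared)
    return numerator / (sum(ca.values()) + sum(cb.values()))


# Toy hierarchy (hypothetical): child -> direct parents.
toy_parents = {
    "disease_A": ["group_X", "group_Y"],
    "disease_B": ["group_X"],
    "group_X": ["root_1"],
    "group_Y": ["root_2"],
}

sim_ab = semantic_similarity("disease_A", "disease_B", toy_parents)
sim_aa = semantic_similarity("disease_A", "disease_A", toy_parents)
print(f"sim(A, B) = {sim_ab:.4f}, sim(A, A) = {sim_aa:.4f}")
```

Two diseases that share more of their ancestors, and share them closer to the terms themselves, receive a higher score; a disease compared with itself scores 1.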
Cosine similarity of disease
CircRNA feature construction
GIPK-based circRNA similarity
CircRNA functional similarity
Cosine similarity of circRNA
Multi-similarity matrix fusion
To fully utilize the information from different sources, we adopted a multi-similarity matrix fusion method to fuse circRNA similarity information and disease similarity information to realize feature complementation. The advantage of the fused information is that it not only reduces the potential shortcomings caused by single features but also absorbs the characteristics of different data sources.
Finally, the circRNA fusional similarity network corresponding to the matrix CFus(c(i),c(j)) is CN, while the disease fusional similarity network corresponding to the matrix DFus(d(i),d(j)) is DN.
Feature embedding of high-order GCNs
Deep neural network
In the input and hidden layers, we employed the ReLU [28] function ($f(x) = \max(0, x)$) as the activation function of the model. In the output layer, we employed the sigmoid [29] function ($f(x) = 1/(1 + e^{-x})$) as the activation function to obtain a probability score for each circRNA–disease pair, which estimates the probability of an association between the circRNA and the disease. The higher the score, the more likely the circRNA is associated with the disease.
We used binary cross-entropy as the loss function to measure how well the predicted scores match the known labels. The Adam algorithm [30] is used to minimize the binary cross-entropy loss and accelerate training, and the Dropout technique [31] is applied in the input and hidden layers to reduce overfitting of the proposed model.
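As a small illustration of the activation and loss functions named above, the sketch below implements ReLU, the sigmoid and binary cross-entropy in plain Python. It is not the authors' code: the function names, the clipping constant `eps` and the example logits and labels are assumptions made for illustration only.

```python
import math


def relu(x):
    """ReLU activation: max(0, x)."""
    return max(0.0, x)


def sigmoid(x):
    """Sigmoid activation 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


# Hypothetical output-layer pre-activations for three circRNA-disease pairs.
logits = [2.0, -1.0, 0.0]
labels = [1, 0, 1]
probs = [sigmoid(z) for z in logits]
loss = binary_cross_entropy(labels, probs)
print([round(p, 3) for p in probs], round(loss, 4))
```

The loss shrinks as the predicted probability for each pair moves towards its label, which is the quantity an optimizer such as Adam would minimize during training.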
Experimental results
Evaluation indicators
Evaluate model performance
For training the high-order GCN, we set weight decay = 0.001, learning rate = 0.001, activation function = ReLU, number of neighbors = 20 and maximum order p = 4. For prediction, we used a three-layer DNN: the first and second layers each use 256 neurons with ReLU activation and a dropout rate of 0.5, and the third layer uses 1 neuron with sigmoid activation. The Adam algorithm is used to optimize the binary cross-entropy loss function. The maximum order p of the high-order GCN determines the farthest distance from which a node can obtain mixed information from its neighbors during network learning, so it greatly affects the performance of the prediction model. To achieve the best prediction performance, we therefore evaluated the model at different values of p. The prediction results at different orders are given in Table 1, and a line graph of these results is given in Figure 2. The AUC rises from 91.90% at p = 0 to a maximum of 95.16% at p = 4 and is lower for all larger values of p. We therefore select the maximum order p = 4 for the high-order GCN in this study.
p | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
---|---|---|---|---|---|---|---|---|---
AUC (%) | 91.90 | 92.69 | 93.19 | 94.18 | 95.16 | 94.40 | 94.36 | 93.79 | 93.81

Line graph of the prediction performance of the model at different orders.
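The role of the maximum order p can be illustrated with a small sketch of a mixed-neighborhood graph convolution layer. This section does not give the layer equations, so the formulation below is an assumption: it concatenates ReLU(Â^j X W_j) for j = 0, ..., p, where Â is the symmetrically normalized adjacency matrix with self-loops. The function names, the toy path graph and the identity weights are hypothetical and chosen only to make the reach of each order visible.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalisation D^(-1/2) (A + I) D^(-1/2)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def high_order_layer(adj, features, weights):
    """Concatenate ReLU(A_hat^j X W_j) for j = 0..p, with p = len(weights) - 1."""
    a_hat = normalize_adjacency(adj)
    propagated = features  # A_hat^0 X
    outputs = []
    for w in weights:
        outputs.append(np.maximum(propagated @ w, 0.0))
        propagated = a_hat @ propagated  # move one hop further out
    return np.concatenate(outputs, axis=1)


# Toy path graph 0 - 1 - 2 - 3 with one-hot node features.
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
features = np.eye(4)
weights = [np.eye(4) for _ in range(3)]  # orders 0, 1 and 2 (p = 2)

embedding = high_order_layer(adj, features, weights)
print(embedding.shape)           # one 4-column block per order
print(embedding[0].round(3))     # what node 0 sees at orders 0, 1, 2
```

In this toy graph, node 0 receives nothing from node 2 in the order-1 block but does in the order-2 block, and it still receives nothing from node 3; raising p widens the neighborhood that each node mixes into its embedding.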
In the experiment, we utilized the 5-fold cross-validation approach to evaluate the prediction performance of the proposed MNMDCDA model on CircR2Disease dataset. The detailed experimental results of the 5-fold cross-validation are summarized in Table 2. From Table 2, we can see that the MNMDCDA model obtained an average accuracy of 88.69%. The average experimental results of MNMDCDA on Sen., Pre., F1, MCC and AUC were 94.07%, 85.00%, 89.28%, 77.87% and 95.16%, respectively, with their corresponding standard deviations of 1.93%, 3.00%, 2.11%, 4.57% and 1.84%, respectively. In addition, we also plotted the ROC curves generated by the MNMDCDA method using 5-fold cross-validation on the CircR2Disease dataset, as shown in Figure 3.
Model | Testing set | Acc. (%) | Sen. (%) | Pre. (%) | F1 (%) | MCC (%) | AUC (%)
---|---|---|---|---|---|---|---
Our model | 1 | 91.03 | 95.86 | 87.42 | 91.45 | 82.45 | 97.79
Our model | 2 | 88.62 | 94.48 | 84.57 | 89.25 | 77.78 | 94.89
Our model | 3 | 88.97 | 95.86 | 84.24 | 89.68 | 78.68 | 95.65
Our model | 4 | 84.83 | 91.72 | 80.61 | 85.81 | 70.33 | 92.67
Our model | 5 | 90.00 | 92.41 | 88.16 | 90.24 | 80.09 | 94.79
Our model | Average | 88.69 | 94.07 | 85.00 | 89.28 | 77.87 | 95.16
Our model | Standard deviation | 2.36 | 1.93 | 3.00 | 2.11 | 4.57 | 1.84
Cosine similarity model | 1 | 81.38 | 86.21 | 78.62 | 82.24 | 63.05 | 91.52
Cosine similarity model | 2 | 85.52 | 88.28 | 83.66 | 85.91 | 71.14 | 94.11
Cosine similarity model | 3 | 84.48 | 86.21 | 83.33 | 84.75 | 69.01 | 93.28
Cosine similarity model | 4 | 82.76 | 82.07 | 83.22 | 82.64 | 65.52 | 89.65
Cosine similarity model | 5 | 86.55 | 86.21 | 86.81 | 86.51 | 73.11 | 92.22
Cosine similarity model | Average | 84.14 | 85.79 | 83.13 | 84.41 | 68.37 | 92.16
Cosine similarity model | Standard deviation | 2.08 | 2.27 | 2.92 | 1.91 | 4.09 | 1.72
DO-based disease semantic similarity model | 1 | 87.59 | 95.86 | 82.25 | 88.54 | 76.22 | 93.72
DO-based disease semantic similarity model | 2 | 87.93 | 95.86 | 82.74 | 88.82 | 76.83 | 94.09
DO-based disease semantic similarity model | 3 | 86.55 | 97.93 | 79.78 | 87.93 | 75.07 | 95.01
DO-based disease semantic similarity model | 4 | 83.79 | 87.59 | 81.41 | 84.39 | 67.78 | 91.60
DO-based disease semantic similarity model | 5 | 83.79 | 88.28 | 81.01 | 84.49 | 67.86 | 91.45
DO-based disease semantic similarity model | Average | 85.93 | 93.10 | 81.44 | 86.83 | 72.75 | 93.17
DO-based disease semantic similarity model | Standard deviation | 2.02 | 4.80 | 1.15 | 2.21 | 4.55 | 1.58
DA model | Average | 67.66 | 73.10 | 65.93 | 69.32 | 35.53 | 70.61
DA model | Standard deviation | 1.85 | 2.34 | 1.71 | 1.80 | 3.71 | 3.30
LR model | Average | 69.38 | 74.34 | 67.65 | 70.82 | 38.97 | 71.43
LR model | Standard deviation | 1.49 | 2.09 | 1.54 | 1.44 | 2.99 | 3.44
NB model | Average | 66.07 | 54.34 | 70.73 | 61.39 | 33.01 | 73.37
NB model | Standard deviation | 5.01 | 7.85 | 5.12 | 6.87 | 9.88 | 4.85
KNN model | Average | 82.21 | 92.69 | 76.66 | 83.91 | 65.88 | 92.08
KNN model | Standard deviation | 2.78 | 2.16 | 2.73 | 2.39 | 5.43 | 1.63
SVM model | Average | 84.97 | 86.76 | 83.86 | 85.25 | 70.04 | 94.37
SVM model | Standard deviation | 1.73 | 1.65 | 3.05 | 1.44 | 3.37 | 1.27
DT model | Average | 85.24 | 86.90 | 84.26 | 85.46 | 70.71 | 91.01
DT model | Standard deviation | 0.66 | 4.20 | 2.52 | 0.95 | 1.41 | 1.25
AdaBoost model | Average | 86.41 | 94.62 | 81.27 | 87.41 | 73.93 | 92.26
AdaBoost model | Standard deviation | 2.13 | 4.35 | 1.11 | 2.25 | 4.77 | 1.76
RF model | Average | 87.24 | 91.45 | 84.37 | 87.74 | 74.80 | 94.30
RF model | Standard deviation | 1.40 | 2.87 | 1.28 | 1.46 | 2.95 | 0.84

ROC curves of 5-fold cross-validation achieved by MNMDCDA on CircR2Disease dataset.
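For readers who want to recompute the threshold-based indicators reported for each fold, the sketch below derives Acc., Sen., Pre., F1 and MCC from confusion-matrix counts using their standard definitions. The source does not report confusion matrices, so the counts used here are hypothetical; they were picked so that the resulting percentages land close to the first fold of the proposed model in Table 2. The function name is also an assumption.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification indicators from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    sen = tp / (tp + fn)                      # sensitivity (recall)
    pre = tp / (tp + fp)                      # precision
    f1 = 2 * pre * sen / (pre + sen)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"Acc": acc, "Sen": sen, "Pre": pre, "F1": f1, "MCC": mcc}


# Hypothetical counts for one balanced test fold (145 positives, 145 negatives).
fold_metrics = classification_metrics(tp=139, fp=20, tn=125, fn=6)
for name, value in fold_metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

AUC is not included because it depends on the full ranking of prediction scores rather than on a single confusion matrix.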
Comparison with cosine similarity model
In the MNMDCDA model, we used GIPK similarity as one of the similarity measures for circRNAs and diseases. To verify whether GIPK similarity benefits the prediction performance of the proposed model, we compared it with cosine similarity. For a fair comparison, we replaced GIPK similarity with cosine similarity and left the other parts of the model unchanged. The results are presented in Table 2. The average values of Acc., Sen., Pre., F1, MCC and AUC obtained with the cosine similarity model were 4.55, 8.28, 1.87, 4.87, 9.50 and 3.00 percentage points lower than those of the MNMDCDA model, respectively. Figure 4 shows the ROC curves generated by the cosine similarity model on the CircR2Disease dataset. Figure 5 visualizes the experimental results of the cosine similarity model and the proposed model on the CircR2Disease dataset. These results show that the MNMDCDA model outperforms the cosine similarity-based model on the same dataset.

ROC curves of 5-fold cross-validation achieved by cosine similarity model on CircR2Disease dataset.

Comparison of the proposed different combinatorial models on the CircR2Disease dataset.
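The difference between the two similarity measures compared above can be seen on a toy association matrix. The sketch below is illustrative only: the GIPK definitions are not given in this section, so the code uses the commonly used Gaussian interaction profile kernel with bandwidth gamma = gamma' / (mean squared norm of the profiles) and gamma' = 1, which is an assumption. The function names and the 3 x 3 profile matrix are hypothetical, and the code assumes at least one known association so that the bandwidth is defined.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two interaction profiles (0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel matrix over the rows of `profiles`."""
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / len(profiles)
    gamma = gamma_prime / mean_sq_norm  # bandwidth scaled by profile density
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(pi, pj)))
            for pj in profiles
        ]
        for pi in profiles
    ]


# Toy binary profiles: rows are circRNAs, columns are diseases (hypothetical).
profiles = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]

gip = gip_kernel(profiles)
cos_01 = cosine_similarity(profiles[0], profiles[1])
cos_02 = cosine_similarity(profiles[0], profiles[2])
print(f"GIPK: {gip[0][1]:.4f}, {gip[0][2]:.4f}  cosine: {cos_01:.4f}, {cos_02:.4f}")
```

For profiles that share no associations, cosine similarity is exactly zero, whereas the kernel still returns a small positive value that decays with the distance between profiles.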
Comparison with DO-based disease semantic similarity model
In the experiment, we used MeSH-based disease semantic similarity to represent the correlation between two diseases. To verify whether the MeSH-based disease semantic similarity benefits the prediction performance of the proposed model, we compared it with the DO-based disease semantic similarity. We performed the same 5-fold cross-validation experiment on the CircR2Disease dataset, and the results are shown in Table 2. The average values of Acc., Sen., Pre., F1, MCC and AUC obtained with the DO-based disease semantic similarity model were 2.76, 0.97, 3.56, 2.45, 5.12 and 1.99 percentage points lower than those of the MNMDCDA model, respectively. Figure 5 visualizes the experimental results of the DO-based disease semantic similarity model and the proposed model on the CircR2Disease dataset. Figure 6 shows the ROC curves generated by the DO-based disease semantic similarity model on the CircR2Disease dataset.

ROC curves of 5-fold cross-validation achieved by DO-based disease semantic similarity model on CircR2Disease dataset.
Comparison of various classifier models
To evaluate the impact of the DNN classifier on the overall performance of the MNMDCDA model, we compared it with eight other classifiers: discriminant analysis (DA), logistic regression (LR), naive Bayes (NB), K-nearest neighbor (KNN), SVM, decision tree (DT), AdaBoost and random forest (RF). Table 2 shows the average results of the 5-fold cross-validation obtained by these models on the CircR2Disease dataset. The highest average accuracy among the eight models is 87.24% (RF), which is 1.45 percentage points lower than the average accuracy of 88.69% obtained by the proposed MNMDCDA model. Figure 7 visualizes the experimental results of the different classifier models on the CircR2Disease dataset. These results further suggest that the DNN classifier in the MNMDCDA model not only determines accurately whether circRNAs are associated with diseases but also contributes to the improvement in prediction performance.

Comparison of various classifier models on the CircR2Disease dataset.
Performance on independent dataset
Although the MNMDCDA model achieved good prediction performance on the CircR2Disease dataset, we also need to test its predictive ability on other independent datasets. In this paper, CircAtlas v2.0 [35], Circ2Disease [36] and CircRNADisease [37] are treated as independent datasets to examine the generalization performance of the model. The results are summarized in Table 3.
Table 3. Results of 5-fold cross-validation achieved by the proposed model on three other independent datasets
Independent datasets | Testing set | Acc. (%) | Sen. (%) | Pre. (%) | F1 (%) | MCC (%) | AUC (%)
---|---|---|---|---|---|---|---
CircAtlas v2.0 | 1 | 84.37 | 95.88 | 77.99 | 86.02 | 70.61 | 92.26
CircAtlas v2.0 | 2 | 87.61 | 94.67 | 82.90 | 88.40 | 76.00 | 93.74
CircAtlas v2.0 | 3 | 90.83 | 92.90 | 89.20 | 91.01 | 81.73 | 96.92
CircAtlas v2.0 | 4 | 86.98 | 98.22 | 80.19 | 88.30 | 75.91 | 96.41
CircAtlas v2.0 | 5 | 87.57 | 95.86 | 82.23 | 88.52 | 76.20 | 93.34
CircAtlas v2.0 | Average | 87.47 ± 2.30 | 95.51 ± 1.95 | 82.50 ± 4.21 | 88.45 ± 1.77 | 76.09 ± 3.93 | 94.53 ± 2.03
Circ2Disease | 1 | 83.33 | 88.89 | 80.00 | 84.21 | 67.08 | 90.12
Circ2Disease | 2 | 77.78 | 85.19 | 74.19 | 79.31 | 56.18 | 88.99
Circ2Disease | 3 | 86.11 | 83.33 | 88.24 | 85.71 | 72.33 | 93.52
Circ2Disease | 4 | 85.19 | 94.44 | 79.69 | 86.44 | 71.61 | 93.86
Circ2Disease | 5 | 74.07 | 74.07 | 74.07 | 74.07 | 48.15 | 82.51
Circ2Disease | Average | 81.30 ± 5.17 | 85.19 ± 7.52 | 79.24 ± 5.78 | 81.95 ± 5.21 | 63.07 ± 10.55 | 89.80 ± 4.59
CircRNADisease | 1 | 80.71 | 87.14 | 77.22 | 81.88 | 61.94 | 92.20
CircRNADisease | 2 | 87.14 | 97.14 | 80.95 | 88.31 | 75.82 | 90.41
CircRNADisease | 3 | 86.43 | 97.14 | 80.00 | 87.74 | 74.59 | 92.47
CircRNADisease | 4 | 80.71 | 84.29 | 78.67 | 81.38 | 61.59 | 92.53
CircRNADisease | 5 | 85.71 | 98.57 | 78.41 | 87.34 | 73.91 | 91.53
CircRNADisease | Average | 84.14 ± 3.17 | 92.86 ± 6.62 | 79.05 ± 1.45 | 85.33 ± 3.40 | 69.57 ± 7.16 | 91.83 ± 0.89
Independent datasets . | Testing set . | Acc. (%) . | Sen. (%) . | Pre. (%) . | F1 (%) . | MCC (%) . | AUC (%) . |
---|---|---|---|---|---|---|---|
CircAtlas v2.0 | 1 | 84.37 | 95.88 | 77.99 | 86.02 | 70.61 | 92.26 |
2 | 87.61 | 94.67 | 82.90 | 88.40 | 76.00 | 93.74 | |
3 | 90.83 | 92.90 | 89.20 | 91.01 | 81.73 | 96.92 | |
4 | 86.98 | 98.22 | 80.19 | 88.30 | 75.91 | 96.41 | |
5 | 87.57 | 95.86 | 82.23 | 88.52 | 76.20 | 93.34 | |
Average | 87.47 ± 2.30 | 95.51 ± 1.95 | 82.50 ± 4.21 | 88.45 ± 1.77 | 76.09 ± 3.93 | 94.53 ± 2.03 | |
Circ2Disease | 1 | 83.33 | 88.89 | 80.00 | 84.21 | 67.08 | 90.12 |
2 | 77.78 | 85.19 | 74.19 | 79.31 | 56.18 | 88.99 | |
3 | 86.11 | 83.33 | 88.24 | 85.71 | 72.33 | 93.52 | |
4 | 85.19 | 94.44 | 79.69 | 86.44 | 71.61 | 93.86 | |
5 | 74.07 | 74.07 | 74.07 | 74.07 | 48.15 | 82.51 | |
Average | 81.30 ± 5.17 | 85.19 ± 7.52 | 79.24 ± 5.78 | 81.95 ± 5.21 | 63.07 ± 10.55 | 89.80 ± 4.59 | |
CircRNADisease | 1 | 80.71 | 87.14 | 77.22 | 81.88 | 61.94 | 92.20 |
2 | 87.14 | 97.14 | 80.95 | 88.31 | 75.82 | 90.41 | |
3 | 86.43 | 97.14 | 80.00 | 87.74 | 74.59 | 92.47 | |
4 | 80.71 | 84.29 | 78.67 | 81.38 | 61.59 | 92.53 | |
5 | 85.71 | 98.57 | 78.41 | 87.34 | 73.91 | 91.53 | |
Average | 84.14 ± 3.17 | 92.86 ± 6.62 | 79.05 ± 1.45 | 85.33 ± 3.40 | 69.57 ± 7.16 | 91.83 ± 0.89 |
Results of 5-fold cross-validation achieved by the proposed model on three other independent datasets
Independent datasets . | Testing set . | Acc. (%) . | Sen. (%) . | Pre. (%) . | F1 (%) . | MCC (%) . | AUC (%) . |
---|---|---|---|---|---|---|---|
CircAtlas v2.0 | 1 | 84.37 | 95.88 | 77.99 | 86.02 | 70.61 | 92.26 |
2 | 87.61 | 94.67 | 82.90 | 88.40 | 76.00 | 93.74 | |
3 | 90.83 | 92.90 | 89.20 | 91.01 | 81.73 | 96.92 | |
4 | 86.98 | 98.22 | 80.19 | 88.30 | 75.91 | 96.41 | |
5 | 87.57 | 95.86 | 82.23 | 88.52 | 76.20 | 93.34 | |
Average | 87.47 ± 2.30 | 95.51 ± 1.95 | 82.50 ± 4.21 | 88.45 ± 1.77 | 76.09 ± 3.93 | 94.53 ± 2.03 | |
Circ2Disease | 1 | 83.33 | 88.89 | 80.00 | 84.21 | 67.08 | 90.12 |
2 | 77.78 | 85.19 | 74.19 | 79.31 | 56.18 | 88.99 | |
3 | 86.11 | 83.33 | 88.24 | 85.71 | 72.33 | 93.52 | |
4 | 85.19 | 94.44 | 79.69 | 86.44 | 71.61 | 93.86 | |
5 | 74.07 | 74.07 | 74.07 | 74.07 | 48.15 | 82.51 | |
Average | 81.30 ± 5.17 | 85.19 ± 7.52 | 79.24 ± 5.78 | 81.95 ± 5.21 | 63.07 ± 10.55 | 89.80 ± 4.59 | |
CircRNADisease | 1 | 80.71 | 87.14 | 77.22 | 81.88 | 61.94 | 92.20 |
2 | 87.14 | 97.14 | 80.95 | 88.31 | 75.82 | 90.41 | |
3 | 86.43 | 97.14 | 80.00 | 87.74 | 74.59 | 92.47 | |
4 | 80.71 | 84.29 | 78.67 | 81.38 | 61.59 | 92.53 | |
5 | 85.71 | 98.57 | 78.41 | 87.34 | 73.91 | 91.53 | |
Average | 84.14 ± 3.17 | 92.86 ± 6.62 | 79.05 ± 1.45 | 85.33 ± 3.40 | 69.57 ± 7.16 | 91.83 ± 0.89 |
As Table 3 shows, the proposed model achieved average AUC values of 94.53%, 89.80% and 91.83% on the three independent datasets (CircAtlas v2.0, Circ2Disease and CircRNADisease, respectively). These results suggest that the model may be useful for exploring organisms for which circRNA–disease association data are not yet available and for guiding the discovery of new candidate diseases associated with circRNAs. Figure 8 shows a histogram of the experimental results of the proposed model on the independent datasets.
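As a reading aid for Table 3, the following minimal Python sketch shows how the reported metrics relate to confusion-matrix counts and how an "Average" row can be recomputed from the five fold values. It is not the authors' code. The function names are illustrative, and the confusion counts are an assumption chosen only because they reproduce the fifth Circ2Disease row; the paper does not list per-fold counts. The sample standard deviation (n − 1 denominator) is used because it matches the ± value reported for the CircAtlas v2.0 AUC.

```python
import math
import statistics


def metrics_from_counts(tp, fn, fp, tn):
    """Return Acc, Sen, Pre, F1 and MCC in percent from confusion counts."""
    acc = (tp + tn) / (tp + fn + fp + tn)
    sen = tp / (tp + fn)          # sensitivity (recall)
    pre = tp / (tp + fp)          # precision
    f1 = 2 * pre * sen / (pre + sen)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    values = {"Acc": acc, "Sen": sen, "Pre": pre, "F1": f1, "MCC": mcc}
    return {name: round(100 * value, 2) for name, value in values.items()}


def mean_and_sample_std(values):
    """Mean and sample standard deviation, rounded to two decimals."""
    return round(statistics.mean(values), 2), round(statistics.stdev(values), 2)


# Assumed counts (27 positive and 27 negative test pairs); hypothetical,
# picked because they reproduce the Circ2Disease fold-5 row of Table 3.
fold5 = metrics_from_counts(tp=20, fn=7, fp=7, tn=20)
print(fold5)

# Per-fold AUC values for CircAtlas v2.0, copied from Table 3.
circatlas_auc = [92.26, 93.74, 96.92, 96.41, 93.34]
auc_mean, auc_std = mean_and_sample_std(circatlas_auc)
print(f"{auc_mean} ± {auc_std}")
```

With these inputs, the first print gives 74.07 for Acc, Sen, Pre and F1 and 48.15 for MCC, and the second gives 94.53 ± 2.03, in agreement with the table.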

Comparison with other existing methods
To further evaluate the prediction performance of the MNMDCDA model, we compared it with six popular methods on the same dataset: MGRCDA [34], NMFCDA [38], SGANRDA [39], iCircDA-MF [40], GCNCDA [41] and PWCDA [42]. For a fair comparison, we used the AUC value, which reflects overall model performance independently of any single decision threshold, as the comparison index. Table 4 summarizes the AUC values obtained by these models on CircR2Disease, and Figure 9 shows the same scores as a line graph. MNMDCDA obtained the highest AUC (0.9516), ahead of the best baseline, MGRCDA (0.9298), which indicates that combining the high-order GCN framework with multi-source similarity networks is a promising approach.
Methods | MNMDCDA | MGRCDA | NMFCDA | SGANRDA | iCircDA-MF | GCNCDA | PWCDA |
---|---|---|---|---|---|---|---|
AUC | 0.9516 | 0.9298 | 0.9278 | 0.9215 | 0.9178 | 0.9090 | 0.8900 |

Comparison of the AUC values of existing computational methods on the CircR2Disease dataset.
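To make the size of the gaps in Table 4 explicit, the short Python sketch below computes the AUC margin of MNMDCDA over each baseline. The AUC values are copied from Table 4; the variable names are illustrative, and the margins are simple differences, not a statistical significance test.

```python
# AUC values on CircR2Disease, copied from Table 4.
auc_by_method = {
    "MNMDCDA": 0.9516,
    "MGRCDA": 0.9298,
    "NMFCDA": 0.9278,
    "SGANRDA": 0.9215,
    "iCircDA-MF": 0.9178,
    "GCNCDA": 0.9090,
    "PWCDA": 0.8900,
}

proposed = "MNMDCDA"

# Absolute AUC margin of the proposed model over every baseline.
margins = {
    method: round(auc_by_method[proposed] - auc, 4)
    for method, auc in auc_by_method.items()
    if method != proposed
}

# The strongest baseline is the one with the highest AUC among the others.
best_baseline = max(margins, key=lambda method: auc_by_method[method])

for method, margin in margins.items():
    print(f"{proposed} vs {method}: +{margin:.4f}")
print("best baseline:", best_baseline)
```

The margins range from 0.0218 (over MGRCDA) to 0.0616 (over PWCDA).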
Case studies
To further investigate the effectiveness of MNMDCDA in screening unknown disease candidate circRNAs, we conducted a case study on the CircR2Disease dataset. The predictions are listed in Table 5, which shows that 25 of the top 30 circRNA–disease pairs have been confirmed in recently published literature. Overall, MNMDCDA shows a strong ability to predict potential disease-associated circRNAs, and its top-ranked candidates can be prioritized for further biological study, narrowing the search space for wet-lab experiments.
Rank | circRNA | Disease | Evidence (PMID/DOI) | Year |
---|---|---|---|---|
1 | hsa_circ_001569 | Breast cancer | 31104012 | 2019 |
2 | circFAT1 | Breast cancer | 34288822 | 2021 |
3 | hsa_circ_0000190 | Breast cancer | 10.1093/annonc/mdy428.010 | 2018 |
4 | hsa_circ_001763 | Breast cancer | 30509108 | 2019 |
5 | ciRS-7 | Breast cancer | 33390857 | 2021 |
6 | hsa_circ_0083964 | Osteoarthritis | Unconfirmed | N/A |
7 | hsa_circ_001988 | Gastric cancer | 32592202 | 2021 |
8 | hsa_circ_0001724 | Gastric cancer | 10.1016/j.genrep.2021.101226 | 2021 |
9 | circFAT1 | Gastric cancer | 30419346 | 2019 |
10 | circRHOBTB3 | Gastric cancer | 31928527 | 2020 |
11 | hsa_circ_0023404 | Rheumatoid arthritis | Unconfirmed | N/A |
12 | hsa_circ_0000520 | Breast cancer | 10.21203/rs.3.rs-1023577/v1 | 2021 |
13 | circBRAF | Glioma | 33650075 | 2021 |
14 | hsa_circ_005239 | Breast cancer | 29037220 | 2017 |
15 | Circ_HIPK3 | Glioblastoma | 34198978 | 2021 |
16 | Circ_SMARCA5 | Glioblastoma | 30736462 | 2019 |
17 | hsa_circ_0001566 | Glioma | Unconfirmed | N/A |
18 | circHIPK3 | Pancreatic cancer | 32104074 | 2020 |
19 | circRTN4 | Pancreatic cancer | 34983537 | 2022 |
20 | circRHOBTB3 | Pancreatic cancer | 34416910 | 2021 |
21 | hsa_circ_0089974 | Gastric cancer | Unconfirmed | N/A |
22 | circHIPK3 | Lung cancer | 31232177 | 2020 |
23 | hsa_circ_0001649 | Pancreatic cancer | 31138014 | 2019 |
24 | hsa_circ_0005015 | Diabetic retinopathy | 29288268 | 2017 |
25 | hsa_circRNA_100750 | Diabetes retinopathy | 28817829 | 2017 |
26 | hsa_circ_0005927 | Colorectal cancer | 33312376 | 2020 |
27 | hsa_circ_0081108 | Diabetic retinopathy | 32497630 | 2020 |
28 | hsa_circ_0045510 | Osteosarcoma | Unconfirmed | N/A |
29 | circHIAT1 | Hepatocellular carcinoma | 31108351 | 2019 |
30 | circFAT1 | Hepatocellular carcinoma | 33179443 | 2020 |
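The "25 of the top 30" figure can be checked directly against Table 5. The Python sketch below is illustrative only: it encodes the ranks marked "Unconfirmed" in the table and computes the fraction of confirmed pairs among the top k predictions. The helper name `confirmed_fraction` is an assumption made for this example.

```python
# Ranks listed as "Unconfirmed" in Table 5.
unconfirmed_ranks = {6, 11, 17, 21, 28}
top_k = 30


def confirmed_fraction(k):
    """Fraction of the k highest-ranked pairs with literature evidence."""
    confirmed = [rank for rank in range(1, k + 1) if rank not in unconfirmed_ranks]
    return len(confirmed) / k


confirmed_total = round(confirmed_fraction(top_k) * top_k)
print(f"confirmed in top {top_k}: {confirmed_total}")
print(f"fraction confirmed: {confirmed_fraction(top_k):.3f}")
print(f"fraction confirmed in top 10: {confirmed_fraction(10):.1f}")
```

This gives 25 confirmed pairs, a fraction of 0.833 over the top 30 and 0.9 over the top 10.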
Conclusion
Identifying associations between circRNAs and diseases can not only provide insight into the pathogenesis of complex diseases but also suggest effective strategies for their early prevention, diagnosis and treatment. In this paper, we propose a novel computational model, MNMDCDA, which combines a high-order GCN and a DNN to investigate potential relationships between circRNAs and diseases. To evaluate the model, we performed several experiments on four datasets, including comparisons of the cosine similarity model, the DO-based disease semantic similarity model and different classifier models, an assessment of generalization performance and a comparison with other existing models. The experimental results suggest that MNMDCDA outperforms other existing computational models and can effectively identify new disease-associated circRNAs.
There are three main reasons for the excellent performance of MNMDCDA: (1) it integrates multiple kinds of biological attribute information about circRNAs and diseases to form fusion descriptors and to construct multiple multi-source similarity networks; (2) it uses a high-order GCN to learn embedding representations of circRNAs and diseases that carry high-order mixed neighborhood information; and (3) it can effectively predict potential disease-related circRNAs from the fused features and generalizes well to three independent datasets.
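To illustrate what "high-order mixed neighborhood" embedding can mean in practice, here is a minimal NumPy sketch. It is not the authors' implementation: it assumes a MixHop-style formulation in which node features are propagated over 0, 1 and 2 hops of a symmetrically normalized similarity network and the per-hop outputs are concatenated. The function names, the toy similarity matrix, the feature sizes and the random weights are all hypothetical.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def mixed_neighborhood_layer(adj, features, weights):
    """Concatenate ReLU(A_norm^k X W_k) for k = 0 .. len(weights) - 1."""
    a_norm = normalize_adjacency(adj)
    propagated = features
    outputs = []
    for hop, weight in enumerate(weights):
        if hop > 0:
            # One more propagation step reaches neighbors one hop further away.
            propagated = a_norm @ propagated
        outputs.append(np.maximum(propagated @ weight, 0.0))
    return np.concatenate(outputs, axis=1)


rng = np.random.default_rng(0)

# Toy symmetric similarity network over 5 nodes (e.g. 5 circRNAs).
similarity = rng.random((5, 5))
similarity = (similarity + similarity.T) / 2

features = rng.random((5, 8))                              # 8 input features per node
weights = [rng.standard_normal((8, 4)) for _ in range(3)]  # hops 0, 1 and 2

embeddings = mixed_neighborhood_layer(similarity, features, weights)
print(embeddings.shape)  # 3 hops x 4 output features = 12 columns
```

In a pipeline of the kind described above, embeddings of this type for a circRNA and a disease would be combined and passed to a classifier; training of the weights is omitted here.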
Key Points
- Integrating the multiple biological attribute information of circRNAs and diseases can comprehensively describe the complex association between circRNAs and diseases from multiple perspectives.
- The high-order GCN algorithm of deep learning is used to learn the embedding representations with high-order mixed neighborhood information of circRNAs and diseases from multiple multi-source similarity networks, respectively.
- Experimental results on three other benchmark datasets ensure the generalization performance of the MNMDCDA model and provide corresponding theoretical guidance for further wet-lab approaches.
- Extensive experimental results demonstrate the superior performance of the MNMDCDA model in predicting potential circRNA–disease associations.
Data Availability
The data sets and source code can be freely downloaded from: https://github.com/ly2021010123/MNMDCDA/.
Acknowledgements
The authors would like to thank all anonymous reviewers for their constructive advice.
Funding
This work was supported in part by the National Natural Science Foundation of China (61976077, 62076085, 62172355, 62120106008), in part by the Major Special Projects of the Ministry of Science and Technology (2021ZD0200403), and in part by the Qingtan Scholar Talent Project of Zaozhuang University.
Author Biographies
Yang Li is a PhD student in the Key Laboratory of Knowledge Engineering with Big Data in Anhui Province, School of Computer Science and Information Engineering at Hefei University of Technology, Hefei, China. His current research interests include machine learning, data mining and their applications in bioinformatics.
Xue-Gang Hu is a professor at Hefei University of Technology. His research interests include data mining and knowledge engineering.
Lei Wang is a professor at Guangxi Academy of Sciences. His research interests include data mining, machine learning, deep learning, computational biology and bioinformatics.
Pei-Pei Li is an associate professor at Hefei University of Technology. Her research interests include data mining and intelligent computing.
Zhu-Hong You is a professor at Northwestern Polytechnical University. His research interests include neural networks, intelligent information processing, sparse representation and its applications in bioinformatics.