Abstract

Growing biomedical evidence has shown that the dysregulation of miRNAs is associated with complex human diseases. Identifying disease-related miRNAs is therefore important for disease prevention, diagnosis and treatment. Biomedical experiments are time-consuming and costly, so there is a strong incentive to develop efficient computational methods that infer potential miRNA-disease associations. Many computational approaches have been proposed for this task, but their prediction accuracy still needs to be improved. In this study, we present MKGAT, a computational framework that predicts possible miRNA-disease associations using graph attention networks (GATs) and dual Laplacian regularized least squares. We use GATs to learn layer-wise embeddings of miRNAs and diseases from initial input features built from known miRNA-disease associations, intra-miRNA similarities and intra-disease similarities. We then compute Gaussian interaction profile (GIP) kernel matrices of miRNAs and diseases from the learned embeddings, and fuse the kernel matrices of all layers with the initial similarities through an attention mechanism. Finally, dual Laplacian regularized least squares are applied to the fused miRNA and disease kernels to predict new miRNA-disease associations. Compared with six state-of-the-art methods under 5-fold cross-validation, MKGAT achieves the highest AUROC (0.9627) and AUPR (0.7372). We also use MKGAT to predict related miRNAs for three cancers, and all of the top 50 predictions for each of the three diseases are confirmed by existing databases. These results indicate that MKGAT is a useful computational tool for revealing disease-related miRNAs.

Introduction

miRNAs are a class of endogenous non-coding RNAs approximately 22 nucleotides in length. Since their initial discovery in 1993 [1], they have been found widely in plants, animals and viruses. miRNAs regulate gene expression by binding target mRNAs through base pairing, which leads to mRNA cleavage or translational repression [2]. A growing number of studies have revealed that miRNAs are involved in many critical biological processes, such as developmental timing [3], cell proliferation [4] and cellular signaling [5]. Consequently, miRNA dysregulation is associated with many human diseases. For example, four miRNAs (miR-375, miR-10a, miR-122 and miR-423) were found to be significantly upregulated in patients with hepatocellular carcinoma (HCC) [6]. Detecting disease-related miRNAs is therefore of great importance for disease prevention, diagnosis and treatment.

To date, biomedical scientists have made numerous efforts to investigate the roles of miRNAs in human diseases, but their findings are scattered throughout the literature. To facilitate further research, public online databases, such as dbDEMC [7], HMDD [8] and miR2Disease [9], have been established to record experimentally verified evidence of miRNA-disease associations through text mining. These manually curated databases offer valuable resources for browsing, searching and integrating detailed information on miRNA-disease relationships. However, statistics reveal that ~30% of human miRNAs and ~80% of diseases have not been reported by experimental investigations [10]. To comprehensively understand the molecular mechanisms of diseases, more disease-associated miRNAs need to be identified. Traditional biological experiments, such as PCR and microarray analysis, are time-consuming and costly; there is therefore a strong incentive to develop efficient computational methods that infer potential miRNA-disease associations for further biomedical screening.

To date, researchers have proposed a series of computational methods that predict new miRNA-disease associations from various types of biomedical information, such as miRNA expression profiles and human phenotype ontology [11]. These methods can be broadly divided into three categories: graph theory-based methods, traditional machine learning-based methods and deep learning-based methods.

Graph theory-based approaches to miRNA-disease association inference treat biomedical entities (such as miRNAs and diseases) as vertices and their relationships as edges, forming heterogeneous networks. Graph-theoretic algorithms are then applied to rank unknown miRNA-disease associations at the network level. For example, Gu et al. [12] developed a network consistency projection-based method, NCPMDA, to reveal potential associations between miRNAs and diseases. Following the guilt-by-association principle [13], Chen et al. [14] presented a method for predicting new miRNA-disease associations based on information flow [15] through a three-layer heterogeneous network constructed from experimentally supported miRNA-disease associations, miRNA-long noncoding RNA (lncRNA) interactions and similarity measurements. Chen et al. [16] proposed HLPMDA, which applied a weighted one-mode projection technique [17] to propagate labels on miRNA-lncRNA-disease heterogeneous networks for miRNA-disease association inference. Chen et al. [18] implemented a bipartite network recommendation algorithm, BNPMDA, which used bias ratings to prioritize unlabeled miRNA-disease associations. Chen et al. [19] introduced a bipartite heterogeneous network link prediction method in which the co-neighbor structure of the miRNA-disease bipartite network was used to represent the probability of association between diseases and miRNAs. Zhang et al. [20] developed a link inference method that first integrated known miRNA-disease associations, miRNA-miRNA similarity and disease-disease similarity into a bipartite network, then scored candidate pairs with a label propagation algorithm and finally combined the scores with a weighted average strategy. Li et al. [21] integrated dual random walk with restart and network projection on a miRNA-disease bipartite network: pre-estimated association scores were calculated by random walks on the bipartite network, and network projection was used to obtain the final prediction scores. Graph theory-based approaches can easily integrate various biomedical features of miRNAs and diseases and make full use of the topological information of the known miRNA-disease bilayer network. Although these methods have achieved reliable predictions, there is still room to improve their performance.

Traditional machine learning approaches, such as support vector machines (SVMs) and random forests, have been widely used in computer vision, text classification and natural language processing, and their success has encouraged biomedical scientists to develop effective algorithms for miRNA-disease association prediction. For example, Pasquier et al. [22] presented a vector space model that first collected and integrated five distinct matrices into a single representation, then reduced its dimensionality by singular value decomposition (SVD) and finally measured the relatedness between a miRNA vector and a disease vector by their cosine distance. Chen et al. [23] developed LRSSLMDA, which uses Laplacian regularized sparse subspace learning to discover disease-related miRNAs. Luo et al. [24] devised a semi-supervised method for predicting missing miRNA-disease associations based on Kronecker regularized least squares and different omics data. Chen et al. [25] proposed RKNNMDA, which integrates known miRNA-disease associations and similarity measurements and combines an SVM ranking model with k-nearest neighbors (KNN) to sort neighbors. Chen et al. [26] developed an inductive matrix completion model, IMCMDA, for miRNA-disease association prediction. Chen et al. [27] proposed RFMDA, a random forest-based model for miRNA-disease association prediction. Xuan et al. [28] presented MDAPred, a nonnegative matrix factorization-based method that predicts candidate miRNAs for diseases by integrating miRNA family and cluster information, similarity measurements and miRNA-disease associations. Chen et al. [29] proposed a framework integrating dimensionality reduction and ensemble learning, in which principal component analysis was applied to reduce the feature dimensionality for each base learner and the outputs of the decision trees were averaged to compute the final association scores. Chen et al. [30] presented a canonical correlation analysis-based method to comprehensively predict potentially related diseases for miRNAs of interest. Chen et al. [31] developed KBMFMDA, which combines kernel-based nonlinear dimensionality reduction, matrix factorization and binary classification. Ji et al. [32] introduced a network embedding-based model in which a heterogeneous network was constructed from lncRNAs, drugs, proteins, diseases and miRNAs, network embedding was employed to learn graph representations and a random forest (RF) classifier was used to predict potential miRNA-disease associations. Wang et al. [33] presented HFHLMDA, a prediction method that uses high-dimensional features and hypergraph learning to reveal miRNA-disease associations.

Compared with conventional machine learning techniques, deep learning methods [34] can learn representations directly from raw data and have achieved remarkable success in many research fields. More recently, researchers have applied deep learning to reveal miRNA-disease associations. For example, Peng et al. [35] proposed MDA-CNN, a neural network-based framework that first captures interaction features between diseases and miRNAs, then uses an auto-encoder for feature learning and finally applies a convolutional neural network for prediction. Li et al. [36] presented NIMCGCN, which uses a graph convolutional network and nonlinear inductive matrix completion to predict miRNA-disease associations. Tang et al. [37] presented a Multi-view Multichannel Attention Graph Convolutional Network (MMGCN) for potential miRNA-disease association inference. Ding et al. [38] proposed VGAE-MDA, a deep learning framework that uses variational graph auto-encoders to detect associations between miRNAs and diseases. Wang et al. [39] developed SAEMDA, a supervised model that uses stacked autoencoders to identify miRNA-disease associations. Ding et al. [40] presented VGAMF, a model based on a variational graph auto-encoder with matrix factorization for miRNA-disease association prediction. Xuan et al. [41] proposed a generative adversarial network (GAN)-based model that learns feature information and outputs association scores for potential miRNA-disease associations. Jin et al. [42] developed a matrix completion-based method using graph autoencoders (GAE) and a self-attention mechanism. Liu et al. [43] proposed a method based on deep forest ensemble learning with autoencoder-derived features to predict miRNA-disease associations. Li et al. [44] presented a deep learning model that uses a hierarchical graph attention network (GAT) to predict miRNA-disease associations.

For these machine learning-based methods, including deep learning-based ones, selecting proper parameters for optimal miRNA-disease association prediction is challenging. Moreover, supervised models usually require negative samples for training. However, experimentally verified negative miRNA-disease associations are rarely available, because negative results attract little research interest in the life sciences. Negative samples are therefore usually drawn at random from unlabeled pairs, some of which may be true but undiscovered associations, and this can reduce the final prediction accuracy.

In this study, we propose MKGAT, a computational framework that combines GATs [45] and dual Laplacian regularized least squares to predict potential miRNA-disease associations. First, input features are constructed from known miRNA-disease associations, intra-miRNA similarities and intra-disease similarities, and GATs are applied to learn the embeddings of miRNAs and diseases on each layer. Then, we calculate Gaussian interaction profile (GIP) kernel matrices from the miRNA and disease embeddings of each layer and fuse them with the initial similarities through an attention mechanism. Finally, new miRNA-disease associations are predicted by dual Laplacian regularized least squares in the space of the combined miRNA and disease kernels. Five-fold cross-validation shows that MKGAT achieves an area under the receiver operating characteristic curve (AUROC) of 0.9627 and an area under the precision-recall curve (AUPR) of 0.7372, outperforming six state-of-the-art prediction methods. In case studies on three cancers, all of the top 50 predictions are supported by established databases, which further demonstrates the effectiveness of MKGAT in detecting disease-related miRNAs.

Materials and methods

Benchmark datasets

Known human miRNA–disease associations

The datasets used in this study are downloaded from reference [26], in which Chen et al. collected 495 miRNAs, 383 diseases and 5430 experimentally validated miRNA-disease associations from HMDD v2.0 [46]. We use $N_m$ (= 495) and $N_d$ (= 383) to denote the numbers of miRNAs and diseases, respectively, and the adjacency matrix $A\in\mathbf{R}^{N_m\times N_d}$ to describe the miRNA-disease associations, with rows indexing miRNAs and columns indexing diseases. The entry $A(i, j)$ is set to 1 if miRNA $m(i)$ and disease $d(j)$ have a known association, and 0 otherwise.

miRNA functional similarity

Wang et al. [47] provided a method for calculating miRNA functional similarity based on the hypothesis that diseases with similar phenotypes are more likely to be associated with functionally similar miRNAs. We download the miRNA functional similarity scores from their study at https://www.cuilab.cn/files/images/cuilab/misim.zip and construct a matrix $FS$, where $FS(m_i, m_j)$ denotes the functional similarity score between miRNAs $m_i$ and $m_j$.

Disease semantic similarity

We use MeSH terms to describe each disease as a directed acyclic graph (DAG). Specifically, we formulate a disease $d_i$ as $DAG(d_i) = (d_i, T(d_i), E(d_i))$, where $T(d_i)$ is the set of nodes consisting of $d_i$ and its ancestor nodes, and $E(d_i)$ is the corresponding set of edges, i.e. the direct links from parent to child nodes. Following reference [47], we calculate the semantic contribution of disease $d_t$ to $d_i$ as follows:
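$$D1_{d_i}(d_t)=\begin{cases}1, & d_t=d_i\\ \max\left\{\Delta\cdot D1_{d_i}(d_t')\mid d_t'\in \mathrm{children}(d_t)\right\}, & d_t\neq d_i\end{cases}\tag{1}$$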
where ∆ denotes the semantic contribution decay factor, which is set to 0.5 in our study. The semantic value of disease di can be calculated based on the contribution of the ancestral diseases and disease di itself as follows:
$$DV1(d_i)=\sum_{d_t\in T(d_i)}D1_{d_i}(d_t)\tag{2}$$
The semantic similarity DSS1(di, dj) between diseases di and dj is defined by the following equation:
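$$DSS1(d_i,d_j)=\frac{\sum_{t\in T(d_i)\cap T(d_j)}\left(D1_{d_i}(t)+D1_{d_j}(t)\right)}{DV1(d_i)+DV1(d_j)}\tag{3}$$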
Chen et al. [26] proposed an alternative way to calculate the semantic contribution of disease dt to di:
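$$D2_{d_i}(d_t)=-\log\left(\frac{\text{number of DAGs containing } d_t}{\text{number of diseases}}\right)\tag{4}$$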
Correspondingly, the semantic value of disease di and the semantic similarity between disease di and dj can be calculated by equations (5) and (6), respectively,
$$DV2(d_i)=\sum_{d_t\in T(d_i)}D2_{d_i}(d_t)\tag{5}$$
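$$DSS2(d_i,d_j)=\frac{\sum_{t\in T(d_i)\cap T(d_j)}\left(D2_{d_i}(t)+D2_{d_j}(t)\right)}{DV2(d_i)+DV2(d_j)}\tag{6}$$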
In this study, the final semantic similarity SS(di, dj) between diseases di and dj combines these two semantic similarity measures, as given by Equation (7):
$$SS(d_i,d_j)=\frac{DSS1(d_i,d_j)+DSS2(d_i,d_j)}{2}\tag{7}$$
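The two semantic similarity measures can be illustrated with a short Python sketch. It assumes a toy DAG given as a hypothetical `parents` dictionary (each term mapped to its direct parents), takes the "number of diseases" in Equation (4) to be the length of the supplied `diseases` list (which must contain every queried disease), and assumes that Equation (7) combines the two measures by a simple average. The function names are illustrative and do not come from the MKGAT code.

```python
import math
from typing import Dict, List

ParentMap = Dict[str, List[str]]  # disease term -> its direct parents (hypothetical toy DAG)


def contributions_type1(d: str, parents: ParentMap, delta: float = 0.5) -> Dict[str, float]:
    """D1_d(t) for every t in T(d): 1 for d itself; for an ancestor, the maximum
    of delta * (contribution of one of its children), as in Equation (1)."""
    contrib = {d: 1.0}
    stack = [d]
    while stack:
        child = stack.pop()
        for parent in parents.get(child, []):
            value = delta * contrib[child]
            if value > contrib.get(parent, 0.0):  # keep the maximum over children
                contrib[parent] = value
                stack.append(parent)              # re-propagate the improved value
    return contrib


def contributions_type2(d: str, parents: ParentMap, diseases: List[str]) -> Dict[str, float]:
    """D2_d(t) = -log(#DAGs containing t / #diseases) for every t in T(d), as in Equation (4)."""
    dags = {x: set(contributions_type1(x, parents)) for x in diseases}
    n = len(diseases)
    return {t: -math.log(sum(t in dags[x] for x in diseases) / n) for t in dags[d]}


def dag_similarity(ci: Dict[str, float], cj: Dict[str, float]) -> float:
    """Shared-ancestor ratio used in Equations (3) and (6)."""
    shared = ci.keys() & cj.keys()
    numerator = sum(ci[t] + cj[t] for t in shared)
    denominator = sum(ci.values()) + sum(cj.values())
    return numerator / denominator if denominator > 0 else 0.0


def semantic_similarity(di: str, dj: str, parents: ParentMap, diseases: List[str]) -> float:
    """SS(di, dj), assuming Equation (7) is the average of DSS1 and DSS2."""
    dss1 = dag_similarity(contributions_type1(di, parents), contributions_type1(dj, parents))
    dss2 = dag_similarity(contributions_type2(di, parents, diseases),
                          contributions_type2(dj, parents, diseases))
    return (dss1 + dss2) / 2


if __name__ == "__main__":
    toy_parents = {"A": ["C"], "B": ["C"], "C": []}  # A and B share the parent C
    toy_diseases = ["A", "B", "C"]
    print(semantic_similarity("A", "B", toy_parents, toy_diseases))  # approximately 0.1667 (1/6)
```

In the toy example, DSS1 is 1/3, while DSS2 is 0 because the shared ancestor C appears in every DAG and therefore carries no information under Equation (4). This illustrates why the two measures are combined rather than used alone.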

GIP kernel similarity for diseases and miRNAs

Following reference [48], a binary vector IP(m(i)) is constructed to record the associations between miRNA m(i) and all diseases: its jth entry is set to 1 if there is an experimentally supported association between m(i) and disease d(j), and 0 otherwise. The GIP kernel similarity KM(mi, mj) between miRNAs mi and mj is then calculated as follows:
$$KM(m_i,m_j)=\exp\left(-r_m\left\lVert IP(m_i)-IP(m_j)\right\rVert^2\right)\tag{8}$$
where $r_m$ is the normalized kernel bandwidth, obtained from Equation (9):
$$r_m=\frac{r_m'}{\frac{1}{N_m}\sum_{i=1}^{N_m}\left\lVert IP(m_i)\right\rVert^2}\tag{9}$$
where $N_m$ is the number of miRNAs and $r_m'$ is the original bandwidth, which is set to 1 in our study. Similarly, with $IP(d_i)$ denoting the binary vector that records the associations between disease $d_i$ and all miRNAs, the GIP kernel similarity $KD(d_i, d_j)$ between diseases $d_i$ and $d_j$ is computed by the following two equations:
$$KD(d_i,d_j)=\exp\left(-r_d\left\lVert IP(d_i)-IP(d_j)\right\rVert^2\right)\tag{10}$$
$$r_d=\frac{r_d'}{\frac{1}{N_d}\sum_{i=1}^{N_d}\left\lVert IP(d_i)\right\rVert^2}\tag{11}$$
where $r_d$ is the normalized kernel bandwidth, $N_d$ is the number of diseases and $r_d'$ is set to 1.
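Equations (8) to (11) map directly onto a few NumPy operations. The following is a minimal sketch, assuming a small toy association matrix in place of the real 495 x 383 matrix A and the bandwidths $r_m' = r_d' = 1$ stated above. The name `gip_kernel` is illustrative. The miRNA kernel treats the rows of A as interaction profiles, and the disease kernel treats its columns as profiles.

```python
import numpy as np


def gip_kernel(profiles: np.ndarray, r_prime: float = 1.0) -> np.ndarray:
    """GIP kernel between the rows of a binary interaction-profile matrix.

    Row i of `profiles` plays the role of IP(x_i); the bandwidth is
    r = r_prime / mean_i ||IP(x_i)||^2, following Equations (9) and (11).
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    mean_sq_norm = sq_norms.mean()
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    r = r_prime / mean_sq_norm
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, clipped at 0 to absorb round-off
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    return np.exp(-r * np.maximum(sq_dists, 0.0))


# Toy association matrix: 3 miRNAs (rows) x 2 diseases (columns).
A = np.array([[1, 0],
              [1, 1],
              [0, 0]])
KM = gip_kernel(A)    # miRNA kernel, Equations (8)-(9)
KD = gip_kernel(A.T)  # disease kernel, Equations (10)-(11)
```

The vectorized squared-distance form avoids an explicit double loop over all pairs, which matters for a 495 x 495 miRNA kernel.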

Integrated similarities for miRNAs and diseases

Because not all miRNA pairs have a functional similarity score, the integrated similarity SM(mi, mj) between miRNAs mi and mj is calculated as follows:
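$$SM(m_i,m_j)=\begin{cases}FS(m_i,m_j), & \text{if } m_i \text{ and } m_j \text{ have functional similarity}\\ KM(m_i,m_j), & \text{otherwise}\end{cases}\tag{12}$$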
Similarly, the integrated similarity SD(di, dj) between diseases di and dj is calculated as follows:
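$$SD(d_i,d_j)=\begin{cases}SS(d_i,d_j), & \text{if } d_i \text{ and } d_j \text{ have semantic similarity}\\ KD(d_i,d_j), & \text{otherwise}\end{cases}\tag{13}$$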

This similarity integration strategy has been applied in references [23, 26, 49] for miRNA-disease association inference.
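The integration rule amounts to an element-wise selection between two matrices. In the sketch below, the function name is illustrative, and treating a pair as having a functional (or semantic) similarity score when its entry is nonzero is an assumption made only for this example.

```python
import numpy as np


def integrate_similarity(primary: np.ndarray, gip: np.ndarray, has_primary: np.ndarray) -> np.ndarray:
    """Use the primary similarity (FS or SS) where it exists, otherwise the GIP kernel."""
    return np.where(has_primary, primary, gip)


# Toy 3x3 example; the mask "FS > 0" is an illustrative assumption for
# "the pair has a functional similarity score".
FS = np.array([[1.0, 0.6, 0.0],
               [0.6, 1.0, 0.0],
               [0.0, 0.0, 1.0]])
KM = np.array([[1.0, 0.4, 0.3],
               [0.4, 1.0, 0.2],
               [0.3, 0.2, 1.0]])
SM = integrate_similarity(FS, KM, FS > 0)
```

The same function yields SD when it is called with the disease semantic similarity SS, the disease GIP kernel KD and the corresponding mask.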

miRNA-disease bipartite network

The miRNA-disease bipartite network G is defined by the adjacency matrix A and its transpose $A^T$:
$$G=\begin{bmatrix}\mathbf{0} & A\\ A^T & \mathbf{0}\end{bmatrix}\tag{14}$$

Method architecture

In this section, we introduce the architecture of MKGAT for miRNA-disease association prediction. The workflow of MKGAT is illustrated in Figure 1.

Figure 1. The workflow of MKGAT.

GATs for feature extraction

GATs [45], a neural network architecture for graph-structured data, are applied in our study to extract miRNA and disease features. Specifically, given the adjacency matrix G of the bipartite network defined above, the GAT layers are stacked as follows:
$$H^{(l)}=\sigma\left(\mathrm{GAT}\left(G, H^{(l-1)}\right)\right),\quad l=1,\dots,L\tag{15}$$
where $H^{(l)}$ is the embedding of the nodes at layer $l$ ($l = 1, \dots, L$), $\sigma(\cdot)$ is the nonlinear activation function (ReLU) and GAT denotes a single graph attention layer; the whole $L$-layer GAT architecture is a stack of such layers. The input to a layer is a set of node features $\mathbf{h}=\{\vec{h}_1,\vec{h}_2,\dots,\vec{h}_N\}$, $\vec{h}_i\in\mathbf{R}^F$, where $N$ is the number of nodes and $F$ is the number of features per node. The layer produces a new set of node features $\mathbf{h}'=\{\vec{h}_1',\vec{h}_2',\dots,\vec{h}_N'\}$, $\vec{h}_i'\in\mathbf{R}^{F'}$. To transform the input features into higher-level features, a learnable linear transformation parameterized by a weight matrix $W\in\mathbf{R}^{F'\times F}$ is applied to every node. We then compute the attention coefficients as
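$$e_{ij}=\mathrm{LeakyReLU}\left(\vec{\mathbf{a}}^T\left[W\vec{h}_i\,\Vert\,W\vec{h}_j\right]\right)\tag{16}$$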
After normalization with the softmax function over the neighbors of node i, the coefficients become
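$$\alpha_{ij}=\mathrm{softmax}_j\left(e_{ij}\right)=\frac{\exp\left(e_{ij}\right)}{\sum_{k\in N_i}\exp\left(e_{ik}\right)}\tag{17}$$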
Substituting Equation (16) into Equation (17), the attention coefficients can be written as
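$$\alpha_{ij}=\frac{\exp\left(\mathrm{LeakyReLU}\left(\vec{\mathbf{a}}^T\left[W\vec{h}_i\,\Vert\,W\vec{h}_j\right]\right)\right)}{\sum_{k\in N_i}\exp\left(\mathrm{LeakyReLU}\left(\vec{\mathbf{a}}^T\left[W\vec{h}_i\,\Vert\,W\vec{h}_k\right]\right)\right)}\tag{18}$$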
where $\alpha_{ij}$ is the attention coefficient, $\vec{\mathbf{a}}\in\mathbf{R}^{2F'}$ is the learnable weight vector, LeakyReLU is the activation function (with a negative slope of 0.2), $T$ denotes transposition, $\Vert$ is the concatenation operation and $N_i$ is the neighborhood of node $i$. After the normalized attention coefficients are calculated, the output feature of each node is computed as
$$\vec{h}_i'=\sigma\left(\sum_{j\in N_i}\alpha_{ij}W\vec{h}_j\right)\tag{19}$$
In our study, we construct the initial embedding H(0) of the first layer as follows:
$$H^{(0)}=\begin{bmatrix}SM & A\\ A^T & SD\end{bmatrix}\tag{20}$$
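A single-head graph attention layer following Equations (16) to (19) can be sketched in plain NumPy. Several details here are assumptions for illustration: the initial features follow the block form $[[SM, A], [A^T, SD]]$, self-loops are added so that every node attends at least to itself (as in the original GAT formulation), ReLU is applied once per layer, and the weights are random rather than trained with Adam. All function names are hypothetical.

```python
import numpy as np


def build_graph_and_features(A, SM, SD):
    """Bipartite adjacency G (Equation 14) and initial features H0 = [[SM, A], [A^T, SD]]."""
    Nm, Nd = A.shape
    G = np.block([[np.zeros((Nm, Nm)), A], [A.T, np.zeros((Nd, Nd))]])
    H0 = np.block([[SM, A], [A.T, SD]])
    return G, H0


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def gat_layer(H, G, W, a):
    """One single-head graph attention layer followed by ReLU.

    H: (N, F) node features, G: (N, N) adjacency, W: (F', F) weights, a: (2F',) attention vector.
    """
    N = H.shape[0]
    Wh = H @ W.T                                   # W h_i for every node, shape (N, F')
    Fp = Wh.shape[1]
    # a^T [W h_i || W h_j] splits into a[:F'] . W h_i + a[F':] . W h_j
    e = leaky_relu((Wh @ a[:Fp])[:, None] + (Wh @ a[Fp:])[None, :])
    neighbors = (G > 0) | np.eye(N, dtype=bool)    # N_i plus a self-loop
    e = np.where(neighbors, e, -np.inf)            # attend only to neighbors
    e = e - e.max(axis=1, keepdims=True)           # numerically stable softmax
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)      # attention coefficients alpha_ij
    return np.maximum(alpha @ Wh, 0.0)             # ReLU(sum_j alpha_ij W h_j)


rng = np.random.default_rng(0)
A = (rng.random((5, 4)) < 0.3).astype(float)       # toy 5 miRNAs x 4 diseases
SM, SD = np.eye(5), np.eye(4)                      # stand-in similarity matrices
G, H = build_graph_and_features(A, SM, SD)
embeddings = []
for out_dim in (16, 8, 4):                         # toy sizes; the text uses 128, 64, 32
    W = rng.normal(scale=0.1, size=(out_dim, H.shape[1]))
    a = rng.normal(scale=0.1, size=2 * out_dim)
    H = gat_layer(H, G, W, a)
    embeddings.append(H)                           # H_l for l = 1, 2, 3
```

Masking non-neighbors with $-\infty$ before the softmax restricts the normalization to $N_i$, which is exactly the denominator in Equation (17). Without self-loops, a node with no known associations would receive an undefined attention row.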

Kernel combination

The multi-layer GAT model computes embeddings at different layers, and the embedding of each layer captures different graph structure information. We therefore compute multiple kernel matrices by treating the embedding of each layer as a separate feature representation. We denote the embedding of layer $l$ as $H_l=\begin{bmatrix}H_l^m\\ H_l^d\end{bmatrix}\in\mathbf{R}^{(N_m+N_d)\times k_l}$, where $H_l^m\in\mathbf{R}^{N_m\times k_l}$ is the miRNA embedding at layer $l$, $H_l^d\in\mathbf{R}^{N_d\times k_l}$ is the disease embedding at layer $l$ and $k_l$ is the embedding dimension of layer $l$. We use GIP to calculate the kernel matrices of the miRNA and disease embeddings at each layer as follows:
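$$\mathbf{K}_{h_l}^m(i,j)=\exp\left(-\gamma_{h_l}\left\lVert H_l^m(i)-H_l^m(j)\right\rVert^2\right)\tag{21}$$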
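$$\mathbf{K}_{h_l}^d(i,j)=\exp\left(-\gamma_{h_l}\left\lVert H_l^d(i)-H_l^d(j)\right\rVert^2\right)\tag{22}$$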
where $H_l^m(i)$ and $H_l^d(i)$ denote the $i$th rows of the miRNA and disease embeddings at layer $l$, respectively, and $\gamma_{h_l}$ is the corresponding kernel bandwidth.
Because embeddings from different layers contribute differently, each kernel computed from a layer's embeddings can be viewed as a similarity between nodes from a different view. We collect these matrices into a kernel set in the miRNA space, $S^m=\{\mathbf{K}_s^m,\mathbf{K}_{h_1}^m,\dots,\mathbf{K}_{h_L}^m\}$, where $\mathbf{K}_s^m$ is the initial similarity matrix $SM$, and a kernel set in the disease space, $S^d=\{\mathbf{K}_s^d,\mathbf{K}_{h_1}^d,\dots,\mathbf{K}_{h_L}^d\}$, where $\mathbf{K}_s^d$ is the initial similarity matrix $SD$. We use an attention mechanism to combine the kernel matrices in each of the two spaces, and the final combined kernels are
$$\mathbf{K}_m=\sum_{i=1}^{L+1}a_i S_i^m\tag{23}$$
$$\mathbf{K}_d=\sum_{i=1}^{L+1}b_i S_i^d\tag{24}$$
where $S_i^m$ and $S_i^d$ are the $i$th kernels in the miRNA and disease kernel sets, $a_i$ and $b_i$ are the corresponding attention factors and $L$ is the number of layers, so each set contains $L+1$ kernels.
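The kernel construction and fusion steps can be sketched as follows. Random matrices stand in for the GAT embeddings, identity matrices stand in for SM and SD, and the bandwidth is $2^{-5}$ as reported in the experimental setting. Obtaining the fusion weights as a softmax over hypothetical learnable scores is an assumption made for this sketch. The text states only that the attention factors $a_i$ and $b_i$ are optimized with Adam.

```python
import numpy as np


def embedding_kernel(H: np.ndarray, gamma: float) -> np.ndarray:
    """K(i, j) = exp(-gamma * ||H(i) - H(j)||^2) over the rows of H (Equations 21-22)."""
    sq = np.sum(H ** 2, axis=1)
    sq_dists = np.maximum(sq[:, None] + sq[None, :] - 2.0 * H @ H.T, 0.0)
    return np.exp(-gamma * sq_dists)


def softmax(x: np.ndarray) -> np.ndarray:
    z = np.exp(x - np.max(x))
    return z / z.sum()


def fuse_kernels(kernels, scores):
    """Weighted sum of kernels (Equations 23-24); weights = softmax(scores) is an assumption."""
    weights = softmax(np.asarray(scores, dtype=float))
    return sum(w * K for w, K in zip(weights, kernels))


rng = np.random.default_rng(1)
Nm, Nd = 5, 4
layer_embeddings = [rng.normal(size=(Nm + Nd, k)) for k in (16, 8, 4)]  # stand-ins for H_1..H_3
SM, SD = np.eye(Nm), np.eye(Nd)                                          # stand-in initial similarities
mirna_kernels = [SM] + [embedding_kernel(H[:Nm], 2 ** -5) for H in layer_embeddings]
disease_kernels = [SD] + [embedding_kernel(H[Nm:], 2 ** -5) for H in layer_embeddings]
Km = fuse_kernels(mirna_kernels, scores=np.zeros(4))    # hypothetical scores behind a_i
Kd = fuse_kernels(disease_kernels, scores=np.zeros(4))  # hypothetical scores behind b_i
```

The first $N_m$ rows of each $H_l$ are the miRNA embeddings and the remaining $N_d$ rows are the disease embeddings, matching the block form of $H_l$. Because every input kernel has a unit diagonal and the softmax weights sum to 1, the fused kernels also have a unit diagonal.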

Dual Laplacian regularized least squares for prediction

We apply dual Laplacian regularized least squares [50] with the combined kernel matrices of the two feature spaces to predict potential associations between miRNAs and diseases. The loss function is defined as
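$$J=\left\lVert\mathbf{K}_m W_m+\left(\mathbf{K}_d W_d\right)^T-2A_{\mathrm{train}}\right\rVert_F^2+\lambda_m\,\mathrm{tr}\left(W_m^T\mathbf{K}_m L_m\mathbf{K}_m W_m\right)+\lambda_d\,\mathrm{tr}\left(W_d^T\mathbf{K}_d L_d\mathbf{K}_d W_d\right)\tag{25}$$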
where $\lVert\cdot\rVert_F$ is the Frobenius norm, $A_{\mathrm{train}}\in\mathbf{R}^{N_m\times N_d}$ is the adjacency matrix of miRNA-disease associations in the training set, $W_m\in\mathbf{R}^{N_m\times N_d}$ and $W_d\in\mathbf{R}^{N_d\times N_m}$ (so that $W_d^T\in\mathbf{R}^{N_m\times N_d}$) are trainable matrices, $\mathbf{K}_m\in\mathbf{R}^{N_m\times N_m}$ and $\mathbf{K}_d\in\mathbf{R}^{N_d\times N_d}$ are the combined kernels in the two feature spaces, and $\lambda_m$ and $\lambda_d$ are the coefficients of the regularization terms. $L_m\in\mathbf{R}^{N_m\times N_m}$ and $L_d\in\mathbf{R}^{N_d\times N_d}$ are the Laplacian regularization matrices, defined as follows:
$$L_m=\mathbf{D}_m^{-1/2}\left(\mathbf{D}_m-\mathbf{K}_m\right)\mathbf{D}_m^{-1/2}\tag{26}$$
$$L_d=\mathbf{D}_d^{-1/2}\left(\mathbf{D}_d-\mathbf{K}_d\right)\mathbf{D}_d^{-1/2}\tag{27}$$
where $\mathbf{D}_m(k,k)=\sum_{t=1}^{N_m}\mathbf{K}_m(k,t)$ and $\mathbf{D}_d(k,k)=\sum_{t=1}^{N_d}\mathbf{K}_d(k,t)$ are diagonal degree matrices. The final miRNA-disease association scores are predicted as follows:
$$A^{*}=\frac{\mathbf{K}_m W_m+\left(\mathbf{K}_d W_d\right)^T}{2}\tag{28}$$

Parameter optimization

We use the Adam optimizer [51] to optimize the GAT parameters and the attention factors for kernel fusion. For the parameters of dual Laplacian regularized least squares, we derive alternating closed-form updates by setting the partial derivatives of the loss function to zero. To optimize $W_m$, we fix $W_d$, treat it as a constant and calculate the partial derivative of the loss function with respect to $W_m$:
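$$\frac{\partial J}{\partial W_m}=2\mathbf{K}_m\left(\mathbf{K}_m W_m+\left(\mathbf{K}_d W_d\right)^T-2A_{\mathrm{train}}\right)+2\lambda_m\mathbf{K}_m L_m\mathbf{K}_m W_m\tag{29}$$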
Setting $\frac{\partial J}{\partial W_m}=0$ and assuming that $\mathbf{K}_m$ is invertible, we obtain
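$$W_m=\left(\mathbf{K}_m+\lambda_m L_m\mathbf{K}_m\right)^{-1}\left(2A_{\mathrm{train}}-\left(\mathbf{K}_d W_d\right)^T\right)\tag{30}$$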
Similarly, with $W_m$ fixed, the partial derivative of the loss function with respect to $W_d$ is
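$$\frac{\partial J}{\partial W_d}=2\mathbf{K}_d\left(\mathbf{K}_d W_d+\left(\mathbf{K}_m W_m\right)^T-2A_{\mathrm{train}}^T\right)+2\lambda_d\mathbf{K}_d L_d\mathbf{K}_d W_d\tag{31}$$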
Setting $\frac{\partial J}{\partial W_d}=0$, we obtain
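$$W_d=\left(\mathbf{K}_d+\lambda_d L_d\mathbf{K}_d\right)^{-1}\left(2A_{\mathrm{train}}^T-\left(\mathbf{K}_m W_m\right)^T\right)\tag{32}$$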
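The alternating updates can be sketched in NumPy under explicit assumptions. The objective is taken to be $\lVert\mathbf{K}_m W_m+(\mathbf{K}_d W_d)^T-2A_{\mathrm{train}}\rVert_F^2$ plus the two trace regularizers, with symmetric normalized Laplacians $L=\mathbf{D}^{-1/2}(\mathbf{D}-\mathbf{K})\mathbf{D}^{-1/2}$. Both kernels are assumed to be invertible. The fixed iteration count and the function names are hypothetical. Using `np.linalg.solve` avoids forming the matrix inverse explicitly. If a kernel is near-singular, adding a small ridge to its diagonal is a common workaround, but it is not part of the method described here.

```python
import numpy as np


def normalized_laplacian(K: np.ndarray) -> np.ndarray:
    """L = D^{-1/2} (D - K) D^{-1/2}, with D the diagonal row-sum (degree) matrix."""
    d = K.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return d_inv_sqrt[:, None] * (np.diag(d) - K) * d_inv_sqrt[None, :]


def dual_laplacian_rls(A_train, Km, Kd, lam_m, lam_d, n_iter=20):
    """Alternate the closed-form W_m and W_d updates, then return (Km Wm + (Kd Wd)^T) / 2."""
    Nm, Nd = A_train.shape
    Lm, Ld = normalized_laplacian(Km), normalized_laplacian(Kd)
    Mm = Km + lam_m * Lm @ Km          # (Km + lam_m Lm Km)
    Md = Kd + lam_d * Ld @ Kd          # (Kd + lam_d Ld Kd)
    Wm = np.zeros((Nm, Nd))
    Wd = np.zeros((Nd, Nm))
    for _ in range(n_iter):
        Wm = np.linalg.solve(Mm, 2 * A_train - (Kd @ Wd).T)    # W_m update with W_d fixed
        Wd = np.linalg.solve(Md, 2 * A_train.T - (Km @ Wm).T)  # W_d update with W_m fixed
    return (Km @ Wm + (Kd @ Wd).T) / 2


rng = np.random.default_rng(2)
A_train = (rng.random((6, 4)) < 0.3).astype(float)
Km = np.eye(6) + 0.1 * np.ones((6, 6))   # toy positive-definite kernels
Kd = np.eye(4) + 0.1 * np.ones((4, 4))
scores = dual_laplacian_rls(A_train, Km, Kd, lam_m=2 ** -3, lam_d=2 ** -3.7)
```

As a sanity check, with $\lambda_m=\lambda_d=0$ the first $W_m$ update already reproduces $A_{\mathrm{train}}$ exactly. The Laplacian terms are therefore what pull the predicted scores toward kernel-smooth values that can rank unobserved pairs.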

Results

Experimental setting

We use 5-fold cross-validation (5-CV) to evaluate the prediction model: all known miRNA-disease associations are randomly divided into five roughly equal parts, four of which are used for training and the remaining one for testing. To comprehensively evaluate performance, we also calculate recall (also known as sensitivity), specificity, accuracy, precision and F1-score as follows:
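$$\mathrm{Recall}=\frac{TP}{TP+FN}\tag{33}$$
$$\mathrm{Specificity}=\frac{TN}{TN+FP}\tag{34}$$
$$\mathrm{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}\tag{35}$$
$$\mathrm{Precision}=\frac{TP}{TP+FP}\tag{36}$$
$$F1=\frac{2\times\mathrm{Precision}\times\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}\tag{37}$$
$$TPR=\frac{TP}{TP+FN}\tag{38}$$
$$FPR=\frac{FP}{FP+TN}\tag{39}$$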
where TP and TN are the numbers of association pairs and non-association pairs that are correctly identified, respectively; FP is the number of non-association pairs incorrectly predicted as associations, and FN is the number of association pairs incorrectly predicted as non-associations. The receiver operating characteristic (ROC) curve plots the true positive rate (TPR) against the false positive rate (FPR) at different rank cutoffs, and the area under the ROC curve (AUROC) is calculated. Similarly, the precision-recall (PR) curve plots precision against recall at varying thresholds, and the area under the PR curve (AUPR) is computed.
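The evaluation measures can be computed from binary labels and scores with a few NumPy lines. This sketch uses two assumptions: AUROC is computed with the pairwise (Mann-Whitney) formulation, in which ties count one half, and AUPR is estimated by step-wise average precision. Average precision is one common estimator, and the text does not state which interpolation was used. The pairwise AUROC is only suitable for small inputs. For a full 495 x 383 score matrix, a rank-based or library implementation is preferable. The function names are illustrative.

```python
import numpy as np


def threshold_metrics(y_true, y_pred):
    """Recall, specificity, accuracy, precision and F1 from binary labels and predictions."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)   # non-associations predicted as associations
    fn = np.sum(y_true & ~y_pred)   # associations predicted as non-associations
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "recall": recall,
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / y_true.size,
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
    }


def auroc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    diff = scores[y_true][:, None] - scores[~y_true][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


def average_precision(y_true, scores):
    """Step-wise estimate of the area under the PR curve (ties broken by input order)."""
    y_true = np.asarray(y_true, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = y_true[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return precision_at_k[hits].sum() / hits.sum()


y = [1, 0, 1, 0]
s = [0.9, 0.8, 0.7, 0.1]
metrics = threshold_metrics(y, np.array(s) >= 0.5)   # illustrative 0.5 threshold
print(metrics["f1"], auroc(y, s), average_precision(y, s))
```

The threshold-based metrics require a cutoff, whereas AUROC and AUPR summarize the whole ranking. AUPR is more sensitive to the heavy class imbalance of the association matrix, which is why the AUPR gaps in Tables 1 and 2 are much larger than the AUROC gaps.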

The hyperparameters of MKGAT, such as the number of layers $L$, the embedding dimensions of the $L$ layers ($k_1, k_2, \dots, k_L$) and the learning rate $lr$, are tuned empirically. We finally set $L = 3$, $k_1 = 128$, $k_2 = 64$, $k_3 = 32$, $lr = 0.001$, $\lambda_m = 2^{-3}$, $\lambda_d = 2^{-3.7}$ and $\gamma_{h_1} = \gamma_{h_2} = \gamma_{h_3} = 2^{-5}$ in our study.

Effects of different kernels on prediction performance

In MKGAT, GATs are applied to extract the features of miRNAs and diseases. The multi-layer GAT model computes embeddings at different layers, and we fuse the kernel matrices derived from these embeddings with the initial similarities. In this section, we examine how the initial similarity measurements, the kernel matrices generated by each layer and the combined kernels affect association prediction. Since three GAT layers are used in our study, we use MKGAT-hl (l = 1, 2, 3) to denote the variant of MKGAT that uses only the kernel matrices obtained from layer l. We also evaluate a variant that uses only the miRNA similarity matrix SM and the disease similarity matrix SD (denoted MKGAT-sm). The ROC and PR curves from 5-fold cross-validation are shown in Figures 2 and 3, respectively, and the results are listed in Table 1.

Figure 2. ROC curves of MKGAT by ablation and 5-fold cross-validation tests.

Figure 3. PR curves of MKGAT by ablation and 5-fold cross-validation tests.

Table 1

Performance of MKGAT based on different kernels

Model      AUROC   AUPR    F1-score  Accuracy  Recall  Specificity  Precision
MKGAT-h1   0.8994  0.4346  0.4564    0.9673    0.4775  0.98168      0.4394
MKGAT-h2   0.9132  0.3705  0.4133    0.9629    0.4554  0.9779       0.3816
MKGAT-h3   0.9002  0.3824  0.4133    0.9652    0.4287  0.9811       0.4024
MKGAT-sm   0.9487  0.6354  0.6173    0.9779    0.6234  0.9883       0.6116
MKGAT      0.9627  0.7372  0.6910    0.9825    0.6900  0.9915       0.7044

Table 1 shows that MKGAT-h2 achieves a higher AUROC than MKGAT-h1 and MKGAT-h3, whereas MKGAT-h1 achieves a higher AUPR than MKGAT-h2 and MKGAT-h3. Because no single layer dominates on every metric, the kernels generated by different GAT layers appear to carry complementary information for prediction. The full MKGAT model performs best among all variants and outperforms both the single-layer variants and MKGAT-sm. This suggests that fusing the layer-wise kernels with the initial similarities through the attention mechanism improves prediction performance in our setting.

Performance comparison with other methods

We compare MKGAT with six state-of-the-art methods on the benchmark datasets using 5-fold cross-validation. The six baseline methods are as follows:

  • IMCMDA [26]: an inductive matrix completion model to complete missing miRNA-disease associations based on known miRNA-disease associations as well as integrated miRNA similarity and disease similarity.

  • LAGCN [52]: a computational model to predict drug-disease associations by combining embeddings from multiple graph convolution layers using an attention mechanism based on drug-disease heterogeneous networks.

  • VGAMF [40]: a variational graph autoencoder and matrix decomposition approach to infer associations between miRNAs and diseases.

  • NIMCGCN [36]: a neural inductive matrix completion approach based on graph convolutional networks to predict miRNA-disease associations.

  • NIMGSA [42]: a neural inductive matrix completion-based method that predicts miRNA-disease associations using a graph autoencoder (GAE) and a self-attention mechanism.

  • DFELMDA [43]: a computational approach that uses deep forest ensemble learning with autoencoder-based feature learning to predict miRNA-disease associations.

We download the source code from the links provided by these studies and set the parameters of each method according to its original experimental settings. The ROC and PR curves from 5-fold cross-validation are plotted in Figures 4 and 5, respectively, and the comparison results are listed in Table 2. As Table 2 shows, MKGAT outperforms the other six methods on all evaluation metrics, with the highest AUROC (0.9627) and AUPR (0.7372). Its AUROC and AUPR are higher than those of DFELMDA, the second-best method, by 1.4 and 16.7 percentage points, respectively. These results demonstrate that MKGAT is superior to the six baseline methods in predicting potential miRNA-disease associations.
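The improvements over DFELMDA quoted above (1.4 and 16.7) are absolute differences in percentage points, not relative gains. The quick Python check below uses the AUROC and AUPR values from Table 2 and computes both quantities. The helper name is illustrative.

```python
def improvement(ours: float, baseline: float):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    diff = ours - baseline
    return 100 * diff, 100 * diff / baseline


# AUROC and AUPR from Table 2 (MKGAT vs. DFELMDA).
for metric, ours, baseline in [("AUROC", 0.9627, 0.9488), ("AUPR", 0.7372, 0.5702)]:
    points, relative = improvement(ours, baseline)
    print(f"{metric}: +{points:.2f} points ({relative:.1f}% relative)")
# AUROC: +1.39 points (1.5% relative)
# AUPR: +16.70 points (29.3% relative)
```

In relative terms, the AUPR gain of about 29% over the second-best method is considerably larger than the headline 16.7-point figure suggests.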

Figure 4. ROC curves of seven methods by 5-fold cross-validation tests.

Figure 5. PR curves of seven methods by 5-fold cross-validation tests.

Table 2

Performance comparison of seven methods based on 5-fold cross-validations

Model      AUROC   AUPR    F1-score  Accuracy  Recall  Specificity  Precision
MKGAT      0.9627  0.7372  0.6910    0.9825    0.6900  0.9915       0.7044
DFELMDA    0.9488  0.5702  0.5597    0.9754    0.5462  0.9881       0.5768
NIMGSA     0.9308  0.4500  0.4595    0.9696    0.4517  0.9849       0.4689
NIMCGCN    0.9357  0.4653  0.4738    0.9699    0.4731  0.9845       0.4771
VGAMF      0.9134  0.4312  0.4501    0.9684    0.4519  0.9836       0.4515
LAGCN      0.9238  0.4717  0.4283    0.9559    0.5764  0.9671       0.3407
IMCMDA     0.8142  0.3507  0.3862    0.9265    0.4123  0.9571       0.3646

Case studies

To further validate the ability of MKGAT to discover new miRNA-disease associations, we conduct case studies on three important diseases: colon neoplasms, lung neoplasms and breast neoplasms. For each disease, we first remove all of its known associations from the matrix of 5430 known miRNA-disease associations. We then train MKGAT to predict new associations and keep the top 50 predictions for the disease of interest, since biologists are mainly interested in the top-ranked results. In addition, we make global predictions on the benchmark dataset, using all 5430 known associations together with the similarity measurements for training, and select the top 50 predicted associations for validation. Because the benchmark dataset was collected from HMDD v2.0, we look for confirmation in more recent online databases, namely dbDEMC and HMDD v3.
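The per-disease protocol above (hide every known association of one disease, retrain, then rank all miRNAs for that disease) can be sketched as follows. This is a minimal illustration, not MKGAT itself: `predict` stands for any trained model that returns a score matrix, and `toy_predict`, which simply propagates the remaining associations through a disease-similarity matrix, is a hypothetical stand-in so that the example runs.

```python
import numpy as np


def case_study_top_k(Y, disease, predict, k=50):
    """Rank miRNAs for one disease after hiding all of its known associations.

    Y       : (n_mirna, n_disease) binary matrix of known associations.
    disease : column index of the disease under study.
    predict : callable mapping a training matrix to a score matrix of the
              same shape (a placeholder for a trained model such as MKGAT).
    """
    Y_train = Y.copy()
    Y_train[:, disease] = 0  # the disease now looks like one with no known miRNAs
    scores = predict(Y_train)[:, disease]
    order = np.argsort(-scores, kind="stable")  # stable: ties keep miRNA index order
    return order[:k], scores[order[:k]]


def toy_predict(S_d):
    """Hypothetical scorer: spread associations through disease similarity S_d."""
    return lambda Y_train: Y_train @ S_d


# Toy data: disease 0 is very similar to disease 1, so its masked column
# inherits the miRNAs associated with disease 1.
Y = np.array([[1, 1, 0], [0, 1, 0], [1, 0, 1], [0, 0, 1]])
S_d = np.array([[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]])
top, top_scores = case_study_top_k(Y, disease=0, predict=toy_predict(S_d), k=2)
print(top)  # [0 1]
```

Because the masked column is all zeros, any score it receives must come from the similarity structure rather than from the hidden labels, which is what makes this setting a test of de novo prediction.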

Colon neoplasms are epithelial tumors with a high mortality rate [53]. Statistics show that they are one of the major causes of cancer-related deaths worldwide [54]. A refined, protein-rich and high-fat diet has been considered a possible cause of the disease [55]. Because colon tumors may remain silent for a long time in a large number of patients, early screening is an effective way to improve personalized treatment and prevention [54]. Studies have demonstrated that miRNA signatures mirror pathological changes in patients with colon neoplasms and that several miRNAs are promising diagnostic biomarkers [56–58]. We therefore use MKGAT to infer miRNAs related to colon neoplasms. All of the top 50 predictions are confirmed by HMDD v3 or dbDEMC, as listed in Table 3.

Table 3

The top 50 predicted miRNAs associated with colon neoplasms

Ranking  miRNA         Evidence       Ranking  miRNA         Evidence
1        hsa-mir-30a   dbDEMC, HMDD   26       hsa-mir-483   dbDEMC, HMDD
2        hsa-mir-137   dbDEMC, HMDD   27       hsa-mir-125a  dbDEMC, HMDD
3        hsa-mir-126   dbDEMC, HMDD   28       hsa-let-7e    dbDEMC, HMDD
4        hsa-mir-145   dbDEMC, HMDD   29       hsa-mir-151   dbDEMC
5        hsa-mir-17    dbDEMC, HMDD   30       hsa-mir-142   dbDEMC, HMDD
6        hsa-mir-424   dbDEMC, HMDD   31       hsa-mir-122   dbDEMC
7        hsa-mir-9     dbDEMC         32       hsa-mir-33a   dbDEMC, HMDD
8        hsa-mir-140   dbDEMC, HMDD   33       hsa-mir-429   dbDEMC
9        hsa-mir-205   HMDD           34       hsa-mir-454   dbDEMC
10       hsa-mir-133b  dbDEMC, HMDD   35       hsa-mir-28    dbDEMC
11       hsa-mir-23a   dbDEMC         36       hsa-mir-20a   dbDEMC, HMDD
12       hsa-mir-10b   dbDEMC, HMDD   37       hsa-mir-518c  dbDEMC, HMDD
13       hsa-mir-449a  dbDEMC, HMDD   38       hsa-let-7c    dbDEMC, HMDD
14       hsa-mir-106a  dbDEMC, HMDD   39       hsa-mir-1     dbDEMC, HMDD
15       hsa-mir-22    dbDEMC, HMDD   40       hsa-mir-561   HMDD
16       hsa-mir-218   dbDEMC, HMDD   41       hsa-mir-95    dbDEMC
17       hsa-let-7i    dbDEMC, HMDD   42       hsa-mir-101   dbDEMC, HMDD
18       hsa-mir-152   dbDEMC, HMDD   43       hsa-mir-143   dbDEMC, HMDD
19       hsa-mir-23b   dbDEMC, HMDD   44       hsa-mir-615   dbDEMC, HMDD
20       hsa-mir-622   dbDEMC, HMDD   45       hsa-mir-574   dbDEMC
21       hsa-mir-296   dbDEMC, HMDD   46       hsa-mir-367   HMDD
22       hsa-mir-302b  dbDEMC, HMDD   47       hsa-mir-370   dbDEMC
23       hsa-mir-330   dbDEMC, HMDD   48       hsa-mir-629   HMDD
24       hsa-mir-127   dbDEMC, HMDD   49       hsa-mir-141   dbDEMC, HMDD
25       hsa-mir-153   dbDEMC         50       hsa-mir-630   dbDEMC, HMDD

For lung neoplasms and breast neoplasms, we likewise remove their known associations, run the MKGAT prediction procedure and find that, for both cancers, all of the top 50 predictions are verified by existing independent databases. The results for the two diseases are shown in Tables 4 and 5, respectively.

Table 4

The top 50 predicted miRNAs associated with lung neoplasms

Ranking  miRNA         Evidence       Ranking  miRNA         Evidence
1        hsa-mir-486   dbDEMC, HMDD   26       hsa-mir-708   dbDEMC
2        hsa-mir-34a   dbDEMC, HMDD   27       hsa-mir-668   dbDEMC
3        hsa-mir-125b  dbDEMC, HMDD   28       hsa-mir-499a  HMDD
4        hsa-mir-93    dbDEMC, HMDD   29       hsa-let-7d    dbDEMC, HMDD
5        hsa-mir-155   dbDEMC, HMDD   30       hsa-mir-199a  dbDEMC, HMDD
6        hsa-mir-520d  dbDEMC         31       hsa-mir-181a  dbDEMC, HMDD
7        hsa-mir-16    dbDEMC, HMDD   32       hsa-mir-497   dbDEMC, HMDD
8        hsa-mir-145   dbDEMC, HMDD   33       hsa-mir-130a  dbDEMC, HMDD
9        hsa-mir-100   dbDEMC, HMDD   34       hsa-mir-30d   dbDEMC, HMDD
10       hsa-mir-27b   dbDEMC, HMDD   35       hsa-mir-223   dbDEMC, HMDD
11       hsa-mir-30e   dbDEMC, HMDD   36       hsa-mir-30a   dbDEMC, HMDD
12       hsa-mir-1     dbDEMC, HMDD   37       hsa-mir-384   dbDEMC
13       hsa-mir-205   dbDEMC, HMDD   38       hsa-mir-144   dbDEMC, HMDD
14       hsa-let-7g    dbDEMC, HMDD   39       hsa-mir-134   dbDEMC, HMDD
15       hsa-mir-21    dbDEMC, HMDD   40       hsa-mir-221   dbDEMC, HMDD
16       hsa-mir-193b  dbDEMC         41       hsa-mir-561   dbDEMC
17       hsa-mir-218   dbDEMC, HMDD   42       hsa-mir-101   dbDEMC, HMDD
18       hsa-mir-27a   dbDEMC, HMDD   43       hsa-mir-520b  dbDEMC, HMDD
19       hsa-mir-186   dbDEMC, HMDD   44       hsa-mir-449a  HMDD
20       hsa-let-7b    dbDEMC, HMDD   45       hsa-mir-135b  dbDEMC, HMDD
21       hsa-mir-196a  dbDEMC, HMDD   46       hsa-mir-520a  dbDEMC
22       hsa-mir-424   dbDEMC         47       hsa-mir-125a  dbDEMC, HMDD
23       hsa-mir-7     dbDEMC, HMDD   48       hsa-mir-135a  dbDEMC, HMDD
24       hsa-mir-487a  dbDEMC         49       hsa-mir-378a  dbDEMC
25       hsa-mir-148a  dbDEMC, HMDD   50       hsa-mir-663   dbDEMC, HMDD

Table 5

The top 50 predicted miRNAs associated with breast neoplasms

Ranking  miRNA         Evidence       Ranking  miRNA         Evidence
1        hsa-mir-142   HMDD           26       hsa-mir-375   dbDEMC, HMDD
2        hsa-mir-31    dbDEMC         27       hsa-mir-302a  dbDEMC, HMDD
3        hsa-mir-21    dbDEMC, HMDD   28       hsa-mir-34c   HMDD
4        hsa-mir-125b  dbDEMC, HMDD   29       hsa-mir-186   dbDEMC
5        hsa-mir-145   dbDEMC, HMDD   30       hsa-mir-129   dbDEMC, HMDD
6        hsa-mir-302b  dbDEMC, HMDD   31       hsa-mir-30a   dbDEMC, HMDD
7        hsa-mir-195   dbDEMC, HMDD   32       hsa-mir-521   dbDEMC
8        hsa-mir-302d  dbDEMC, HMDD   33       hsa-mir-217   dbDEMC
9        hsa-mir-302c  dbDEMC, HMDD   34       hsa-mir-411   dbDEMC, HMDD
10       hsa-mir-200c  dbDEMC, HMDD   35       hsa-mir-346   dbDEMC
11       hsa-mir-100   dbDEMC, HMDD   36       hsa-mir-29b   dbDEMC, HMDD
12       hsa-mir-181a  dbDEMC, HMDD   37       hsa-mir-218   dbDEMC, HMDD
13       hsa-mir-431   dbDEMC         38       hsa-mir-371   dbDEMC
14       hsa-mir-99a   dbDEMC, HMDD   39       hsa-mir-143   dbDEMC, HMDD
15       hsa-mir-542   dbDEMC         40       hsa-mir-29a   dbDEMC, HMDD
16       hsa-mir-34a   dbDEMC, HMDD   41       hsa-mir-181c  dbDEMC, HMDD
17       hsa-mir-93    dbDEMC, HMDD   42       hsa-mir-1266  dbDEMC, HMDD
18       hsa-mir-92a   dbDEMC, HMDD   43       hsa-mir-652   HMDD
19       hsa-mir-9     dbDEMC, HMDD   44       hsa-mir-140   HMDD
20       hsa-mir-150   dbDEMC, HMDD   45       hsa-mir-26b   dbDEMC, HMDD
21       hsa-mir-210   dbDEMC, HMDD   46       hsa-mir-138   dbDEMC, HMDD
22       hsa-mir-432   dbDEMC         47       hsa-mir-376a  dbDEMC, HMDD
23       hsa-mir-205   dbDEMC, HMDD   48       hsa-mir-365b  HMDD
24       hsa-mir-7     dbDEMC, HMDD   49       hsa-mir-135b  dbDEMC, HMDD
25       hsa-mir-221   dbDEMC, HMDD   50       hsa-mir-146b  HMDD

Among the top 50 predictions obtained by training on the full benchmark dataset, 48 associations are confirmed by HMDD v3 or dbDEMC; the remaining two (ranks 30 and 38) are marked NA. The results are listed in Table 6. These encouraging results demonstrate the usefulness of MKGAT for discovering disease-related miRNAs in real situations.
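The confirmation rate reported for the case studies is a simple count over the ranked list. A minimal sketch, assuming each prediction is a (miRNA, disease) pair and that database support is available as a lookup (the helper name `confirmed_in_top_k` is hypothetical), is shown below on four rows taken from Table 6.

```python
def confirmed_in_top_k(ranked_pairs, evidence, k=50):
    """Count top-k predicted (miRNA, disease) pairs that have database support.

    ranked_pairs: list of (mirna, disease) tuples, best-scoring first.
    evidence: dict mapping a pair to the databases reporting it; a missing
              key or an empty list is treated as "not available" (NA).
    Returns (number confirmed, fraction confirmed).
    """
    top = ranked_pairs[:k]
    if not top:
        return 0, 0.0
    hits = sum(1 for pair in top if evidence.get(pair))
    return hits, hits / len(top)


# Ranks 28-31 of Table 6, where rank 30 has no supporting evidence (NA).
ranked = [
    ("hsa-mir-34a", "Adenocarcinoma"),
    ("hsa-mir-196a", "Colon Neoplasms"),
    ("hsa-mir-31", "Osteosarcoma"),
    ("hsa-mir-7", "Ovarian Neoplasms"),
]
evidence = {
    ("hsa-mir-34a", "Adenocarcinoma"): ["HMDD"],
    ("hsa-mir-196a", "Colon Neoplasms"): ["dbDEMC", "HMDD"],
    ("hsa-mir-7", "Ovarian Neoplasms"): ["dbDEMC", "HMDD"],
}
hits, rate = confirmed_in_top_k(ranked, evidence, k=4)
print(hits, rate)  # 3 0.75
```

Applied to all 50 rows of Table 6, where only ranks 30 and 38 are NA, the same count gives 48 of 50, a confirmation rate of 0.96.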

Table 6

The top 50 predicted miRNA-disease associations by MKGAT

Ranking  miRNA         Disease                        Evidence
1        hsa-mir-143   Carcinoma Hepatocellular       dbDEMC, HMDD
2        hsa-mir-125a  Colorectal Neoplasms           dbDEMC, HMDD
3        hsa-mir-137   Glioma                         HMDD
4        hsa-mir-92a   Gastric Neoplasms              dbDEMC, HMDD
5        hsa-mir-92a   Glioma                         HMDD
6        hsa-mir-155   Multiple Myeloma               HMDD
7        hsa-mir-34b   Carcinoma Hepatocellular       dbDEMC, HMDD
8        hsa-mir-16    Leukemia Myeloid Acute         dbDEMC, HMDD
9        hsa-mir-29b   Gastric Neoplasms              dbDEMC, HMDD
10       hsa-mir-125b  Lymphoma                       dbDEMC, HMDD
11       hsa-mir-15a   Colorectal Neoplasms           dbDEMC, HMDD
12       hsa-mir-142   Esophageal Neoplasms           dbDEMC, HMDD
13       hsa-mir-17    Carcinoma Squamous Cell        HMDD
14       hsa-mir-223   Colorectal Neoplasms           dbDEMC, HMDD
15       hsa-mir-1     Esophageal Neoplasms           dbDEMC, HMDD
16       hsa-mir-181b  Melanoma                       dbDEMC
17       hsa-mir-200b  Esophageal Neoplasms           dbDEMC, HMDD
18       hsa-mir-155   Prostatic Neoplasms            dbDEMC
19       hsa-mir-1     Pancreatic Neoplasms           dbDEMC, HMDD
20       hsa-mir-10b   Melanoma                       dbDEMC, HMDD
21       hsa-mir-200a  Head and Neck Neoplasms        dbDEMC, HMDD
22       hsa-mir-375   Carcinoma Non-Small-Cell Lung  HMDD
23       hsa-mir-19a   Glioma                         HMDD
24       hsa-mir-15a   Gastric Neoplasms              dbDEMC, HMDD
25       hsa-mir-142   Breast Neoplasms               dbDEMC, HMDD
26       hsa-mir-221   Carcinoma Squamous Cell        dbDEMC, HMDD
27       hsa-mir-92a   Melanoma                       dbDEMC, HMDD
28       hsa-mir-34a   Adenocarcinoma                 HMDD
29       hsa-mir-196a  Colon Neoplasms                dbDEMC, HMDD
30       hsa-mir-31    Osteosarcoma                   NA
31       hsa-mir-7     Ovarian Neoplasms              dbDEMC, HMDD
32       hsa-mir-203   Gastric Neoplasms              dbDEMC, HMDD
33       hsa-mir-9     Carcinoma Hepatocellular       dbDEMC, HMDD
34       hsa-mir-141   Adenocarcinoma                 HMDD
35       hsa-mir-205   Gastric Neoplasms              dbDEMC, HMDD
36       hsa-mir-200c  Uterine Cervical Neoplasms     dbDEMC
37       hsa-mir-146b  Carcinoma Squamous Cell        dbDEMC, HMDD
38       hsa-mir-137   Medulloblastoma                NA
39       hsa-mir-19b   Glioma                         HMDD
40       hsa-mir-106b  Colorectal Neoplasms           dbDEMC, HMDD
41       hsa-mir-183   Carcinoma Squamous Cell        dbDEMC, HMDD
42       hsa-mir-218   Endometrial Neoplasms          HMDD
43       hsa-mir-30c   Melanoma                       dbDEMC
44       hsa-mir-34c   Leukemia Myeloid Acute         dbDEMC, HMDD
45       hsa-mir-21    Leukemia Myeloid Acute         dbDEMC, HMDD
46       hsa-mir-210   Gastric Neoplasms              dbDEMC, HMDD
47       hsa-mir-15a   Carcinoma Renal Cell           HMDD
48       hsa-mir-183   Colon Neoplasms                dbDEMC, HMDD
49       hsa-mir-192   Carcinoma Renal Cell           HMDD
50       hsa-mir-125b  Adenocarcinoma                 HMDD

NA indicates not available.

Conclusion

Recent advances in the life sciences suggest that miRNAs play critical roles in regulating development and physiology, and miRNAs are therefore becoming an important category of biomarkers for disease diagnosis. Computational prediction of disease-related miRNAs is an efficient alternative to time-consuming and costly biomedical experiments. In this paper, we propose a computational framework, MKGAT, to discover associations between miRNAs and diseases. Comprehensive experiments, including cross-validation and case studies, show the effectiveness and superiority of MKGAT in revealing disease-related miRNAs.

MKGAT consists of two main components. The first uses attention mechanisms for feature extraction: GATs learn miRNA and disease embeddings, and an attention mechanism fuses the resulting kernel matrices. Our experiments show that this design yields more reliable information for inference. The second component is dual Laplacian regularized least squares for prediction. As a semi-supervised method, it makes full use of information from both the miRNA side and the disease side. MKGAT also differs from HGANMDA [44], which uses a hierarchical GAT for inference, in three major respects. First, MKGAT uses GIP to calculate the kernel matrices. Second, MKGAT applies an attention mechanism to combine the multiple kernel matrices. Third, MKGAT uses a different strategy for the final predictions. Unlike supervised methods for miRNA-disease association prediction, our framework does not require negative samples. As stated in the Introduction, negative samples for miRNA-disease association prediction are hard to collect, and randomly selected negative samples tend to produce less satisfactory results.
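To make the kernel and prediction stages concrete, the following is a minimal NumPy sketch under stated assumptions; it is not the authors' implementation. The GAT embeddings are taken as given (random placeholders here); the GIP bandwidth uses the usual normalization by the mean squared norm; the attention weights are a softmax over fixed, hypothetical logits rather than learned scores; and prediction uses one common closed form of Laplacian regularized least squares, F = K(K + βLK)^{-1}Y, applied on each side and averaged. MKGAT's exact objective, its handling of the initial similarity matrices and its hyperparameters may differ, and all function names are hypothetical.

```python
import numpy as np


def gip_kernel(E, gamma_prime=1.0):
    """GIP kernel over rows of E: K[i, j] = exp(-gamma * ||E_i - E_j||^2).

    gamma is gamma_prime divided by the mean squared row norm of E.
    """
    sq_norms = np.sum(E ** 2, axis=1)
    gamma = gamma_prime / np.mean(sq_norms)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * E @ E.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clip tiny negatives


def attention_fuse(kernels, logits):
    """Weighted sum of kernels with softmax weights (fixed logits in this sketch)."""
    w = np.exp(logits - np.max(logits))
    w /= w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))


def normalized_laplacian(K):
    """Symmetric normalized Laplacian I - D^{-1/2} K D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
    return np.eye(len(K)) - d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]


def lap_rls(K, Y, beta):
    """One-sided Laplacian RLS scores K (K + beta L K)^{-1} Y.

    For invertible K this equals (I + beta L)^{-1} Y, which is solved
    directly because it is better conditioned.
    """
    L = normalized_laplacian(K)
    return np.linalg.solve(np.eye(len(K)) + beta * L, Y)


def dual_lap_rls(K_m, K_d, Y, beta_m=1.0, beta_d=1.0):
    """Average the miRNA-side and disease-side predictions."""
    F_m = lap_rls(K_m, Y, beta_m)    # shape (n_mirna, n_disease)
    F_d = lap_rls(K_d, Y.T, beta_d)  # shape (n_disease, n_mirna)
    return 0.5 * (F_m + F_d.T)


rng = np.random.default_rng(0)
Y = (rng.random((6, 4)) < 0.3).astype(float)               # toy association matrix
E_m_layers = [rng.normal(size=(6, 8)) for _ in range(2)]   # placeholder GAT outputs
E_d_layers = [rng.normal(size=(4, 8)) for _ in range(2)]
K_m = attention_fuse([gip_kernel(E) for E in E_m_layers], np.array([0.2, -0.1]))
K_d = attention_fuse([gip_kernel(E) for E in E_d_layers], np.array([0.0, 0.5]))
F = dual_lap_rls(K_m, K_d, Y)
print(F.shape)  # (6, 4)
```

Because no term in this objective penalizes unobserved pairs as negatives, the sketch also illustrates why such a scheme needs no sampled negative examples: unknown entries of Y simply enter as zeros to be smoothed over the similarity graphs.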

It should be noted that our method depends heavily on similarity measurements for inference. As stated in a previous study [59], integrating appropriate features for similarity calculation is challenging because of limited data availability in biomedical science; for example, some diseases may not be included in the MeSH database. Moreover, as noted in the Introduction, parameter tuning and optimization in MKGAT are tricky, a common problem in deep learning methods. Finally, miRNAs can either up- or downregulate gene expression when functioning as regulators in the progression of human diseases, but we do not distinguish these regulation patterns in this study. This is a direction for our future research.

Key Points
  • We propose a computational framework, MKGAT, for miRNA-disease association prediction, in which GATs are used for feature extraction and dual Laplacian regularized least squares are used for association inference.

  • MKGAT improves prediction accuracy and outperforms six state-of-the-art methods in comprehensive experiments on a benchmark dataset under 5-fold cross-validation.

  • Case studies on three cancers indicate that MKGAT is effective at revealing disease-related miRNAs.

Authors’ contribution

H.C. conceived and designed this study. W.W. implemented the experiments. W.W. and H.C. analyzed the results. W.W. and H.C. wrote the manuscript. Both authors read and approved the final manuscript.

Funding

National Natural Science Foundation of China (61862026).

Conflict of Interest

None declared.

Data availability

The datasets and source codes used in this study are freely available at https://github.com/shine-lucky/MKGAT-main.

Author Biographies

Wengang Wang is a graduate student at the School of Software, East China Jiaotong University. His research interests are deep learning and bioinformatics.

Hailin Chen, PhD, is an associate professor at the School of Software, East China Jiaotong University. His research interests include data mining and bioinformatics.

References

1. Lee RC, Feinbaum RL, Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 1993;75:843–54.

2.

Ambros
V
.
The functions of animal microRNAs
.
Nature
2004
;
431
:
350
5
.

3.

Reinhart
BJ
,
Slack
FJ
,
Basson
M
, et al.
The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans
.
Nature
2000
;
403
:
901
6
.

4.

Brennecke
J
,
Hipfner
DR
,
Stark
A
, et al.
Bantam encodes a developmentally regulated microRNA that controls cell proliferation and regulates the proapoptotic gene hid in drosophila
.
Cell
2003
;
113
:
25
36
.

5.

Cui
Q
,
Yu
Z
,
Purisima
EO
, et al.
Principles of microRNA regulation of a human cellular signaling network
.
Mol Syst Biol
2006
;
2
:
46
.

6.

An
Y
,
Gao
S
,
Zhao
W
, et al.
Novel serum microRNAs panel on the diagnostic and prognostic implications of hepatocellular carcinoma
.
World J Gastroenterol
2018
;
24
:
2596
604
.

7.

Yang
Z
,
Wu
L
,
Wang
A
, et al.
dbDEMC 2.0: updated database of differentially expressed miRNAs in human cancers
.
Nucleic Acids Res
2017
;
45
:
D812
8
.

8.

Huang
Z
,
Shi
J
,
Gao
Y
, et al.
HMDD v3.0: a database for experimentally supported human microRNA–disease associations
.
Nucleic Acids Res
2019
;
47
:
D1013
7
.

9.

Jiang
Q
,
Wang
Y
,
Hao
Y
, et al.
miR2Disease: a manually curated database for microRNA deregulation in human disease
.
Nucleic Acids Res
2009
;
37
:
D98
104
.

10.

Huang
Z
,
Liu
L
,
Gao
Y
, et al.
Benchmark of computational methods for predicting microRNA-disease associations
.
Genome Biol
2019
;
20
:1–13.

11.

Chen
X
,
Xie
D
,
Zhao
Q
, et al.
MicroRNAs and complex diseases: from experimental results to computational models
.
Brief Bioinform
2019
;
20
:
515
39
.

12.

Gu
C
,
Liao
B
,
Li
X
, et al.
Network consistency projection for human miRNA-disease associations inference
.
Sci Rep
2016
;
6
:
36054
.

13.

Altshuler
D
,
Daly
M
,
Kruglyak
L
.
Guilt by association
.
Nat Genet
2000
;
26
:
135
7
.

14. Chen X, Qu J, Yin J. TLHNMDA: triple layer heterogeneous network based inference for miRNA-disease association prediction. Front Genet 2018;9:234.

15. Wang W, Yang S, Li J. Drug target predictions based on heterogeneous graph inference. Pac Symp Biocomput 2013;2013:53–64.

16. Chen X, Zhang D, You Z. A heterogeneous label propagation approach to explore the potential associations between miRNA and disease. J Transl Med 2018;16:1–14.

17. Zhou T, Ren J, Medo M, et al. Bipartite network projection and personal recommendation. Phys Rev E Stat Nonlin Soft Matter Phys 2007;76:046115.

18. Chen X, Xie D, Wang L, et al. BNPMDA: bipartite network projection for miRNA–disease association prediction. Bioinformatics 2018;34:3178–86.

19. Chen M, Zhang Y, Li A, et al. Bipartite heterogeneous network method based on co-neighbor for miRNA-disease association prediction. Front Genet 2019;10:385.

20. Zhang W, Li Z, Guo W, et al. A fast linear neighborhood similarity-based network link inference method to predict microRNA-disease associations. IEEE/ACM Trans Comput Biol Bioinform 2019;18:405–15.

21. Li A, Deng Y, Tan Y, et al. A novel miRNA-disease association prediction model using dual random walk with restart and space projection federated method. PLoS One 2021;16:e252971.

22. Pasquier C, Gardès J. Prediction of miRNA-disease associations with a vector space model. Sci Rep 2016;6:1–10.

23. Chen X, Huang L. LRSSLMDA: Laplacian regularized sparse subspace learning for miRNA-disease association prediction. PLoS Comput Biol 2017;13:e1005912.

24. Luo J, Xiao Q, Liang C, et al. Predicting microRNA-disease associations using Kronecker regularized least squares based on heterogeneous omics data. IEEE Access 2017;5:2503–13.

25. Chen X, Wu Q, Yan G. RKNNMDA: ranking-based KNN for miRNA-disease association prediction. RNA Biol 2017;14:952–62.

26. Chen X, Wang L, Qu J, et al. Predicting miRNA–disease association based on inductive matrix completion. Bioinformatics 2018;34:4256–65.

27. Chen X, Wang C, Yin J, et al. Novel human miRNA-disease association inference based on random forest. Mol Ther Nucleic Acids 2018;13:568–79.

28. Xuan P, Li L, Zhang T, et al. Prediction of disease-related microRNAs through integrating attributes of microRNA nodes and multiple kinds of connecting edges. Molecules 2019;24:3099.

29. Chen X, Zhu C, Yin J. Ensemble of decision tree reveals potential miRNA-disease associations. PLoS Comput Biol 2019;15:e1007209.

30. Chen H, Zhang Z, Feng D. Prediction and interpretation of miRNA-disease associations based on miRNA target genes using canonical correlation analysis. BMC Bioinformatics 2019;20:1–8.

31. Chen X, Li SX, Yin J, et al. Potential miRNA-disease association prediction based on kernelized Bayesian matrix factorization. Genomics 2020;112:809–19.

32. Ji B, You Z, Cheng L, et al. Predicting miRNA-disease association from heterogeneous information network with GraRep embedding model. Sci Rep 2020;10:1–12.

33. Wang YT, Wu QW, Gao Z, et al. MiRNA-disease association prediction via hypergraph learning based on high-dimensionality features. BMC Med Inform Decis Mak 2021;21:133.

34. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521:436–44.

35. Peng J, Hui W, Li Q, et al. A learning-based framework for miRNA-disease association identification using neural networks. Bioinformatics 2019;35:4364–71.

36. Li J, Zhang S, Liu T, et al. Neural inductive matrix completion with graph convolutional networks for miRNA-disease association prediction. Bioinformatics 2020;36:2538–46.

37. Tang X, Luo J, Shen C, et al. Multi-view multichannel attention graph convolutional network for miRNA–disease association prediction. Brief Bioinform 2021;22:bbab174.

38. Ding Y, Tian L, Lei X, et al. Variational graph auto-encoders for miRNA-disease association prediction. Methods 2021;192:25–34.

39. Wang C, Li T, Huang L, et al. Prediction of potential miRNA–disease associations based on stacked autoencoder. Brief Bioinform 2022;23:bbac021.

40. Ding Y, Lei X, Liao B, et al. Predicting miRNA-disease associations based on multi-view variational graph auto-encoder with matrix factorization. IEEE J Biomed Health Inform 2022;26:446–57.

41. Xuan P, Wang D, Cui H, et al. Integration of pairwise neighbor topologies and miRNA family and cluster attributes for miRNA-disease association prediction. Brief Bioinform 2022;23:bbab428.

42. Jin C, Shi Z, Lin K, et al. Predicting miRNA-disease association based on neural inductive matrix completion with graph autoencoders and self-attention mechanism. Biomolecules 2022;12:64.

43. Liu W, Lin H, Huang L, et al. Identification of miRNA–disease associations via deep forest ensemble learning based on autoencoder. Brief Bioinform 2022;23:bbac104.

44. Li Z, Zhong T, Huang D, et al. Hierarchical graph attention network for miRNA-disease association prediction. Mol Ther 2022;30:1775–86.

45. Veličković P, Cucurull G, Casanova A, et al. Graph attention networks. arXiv preprint arXiv:1710.10903 2017.

46. Li Y, Qiu C, Tu J, et al. HMDD v2.0: a database for experimentally supported human microRNA and disease associations. Nucleic Acids Res 2013;42:D1070–4.

47. Wang D, Wang J, Lu M, et al. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics 2010;26:1644–50.

48. van Laarhoven T, Nabuurs SB, Marchiori E. Gaussian interaction profile kernels for predicting drug–target interaction. Bioinformatics 2011;27:3036–43.

49. Li Z, Li J, Nie R, et al. A graph auto-encoder model for miRNA-disease associations prediction. Brief Bioinform 2021;22:bbaa240.

50. Belkin M, Niyogi P, Sindhwani V. Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 2006;7:2399–434.

51. Kingma DP, Ba JL. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 2014.

52. Yu Z, Huang F, Zhao X, et al. Predicting drug–disease associations through layer attention graph convolutional network. Brief Bioinform 2021;22:bbaa243.

53. Ahmed M. Colon cancer: a clinician's perspective in 2019. Gastroenterology Res 2020;13:1–10.

54. Mattiuzzi C, Sanchis-Gomar F, Lippi G. Concise update on colorectal cancer epidemiology. Ann Transl Med 2019;7:609.

55. Vogel VG, McPherson RS. Dietary epidemiology of colon cancer. Hematol Oncol Clin North Am 1989;3:35–63.

56. Nakajima G, Hayashi K, Xi Y, et al. Non-coding microRNAs hsa-let-7g and hsa-miR-181b are associated with chemoresponse to S-1 in colon cancer. Cancer Genom Proteom 2006;3:317–24.

57. Ogata-Kawata H, Izumiya M, Kurioka D, et al. Circulating exosomal microRNAs as biomarkers of colon cancer. PLoS One 2014;9:e92921.

58. Chen LG, Xia YJ, Cui Y. Upregulation of miR-101 enhances the cytotoxic effect of anticancer drugs through inhibition of colon cancer cell proliferation. Oncol Rep 2017;38:100–8.

59. Chen H, Guo R, Li G, et al. Comparative analysis of similarity measurements in miRNAs with applications to miRNA-disease association predictions. BMC Bioinformatics 2020;21:1–14.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)