Abstract

MicroRNAs (miRNAs) are a family of non-coding RNA molecules with vital roles in regulating gene expression. Although researchers have recognized the importance of miRNAs in the development of human diseases, it is very resource-consuming to use experimental methods for identifying which dysregulated miRNA is associated with a specific disease. To reduce the cost of human effort, a growing body of studies has leveraged computational methods for predicting potential miRNA–disease associations. However, the extant computational methods usually ignore the crucial mediating role of genes and suffer from the data sparsity problem. To address this limitation, we introduce the multi-task learning technique and develop a new model called MTLMDA (Multi-Task Learning model for predicting potential MicroRNA-Disease Associations). Different from existing models that only learn from the miRNA–disease network, our MTLMDA model exploits both miRNA–disease and gene–disease networks for improving the identification of miRNA–disease associations. To evaluate model performance, we compare our model with competitive baselines on a real-world dataset of experimentally supported miRNA–disease associations. Empirical results show that our model performs best across various performance metrics. We also examine the effectiveness of model components via an ablation study and further showcase the predictive power of our model for six types of common cancers. The data and source code are available from https://github.com/qwslle/MTLMDA.

INTRODUCTION

MicroRNAs (miRNAs) are a family of non-coding RNA molecules in animal species [1]. The first miRNA (called lin-4), a molecule |$22$| ribonucleotides long, was discovered in |$1993$| [2]. Although miRNAs are tiny, they play a substantial role in biological processes by regulating different genes. These tiny molecules are 21–25 nucleotides in length and act as negative regulators of gene expression. At present, a large body of scientific evidence has confirmed the significance of these tiny molecules in animal cell death and proliferation, hematopoiesis and nervous system patterning [3, 4]. Accordingly, abnormal expression of human miRNAs may lead to various serious diseases.

Cancer pathogenesis is, in essence, the result of genomic dysfunction, and the molecular characterization of miRNAs in different malignancies suggests that they are not only actively involved in the pathogenesis of human cancers but also play an important role in the recovery process of patients [5]. Furthermore, according to statistics, about 200 miRNAs are significantly dysregulated in various malignancies [6]. For example, in vitro experiments found that down-regulation of the let-7 family in humans can up-regulate Rat sarcoma (RAS) proteins, thus leading to lung cancer. Among differentially expressed miRNAs in breast cancer, miR-10b, miR-125b and miR-145 were down-regulated while miR-21 and miR-155 were up-regulated, suggesting that they may function as tumor suppressor genes or oncogenes [7, 8]. Overall, these malignant diseases largely develop under the regulation of miRNAs through corresponding genes.

It is well known that Coronavirus disease 2019 (COVID-19) is a contagious disease caused by a coronavirus, which quickly spread worldwide and resulted in the COVID-19 pandemic. To date, its infection is considered one of the leading causes of human death. In addition to vaccines already on the market, miRNAs may also be promising options against the new virus. miRNAs can inhibit viral translation after attaching to the 3′-UTR of the viral genome and can also target receptors and structural or nonstructural proteins of severe acute respiratory syndrome coronavirus 2 without affecting human gene expression [9]. MiR-618 is reported to be expressed 1.62 times higher in COVID-19 patients than in healthy people and is associated with down-regulation of the immune system [10]. Therefore, miR-618 could be a promising target for the treatment of COVID-19 patients. In short, a deep understanding of the potential relationships between miRNAs and diseases is of great significance to human life and health.

Traditional wet-lab experimental methods such as PCR [11], microarray profiling [12] and northern blotting [13] require substantial time and economic costs in the process of exploring and understanding miRNAs. Benefiting from the rapid development of computer technology and the gradual establishment and continuous improvement of a large number of relevant bioinformatics databases, the use of computational methods to explore the relationship between miRNAs and diseases has become an essential technical means.

Towards this line of research, over the past few decades, many computational methods have been proposed to investigate the miRNA–disease relationship. Overall, existing research on predicting the relationship between relevant miRNAs and diseases can be mainly divided into two categories: network-based methods and machine learning (ML)-based methods. For network-based methods, disease similarity, miRNA similarity and known miRNA–disease associations are usually used to predict potential relationships between diseases and miRNAs [14–17]. For ML-based methods, researchers utilize different ML techniques to construct prediction models for identifying potential miRNA–disease associations [18–21].

However, existing solutions to predict the miRNA–disease relationship still suffer from the following two challenges: (1) Despite the continuous progress of science and technology, the set of discovered relationships between miRNAs and diseases is still in its infancy and sparse, which might be insufficient to train an accurate model. (2) Disease–disease similarity and miRNA–miRNA similarity are key components in most miRNA–disease association prediction models. However, predictive models based on these similarities are not robust enough, and directly calculating the similarity between miRNAs and diseases fails to capture deep interaction patterns between them.

To tackle the above two challenges, in this paper, we propose a novel approach to examine the connections between miRNAs and diseases by introducing the concept of multi-task learning. We develop a multi-task learning framework that models the miRNA–disease network jointly with a gene–disease network. It is worth noting that the two tasks are not independent of each other but are instead highly correlated because they share disease nodes. Consequently, disease nodes are more likely to have a similar proximity structure in the two networks and to share similar features in non-task-specific latent feature spaces [22]. By sharing the knowledge contained in the different tasks, our approach effectively improves the generalization performance of all tasks while also mitigating the issue of data sparsity [23]. The contributions of our work can be summarized in three aspects.

  • i.

    We propose an effective Multi-Task Learning model for Predicting potential miRNA–disease Associations (MTLMDA), which is an end-to-end trainable graph neural network using a GCN-based autoencoder and a linear decoder. In MTLMDA, two sub-networks are constructed from miRNA–disease and gene–disease associations and are bridged by specially designed cross&compress units. The model uses the information learned in the gene–disease sub-network to assist the miRNA–disease sub-network.

  • ii.

    To the best of our knowledge, this is the first work to simultaneously consider miRNAs, genes and diseases in a multi-task learning framework. Through the auxiliary information of gene–disease relationships, we can mine more relevant information, which helps obtain a more in-depth understanding of the relationships between miRNAs and diseases.

  • iii.

    We verify the effectiveness of our approach via experiments on the HMDD v2.0 and HMDD v3.2 datasets. Experimental results demonstrate that our model performs better than state-of-the-art approaches, i.e. it can predict miRNA–disease relationships effectively and accurately.

RELATED WORK

Our study is mainly related to the following two topics: network-based methods and machine learning-based methods for predicting the relationship between miRNAs and diseases.

Network-based methods

In network-based methods, the computation of various similarities is a key component. Most network-based methods are based on the hypothesis that miRNAs with similar functions are more likely to be associated with similar diseases and vice versa. This hypothesis was first confirmed by Lu et al. [14] through their designed experiments, and the research established a theoretical basis for the study of miRNA–disease associations. Moreover, Gu et al. [15] proposed a computational approach based on this theory to infer disease-related miRNAs, which predicts miRNA–disease associations for all diseases without requiring negative samples. Chen et al. [24] adopted a global network similarity measure, different from traditional local network similarity measures, and developed an algorithm using random walks with restart to identify miRNA–disease associations. However, it ignores disease phenotypic similarity information and does not work well when there are no connections between miRNAs and diseases. Xuan et al. [16] addressed the shortcomings of those methods that did not consider the prior information of the local topology of nodes in the network, and built a miRNA network based on the functional similarity of miRNAs.

Disease nodes with known related miRNAs can be divided into two types: labeled (i.e. having an association) and unlabeled (i.e. having no association or an unknown relationship). Disease similarities were obtained by extending the walk on the miRNA–disease bipartite network for those unlabeled links, and the prediction process was then modeled as a random walk on the miRNA network starting from miRNA-related diseases. For example, Chen et al. [25] developed a model named WBSMDA to predict potential miRNAs associated with various complex diseases. By integrating disease and miRNA similarities, WBSMDA can be applied to diseases without any known associated miRNAs. However, Wang et al. [26] pointed out that prediction results based on similarity models were not robust enough. Li et al. [27] proposed a label propagation model with linear neighborhood similarity for undiscovered disease–miRNA association prediction by transforming the disease and miRNA similarities into linear neighborhood similarities; known miRNA–disease associations were used as input for label propagation, and miRNA–disease associations were scored. Compared with previous models, Chen et al. [17] proposed a new model (i.e. IMCMDA), which used the known associations and the integrated miRNA and disease similarities to discover potential disease-related miRNAs based on matrix completion with network regularization. Wang et al. [28] further developed an effective computational model, named HFHLMDA, that used high-dimensional features and hypergraph learning to predict the relationship between miRNAs and diseases. Ha et al. [29] then developed an effective computational model, named NCMD, which used node2vec-based neural collaborative filtering to predict miRNA–disease associations.

With the advancement of network-based methods, Alaimo [30] first proposed a method called ncPred, based on a tripartite network, to infer novel ncRNA–disease combinations. Later, Yu et al. [31] proposed a three-layer heterogeneous network combined with unbalanced random walk to predict miRNA–disease associations. This method combined lncRNAs, miRNAs and diseases into a heterogeneous network and leveraged lncRNAs as transition information to predict the potential relationships between miRNAs and diseases. However, network-based methods ignore the rich structural information contained in the network and cannot fully represent the deep relationships in the miRNA–disease network.

Machine learning-based methods

Machine learning methods are also widely used in the prediction of miRNA–disease associations. Among these studies, Xu et al. [18] first proposed ranking prostate cancer miRNAs by training a support vector machine (SVM). Chen et al. [32] proposed a model that could not only predict potential miRNA–disease associations but also infer the types of miRNA–disease combinations. However, this approach is only suitable for diseases with known miRNA–disease association information. Therefore, Chen et al. [19] developed a random forest-based computational model (i.e. RFMDA) that predicts unknown miRNA–disease associations through the score labels obtained after applying random forests. Inspired by the RFMDA model, Yao et al. [33] proposed an improved random forest model to predict potential miRNA–disease combinations; feature selection through random forest variable importance scores can largely reduce noisy information and improve the predictive power of the model. Subsequently, Zheng et al. [34] proposed a new model (MLMDA) to predict miRNA–disease combinations, which further extracted integrated features through a deep auto-encoder neural network and finally used a random forest classifier to make predictions. Ji et al. [35] proposed a deep autoencoder model based on deep learning, which learned potential miRNA–disease associations from known relations between miRNAs and diseases in an end-to-end manner. Furthermore, Liu et al. [36] developed a framework called SMALF for miRNA–disease association prediction; SMALF used a stacked autoencoder to integrate the node features of the network, which were then fed to XGBoost for miRNA–disease prediction.

Recently, in view of its great capability of learning high-order relationships, some work has also adopted graph representation learning to capture the relationship between miRNAs and diseases. For example, Li et al. [20] proposed a graph convolutional network model (i.e. NIMCGCN), although its performance may be limited by insufficient feature representation learning. Inspired by the significant improvements of the GraphSAGE algorithm proposed by Hamilton et al. [37], Li et al. [38] proposed a novel graph auto-encoder model, GAEMDA, which fully leveraged GraphSAGE to aggregate the miRNA–disease heterogeneous graph information in an end-to-end manner. Wang et al. [39] proposed the NMCMDA model based on graph neural networks; NMCMDA adopted a graph convolutional autoencoder to compute the latent feature representations of miRNAs and diseases and then predicted multiple-category miRNA–disease associations.

A recent study by Yan et al. [21] proposed a novel approach called PDMDA (predicting deep-level miRNA–disease associations with graph neural networks and sequence features). PDMDA uses GNNs to extract disease feature representations from disease–gene associations and the PPI network, but it is still possible to improve the overall prediction performance by considering additional biological information. Moreover, Lou et al. [40] proposed a new prediction method named MINIMDA, which learns the embedding representations of miRNAs and diseases from multimodal networks integrating multiple sources of biological information and then predicts the relationships between miRNAs and diseases.

However, the aforementioned prediction models for miRNA–disease associations mainly regard similarity as an important component and do not fully consider the sparsity of identified miRNA–disease relationships, which might lead to inaccurate predictions. In view of this, we design a multi-task learning model (i.e. MTLMDA) to effectively predict potential relationships between miRNAs and diseases.

MATERIALS AND METHODS

In this section, we introduce the preparation required to construct the MTLMDA model as well as the overall framework of the model. To improve the readability of the following content, Table 1 summarizes the main notations used in our study.

Table 1

Summary of main notations

Variable | Description
|$MD$| | miRNA–disease relationship matrix
|$n_{d}$| | Number of disease nodes in HMDD v2.0 (383)
|$n_{m}$| | Number of miRNA nodes in HMDD v2.0 (495)
|$GD$| | Gene–disease relationship matrix
|$n_{g}$| | Number of gene nodes in MTLMDA (4395)
|$K_{GIP,m}$| | miRNA Gaussian similarity matrix
|$K_{GIP,d}$| | Disease Gaussian similarity matrix
|$Y_{m_{i}}$| | The binary row vector in matrix |$MD$|
|$Y_{d_{i}}$| | The binary column vector in matrix |$MD$|
|$r_{m},r_{d}$| | The bandwidths of the kernels
|$M(i)$| | The |$i$|-th miRNA node feature representation
|$D_{1}(\,j)$| | The |$j$|-th disease node feature representation
|$G(i)$| | The |$i$|-th gene node feature representation
|$D_{2}(\,j)$| | The |$j$|-th disease node feature representation
|$H_{m}(i),H_{d1}(i)$| | Projection features of miRNA and disease nodes
|$H_{g}(i),H_{d2}(i)$| | Projection features of gene and disease nodes
|$W$| | Different weight matrices
|$H_{aux-m},H_{aux-d1}$| | Gene–disease network’s auxiliary information
|$H_{aux-g},H_{aux-d2}$| | miRNA–disease network’s auxiliary information
|$H_{M}$| | Integrated feature representation of miRNA nodes
|$H_{D1}$| | Disease’s integrated representation in miRNA–disease network
|$H_{G}$| | Integrated feature representation of gene nodes
|$H_{D2}$| | Disease’s integrated representation in gene–disease network
|$F_{m}$| | miRNA node’s final representation in miRNA–disease network
|$F_{d1}$| | Disease node’s final representation in miRNA–disease network
|$F_{g}$| | Gene node’s final representation in gene–disease network
|$F_{d2}$| | Disease node’s final representation in gene–disease network
|$\hat{y}_{md}$| | Predicted association probability of miRNA and disease nodes
|$\hat{y}_{gd}$| | Predicted association probability of gene and disease nodes
|$LOSS_{m-d}$| | The loss in the miRNA–disease sub-network
|$LOSS_{g-d}$| | The loss in the gene–disease sub-network
|$LOSS$| | The loss of the entire model

The construction of subnetworks

In this part, we first describe the formation of the two sub-networks (miRNA–disease and gene–disease) and then depict how the Gaussian similarities of the sub-networks are constructed.

Human miRNA-disease associations

In our study, miRNA–disease associations are derived from HMDD v2.0 [41] (https://www.cuilab.cn/hmdd), a mature dataset that contains experimentally verified associations between |$n_{d}$| (⁠|$383$|⁠) diseases and |$n_{m}$| (⁠|$495$|⁠) miRNAs; |$5430$| miRNA–disease associations have been confirmed in this dataset. In the experiments, the identified relationships between diseases and miRNAs are represented as a matrix |$MD$| with |$n_{m}$| rows and |$n_{d}$| columns. The value of an entry is |$1$| if the corresponding disease is associated with the corresponding miRNA, and |$0$| otherwise (meaning the relationship is unknown). The matrix |$MD$| is represented as follows:

|$$MD(i,j)=\begin{cases}1, & \text{if miRNA } m_{i} \text{ is associated with disease } d_{j}\\ 0, & \text{otherwise}\end{cases} \qquad (1)$$|

In total, |$\mathbf{MD}$| contains |$189 585$| entries, with |$n_{m}$| (⁠|$495$|⁠) rows and |$n_{d}$| (⁠|$383$|⁠) columns. Here, |$m_{i}$| represents the |$i$|-th miRNA (also the |$i$|-th row in |$\mathbf{MD}$|⁠), while |$d_{j}$| represents the |$j$|-th disease (the |$j$|-th column in |$\mathbf{MD}$|⁠).
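As a minimal illustration of this construction (not the authors' code), the binary matrix |$\mathbf{MD}$| can be built from a list of verified miRNA–disease pairs; the index mappings and pair list below are hypothetical placeholders:

```python
import numpy as np

n_m, n_d = 495, 383                   # miRNAs (rows), diseases (columns)
pairs = [(0, 2), (1, 0), (494, 382)]  # hypothetical (miRNA idx, disease idx) pairs

MD = np.zeros((n_m, n_d), dtype=np.int8)
for i, j in pairs:
    MD[i, j] = 1                      # 1 = verified association, 0 = unknown

# 495 x 383 gives 189,585 entries in total, matching the text
```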

Human gene–disease associations

In our study, the data of gene–disease associations is generated by our manual filtering. First, the original gene–disease relationships come from DisGeNet [42] (http://www.disgenet.org/home/), a database of gene–disease associations. The file named “Curated gene–disease associations” can be downloaded directly from the website and contains |$84 038$| confirmed relationships between human diseases and genes. Then, we select the diseases that appear in the miRNA–disease sub-network together with their related genes to form the gene–disease sub-network. Finally, the gene–disease sub-network contains |$9286$| associations between |$n_{d}$| (⁠|$383$|⁠) diseases and |$n_{g}$| (⁠|$4395$|⁠) genes. Thus, similar to the miRNA–disease sub-network, we create a matrix |$\mathbf{GD}$| with |$n_{g}$| rows and |$n_{d}$| columns. If a disease is associated with a gene, the corresponding entry value is |$1$|⁠, and |$0$| otherwise. The known |$9286$| gene–disease associations are used as positive samples of the gene–disease sub-network, and negative samples are randomly selected from the entries with value |$0$| in |$\mathbf{GD}$|⁠. The matrix |$\mathbf{GD}$| is represented as follows:

|$$GD(i,j)=\begin{cases}1, & \text{if gene } g_{i} \text{ is associated with disease } d_{j}\\ 0, & \text{otherwise}\end{cases} \qquad (2)$$|

where |$g_{i}$| represents the |$i$|-th gene (⁠|$i$|-th row in |$\mathbf{GD}$|⁠), and |$d_{j}$| represents the |$j$|-th disease, i.e. |$j$|-th column in |$\mathbf{GD}$|⁠.
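The balanced negative sampling described above (drawing as many unknown pairs as there are known positives) can be sketched as follows; the mock matrix and random seed are illustrative, not the paper's exact setup:

```python
import numpy as np

rng = np.random.default_rng(0)
GD = np.zeros((4395, 383), dtype=np.int8)
# mock positives standing in for the 9286 curated gene-disease pairs
GD[rng.integers(0, 4395, 9286), rng.integers(0, 383, 9286)] = 1

pos = np.argwhere(GD == 1)                 # known associations (label 1)
neg_pool = np.argwhere(GD == 0)            # unknown pairs (label 0)
# sample one negative per positive, without replacement
neg = neg_pool[rng.choice(len(neg_pool), size=len(pos), replace=False)]
```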

Gaussian interaction profile kernel similarity for miRNAs and diseases in miRNA–disease subnetwork

Previous studies [43] pointed out that similar diseases are often associated with functionally similar miRNAs. Based on this hypothesis, the Gaussian interaction profile kernel similarity is well suited to modeling the similarity among miRNAs and among diseases in the miRNA–disease sub-network, and it is thus adopted in our study. Specifically, the Gaussian interaction profile kernel similarity for miRNAs is calculated from the known miRNA–disease associations. Each row of the matrix |$MD$| is a binary vector |$Y_{m_{i}}$| that encodes the associations between a certain miRNA and the various diseases in the miRNA–disease sub-network. The Gaussian interaction profile kernel similarity between miRNAs |$m_{i}$| and |$m_{j}$| can be defined as follows:

|$$K_{GIP,m}(m_{i},m_{j})=\exp\left(-r_{m}\left\|Y_{m_{i}}-Y_{m_{j}}\right\|^{2}\right) \qquad (3)$$|

where |$r_{m}$| represents the bandwidth of the kernel, which can be calculated by:

|$$r_{m}=r^{\prime}_{m}\Big/\left(\frac{1}{n_{m}}\sum_{i=1}^{n_{m}}\left\|Y_{m_{i}}\right\|^{2}\right) \qquad (4)$$|

where |$n_{m}$| represents the total number of miRNAs (⁠|$495$| in this study) and |$r^{\prime}_{m}$| denotes a normalization constant; following previous studies [44], we set it to |$1$|⁠. Figure 1 illustrates the Gaussian similarity of some miRNAs in the miRNA–disease sub-network, representing potential miRNA–miRNA correlation coefficients ranging from |$0$| to |$1$|⁠.
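The kernel computation in Eqs. (3) and (4) can be sketched compactly in vectorized form; this is an illustrative implementation, not the authors' code:

```python
import numpy as np

def gip_kernel(Y, r_prime=1.0):
    """Gaussian interaction profile kernel similarity.

    Y: binary interaction profiles, one row per entity (e.g. rows of MD
    for miRNAs, or rows of MD.T for diseases).
    """
    sq_norms = (Y ** 2).sum(axis=1)
    r = r_prime / sq_norms.mean()                # kernel bandwidth, Eq. (4)
    # squared Euclidean distances between all pairs of profiles
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Y @ Y.T
    return np.exp(-r * np.maximum(d2, 0.0))      # similarity in (0, 1], Eq. (3)

# K_m = gip_kernel(MD)     # miRNA-miRNA similarity, 495 x 495
# K_d = gip_kernel(MD.T)   # disease-disease similarity, 383 x 383
```

Applying the same function to |$MD$| and its transpose yields both the miRNA and the disease similarity matrices.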

Figure 1

miRNAs Gaussian similarity in miRNA–disease sub-network (Note: ‘m125a,’ ‘m196a,’ ‘m499a,’ ‘m1229,’ ‘m944’ and ‘m518a’ represent miRNA ‘hsa-mir-125a,’ ‘hsa-mir-196a,’ ‘hsa-mir-499a,’ ‘hsa-mir-1229,’ ’hsa-mir-944’ and ‘hsa-mir-518a’, respectively).


Similarly, we can obtain the Gaussian interaction profile kernel similarity of the diseases according to the following formula:

|$$K_{GIP,d}(d_{i},d_{j})=\exp\left(-r_{d}\left\|Y_{d_{i}}-Y_{d_{j}}\right\|^{2}\right) \qquad (5)$$|
|$$r_{d}=r^{\prime}_{d}\Big/\left(\frac{1}{n_{d}}\sum_{i=1}^{n_{d}}\left\|Y_{d_{i}}\right\|^{2}\right) \qquad (6)$$|

where |$Y_{d}$| is the binary column vector of the matrix |$MD$|⁠, representing associations between miRNAs and each disease; |$n_{d}$| represents the total number of diseases (i.e. |$383$|⁠) and the normalization constant, |$r^{\prime}_{d}$|⁠, is set to |$1$|⁠. Figure 2 visualizes the Gaussian similarity of diseases in the corresponding miRNA–disease sub-network, which represents the potential disease–disease correlation coefficient ranging from |$0$| to |$1$|⁠.

Figure 2

Diseases Gaussian similarity in miRNA–disease sub-network (Note: ‘A,H,’ ‘AIS,’ ‘APA,’ ‘Vit,’ ‘WaM’ and ‘WaI’ represent diseases ‘Abortion, Habitual,’ ‘Acquired Immunodeficiency Syndrome,’ ‘ACTH-Secreting Pituitary Adenoma,’ ‘Vitiligo,’ ‘Waldenstrom Macroglobulinemia’ and ‘Wounds and Injuries,’ respectively).

We take the obtained Gaussian similarity of diseases and miRNAs as initial node features of disease and miRNA in the miRNA–disease sub-network, respectively.

Gaussian interaction profile kernel similarity for genes and diseases in the gene–disease sub-network

The biological principle of guilt-by-association shows that genes associated with similar disorders have a higher probability of physical interactions between their gene products [45]. Therefore, as demonstrated in Section Gaussian Interaction Profile Kernel Similarity for miRNAs and Diseases in miRNA–Disease Subnetwork, we also adopt the Gaussian interaction profile kernel similarity measure to calculate gene–gene and disease–disease similarities based on the gene–disease sub-network |$GD$|⁠. These two kinds of similarity are then used as the original features of disease and gene nodes in the gene–disease sub-network, respectively.

Proposed model framework

Inspired by the use of graph neural networks in bioinformatics and multi-task learning in ML applications, in our study, we propose an effective MTLMDA model, which contains two sub-networks (i.e. miRNA–disease and gene–disease networks), a graph convolutional network encoder and a bilinear decoder, as shown in Figure 3. Specifically, as elaborated in Section The Construction of Subnetworks, we construct two sub-networks from the corresponding datasets and provide initial features for the nodes of the two networks through Gaussian similarity. Then, through the cross&compress units module, we use the joint characteristics of the two sub-networks as their auxiliary information, and finally leverage the encoder and decoder in an end-to-end manner to predict the probability of potential miRNA–disease associations. The whole MTLMDA model can be described by the following six steps, and Algorithm 1 summarizes the main procedure of MTLMDA.
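The cross&compress unit that bridges the two sub-networks can be sketched as below. This follows the general cross&compress design from the multi-task learning literature (outer-product cross of the two tasks' features, then compression back to the feature dimension); the exact MTLMDA variant may differ, so treat this as an illustrative assumption:

```python
import torch
import torch.nn as nn

class CrossCompress(nn.Module):
    """Sketch of a cross&compress unit sharing features between two tasks.

    Inputs v and e are same-dimension feature vectors of a shared node
    (e.g. a disease) from the two sub-networks.
    """
    def __init__(self, dim):
        super().__init__()
        self.w_vv = nn.Linear(dim, 1, bias=False)
        self.w_ev = nn.Linear(dim, 1, bias=False)
        self.w_ve = nn.Linear(dim, 1, bias=False)
        self.w_ee = nn.Linear(dim, 1, bias=False)
        self.b_v = nn.Parameter(torch.zeros(dim))
        self.b_e = nn.Parameter(torch.zeros(dim))

    def forward(self, v, e):
        # cross: outer product of the two feature vectors -> (batch, d, d)
        c = v.unsqueeze(2) * e.unsqueeze(1)
        ct = c.transpose(1, 2)
        # compress back to d dimensions for each task
        v_next = self.w_vv(c).squeeze(2) + self.w_ev(ct).squeeze(2) + self.b_v
        e_next = self.w_ve(c).squeeze(2) + self.w_ee(ct).squeeze(2) + self.b_e
        return v_next, e_next
```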

Figure 3

The overall framework of model MTLMDA.

In MTLMDA (see Figure 3 and Algorithm 1), Step I constructs the miRNA–disease and gene–disease sub-networks to form a heterogeneous graph; Step II projects the disease and miRNA nodes, and the disease and gene nodes, of each sub-network into the same vector space; Step III extracts the initial joint features of the two sub-networks as their auxiliary information; Step IV uses a graph convolutional network to obtain the embeddings of the nodes in the two sub-networks; Step V simultaneously feeds the node embeddings of the two sub-networks into the linear decoder to reconstruct the links of the two sub-networks; and Step VI trains the entire model in an end-to-end manner using the cross-entropy loss of the two integrated sub-networks. In the following, we elaborate the six steps in detail.
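The joint training objective of Step VI can be sketched as a weighted sum of the two sub-networks' cross-entropy losses; the auxiliary-task weight `lam` is an illustrative hyperparameter, not a value from the paper:

```python
import torch
import torch.nn.functional as F

def combined_loss(y_hat_md, y_md, y_hat_gd, y_gd, lam=1.0):
    """LOSS = LOSS_{m-d} + lam * LOSS_{g-d} (sketch of the joint objective)."""
    loss_md = F.binary_cross_entropy(y_hat_md, y_md)  # miRNA-disease task
    loss_gd = F.binary_cross_entropy(y_hat_gd, y_gd)  # gene-disease auxiliary task
    return loss_md + lam * loss_gd
```

Backpropagating this single scalar updates both sub-networks at once, which is what lets the gene–disease task assist the main miRNA–disease task.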

Step I: As introduced in Section The Construction of Subnetworks, the miRNA–disease sub-network contains |$495$| miRNA nodes and |$383$| disease nodes. From HMDD v2.0, we obtain |$5430$| experimentally verified miRNA–disease associations, which are treated as positive samples (with label value |$1$|⁠). Following previous studies [38, 46], we randomly select miRNA–disease pairs from all unknown miRNA–disease associations (marked as |$0$| in |$MD$|⁠) as negative samples (with label value |$0$|⁠). In addition, we introduce Gaussian similarity as the node features of the miRNA–disease sub-network. Therefore, the feature of the |$i$|-th miRNA node, |$M(i)$|, can be expressed as a 495-dimensional vector:

|$M(i)=\big(x^{1}_{i,1},\, x^{1}_{i,2},\, \ldots,\, x^{1}_{i,495}\big)$| (7)

where |$x^{1}_{i,j}$| represents the Gaussian similarity between miRNAs |$m_{i}$| and |$m_{j}$| in the miRNA–disease sub-network. Similarly, the node feature of the |$i$|-th disease, |$D_{1}(i)$|⁠, can be expressed as a 383-dimensional vector:

|$D_{1}(i)=\big(z_{i,1},\, z_{i,2},\, \ldots,\, z_{i,383}\big)$| (8)

where |$z_{i,j}$| represents the Gaussian similarity between disease |$d_{i}$| and disease |$d_{j}$|⁠.

In the gene–disease sub-network, we mine from the DisGeNet database the gene–disease pairs associated with the diseases of the miRNA–disease sub-network. There are |$9286$| associations (i.e. positive samples with label value |$1$|⁠) between |$n_{d}$| diseases and |$n_{g}$| genes. We randomly select gene–disease pairs from all unknown disease–gene associations (marked as |$0$| in |$GD$|⁠) to form negative samples (with label value |$0$|⁠). We also deploy Gaussian similarity as the initial features of the gene and disease nodes; |$G(i)$| (the feature vector of gene |$g_{i}$|⁠) and |$D_{2}(\,j)$| (the feature vector of disease |$d_{j}$|⁠) can be expressed as follows, respectively:

|$G(i)=\big(x^{2}_{i,1},\, x^{2}_{i,2},\, \ldots,\, x^{2}_{i,4395}\big)$| (9)
|$D_{2}(\,j)=\big(z^{2}_{j,1},\, z^{2}_{j,2},\, \ldots,\, z^{2}_{j,383}\big)$| (10)

where |$x^{2}_{i,j}$| represents the Gaussian similarity between genes |$g_{i}$| and |$g_{j}$|⁠, and |$z^{2}_{i,j}$| represents the Gaussian similarity between diseases |$d_{i}$| and |$d_{j}$| in the gene–disease sub-network.

Step II: In the two sub-networks, nodes possess feature vectors of varying dimensions. To streamline the calculation process in subsequent steps, we have developed a projection module that unifies disparate node features into a common vector space. Specifically, in the miRNA–disease sub-network, the projection module maps disease and miRNA node features to a uniform 1024-dimensional space via a transition matrix. The process is as follows:

|$H_{m}(i)=M(i)\,\mathbf{W}_{m}$| (11)
|$H_{d1}(i)=D_{1}(i)\,\mathbf{W}_{d}$| (12)
|$H_{g}(i)=G(i)\,\mathbf{W}_{g}$| (13)
|$H_{d2}(i)=D_{2}(i)\,\mathbf{W}_{d}$| (14)

where |$H_{m}(i)\in \mathbb{R}^{1024}$| and |$H_{d1}(i)\in \mathbb{R}^{1024}$| are the projected features of miRNA node |$m_{i}$| and disease node |$d_{i}$| in the miRNA–disease network. Likewise, |$H_{g}(i)\in \mathbb{R}^{1024}$| and |$H_{d2}(i)\in \mathbb{R}^{1024}$| are the projected features of gene |$g_{i}$| and disease |$d_{i}$| in the gene–disease network. The learnable weight matrices |$\mathbf{W}_{m}\in \mathbb{R}^{495\times 1024}$|⁠, |$\mathbf{W}_{g}\in \mathbb{R}^{4395\times 1024}$| and |$\mathbf{W}_{d}\in \mathbb{R}^{383\times 1024}$| are automatically initialized by the torch package according to the required sizes of our designed vector space. To reduce redundant parameters and training time, the weight matrix |$\mathbf{W}_{d}\in \mathbb{R}^{383\times 1024}$| is shared between the two networks for mapping the disease nodes to the latent space.
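A minimal NumPy sketch of this projection step follows, with toy sizes standing in for the paper's 495 miRNAs, 383 diseases, 4395 genes and 1024-dimensional latent space, and random matrices standing in for the learnable `nn.Parameter` tensors:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy sizes (the paper uses n_m=495, n_d=383, n_g=4395, dim=1024)
n_m, n_d, n_g, dim = 5, 4, 7, 8

M = rng.random((n_m, n_m))    # miRNA Gaussian-similarity features
D = rng.random((n_d, n_d))    # disease features
G = rng.random((n_g, n_g))    # gene features

# transition matrices (random stand-ins for learnable parameters)
W_m = rng.standard_normal((n_m, dim))
W_g = rng.standard_normal((n_g, dim))
W_d = rng.standard_normal((n_d, dim))   # shared across both sub-networks

H_m, H_g = M @ W_m, G @ W_g
H_d1 = D @ W_d   # disease projection in the miRNA-disease sub-network
H_d2 = D @ W_d   # the same shared W_d serves the gene-disease sub-network
```

Sharing `W_d` across the two sub-networks is what ties the disease representations of both tasks together while halving the disease-side projection parameters.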

Step III: In this step, we connect the two sub-networks through the cross&compress units module and simultaneously extract auxiliary information from both sub-networks by analyzing the |$MD$| and |$GD$| matrices. |$\mathbf{H}_{aux-m}\in \mathbb{R}^{495\times 1024}$| and |$\mathbf{H}_{aux-d1}\in \mathbb{R}^{383\times 1024}$| respectively represent the miRNA and disease nodes of the miRNA–disease network after obtaining auxiliary information from their own network and from the gene–disease network.

|$\mathbf{H}_{aux-m}=MD\,\mathbf{W}_{aux-m}$| (15)
|$\mathbf{H}_{aux-d1}=GD\,\mathbf{W}_{aux-d1}$| (16)

The weight matrices |$\mathbf{W}_{aux-m}\in \mathbb{R}^{383\times 1024}$| and |$\mathbf{W}_{aux-d1}\in \mathbb{R}^{4395\times 1024}$| are automatically initialized using the Torch package to extract auxiliary information from the corresponding networks. A similar process in the gene–disease network is as follows:

|$\mathbf{H}_{aux-g}=GD^{\top }\,\mathbf{W}_{aux-g}$| (17)
|$\mathbf{H}_{aux-d2}=MD^{\top }\,\mathbf{W}_{aux-d2}$| (18)

Ultimately, we concatenate the initial features of the nodes with the auxiliary features to form the new features of the nodes, which can be summarized as follows:

|$\mathbf{H}_{M}=\big[\mathbf{H}_{m}\,\|\,\mathbf{H}_{aux-m}\big]$| (19)
|$\mathbf{H}_{D1}=\big[\mathbf{H}_{d1}\,\|\,\mathbf{H}_{aux-d1}\big]$| (20)
|$\mathbf{H}_{G}=\big[\mathbf{H}_{g}\,\|\,\mathbf{H}_{aux-g}\big]$| (21)
|$\mathbf{H}_{D2}=\big[\mathbf{H}_{d2}\,\|\,\mathbf{H}_{aux-d2}\big]$| (22)

where |$\mathbf{H}_{M}\in \mathbb{R}^{495\times 2048}$| and |$\mathbf{H}_{D1}\in \mathbb{R}^{383\times 2048}$| represent the integrated feature representations of the nodes in the miRNA–disease network, while |$\mathbf{H}_{G}\in \mathbb{R}^{4395\times 2048}$| and |$\mathbf{H}_{D2}\in \mathbb{R}^{383\times 2048}$| represent the integrated gene and disease node features in the gene–disease sub-network, respectively.
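Under the assumption that the auxiliary features are obtained by multiplying the association matrices with the auxiliary weight matrices (an interpretation consistent with the weight shapes given in the text, though the exact cross&compress formulation may differ), the feature-augmentation step for the miRNA–disease side can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(1)
# toy sizes (the paper uses n_m=495, n_d=383, n_g=4395, dim=1024)
n_m, n_d, n_g, dim = 5, 4, 7, 8

MD = (rng.random((n_m, n_d)) > 0.5).astype(float)   # miRNA-disease adjacency
GD = (rng.random((n_d, n_g)) > 0.5).astype(float)   # disease-gene adjacency

H_m  = rng.random((n_m, dim))   # projected features from the previous step
H_d1 = rng.random((n_d, dim))

# hypothetical auxiliary mappings matching the weight shapes in the text
W_aux_m  = rng.standard_normal((n_d, dim))
W_aux_d1 = rng.standard_normal((n_g, dim))

H_aux_m  = MD @ W_aux_m    # miRNA nodes pull information via their diseases
H_aux_d1 = GD @ W_aux_d1   # disease nodes pull information from linked genes

# concatenate own features with auxiliary features -> 2*dim per node
H_M  = np.concatenate([H_m,  H_aux_m],  axis=1)
H_D1 = np.concatenate([H_d1, H_aux_d1], axis=1)
```

The gene–disease side is augmented symmetrically, which is how the gene–disease task feeds auxiliary signal into the miRNA–disease task and vice versa.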

Step IV: Here, we further refine the representations of the nodes of the two sub-networks using information from their direct neighbors in their respective networks via a graph convolutional network (GCN) encoder. We adopt the Chebyshev filter-based approach (ChebConv) as MTLMDA’s encoder in view of its great expressive power [47]. At each layer of the graph convolutional network, MTLMDA updates node embeddings according to the edges of the respective sub-network (taking the miRNA–disease sub-network as an example):

|$\mathbf{L}=\mathbf{I}-\widetilde{\mathbf{D}}^{-1/2}\,\widetilde{\mathbf{A}}\,\widetilde{\mathbf{D}}^{-1/2}$| (23)
|$\widehat{\mathbf{L}}=\frac{2}{\lambda _{max}}\mathbf{L}-\mathbf{I}$| (24)
|$\mathbf{h}^{(l+1)}=\sigma \Big(\sum _{j=0}^{k-1}T_{j}\big(\widehat{\mathbf{L}}\big)\,\mathbf{h}^{(l)}\,\mathbf{W}^{(j)}\Big)$| (25)

where |$\mathbf{h}_{i}^{(l)}\in \mathbb{R}^{u}$| represents the hidden state of node |$i$| at the |$l$|-th layer of the GCN (⁠|$u$| is the dimension of the hidden state) and |$\mathbf{W}$| is a learnable weight. |$T_{j}(\cdot)$| denotes the |$j$|-th Chebyshev polynomial, defined recursively by |$T_{0}(x)=1$|, |$T_{1}(x)=x$| and |$T_{j}(x)=2xT_{j-1}(x)-T_{j-2}(x)$|, and |$k$| is the Chebyshev filter size (set to 2 here). |$\widetilde{\mathbf{A}}=\mathbf{A}_{MD}+\mathbf{I}\in \mathbb{R}^{(n_{d}+n_{m})\times (n_{d}+n_{m})}$|⁠, where |$\mathbf{A}_{MD}$| is the adjacency matrix of the miRNA–disease sub-network, and |$\widetilde{\mathbf{D}}$| is the diagonal degree matrix of |$\widetilde{\mathbf{A}}$| with |$\widetilde{D}_{ii}=\sum _{j}\tilde{a}_{i,j}$| (⁠|$\tilde{a}_{i,j}$| denotes the corresponding entry of |$\widetilde{\mathbf{A}}$|⁠). |$\lambda _{max}$| is the largest eigenvalue of the normalized Laplacian |$\mathbf{L}$|⁠. With the above Cheb-GCN encoder, we obtain the final embeddings of the nodes in the two sub-networks.
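A plain-NumPy sketch of one Chebyshev graph-convolution layer with filter size k = 2 follows; it mirrors the ChebConv operation described above (the toy graph, features and weights are illustrative, and in practice one would use DGL's ChebConv module):

```python
import numpy as np

def cheb_conv_layer(A, H, W0, W1):
    """One Chebyshev graph-convolution layer with filter size k = 2:
    H' = T0(L_hat) H W0 + T1(L_hat) H W1, with T0 = I and T1(x) = x."""
    n = A.shape[0]
    A_tilde = A + np.eye(n)                            # add self-loops
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(n) - D_inv_sqrt @ A_tilde @ D_inv_sqrt  # normalised Laplacian
    lam_max = float(np.max(np.linalg.eigvalsh(L)))     # largest eigenvalue of L
    L_hat = (2.0 / lam_max) * L - np.eye(n)            # rescale spectrum to [-1, 1]
    return H @ W0 + (L_hat @ H) @ W1

rng = np.random.default_rng(3)
A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])          # toy symmetric adjacency matrix
H = rng.random((3, 4))                # toy node features
W0, W1 = rng.standard_normal((4, 2)), rng.standard_normal((4, 2))
out = cheb_conv_layer(A, H, W0, W1)
```

Higher filter sizes simply continue the Chebyshev recurrence, letting each layer aggregate from neighbourhoods up to k − 1 hops away.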

Step V: After obtaining the node representations of the two sub-networks, we employ a bilinear decoder to reconstruct the links of the heterogeneous graphs in the two sub-networks. The detailed formulation is as follows:

|$\hat{y}_{md}(i,j)=\mathrm{sigmoid}\big(F_{m}(i)\,Q_{1}\,F_{d1}(\,j)^{\top }\big)$| (26)
|$\hat{y}_{gd}(i,j)=\mathrm{sigmoid}\big(F_{g}(i)\,Q_{2}\,F_{d2}(\,j)^{\top }\big)$| (27)

where |$\hat{y}_{md}$| represents the predicted association probability of miRNA node |$m(i)$| and disease node |$d(\,j)$| in the miRNA–disease sub-network, |$F_{m}$| and |$F_{d1}$| represent the final miRNA and disease node embeddings obtained through the MTLMDA encoder in the miRNA–disease sub-network, and |$Q_{1}$| denotes a trainable |$64\times 64$| parameter matrix. The sigmoid function is defined as:

|$\mathrm{sigmoid}(x)=\frac{1}{1+e^{-x}}$| (28)

Similarly, |$\hat{y}_{gd}$| represents the predicted association probability of the gene and disease nodes in the gene–disease sub-network.
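The bilinear decoding step can be sketched as follows; the embedding size follows the paper's 64-dimensional setting, while the toy embeddings and the parameter matrix `Q` are random stand-ins for trained values:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bilinear_scores(F_m, F_d, Q):
    """Predicted association probabilities for every miRNA-disease pair:
    y_hat[i, j] = sigmoid(F_m[i] @ Q @ F_d[j])."""
    return sigmoid(F_m @ Q @ F_d.T)

rng = np.random.default_rng(2)
F_m = rng.random((5, 64))                    # final miRNA embeddings (64-d)
F_d = rng.random((4, 64))                    # final disease embeddings
Q   = rng.standard_normal((64, 64)) * 0.05   # trainable 64x64 parameter matrix

Y_hat = bilinear_scores(F_m, F_d, Q)         # (5, 4) probability matrix
```

Computing all pairs in one matrix product is what makes scoring every candidate miRNA for every disease cheap at inference time.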

Step VI: The loss function of the MTLMDA model is the sum of the reconstruction errors over all training samples in the two sub-networks. Here, we choose the cross-entropy loss to measure the error between the true label |$y$| of each association in a sub-network and the predicted probability |$\hat{y}$|⁠. The form is as follows:

|$LOSS_{m-d}=-\sum _{(i,j)}\big[y_{ij}\log \hat{y}_{ij}+(1-y_{ij})\log (1-\hat{y}_{ij})\big]$| (29)
|$LOSS_{g-d}=-\sum _{(i,j)}\big[y_{ij}\log \hat{y}_{ij}+(1-y_{ij})\log (1-\hat{y}_{ij})\big]$| (30)

where |$LOSS_{m-d}$| represents the loss in the miRNA–disease sub-network, |$\hat{y}_{ij}$| represents the predicted link probability between disease and miRNA nodes, and |$y_{ij}$| represents the true label of the link (1 or 0). Correspondingly, |$LOSS_{g-d}$| represents the loss in the gene–disease sub-network. We take the sum of the two sub-network losses as the loss of MTLMDA:

|$LOSS=LOSS_{m-d}+LOSS_{g-d}$| (31)

Then, we use the loss function in Eq. (31) to train the whole model via the backpropagation algorithm in an end-to-end manner.
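A minimal sketch of the combined training objective, binary cross-entropy per sub-network summed as in Eq. (31) (here averaged per sub-network; the labels and probabilities are illustrative):

```python
import numpy as np

def bce(y, y_hat, eps=1e-12):
    """Mean binary cross-entropy between labels y and probabilities y_hat."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)  # guard against log(0)
    return float(-np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))

# toy labels/predictions for the two sub-networks; the losses are summed
y_md, p_md = np.array([1., 0., 1.]), np.array([0.9, 0.2, 0.8])
y_gd, p_gd = np.array([1., 0.]),     np.array([0.7, 0.1])
loss = bce(y_md, p_md) + bce(y_gd, p_gd)
```

In the PyTorch implementation this corresponds to adding the two sub-networks' cross-entropy losses before the single backward pass.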

RESULTS

In this section, we report comparison results under different experimental conditions and against different models on the HMDD v2.0 dataset [41] to demonstrate the effectiveness of our proposed MTLMDA model.

Implementation settings

MTLMDA is implemented in the PyTorch (v1.10.2) framework based on the DGL (v0.6.1) platform [48]. During training, model parameters are randomly initialized and optimized with Adam. We adopt grid search to find MTLMDA’s optimal hyperparameters: the learning rate is set to 0.0001 and the weight decay to |$3\times 10^{-4}$|⁠. To prevent overfitting, we add a dropout mechanism to the model [38]. We try dropout rates from 0.1 to 0.9 during training, and the model performs best when the dropout rate is 0.3. The entire model is trained for 800 epochs, and test-set results are output every 10 epochs. Please see Table 2 for the detailed hyperparameter settings of our experiments. 5-fold cross-validation is applied for the performance evaluation of MTLMDA: the sample dataset is randomly divided into five subsets; each time, one subset is used as the test set and the remaining four as the training set. Repeating this process five times yields objective and fair evaluation results.
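The 5-fold splitting procedure described above can be sketched as follows (a generic NumPy implementation; the sample count of 10860 stands for the 5430 positive plus 5430 negative pairs and is illustrative):

```python
import numpy as np

def five_fold_indices(n_samples, seed=42):
    """Randomly split sample indices into 5 folds; each fold in turn serves
    as the test set while the remaining four form the training set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, 5)
    for k in range(5):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(5) if j != k])
        yield train, test

# e.g. a balanced dataset of 5430 positive + 5430 negative samples
splits = list(five_fold_indices(10860))
```

Each sample appears in exactly one test fold, so averaging the five test-fold metrics gives an unbiased performance estimate.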

Table 2
Setting of hyperparameters

| Model | Hyperparameter | HMDD v2.0 | Search space | Description |
|---|---|---|---|---|
| MTLMDA | Weight_decay | 3×10^-4 | 10^-3, 10^-4, 3×10^-4, 10^-5 | L2 regularization coefficient |
| | Layers | 3 | 1, 2, 3, 4, 5 | Number of ChebGCN layers |
| | Dropout | 0.3 | 0.1–0.9 | Dropout rate |
| | lr | 10^-4 | 10^-2, 10^-3, 5×10^-4, 10^-4, 10^-5 | Learning rate |
| | Projection dimension | 1024 | 64, 128, 256, 512, 1024, 2048 | Node feature mapping dimension |
| | Embedding dimension | 64 | 16, 32, 64, 128, 256, 512 | Node feature embedding dimension |

Evaluation metrics

To comprehensively evaluate the performance of our proposed MTLMDA, we choose Precision (Prec.), Accuracy (Acc.), Recall, F1 score, AUC and the precision–recall (P–R) curve as evaluation criteria. The corresponding calculations are as follows:

|$Precision=\frac{TP}{TP+FP}$| (32)
|$Accuracy=\frac{TP+TN}{TP+FP+TN+FN}$| (33)
|$Recall=\frac{TP}{TP+FN}$| (34)
|$F1=\frac{2\times Precision\times Recall}{Precision+Recall}$| (35)

where |$TP$|⁠, |$FP$|⁠, |$TN$| and |$FN$| denote true positives, false positives, true negatives and false negatives, respectively. AUC refers to the area under the receiver operating characteristic (ROC) curve, which quantitatively reflects the model performance measured by the ROC curve. The abscissa of the ROC curve represents |$FPR$| and the ordinate |$TPR$|⁠, where |$TPR$| and |$FPR$| are calculated as follows:

|$TPR=\frac{TP}{TP+FN}$| (36)
|$FPR=\frac{FP}{FP+TN}$| (37)

The abscissa of the P–R curve represents the recall of the model and the ordinate represents the precision; a larger area under the P–R curve indicates better model performance. Table 3 reports the values of the evaluation metrics of our model under 5-fold cross-validation.
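The confusion-matrix metrics above can be computed directly from the four counts; a small self-contained sketch (the counts are illustrative):

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, accuracy, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, accuracy, recall, f1

# illustrative counts: 80 true positives, 20 false positives,
# 90 true negatives, 10 false negatives
p, a, r, f = classification_metrics(tp=80, fp=20, tn=90, fn=10)
# precision 0.8, accuracy 0.85, recall ~0.8889
```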

Table 3
5-fold cross-validation results

| Test set | Precision | Accuracy | Recall | F1-score |
|---|---|---|---|---|
| 1 | 0.8730 | 0.8656 | 0.8831 | 0.8670 |
| 2 | 0.8762 | 0.8743 | 0.8745 | 0.8733 |
| 3 | 0.8520 | 0.8780 | 0.8537 | 0.8751 |
| 4 | 0.8903 | 0.8660 | 0.8756 | 0.8697 |
| 5 | 0.8899 | 0.8600 | 0.8630 | 0.8598 |
| Mean | 87.63% ± 0.0046 | 86.88% ± 0.0065 | 87.74% ± 0.0104 | 86.93% ± 0.0054 |

We observe that MTLMDA achieves an average Accuracy of 86.88%, Precision of 87.63%, Recall of 87.74% and F1 score of 86.93%. Moreover, Figure 4 shows that the AUC values of MTLMDA’s ROC curves under 5-fold cross-validation are 94.03%, 94.67%, 93.75%, 94.62% and 93.79%, with an average of 94.17% ± 0.0040. Meanwhile, Figure 5 shows that the AUC values of MTLMDA’s P–R curves under 5-fold cross-validation are 93.27%, 93.53%, 94.55%, 94.10% and 93.31%, with an average of 93.75% ± 0.0050. To further demonstrate the value of incorporating gene–disease information in MTLMDA, we conducted an experiment in which the miRNA–disease network remained unchanged while the gene–disease network was randomly shuffled, disrupting the original associations between genes and diseases. The validation results are shown in Table 4. Interestingly, despite the randomization of the gene–disease network, the two networks can still exhibit similar structures within the vector space of disease nodes during the initial non-specific task of multi-task learning [22]. As a result, the perturbed gene–disease network can still offer significant auxiliary information to the miRNA–disease network.

Figure 4
ROC curves of MTLMDA in 5-fold cross-validation.

Figure 5
P–R curves of MTLMDA in 5-fold cross-validation.

Table 4
5-fold cross-validation results with randomly shuffled gene–disease associations

| Test set | 1 | 2 | 3 | 4 | 5 | Mean |
|---|---|---|---|---|---|---|
| Precision | 0.8561 | 0.8552 | 0.8496 | 0.8369 | 0.8427 | 84.81% ± 0.0074 |
| Accuracy | 0.8487 | 0.8565 | 0.8620 | 0.8638 | 0.8500 | 85.62% ± 0.0061 |
| Recall | 0.8249 | 0.8377 | 0.8528 | 0.8765 | 0.8379 | 84.60% ± 0.0176 |
| F1-score | 0.8350 | 0.8464 | 0.8512 | 0.8562 | 0.8403 | 84.58% ± 0.0076 |
| AUC | 0.9275 | 0.9362 | 0.9265 | 0.9340 | 0.9284 | 93.05% ± 0.0039 |

Comparison with other latest methods

The constructed training samples form a dataset with balanced positive and negative samples; therefore, the ROC curve can intuitively reflect model performance. Here, we use AUC values based on the ROC curve to compare the performance of MTLMDA with other state-of-the-art models under 5-fold cross-validation. We select the latest and most representative models in this field: "Predicting microRNA–disease associations using label propagation based on linear neighborhood similarity" (LPLNS) [27], "Tree-layer heterogeneous network combined with unbalanced random walk for miRNA–disease association prediction" (TCRWMDA) [31], "A graph auto-encoder model for miRNA–disease associations prediction" (GAEMDA) [38], "Multi-view multichannel attention graph convolutional network for miRNA–disease association prediction" (MMGCN) [49] and "Hierarchical graph attention network for miRNA–disease association prediction" (HGANMDA) [46]. For fairness, all five baselines are evaluated with 5-fold cross-validation on the HMDD v2.0 dataset. Figure 6 shows the ROC curves of MTLMDA and the five contrasting algorithms. In addition, Figure 7 compares our method with the others using an alternative version of 5-fold cross-validation: instead of randomly partitioning the dataset, we divide it into five parts according to the average Gaussian similarity of each miRNA node with respect to the diseases. Table 5 shows the AUC values of each model under the different versions of 5-fold cross-validation on HMDD v2.0. As shown in Figure 8, to obtain a more comprehensive evaluation, we further use the HMDD v2.0 data for training and the HMDD v3.2 dataset [50] for testing. From the results, we observe that MTLMDA performs better than the baseline methods.
Compared to other models, MTLMDA fully takes into account the relatively sparse relationships between miRNAs and diseases in the database and utilizes multi-task learning to effectively exploit these sparse relationships. Moreover, MTLMDA uses the information of the gene–disease network to assist the prediction of miRNA–disease associations, improving the overall performance of the model. Therefore, MTLMDA achieves excellent results.

Figure 6
Comparison of ROC curves in 5-fold cross-validation based on HMDD v2.0.

Figure 7
Comparison of ROC curves based on the alternative version of the 5-fold cross-validation.

Figure 8
Comparison of ROC curves based on HMDD v3.2.

Table 5
5-fold cross-validation results comparison

| Model | AUC | AUC (alternative version) |
|---|---|---|
| LPLNS | 0.9107 ± 0.0041 | 0.8524 ± 0.0018 |
| TCRWMDA | 0.9209 ± 0.0036 | 0.9157 ± 0.0033 |
| GAEMDA | 0.9356 ± 0.0044 | 0.9319 ± 0.0049 |
| MMGCN | 0.9266 ± 0.0022 | 0.9191 ± 0.0015 |
| HGANMDA | 0.9374 ± 0.0041 | 0.9336 ± 0.0038 |
| MTLMDA | 0.9417 ± 0.0040 | 0.9404 ± 0.0039 |

Performance analysis of the model under different feature information

To further demonstrate the effectiveness of our proposed model, we conduct ablation experiments. We evaluate the model with both Gaussian and auxiliary features (AUX+GUS), only Gaussian features (GUS), only auxiliary features (AUX) and only original edge features (Edge). Table 6 shows the model performance under the different node features, which is visualized in Figure 9.

Table 6
Performance comparison of models under different node information

| Node feature | AUC (%) | Precision (%) | Accuracy (%) | Recall (%) | F1-score (%) |
|---|---|---|---|---|---|
| Edge | 91.22 | 83.37 | 83.58 | 83.90 | 83.63 |
| AUX | 90.73 | 85.37 | 84.38 | 81.43 | 83.91 |
| GUS | 92.15 | 84.20 | 85.32 | 80.74 | 83.62 |
| AUX+GUS | 94.17 | 86.13 | 86.88 | 87.74 | 86.93 |
Figure 9
Results under different node features (AUX+GUS, GUS, AUX and Edge denote the model with Gaussian similarity plus auxiliary information, only Gaussian similarity, only auxiliary information and only edge information, respectively).

Performance analysis of the model under different GCN layers

In the encoder of MTLMDA, we aggregate network information through the GCN and then generate the representations of the network nodes. Different numbers of GCN layers in the encoder lead to different aggregation effects on node information, which affects the final prediction performance of the model. Figure 10A shows the performance of our model for different numbers of GCN layers in the encoder. MTLMDA reaches its best performance with 3 GCN layers, while with more than 3 layers the performance drops rapidly. The layer-|$0$| embedding of a node in the encoder is its input feature, and the layer-|$k$| embedding aggregates information from nodes |$k$| hops away in the formed heterogeneous graph. If the encoder contains an excessive number of GCN layers, every node in the graph obtains highly overlapping information, and the model suffers from the over-smoothing problem. As shown in Figure 10A, when the number of GCN layers is greater than 3, the model performance begins to degrade.

Figure 10
Performance of MTLMDA under different experimental parameters.

Performance analysis of the model under different embedding dimension

The size of the node embeddings produced by the MTLMDA encoder is an important factor affecting model performance: different embedding sizes capture different amounts of node information for the same node. In the experiments, we set the node embedding dimension to 16, 32, 64, 128, 256 and 512, respectively. As shown in Figure 10B, the overall performance of the model improves with the node embedding dimension within a certain range and is best when the embedding dimension reaches 64. Thus, we choose 64 as the default embedding dimension of the MTLMDA encoder.

Performance analysis of the model under different projection dimension

The projection dimension of the network nodes is the most important factor in determining the initial node features fed to the model encoder. We explore the performance of the model under different projection dimensions based on a 3-layer GCN encoder. As shown in Figure 10C, we set the projection dimension to 64, 128, 256, 512, 1024 and 2048, respectively. The experimental results show that the overall performance of the model is optimal when the projection dimension is 1024; therefore, in subsequent experiments, we choose 1024 as the default projection dimension.

Comprehensive comparison of different models

To further demonstrate the comprehensive performance of MTLMDA, we regard the model as a recommender system in which miRNAs represent users and diseases represent items; the task is to determine whether there is an interaction between a user and an item. We conduct the experiment by using all known associations in HMDD v2.0 as training samples to identify the top-ranked miRNAs for each disease, focusing on the number of predicted positive samples among different top-k rankings. Figure 11 shows the numbers of correctly retrieved miRNA–disease associations. We observe that MTLMDA outperforms the other models from top 5 to top 50.
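The top-k retrieval evaluation can be sketched as follows: for each disease, rank all miRNAs by predicted score and count how many of the k highest-scoring ones are known associations (the toy score and label matrices are illustrative):

```python
import numpy as np

def topk_hits(scores, known, k):
    """For each disease (column), count how many of the k highest-scoring
    miRNAs are known associations; return the total over all diseases."""
    hits = 0
    for j in range(scores.shape[1]):
        top = np.argsort(-scores[:, j])[:k]   # indices of the top-k miRNAs
        hits += int(known[top, j].sum())
    return hits

# toy predicted scores for 6 miRNAs x 2 diseases, and known associations
scores = np.array([[.9, .1], [.8, .7], [.2, .6], [.7, .2], [.1, .9], [.3, .4]])
known  = np.array([[1, 0],  [1, 1],  [0, 0],  [0, 0],  [0, 1],  [1, 0]])
# top-3 per disease retrieves 2 known associations each, 4 in total
```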

Figure 11
Number of correctly retrieved known miRNA–disease associations among the top-k.

Real case studies for the proposed MTLMDA

To further test the predictive ability of MTLMDA for potential disease–miRNA associations in practice, we use the model to conduct case studies on six common malignant human diseases. Approximately 200 miRNAs have been found to be significantly dysregulated in various cancer malignancies; these miRNAs can affect cancer development by targeting proto-oncogenes or tumor suppressor genes [51]. Therefore, accurate prediction of potential miRNA–disease associations is a major advance in the field of human medicine and health. During the prediction process, the training set of the miRNA–disease sub-network includes the 5430 experimentally confirmed miRNA–disease combinations as positive samples and the same number of negative samples randomly selected from the unknown miRNA–disease combinations. We then establish the corresponding gene–disease sub-network training set according to the diseases in the miRNA–disease sub-network. The test set of MTLMDA is assembled by pairing the disease under study with the remaining miRNAs in the miRNA–disease sub-network. By training MTLMDA, we obtain embedded representations of diseases and miRNAs; by decoding the test set, we obtain the association probabilities between the disease and the remaining miRNAs. For each disease, we select the top 30 miRNAs with the highest predicted association probability scores. We verify the predictions against three databases (i.e. dbDEMC [52], miR2Disease [53] and miRCancer [54]) in turn: if a result is confirmed in dbDEMC, we no longer query the miR2Disease and miRCancer databases; otherwise, we query them sequentially.

Lung cancer is the most common cause of death among all cancer pathologies. Most cases are not detected until an advanced stage, and the prognosis is generally poor [55]. Loss or amplification of certain miRNAs has been found to be associated with lung cancer, so it is worthwhile to design a case study exploring potentially associated miRNAs in lung cancer. From Table 7, we observe that 29 of the top 30 candidate miRNAs can be confirmed in the three databases. The only exception is hsa-mir-378a, for which there is no evidence of an association with lung cancer; this may mean the association has simply not yet been discovered rather than that no link exists.

Table 7
Top 30 lung cancer-related miRNAs predicted

| Rank | miRNA | Evidence | Rank | miRNA | Evidence |
|---|---|---|---|---|---|
| 1 | hsa-mir-16 | dbDEMC | 16 | hsa-mir-378a | NO |
| 2 | hsa-mir-122 | dbDEMC | 17 | hsa-mir-99a | dbDEMC |
| 3 | hsa-mir-15a | dbDEMC | 18 | hsa-mir-302a | dbDEMC |
| 4 | hsa-mir-106b | dbDEMC | 19 | hsa-mir-328 | dbDEMC |
| 5 | hsa-mir-15b | dbDEMC | 20 | hsa-mir-196b | dbDEMC |
| 6 | hsa-mir-195 | dbDEMC | 21 | hsa-mir-372 | dbDEMC |
| 7 | hsa-mir-141 | dbDEMC | 22 | hsa-mir-483 | dbDEMC |
| 8 | hsa-mir-451a | dbDEMC | 23 | hsa-mir-10a | dbDEMC |
| 9 | hsa-mir-23b | dbDEMC | 24 | hsa-mir-208a | dbDEMC |
| 10 | hsa-mir-342 | dbDEMC | 25 | hsa-mir-424 | mirCancer |
| 11 | hsa-mir-429 | dbDEMC | 26 | hsa-mir-302b | dbDEMC |
| 12 | hsa-mir-373 | dbDEMC | 27 | hsa-mir-204 | dbDEMC |
| 13 | hsa-mir-20b | dbDEMC | 28 | hsa-mir-144 | dbDEMC |
| 14 | hsa-mir-130a | dbDEMC | 29 | hsa-mir-28 | dbDEMC |
| 15 | hsa-mir-193b | dbDEMC | 30 | hsa-mir-149 | dbDEMC |

Colon cancer is a type of cancer that begins in the large intestine (colon), the final part of the digestive tract. Colon cancer can occur at any age, but it is more likely to affect older adults. It usually starts as small noncancerous (benign) clumps of cells called polyps that form inside the colon; some of these polyps can turn into colon cancer over time. An estimated 106 180 colon cancer cases will be diagnosed in the USA in 2022 [56]. As Table 8 shows, all top 30 colon cancer-related miRNAs predicted by our model are confirmed in the three databases.

Table 8. Top 30 colon cancer-related miRNAs predicted

Rank | miRNA | Evidence | Rank | miRNA | Evidence
1 | hsa-mir-21 | dbDEMC | 16 | hsa-mir-29c | dbDEMC
2 | hsa-mir-155 | dbDEMC | 17 | hsa-mir-15b | miR2Disease
3 | hsa-mir-34a | dbDEMC | 18 | hsa-mir-223 | dbDEMC
4 | hsa-mir-146a | dbDEMC | 19 | hsa-mir-199a | mirCancer
5 | hsa-mir-125b | dbDEMC | 20 | hsa-mir-19b | dbDEMC
6 | hsa-mir-122 | dbDEMC | 21 | hsa-let-7a | dbDEMC
7 | hsa-mir-16 | dbDEMC | 22 | hsa-mir-143 | dbDEMC
8 | hsa-mir-221 | dbDEMC | 23 | hsa-mir-92a | dbDEMC
9 | hsa-mir-29a | dbDEMC | 24 | hsa-mir-31 | dbDEMC
10 | hsa-mir-222 | dbDEMC | 25 | hsa-mir-210 | dbDEMC
11 | hsa-mir-133a | dbDEMC | 26 | hsa-mir-200b | dbDEMC
12 | hsa-mir-29b | dbDEMC | 27 | hsa-mir-206 | dbDEMC
13 | hsa-mir-1 | dbDEMC | 28 | hsa-mir-19a | dbDEMC
14 | hsa-mir-20a | dbDEMC | 29 | hsa-mir-18a | dbDEMC
15 | hsa-mir-15a | dbDEMC | 30 | hsa-let-7c | dbDEMC

Lymphomas start in immune system cells and can occur almost anywhere in the body. In 2022, an estimated 89 010 new cases of lymphoma will be diagnosed in the USA and 21 170 people will die from the disease [56]. Our prediction results for lymphoma-associated miRNAs are shown in Table 9. Among the top 30 candidate miRNAs, only hsa-mir-142 and hsa-mir-34c lack evidence of an association with lymphoma in the three databases.

Table 9. Top 30 lymphoma-related miRNAs predicted

Rank | miRNA | Evidence | Rank | miRNA | Evidence
1 | hsa-mir-125b | dbDEMC | 16 | hsa-mir-196a | dbDEMC
2 | hsa-mir-29a | dbDEMC | 17 | hsa-mir-214 | dbDEMC
3 | hsa-mir-34a | dbDEMC | 18 | hsa-mir-195 | dbDEMC
4 | hsa-mir-221 | dbDEMC | 19 | hsa-mir-30a | dbDEMC
5 | hsa-mir-222 | dbDEMC | 20 | hsa-mir-9 | dbDEMC
6 | hsa-mir-29b | dbDEMC | 21 | hsa-mir-143 | dbDEMC
7 | hsa-mir-133a | dbDEMC | 22 | hsa-mir-181b | dbDEMC
8 | hsa-mir-199a | dbDEMC | 23 | hsa-let-7c | dbDEMC
9 | hsa-mir-1 | dbDEMC | 24 | hsa-let-7a | dbDEMC
10 | hsa-mir-223 | dbDEMC | 25 | hsa-mir-15b | dbDEMC
11 | hsa-mir-145 | dbDEMC | 26 | hsa-mir-23a | dbDEMC
12 | hsa-mir-106b | dbDEMC | 27 | hsa-mir-146b | dbDEMC
13 | hsa-mir-142 | NO | 28 | hsa-let-7b | dbDEMC
14 | hsa-mir-206 | dbDEMC | 29 | hsa-mir-34c | NO
15 | hsa-mir-31 | dbDEMC | 30 | hsa-mir-106a | dbDEMC

Breast cancer is the most common cancer worldwide and the leading cause of cancer-related deaths in women, accounting for 25% of all cancer cases and 15% of cancer-related deaths [57]. Table 10 presents our model's predictions of the top 30 breast cancer-related miRNAs. Of these, 25 candidate miRNAs have been confirmed to be related to breast cancer, while the remaining five (hsa-mir-509, hsa-mir-362, hsa-mir-485, hsa-mir-491 and hsa-mir-378a) have no supporting evidence in the three databases.

Table 10. Top 30 breast cancer-related miRNAs predicted

Rank | miRNA | Evidence | Rank | miRNA | Evidence
1 | hsa-mir-150 | dbDEMC | 16 | hsa-mir-192 | dbDEMC
2 | hsa-mir-15b | dbDEMC | 17 | hsa-mir-491 | NO
3 | hsa-mir-212 | dbDEMC | 18 | hsa-mir-95 | dbDEMC
4 | hsa-mir-509 | NO | 19 | hsa-mir-154 | dbDEMC
5 | hsa-mir-503 | dbDEMC | 20 | hsa-mir-483 | dbDEMC
6 | hsa-mir-142 | miR2Disease | 21 | hsa-mir-184 | dbDEMC
7 | hsa-mir-106b | dbDEMC | 22 | hsa-mir-720 | dbDEMC
8 | hsa-mir-372 | dbDEMC | 23 | hsa-mir-431 | dbDEMC
9 | hsa-mir-362 | NO | 24 | hsa-mir-363 | dbDEMC
10 | hsa-mir-32 | dbDEMC | 25 | hsa-mir-28 | mirCancer
11 | hsa-mir-208a | dbDEMC | 26 | hsa-mir-193a | dbDEMC
12 | hsa-mir-485 | NO | 27 | hsa-mir-378a | NO
13 | hsa-mir-30e | dbDEMC | 28 | hsa-mir-198 | dbDEMC
14 | hsa-mir-190a | dbDEMC | 29 | hsa-mir-424 | dbDEMC
15 | hsa-mir-98 | dbDEMC | 30 | hsa-mir-186 | dbDEMC

Kidney cancer is one of the 10 most common cancers in Western communities. Globally, approximately 270 000 kidney cancer cases are diagnosed each year, and 116 000 patients die from the disease [58]. As the results in Table 11 show, all of the top 30 candidate miRNAs predicted by MTLMDA have been confirmed to be associated with kidney cancer.

Table 11. Top 30 kidney cancer-related miRNAs predicted

Rank | miRNA | Evidence | Rank | miRNA | Evidence
1 | hsa-mir-155 | dbDEMC | 16 | hsa-mir-223 | dbDEMC
2 | hsa-mir-146a | dbDEMC | 17 | hsa-mir-126 | dbDEMC
3 | hsa-mir-29a | dbDEMC | 18 | hsa-mir-199a | dbDEMC
4 | hsa-mir-34a | dbDEMC | 19 | hsa-mir-18a | dbDEMC
5 | hsa-mir-125b | dbDEMC | 20 | hsa-mir-143 | dbDEMC
6 | hsa-mir-122 | dbDEMC | 21 | hsa-mir-19b | dbDEMC
7 | hsa-mir-221 | dbDEMC | 22 | hsa-mir-150 | dbDEMC
8 | hsa-mir-16 | dbDEMC | 23 | hsa-mir-19a | dbDEMC
9 | hsa-mir-133b | dbDEMC | 24 | hsa-mir-92a | dbDEMC
10 | hsa-mir-20a | dbDEMC | 25 | hsa-mir-15b | dbDEMC
11 | hsa-mir-17 | dbDEMC | 26 | hsa-mir-31 | dbDEMC
12 | hsa-mir-29b | dbDEMC | 27 | hsa-mir-7a | dbDEMC
13 | hsa-mir-222 | mirCancer | 28 | hsa-mir-195 | dbDEMC
14 | hsa-mir-1 | dbDEMC | 29 | hsa-mir-181a | dbDEMC
15 | hsa-mir-145 | dbDEMC | 30 | hsa-mir-200b | dbDEMC

Leukemia is a cancer of the bone marrow and blood. From 2009 to 2018, the incidence of leukemia in children and adolescents increased by about 1% per year [56]. We choose leukemia as our final case study. Table 12 shows the top 30 miRNAs predicted by our model to be associated with leukemia, all of which are confirmed by the three databases.

Table 12. Top 30 leukemia-related miRNAs predicted

Rank | miRNA | Evidence | Rank | miRNA | Evidence
1 | hsa-mir-155 | dbDEMC | 16 | hsa-mir-34c | dbDEMC
2 | hsa-mir-146a | dbDEMC | 17 | hsa-mir-200b | dbDEMC
3 | hsa-mir-29a | dbDEMC | 18 | hsa-mir-200a | dbDEMC
4 | hsa-mir-221 | dbDEMC | 19 | hsa-mir-206 | dbDEMC
5 | hsa-mir-133a | dbDEMC | 20 | hsa-let-7a | dbDEMC
6 | hsa-mir-122 | miRCancer | 21 | hsa-let-7d | dbDEMC
7 | hsa-mir-222 | dbDEMC | 22 | hsa-mir-182 | dbDEMC
8 | hsa-mir-29b | dbDEMC | 23 | hsa-mir-146b | dbDEMC
9 | hsa-mir-145 | dbDEMC | 24 | hsa-let-7e | dbDEMC
10 | hsa-mir-1 | dbDEMC | 25 | hsa-mir-195 | dbDEMC
11 | hsa-mir-106b | dbDEMC | 26 | hsa-mir-142 | dbDEMC
12 | hsa-mir-223 | dbDEMC | 27 | hsa-mir-210 | dbDEMC
13 | hsa-mir-126 | dbDEMC | 28 | hsa-mir-148a | dbDEMC
14 | hsa-mir-15b | dbDEMC | 29 | hsa-mir-26a | dbDEMC
15 | hsa-mir-29c | dbDEMC | 30 | hsa-mir-106a | dbDEMC

The above six case studies further confirm the reliability of our model. In summary, our proposed model provides a reliable reference and guidance for miRNA-disease association research.
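The verification procedure used throughout the case studies follows one pattern: rank all candidate miRNAs for a disease by predicted score, take the top 30 and look each one up in the three evidence databases (dbDEMC, miR2Disease and miRCancer), marking it "NO" if none contains it. A minimal sketch of this cross-checking step (not the authors' code; function names and the toy data are illustrative):

```python
def verify_top_k(ranked_mirnas, evidence_dbs, k=30):
    """Annotate the top-k predicted miRNAs with the first database confirming them.

    ranked_mirnas: miRNA names sorted by predicted association score, descending.
    evidence_dbs:  mapping of database name -> set of miRNAs it confirms.
    Returns a list of (miRNA, database-name-or-'NO') pairs, as in Tables 7-12.
    """
    results = []
    for mirna in ranked_mirnas[:k]:
        # Report the first database containing the miRNA, or 'NO' if none does.
        source = next(
            (db for db, confirmed in evidence_dbs.items() if mirna in confirmed),
            "NO",
        )
        results.append((mirna, source))
    return results


# Toy example with hypothetical evidence sets:
dbs = {
    "dbDEMC": {"hsa-mir-16", "hsa-mir-122"},
    "miR2Disease": {"hsa-mir-142"},
    "miRCancer": {"hsa-mir-424"},
}
print(verify_top_k(["hsa-mir-16", "hsa-mir-142", "hsa-mir-999"], dbs, k=3))
# [('hsa-mir-16', 'dbDEMC'), ('hsa-mir-142', 'miR2Disease'), ('hsa-mir-999', 'NO')]
```

The per-table confirmation counts reported above (e.g. 28 of 30 for lymphoma) then follow by counting entries whose evidence label is not "NO".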

CONCLUSION

Abnormal expression of miRNAs, which regulate gene expression, is a key element in the formation of various malignant human diseases. Therefore, accurately predicting the relationships between diseases and miRNAs can promote progress in human health. In this paper, we propose a multi-task learning model (MTLMDA) to predict potential miRNA–disease associations. Based on the diseases in the miRNA–disease network, we construct a corresponding gene–disease sub-network to assist the prediction of miRNA–disease associations. Compared with five recent benchmark models, our MTLMDA model obtains superior AUC. Moreover, the accuracy and reliability of the MTLMDA model are confirmed through six case studies (lung cancer, colon cancer, lymphoma, breast cancer, kidney cancer and leukemia).

Key Points
  • This is the first work (called MTLMDA) in the field to introduce the idea of multi-task learning through the use of miRNA–disease–gene relationships.

  • We select the same diseases and their related genes from DisGeNET to build a gene–disease sub-network corresponding to the miRNA–disease sub-network, which is used to assist miRNA–disease association prediction.

  • MTLMDA is an end-to-end trainable graph neural network model using a GCN-based encoder and decoder.

  • We compare our model with competitive baselines on a real-world dataset and conduct six case studies for both miRNAs and diseases, which confirm the effectiveness of our model.

FUNDING

National Natural Science Foundation of China (Grant No. 62202089, U22A2004, 72192832); Shanghai Rising-Star Program (Grant No. 23QA1403100); Natural Science Foundation of Shanghai (Grant No. 21ZR1421900); General project of Liaoning Provincial Department of Education (Grant No. LJKZ0005); Doctor Startup Foundation of Liaoning Province (Grant No. 2021-BS-055), Fundamental Research Funds for the Central Universities (Grant No. N2119004).

DATA AVAILABILITY

The data and source code are available from https://github.com/qwslle/MTLMDA.

Author Biographies

Qiang He received the Ph.D. degree in computer application technology from the Northeastern University, Shenyang, China in 2020. He is currently Associate Professor at College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China.

Wei Qiao is currently pursuing a master’s degree in electronic information from the Northeastern University, Shenyang, China. His main research interests are data mining, medical informatics.

Hui Fang is an Associate Professor at Shanghai University of Finance and Economics, China. She received her PhD from Nanyang Technological University, Singapore. Her main research topic is personalized machine learning, including trust/link prediction in online communities, and recommender systems. She has published papers in leading conferences (e.g., IJCAI, AAAI and SIGIR), and journals (e.g., AIJ, TKDE, TOIS and TPAMI). She is the SE of the ECRA journal, and serves as a PC Board of IJCAI, and (Senior) PC Member for WWW, UMAP, IJCAI, AAAI and AAMAS, etc.

Yang Bao received Ph.D. degree in Information Systems from the School of Computing at the National University of Singapore, and bachelor & master degree from Nanjing University. He is currently an Associate Professor at the Antai College of Economics and Management (ACEM), Shanghai Jiao Tong University (SJTU), Shanghai, China.

REFERENCES

1. Pasquinelli AE, Ruvkun G. Control of developmental timing by microRNAs and their targets. Annu Rev Cell Dev Biol 2002;18:495–513.
2. Lee RC, Feinbaum RL, Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 1993;75(5):843–54.
3. Ambros V. The functions of animal microRNAs. Nature 2004;431(7006):350–5.
4. Nahand JS, Shojaie L, Akhlagh SA, et al. Cell death pathways and viruses: role of microRNAs. Mol Ther Nucleic Acids 2021;24:487–511.
5. Meltzer PS. Small RNAs with big impacts. Nature 2005;435(7043):745–6.
6. Chen X, Xie D, Zhao Q, You ZH. MicroRNAs and complex diseases: from experimental results to computational models. Brief Bioinform 2019;20(2):515–39.
7. Iorio MV, Ferracin M, Liu C, et al. MicroRNA gene expression deregulation in human breast cancer. Cancer Res 2005;65(16):7065–70.
8. Sayed D, Abdellatif M. MicroRNAs in development and disease. Physiol Rev 2011;91(3):827–87.
9. Fani M, Zandi M, Ebrahimi S, et al. The role of miRNAs in COVID-19 disease. Future Virol 2021;16(4):301–6.
10. Li C, Hu X, Li L, Li JH. Differential microRNA expression in the peripheral blood from human patients with COVID-19. J Clin Lab Anal 2020;34:e23590.
11. Freeman WM, Walker SJ, Vrana KE. Quantitative RT-PCR: pitfalls and potential. Biotechniques 1999;26(1):112–25.
12. Baskerville S, Bartel DP. Microarray profiling of microRNAs reveals frequent coexpression with neighboring miRNAs and host genes. RNA 2005;11(3):241–7.
13. Várallyay E, Burgyán J, Havelda Z. MicroRNA detection by northern blotting using locked nucleic acid probes. Nat Protoc 2008;3(2):190–6.
14. Lu M, Zhang Q, Deng M, et al. An analysis of human microRNA and disease associations. PLoS One 2008;3(10):e3420.
15. Gu C, Liao B, Li X, Li K. Network consistency projection for human miRNA-disease associations inference. Sci Rep 2016;6(1):1–10.
16. Xuan P, Han K, Guo Y, et al. Prediction of potential disease-associated microRNAs based on random walk. Bioinformatics 2015;31(11):1805–15.
17. Chen X, Wang L, Qu J, et al. Predicting miRNA–disease association based on inductive matrix completion. Bioinformatics 2018;34(24):4256–65.
18. Xu J, Li CX, Lv JY, et al. Prioritizing candidate disease miRNAs by topological features in the miRNA target–dysregulated network: case study of prostate cancer. Mol Cancer Ther 2011;10(10):1857–66.
19. Chen X, Wang CC, Yin J, You ZH. Novel human miRNA-disease association inference based on random forest. Mol Ther Nucleic Acids 2018;13:568–79.
20. Li J, Zhang S, Liu T, et al. Neural inductive matrix completion with graph convolutional networks for miRNA-disease association prediction. Bioinformatics 2020;36(8):2538–46.
21. Yan C, Duan G, Li N, et al. PDMDA: predicting deep-level miRNA–disease associations with graph neural networks and sequence features. Bioinformatics 2022;38(8):2226–34.
22. Long M, Cao Z, Wang J, et al. Learning multiple tasks with multilinear relationship networks. Adv Neural Inf Process Syst 2017;30.
23. Zhang Y, Yang Q. A survey on multi-task learning. IEEE Trans Knowl Data Eng 2021;34(12):5586–609.
24. Chen X, Liu MX, Yan GY. RWRMDA: predicting novel human microRNA–disease associations. Mol Biosyst 2012;8(10):2792–8.
25. Chen X, Yan CC, Zhang X, et al. WBSMDA: within and between score for miRNA-disease association prediction. Sci Rep 2016;6(1):1–9.
26. Wang F, Zhang C. Label propagation through linear neighborhoods. IEEE Trans Knowl Data Eng 2007;20(1):55–67.
27. Li G, Luo J, Xiao Q, et al. Predicting microRNA-disease associations using label propagation based on linear neighborhood similarity. J Biomed Inform 2018;82:169–77.
28. Wang YT, Wu QW, Gao Z, et al. MiRNA-disease association prediction via hypergraph learning based on high-dimensionality features. BMC Med Inform Decis Mak 2021;21:1–13.
29. Ha J, Park S. NCMD: Node2vec-based neural collaborative filtering for predicting miRNA-disease association. IEEE/ACM Trans Comput Biol Bioinform 2022.
30. Alaimo S, Giugno R, Pulvirenti A. ncPred: ncRNA-disease association prediction through tripartite network-based inference. Front Bioeng Biotechnol 2014;2:71.
31. Yu L, Shen X, Zhong D, Yang J. Three-layer heterogeneous network combined with unbalanced random walk for miRNA-disease association prediction. Front Genet 2020;10:1316.
32. Chen X, Clarence Yan C, Zhang X, et al. RBMMMDA: predicting multiple types of disease-microRNA associations. Sci Rep 2015;5(1):1–13.
33. Yao D, Zhan X, Kwoh CK. An improved random forest-based computational model for predicting novel miRNA-disease associations. BMC Bioinformatics 2019;20(1):1–14.
34. Zheng K, You ZH, Wang L, et al. MLMDA: a machine learning approach to predict and validate microRNA–disease associations by integrating of heterogenous information sources. J Transl Med 2019;17(1):1–14.
35. Ji C, Gao Z, Ma X, et al. AEMDA: inferring miRNA–disease associations based on deep autoencoder. Bioinformatics 2021;37(1):66–72.
36. Liu D, Huang Y, Nie W, et al. SMALF: miRNA-disease associations prediction based on stacked autoencoder and XGBoost. BMC Bioinformatics 2021;22(1):1–18.
37. Hamilton W, Ying Z, Leskovec J. Inductive representation learning on large graphs. Adv Neural Inf Process Syst 2017;30.
38. Li Z, Li J, Nie R, et al. A graph auto-encoder model for miRNA-disease associations prediction. Brief Bioinform 2021;22(4).
39. Wang J, Li J, Yue K, et al. NMCMDA: neural multicategory MiRNA–disease association prediction. Brief Bioinform 2021;22(5):bbab074.
40. Lou Z, Cheng Z, Li H, et al. Predicting miRNA–disease associations via learning multimodal networks and fusing mixed neighborhood information. Brief Bioinform 2022;23(5).
41. Li Y, Qiu C, Tu J, et al. HMDD v2.0: a database for experimentally supported human microRNA and disease associations. Nucleic Acids Res 2014;42(D1):D1070–4.
42. Piñero J, Queralt-Rosinach N, Bravo A, et al. DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes. Database 2015;2015.
43. Wang D, Wang J, Lu M, et al. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics 2010;26(13):1644–50.
44. Van Laarhoven T, Nabuurs SB, Marchiori E. Gaussian interaction profile kernels for predicting drug–target interaction. Bioinformatics 2011;27(21):3036–43.
45. Goh KI, Cusick ME, Valle D, et al. The human disease network. Proc Natl Acad Sci 2007;104(21):8685–90.
46. Li Z, Zhong T, Huang D, et al. Hierarchical graph attention network for miRNA-disease association prediction. Mol Ther 2022;30(4):1775–86.
47. Defferrard M, Bresson X, Vandergheynst P. Convolutional neural networks on graphs with fast localized spectral filtering. Adv Neural Inf Process Syst 2016;29.
48. Wang M, et al. Deep graph library: towards efficient and scalable deep learning on graphs. In: ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.
49. Tang X, Luo J, Shen C, Lai Z. Multi-view multichannel attention graph convolutional network for miRNA–disease association prediction. Brief Bioinform 2021;22(6):bbab174.
50. Huang Z, Shi J, Gao Y, et al. HMDD v3.0: a database for experimentally supported human microRNA–disease associations. Nucleic Acids Res 2019;47(D1):D1013–7.
51. Bandyopadhyay S, Mitra R, Maulik U, Zhang MQ. Development of the human cancer microRNA network. Silence 2010;1(1):6–14.
52. Yang Z, Ren F, Liu C, et al. dbDEMC: a database of differentially expressed miRNAs in human cancers. In: BMC Genomics, Vol. 11. Springer, 2010, 1–8.
53. Jiang Q, Wang Y, Hao Y, et al. miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res 2009;37(suppl_1):D98–104.
54. Xie B, Ding Q, Han H, Wu D. miRCancer: a microRNA–cancer association database constructed by text mining on literature. Bioinformatics 2013;29(5):638–44.
55. Hirsch FR, Jänne PA, Eberhardt WE, et al. Epidermal growth factor receptor inhibition in lung cancer: status 2012. J Thorac Oncol 2013;8(3):373–84.
56. Cokkinides V, Albano J, Samuels A, et al. American Cancer Society: Cancer Facts and Figures. Atlanta: American Cancer Society, 2005.
57. Ward EM, et al. Global cancer in women: burden and trends. Cancer Epidemiol Biomarkers Prev 2017;26(4):444–57.
58. Ljungberg B, Campbell SC, Choi HY, et al. The epidemiology of renal cell carcinoma. Eur Urol 2011;60(4):615–21.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/pages/standard-publication-reuse-rights)