Improving the identification of miRNA–disease associations with multi-task learning on gene–disease networks

Table 1

Summary of main notations

Variable	Description
\|$MD$\|	miRNA–disease relationship matrix
\|$n_{d}$\|	Number of disease nodes in HMDD v2.0 (383)
\|$n_{m}$\|	Number of miRNA nodes in HMDD v2.0 (495)
\|$GD$\|	Gene–disease relationship matrix
\|$n_{g}$\|	Number of gene nodes in MTLMDA (4395)
\|$K_{GIP,m}$\|	MiRNA Gaussian similarity matrix
\|$K_{GIP,d}$\|	Disease Gaussian similarity matrix
\|$Y_{m_{i}}$\|	The binary row vector in matrix MD
\|$Y_{d_{i}}$\|	The binary column vector in matrix MD
\|$r_{m},r_{d}$\|	The bandwidth of the kernel
\|$M(i)$\|	The \|$i$\|-th miRNA node feature representation
\|$D_{1}(\,j)$\|	The \|$j$\|-th disease node feature representation
\|$G(i)$\|	The \|$i$\|-th gene node feature representation
\|$D_{2}(\,j)$\|	The \|$j$\|-th disease node feature representation
\|$H_{m}(i),H_{d1}(i)$\|	Projection features of miRNA and disease node
\|$H_{g}(i),H_{d2}(i)$\|	Projection features of gene and disease node
\|$W$\|	Different weight matrices
\|$H_{aux-m},H_{aux-d1}$\|	Gene–disease network’s auxiliary information
\|$H_{aux-g},H_{aux-d2}$\|	miRNA–disease network’s auxiliary information
\|$H_{M}$\|	Integrated feature representation of miRNA nodes
\|$H_{D1}$\|	Disease’s integrated representation in miRNA–disease network
\|$H_{G}$\|	Integrated feature representation of gene nodes
\|$H_{D2}$\|	Disease’s integrated representation in gene–disease network
\|$F_{m}$\|	MiRNA node’s final representation in miRNA–disease network
\|$F_{d1}$\|	Disease node’s final representation in miRNA–disease network
\|$F_{g}$\|	Gene node’s final representation in gene–disease network
\|$F_{d2}$\|	Disease node’s final representation in gene–disease network
\|$\hat{y}_{md}$\|	Predicted association probability of miRNA and disease nodes
\|$\hat{y}_{gd}$\|	Predicted association probability of gene and disease nodes
\|$LOSS_{m-d}$\|	The loss in the miRNA–disease sub-network
\|$LOSS_{g-d}$\|	The loss in the gene–disease sub-network
\|$LOSS$\|	The loss of the entire model

The construction of subnetworks

In this part, we first describe how the two sub-networks (miRNA–disease and gene–disease) are formed, and then describe how the Gaussian similarity of each sub-network is constructed.

Human miRNA-disease associations

In our study, miRNA–disease associations are derived from HMDD v2.0 [41] (https://www.cuilab.cn/hmdd), a mature dataset containing |$5430$| experimentally verified associations between |$n_{d}(383)$| diseases and |$n_{m}(495)$| miRNAs. In the experiments, the identified relationships between diseases and miRNAs are represented as a matrix |$MD$| with |$n_{m}$| rows and |$n_{d}$| columns. An entry is |$1$| if the corresponding miRNA is associated with the corresponding disease, and |$0$| otherwise (meaning the relationship is unknown). The matrix |$MD$| is defined as follows:

$$ \begin{align}& MD(i,j)= { \begin{cases} 1,& \text{if } m_{i} \text{ is associated with } d_{j}\\ 0,& \text{otherwise} \end{cases}} \end{align} $$

(1)

In total, |$\mathbf{MD}$| has |$n_{m}(495)$| rows and |$n_{d}(383)$| columns, giving |$189\,585$| miRNA–disease pairs. Here, |$m_{i}$| represents the |$i$|-th miRNA (the |$i$|-th row of |$\mathbf{MD}$|), while |$d_{j}$| represents the |$j$|-th disease (the |$j$|-th column of |$\mathbf{MD}$|).
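
As an illustration only, Eq. (1) can be sketched in a few lines of Python. The helper name `build_association_matrix` and the toy index pairs are hypothetical and are not taken from the MTLMDA code; rows index miRNAs and columns index diseases.

```python
import numpy as np

def build_association_matrix(pairs, n_rows, n_cols):
    """Return a binary matrix with 1 at each known (row, col) association."""
    mat = np.zeros((n_rows, n_cols), dtype=np.int8)
    for i, j in pairs:
        mat[i, j] = 1   # known association; all other entries stay 0 (unknown)
    return mat

# Toy example (hypothetical data): 4 miRNAs x 3 diseases.
toy_pairs = [(0, 0), (0, 2), (1, 1), (3, 2)]
MD_toy = build_association_matrix(toy_pairs, n_rows=4, n_cols=3)
```

With the real data, the same call would use `n_rows=495` and `n_cols=383` and the 5430 verified pairs from HMDD v2.0.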

Human gene–disease associations

In our study, the gene–disease association data are obtained by manual filtering. First, the original gene–disease relationships are taken from DisGeNet [42] (http://www.disgenet.org/home/), a database of gene–disease associations. The file named “Curated gene–disease associations” can be downloaded directly from the website and contains |$84\,038$| confirmed relationships between human diseases and genes. Then, we select the diseases that also appear in the miRNA–disease sub-network, together with their related genes, to form the gene–disease sub-network. The resulting gene–disease sub-network contains |$9286$| associations between |$n_{d}(383)$| diseases and |$n_{g}(4395)$| genes. Thus, as for the miRNA–disease sub-network, we create a matrix |$\mathbf{GD}$| with |$n_{g}$| rows and |$n_{d}$| columns. An entry is |$1$| if the corresponding gene is associated with the corresponding disease, and |$0$| otherwise. The |$9286$| known gene–disease associations are used as the positive samples of the gene–disease sub-network, and the negative samples are randomly selected from the entries of |$\mathbf{GD}$| with value |$0$|. The matrix |$\mathbf{GD}$| is defined as follows:

$$ \begin{align}& GD(i,j)= { \begin{cases} 1, & \text{if } g_{i} \text{ is associated with } d_{j}\\ 0, & \text{otherwise} \end{cases} } \end{align} $$

(2)

where |$g_{i}$| represents the |$i$|-th gene (the |$i$|-th row of |$\mathbf{GD}$|), and |$d_{j}$| represents the |$j$|-th disease (the |$j$|-th column of |$\mathbf{GD}$|).
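
The random selection of negative samples from the zero entries of an association matrix could be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sample_negatives`, the fixed seed and the choice of drawing as many negatives as positives are all assumptions, since the text does not state the negative-to-positive ratio.

```python
import numpy as np

def sample_negatives(assoc, n_samples, seed=0):
    """Randomly pick n_samples distinct (row, col) pairs whose entry is 0."""
    rng = np.random.default_rng(seed)
    zero_idx = np.argwhere(assoc == 0)   # all unknown pairs
    chosen = rng.choice(len(zero_idx), size=n_samples, replace=False)
    return zero_idx[chosen]

# Toy gene-disease matrix (hypothetical data): 4 genes x 3 diseases.
GD_toy = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [1, 0, 1]])
n_pos = int(GD_toy.sum())                # number of positive samples
# Assumption: a balanced set with as many negatives as positives.
neg_pairs = sample_negatives(GD_toy, n_samples=n_pos)
```

Sampling without replacement keeps the negative pairs distinct; `n_samples` must not exceed the number of zero entries.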

Gaussian interaction profile kernel similarity for miRNAs and diseases in miRNA–disease subnetwork

Previous studies [43] pointed out that similar diseases are often associated with functionally similar miRNAs. Based on this hypothesis, Gaussian interaction profile kernel similarity is well suited to modeling the similarity among miRNAs and among diseases in the miRNA–disease sub-network, and is therefore adopted in our study. Specifically, the Gaussian interaction profile kernel similarity for miRNAs is calculated from the known miRNA–disease associations. The |$i$|-th row of the matrix |$MD$| is a binary vector |$Y_{m_{i}}$| that records the associations between miRNA |$m_{i}$| and all diseases in the miRNA–disease sub-network. The Gaussian interaction profile kernel similarity between miRNAs |$m_{i}$| and |$m_{j}$| is defined as follows:

$$ \begin{align}& K_{GIP,m}(m_{i},m_{j})=\exp \left(-r_{m}\lVert Y_{m_{i}}-Y_{m_{j}}\rVert^{2}\right) \end{align} $$

(3)

where |$r_{m}$| represents the bandwidth of the kernel, which can be calculated by:

$$ \begin{align}& r_{m}= {r^{\prime}_{m}}/ \left(\frac 1 {n_{m}}\sum_{i=1}^{n_{m}}\lVert Y_{m_{i}}\rVert^{2}\right) \end{align} $$

(4)

where |$n_{m}$| represents the total number of miRNAs (|$495$| in this study) and |$r^{\prime}_{m}$| denotes a normalization constant, which we set to |$1$| following previous studies [44]. Figure 1 illustrates the Gaussian similarity of some miRNAs in the miRNA–disease sub-network; each value, ranging from |$0$| to |$1$|, represents a potential miRNA–miRNA correlation coefficient.

Figure 1

miRNAs Gaussian similarity in miRNA–disease sub-network (Note: ‘m125a,’ ‘m196a,’ ‘m499a,’ ‘m1229,’ ‘m944’ and ‘m518a’ represent miRNA ‘hsa-mir-125a,’ ‘hsa-mir-196a,’ ‘hsa-mir-499a,’ ‘hsa-mir-1229,’ ‘hsa-mir-944’ and ‘hsa-mir-518a,’ respectively).

Similarly, we can obtain the Gaussian interaction profile kernel similarity of the diseases according to the following formula:

$$ \begin{align}& K_{GIP,d}(d_{i},d_{j})=\exp \left(-r_{d}\lVert Y_{d_{i}}-Y_{d_{j}}\rVert^{2}\right) \end{align} $$

(5)

$$ \begin{align}& r_{d}= {r^{\prime}_{d}}/ \left(\frac 1 {n_{d}}\sum_{i=1}^{n_{d}}\lVert Y_{d_{i}}\rVert^{2}\right) \end{align} $$

(6)

where |$Y_{d_{i}}$| is the |$i$|-th binary column vector of the matrix |$MD$|, representing the associations between all miRNAs and disease |$d_{i}$|; |$n_{d}$| represents the total number of diseases (i.e. |$383$|) and the normalization constant, |$r^{\prime}_{d}$|, is set to |$1$|. Figure 2 visualizes the Gaussian similarity of diseases in the corresponding miRNA–disease sub-network; each value, ranging from |$0$| to |$1$|, represents a potential disease–disease correlation coefficient.

Figure 2

Diseases Gaussian similarity in miRNA–disease sub-network (Note: ‘A,H,’ ‘AIS,’ ‘APA,’ ‘Vit,’ ‘WaM’ and ‘WaI’ represent diseases ‘Abortion, Habitual,’ ‘Acquired Immunodeficiency Syndrome,’ ‘ACTH-Secreting Pituitary Adenoma,’ ‘Vitiligo,’ ‘Waldenstrom Macroglobulinemia’ and ‘Wounds and Injuries,’ respectively).

We take the obtained Gaussian similarity of diseases and miRNAs as initial node features of disease and miRNA in the miRNA–disease sub-network, respectively.
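
A minimal NumPy sketch of Eqs. (3)–(6) is given below for illustration. The function name `gip_kernel` and the toy matrix are hypothetical; the bandwidth follows Eqs. (4) and (6) with the normalization constant set to 1, and the sketch assumes that at least one profile is non-zero so that the bandwidth is finite.

```python
import numpy as np

def gip_kernel(profiles, r_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`."""
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    # Bandwidth: r' divided by the mean squared norm of the profiles.
    bandwidth = r_prime / sq_norms.mean()
    # Pairwise squared Euclidean distances: ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)   # guard against tiny negative round-off
    return np.exp(-bandwidth * sq_dists)

# Toy association matrix (hypothetical data): 4 miRNAs x 3 diseases.
MD_toy = np.array([[1, 0, 1],
                   [1, 0, 0],
                   [0, 1, 0],
                   [0, 1, 1]])
K_m = gip_kernel(MD_toy)     # miRNA similarity from the rows of MD
K_d = gip_kernel(MD_toy.T)   # disease similarity from the columns of MD
```

Row `i` of `K_m` then plays the role of the initial feature vector of miRNA |$m_{i}$|, and row `j` of `K_d` that of disease |$d_{j}$|.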

Gaussian interaction profile kernel similarity for genes and diseases in the gene–disease sub-network

The biological principle of guilt-by-association suggests that the products of genes associated with similar disorders have a higher probability of physically interacting [45]. Therefore, as in Section Gaussian interaction profile kernel similarity for miRNAs and diseases in miRNA–disease subnetwork, we adopt the Gaussian interaction profile kernel similarity to calculate the similarity among genes and among diseases based on the gene–disease sub-network, |$GD$|. These two kinds of similarity are then used as the initial features of gene and disease nodes in the gene–disease sub-network, respectively.

Proposed model framework

Inspired by the use of graph neural networks in bioinformatics and of multi-task learning in machine learning applications, we propose the MTLMDA model, which contains two sub-networks (i.e. the miRNA–disease and gene–disease networks), a graph convolutional network encoder and a bilinear decoder, as shown in Figure 3. Specifically, as elaborated in Section The construction of subnetworks, we construct the two sub-networks from the corresponding datasets and assign initial features to their nodes through Gaussian similarity. Then, through the cross&compress units module, we use the joint characteristics of the two sub-networks as their auxiliary information, and finally leverage the encoder and decoder in an end-to-end manner to predict the probability of potential miRNA–disease associations. The whole MTLMDA model can be described by the following six steps, and Algorithm 1 summarizes the main procedure of MTLMDA.

Figure 3

The overall framework of model MTLMDA.

In MTLMDA (see Figure 3 and Algorithm 1), Step I constructs the miRNA–disease and gene–disease sub-networks to form a heterogeneous graph; Step II projects the miRNA and disease nodes, and the gene and disease nodes, of each sub-network into the same vector space; Step III extracts the initial joint features of the two sub-networks as their auxiliary information; Step IV uses a graph convolutional network to obtain the embeddings of the nodes in the two sub-networks; Step V simultaneously feeds the node embeddings obtained in the two sub-networks into the bilinear decoder to reconstruct the links of the two sub-networks; and Step VI trains the entire model in an end-to-end manner using the combined cross-entropy loss of the two sub-networks. In the following, we elaborate on the six steps in detail.

Step I: As introduced in Section The construction of subnetworks, the miRNA–disease sub-network contains |$495$| miRNA nodes and |$383$| disease nodes. From HMDD v2.0, we obtain |$5430$| experimentally verified miRNA–disease associations, which are treated as positive samples (with label value |$1$|). Following previous studies [38, 46], we randomly select miRNA–disease pairs from all the unknown miRNA–disease associations (marked as |$0$| in |$MD$|) as negative samples (with label value |$0$|). In addition, we use Gaussian similarity as the node features of the miRNA–disease sub-network. Therefore, the node feature of the |$i$|-th miRNA, |$M(i)$|, can be expressed as a 495-dimensional vector:

$$ \begin{align}& M(i)=(x^{1}_{i,1},x^{1}_{i,2},\dots,x^{1}_{i,n_{m}}) \end{align} $$

(7)

where |$x^{1}_{i,j}$| represents the Gaussian similarity between miRNA |$m_{i}$| and miRNA |$m_{j}$| in the miRNA–disease sub-network. Similarly, the node feature of the |$i$|-th disease, |$D_{1}(i)$|, can be expressed as a 383-dimensional vector:

$$ \begin{align}& D_{1}(i)=(z^{1}_{i,1},z^{1}_{i,2},\dots,z^{1}_{i,n_{d}}) \end{align} $$

(8)

where |$z^{1}_{i,j}$| represents the Gaussian similarity between disease |$d_{i}$| and disease |$d_{j}$| in the miRNA–disease sub-network.

In the gene–disease sub-network, we extract from the DisGeNet database the gene–disease pairs that involve the diseases of the miRNA–disease sub-network. There are |$9286$| associations (i.e. positive samples with label value |$1$|) between |$n_{d}$| diseases and |$n_{g}$| genes. We randomly select gene–disease pairs from all the unknown gene–disease associations (marked as |$0$| in |$GD$|) to form negative samples (with label value |$0$|). We also use Gaussian similarity as the initial features of gene and disease nodes; |$G(i)$| (the feature vector of gene |$g_{i}$|) and |$D_{2}(i)$| (the feature vector of disease |$d_{i}$|) can be expressed as follows:

$$ \begin{align}& G(i)=(x^{2}_{i,1},x^{2}_{i,2},\dots,x^{2}_{i,n_{g}}) \end{align} $$

(9)

$$ \begin{align}& D_{2}(i)=(z^{2}_{i,1},z^{2}_{i,2},\dots,z^{2}_{i,n_{d}}) \end{align} $$

(10)

where |$x^{2}_{i,j}$| represents the Gaussian similarity between gene |$g_{i}$| and gene |$g_{j}$|, and |$z^{2}_{i,j}$| represents the Gaussian similarity between disease |$d_{i}$| and disease |$d_{j}$| in the gene–disease sub-network.

Step II: In the two sub-networks, nodes possess feature vectors of varying dimensions. To streamline the calculation process in subsequent steps, we have developed a projection module that unifies disparate node features into a common vector space. Specifically, in the miRNA–disease sub-network, the projection module maps disease and miRNA node features to a uniform 1024-dimensional space via a transition matrix. The process is as follows:

$$ \begin{align}& H_{m}(i)= M(i) \cdot \mathbf{W}_{m} \end{align} $$

(11)

$$ \begin{align}& {H_{d1}(i)= D_{1}(i) \cdot \mathbf{W}_{d}} \end{align} $$

(12)

$$ \begin{align}& H_{g}(i)= G(i) \cdot \mathbf{W}_{g} \end{align} $$

(13)

$$ \begin{align}& {H_{d2}(i)= D_{2}(i) \cdot\mathbf{W}_{d}} \end{align} $$

(14)

where |$H_{m}(i)\in \mathbb{R}^{1024}$| and |$H_{d1}(i)\in \mathbb{R}^{1024}$| are the projection features of miRNA node |$m_{i}$| and disease node |$d_{i}$| in the miRNA–disease network. Likewise, |$H_{g}(i)\in \mathbb{R}^{1024}$| and |$H_{d2}(i)\in \mathbb{R}^{1024}$| are the projection features of gene |$g_{i}$| and disease |$d_{i}$| in the gene–disease network. The learnable weight matrices |$\mathbf{W}_{m}\in \mathbb{R}^{495\times 1024}$|, |$\mathbf{W}_{g}\in \mathbb{R}^{4395\times 1024}$| and |$\mathbf{W}_{d}\in \mathbb{R}^{383\times 1024}$| are created with the torch package according to the dimensions of the target vector space. To reduce redundant parameters and training time, the weight matrix |$\mathbf{W}_{d}$| is shared between the two networks when mapping disease nodes to the latent space.
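
The projection step of Eqs. (11)–(14) amounts to four matrix products, two of which reuse the same disease weight matrix. The following NumPy sketch illustrates the shapes only; it uses toy sizes and random stand-ins for the similarity features and weights, whereas the actual model learns the weights during training.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy sizes for illustration; the paper uses 495, 383, 4395 and 1024.
n_m, n_d, n_g, proj_dim = 5, 4, 7, 8

# Random stand-ins for the Gaussian similarity features of Eqs. (7)-(10).
M, D1 = rng.random((n_m, n_m)), rng.random((n_d, n_d))
G, D2 = rng.random((n_g, n_g)), rng.random((n_d, n_d))

# Projection weights; W_d is shared by the two sub-networks.
W_m = rng.normal(size=(n_m, proj_dim))
W_g = rng.normal(size=(n_g, proj_dim))
W_d = rng.normal(size=(n_d, proj_dim))

H_m, H_d1 = M @ W_m, D1 @ W_d   # Eqs. (11)-(12): miRNA-disease sub-network
H_g, H_d2 = G @ W_g, D2 @ W_d   # Eqs. (13)-(14): gene-disease sub-network
```

After this step every node, whatever its type, is described by a vector of the same length `proj_dim`.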

Step III: In this step, we connect the two sub-networks through the cross&compress units module and simultaneously extract auxiliary information from both sub-networks using the |$MD$| and |$GD$| matrices. |$\mathbf{H}_{aux-m}\in \mathbb{R}^{495\times 1024}$| and |$\mathbf{H}_{aux-d1}\in \mathbb{R}^{383\times 1024}$| denote the auxiliary features of the miRNA and disease nodes in the miRNA–disease network, which are obtained from their own network and from the gene–disease network, respectively.

$$ \begin{align}& \mathbf{H}_{aux-m}= MD \cdot \mathbf{W}_{aux-m} \end{align} $$

(15)

$$ \begin{align}& \mathbf{H}_{aux-d1}= GD^{T} \cdot \mathbf{W}_{aux-d1} \end{align} $$

(16)

The weight matrices |$\mathbf{W}_{aux-m}\in \mathbb{R}^{383\times 1024}$| and |$\mathbf{W}_{aux-d1}\in \mathbb{R}^{4395\times 1024}$| are created with the torch package and are used to extract auxiliary information from the corresponding networks. A similar process is applied in the gene–disease network:

$$ \begin{align}& \mathbf{H}_{aux-g}= GD \cdot \mathbf{W}_{aux-g} \end{align} $$

(17)

$$ \begin{align}& \mathbf{H}_{aux-d2}= MD^{T}\cdot \mathbf{W}_{aux-d2} \end{align} $$

(18)

Ultimately, we concatenate the initial features of the nodes with the auxiliary features to form the new features of the nodes, which can be summarized as follows:

$$ \begin{align}& \mathbf{H}_{M}= cat(\mathbf{H}_{m},\mathbf{H}_{aux-m}) \end{align} $$

(19)

$$ \begin{align}& \mathbf{H}_{D1}=cat(\mathbf{H}_{d1},\mathbf{H}_{aux-d1}) \end{align} $$

(20)

$$ \begin{align}& \mathbf{H}_{G}= cat(\mathbf{H}_{g},\mathbf{H}_{aux-g}) \end{align} $$

(21)

$$ \begin{align}& \mathbf{H}_{D2}=cat(\mathbf{H}_{d2},\mathbf{H}_{aux-d2}) \end{align} $$

(22)

where |$\mathbf{H}_{M}\in \mathbb{R}^{495\times 2048}$| and |$\mathbf{H}_{D1}\in \mathbb{R}^{383\times 2048}$| represent the integrated feature representations of the nodes in the miRNA–disease network, while |$\mathbf{H}_{G}\in \mathbb{R}^{4395\times 2048}$| and |$\mathbf{H}_{D2}\in \mathbb{R}^{383\times 2048}$| represent the integrated features of the gene and disease nodes in the gene–disease sub-network, respectively.
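
To make the dimensions in Eqs. (15)–(22) concrete, the sketch below reproduces the products and concatenations with toy sizes. All matrices are random stand-ins; the shapes of `W_aux_g` and `W_aux_d2` are not stated in the text and are inferred here from the shapes of |$GD$| and |$MD^{T}$|, so they should be read as an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
n_m, n_d, n_g, dim = 5, 4, 7, 8   # toy sizes; the paper uses 495, 383, 4395, 1024

MD = rng.integers(0, 2, size=(n_m, n_d)).astype(float)
GD = rng.integers(0, 2, size=(n_g, n_d)).astype(float)

# Random stand-ins for the projected features from Step II.
H_m, H_d1 = rng.random((n_m, dim)), rng.random((n_d, dim))
H_g, H_d2 = rng.random((n_g, dim)), rng.random((n_d, dim))

W_aux_m = rng.normal(size=(n_d, dim))
W_aux_d1 = rng.normal(size=(n_g, dim))
W_aux_g = rng.normal(size=(n_d, dim))    # shape inferred from GD
W_aux_d2 = rng.normal(size=(n_m, dim))   # shape inferred from MD^T

H_aux_m = MD @ W_aux_m        # Eq. (15)
H_aux_d1 = GD.T @ W_aux_d1    # Eq. (16)
H_aux_g = GD @ W_aux_g        # Eq. (17)
H_aux_d2 = MD.T @ W_aux_d2    # Eq. (18)

# Eqs. (19)-(22): concatenate projected and auxiliary features per node.
H_M = np.concatenate([H_m, H_aux_m], axis=1)
H_D1 = np.concatenate([H_d1, H_aux_d1], axis=1)
H_G = np.concatenate([H_g, H_aux_g], axis=1)
H_D2 = np.concatenate([H_d2, H_aux_d2], axis=1)
```

The concatenation doubles the feature length, which matches the step from 1024 to 2048 dimensions described above.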

Step IV: We further obtain the representations of the nodes in the two sub-networks from their direct neighbors in the respective networks, using a graph convolutional network (GCN) encoder. We adopt the Chebyshev filter-based approach (ChebConv) as MTLMDA’s encoder in view of its great expressive power [47]. At each layer of the graph convolutional network, MTLMDA updates the nodes’ embeddings according to the edges in the respective sub-network (taking the miRNA–disease sub-network as an example):

$$ \begin{align}& {h}_{i}^{(l+1)}=\mathbf{W}^{0,l}\mathbf{z}^{0,l}+\mathbf{W}^{1,l}\mathbf{z}^{1,l}+\cdots+\mathbf{W}^{k,l}\mathbf{z}^{k,l} \end{align} $$

(23)

$$ \begin{align}& \begin{split} \mathbf{Z}^{k,l}=2\cdot \widetilde{\mathbf{L}}\cdot \mathbf{Z}^{k-1,l}-\mathbf{Z}^{k-2,l},\\ \text{where}\ \mathbf{Z}^{0,l}=\begin{bmatrix}\mathbf{H}^{l}_{M}\\ \mathbf{H}^{l}_{D1}\end{bmatrix},\mathbf{Z}^{1,l}=\mathbf{\widetilde{L}} \cdot\begin{bmatrix}\mathbf{H}^{l}_{M}\\ \mathbf{H}^{l}_{D1}\end{bmatrix} \end{split} \end{align} $$

(24)

$$ \begin{align}& \widetilde{\mathbf{L}}=2 \overbrace{(\mathbf{I}-\widetilde{\mathbf{D}}^{-1/2}\widetilde{\mathbf{A}}\widetilde{\mathbf{D}}^{-1/2})}^{\mathbf{L}}/\lambda_{max}-\mathbf{I} \end{align} $$

(25)

where |$\mathbf{h}_{i}^{(l)}\in \mathbb{R}^{u}$| represents the hidden state of node |$i$| at the |$l$|-th layer of the GCN (|$u$| is the dimension of the hidden state representation), and |$\mathbf{W}$| denotes learnable weights. |$k$| is the Chebyshev filter size, which is set to 2 in this study. |$\widetilde{\mathbf{A}}=\mathbf{A}_{MD}+\mathbf{I}\in \mathbb{R}^{(n_{d}+n_{m})\times (n_{d}+n_{m})}$|, where |$\mathbf{A}_{MD}$| is the adjacency matrix of the miRNA–disease sub-network. |$\widetilde{\mathbf{D}}$| is the diagonal degree matrix of |$\widetilde{\mathbf{A}}$| with |$\widetilde{D}_{ii}=\sum _{j}a_{i,j}$| (|$a_{i,j}$| denotes the corresponding entry of |$\widetilde{\mathbf{A}}$|). |$\lambda _{max}$| is the largest eigenvalue of |$\mathbf{L}$|. With the above Cheb-GCN encoder, we obtain the final embeddings of the nodes in the two sub-networks.
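
The Chebyshev recurrence of Eqs. (23)–(25) can be illustrated with plain NumPy. The sketch below is not the DGL-based encoder used by MTLMDA: the helper names are hypothetical, the weights are random, no nonlinearity or dropout is applied, and the Laplacian uses the standard symmetric normalization |$\widetilde{\mathbf{D}}^{-1/2}\widetilde{\mathbf{A}}\widetilde{\mathbf{D}}^{-1/2}$|. Nodes are stacked as miRNAs followed by diseases, and the graph is assumed to contain at least one edge.

```python
import numpy as np

def scaled_laplacian(assoc):
    """Scaled Laplacian of the bipartite graph given by `assoc`, with self-loops."""
    n_rows, n_cols = assoc.shape
    n = n_rows + n_cols
    A = np.zeros((n, n))
    A[:n_rows, n_rows:] = assoc          # miRNA -> disease edges
    A[n_rows:, :n_rows] = assoc.T        # disease -> miRNA edges
    A_tilde = A + np.eye(n)
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    L = np.eye(n) - d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(L).max()
    return 2.0 * L / lam_max - np.eye(n)

def chebyshev_basis(L_tilde, Z0, k):
    """Return [Z^0, ..., Z^k] computed with the recurrence of Eq. (24)."""
    basis = [Z0]
    if k >= 1:
        basis.append(L_tilde @ Z0)
    for _ in range(2, k + 1):
        basis.append(2.0 * L_tilde @ basis[-1] - basis[-2])
    return basis

rng = np.random.default_rng(0)
MD_toy = np.array([[1, 0], [1, 1], [0, 1]], dtype=float)   # 3 miRNAs x 2 diseases
L_tilde = scaled_laplacian(MD_toy)
Z0 = rng.random((5, 4))                                    # stacked node features
basis = chebyshev_basis(L_tilde, Z0, k=2)
weights = [rng.normal(size=(4, 3)) for _ in basis]         # random stand-ins for W
H_next = sum(Z @ W for Z, W in zip(basis, weights))        # Eq. (23), linear part only
```

Dividing by the largest eigenvalue rescales the spectrum of the Laplacian into the interval [-1, 1], which is the domain on which Chebyshev polynomials are defined.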

Step V: After obtaining the node representations of the two sub-networks, we employ a bilinear decoder to reconstruct the links of the heterogeneous graphs in the two sub-networks. The detailed representation is as follows:

$$ \begin{align}& \hat{y}_{md} = Sigmoid((F_{m(i)})^{\mathrm{T}}\mathbf{Q}_{1}F_{d1(\,j)}) \end{align} $$

(26)

$$ \begin{align}& \hat{y}_{gd} = Sigmoid((F_{g(i)})^{\mathrm{T}}\mathbf{Q}_{\,2}F_{d2(\,j)}) \end{align} $$

(27)

where |$\hat{y}_{md}$| represents the predicted association probability of miRNA node |$m(i)$| and disease node |$d(\,j)$| in the miRNA–disease sub-network, |$F_{m}$| and |$F_{d1}$| represent the final miRNA and disease node embeddings obtained through the MTLMDA encoder in the miRNA–disease sub-network, and |$\mathbf{Q}_{1}$| denotes a trainable parameter matrix of size |$64\times 64$|. The sigmoid function is defined as:

$$ \begin{align}& Sigmoid(x)=\frac 1 {1+e^{-x}} \end{align} $$

(28)

Similarly, |$\hat{y}_{gd}$| represents the predicted association probability of gene nodes and disease nodes in the gene–disease sub-network.
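
For illustration, the decoder of Eqs. (26)–(28) can be evaluated for all miRNA–disease pairs at once. The function name `bilinear_decode` is hypothetical, and the embeddings and the matrix `Q1` are random stand-ins for the trained quantities.

```python
import numpy as np

def bilinear_decode(F_row, F_col, Q):
    """Sigmoid(F_row Q F_col^T): association probabilities for all pairs."""
    logits = F_row @ Q @ F_col.T
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
F_m = rng.normal(size=(5, 64))          # 5 toy miRNA embeddings
F_d1 = rng.normal(size=(4, 64))         # 4 toy disease embeddings
Q1 = rng.normal(size=(64, 64)) * 0.1    # stand-in for the trainable 64 x 64 matrix
y_hat_md = bilinear_decode(F_m, F_d1, Q1)
```

Entry `(i, j)` of `y_hat_md` corresponds to |$\hat{y}_{md}$| for miRNA |$m(i)$| and disease |$d(\,j)$|; the same function applies to the gene–disease sub-network with |$\mathbf{Q}_{2}$|.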

Step VI: The loss function of the MTLMDA model is the sum of the reconstruction errors over all training samples in the two sub-networks. We choose the cross-entropy loss function to measure the error between the true value |$y$| of each association in a sub-network and the predicted probability |$\hat{y}$|. It takes the following form:

$$ \begin{align}& LOSS_{m-d}=-\sum \left[ y_{ij} \log \hat{y}_{ij}+(1-y_{ij})\log (1-\hat{y}_{ij}) \right] \end{align} $$

(29)

$$ \begin{align}& LOSS_{g-d}=-\sum \left[ y_{ij} \log \hat{y}_{ij}+(1-y_{ij})\log (1-\hat{y}_{ij}) \right] \end{align} $$

(30)

where |$LOSS_{m-d}$| represents the loss in the miRNA–disease sub-network, |$\hat{y}_{ij}$| represents the predicted link probability between disease and miRNA nodes, and |$y_{ij}$| represents the true label of the link, which is 1 or 0. Correspondingly, |$LOSS_{g-d}$| represents the loss in the gene–disease sub-network. We take the sum of the two sub-network losses as the loss of MTLMDA:

$$ \begin{align}& LOSS=LOSS_{g-d}+LOSS_{m-d} \end{align} $$

(31)

Then, we use the loss function in Eq. (31) to train the whole model via the back-propagation algorithm in an end-to-end manner.
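
The combined objective of Eqs. (29)–(31) can be written compactly as below. This is a sketch on hypothetical toy labels and predictions; the clipping constant `eps` is an addition for numerical safety and is not part of the equations.

```python
import numpy as np

def bce_sum(y_true, y_pred, eps=1e-12):
    """Summed binary cross-entropy between labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0); not in the paper
    terms = y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)
    return float(-np.sum(terms))

# Hypothetical labels and predictions for a handful of sampled pairs.
y_md, p_md = np.array([1, 0, 1]), np.array([0.9, 0.2, 0.7])
y_gd, p_gd = np.array([0, 1]), np.array([0.1, 0.8])

loss_md = bce_sum(y_md, p_md)      # Eq. (29)
loss_gd = bce_sum(y_gd, p_gd)      # Eq. (30)
total_loss = loss_md + loss_gd     # Eq. (31)
```

Because the two losses are simply added, gradients from both sub-networks reach the shared parameters, such as the disease projection matrix, in every update.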

RESULTS

In this section, we show the comparison results under different experimental conditions and different models on HMDD v2.0 dataset [41] to demonstrate the effectiveness of our proposed MTLMDA model.

Implementation settings

MTLMDA is implemented in PyTorch (v1.10.2) on top of the DGL (v0.6.1) platform [48]. During training, model parameters are randomly initialized and optimized with Adam. We adopted grid search to find the optimal hyperparameters of MTLMDA: the learning rate is set to 0.0001 and the weight decay to |$3\times 10^{-4}$|. To prevent overfitting, we add a dropout mechanism to the model [38]. We tested dropout rates from 0.1 to 0.9, and the model performs best with a dropout rate of 0.3. The entire model is trained for 800 epochs, and the test set results are output every 10 epochs. Table 2 lists the detailed hyperparameter settings used in our experiments. We apply 5-fold cross-validation to evaluate the performance of MTLMDA: the sample dataset is randomly divided into five subsets; in each round, one subset is used as the test set and the remaining four subsets are used as the training set. Repeating this process five times, once per subset, yields one set of evaluation results per fold.
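
The 5-fold splitting procedure described above can be sketched as follows. The generator name `k_fold_indices`, the seed and the sample count are hypothetical; in practice the indices would refer to the sampled positive and negative pairs.

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)    # random division of the samples
    folds = np.array_split(perm, k)      # k nearly equal subsets
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

splits = list(k_fold_indices(n_samples=23, k=5))
```

Each sample appears in exactly one test fold, so the five test sets together cover the whole dataset.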

Table 2

Setting of hyperparameters

Model	Hyper-parameter	HMDD v2.0	Searching space	Description
MTLMDA	Weight_decay	\|$3*10^{-4}$\|	\|$[10^{-3},10^{-4},3*10^{-4},10^{-5}]$\|	\|$L_{2}$\| regularization coefficient.
	Layers	3	\|$[1,2,3,4,5]$\|	Number of layers of chebGCN.
	Dropout	0.3	\|$[0.1-0.9]$\|	Dropout rate.
	\|$lr$\|	\|$10^{-4}$\|	\|$[10^{-2},10^{-3},5*10^{-4},10^{-4},10^{-5}]$\|	Learning rate.
	Projection dimension	1024	\|$[64,128,256,512,1024,2048]$\|	Node feature mapping dimension.
	Embedding dimension	64	\|$[16,32,64,128,256,512]$\|	Node feature embedding dimension.

Evaluation metrics

To comprehensively evaluate the performance of our proposed MTLMDA, we choose Precision (Prec.), Accuracy (Acc.), Recall, F1 score, AUC and precision–recall (P–R) curve as the evaluation criteria. The corresponding mathematical calculation is represented as follows:

$$ \begin{align} & Precision= \frac{TP} {TP+FP}\end{align} $$

(32)

$$ \begin{align}& Accuracy= \frac{TP+TN} {TP+TN+FP+FN}\end{align} $$

(33)

$$ \begin{align}& Recall= \frac{TP} {TP+FN} \end{align} $$

(34)

$$ \begin{align}& F1\text{-}score= \frac{2TP} {2TP+FN+FP} \end{align} $$

(35)

where |$TP$|, |$FP$|, |$TN$| and |$FN$| denote true positive, false positive, true negative and false negative, respectively. AUC refers to the area under the receiver operating characteristic (ROC) curve and quantitatively summarizes the model performance shown by that curve. The abscissa of the |$ROC$| curve represents |$FPR$| and the ordinate represents |$TPR$|, where |$TPR$| and |$FPR$| are calculated as follows:

$$ \begin{align}& TPR= \frac{TP} {TP+FN} \end{align} $$

(36)

$$ \begin{align}& FPR= \frac{FP} {FP+TN} \end{align} $$

(37)

The abscissa of the P–R curve is the recall of the model and the ordinate is the precision. A larger area under the P–R curve indicates better model performance. Table 3 reports the values of the evaluation metrics of our model under 5-fold cross-validation.
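
As a quick illustration of Equations (32)–(37), the following minimal sketch computes the metrics from confusion-matrix counts. The function name `confusion_metrics` and the example counts are hypothetical and are not taken from the experiments.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return the metrics of Equations (32)-(37) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)               # identical to TPR, Equation (36)
    f1 = 2 * tp / (2 * tp + fn + fp)
    fpr = fp / (fp + tn)
    return {
        "precision": precision,
        "accuracy": accuracy,
        "recall": recall,
        "f1": f1,
        "tpr": recall,
        "fpr": fpr,
    }


# Hypothetical counts for a balanced test fold (not from the paper).
metrics = confusion_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that the F1 score of Equation (35) is algebraically the harmonic mean of Precision and Recall, so it can also be obtained as 2PR/(P+R).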

Table 3

Results of 5-fold cross-validation for MTLMDA

Test set	Precision	Accuracy	Recall	F1-score
1	0.8730	0.8656	0.8831	0.8670
2	0.8762	0.8743	0.8745	0.8733
3	0.8520	0.8780	0.8537	0.8751
4	0.8903	0.8660	0.8756	0.8697
5	0.8899	0.8600	0.8630	0.8598
Mean	87.63% \|$\,\pm\, $\|0.0046	86.88% \|$\,\pm\, $\|0.0065	87.74% \|$\,\pm\, $\|0.0104	86.93% \|$\,\pm\, $\|0.0054

MTLMDA achieves an average Accuracy of 86.88%, Precision of 87.63%, Recall of 87.74% and F1 score of 86.93%. Moreover, Figure 4 shows that the AUC values of MTLMDA’s ROC curves under 5-fold cross-validation are 94.03%, 94.67%, 93.75%, 94.62% and 93.79%, with an average of 94.17% |$\,\pm\, $|0.0040. Likewise, Figure 5 shows that the areas under MTLMDA’s P–R curves under 5-fold cross-validation are 93.27%, 93.53%, 94.55%, 94.10% and 93.31%, with an average of 93.75% |$\,\pm\, $|0.0050. To further examine the value of incorporating gene–disease information in MTLMDA, we conducted an experiment in which the miRNA–disease network remained unchanged while the gene–disease network was randomly shuffled, disrupting the original associations between genes and diseases. The validation results are shown in Table 4. Every metric decreases after shuffling (for example, the average AUC falls from 94.17% to 93.05%), yet performance remains high. Interestingly, despite the randomization of the gene–disease network, the two networks may still exhibit similar structures within the vector space of disease nodes during the initial, non-task-specific stage of multi-task learning [22]. As a result, the perturbed gene–disease network can still offer useful auxiliary information to the miRNA–disease network.
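
The summary statistics quoted above can be recomputed from the per-fold values. The sketch below assumes that the figure after ± is the population standard deviation expressed as a fraction; this is an inference from the numbers, not something the text states, and the helper name `summarize` is hypothetical.

```python
from statistics import mean, pstdev

# Per-fold areas under the ROC and P-R curves (in %), as reported in the text.
roc_auc = [94.03, 94.67, 93.75, 94.62, 93.79]
pr_auc = [93.27, 93.53, 94.55, 94.10, 93.31]


def summarize(values_percent):
    """Return (mean in %, population standard deviation as a fraction)."""
    return mean(values_percent), pstdev(values_percent) / 100


for label, values in (("ROC", roc_auc), ("P-R", pr_auc)):
    avg, spread = summarize(values)
    print(f"{label}: {avg:.2f}% +/- {spread:.4f}")
```

Under this assumption the output agrees with the reported 94.17% ± 0.0040 and 93.75% ± 0.0050; the sample standard deviation (`statistics.stdev`) would give slightly larger spreads.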

Figure 4

ROC curves of MTLMDA in 5-fold cross validation.

Figure 5

P–R curves of MTLMDA in 5-fold cross validation.

Table 4

Results of 5-fold cross-validation with randomly shuffled gene–disease associations

Test set	1	2	3	4	5	Mean
Precision	0.8561	0.8552	0.8496	0.8369	0.8427	84.81%\|$\,\pm\, $\|0.0074
Accuracy	0.8487	0.8565	0.8620	0.8638	0.8500	85.62%\|$\,\pm\, $\|0.0061
Recall	0.8249	0.8377	0.8528	0.8765	0.8379	84.60%\|$\,\pm\, $\|0.0176
F1-score	0.8350	0.8464	0.8512	0.8562	0.8403	84.58%\|$\,\pm\, $\|0.0076
AUC	0.9275	0.9362	0.9265	0.9340	0.9284	93.05%\|$\,\pm\, $\|0.0039
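
The shuffling control reported in Table 4 can be sketched as follows. The exact shuffling procedure is not specified in the text, so this example assumes the simplest scheme: permuting all entries of the binary gene–disease matrix, which preserves the number of associations but randomizes which gene is linked to which disease. The function name and the toy matrix are hypothetical.

```python
import random


def shuffle_associations(gd, seed=0):
    """Randomly permute all entries of a binary gene-disease matrix.

    The total number of associations is preserved, but their positions are
    randomised. This scheme is an assumption, not the authors' stated method.
    """
    n_rows, n_cols = len(gd), len(gd[0])
    flat = [value for row in gd for value in row]   # copy, so gd is untouched
    random.Random(seed).shuffle(flat)
    return [flat[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


# Toy 4-gene x 3-disease matrix (hypothetical, not the real GD matrix).
GD = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
GD_shuffled = shuffle_associations(GD, seed=42)
print(GD_shuffled)
```

In such a control, only the gene–disease input would be replaced by the shuffled matrix, while the miRNA–disease network and the cross-validation folds stay the same.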

Comparison with other latest methods

Because the constructed training samples form a data set with balanced positive and negative samples, the ROC curve gives an intuitive view of model performance. Here, we use the AUC values based on the ROC curve to compare MTLMDA with other state-of-the-art models under 5-fold cross-validation. We selected recent and representative models in this field: “Predicting microRNA–disease associations using label propagation based on linear neighborhood similarity” (LPLNS) [27], “Three-layer heterogeneous network combined with unbalanced random walk for miRNA–disease association prediction” (TCRWMDA) [31], “A graph auto-encoder model for miRNA–disease associations prediction” (GAEMDA) [38], “Multi-view multichannel attention graph convolutional network for miRNA–disease association prediction” (MMGCN) [49] and “Hierarchical graph attention network for miRNA–disease association prediction” (HGANMDA) [46]. For a fair comparison, all five baseline algorithms are evaluated with 5-fold cross-validation on the HMDD v2.0 dataset. Figure 6 compares the ROC curve of MTLMDA with those of the five baselines. In addition, Figure 7 compares our method with the other approaches using an alternative version of 5-fold cross-validation. Unlike standard 5-fold cross-validation, where the dataset is randomly partitioned, we used the average Gaussian similarity of each miRNA node with respect to the disease to divide the dataset into five parts. Table 5 shows the AUC values of each model under both versions of 5-fold cross-validation on HMDD v2.0. To obtain a more comprehensive evaluation, we further train on HMDD v2.0 and test on HMDD v3.2 [50], as shown in Figure 8. From the results, we observe that MTLMDA performs better than the baseline methods.
Compared with the other models, MTLMDA explicitly accounts for the sparsity of the known miRNA–disease relationships in the database and uses multi-task learning to exploit these sparse relationships. Moreover, MTLMDA uses information from the gene–disease network to assist miRNA–disease prediction, which improves the overall performance of the model. As a result, MTLMDA attains the highest AUC in both settings of Table 5 (0.9417 and 0.9404).
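
One possible reading of the alternative 5-fold split is sketched below: miRNAs are ranked by their average Gaussian similarity and cut into five contiguous groups, each serving once as the test part. How the similarity is averaged and how the groups are formed are assumptions here, since the text does not give these details; `similarity_folds` and the example values are hypothetical.

```python
def similarity_folds(avg_similarity, n_folds=5):
    """Partition miRNA indices into n_folds groups ordered by average similarity."""
    order = sorted(range(len(avg_similarity)), key=lambda i: avg_similarity[i])
    base, extra = divmod(len(order), n_folds)
    folds, start = [], 0
    for k in range(n_folds):
        size = base + (1 if k < extra else 0)   # spread the remainder evenly
        folds.append(order[start:start + size])
        start += size
    return folds


# Hypothetical average Gaussian similarities for 12 miRNAs (not real data).
avg_similarity = [0.31, 0.12, 0.55, 0.47, 0.08, 0.93,
                  0.66, 0.24, 0.71, 0.39, 0.18, 0.82]
folds = similarity_folds(avg_similarity)

for k, test_mirnas in enumerate(folds, start=1):
    # Samples whose miRNA index is in test_mirnas would form the test part of fold k.
    print(f"fold {k}: test miRNAs {test_mirnas}")
```

Compared with a random partition, a split of this kind makes each test part more homogeneous in similarity, so it probes how well a model generalizes to miRNAs that differ from those seen in training.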

Figure 6

Comparison of ROC curves in 5-fold cross validation based on HMDD v2.0.

Figure 7

Comparison of ROC curves based on the alternative version of the 5-fold cross-validation.

Figure 8

Comparison of ROC curves based on the HMDD v3.2.

Table 5

5-fold cross-validation results comparison

Models	AUC	AUC (alternative version)
LPLNS	0.9107\|$\,\pm\, $\|0.0041	0.8524\|$\,\pm\, $\|0.0018
TCRWMDA	0.9209\|$\,\pm\, $\|0.0036	0.9157\|$\,\pm\, $\|0.0033
GAEMDA	0.9356\|$\,\pm\, $\|0.0044	0.9319\|$\,\pm\, $\|0.0049
MMGCN	0.9266\|$\,\pm\, $\|0.0022	0.9191\|$\,\pm\, $\|0.0015
HGANMDA	0.9374\|$\,\pm\, $\|0.0041	0.9336\|$\,\pm\, $\|0.0038
MTLMDA	0.9417\|$\,\pm\, $\|0.0040	0.9404\|$\,\pm\, $\|0.0039

Performance analysis of the model under different feature information

To further demonstrate the effectiveness of our proposed model, we conduct ablation experiments. The model is evaluated with four types of node features: Gaussian features together with auxiliary features (AUX+GUS), Gaussian features only (GUS), auxiliary features only (AUX) and original edge features only (Edge). Table 6 reports the model performance under the different node features, and Figure 9 visualizes the results.

Table 6

Performance comparison of models under different node information

Node feature	AUC (%)	Precision (%)	Accuracy (%)	Recall (%)	F1-score (%)
Edge	91.22	83.37	83.58	83.90	83.63
AUX	90.73	85.37	84.38	81.43	83.91
GUS	92.15	84.20	85.32	80.74	83.62
AUX+GUS	94.17	86.13	86.88	87.74	86.93

Figure 9

Comparison of the results under different node features. AUX+GUS, GUS, AUX and Edge denote the performance of the model with Gaussian similarity and auxiliary information, Gaussian similarity only, auxiliary information only, and edge information only, respectively.

Performance analysis of the model under different GCN layers

In the encoder of MTLMDA, we aggregate the information of the network through the GCN and then generate the representations of the network nodes. The number of GCN layers in the encoder determines how node information is aggregated across the network, which in turn affects the final prediction performance of the model. Figure 10A shows the performance of our model for different numbers of GCN layers in the encoder. MTLMDA performs best when the encoder has 3 GCN layers; when the number of layers is |$>3$|, the performance drops rapidly. The |$0$|-th layer embedding of a node in the encoder is its input feature, and the layer-|$k$| embedding gathers information from nodes that are |$k$| hops away on the heterogeneous graph. If the encoder contains an excessive number of GCN layers, every node in the graph receives highly overlapping node information, and the model suffers from the over-smoothing problem. This is consistent with the degradation observed in Figure 10A once the number of GCN layers exceeds 3.
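
The over-smoothing effect can be illustrated with a toy example. The sketch below uses plain mean aggregation over neighbors with no learned weights or nonlinearity, which is a simplification and not the GCN used in MTLMDA; the graph, the features and the function names are hypothetical.

```python
def mean_aggregate(features, neighbors):
    """One propagation step: each node averages itself and its neighbours."""
    return [
        (features[i] + sum(features[j] for j in neighbors[i])) / (1 + len(neighbors[i]))
        for i in range(len(features))
    ]


def spread_after(features, neighbors, n_layers):
    """Return max - min of the node features after n_layers propagation steps."""
    for _ in range(n_layers):
        features = mean_aggregate(features, neighbors)
    return max(features) - min(features)


# Toy path graph 0-1-2-3-4-5 with one scalar feature per node (hypothetical).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
features = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]

for k in (1, 3, 5, 10, 20):
    print(k, round(spread_after(features, neighbors, k), 4))
```

The spread between node features shrinks as more propagation steps are stacked, so nodes become harder to tell apart. A real GCN layer adds trainable weights and activations, but the same averaging mechanism underlies the loss of discriminative information with depth.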

Figure 10

The value of MTLMDA under different experimental parameters.

Performance analysis of the model under different embedding dimension

The dimension of the node embeddings produced by the MTLMDA encoder is an important factor affecting the performance of the model, because embeddings of different sizes retain different amounts of information about the same node. In the experiment, we set the node embedding dimension to 16, 32, 64, 128, 256 and 512. As shown in Figure 10B, the overall performance of the model improves as the embedding dimension increases up to 64, where the performance is best. Thus, we choose 64 as the default embedding dimension of the MTLMDA encoder.

Performance analysis of the model under different projection dimension

The projection dimension of the network nodes is the most important factor in determining the initial features of the nodes in the model encoder. We explore the performance of the model under different projection dimensions based on a 3-layer GCN encoder. As shown in Figure 10C, we set the projection dimension to 64, 128, 256, 512, 1024 and 2048. The experimental results show that the overall performance of the model is optimal when the projection dimension is 1024. Therefore, in subsequent experiments, we choose 1024 as the default projection dimension.

Comprehensive comparison of different models

To further demonstrate the comprehensive performance of MTLMDA, we regard the model as a recommender system in which miRNAs represent users and diseases represent items. The task is to determine whether there is an interaction between a user and each item. We therefore use all known associations in HMDD v2.0 as training samples and identify the top-ranked miRNAs for each disease. Here, we focus on the number of positive samples recovered at different top-k cutoffs. Figure 11 shows the numbers of correctly retrieved miRNA–disease associations. We observe that MTLMDA outperforms the other models from top 5 to top 50.
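
The top-k count described above can be sketched as follows. The data layout (dictionaries keyed by disease), the function name `hits_at_k` and the toy scores are assumptions made for illustration only.

```python
def hits_at_k(scores, known, k):
    """Count known associations among the k highest-scoring miRNAs of each disease.

    scores: {disease: {mirna: predicted score}}
    known:  {disease: set of miRNAs with a known association}
    """
    total = 0
    for disease, mirna_scores in scores.items():
        ranked = sorted(mirna_scores, key=mirna_scores.get, reverse=True)
        total += sum(1 for m in ranked[:k] if m in known.get(disease, set()))
    return total


# Toy predictions for two diseases and four miRNAs (hypothetical values).
scores = {
    "d1": {"m1": 0.9, "m2": 0.8, "m3": 0.2, "m4": 0.1},
    "d2": {"m1": 0.3, "m2": 0.7, "m3": 0.6, "m4": 0.4},
}
known = {"d1": {"m1", "m3"}, "d2": {"m2", "m4"}}

for k in (1, 2, 3):
    print(f"top-{k}: {hits_at_k(scores, known, k)} known associations retrieved")
```

Plotting this count for several cutoffs and several models gives a comparison of the kind shown in Figure 11.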

Figure 11

Number of correctly retrieved known miRNA–disease associations of top-k.

Real case studies for the proposed MTLMDA

To further test the ability of MTLMDA to predict potential miRNA–disease associations in practice, we use the model to conduct case studies on six common malignant human diseases. Approximately 200 miRNAs have been found to be significantly dysregulated in various malignancies. These miRNAs can affect cancer development by targeting proto-oncogenes or tumor suppressor genes [51]. Accurate prediction of potential miRNA–disease associations is therefore valuable for human medicine and health. During prediction, the training set of the miRNA–disease sub-network includes 5430 experimentally confirmed miRNA–disease combinations as positive samples and the same number of negative samples randomly selected from the miRNA–disease combinations. We then build the corresponding gene–disease sub-network training set according to the diseases in the miRNA–disease sub-network. The test set is formed by pairing the disease under study with the remaining miRNAs in the miRNA–disease sub-network. Training MTLMDA yields embedded representations of diseases and miRNAs, and decoding the test set yields the association probability between the disease and each remaining miRNA. For each disease, we select the 30 miRNAs with the highest predicted association probabilities. We verify the predictions against three databases in turn (i.e. dbDEMC [52], miR2Disease [53] and miRCancer [54]). If a prediction is confirmed in dbDEMC, we do not query miR2Disease or miRCancer; otherwise, we query them sequentially.
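
The ranking and sequential database lookup described above can be sketched as follows. The database contents, miRNA names and scores are placeholders rather than real records, and both function names are hypothetical.

```python
def first_evidence(mirna, databases):
    """Return the name of the first database that lists the miRNA, else "NO"."""
    for name, entries in databases:
        if mirna in entries:
            return name          # stop at the first database that confirms it
    return "NO"


def validate_top_candidates(scores, databases, top_n=30):
    """Rank candidates by predicted score and look up evidence for the top_n."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_n]
    return [(rank, m, first_evidence(m, databases))
            for rank, m in enumerate(ranked, start=1)]


# Toy inputs (hypothetical scores and database contents, not real records).
databases = [
    ("dbDEMC", {"mir-a", "mir-b"}),
    ("miR2Disease", {"mir-b", "mir-c"}),
    ("miRCancer", {"mir-d"}),
]
scores = {"mir-a": 0.97, "mir-c": 0.91, "mir-e": 0.88, "mir-d": 0.75, "mir-b": 0.40}

for row in validate_top_candidates(scores, databases, top_n=4):
    print(row)
```

Because the lookup stops at the first hit, the Evidence column of a table built this way names only one database per miRNA, even when several databases contain the association.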

Lung cancer is the most common cause of death among all cancers. Most patients are not diagnosed until an advanced stage, and the prognosis is generally poor [55]. Loss or amplification of some miRNAs has been found to be associated with lung cancer. It is therefore worthwhile to design a case study that explores potential lung cancer-associated miRNAs. From Table 7, we observe that 29 of the top 30 candidate miRNAs are confirmed by the three databases. The exception is hsa-mir-378a, for which no evidence of an association with lung cancer was found; this may mean that the connection has not yet been discovered rather than that no link exists.

Table 7

Top 30 lung cancer-related miRNAs predicted

Rank	miRNA	Evidence	Rank	miRNA	Evidence
1	hsa-mir-16	dbDEMC	16	hsa-mir-378a	NO
2	hsa-mir-122	dbDEMC	17	hsa-mir-99a	dbDEMC
3	hsa-mir-15a	dbDEMC	18	hsa-mir-302a	dbDEMC
4	hsa-mir-106b	dbDEMC	19	hsa-mir-328	dbDEMC
5	hsa-mir-15b	dbDEMC	20	hsa-mir-196b	dbDEMC
6	hsa-mir-195	dbDEMC	21	hsa-mir-372	dbDEMC
7	hsa-mir-141	dbDEMC	22	hsa-mir-483	dbDEMC
8	hsa-mir-451a	dbDEMC	23	hsa-mir-10a	dbDEMC
9	hsa-mir-23b	dbDEMC	24	hsa-mir-208a	dbDEMC
10	hsa-mir-342	dbDEMC	25	hsa-mir-424	miRCancer
11	hsa-mir-429	dbDEMC	26	hsa-mir-302b	dbDEMC
12	hsa-mir-373	dbDEMC	27	hsa-mir-204	dbDEMC
13	hsa-mir-20b	dbDEMC	28	hsa-mir-144	dbDEMC
14	hsa-mir-130a	dbDEMC	29	hsa-mir-28	dbDEMC
15	hsa-mir-193b	dbDEMC	30	hsa-mir-149	dbDEMC

Colon cancer is a type of cancer that begins in the large intestine (colon), the final part of the digestive tract. It can occur at any age but is more likely to affect older adults. It usually starts as small noncancerous (benign) clumps of cells called polyps that form inside the colon, some of which can turn into colon cancer over time. An estimated 106 180 colon cancer cases will be diagnosed in the USA in 2022 [56]. As shown in Table 8, all of the top 30 colon cancer-related miRNAs predicted by our model are confirmed by the three databases.

Table 8

Top 30 colon cancer-related miRNAs predicted

Rank	miRNA	Evidence	Rank	miRNA	Evidence
1	hsa-mir-21	dbDEMC	16	hsa-mir-29c	dbDEMC
2	hsa-mir-155	dbDEMC	17	hsa-mir-15b	miR2Disease
3	hsa-mir-34a	dbDEMC	18	hsa-mir-223	dbDEMC
4	hsa-mir-146a	dbDEMC	19	hsa-mir-199a	miRCancer
5	hsa-mir-125b	dbDEMC	20	hsa-mir-19b	dbDEMC
6	hsa-mir-122	dbDEMC	21	hsa-let-7a	dbDEMC
7	hsa-mir-16	dbDEMC	22	hsa-mir-143	dbDEMC
8	hsa-mir-221	dbDEMC	23	hsa-mir-92a	dbDEMC
9	hsa-mir-29a	dbDEMC	24	hsa-mir-31	dbDEMC
10	hsa-mir-222	dbDEMC	25	hsa-mir-210	dbDEMC
11	hsa-mir-133a	dbDEMC	26	hsa-mir-200b	dbDEMC
12	hsa-mir-29b	dbDEMC	27	hsa-mir-206	dbDEMC
13	hsa-mir-1	dbDEMC	28	hsa-mir-19a	dbDEMC
14	hsa-mir-20a	dbDEMC	29	hsa-mir-18a	dbDEMC
15	hsa-mir-15a	dbDEMC	30	hsa-let-7c	dbDEMC

Lymphomas start in immune system cells and can occur almost anywhere in the body. In 2022, there will be an estimated 89 010 new cases of lymphoma in the USA and 21 170 people will die from the disease [56]. Our prediction results for lymphoma-associated miRNAs are shown in Table 9. Among the top 30 candidate miRNAs, only hsa-mir-142 and hsa-mir-34c lack evidence of an association with lymphoma in the three databases.

Table 9

Top 30 lymphoma-related miRNAs predicted

Rank	miRNA	Evidence	Rank	miRNA	Evidence
1	hsa-mir-125b	dbDEMC	16	hsa-mir-196a	dbDEMC
2	hsa-mir-29a	dbDEMC	17	hsa-mir-214	dbDEMC
3	hsa-mir-34a	dbDEMC	18	hsa-mir-195	dbDEMC
4	hsa-mir-221	dbDEMC	19	hsa-mir-30a	dbDEMC
5	hsa-mir-222	dbDEMC	20	hsa-mir-9	dbDEMC
6	hsa-mir-29b	dbDEMC	21	hsa-mir-143	dbDEMC
7	hsa-mir-133a	dbDEMC	22	hsa-mir-181b	dbDEMC
8	hsa-mir-199a	dbDEMC	23	hsa-let-7c	dbDEMC
9	hsa-mir-1	dbDEMC	24	hsa-let-7a	dbDEMC
10	hsa-mir-223	dbDEMC	25	hsa-mir-15b	dbDEMC
11	hsa-mir-145	dbDEMC	26	hsa-mir-23a	dbDEMC
12	hsa-mir-106b	dbDEMC	27	hsa-mir-146b	dbDEMC
13	hsa-mir-142	NO	28	hsa-let-7b	dbDEMC
14	hsa-mir-206	dbDEMC	29	hsa-mir-34c	NO
15	hsa-mir-31	dbDEMC	30	hsa-mir-106a	dbDEMC

Breast cancer is the most common cancer worldwide and the leading cause of cancer-related deaths in women, accounting for 25% of all cancer cases and 15% of cancer-related deaths [57]. Table 10 lists the top 30 breast cancer-related miRNAs predicted by our model. Of these, 25 candidate miRNAs are confirmed to be related to breast cancer; for the remaining five (hsa-mir-509, hsa-mir-362, hsa-mir-485, hsa-mir-491 and hsa-mir-378a), no evidence was found in the three databases.

Table 10

Top 30 breast cancer-related miRNAs predicted

Rank	miRNA	Evidence	Rank	miRNA	Evidence
1	hsa-mir-150	dbDEMC	16	hsa-mir-192	dbDEMC
2	hsa-mir-15b	dbDEMC	17	hsa-mir-491	NO
3	hsa-mir-212	dbDEMC	18	hsa-mir-95	dbDEMC
4	hsa-mir-509	NO	19	hsa-mir-154	dbDEMC
5	hsa-mir-503	dbDEMC	20	hsa-mir-483	dbDEMC
6	hsa-mir-142	miR2Disease	21	hsa-mir-184	dbDEMC
7	hsa-mir-106b	dbDEMC	22	hsa-mir-720	dbDEMC
8	hsa-mir-372	dbDEMC	23	hsa-mir-431	dbDEMC
9	hsa-mir-362	NO	24	hsa-mir-363	dbDEMC
10	hsa-mir-32	dbDEMC	25	hsa-mir-28	miRCancer
11	hsa-mir-208a	dbDEMC	26	hsa-mir-193a	dbDEMC
12	hsa-mir-485	NO	27	hsa-mir-378a	NO
13	hsa-mir-30e	dbDEMC	28	hsa-mir-198	dbDEMC
14	hsa-mir-190a	dbDEMC	29	hsa-mir-424	dbDEMC
15	hsa-mir-98	dbDEMC	30	hsa-mir-186	dbDEMC

Kidney cancer is one of the 10 most common cancers in the Western world. Globally, approximately 270 000 kidney cancer cases are diagnosed each year, and 116 000 people die from the disease [58]. Table 11 shows that all of the top 30 candidate miRNAs predicted by MTLMDA are confirmed by the three databases.

Table 11

Top 30 kidney cancer-related miRNAs predicted

Rank	miRNA	Evidence	Rank	miRNA	Evidence
1	hsa-mir-155	dbDEMC	16	hsa-mir-223	dbDEMC
2	hsa-mir-146a	dbDEMC	17	hsa-mir-126	dbDEMC
3	hsa-mir-29a	dbDEMC	18	hsa-mir-199a	dbDEMC
4	hsa-mir-34a	dbDEMC	19	hsa-mir-18a	dbDEMC
5	hsa-mir-125b	dbDEMC	20	hsa-mir-143	dbDEMC
6	hsa-mir-122	dbDEMC	21	hsa-mir-19b	dbDEMC
7	hsa-mir-221	dbDEMC	22	hsa-mir-150	dbDEMC
8	hsa-mir-16	dbDEMC	23	hsa-mir-19a	dbDEMC
9	hsa-mir-133b	dbDEMC	24	hsa-mir-92a	dbDEMC
10	hsa-mir-20a	dbDEMC	25	hsa-mir-15b	dbDEMC
11	hsa-mir-17	dbDEMC	26	hsa-mir-31	dbDEMC
12	hsa-mir-29b	dbDEMC	27	hsa-mir-7a	dbDEMC
13	hsa-mir-222	miRCancer	28	hsa-mir-195	dbDEMC
14	hsa-mir-1	dbDEMC	29	hsa-mir-181a	dbDEMC
15	hsa-mir-145	dbDEMC	30	hsa-mir-200b	dbDEMC

Leukemia is a cancer of the bone marrow and blood. From 2009 to 2018, the incidence of leukemia in children and adolescents increased by about 1% per year [56]. We choose leukemia as our final case study. Table 12 lists the top 30 miRNAs that our model predicts to be associated with leukemia; these predictions are then confirmed against the three databases.

Table 12

Top 30 leukemia-related miRNAs predicted

Rank	miRNA	Evidence	Rank	miRNA	Evidence
1	hsa-mir-155	dbDEMC	16	hsa-mir-34c	dbDEMC
2	hsa-mir-146a	dbDEMC	17	hsa-mir-200b	dbDEMC
3	hsa-mir-29a	dbDEMC	18	hsa-mir-200a	dbDEMC
4	hsa-mir-221	dbDEMC	19	hsa-mir-206	dbDEMC
5	hsa-mir-133a	dbDEMC	20	hsa-let-7a	dbDEMC
6	hsa-mir-122	miRCancer	21	hsa-let-7d	dbDEMC
7	hsa-mir-222	dbDEMC	22	hsa-mir-182	dbDEMC
8	hsa-mir-29b	dbDEMC	23	hsa-mir-146b	dbDEMC
9	hsa-mir-145	dbDEMC	24	hsa-let-7e	dbDEMC
10	hsa-mir-1	dbDEMC	25	hsa-mir-195	dbDEMC
11	hsa-mir-106b	dbDEMC	26	hsa-mir-142	dbDEMC
12	hsa-mir-223	dbDEMC	27	hsa-mir-210	dbDEMC
13	hsa-mir-126	dbDEMC	28	hsa-mir-148a	dbDEMC
14	hsa-mir-15b	dbDEMC	29	hsa-mir-26a	dbDEMC
15	hsa-mir-29c	dbDEMC	30	hsa-mir-106a	dbDEMC

These six case studies further confirm the reliability of our model. In summary, the proposed model provides a reliable reference and guidance for research on miRNA–disease associations.
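The case-study procedure described above (rank the candidate miRNAs for a disease by predicted score, keep the top 30, and look each one up in external databases) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy scores, the placeholder name `hsa-mir-0000`, and the decision to exclude already-known associations from the ranking are all assumptions made for this example. The evidence sets are assumed to correspond to the databases cited in the tables (e.g. dbDEMC and miRCancer).

```python
from typing import Dict, List, Set, Tuple


def top_k_candidates(scores: Dict[str, float], known: Set[str], k: int = 30) -> List[str]:
    """Return the k highest-scoring miRNAs that are not already known for the disease."""
    # Assumption: associations already present in the training data are excluded.
    candidates = [(name, score) for name, score in scores.items() if name not in known]
    # Sort by descending score; break ties by name so the ranking is deterministic.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [name for name, _ in candidates[:k]]


def confirm(ranked: List[str], evidence: Dict[str, Set[str]]) -> List[Tuple[int, str, str]]:
    """Build (rank, miRNA, evidence) rows in the layout of the case-study tables."""
    rows = []
    for rank, mirna in enumerate(ranked, start=1):
        # Collect every database that lists this miRNA for the disease.
        sources = [db for db, members in evidence.items() if mirna in members]
        rows.append((rank, mirna, ", ".join(sources) if sources else "unconfirmed"))
    return rows


# Toy inputs: the scores and the name "hsa-mir-0000" are invented for illustration only.
toy_scores = {
    "hsa-mir-21": 0.99,
    "hsa-mir-155": 0.97,
    "hsa-mir-146a": 0.95,
    "hsa-mir-122": 0.90,
    "hsa-mir-0000": 0.10,
}
toy_known = {"hsa-mir-21"}
toy_evidence = {
    "dbDEMC": {"hsa-mir-155", "hsa-mir-146a"},
    "miRCancer": {"hsa-mir-122"},
    "miR2Disease": set(),
}

toy_ranked = top_k_candidates(toy_scores, toy_known, k=3)
for row in confirm(toy_ranked, toy_evidence):
    print(*row, sep="\t")
```

Keeping the ranking and the database lookup in separate functions means the same ranked list can be checked against a different set of evidence sources without re-running the model.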

CONCLUSION

Many malignant human diseases arise when the control that miRNAs exert over gene expression is disrupted, and abnormal miRNA expression is a key factor in human disease. Accurate prediction of the relationships between diseases and miRNAs can therefore benefit human health. In this paper, we propose a multi-task learning model (i.e. MTLMDA) to predict potential miRNA–disease associations. For the diseases in the miRNA–disease network, we construct the corresponding gene–disease sub-network to assist the miRNA–disease association prediction task. Compared with five recent benchmark models, MTLMDA obtains a superior AUC. Moreover, six case studies (lung cancer, colon cancer, lymphoma, breast cancer, kidney cancer and leukemia) confirm the accuracy and reliability of its predictions.
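For readers less familiar with the metric used in the comparison, AUC can be read as the probability that a randomly chosen true association receives a higher score than a randomly chosen non-association. The sketch below computes it directly from that definition on toy data. The function name and the example labels and scores are assumptions for illustration and are unrelated to the results reported in this paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example: labels mark known (1) and unknown (0) miRNA-disease pairs.
toy_labels = [1, 0, 1, 0]
toy_pred = [0.9, 0.8, 0.7, 0.1]
print(roc_auc(toy_labels, toy_pred))
```

The pairwise loop is quadratic in the number of samples, so it is only suitable for small illustrations; a sort-based implementation would be preferable on a full association matrix.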

Key Points

This work (called MTLMDA) is the first in the field to introduce multi-task learning through the use of miRNA–disease–gene relationships.
For the diseases in the miRNA–disease sub-network, we select the same diseases and their related genes from DisGeNET to build the gene–disease sub-network, which is used to assist miRNA–disease relationship prediction.
MTLMDA is an end-to-end trainable graph neural network model that uses a GCN-based autoencoder and decoder.
We compare our model with competitive baselines on a real-world dataset and conduct six case studies for both miRNAs and diseases, which confirm the effectiveness of our model.

FUNDING

National Natural Science Foundation of China (Grant Nos. 62202089, U22A2004, 72192832); Shanghai Rising-Star Program (Grant No. 23QA1403100); Natural Science Foundation of Shanghai (Grant No. 21ZR1421900); General Project of Liaoning Provincial Department of Education (Grant No. LJKZ0005); Doctor Startup Foundation of Liaoning Province (Grant No. 2021-BS-055); Fundamental Research Funds for the Central Universities (Grant No. N2119004).

DATA AVAILABILITY

The data and source code are available from https://github.com/qwslle/MTLMDA.

Author Biographies

Qiang He received the Ph.D. degree in computer application technology from Northeastern University, Shenyang, China, in 2020. He is currently an Associate Professor at the College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China.

Wei Qiao is currently pursuing a master’s degree in electronic information at Northeastern University, Shenyang, China. His main research interests are data mining and medical informatics.

Hui Fang is an Associate Professor at Shanghai University of Finance and Economics, China. She received her Ph.D. from Nanyang Technological University, Singapore. Her main research topic is personalized machine learning, including trust/link prediction in online communities and recommender systems. She has published papers in leading conferences (e.g. IJCAI, AAAI and SIGIR) and journals (e.g. AIJ, TKDE, TOIS and TPAMI). She is the SE of the ECRA journal and serves as a PC Board member of IJCAI and as a (Senior) PC Member for WWW, UMAP, IJCAI, AAAI and AAMAS, among others.

Yang Bao received his Ph.D. degree in Information Systems from the School of Computing at the National University of Singapore, and his bachelor’s and master’s degrees from Nanjing University. He is currently an Associate Professor at the Antai College of Economics and Management (ACEM), Shanghai Jiao Tong University (SJTU), Shanghai, China.

REFERENCES

1. Pasquinelli AE, Ruvkun G. Control of developmental timing by microRNAs and their targets. Annu Rev Cell Dev Biol 2002;18:495–513.
2. Lee RC, Feinbaum RL, Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 1993;75(5):843–54.
3. Ambros V. The functions of animal microRNAs. Nature 2004;431(7006):350–5.
4. Nahand JS, Shojaie L, Akhlagh SA, et al. Cell death pathways and viruses: role of microRNAs. Mol Ther-Nucleic Acids 2021;24:487–511.
5. Meltzer PS. Small RNAs with big impacts. Nature 2005;435(7043):745–6.
6. Chen X, Xie D, Zhao Q, You ZH. MicroRNAs and complex diseases: from experimental results to computational models. Brief Bioinform 2019;20(2):515–39.
7. Iorio MV, Ferracin M, Liu C, et al. MicroRNA gene expression deregulation in human breast cancer. Cancer Res 2005;65(16):7065–70.
8. Sayed D, Abdellatif M. MicroRNAs in development and disease. Physiol Rev 2011;91(3):827–87.
9. Fani M, Zandi M, Ebrahimi S, et al. The role of miRNAs in COVID-19 disease. Future Virology 2021;16(4):301–6.
10. Li C, Hu X, Li L, Li JH. Differential microRNA expression in the peripheral blood from human patients with COVID-19. J Clin Lab Anal 2020;34:e23590.
11. Freeman WM, Walker SJ, Vrana KE. Quantitative RT-PCR: pitfalls and potential. Biotechniques 1999;26(1):112–25.
12. Baskerville S, Bartel DP. Microarray profiling of microRNAs reveals frequent coexpression with neighboring miRNAs and host genes. RNA 2005;11(3):241–7.
13. Várallyay E, Burgyán J, Havelda Z. MicroRNA detection by northern blotting using locked nucleic acid probes. Nat Protoc 2008;3(2):190–6.
14. Lu M, Zhang Q, Deng M, et al. An analysis of human microRNA and disease associations. PLoS One 2008;3(10):e3420.
15. Gu C, Liao B, Li X, Li K. Network consistency projection for human miRNA-disease associations inference. Sci Rep 2016;6(1):1–10.
16. Xuan P, Han K, Guo Y, et al. Prediction of potential disease-associated microRNAs based on random walk. Bioinformatics 2015;31(11):1805–15.
17. Chen X, Wang L, Qu J, et al. Predicting miRNA–disease association based on inductive matrix completion. Bioinformatics 2018;34(24):4256–65.
18. Xu J, Li CX, Lv JY, et al. Prioritizing candidate disease miRNAs by topological features in the miRNA target–dysregulated network: case study of prostate cancer. Mol Cancer Ther 2011;10(10):1857–66.
19. Chen X, Wang CC, Yin J, You ZH. Novel human miRNA-disease association inference based on random forest. Mol Ther-Nucleic Acids 2018;13:568–79.
20. Li J, Zhang S, Liu T, et al. Neural inductive matrix completion with graph convolutional networks for miRNA-disease association prediction. Bioinformatics 2020;36(8):2538–46.
21. Yan C, Duan G, Li N, et al. PDMDA: predicting deep-level miRNA–disease associations with graph neural networks and sequence features. Bioinformatics 2022;38(8):2226–34.
22. Long M, Cao Z, Wang J, et al. Learning multiple tasks with multilinear relationship networks. Adv Neural Inform Process Syst 2017;30.
23. Zhang Y, Yang Q. A survey on multi-task learning. IEEE Trans Knowl Data Eng 2021;34(12):5586–609.
24. Chen X, Liu MX, Yan GY. RWRMDA: predicting novel human microRNA–disease associations. Mol Biosyst 2012;8(10):2792–8.
25. Chen X, Yan CC, Zhang X, et al. WBSMDA: within and between score for miRNA-disease association prediction. Sci Rep 2016;6(1):1–9.
26. Wang F, Zhang C. Label propagation through linear neighborhoods. IEEE Trans Knowl Data Eng 2007;20(1):55–67.
27. Li G, Luo J, Xiao Q, et al. Predicting microRNA-disease associations using label propagation based on linear neighborhood similarity. J Biomed Inform 2018;82:169–77.
28. Wang YT, Wu QW, Gao Z, et al. MiRNA-disease association prediction via hypergraph learning based on high-dimensionality features. BMC Med Inform Decis Mak 2021;21:1–13.
29. Ha J, Park S. NCMD: Node2vec-based neural collaborative filtering for predicting miRNA-disease association. IEEE/ACM Trans Comput Biol Bioinform 2022.
30. Alaimo S, Giugno R, Pulvirenti A. ncPred: ncRNA-disease association prediction through tripartite network-based inference. Front Bioeng Biotechnol 2014;2:71.
31. Yu L, Shen X, Zhong D, Yang J. Three-layer heterogeneous network combined with unbalanced random walk for miRNA-disease association prediction. Front Genet 2020;10:1316.
32. Chen X, Clarence Yan C, Zhang X, et al. RBMMMDA: predicting multiple types of disease-microRNA associations. Sci Rep 2015;5(1):1–13.
33. Yao D, Zhan X, Kwoh CK. An improved random forest-based computational model for predicting novel miRNA-disease associations. BMC Bioinformatics 2019;20(1):1–14.
34. Zheng K, You ZH, Wang L, et al. MLMDA: a machine learning approach to predict and validate microRNA–disease associations by integrating of heterogenous information sources. J Transl Med 2019;17(1):1–14.
35. Ji C, Gao Z, Ma X, et al. AEMDA: inferring miRNA–disease associations based on deep autoencoder. Bioinformatics 2021;37(1):66–72.
36. Liu D, Huang Y, Nie W, et al. SMALF: miRNA-disease associations prediction based on stacked autoencoder and XGBoost. BMC Bioinformatics 2021;22(1):1–18.
37. Hamilton W, Ying Z, Leskovec J. Inductive representation learning on large graphs. Adv Neural Inform Process Syst 2017;30.
38. Li Z, Li J, Nie R, et al. A graph auto-encoder model for miRNA-disease associations prediction. Brief Bioinform 2021;22(4).
39. Wang J, Li J, Yue K, et al. NMCMDA: neural multicategory miRNA–disease association prediction. Brief Bioinform 2021;22(5):bbab074.
40. Lou Z, Cheng Z, Li H, et al. Predicting miRNA–disease associations via learning multimodal networks and fusing mixed neighborhood information. Brief Bioinform 2022;23(5).
41. Li Y, Qiu C, Tu J, et al. HMDD v2.0: a database for experimentally supported human microRNA and disease associations. Nucleic Acids Res 2014;42(D1):D1070–4.
42. Piñero J, Queralt-Rosinach N, Bravo A, et al. DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes. Database 2015;2015.
43. Wang D, Wang J, Lu M, et al. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics 2010;26(13):1644–50.
44. Van Laarhoven T, Nabuurs SB, Marchiori E. Gaussian interaction profile kernels for predicting drug–target interaction. Bioinformatics 2011;27(21):3036–43.
45. Goh KI, Cusick ME, Valle D, et al. The human disease network. Proc Natl Acad Sci 2007;104(21):8685–90.
46. Li Z, Zhong T, Huang D, et al. Hierarchical graph attention network for miRNA-disease association prediction. Mol Ther 2022;30(4):1775–86.
47. Defferrard M, Bresson X, Vandergheynst P. Convolutional neural networks on graphs with fast localized spectral filtering. Adv Neural Inform Process Syst 2016;29.
48. Wang MY. Deep graph library: Towards efficient and scalable deep learning on graphs. In: ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.
49. Tang X, Luo J, Shen C, Lai Z. Multi-view multichannel attention graph convolutional network for miRNA–disease association prediction. Brief Bioinform 2021;22(6):bbab174.
50. Huang Z, Shi J, Gao Y, et al. HMDD v3.0: a database for experimentally supported human microRNA–disease associations. Nucleic Acids Res 2019;47(D1):D1013–7.
51. Bandyopadhyay S, Mitra R, Maulik U, Zhang MQ. Development of the human cancer microRNA network. Silence 2010;1(1):6–14.
52. Yang Z, Ren F, Liu C, et al. dbDEMC: a database of differentially expressed miRNAs in human cancers. In: BMC Genomics, Vol. 11. Springer, 2010, 1–8.
53. Jiang Q, Wang Y, Hao Y, et al. miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res 2009;37(suppl_1):D98–104.
54. Xie B, Ding Q, Han H, Wu D. miRCancer: a microRNA–cancer association database constructed by text mining on literature. Bioinformatics 2013;29(5):638–44.
55. Hirsch FR, Jänne PA, Eberhardt WE, et al. Epidermal growth factor receptor inhibition in lung cancer: status 2012. J Thorac Oncol 2013;8(3):373–84.
56. Cokkinides V, Albano J, Samuels A, et al. American Cancer Society: Cancer Facts and Figures. Atlanta: American Cancer Society, 2005.
57. Ward Elizabeth M, et al. Global cancer in women: burden and trend. Cancer Epidemiol Biomarkers Prevention: Publ Am Assoc Cancer Res 2017;26(4):444–57.