NMCMDA: neural multicategory MiRNA–disease association prediction

Table 1

Statistics of the datasets used in this study

Data	#d	#m	#c	#m–d	Sr
MCD-6	894	1208	6	25849	0.399%
MCD-20	894	1208	20	25849	0.120%
MCD-4	171	385	4	2003	0.761%
TDRC v2.0	169	324	4	1675	0.681%
TDRC v3.2	447	713	5	16341	1.025%

#d, disease number; #m, miRNA number; #c, category number; #m–d, association number; Sr, sparsity rate.
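The sparsity rates in Table 1 can be checked with a short calculation. The sketch below assumes that Sr is the fraction of nonzero entries in the #m × #d × #c association tensor; this definition is an inference, not stated in the text, although it reproduces the MCD-6 and MCD-20 rows. The function name `sparsity_rate` is hypothetical.

```python
def sparsity_rate(n_diseases: int, n_mirnas: int, n_categories: int, n_assoc: int) -> float:
    """Fraction of nonzero entries in the association tensor (assumed definition of Sr)."""
    return n_assoc / (n_diseases * n_mirnas * n_categories)

# Values taken from the MCD-6 and MCD-20 rows of Table 1.
sr_mcd6 = sparsity_rate(894, 1208, 6, 25849)
sr_mcd20 = sparsity_rate(894, 1208, 20, 25849)
print(f"MCD-6: {sr_mcd6:.3%}, MCD-20: {sr_mcd20:.3%}")
```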

We observe that all miRNA–disease associations in HMDD can be divided into different categories. Supplementary Table 1 gives the statistics of the miRNA–disease associations in each category for MCD-6, MCD-4 and TDRC v3.2. On the other hand, some miRNA–disease associations occur simultaneously in several categories. Supplementary Table 2 gives the statistics of the miRNA–disease associations that appear in more than one category. Although many miRNA–disease associations belong to only one category, some belong to two or more, which makes accurate prediction more challenging.

MiRNA–miRNA similarity

In this study, miRNA similarity is measured using the miRNA functional similarity score and the Gaussian interaction profile kernel similarity score. The miRNA functional similarity between miRNAs |$i$| and |$j$| is defined as follows:

$$\begin{equation} \mathrm{MS}\left(i,j\right)=\begin{cases}\mathrm{FS}\left(i,j\right), & \text{if}\ (i,j)\ \text{has an entry in the MISIM database}\\ \mathrm{mGS}\left(i,j\right), & \text{otherwise}\end{cases} \end{equation}$$

(1)

where |$\mathrm{FS}(i,j)$| is the functional similarity score downloaded from the MISIM 2.0 database (http://www.lirmed.com/misim/) and |$\mathrm{mGS}(i,j)$| is the Gaussian interaction profile kernel similarity score [6, 13, 36, 37], which is used to fill in the entries missing from MISIM. Specifically, |$\mathrm{mGS}(i,j)$| is calculated as follows:

$$\begin{equation} \mathrm{mGS}\left(i,j\right)=\exp \left(-{\theta}_{\mathrm{m}}{\left\Vert \mathbf{T}\left[i,:\right]-\mathbf{T}\left[j,:\right]\right\Vert}^2\right) \end{equation}$$

(2)

where |$\mathbf{T}[i,:]$| is the |$i$|-th row of the miRNA–disease adjacency matrix |$\mathbf{T}$| and |${\theta}_{\mathrm{m}}$| is the kernel bandwidth parameter, which is calculated by the following formula:

$$\begin{equation} {\theta}_{\mathrm{m}}=\frac{1}{m}{\sum}_{i=1}^m{\left\Vert \mathbf{T}\left[i,:\right]\right\Vert}^2 \end{equation}$$

(3)

where |$m$| is the number of miRNAs, i.e. the number of rows of |$\mathbf{T}$|.

With |$\mathrm{MS}(i,j)$|, the miRNA functional similarity matrix is denoted by |${\mathbf{A}}_{\mathrm{m}}\in{\mathbb{R}}^{m\times m}$| and constructed by |${[{\mathbf{A}}_{\mathrm{m}}]}_{ij}=\mathrm{MS}(i,j)$|.
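The construction of |${\mathbf{A}}_{\mathrm{m}}$| in Equations (1)–(3) can be sketched as follows. This is a minimal NumPy illustration on toy data, not the authors' implementation; the function names `gip_kernel` and `mirna_similarity`, the mask `has_fs` and all numeric values are hypothetical. The bandwidth follows Equation (3) exactly as written. Many Gaussian interaction profile formulations instead divide a constant by this mean squared norm, so the reciprocal may be the intended normalization.

```python
import numpy as np

def gip_kernel(profiles: np.ndarray) -> np.ndarray:
    """Gaussian interaction profile kernel between the rows of `profiles`, Eqs (2)-(3)."""
    sq_norms = (profiles ** 2).sum(axis=1)
    theta = sq_norms.mean()  # Eq. (3) as written in the text
    # Squared Euclidean distance between every pair of rows.
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    return np.exp(-theta * np.maximum(sq_dist, 0.0))

def mirna_similarity(T: np.ndarray, FS: np.ndarray, has_fs: np.ndarray) -> np.ndarray:
    """Eq. (1): use FS where MISIM has an entry, otherwise fall back to mGS."""
    return np.where(has_fs, FS, gip_kernel(T))

# Toy data: 3 miRNAs x 3 diseases, with MISIM scores missing for miRNA 2.
T = np.array([[1, 0, 1], [1, 0, 0], [0, 1, 0]], dtype=float)
FS = np.array([[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]])
has_fs = np.array([[True, True, False], [True, True, False], [False, False, True]])
A_m = mirna_similarity(T, FS, has_fs)
```

The disease kernel |$\mathrm{dGS}$| of Equations (8) and (9) would be obtained from the same helper by passing the transpose of |$\mathbf{T}$|, since it operates on columns instead of rows.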

Disease–disease similarity

The disease–disease semantic similarity scores were calculated based on the disease hierarchical directed acyclic graph (DAG) from the MeSH database (https://www.nlm.nih.gov/mesh/). First, let |$i$| be a disease and let |$\mathrm{dag}(i)$| denote the node set consisting of node |$i$| and its ancestor nodes in the disease DAG. Then, the first semantic contribution of a disease |$t$| to the disease |$i$| is denoted by |${\mathrm{SC}}_1(i,t)$| and can be formulated using the following equations [38]:

$$\begin{equation} {\mathrm{SC}}_1\left(i,t\right)=\begin{cases}1, & \text{if}\ t=i\\ \max \left\{\gamma\, {\mathrm{SC}}_1\left(i,{t}^{\prime}\right)\mid {t}^{\prime}\in \text{children of}\ t\right\}, & \text{if}\ t\ne i\end{cases} \end{equation}$$

(4)

Based on the definition of semantic contribution in Equation (4), the first semantic similarity score between two diseases, denoted by |${\mathrm{dS}}_1$|, is established. Let |$i$| and |$j$| be two different diseases. Then |${\mathrm{dS}}_1(i,j)$| is defined as follows:

$$\begin{equation} {\mathrm{dS}}_1\left(i,j\right)=\frac{\sum_{t\in \mathrm{dag}(i)\cap \mathrm{dag}(j)}\left({\mathrm{SC}}_1\left(i,t\right)+{\mathrm{SC}}_1\left(j,t\right)\right)}{\sum_{t\in \mathrm{dag}(i)}{\mathrm{SC}}_1\left(i,t\right)+\sum_{t\in \mathrm{dag}(j)}{\mathrm{SC}}_1\left(j,t\right)} \end{equation}$$

(5)

Intuitively, |${\mathrm{dS}}_1(i,j)$| is higher when |$i$| and |$j$| share a larger part of their DAGs.
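Equations (4) and (5) can be illustrated on a small example. The sketch below uses a hypothetical five-node DAG in place of MeSH and assumes a decay factor |$\gamma =0.5$|, a value the text does not specify. Contributions are propagated upward from |$i$| to its ancestors, so the keys of the returned dictionary are exactly |$\mathrm{dag}(i)$|.

```python
# Toy MeSH-like DAG (hypothetical): each disease maps to its parents.
PARENTS = {"root": [], "A": ["root"], "B": ["root"], "C": ["A", "B"], "D": ["A"]}
GAMMA = 0.5  # assumed decay factor; the text does not give its value

def sc1(i):
    """Return {t: SC_1(i, t)} for every t in dag(i), following Eq. (4)."""
    contrib = {i: 1.0}
    frontier = [i]
    while frontier:
        child = frontier.pop()
        for parent in PARENTS[child]:
            value = GAMMA * contrib[child]
            # Keep the maximum contribution over all children of `parent`.
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                frontier.append(parent)
    return contrib

def ds1(i, j):
    """First semantic similarity score between diseases i and j, Eq. (5)."""
    ci, cj = sc1(i), sc1(j)
    shared = ci.keys() & cj.keys()  # dag(i) intersected with dag(j)
    numerator = sum(ci[t] + cj[t] for t in shared)
    return numerator / (sum(ci.values()) + sum(cj.values()))

print(ds1("C", "D"))  # C and D share the ancestors A and root
```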

However, |${\mathrm{dS}}_1$| ignores the significance of different disease contributions. Suppose that |$i,t,q\in \mathrm{D}$|. If disease |$t$| appears only in |$\mathrm{dag}(i)$|, whereas |$q$| appears in both |$\mathrm{dag}(i)$| and the DAGs of other diseases, then |$t$| might have a higher semantic contribution to |$i$| than |$q$|. Thus, the second semantic contribution score |${\mathrm{SC}}_2(i,t)$| is defined as follows:

$$\begin{equation} {\mathrm{SC}}_2\left(i,t\right)=-\log \left(\frac{\text{the number of DAGs including}\ t}{\text{the number of diseases}}\right) \end{equation}$$

(6)

Based on |${\mathrm{SC}}_2(i,t)$|, the second semantic similarity score |${\mathrm{dS}}_2$| between two diseases is defined as follows [38]:

$$\begin{equation} {\mathrm{dS}}_2\left(i,j\right)=\frac{\sum_{t\in \mathrm{dag}(i)\cap \mathrm{dag}(j)}\left({\mathrm{SC}}_2\left(i,t\right)+{\mathrm{SC}}_2\left(j,t\right)\right)}{\sum_{t\in \mathrm{dag}(i)}{\mathrm{SC}}_2\left(i,t\right)+\sum_{t\in \mathrm{dag}(j)}{\mathrm{SC}}_2\left(j,t\right)} \end{equation}$$

(7)
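A corresponding sketch for Equations (6) and (7) is given below, again on a hypothetical toy DAG in place of MeSH. Because the right-hand side of Equation (6) does not depend on |$i$|, the helper `sc2` takes only |$t$|, and the numerator of Equation (7) reduces to twice the contribution of each shared node.

```python
import math

# Toy MeSH-like DAG (hypothetical): each disease maps to its parents.
PARENTS = {"root": [], "A": ["root"], "B": ["root"], "C": ["A", "B"], "D": ["A"]}

def dag(i):
    """Node i together with all of its ancestors."""
    nodes, stack = {i}, [i]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in nodes:
                nodes.add(parent)
                stack.append(parent)
    return nodes

DAGS = {d: dag(d) for d in PARENTS}

def sc2(t):
    """Eq. (6): rarer nodes (contained in fewer DAGs) contribute more."""
    n_containing = sum(t in nodes for nodes in DAGS.values())
    return -math.log(n_containing / len(DAGS))

def ds2(i, j):
    """Second semantic similarity score between diseases i and j, Eq. (7)."""
    shared = DAGS[i] & DAGS[j]
    numerator = sum(2.0 * sc2(t) for t in shared)
    denominator = sum(sc2(t) for t in DAGS[i]) + sum(sc2(t) for t in DAGS[j])
    return numerator / denominator

print(ds2("C", "D"))
```

In this toy graph the root is contained in every DAG, so its contribution is zero, which is the behaviour that motivates |${\mathrm{SC}}_2$|.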

As |${\mathrm{dS}}_1$| and |${\mathrm{dS}}_2$| are both derived from the MeSH database, they provide only a part of the entries in the disease semantic similarity matrix. Hence, the Gaussian interaction profile kernel similarity was adopted to fill in the remaining disease similarity entries.

Let |$\mathbf{T}\in{\{0,1\}}^{m\times n}$| be the adjacency matrix constructed using the known HMDD v2.0 miRNA–disease association data, and let |$\mathbf{T}[:,j]$| be its |$j$|-th column, a binary vector representing disease |$j$|. Then, the Gaussian interaction profile kernel similarity between disease |$i$| and disease |$j$| is defined as:

$$\begin{equation} \mathrm{dGS}\left(i,j\right)=\exp \left(-{\theta}_{\mathrm{d}}{\left\Vert \mathbf{T}\left[:,i\right]-\mathbf{T}\left[:,j\right]\right\Vert}^2\right) \end{equation}$$

(8)

where |${\theta}_{\mathrm{d}}$| is the kernel bandwidth parameter calculated using the following formula:

$$\begin{equation} {\theta}_{\mathrm{d}}=\frac{1}{n}{\sum}_{j=1}^n{\left\Vert \mathbf{T}\left[:,j\right]\right\Vert}^2 \end{equation}$$

(9)

where |$n$| is the number of diseases, i.e. the number of columns of |$\mathbf{T}$|.

With |${\mathrm{dS}}_1$|, |${\mathrm{dS}}_2$| and |$\mathrm{dGS}$|, the disease semantic similarity matrix is denoted by |${\mathbf{A}}_{\mathrm{d}}\in{\mathbb{R}}^{n\times n}$| and constructed using

$$\begin{equation} {\left[{\mathbf{A}}_{\mathrm{d}}\right]}_{ij}=\begin{cases}\dfrac{{\mathrm{dS}}_1\left(i,j\right)+{\mathrm{dS}}_2\left(i,j\right)}{2}, & \text{if}\ i\ \text{and}\ j\ \text{have a semantic similarity score}\\ \mathrm{dGS}\left(i,j\right), & \text{otherwise}\end{cases} \end{equation}$$

(10)

Methods

In this section, we first formulate the multicategory miRNA–disease association prediction as a tensor completion problem. Then, an end-to-end learning-based prediction model is proposed to solve the problem.

Problem formulation

Multicategory associations can be organized into a binary three-way tensor |$\mathbf{\mathcal{T}}\in{\{0,1\}}^{|M|\times |D|\times |C|}$|, where |$|M|=m$|, |$|D|=n$| and |$|C|=R$| are the sizes of the sets of miRNAs, diseases and association categories, respectively. The |$r$|-th slice of |$\mathbf{\mathcal{T}}$|, |$r\in \{1,2,\dots, R\}$|, is an adjacency matrix |${\mathbf{T}}^{(r)}\in{\{0,1\}}^{|M|\times |D|}$| of the known |$r$|-category miRNA–disease associations, where |${\mathbf{T}}^{(r)}(i,j)=1$| if a miRNA |$i\in M$| is associated with a disease |$j\in D$| in the |$r$|-th evidence category, and |${\mathbf{T}}^{(r)}(i,j)=0$| if the association between miRNA |$i$| and disease |$j$| is unknown or unobserved.

The problem of multicategory miRNA–disease association prediction (hereafter MCMDA) can be stated as follows. We are given |$m$| miRNAs, |$n$| diseases and the partially observed three-way association tensor |$\mathbf{\mathcal{T}}\in{\{0,1\}}^{m\times n\times R}$|, where an entry |$\mathbf{\mathcal{T}}(i,j,r)=1$| indicates that miRNA |$i$| is associated with disease |$j$| in the |$r$|-th evidence category, and |$\mathbf{\mathcal{T}}(i,j,r)=0$| indicates that this association is unknown or unobserved in the |$r$|-th evidence category. Then, the objective of MCMDA is to find an approximation tensor |$\hat{\mathbf{\mathcal{T}}}\in{\mathbb{R}}^{m\times n\times R}$| that solves:

$$\begin{equation} {\min}_{\hat{\mathbf{\mathcal{T}}}}\frac{1}{2}{\left\Vert{P}_{\Omega}\left(\mathbf{\mathcal{T}}-\hat{\mathbf{\mathcal{T}}}\right)\right\Vert}_{\mathrm{F}}^2 \end{equation}$$

(11)

where |${\Vert \mathbf{\mathcal{X}}\Vert}_{\mathrm{F}}^2$| is the squared tensor Frobenius norm, defined as |${\Vert \mathbf{\mathcal{X}}\Vert}_{\mathrm{F}}^2={\sum}_{i=1}^m{\sum}_{j=1}^n{\sum}_{r=1}^R{(\mathbf{\mathcal{X}}(i,j,r))}^{2}$|, |$\Omega$| is the index set of the observations and |${P}_{\Omega}(\mathbf{\mathcal{X}})$| is the projection of |$\mathbf{\mathcal{X}}$| onto the set |$\Omega$|. Intuitively, Equation (11) asks for a tensor |$\hat{\mathbf{\mathcal{T}}}$| that agrees with |$\mathbf{\mathcal{T}}$| on the observations (i.e. the experimentally validated associations).

We note that the objective of Equation (11) only considers the experimentally validated associations. From Table 1, we know that the sparsity rate of |$\mathbf{\mathcal{T}}$| is very low, which means that only a few miRNA–disease entries have been experimentally validated and most entries of |$\mathbf{\mathcal{T}}$| are unknown or unobserved. It should be pointed out that ‘unknown’ does not mean ‘no interaction’: an unknown entry is either a true non-interaction or an interaction that has not been discovered yet. Neglecting these entries in the optimization objective may yield degenerate results; for example, a tensor of all ones fits every observed positive entry exactly. To remedy this deficiency, the objective function should balance the observed and the unknown (or unobserved) entries. Additionally, side information about diseases and miRNAs can be exploited to further enhance the prediction performance. Moreover, a parametrized model should be established to yield |$\hat{\mathbf{\mathcal{T}}}$|. In summary, MCMDA can be more accurately defined as follows:

$$\begin{equation} {\min}_{\mathbf{\mathcal{W}}}\frac{\left(1-\alpha \right)}{2}{\left\Vert{P}_{\Omega}\left(\mathbf{\mathcal{T}}-\hat{\mathbf{\mathcal{T}}}\left(\mathbf{X},\mathbf{Y};\mathbf{\mathcal{W}}\right)\right)\right\Vert}_{\mathrm{F}}^2+\frac{\alpha }{2}{\left\Vert{P}_{\overline{\Omega}}\left(\mathbf{\mathcal{T}}-\hat{\mathbf{\mathcal{T}}}\left(\mathbf{X},\mathbf{Y};\mathbf{\mathcal{W}}\right)\right)\right\Vert}_{\mathrm{F}}^2 \end{equation}$$

(12)

where the parameter |$\alpha \in (0,1)$| is a weighting term that balances the observed and unobserved entries, |$\overline{\Omega}$| denotes a subset of the unknown or unobserved entries in |$\mathbf{\mathcal{T}}$|, and |$\mathbf{\mathcal{W}}$| denotes the trainable parameters used to yield |$\hat{\mathbf{\mathcal{T}}}$|. Figure 1 shows a general flowchart of MCMDA.
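To make the weighting in Equation (12) concrete, the objective can be evaluated on a toy tensor as follows. This is a sketch under stated assumptions: the function name `mcmda_objective` is hypothetical, the prediction tensor is a fixed array where the paper uses the output of a trained model, and |$\overline{\Omega}$| is taken to be all zero entries, whereas the text only requires a subset of them.

```python
import numpy as np

def mcmda_objective(T, T_hat, omega_bar_mask, alpha):
    """Eq. (12): weighted squared error over observed and selected unobserved entries.

    T, T_hat: arrays of shape (m, n, R); observed entries are those with T == 1.
    omega_bar_mask: boolean array marking the chosen subset of unobserved entries.
    """
    omega_mask = T == 1
    residual = T - T_hat
    observed = 0.5 * (1.0 - alpha) * np.sum(residual[omega_mask] ** 2)
    unobserved = 0.5 * alpha * np.sum(residual[omega_bar_mask] ** 2)
    return observed + unobserved

# Toy tensor: 2 miRNAs x 2 diseases x 2 categories with 3 known associations.
T = np.zeros((2, 2, 2))
T[0, 0, 0] = T[1, 1, 0] = T[0, 1, 1] = 1.0
T_hat = np.full(T.shape, 0.5)  # stand-in for the model output
loss = mcmda_objective(T, T_hat, omega_bar_mask=(T == 0), alpha=0.2)
print(loss)
```

With a small |$\alpha$|, each observed positive entry weighs more than each unobserved entry, which counteracts the much larger number of unobserved entries.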

Figure 1

The general flowchart of MCMDA, which is mainly composed of two steps: (Step 1) the feature representation learning module learns miRNA and disease latent representations with similarity information and the known associations as input; (Step 2) the prediction module yields multicategory association scores. Overall, MCMDA is formulated as a tensor completion problem with side information.

To solve the MCMDA problem defined by Equation (12), some questions must be answered. First, what are the appropriate feature representations for miRNAs and diseases? How can the feature matrices |$\mathbf{X}$| and |$\mathbf{Y}$| be obtained by exploiting the miRNA–miRNA similarity, the disease–disease similarity and the known categories of miRNA–disease associations? Second, how can we capture the dependency between the latent feature representations and the different association categories, and design a parameterized model that yields the prediction score tensor |$\hat{\mathbf{\mathcal{T}}}$|? Finally, can we integrate these two modules effectively to build an end-to-end learning pipeline from miRNA and disease feature representation learning to multiple-association prediction?

To answer these questions and solve the problem of MCMDA, in this paper, we develop an end-to-end learning-based prediction model that operates directly on a miRNA–disease multirelational heterogeneous graph. Our proposed model has two main components: (i) an encoder, which applies a graph neural network to the miRNA–disease multirelational heterogeneous graph to learn miRNA and disease latent representations, and (ii) a decoder, which yields miRNA–disease association scores with the learned latent representations as input. In the following, we discuss the details of the two components.

Encoders for latent feature of MiRNA and disease

In this subsection, two kinds of encoders are proposed to obtain latent features of miRNAs and diseases. The first is GCN-encoder which maps the nodes of miRNAs and diseases to the latent features by using graph convolutional network on the miRNA–miRNA similarity network and disease–disease similarity network, respectively. The second is RGCN-encoder which yields latent features by both exploiting similarity networks and the known different categories of miRNA–disease associations.

GCN-encoder

Let us start with the details of the GCN-encoder. The basic assumption for miRNA–disease association prediction is that functionally similar miRNAs are more likely to be associated with phenotypically similar diseases, and vice versa. Therefore, similarity information among miRNAs and diseases is crucial for miRNA–disease association prediction. Here, the miRNA and disease latent feature representations are obtained by applying a Graph Convolutional Network (GCN) [39] to the miRNA and disease similarity networks.

Let |${G}_m$| and |${G}_d$| be the miRNA functional similarity network and the disease semantic similarity network, respectively, with adjacency matrices |${\mathbf{A}}_{\mathrm{m}}$| and |${\mathbf{A}}_d$|. |$|{V}_m|=m$| and |$|{V}_d|=n$| denote the sizes of the node sets |${V}_m$| of |${G}_m$| and |${V}_d$| of |${G}_d$|, respectively. Let |$\mathbf{X}$| and |$\mathbf{Y}$| be the initial features on the nodes of |${G}_m$| and |${G}_d$|, respectively. A GCN learns the feature of a node |$i$| by hierarchically aggregating feature information from the neighborhood of |$i$|. Next, we introduce the method of learning features for miRNAs over |${G}_m$|. Features for diseases are learned over |${G}_d$| in the same way.

Specifically, a one-layer feature aggregating operator for a node |$i$| in a GCN is defined as:

$$\begin{equation} {\mathbf{x}}_i^{\left(l+1\right)}=\sigma \left({\sum}_{j\in N(i)\cup \left\{i\right\}}\frac{1}{c_{i,j}}{\boldsymbol{\Theta}}_m^{(l)}{\mathbf{x}}_j^{(l)}\right) \end{equation}$$

(13)

where |${\mathbf{x}}_i^{(l)}\in{\mathbb{R}}^{f_l}$| is the hidden representation of the miRNA node |$i$| in the |$l$|-th layer and |$N(i)$| is the set of neighbors of |$i$|. |$\tilde{\mathbf{A}}_m={\mathbf{A}}_m+{\mathbf{I}}_m$| denotes the adjacency matrix of |${G}_m$| with self-loops, where |${\mathbf{I}}_m$| is the identity matrix. |$\tilde{\mathbf{D}}_m$| denotes the diagonal matrix with |${[\tilde{\mathbf{D}}_m]}_{ii}={\sum}_j{[{\tilde{\mathbf{A}}}_m]}_{ij}$|. |${\boldsymbol{\Theta}}_m^{(l)}\in{\mathbb{R}}^{f_{l+1}\times{f}_l}$| is the filter parameter matrix with |${f}_l$| input channels and |${f}_{l+1}$| filters. |$\sigma (\cdot )$| is a nonlinear activation function, such as ReLU. |${c}_{i,j}=\sqrt{\tilde{\mathbf{D}}_m(i,i)\tilde{\mathbf{D}}_m(j,j)}$| is a normalization constant. The equivalent matrix form can be written as follows.

$$\begin{equation} {\mathbf{X}}^{\left(l+1\right)}=\sigma \left(\tilde{\mathbf{D}}_m^{-1/2}{\tilde{\mathbf{A}}}_m\tilde{\mathbf{D}}_m^{-1/2}{\boldsymbol{\Theta}}_m^{(l)}{\mathbf{X}}^{(l)}\right) \end{equation}$$

(14)

We denote |$\tilde{\mathbf{D}}_m^{-1/2}{\tilde{\mathbf{A}}}_m\tilde{\mathbf{D}}_m^{-1/2}$| by |$\tilde{\mathbf{L}}_m$| to simplify the notation. Then, multiple feature aggregation layers can be stacked into an |$L$|-layer GCN, denoted as follows:

$$\begin{equation} {\mathrm{GCN}}_m^{(L)}\left(\mathbf{X};{\boldsymbol{\Theta}}_m\right)=\mathrm{ReLU}\left(\tilde{\mathbf{L}}_m{\boldsymbol{\Theta}}_m^{(L)}\mathrm{ReLU}\left(\cdots \mathrm{ReLU}\left(\tilde{\mathbf{L}}_m{\boldsymbol{\Theta}}_m^{(1)}\mathbf{X}\right)\cdots \right)\right) \end{equation}$$

(15)

where |$\mathbf{X}$| is a randomly initialized feature matrix for all miRNAs and |${\boldsymbol{\Theta}}_m=\{{\boldsymbol{\Theta}}_m^{(1)},\dots, {\boldsymbol{\Theta}}_m^{(L)}\}$| is the set of all trainable parameters. Similarly, for diseases we have

$$\begin{equation} {\mathrm{GCN}}_d^{(L)}\left(\mathbf{Y};{\boldsymbol{\Theta}}_d\right)=\mathrm{ReLU}\left(\tilde{\mathbf{L}}_d{\boldsymbol{\Theta}}_d^{(L)}\mathrm{ReLU}\left(\cdots \mathrm{ReLU}\left(\tilde{\mathbf{L}}_d{\boldsymbol{\Theta}}_d^{(1)}\mathbf{Y}\right)\cdots \right)\right) \end{equation}$$

(16)

Thus, considering a miRNA functional similarity network and a disease semantic similarity network, starting from the randomly initialized embedding |$\mathbf{X}$| and |$\mathbf{Y}$|⁠, GCN transforms the features in a layer-by-layer manner and finally outputs |${\mathbf{X}}^{(L)}$| and |${\mathbf{Y}}^{(L)}$|⁠. These learned features will be used as the input for the downstream multicategory association prediction model.
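The layer-by-layer propagation of the GCN-encoder can be sketched in plain NumPy as follows. This is an illustration, not the DGL-based implementation used in the experiments. It assumes that node features are stored as the rows of |$\mathbf{X}$|, so each layer is computed as ReLU of |$\tilde{\mathbf{L}}_m\mathbf{X}{\boldsymbol{\Theta}}^{\top }$| with |$\boldsymbol{\Theta}$| of shape |${f}_{l+1}\times{f}_l$|. The similarity values, layer sizes and function names are hypothetical.

```python
import numpy as np

def normalized_adjacency(A: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} of an adjacency matrix."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

def gcn_encoder(A: np.ndarray, X: np.ndarray, thetas) -> np.ndarray:
    """Stack of GCN layers; each theta has shape (f_{l+1}, f_l)."""
    L_tilde = normalized_adjacency(A)
    H = X  # rows are nodes, columns are feature channels
    for theta in thetas:
        H = np.maximum(L_tilde @ H @ theta.T, 0.0)  # ReLU
    return H

rng = np.random.default_rng(0)
# Hypothetical similarity network of 3 miRNAs.
A_m = np.array([[0.0, 0.8, 0.1], [0.8, 0.0, 0.3], [0.1, 0.3, 0.0]])
X = rng.normal(size=(3, 4))  # randomly initialized features
thetas = [rng.normal(size=(8, 4)), rng.normal(size=(2, 8))]  # two layers
X_L = gcn_encoder(A_m, X, thetas)
print(X_L.shape)
```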

Note that GCN-encoder neglects the known multiple categories of associations between miRNA and disease when yielding the latent feature. Naturally, the quality of latent features may be further improved by considering these associations. In fact, miRNA, disease similarity networks, and the known miRNA–disease associations together form heterogeneous multiple relational networks. Hence, we can exploit the Relational Graph Convolutional Network [40] to yield the latent feature representations.

RGCN-encoder

RGCN-encoder is also a graph convolutional network-based encoder which yields latent features by exploiting both (miRNA, disease) similarity networks and the known different categories of miRNA–disease associations. Specifically, one-layer feature aggregating operator for a miRNA |$i$| by an RGCN-encoder is defined as:

$$\begin{equation} {\mathbf{x}}_i^{\left(l+1\right)}=\sigma \left({\sum}_{r\in R}{\sum}_{j\in{N}_i^r}\frac{1}{c_{i,r}}{\boldsymbol{\Theta}}_m^{r,(l)}{\mathbf{x}}_j^{(l)}+{\boldsymbol{\Theta}}_m^{(l)}{\mathbf{x}}_i^{(l)}\right) \end{equation}$$

(17)

where |${\mathbf{x}}_i^{(l)}\in{\mathbb{R}}^{f_l}$| is the hidden representation of the miRNA |$i$| in the current layer, |${N}_i^r$| is the set of |$r$|-category neighbors of node |$i$|, |${c}_{i,r}=|{N}_i^r|$| is a normalization constant, and |${\boldsymbol{\Theta}}_m^{r,(l)}\in{\mathbb{R}}^{f_{l+1}\times{f}_l}$| is a filter parameter matrix for the |$r$|-category relation, which is shared by all |$r$|-category neighbors of |$i$|. |${\boldsymbol{\Theta}}_m^{(l)}$| is a filter parameter matrix for the node itself. Likewise, the one-layer feature aggregating operator for a disease |$i$| is as follows.

$$\begin{equation} {\mathbf{y}}_i^{\left(l+1\right)}=\sigma \left({\sum}_{r\in R}{\sum}_{j\in{N}_i^r}\frac{1}{c_{i,r}}{\boldsymbol{\Theta}}_d^{r,(l)}{\mathbf{y}}_j^{(l)}+{\boldsymbol{\Theta}}_d^{(l)}{\mathbf{y}}_i^{(l)}\right) \end{equation}$$

(18)

Similar to GCN-encoder, we can stack multilayer RGCN feature aggregation operations to form an RGCN encoder. The RGCN-encoders for miRNA and disease are denoted by |$\mathrm{R}{\mathrm{GCN}}_m^{(L)}(\mathbf{X};{\boldsymbol{\Theta}}_m)$| and |$\mathrm{R}{\mathrm{GCN}}_d^{(L)}(\mathbf{Y};{\boldsymbol{\Theta}}_d)$|⁠, respectively.

Unlike the GCN-encoder, the RGCN-encoder considers all the types of relations among miRNAs and diseases, namely miRNA–miRNA similarities, disease–disease similarities and the multiple categories of miRNA–disease associations. It transforms the features into relation-specific feature spaces with relation-specific filter parameter matrices and finally aggregates the features of the different types. We will demonstrate via experimental results that this improves the quality of representation learning.
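A single relational aggregation step of the form in Equation (17) can be sketched as follows. For brevity, the sketch assumes that all nodes share one index space and one feature matrix `H`, and it represents each relation by a dictionary of neighbor lists; these data structures and the function name `rgcn_layer` are hypothetical and do not reflect the DGL implementation used in the experiments.

```python
import numpy as np

def rgcn_layer(H, neighbors, theta_rel, theta_self):
    """One RGCN layer in the form of Eq. (17), with ReLU as the activation.

    H: (num_nodes, f_l) hidden features, one row per node.
    neighbors[r][i]: list of the r-category neighbors of node i.
    theta_rel[r], theta_self: filter matrices of shape (f_{l+1}, f_l).
    """
    out = H @ theta_self.T  # self-connection term
    for r, neighbors_r in neighbors.items():
        for i, nbrs in neighbors_r.items():
            if nbrs:
                # Normalizing by c_{i,r} = |N_i^r| is a mean over the neighbors.
                out[i] += (H[nbrs].sum(axis=0) / len(nbrs)) @ theta_rel[r].T
    return np.maximum(out, 0.0)

H = np.eye(3)  # 3 nodes with one-hot features
neighbors = {"sim": {0: [1, 2], 1: [], 2: [0]}}  # one toy relation
H_next = rgcn_layer(H, neighbors, {"sim": 2.0 * np.eye(3)}, np.eye(3))
print(H_next)
```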

Decoders score multicategory miRNA–disease association

In our proposed prediction model, an association prediction score for the |$r$|-category is obtained by a decoder which is a parameterized score function |${\mathrm{Dec}}^{(r)}(\mathbf{X},\mathbf{Y};\mathbf{\mathcal{W}}):{\mathbb{R}}^m\times R\times{\mathbb{R}}^d\to \mathbb{R}$| where |$\mathbf{X},\mathbf{Y}$| are the encoded miRNA and disease features, respectively. In this paper, we introduce the following three different decoders.

DistMult decoder (denoted by |${\mathrm{Dec}}_{\mathrm{DistMult}}^{(r)}$|) adopts the DistMult factorization [41] as the scoring function, which is known to perform well on standard multirelational link prediction problems. In |${\mathrm{Dec}}_{\mathrm{DistMult}}^{(r)}$|, every category |$r$| is associated with a diagonal matrix |${\mathbf{D}}^{(r)}$|, and a miRNA–disease pair under category |$r$|, |$(\mathbf{x},r,\mathbf{y})$|, is scored as

$$\begin{equation} {\mathrm{Dec}}_{\mathrm{DistMult}}^{(r)}\left(\mathbf{x},r,\mathbf{y}\right)={\mathbf{x}}^{\top }{\mathbf{D}}^{(r)}\mathbf{y} \end{equation}$$

(19)

We observe that the diagonal matrix |${\mathbf{D}}^{(r)}$| in the DistMult decoder only captures the interactions between miRNAs and diseases under the specific category |$r$|. However, as shown in Supplementary Table 2, some associations appear in several categories. To account for this, we extend |${\mathrm{Dec}}_{\mathrm{DistMult}}^{(r)}$| by incorporating a trainable parameter matrix |$\mathbf{G}$| and propose the following Linear Multi-Relational decoder.

LMR-decoder (denoted by |${\mathrm{Dec}}_{\mathrm{LMR}}^{(r)}$|⁠) is defined as follows.

$$\begin{equation} {\mathrm{Dec}}_{\mathrm{LMR}}^{(r)}\left(\mathbf{x},r,\mathbf{y}\right)={\mathbf{x}}^{\top }{\mathbf{D}}^{(r)}\mathbf{G}{\mathbf{D}}^{(r)}\mathbf{y} \end{equation}$$

(20)

where |$\mathbf{G}\in{\mathbb{R}}^{d\times d}$| is a parameter matrix which captures global interactions of the latent features of miRNAs and diseases across different categories. |${\mathbf{D}}^{(r)}\in{\mathbb{R}}^{d\times d}$| is still a trainable diagonal matrix which captures the importance of each dimension in latent representations toward |$r$| category association.
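The two bilinear decoders in Equations (19) and (20) can be compared on a toy pair of latent vectors. In the sketch below, the diagonal matrix |${\mathbf{D}}^{(r)}$| is stored as the vector `d_r` of its diagonal entries; the function names and all numeric values are hypothetical.

```python
import numpy as np

def distmult_score(x, d_r, y):
    """Eq. (19): x^T diag(d_r) y."""
    return float(np.sum(x * d_r * y))

def lmr_score(x, d_r, G, y):
    """Eq. (20): x^T diag(d_r) G diag(d_r) y."""
    return float((x * d_r) @ G @ (d_r * y))

x = np.array([1.0, 2.0])    # latent miRNA feature
y = np.array([3.0, 4.0])    # latent disease feature
d_r = np.array([0.5, 2.0])  # diagonal of D^(r) for one category
print(distmult_score(x, d_r, y))
print(lmr_score(x, d_r, np.eye(2), y))
```

Since |$\mathbf{G}$| is shared by all categories while |${\mathbf{D}}^{(r)}$| is category-specific, the off-diagonal entries of |$\mathbf{G}$| let different latent dimensions interact, which the purely diagonal DistMult form cannot express.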

Both |${\mathrm{Dec}}_{\mathrm{DistMult}}^{(r)}$| and |${\mathrm{Dec}}_{\mathrm{LMR}}^{(r)}$| are bilinear decoders. Inspired by NIMCGCN [25], we propose a neural multicategory association scoring model to capture deeper, nonlinear interactions between the latent features of miRNAs and diseases.

NMR-decoder (denoted by |${\mathrm{Dec}}_{\mathrm{NMR}}^{(r)}$|⁠) represents a Neural Multi-Relational decoder. In the following, we will take miRNA as an example to show the idea of |${\mathrm{Dec}}_{\mathrm{NMR}}^{(r)}$|⁠. The same idea can be applied to diseases.

With the GCN-output (or RGCN-output) feature matrix |$\mathbf{X}$| as input, we establish a |$K$|-layer feedforward neural network |${\varphi}_m^{(r;K)}$| to further transform the miRNA features for each category |$r$|. Specifically,

$$\begin{equation} {\mathbf{X}}^{\left(r;K\right)}={\varphi}_m^{\left(r;K\right)}\left(\mathbf{X}\right) \end{equation}$$

(21)

A nonlinear transformation from the |$k$|-th layer to the |$(k+1)$|-th layer in |${\varphi}_m^{(r;K)}$| is defined as |${\mathbf{X}}^{(r;k+1)}=\sigma \left({\mathbf{W}}_m^{(r;k)}{\mathbf{X}}^{(r;k)}+{\mathbf{b}}_m^{(r;k)}\right)$|, where |${\mathbf{X}}^{(r;k)}$| is the feature matrix at the |$k$|-th layer, |${\mathbf{W}}_m^{(r;k)}$| is the transformation parameter matrix, |${\mathbf{b}}_m^{(r;k)}$| is the bias vector and |$\sigma (\cdot )$| is a nonlinear activation function. We denote all trainable parameters by |${\boldsymbol{\Psi}}_m=\{{\boldsymbol{\Psi}}_m^{(1)},\dots{\boldsymbol{\Psi}}_m^{(r)},\dots, {\boldsymbol{\Psi}}_m^{(R)}\}$|, where |${\boldsymbol{\Psi}}_m^{(r)}=\{{\mathbf{W}}_m^{(r;0)},\dots, {\mathbf{W}}_m^{(r;K)},{\mathbf{b}}_m^{(r;0)},\dots, {\mathbf{b}}_m^{(r;K)}\}$| are the parameters involved in |${\varphi}_m^{(r;K)}$|.

However, the above category-specific neural networks cannot capture the mutual interactions of latent features across different categories. To solve this problem, we establish a global |$H$|-layer feedforward neural network |${\psi}_m^{(H)}$| after category-specific neural networks to further capture the interactions of miRNA latent features across all possible different categories. Specifically,

$$\begin{equation} {\hat{\mathbf{X}}}^{(r)}={\psi}_m^{(H)}\left({\mathbf{X}}^{\left(r;K\right)}\right) \end{equation}$$

(22)

where |${\mathbf{X}}^{(r;K)}$| is the feature matrix output by the |$r$|-th category-specific neural network, and |${\mathbf{W}}_m=\{{\mathbf{W}}_m^{(1)},\dots, {\mathbf{W}}_m^{(H)}\}$| and |${\mathbf{b}}_m=\{{\mathbf{b}}_m^{(1)},\dots, {\mathbf{b}}_m^{(H)}\}$| are the global trainable parameter matrices and bias vectors shared by the input features of all categories.

After obtaining |${\hat{\mathbf{X}}}^{(r)}$| for all miRNAs and |${\hat{\mathbf{Y}}}^{(r)}$| for all diseases, the matrix of association prediction scores for the |$r$|-th category, |${\hat{\mathbf{T}}}^{(r)}$|, is given by the product of the transpose of |${\hat{\mathbf{X}}}^{(r)}$| and |${\hat{\mathbf{Y}}}^{(r)}$|:

$$\begin{equation} {\hat{\mathbf{T}}}^{(r)}={\left({\hat{\mathbf{X}}}^{(r)}\right)}^{\top }{\hat{\mathbf{Y}}}^{(r)} \end{equation}$$

(23)

Overall, we can integrate the aforementioned components into a unified prediction model, in which |${\hat{\mathbf{T}}}^{(r)}$| is calculated by the following product.

$$\begin{equation} {\hat{\mathbf{T}}}^{(r)}={\left[{\psi}_m^{(H)}\left({\varphi}_m^{\left(r;K\right)}\left(\mathbf{X}\right)\right)\right]}^{\top }{\psi}_d^{(H)}\left({\varphi}_d^{\left(r;K\right)}\left(\mathbf{Y}\right)\right) \end{equation}$$

(24)
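The composition in Equation (24) can be sketched for a single category |$r$| as follows. This is a minimal NumPy illustration with randomly initialized weights, not the trained model. It assumes that features are stored as columns (one column per miRNA or disease), matching the form of the layer transformation above, that ReLU is used as the activation, and that each network has a single layer; the names `mlp`, `random_layers` and `nmr_scores` are hypothetical.

```python
import numpy as np

def mlp(X, weights, biases):
    """Feedforward network applied to every column of X (features x nodes)."""
    for W, b in zip(weights, biases):
        X = np.maximum(W @ X + b[:, None], 0.0)  # ReLU is an assumed choice of sigma
    return X

def random_layers(rng, sizes):
    """Random weights and zero biases for consecutive layer sizes."""
    weights = [rng.normal(size=(o, i)) for i, o in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(o) for o in sizes[1:]]
    return weights, biases

def nmr_scores(X, Y, phi_m, phi_d, psi_m, psi_d):
    """Eq. (24): category-specific networks, then global networks, then the product."""
    X_hat = mlp(mlp(X, *phi_m), *psi_m)
    Y_hat = mlp(mlp(Y, *phi_d), *psi_d)
    return X_hat.T @ Y_hat

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))  # encoder output for 3 miRNAs, 4 features each
Y = rng.normal(size=(4, 5))  # encoder output for 5 diseases
phi_m, phi_d = random_layers(rng, [4, 6]), random_layers(rng, [4, 6])  # one pair per category
psi_m, psi_d = random_layers(rng, [6, 2]), random_layers(rng, [6, 2])  # shared by all categories
T_hat_r = nmr_scores(X, Y, phi_m, phi_d, psi_m, psi_d)
print(T_hat_r.shape)
```

In a full model, `phi_m` and `phi_d` would be instantiated once per category, while `psi_m` and `psi_d` would be reused for every category.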

End-to-end learning models

The various encoders and decoders described in the previous subsection can be combined into specific MCMDA prediction models. For example, when using RGCN as encoder and NMR as decoder, we get NMR-RGCN. Similarly, we also have NMR-GCN, LMR-RGCN and so on. Next, we introduce a general loss function which can be used as the loss function of different encoder–decoder combinations.

Now, we present the details of the loss function of MCMDA defined in Equation (12). Specifically, the partially observed association tensor |$\mathbf{\mathcal{T}}$| can be described as a set of association matrices, one per category, i.e. |$\mathbf{\mathcal{T}}=[{\mathbf{T}}^{(1)},{\mathbf{T}}^{(2)},\dots, {\mathbf{T}}^{(R)}]$|, where |${\mathbf{T}}^{(r)}\in{\{0,1\}}^{m\times n}$| (|$r=1,\dots ,R$|) is the experimentally verified |$r$|-category miRNA–disease association matrix. For the |$r$|-th category, |${\Omega}^{(r)}$| and |${\overline{\Omega}}^{(r)}$| denote the sets of observed and of unobserved (or unknown) miRNA–disease entries of |${\mathbf{T}}^{(r)}$|, respectively. The observed set |${\Omega}^{(r)}$| consists only of positive associations, i.e. |${\mathbf{T}}^{(r)}(i,j)=1$| for all |$(i,j)\in{\Omega}^{(r)}$|, whereas |${\mathbf{T}}^{(r)}(i,j)=0$| for all |$(i,j)\in{\overline{\Omega}}^{(r)}$|. With these notations, the MCMDA defined in Equation (12) can be reformulated as follows:

$$\begin{equation} {\min}_{\mathbf{\mathcal{W}}}\frac{\left(1-\alpha \right)}{2}{\sum}_{r=1}^R{\left\Vert{P}_{\Omega}\left({\mathbf{T}}^{(r)}-{\hat{\mathbf{T}}}^{(r)}\left(\mathbf{X},\mathbf{Y};\mathbf{\mathcal{W}}\right)\right)\right\Vert}_{\mathrm{F}}^2+\frac{\alpha }{2}{\sum}_{r=1}^R{\left\Vert{P}_{\overline{\Omega}}\left({\mathbf{T}}^{(r)}-{\hat{\mathbf{T}}}^{(r)}\left(\mathbf{X},\mathbf{Y};\mathbf{\mathcal{W}}\right)\right)\right\Vert}_{\mathrm{F}}^2+\lambda{\left\Vert \mathbf{\mathcal{W}}\right\Vert}^2 \end{equation}$$

(25)

where |${\hat{\mathbf{T}}}^{(r)}$| is calculated by Equation (24), |$\mathbf{\mathcal{W}}$| denotes all trainable parameters, including those of the encoder and the decoder, and |$\lambda$| is the weight of the regularization term.

It is worth mentioning that the encoder and the decoder are integrated into a unified end-to-end neural network learning framework. Specifically, a GNN encoder is first used to learn miRNA and disease features over a miRNA–disease heterogeneous information network. Then, the decoder receives the learned latent features and transforms them further. The final prediction scores are obtained through the dot product of the transformed features. All parameters |$\mathbf{\mathcal{W}}$| involved in the encoders and decoders are optimized simultaneously via gradient descent with adaptive moment estimation (Adam) [42]. Figure 2 shows the flowchart of NMR-RGCN.

Figure 2

The flowchart of the NMR-RGCN, which is a specific model of Neural Multi-Category MiRNA-Disease Association prediction, i.e. NMCMDA with Neural Multi-Relational decoder and Relational Graph Convolutional Network encoder. NMR-RGCN operates directly on the miRNA–disease heterogeneous network and leverages RGCN to learn miRNA and disease latent representations, respectively. Then, neural multirelational decoder yields miRNA–disease association scores with the learned latent representations as input. The representation learning encoder and the prediction score decoder are integrated into a unified end-to-end neural network learning framework.

Results and discussion

Experimental setup

The experimental code is implemented with the open-source machine learning framework PyTorch (https://pytorch.org). The graph neural network encoders are implemented with the open-source Deep Graph Library (https://www.dgl.ai/). All experiments were carried out on the Windows 10 operating system on a Dell Precision T5820 workstation with an 8-core 3.7 GHz Intel W-2145 CPU and 64 GB of memory.

In this study, the following two evaluation settings are used.

(1) |${\mathbf{CV}}_{\mathbf{triplet}}{:}$| We randomly split all experimentally verified miRNA–disease–category triplets (the positive samples) into 10 subsets. In each fold, one subset together with an equal-sized set of randomly sampled unknown triplets (the negative samples) is used as the testing set, and the remaining subsets together with an equal-sized set of randomly sampled unknown triplets are used as the training set. We ensured that the training and testing sets did not overlap. The area under the precision-recall curve (AUPR) and the area under the receiver operating characteristic curve (AUC) were used to evaluate the prediction performance of all prediction methods.

(2) |${\mathbf{CV}}_{\mathbf{category}}{:}$| In our modeling problem, every miRNA–disease pair is connected through zero, one or more relation types. In this setting, we randomly split all miRNA–disease pairs that are connected by at least one type of association into 10 subsets. In each fold, one subset is used in turn as the testing set and the remaining subsets as the training set. For each miRNA–disease pair in the testing set, all association categories are ranked by predicted score. We take the category with the highest score as the model's prediction for that test sample and calculate the Top-1 precision, Top-1 recall and Top-1 F1-score.
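The triplet-level split in setting (1) can be sketched as follows. This is only an illustrative reading of the protocol, not the experimental code: the function name `cv_triplet_folds`, the uniform negative sampling and the choice to sample all negatives once and partition them across folds are assumptions made here to guarantee that training and testing sets stay disjoint.

```python
import random


def cv_triplet_folds(positives, n_mirna, n_disease, n_category, n_folds=10, seed=0):
    """Yield (train, test) lists of ((mirna, disease, category), label) per fold.

    positives: known triplets given as integer index tuples.
    One unknown triplet is sampled per positive (assumption), and every
    sampled negative ends up in exactly one fold.
    """
    rng = random.Random(seed)
    pos = list(positives)
    rng.shuffle(pos)
    known = set(pos)

    total = n_mirna * n_disease * n_category
    if total - len(known) < len(pos):
        raise ValueError("not enough unknown triplets to sample negatives")

    # Sample distinct unknown triplets, as many as there are positives.
    neg = set()
    while len(neg) < len(pos):
        t = (rng.randrange(n_mirna), rng.randrange(n_disease), rng.randrange(n_category))
        if t not in known:
            neg.add(t)
    neg = sorted(neg)
    rng.shuffle(neg)

    # Partition positives and negatives into the same number of folds.
    pos_folds = [pos[i::n_folds] for i in range(n_folds)]
    neg_folds = [neg[i::n_folds] for i in range(n_folds)]

    for k in range(n_folds):
        test = [(t, 1) for t in pos_folds[k]] + [(t, 0) for t in neg_folds[k]]
        train = [(t, 1) for i in range(n_folds) if i != k for t in pos_folds[i]]
        train += [(t, 0) for i in range(n_folds) if i != k for t in neg_folds[i]]
        yield train, test
```

Because positives and negatives are partitioned with the same stride, each testing fold contains equally many positive and negative triplets, and no triplet appears in both the training and testing set of the same fold.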

Compared with binary miRNA–disease association prediction, multirelational miRNA–disease prediction is mainly concerned with predicting the category of an association, which is exactly what |${\mathbf{CV}}_{\mathbf{category}}$| tests. We therefore regard |${\mathbf{CV}}_{\mathbf{category}}$| as the primary experimental setting.
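As a concrete illustration of the Top-1 metrics, the sketch below takes the highest-scoring category of each test pair as the prediction. The exact formulas are not spelled out in this section, so the definitions used here are assumptions: precision is the number of correct predictions divided by the number of test pairs, recall is the number of correct predictions divided by the total number of true pair-category labels, and F1 is their harmonic mean. The harmonic mean is at least consistent with the reported values (for example, 0.5522 and 0.3967 give about 0.4617). The function name `top1_metrics` and the toy inputs are hypothetical.

```python
def top1_metrics(scores, truth):
    """Return (precision, recall, f1) for Top-1 category prediction.

    scores: {pair: [score for each category]}
    truth:  {pair: set of true category indices}, one entry per test pair
    """
    hits = 0
    n_true = 0
    for pair, cats in truth.items():
        s = scores[pair]
        best = max(range(len(s)), key=s.__getitem__)  # index of the top-scoring category
        hits += best in cats
        n_true += len(cats)
    precision = hits / len(truth)   # one prediction per test pair (assumption)
    recall = hits / n_true          # over all true pair-category labels (assumption)
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Toy example with three pairs and three categories (made-up scores).
scores = {
    ("m1", "d1"): [0.9, 0.1, 0.3],
    ("m2", "d1"): [0.2, 0.7, 0.6],
    ("m3", "d2"): [0.1, 0.2, 0.8],
}
truth = {("m1", "d1"): {0, 2}, ("m2", "d1"): {2}, ("m3", "d2"): {2}}
precision, recall, f1 = top1_metrics(scores, truth)
```

Under these definitions, recall can never exceed precision when some pairs carry more than one category, which matches the pattern in the result tables.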

Performance analysis of various encoder–decoder combinations

Our proposed model consists of two components, encoders and decoders. The encoders produce feature representations of the input data. Given the encoded features, the decoders produce association scores for multiple categories.

In this subsection, we conduct extensive experiments on the MCD-6 and MCD-20 datasets to systematically compare the performance of NMR-RGCN (NMR as the decoder and RGCN as the encoder; the other names are read in the same way), DistMult-RGCN, LMR-RGCN and NMR-GCN. The comparison results are shown in Tables 2 and 3.

Table 2

Experiment results of the various encoder–decoder combinations on MCD-6 dataset

Method	CV_triplet AUPR	CV_triplet AUC	CV_category Top-1 P	CV_category Top-1 R	CV_category Top-1 F1
NMR-GCN	0.9467	0.9494	0.5116	0.3690	0.4288
DistMult-RGCN	0.9423	0.9371	0.4059	0.2912	0.3391
LMR-RGCN	0.9525	0.9501	0.5232	0.3722	0.4350
NMR-RGCN	0.9533	0.9521	0.5522	0.3967	0.4617

Table 3

Experiment results of the various encoder–decoder combinations on MCD-20 dataset

Method	CV_triplet AUPR	CV_triplet AUC	CV_category Top-1 P	CV_category Top-1 R	CV_category Top-1 F1
NMR-GCN	0.9419	0.9370	0.3593	0.2565	0.2933
DistMult-RGCN	0.9425	0.9374	0.1961	0.0516	0.0817
LMR-RGCN	0.9479	0.9505	0.3666	0.2729	0.3129
NMR-RGCN	0.9480	0.9527	0.4230	0.2893	0.3436

Figure 3

The effect of the biased item |$\boldsymbol{\alpha}$| in the loss function on the performance of NMR-RGCN.

Figure 4

The effect of the layer number of RGCN encoders |$\mathbf{L}$| on the performance of NMR-RGCN.

Table 4

The effect of the number of neural multicategory layers on the performance of NMR-RGCN

Decoder configuration	Top-1 P	Top-1 R	Top-1 F1
(|$K=1,\mathrm{LMR}$|) + (|$H=1,\mathrm{LMR}$|)	0.3854	0.3015	0.3383
(|$K=1,\mathrm{NMR}$|) + (|$H=1,\mathrm{NMR}$|)	0.5173	0.3841	0.4409
(|$K=1,\mathrm{NMR}$|)	0.3501	0.2748	0.3079
(|$K=2,\mathrm{NMR}$|) + (|$H=1,\mathrm{NMR}$|)	0.5452	0.3998	0.4613
(|$K=3,\mathrm{NMR}$|) + (|$H=1,\mathrm{NMR}$|)	0.5531	0.4059	0.4682
(|$K=4,\mathrm{NMR}$|) + (|$H=1,\mathrm{NMR}$|)	0.5520	0.4043	0.4667
(|$K=3,\mathrm{NMR}$|) + (|$H=2,\mathrm{NMR}$|)	0.5576	0.4071	0.4706
(|$K=3,\mathrm{NMR}$|) + (|$H=3,\mathrm{NMR}$|)	0.5569	0.4063	0.4698

The empirical results in Tables 2 and 3 show the effect of different encoders and decoders on prediction performance. Specifically, RGCN is superior to GCN because it exploits more link information to obtain latent representations. In addition, with RGCN as the encoder, LMR is better than DistMult because LMR learns to capture global interactions of the latent features of miRNAs and diseases across different categories. Furthermore, NMR is superior to LMR because NMR extends LMR to a nonlinear neural network framework.
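For reference, the standard DistMult scoring function is a bilinear form with a diagonal relation matrix, so each category only reweights the element-wise product of the two embeddings and the latent dimensions never interact with each other. The sketch below shows this on made-up vectors; the names `distmult_score` and `r_by_category` and all numbers are hypothetical, and the LMR and NMR decoders of this paper are not reproduced here.

```python
def distmult_score(mirna_vec, disease_vec, relation_diag):
    """DistMult score: sum_k m_k * r_k * d_k, with one weight vector r per category."""
    return sum(m * r * d for m, r, d in zip(mirna_vec, relation_diag, disease_vec))


# Toy embeddings (made-up values) and one diagonal per category.
m = [0.5, -1.0, 2.0]
d = [1.0, 0.5, 0.25]
r_by_category = {
    "circulation": [1.0, 0.0, 2.0],
    "genetics": [0.0, 1.0, 1.0],
}
scores = {c: distmult_score(m, d, r) for c, r in r_by_category.items()}
```

Since dimension k of the miRNA embedding is only ever multiplied by dimension k of the disease embedding, a decoder that mixes dimensions (linearly or through a neural network) can represent interactions that this form cannot.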

Since NMR-RGCN outperforms the other encoder–decoder combinations, especially under |${\mathbf{CV}}_{\mathbf{category}}$|, in the remainder of the paper we take NMR-RGCN as the main prediction model, discuss the influence of different parameters on its performance and compare it with other baselines.

Table 5

Comparisons with existing work

Dataset	Method	CV_triplet AUPR	CV_triplet AUC	CV_category Top-1 P	CV_category Top-1 R	CV_category Top-1 F1
TDRC v3.2	NLPMMDA [34]	0.6564	0.7581	0.1844	0.1380	0.1579
TDRC v3.2	TDRC [35]	0.9284	0.9201	0.6178	0.4741	0.5365
TDRC v3.2	NMR-GCN (ours)	0.9224	0.9219	0.6278	0.4855	0.5476
TDRC v3.2	NMR-RGCN (ours)	0.9279	0.9257	0.6334	0.4891	0.5520
TDRC v2.0	NLPMMDA [34]	0.6610	0.7635	0.4397	0.3919	0.4144
TDRC v2.0	TDRC [35]	0.8663	0.8379	0.5609	0.4999	0.5286
TDRC v2.0	NMR-GCN (ours)	0.8882	0.8849	0.7144	0.6278	0.6683
TDRC v2.0	NMR-RGCN (ours)	0.8981	0.8889	0.7254	0.6330	0.6761
MCD-6	TDRC [35]	0.9446	0.9377	0.4067	0.2953	0.3422
MCD-6	NMR-GCN (ours)	0.9467	0.9494	0.5116	0.3690	0.4288
MCD-6	NMR-RGCN (ours)	0.9533	0.9521	0.5522	0.3967	0.4617

Analysis of parameters

The following parameters significantly affect the performance of NMR-RGCN: (i) |$\alpha$|: the biased item in the loss function defined by Equation (25); (ii) |$L$|: the number of layers of the RGCN encoders; (iii) |$K$|: the number of layers of the category-specific neural networks defined by Equation (21); and (iv) |$H$|: the number of layers of the global neural networks shared across all categories, defined by Equation (22). We performed experiments on the MCD-6 dataset and used Top-1 precision, Top-1 recall and Top-1 F1 to evaluate the effect of these parameters on the performance of NMR-RGCN.

First, the biased item α is introduced to weigh observed and unobserved entries appropriately. The loss function is optimized using only positive samples when α = 0 and using only unobserved samples when α = 1. In this experiment, we fix |$L=3$|, |$K=3$| and |$H=2$|. Figure 3 shows the effect of different values of α on the prediction performance of NMR-RGCN; α = 0.2 gives the best performance among the values tested.
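The role of α can be illustrated with a small weighted loss. This is not a reproduction of Equation (25), which lies outside this section; it is a minimal sketch that assumes a cross-entropy form in which the observed term is weighted by 1 − α and the unobserved term by α, chosen only because it matches the two limiting cases described above. The function name `weighted_loss` and the toy predictions are hypothetical.

```python
import math


def weighted_loss(pred, observed, alpha, eps=1e-12):
    """Cross-entropy over all entries, weighting observed vs. unobserved ones.

    pred:     {triplet: predicted probability in (0, 1)}
    observed: set of known positive triplets
    alpha:    weight of the unobserved entries (assumed form, see lead-in)
    """
    pos_term = 0.0
    unobs_term = 0.0
    for triplet, p in pred.items():
        if triplet in observed:
            pos_term += -math.log(p + eps)          # observed entries should score high
        else:
            unobs_term += -math.log(1.0 - p + eps)  # unobserved entries are treated as 0
    return (1.0 - alpha) * pos_term + alpha * unobs_term


# Toy predictions for one observed and one unobserved triplet (made-up values).
pred = {("m1", "d1", 0): 0.8, ("m1", "d1", 1): 0.3}
observed = {("m1", "d1", 0)}
loss_pos_only = weighted_loss(pred, observed, alpha=0.0)
loss_unobs_only = weighted_loss(pred, observed, alpha=1.0)
loss_mixed = weighted_loss(pred, observed, alpha=0.2)
```

With α = 0.2 the observed entries dominate the objective while the far more numerous unobserved entries still contribute a damped penalty.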

Second, we analyze the effect of |$L$| on prediction performance. In this experiment, we fix |$\alpha =0.2$|, |$K=3$| and |$H=2$|. The comparative results are shown in Figure 4. NMR-RGCN with |$L=3$| provides the best performance. As |$L$| increases beyond 3, the performance of the RGCN encoders decreases slightly, because stacking more graph convolutional layers oversmooths the encoded embeddings.
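The oversmoothing effect can be seen in a toy setting. The sketch below is not RGCN: it uses plain mean aggregation over a node and its neighbors, with no learned weights, relation types or nonlinearity, on a hypothetical four-node path graph with made-up features. It only illustrates why repeated neighborhood averaging pulls node embeddings toward each other as layers are stacked.

```python
def mean_aggregate(features, neighbors):
    """One propagation step: each node takes the mean of itself and its neighbors."""
    out = {}
    for node, feat in features.items():
        group = [feat] + [features[n] for n in neighbors[node]]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out


def spread(features):
    """Largest per-dimension gap between any two node embeddings."""
    cols = list(zip(*features.values()))
    return max(max(c) - min(c) for c in cols)


# Hypothetical path graph a-b-c-d with two-dimensional toy features.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "d": [0.0, 0.0]}

spreads = []
h = features
for layer in range(6):
    h = mean_aggregate(h, neighbors)
    spreads.append(spread(h))  # shrinks as more layers are applied
```

Each step is an averaging operation, so the gap between the most different embeddings can only shrink; after enough layers the nodes become hard to tell apart, which is one common explanation for the drop in performance with deeper encoders.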

Finally, as described in the subsection 'Decoders score multicategory miRNA–disease association', the neural multirelational decoder consists of the local category-specific neural networks and the global neural networks across all categories. In the experiments, we fix |$\alpha =0.2$| and |$L=3$|. We test the following situations:

(1) Local linear decoder + global linear decoder. In this case, we consider only linear local and global decoders, i.e. |$K=1$|, |$H=1$| with the nonlinear activation function removed. The results in Table 4 show that the linear decoder performs far worse than the neural decoder with the same depth (Top-1 F1 of 0.3383 versus 0.4409).

(2) Local neural decoder. We remove the global decoder and keep only the local neural decoders. The results in Table 4 show that prediction performance decreases when the global decoder is removed (Top-1 F1 drops from 0.4409 to 0.3079 for |$K=1$|). This indicates that the global decoder is necessary for the prediction performance of NMR-RGCN.

(3) Local neural decoder + global neural decoder. We test several combinations of local and global neural decoders. The results in Table 4 show that prediction performance improves as the number of local neural decoder layers increases up to |$K=3$| and decreases slightly for |$K>3$|. Similarly, for the global neural projection, performance improves as the number of layers increases up to |$H=2$| and drops slightly at |$H=3$|. The NMR decoder with |$K=3$| and |$H=2$| achieves the best performance.

Comparisons with existing work

Based on the results in the previous subsection, we use |$\alpha =0.2$|, |$L=3$|, |$K=3$| and |$H=2$| as the experimental settings for NMR-RGCN and |$\alpha =0.2$|, |$L=1$|, |$K=3$| and |$H=2$| for NMR-GCN in the comparison experiments. The learning rate of adaptive moment estimation is 9e-4 and the regularization coefficient is 1e-8.
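To make the optimizer settings concrete, the sketch below writes out one step of adaptive moment estimation (Adam) for a single scalar parameter with the stated learning rate of 9e-4. Several things here are assumptions rather than details from the paper: the decay rates and epsilon are the usual Adam defaults, and the coefficient 1e-8 is read as an L2 penalty added to the gradient. The function name `adam_step` and the quadratic toy objective are hypothetical.

```python
import math


def adam_step(param, grad, m, v, t, lr=9e-4, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=1e-8):
    """One Adam update for a scalar parameter; returns (param, m, v).

    t is the 1-based step counter used for bias correction.
    """
    grad = grad + weight_decay * param            # L2 penalty folded into the gradient (assumption)
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias-corrected moments
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy objective (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 4):
    w, m, v = adam_step(w, 2.0 * (w - 3.0), m, v, t)
```

Because the update is normalized by the second-moment estimate, early steps move each parameter by roughly the learning rate regardless of the gradient's scale, which is why a small value such as 9e-4 gives stable, gradual training.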

To our knowledge, only three methods have been developed for predicting multiple categories of miRNA–disease associations [33–35]. Here, we compare NMR-RGCN (our method) with two of them: NLPMMDA [34] and TDRC [35]. NLPMMDA predicts multirelational miRNA–disease associations by label propagation that integrates miRNA similarity and disease similarity. NLPMMDA has two hyperparameters; following the original paper, we set |${\lambda}_m=0.2$| and |${\lambda}_d=0.2$| to obtain its best performance. TDRC uses tensor decomposition with miRNA functional similarity and disease semantic similarity as relational constraints to predict multiple types of miRNA–disease associations. TDRC has four hyperparameters; following the original paper, we set |$\lambda =0.001$|, |$\gamma =4$|, |$\alpha =0.125$| and |$\beta =0.25$| to obtain its best performance. All comparison experiments were carried out on the TDRC v3.2, TDRC v2.0 and MCD-6 datasets.

Table 5 summarizes the comparison between our proposed model and the baselines on the three datasets. All experiments are evaluated under the two experimental settings described above. We start with the results on the TDRC v2.0 and TDRC v3.2 datasets. NMR-RGCN, NMR-GCN and TDRC all improve markedly on NLPMMDA, which highlights the importance of modeling the relationships between the different types of association. We then compared NMR-RGCN with TDRC and NMR-GCN on the MCD-6 dataset. NMR-RGCN achieves relative gains over TDRC of 35.78% in Top-1 precision and about 1.5% in AUC. This suggests that capturing the nonlinear relationships between multiple types of miRNAs and diseases yields better results: TDRC is essentially a multilinear method and may not be able to capture the complex, nonlinear interactions between the features of miRNAs and diseases. Compared with NMR-GCN, NMR-RGCN improves Top-1 precision on MCD-6 by a relative 7.94%, because NMR-RGCN also takes into account information from the other relation types in the heterogeneous network when encoding node information.
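The relative Top-1 precision gains quoted above can be recomputed directly from the MCD-6 rows of Table 5. The helper name `relative_gain` is introduced here only for illustration.

```python
def relative_gain(ours, baseline):
    """Relative improvement of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0


# Top-1 precision on MCD-6, taken from Table 5.
nmr_rgcn_p, tdrc_p, nmr_gcn_p = 0.5522, 0.4067, 0.5116

gain_vs_tdrc = relative_gain(nmr_rgcn_p, tdrc_p)        # about 35.78
gain_vs_nmr_gcn = relative_gain(nmr_rgcn_p, nmr_gcn_p)  # about 7.94
```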

Case studies

To further evaluate the accuracy of our proposed model in predicting unobserved miRNA–disease-type entries, we conduct case studies on two widespread human diseases, breast cancer and lung cancer (Supplementary Tables 3 and 4). We prioritized candidate miRNA-type pairs for each disease using the model trained on the MCD-4 dataset (all known HMDD v2.0 data) and then verified the top-50 predictions against the HMDD v3.2 dataset and recent literature. We also trained the NMR-RGCN model on the MCD-6 dataset (all known HMDD v3.2 data) and then prioritized all unknown miRNA–disease-type entries by their predicted scores. Table 6 shows the top-10 predictions, seven of which are confirmed by recent literature. These results support the effectiveness of our proposed model.
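The prioritization step of the case studies amounts to masking the known triplets and ranking the rest by predicted score. A minimal sketch is given below; the function name `top_k_unknown`, the placeholder identifiers and the scores are all made up for illustration and are not model outputs.

```python
import heapq


def top_k_unknown(scores, known, k=10):
    """Return the k highest-scoring triplets that are not already known.

    scores: {(mirna, disease, category): predicted score}
    known:  set of experimentally verified triplets to exclude
    """
    candidates = ((s, t) for t, s in scores.items() if t not in known)
    return [t for s, t in heapq.nlargest(k, candidates)]


# Placeholder triplets with made-up scores.
scores = {
    ("mir-A", "disease-X", "circulation"): 0.95,
    ("mir-A", "disease-X", "tissue"): 0.90,
    ("mir-B", "disease-Y", "genetics"): 0.40,
    ("mir-C", "disease-Y", "circulation"): 0.85,
}
known = {("mir-A", "disease-X", "circulation")}
top2 = top_k_unknown(scores, known, k=2)
```

The returned candidates would then be checked by hand against a newer database release or the literature, as done for Table 6.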

Table 6

The prediction and validation of top-10 miRNA–disease-category associations

MiRNA	Disease	Category	Evidence (PMID)
hsa-mir-34c	Carcinoma, hepatocellular	Epigenetics	27165229
hsa-mir-34a	Carcinoma, hepatocellular	Epigenetics	27165229
hsa-mir-21	Multiple sclerosis	Other	Unconfirmed
hsa-mir-196a-2	Ovarian neoplasms	Genetics	30930933
hsa-mir-126	Breast neoplasms	Circulation	20801493
hsa-mir-21	Leukemia, lymphoblastic, acute	Circulation	Unconfirmed
hsa-mir-21	Alzheimer disease	Circulation	31592314
hsa-mir-146a	Osteosarcoma	Genetics	Unconfirmed
hsa-mir-155	Bladder neoplasms	Tissue	31298320
hsa-mir-150	Breast neoplasms	Circulation	31963351

Conclusion

Identifying potential multicategory miRNA–disease associations with computational approaches is important because it improves our understanding of the pathogenesis of diseases and can guide treatment. In this study, we develop NMCMDA, a novel data-driven, end-to-end learning-based method for neural multiple-category miRNA–disease association prediction, to address the MCMDA problem. NMCMDA performs well in predicting multicategory miRNA–disease associations and is significantly superior to the state-of-the-art method TDRC [35] in terms of Top-1 precision, Top-1 recall and Top-1 F1. In our understanding, the advantages of NMCMDA over TDRC [35] may lie in the following two aspects. First, NMCMDA is an end-to-end learning framework in which the encoder operates directly on the miRNA–disease heterogeneous network and uses a GNN to learn latent representations, and the decoder yields miRNA–disease association scores from the learned latent representations. All parameters of the encoders and decoders are optimized simultaneously by gradient descent. In contrast, TDRC is not an end-to-end learning algorithm, so it cannot guarantee that the latent features are directly related to the prediction objective. Second, TDRC is essentially a multilinear method and may not be able to capture the complex, nonlinear interactions between the features of miRNAs and diseases. We also provide case studies for two high-risk human diseases (breast cancer and lung cancer), as well as the prediction and validation of the top-10 miRNA–disease-category associations based on all known data of HMDD v3.2, which further support the effectiveness and feasibility of the proposed method.

There are several directions for future study. First, the structural information in the miRNA and disease similarity networks significantly affects the learned feature representations, which in turn affects the final prediction results. Second, other sources of biomedical information, such as miRNA–gene interactions and disease–gene interactions, might be relevant for modeling miRNA–disease associations, and we hope to investigate the utility of integrating them into the model. Finally, as NMCMDA is a general approach for multirelational link prediction in heterogeneous networks, it would be interesting to apply it to other domains and problems, for example, the identification of multirelational drug–drug interactions [43].

Key Points

We present a novel data-driven, end-to-end learning-based method, neural multiple-category miRNA–disease association prediction (NMCMDA), for predicting multiple-category miRNA–disease associations.
NMCMDA with the Relational Graph Convolutional Network encoder and the neural multirelational decoder (NMR-RGCN) achieves the best prediction performance.
The experimental results show that NMR-RGCN is significantly superior to the state-of-the-art method TDRC in terms of Top-1 precision, Top-1 recall and Top-1 F1.

Acknowledgements

We thank the anonymous reviewers for their valuable suggestions.

Funding

National Natural Science Foundation of China (U1802271), the Science Foundation for Distinguished Young Scholars of Yunnan Province (2019FJ011), the Fundamental Research Project of Yunnan Province (201901BB050052).

Jingru Wang is a postgraduate student at Yunnan University. Her research focuses on bioinformatics and machine learning.

Jin Li is currently a professor at the School of Software, Yunnan University. His research interests include bioinformatics and machine learning.

Kun Yue is currently a professor at the School of Information, Yunnan University. His research interests include data and knowledge engineering.

Li Wang is a postgraduate student at Yunnan University. Her research focuses on bioinformatics and graph data analysis.

Yuyun Ma is a postgraduate student at Yunnan University. Her research focuses on graph neural network.

Qing Li is an associate chief physician at the First Affiliated Hospital of Kunming Medical University. His research focuses on pathology.

References

1. Ambros V. The functions of animal microRNAs. Nature 2004;431:350–5.
2. Kapranov P, Cheng J, Dike S, et al. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 2007;316:1484–8.
3. Taft RJ, Pang KC, Mercer TR, et al. Non-coding RNAs: regulators of disease. J Pathol 2010;220:126–39.
4. Bandyopadhyay S, Mitra R, Maulik U, et al. Development of the human cancer microRNA network. Silence 2010;1:6.
5. Zhang B, Wang Q, Pan X. MicroRNAs and their regulatory roles in animals and plants. J Cell Physiol 2007;210:279–89.
6. Wang WX, Rajeev BW, Stromberg AJ, et al. The expression of microRNA miR-107 decreases early in Alzheimer’s disease and may accelerate disease progression through regulation of beta-site amyloid precursor protein-cleaving enzyme 1. Neurobiol Dis 2008;28:1213–23.
7. Chen X, Yan CC, Zhang X, et al. Long non-coding RNAs and complex disease: from experimental results to computational models. Brief Bioinform 2017;18:558–76.
8. Chen X, Sun YZ, Zhang DH, et al. NRDTD: a database for clinically or experimentally supported non-coding RNAs and drug targets associations. Database 2017;2017:bax057.
9. Chen X, Huang L. LRSSLMDA: Laplacian regularized sparse subspace learning for MiRNA-disease association prediction. PLoS Comput Biol 2017;13:e1005912.
10. Chen X, Xie D, Zhao Q, et al. MicroRNAs and complex diseases: from experimental results to computational models. Brief Bioinform 2019;20:515–39.
11. Goh KI, Cusick ME, Valle D, et al. The human disease network. Proc Natl Acad Sci USA 2007;104:8685–90.
12. Huang Z, Liu L, Gao Y, et al. Benchmark of computational methods for predicting microRNA-disease associations. Genome Biol 2019;20:202.
13. You ZH, Huang ZA, Zhu Z, et al. PBMDA: a novel and effective path-based computational model for miRNA-disease association prediction. PLoS Comput Biol 2017;13:e1005455.
14. Zhang X, Zou Q, Paton AR, et al. Meta-path methods for prioritizing candidate disease miRNAs. IEEE/ACM Trans Comput Biol Bioinform 2019;16:283–91.
15. Li G, Luo J, Xiao Q, et al. Predicting microRNA-disease associations using label propagation based on linear neighborhood similarity. J Biomed Inform 2018;82:169–77.
16. Zhang W, Li Z, Guo W, et al. A fast linear neighborhood similarity-based network link inference method to predict microRNA-disease associations. IEEE/ACM Trans Comput Biol Bioinform 2019. doi: 10.1109/TCBB.2019.2931546.
17. Chen X, Huang L, Xie D, et al. EGBMMDA: extreme gradient boosting machine for miRNA-disease association prediction. Cell Death Dis 2018;9:3.
18. Chen X, Zhu CC, Yin J, et al. Ensemble of decision tree reveals potential miRNA-disease associations. PLoS Comput Biol 2019;15:e1007209.
19. Zhao Y, Chen X, Yin J. Adaptive boosting-based computational model for predicting potential miRNA-disease associations. Bioinformatics 2019;35:4730–8.
20. Chen X, Wang L, Qu J, et al. Predicting miRNA-disease association based on inductive matrix completion. Bioinformatics 2018;34:4256–65.
21. Cheng L, Yu S, Luo J. Adaptive multi-view multi-label learning for identifying disease-associated candidate miRNAs. PLoS Comput Biol 2019;15:e1006931.
22. Peng J, Hui W, Li Q, et al. A learning-based framework for miRNA-disease association identification using neural networks. Bioinformatics 2019;35:4364–71.
23. Gong Y, Niu Y, Zhang W, et al. A network embedding-based multiple information integration method for the MiRNA-disease association prediction. BMC Bioinf 2019;20:468.
24. Li Z, Li J, Nie R, et al. A graph auto-encoder model for miRNA-disease associations prediction. Brief Bioinform 2020;bbaa240. doi: 10.1093/bib/bbaa240.
25. Li J, Zhang S, Liu T, et al. Neural inductive matrix completion with graph convolutional networks for miRNA-disease association prediction. Bioinformatics 2020;36:2538–46.
26. Li Y, Qiu C, Tu J, et al. HMDD v2.0: a database for experimentally supported human microRNA and disease associations. Nucleic Acids Res 2014;42:D1070–4.
27. Huang Z, Shi J, Gao Y, et al. HMDD v3.0: a database for experimentally supported human microRNA-disease associations. Nucleic Acids Res 2019;47:D1013–7.
28. Goh JN, Loo SY, Datta A, et al. MicroRNAs in breast cancer: regulatory roles governing the hallmarks of cancer. Biol Rev 2015;91:409–28.
29. Schrauder MG, Strick R, Schulz-Wendtland R, et al. Circulating micro-RNAs as potential blood-based markers for early stage breast cancer detection. PLoS One 2012;7:e29770.
30. Robbins ME, Dakhlallah D, Marsh CB, et al. Of mice and men: correlations between microRNA-17~92 cluster expression and promoter methylation in severe bronchopulmonary dysplasia. Am J Physiol Lung Cell Mol Physiol 2016;311:L981–4.
31. Zhang J, Liu J, Liu Y, et al. miR-101 represses lung cancer by inhibiting interaction of fibroblasts and cancer cells by down-regulating CXCL12. Biomed Pharmacother 2015;74:215–21.
32. Yan F, Shen N, Pang J, et al. Restoration of miR-101 suppresses lung tumorigenesis through inhibition of DNMT3a-dependent DNA methylation. Cell Death Dis 2014;5:1413.
33. Chen X, Yan CC, Zhang X, et al. RBMMMDA: predicting multiple types of disease-microRNA associations. Sci Rep 2015;5:13877.
34. Zhang X, Yin J, Zhang X. A semi-supervised learning algorithm for predicting four types MiRNA-disease associations by mutual information in a heterogeneous network. Genes 2018;9:139.
35. Huang F, Yue X, Xiong Z, et al. Tensor decomposition with relational constraints for predicting multiple types of microRNA-disease associations. Brief Bioinform 2020;bbaa140. doi: 10.1093/bib/bbaa140. Online ahead of print.
36. Lu M, Zhang Q, Deng M, et al. An analysis of human microRNA and disease associations. PLoS One 2008;3:e3420.
37. Wang D, Wang J, Lu M, et al. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics 2010;26:1644–50.
38. Chen M, Peng Y, Li A, et al. A novel information diffusion method based on network consistency for identifying disease related microRNAs. RSC Adv 2018;8:36675–90.