Abstract

Motivation

There is growing evidence that the dysregulation of microRNAs (miRNAs) causes diseases through various underlying mechanisms. Thus, predicting the multiple-category associations between miRNAs and diseases plays an important role in investigating the roles of miRNAs in diseases. Moreover, in contrast with traditional biological experiments, which are time-consuming and expensive, computational approaches for predicting multicategory miRNA–disease associations are time-saving and cost-effective, and are therefore highly desirable.

Results

We present a novel data-driven, end-to-end learning-based method, neural multiple-category miRNA–disease association prediction (NMCMDA), for predicting multiple-category miRNA–disease associations. NMCMDA has two main components: (i) an encoder that operates directly on the miRNA–disease heterogeneous network and leverages a Graph Neural Network to learn latent representations of miRNAs and diseases, and (ii) a decoder that yields miRNA–disease association scores with the learned latent representations as input. Various kinds of encoders and decoders are proposed for NMCMDA. Among them, NMCMDA with the Relational Graph Convolutional Network encoder and the neural multirelational decoder (NMR-RGCN) achieves the best prediction performance. We compared NMCMDA with other baselines on three experimental datasets. The experimental results show that NMR-RGCN is significantly superior to the state-of-the-art method TDRC in terms of Top-1 precision, Top-1 recall and Top-1 F1. Additionally, case studies are provided for two high-risk human diseases (breast cancer and lung cancer), and we also provide the prediction and validation of the top-10 miRNA–disease-category associations based on all known data of HMDD v3.2, which further validate the effectiveness and feasibility of the proposed method.

Introduction

MiRNAs are a class of small endogenous noncoding RNAs (21–24 nucleotides in length) that play vital roles in multiple biological processes [1–3]. The dysfunction of miRNAs and their target mRNAs may result in various human diseases [4]. For instance, the expression level of let-7 in lung cancer was markedly reduced, confirming that miRNAs are closely related to the occurrence of tumors [5]; abnormal expression of mir-107 may affect the activity of BACE1 (β-secretase 1) and cause Alzheimer disease [6]. Therefore, the identification of disease-related miRNAs can contribute to the pathological study of diseases and to disease biomarker detection [7–11]. As identifying the associations between miRNAs and diseases through biological experiments is time-consuming and expensive [12], many computational methods [13–25] have been developed in the last few years to determine potential miRNA–disease associations (hereafter abbreviated as MDA).

Most existing research works, such as [13–25], focused on binary association prediction (i.e. only predicting whether a miRNA–disease association exists) and indeed achieved good results. However, more and more evidence shows that the dysregulation of miRNAs causes diseases through various kinds of underlying mechanisms [26–29]. For example, epigenetic alterations may lead to abnormal expression of miRNAs and thereby cause diseases: promoter methylation reduces the expression levels of the mir-17 ~ 92 cluster, which results in bronchopulmonary dysplasia [30]. The interactions between miRNAs and their targets are also related to many diseases: mir-101 inhibits the interaction between fibroblasts and cancer cells by targeting CXCL12, which affects the proliferation of lung cancer cells [31]. Furthermore, the same miRNA may be associated with the same disease through different association mechanisms. For instance, mir-101 affects lung cancer by targeting CXCL12, as mentioned above; meanwhile, mir-101 also suppresses lung cancer by inhibiting DNA methylation in lung cancer cells [32]. The specific category of a miRNA–disease association cannot be identified with the aforementioned binary methods. Thus, identifying multiple categories of miRNA–disease associations can not only provide more detailed potential associations between miRNAs and diseases but also deepen our understanding of the molecular basis of diseases at the level of miRNAs.

In the past years, only a few research works have been devoted to identifying multiple categories of miRNA–disease associations (MCMDA). Chen et al. [33] were the first to study the problem of MCMDA. In their study, the Restricted Boltzmann machine for multiple types of miRNA–disease association prediction (RBMMMDA) was developed to predict four different types of miRNA–disease associations. With this model, not only new miRNA–disease associations but also the corresponding association types could be obtained. Zhang et al. [34] proposed a semisupervised model, the network-based label propagation algorithm (NLPMMDA), to infer multiple types of miRNA–disease associations using mutual information derived from the heterogeneous network. Note that both of these works developed their methods based on the Human MicroRNA Disease Database 2.0 [26] (hereafter abbreviated as HMDD v2.0), released in 2013.

The recently released Human MicroRNA Disease Database 3.0 [27] (the latest version is HMDD v3.2) provides six generalized categories of associations, namely miRNA–disease associations supported by evidence from genetics, epigenetics, circulation miRNAs, tissue, miRNA-target and other interactions, respectively. These associations cover 20 different categories of detailed association evidence codes. HMDD v3.2 significantly extends HMDD v2.0 in the numbers of miRNAs, diseases and miRNA–disease association entries, and provides more information on association categories. This provides an opportunity for developing new data-driven MCMDA methods.

Recently, targeting the HMDD v3.2 dataset, Huang et al. [35] represented the multicategory miRNA–disease associations as a tensor and formulated multicategory miRNA–disease association prediction as a tensor completion task. They proposed a novel tensor decomposition-based model, named tensor decomposition with relational constraints (TDRC), to solve MCMDA, which incorporates miRNA–miRNA similarity and disease–disease similarity as decomposition constraints. Experimental results showed that TDRC significantly outperformed the other baselines [33–35].

Despite the effectiveness of TDRC and other methods for MCMDA, current approaches still have some limitations. First, miRNA–disease association prediction relies on the hypothesis that functionally similar miRNAs are often associated with phenotypically similar diseases, and vice versa. Therefore, similarity information for both miRNAs and diseases is crucial for accurate prediction. However, RBMMMDA [33] made use only of the known multiple categories of miRNA–disease associations and did not consider similarity information between miRNAs or between diseases, which restricts its prediction performance. Also, although TDRC [35] takes miRNA–miRNA similarity and disease–disease similarity as decomposition constraints, the technical framework of tensor decomposition cannot easily use miRNA and disease feature information from other sources.

Second, the NLPMMDA [34] treats predicting one category of miRNA–disease association as an independent task and therefore ignores the correlations between association categories. However, different association categories may be highly associated with each other. TDRC [35] uses tensor decomposition to capture the complicated multilinear relationship between miRNAs, diseases and association categories through the tensor multiplications to overcome the aforementioned limitations. However, TDRC is essentially a multilinear method and may not be enough to capture the complex and nonlinear interactions between the features of miRNAs and diseases.

To overcome the aforementioned limitations of current approaches for MCMDA, we present a novel data-driven, end-to-end learning-based method, neural multiple-category miRNA–disease association prediction (NMCMDA). NMCMDA has two main components: (i) an encoder that operates directly on the miRNA–disease heterogeneous network and leverages a Graph Neural Network to learn latent representations of miRNAs and diseases, and (ii) a decoder that yields miRNA–disease association scores with the learned latent representations as input. Various kinds of encoders and decoders are proposed in this study. Among them, NMCMDA with the multiple-relational Graph Convolutional Network encoder and the neural multiple-relational decoder (NMR-RGCN) achieves the best prediction performance. We compared the proposed method with other baselines on three experimental datasets. The experimental results show that NMR-RGCN is significantly superior to the state-of-the-art method TDRC in terms of Top-1 precision, Top-1 recall and Top-1 F1. Additionally, case studies are provided for two high-risk human diseases (breast cancer and lung cancer), and we also provide the prediction and validation of the top-10 miRNA–disease-category associations based on all known data of HMDD v3.2, which further validate the effectiveness and feasibility of the proposed method.

Materials

The Human MiRNA Disease Database (HMDD) [26, 27] is a database that contains experimentally verified human miRNA–disease associations. Since the first version was released in 2007, HMDD has served as an important data source for the research of miRNA–disease associations. HMDD provides multiple types of evidence for associations between miRNAs and diseases which play important roles in understanding the mechanisms underlying the dysregulations of miRNAs causing diseases. These association data provide an important data basis for the research of data-driven MDA. Besides, other side information, such as miRNA–miRNA similarity and disease–disease similarity, can be used to improve MDA prediction performance.

In what follows, we first introduce the details about multiple categories of associations in HMDD. Then, we discuss the methods of constructing the miRNA–miRNA similarity and disease–disease similarity from the corresponding data sources.

Multiple categories of MiRNA–disease associations

The recently released HMDD v3.2 provides six generalized categories of associations (Genetics, Epigenetics, Target, Circulation, Tissue and Other) covering 20 types of detailed evidence codes. After preprocessing and removing duplications, we finally obtain the following datasets:

  • MCD-6 is obtained from HMDD v3.2 and contains the six categories mentioned above, comprising 25 849 associations between 894 diseases and 1208 miRNAs.

  • MCD-20 is also obtained from HMDD v3.2 and contains the 20 detailed evidence codes for associations between 894 diseases and 1208 miRNAs.

In addition, the following datasets are adopted as experimental datasets for comparing performance with our proposed methods.

  • MCD-4 is obtained from HMDD v2.0 [26] released in 2013. MCD-4 contains four categories (Genetics, Epigenetics, Circulation, and Target) of associations between 324 miRNAs and 169 diseases.

  • TDRC v2.0 and TDRC v3.2 are the datasets released with the published paper [35]. TDRC v2.0 contains four-category (Genetics, Epigenetics, Circulation and Target) associations between 169 diseases and 324 miRNAs. TDRC v3.2 contains five-category (Genetics, Epigenetics, Circulation, Target, and Tissue) associations between 447 diseases and 713 miRNAs.

The statistics of these datasets are shown in Table 1.

Table 1

Statistics of the datasets used in this study

Data        #d     #m     #c    #m–d     Sr
MCD-6       894    1208   6     25849    0.399%
MCD-20      894    1208   20    25849    0.120%
MCD-4       171    385    4     2003     0.761%
TDRC v2.0   169    324    4     1675     0.681%
TDRC v3.2   447    713    5     16341    1.025%

#d, disease number; #m, miRNA number; #c, category number; #m–d, association number; Sr, sparsity rate.

We observe that all miRNA–disease associations in HMDD can be divided into different categories. Supplementary Table 1 gives the statistics of miRNA–disease associations of different categories in MCD-6, MCD-4 and TDRC v3.2, respectively. On the other hand, some miRNA–disease associations may occur simultaneously in different categories. Supplementary Table 2 gives the statistics of miRNA–disease associations appearing simultaneously in different categories. We can see that although many miRNA–disease associations belong to only one category, some belong to two or more categories. This further increases the challenge of accurate prediction.

MiRNA–miRNA similarity

In this study, miRNA similarity is measured using the miRNA functional similarity score and the Gaussian interaction profile kernel similarity score. The miRNA functional similarity between miRNAs |$i$| and |$j$| is defined as follows:
(1) |$\mathrm{MS}(i,j)=\begin{cases}\mathrm{FS}(i,j), & \text{if } \mathrm{FS}(i,j) \text{ is available in MISIM}\\ \mathrm{mGS}(i,j), & \text{otherwise}\end{cases}$|
where |$\mathrm{FS}(i,j)$| is the functional similarity score downloaded from the MISIM 2.0 database (http://www.lirmed.com/misim/) and |$\mathrm{mGS}(i,j)$| is the Gaussian interaction profile kernel similarity score [6, 13, 36, 37], which is used to supplement the missing entries in MISIM. Specifically, |$\mathrm{mGS}(i,j)$| is calculated as follows:
(2) |$\mathrm{mGS}(i,j)=\exp \left(-{\theta}_{\mathrm{m}}{\Vert \mathbf{T}[i,:]-\mathbf{T}[j,:]\Vert}^2\right)$|
where |$\mathbf{T}[i,:]$| represents the |$i$|-th row of the adjacency matrix |$\mathbf{T}$| and |${\theta}_{\mathrm{m}}$| is the kernel bandwidth parameter, which is calculated by the following formula:
(3) |${\theta}_{\mathrm{m}}=1/\left(\frac{1}{m}{\sum}_{i=1}^m{\Vert \mathbf{T}[i,:]\Vert}^2\right)$|
where |$m$| is the number of miRNAs, i.e. the row number of |$\mathbf{T}$|.

With |$\mathrm{MS}(i,j)$|, the miRNA functional similarity matrix is denoted by |${\mathbf{A}}_{\mathrm{m}}\in{\mathbb{R}}^{m\times m}$| and constructed by |${[{\mathbf{A}}_{\mathrm{m}}]}_{ij}=\mathrm{MS}(i,j)$|.
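
As an illustration of Equations (1)–(3), the following NumPy sketch computes the Gaussian interaction profile kernel from a toy association matrix and uses it to fill the entries missing from a functional similarity matrix. The function names `gip_kernel` and `combine_similarity`, the toy matrices, and the use of NaN to mark entries absent from MISIM are assumptions made for illustration only; they are not part of the original pipeline.

```python
import numpy as np

def gip_kernel(profiles: np.ndarray) -> np.ndarray:
    """Gaussian interaction profile kernel between the rows of `profiles`."""
    sq_norms = np.sum(profiles ** 2, axis=1)
    mean_sq = sq_norms.mean()
    if mean_sq == 0:
        raise ValueError("all interaction profiles are empty")
    theta = 1.0 / mean_sq  # bandwidth: inverse of the mean squared profile norm
    # Squared Euclidean distance between every pair of rows.
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return np.exp(-theta * sq_dist)

def combine_similarity(fs: np.ndarray, gs: np.ndarray) -> np.ndarray:
    """Keep the curated score where it exists (non-NaN), else use the kernel score."""
    return np.where(np.isnan(fs), gs, fs)

# Toy miRNA x disease association matrix (3 miRNAs, 3 diseases).
T = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)
# Toy functional similarity; NaN marks pairs assumed to be missing from MISIM.
FS = np.array([[1.0, 0.6, np.nan],
               [0.6, 1.0, np.nan],
               [np.nan, np.nan, 1.0]])

mGS = gip_kernel(T)
MS = combine_similarity(FS, mGS)
print(np.round(MS, 3))
```

The same `gip_kernel` function can be applied to the transposed matrix `T.T` to obtain the disease-side kernel, since diseases correspond to the columns of the association matrix.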

Disease–disease similarity

The disease–disease semantic similarity scores were calculated based on the disease hierarchical directed acyclic graph (DAG) from the MeSH database (https://www.nlm.nih.gov/mesh/). First, let |$i$| be a disease. |$\mathrm{dag}(i)$| indicates the node set, including node |$i$| and its ancestor nodes in the disease DAG. Then, the first semantic contribution of a disease |$t$| to the disease |$i$| is denoted by |${\mathrm{SC}}_1(i,t)$| and can be formulated using the following equation [38],
(4) |${\mathrm{SC}}_1(i,t)=\begin{cases}1, & t=i\\ \max \{\gamma \cdot{\mathrm{SC}}_1(i,{t}^{\prime})\mid{t}^{\prime}\in \text{children of } t\}, & t\ne i\end{cases}$|
where |$\gamma$| is a semantic contribution decay factor: as the distance between disease |$i$| and its ancestor disease |$t$| increases, the contribution of |$t$| to the semantic value of disease |$i$| progressively decreases. |$\gamma$| was set to 0.5 according to previous literature [37].
Based on the definition of semantic contribution in Equation (4), the first semantic similarity score between different diseases, denoted by |${\mathrm{dS}}_1$|, was established. Let |$i,j$| be two different diseases. |${\mathrm{dS}}_1(i,j)$| is defined as follows:
(5) |${\mathrm{dS}}_1(i,j)=\frac{{\sum}_{t\in \mathrm{dag}(i)\cap \mathrm{dag}(j)}\left({\mathrm{SC}}_1(i,t)+{\mathrm{SC}}_1(j,t)\right)}{{\sum}_{t\in \mathrm{dag}(i)}{\mathrm{SC}}_1(i,t)+{\sum}_{t\in \mathrm{dag}(j)}{\mathrm{SC}}_1(j,t)}$|

Intuitively, |${\mathrm{dS}}_1(i,j)$| is higher when |$i$| and |$j$| share a larger part of their DAGs.
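
To make the DAG-based similarity concrete, the sketch below computes the first semantic contribution and the resulting similarity on a small hand-made hierarchy. It assumes the commonly used formulation in which a disease contributes 1 to itself, each step towards an ancestor multiplies the contribution by the decay factor, and the largest value over all paths is kept. The hierarchy, the disease labels and the function names are hypothetical and do not reproduce the actual MeSH tree.

```python
from typing import Dict, List

GAMMA = 0.5  # semantic contribution decay factor stated in the text

def semantic_contributions(disease: str, parents: Dict[str, List[str]]) -> Dict[str, float]:
    """Contribution of every node in dag(disease) to `disease`."""
    contrib = {disease: 1.0}
    frontier = [disease]
    while frontier:
        node = frontier.pop()
        for parent in parents.get(node, []):
            value = GAMMA * contrib[node]
            # Keep the largest contribution over all paths to this ancestor.
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                frontier.append(parent)
    return contrib

def semantic_similarity(a: str, b: str, parents: Dict[str, List[str]]) -> float:
    """Shared contributions divided by the total semantic values of both diseases."""
    sc_a = semantic_contributions(a, parents)
    sc_b = semantic_contributions(b, parents)
    shared = set(sc_a) & set(sc_b)
    numerator = sum(sc_a[t] + sc_b[t] for t in shared)
    denominator = sum(sc_a.values()) + sum(sc_b.values())
    return numerator / denominator

# Hypothetical toy hierarchy: child -> list of parents.
PARENTS = {
    "disease A": ["group X", "group Y"],
    "disease B": ["group X"],
    "group X": ["root 1"],
    "group Y": ["root 2"],
}

print(semantic_contributions("disease A", PARENTS))
print(semantic_similarity("disease A", "disease B", PARENTS))
```

In this toy hierarchy the two diseases share `group X` and `root 1`, so their similarity is positive but well below 1, while the similarity of a disease with itself is exactly 1.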

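However, |${\mathrm{dS}}_1$| ignores the significance of different disease contributions. Suppose that |$i,t,q\in \mathrm{D}$|, where |$\mathrm{D}$| is the set of diseases. If disease |$t$| appears only in |$\mathrm{dag}(i)$|, whereas |$q$| appears in both |$\mathrm{dag}(i)$| and the DAGs of other diseases, then |$t$| might have a higher semantic contribution to |$i$| than |$q$|. Thus, the second semantic contribution score |${\mathrm{SC}}_2(i,t)$| was presented as follows:
(6) |${\mathrm{SC}}_2(i,t)=-\log \frac{|\{k\in \mathrm{D}:t\in \mathrm{dag}(k)\}|}{|\mathrm{D}|}$|
Based on |${\mathrm{SC}}_2(i,t)$|, the second semantic similarity score |${\mathrm{dS}}_2$| between two diseases was presented as follows [38]:
(7) |${\mathrm{dS}}_2(i,j)=\frac{{\sum}_{t\in \mathrm{dag}(i)\cap \mathrm{dag}(j)}\left({\mathrm{SC}}_2(i,t)+{\mathrm{SC}}_2(j,t)\right)}{{\sum}_{t\in \mathrm{dag}(i)}{\mathrm{SC}}_2(i,t)+{\sum}_{t\in \mathrm{dag}(j)}{\mathrm{SC}}_2(j,t)}$|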

As the disease similarity measures |${\mathrm{dS}}_1$| and |${\mathrm{dS}}_2$| are both derived from the MeSH database, they provide only a part of the entries in the disease semantic similarity matrix. Hence, the Gaussian interaction profile kernel similarity was adopted to complement the remaining disease similarity entries.

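Let |$\mathbf{T}\in{\{0,1\}}^{m\times n}$| be the adjacency matrix constructed using the known HMDD v2.0 miRNA–disease association data. |$\mathbf{T}[:,j]$| is the |$j$|-th column binary vector, representing disease |$j$|. Then, the Gaussian interaction profile kernel similarity between disease |$i$| and disease |$j$| is defined as:
(8) |$\mathrm{dGS}(i,j)=\exp \left(-{\theta}_{\mathrm{d}}{\Vert \mathbf{T}[:,i]-\mathbf{T}[:,j]\Vert}^2\right)$|
where |${\theta}_{\mathrm{d}}$| is the kernel bandwidth parameter calculated using the following formula:
(9) |${\theta}_{\mathrm{d}}=1/\left(\frac{1}{n}{\sum}_{j=1}^n{\Vert \mathbf{T}[:,j]\Vert}^2\right)$|
where |$n$| is the number of diseases, i.e. the column number of |$\mathbf{T}$|.
With |${\mathrm{dS}}_1$|, |${\mathrm{dS}}_2$| and |$\mathrm{dGS}$|, the disease semantic similarity matrix is denoted by |${\mathbf{A}}_{\mathrm{d}}\in{\mathbb{R}}^{n\times n}$| and constructed using
(10) |${[{\mathbf{A}}_{\mathrm{d}}]}_{ij}=\begin{cases}\frac{{\mathrm{dS}}_1(i,j)+{\mathrm{dS}}_2(i,j)}{2}, & \text{if } i \text{ and } j \text{ have semantic similarity}\\ \mathrm{dGS}(i,j), & \text{otherwise}\end{cases}$|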

Methods

In this section, we first formulate the multicategory miRNA–disease association prediction as a tensor completion problem. Then, an end-to-end learning-based prediction model is proposed to solve the problem.

Problem formulation

Multicategory associations can be organized into a binary three-way tensor |$\mathbf{\mathcal{T}}\in{\{0,1\}}^{|M|\times |D|\times |C|}$|, where |$|M|=m$|, |$|D|=n$| and |$|C|=R$| are the sizes of the sets of miRNAs, diseases and association categories, respectively. The |$r$|-th slice of |$\mathbf{\mathcal{T}}$| (|$r\in \{1,2,\dots, R\}$|) is an adjacency matrix |${\mathbf{T}}^{(r)}\in{\{0,1\}}^{|M|\times |D|}$| whose 0–1 entries record the known miRNA–disease associations of category |$r$|: |${\mathbf{T}}^{(r)}(i,j)=1$| if miRNA |$i\in M$| is associated with disease |$j\in D$| in evidence category |$r$|, and |${\mathbf{T}}^{(r)}(i,j)=0$| if the association between miRNA |$i$| and disease |$j$| is unknown or unobserved.
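
The tensor organization described above can be sketched in a few lines of NumPy. The triples, the 0-based indexing and the function name `build_association_tensor` are hypothetical; the sparsity value is computed as the fraction of nonzero entries, which is our reading of the sparsity rate reported in Table 1 rather than a definition given in the text.

```python
import numpy as np

def build_association_tensor(triples, n_mirnas, n_diseases, n_categories):
    """Binary miRNA x disease x category tensor from (miRNA, disease, category) triples."""
    tensor = np.zeros((n_mirnas, n_diseases, n_categories), dtype=np.int8)
    for i, j, r in triples:
        tensor[i, j, r] = 1
    return tensor

# Hypothetical toy data: (miRNA index, disease index, category index), all 0-based.
triples = [(0, 0, 0), (0, 0, 2), (1, 2, 1), (2, 1, 0)]
T = build_association_tensor(triples, n_mirnas=3, n_diseases=3, n_categories=3)

slice_0 = T[:, :, 0]                  # adjacency matrix of the first category
sparsity = T.sum() / T.size           # fraction of known associations (assumed meaning of Sr)
# Number of miRNA-disease pairs that appear in more than one category.
multi_category_pairs = int((T.sum(axis=2) > 1).sum())

print(slice_0)
print(f"sparsity = {sparsity:.3%}, pairs in several categories = {multi_category_pairs}")
```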

A problem of multi-category miRNA–disease association prediction (hereafter MCMDA) can be considered with |$m$| miRNAs and |$n$| diseases, and the partially observed |$m\times n\times R$| three-way association tensor|$\mathbf{\mathcal{T}}\in{\{0,1\}}^{m\times n\times R}$| where each entry |$\mathbf{\mathcal{T}}(i,j,r)=1$| represents miRNA |$i$| is associated with disease |$j$| in the |$r$|-category evidence. |$\mathbf{\mathcal{T}}(i,j,r)=0$| if the association between a miRNA |$i$| and a disease |$j$| is unknown or unobserved in the |$r$|-category evidence. Then, the objective of MCMDA is to find an approximation tensor |$\hat{\mathbf{\mathcal{T}}}\in{\mathbb{R}}^{m\times n\times R}$| such that:
(11) |$\underset{\hat{\mathbf{\mathcal{T}}}}{\min }{\Vert{P}_{\Omega}(\hat{\mathbf{\mathcal{T}}}-\mathbf{\mathcal{T}})\Vert}_{\mathrm{F}}^2$|
where |${\Vert \mathbf{\mathcal{X}}\Vert}_{\mathrm{F}}^2$| is the squared tensor Frobenius norm, defined as |${\Vert \mathbf{\mathcal{X}}\Vert}_{\mathrm{F}}^2={\sum}_{i=1}^m{\sum}_{j=1}^n{\sum}_{r=1}^R{(\mathbf{\mathcal{X}}(i,j,r))}^2$|. |$\Omega$| is an index set denoting the indices of observations. |${P}_{\Omega}(\mathbf{\mathcal{X}})$| is the projection of |$\mathbf{\mathcal{X}}$| onto the set |$\Omega$|, i.e. it keeps the entries indexed by |$\Omega$| and sets all other entries to zero. The intuitive explanation of Equation (11) is that, for MCMDA, we expect to find a tensor |$\hat{\mathbf{\mathcal{T}}}$| that is subject to the equality constraints given by the observations (i.e. the experimentally validated associations).
We note that the objective of Equation (11) only considers the experimentally validated associations. From Table 1, we know that the sparsity rate of |$\mathbf{\mathcal{T}}$| is very low, which means that only a small fraction of entries correspond to experimentally validated miRNA–disease associations, and most entries in |$\mathbf{\mathcal{T}}$| are unknown or unobserved. It should be pointed out that ‘unknown’ does not mean ‘no interaction’. ‘Unknown’ covers two situations: ‘no interaction’ and ‘interaction that is not yet known’. Neglecting these entries in the optimization objective may yield degenerate results. To remedy this deficiency, the objective function should strike a balance between the observed and the unknown (or unobserved) entries. Additionally, other side information about diseases and miRNAs can be exploited to further enhance the prediction performance. Moreover, a parametrized model should be established to yield |$\hat{\mathbf{\mathcal{T}}}$|. In summary, the MCMDA can be more accurately defined as follows:
(12)
where the parameter |$\alpha \in (0,1)$| is a weighting term that balances the observed and unobserved entries, |$\overline{\Omega}$| denotes a subset of the unknown or unobserved entries in |$\mathbf{\mathcal{T}}$|, and |$\mathbf{\mathcal{W}}$| denotes the trainable parameters for yielding |$\hat{\mathbf{\mathcal{T}}}$|. Figure 1 shows a general flowchart of MCMDA.
Figure 1

The general flowchart of MCMDA, which is mainly composed of two steps: (Step 1) the feature representation learning module learns miRNA and disease latent representations with the similarity information and the known associations as input; (Step 2) the prediction module yields multicategory association scores. Overall, MCMDA is formulated as a tensor completion problem with side information.
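
The role of the weight |$\alpha$| in Equation (12) can be illustrated with a small sketch of a weighted squared-error objective. The sketch assumes that the observed entries are weighted by |$1-\alpha$| and the unobserved entries by |$\alpha$|, and that all unobserved entries are used rather than a sampled subset; both choices, as well as the function name `weighted_completion_loss`, are assumptions and may differ from the objective actually optimized.

```python
import numpy as np

def weighted_completion_loss(target: np.ndarray, scores: np.ndarray, alpha: float) -> float:
    """Squared error split into observed (target == 1) and unobserved (target == 0) parts."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    observed = target == 1
    sq_err = (scores - target) ** 2
    loss_observed = sq_err[observed].sum()
    loss_unobserved = sq_err[~observed].sum()
    # Assumed weighting: (1 - alpha) on observed entries, alpha on unobserved ones.
    return float((1.0 - alpha) * loss_observed + alpha * loss_unobserved)

# Toy 1 x 2 x 2 tensor with a single known association.
target = np.array([[[1.0, 0.0],
                    [0.0, 0.0]]])
scores = np.array([[[0.5, 0.5],
                    [0.0, 1.0]]])
loss = weighted_completion_loss(target, scores, alpha=0.2)
print(loss)
```

With a small |$\alpha$|, errors on the few known associations dominate the loss, while the many unknown entries still exert a mild pull towards zero.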

When we solve the MCMDA defined by Equation (12), some questions must be answered. First, what are the appropriate feature representations for miRNAs and diseases? How can the feature matrices |$\mathbf{X}$| and |$\mathbf{Y}$| be obtained by exploiting the miRNA–miRNA similarity, disease–disease similarity, and the known different categories of miRNA–disease associations? Second, how to capture the dependency relationship between the latent feature representation and different association categories and establish the well-designed parameterized model to yield prediction score tensor|$\hat{\mathbf{\mathcal{T}}}$|? Finally, can we integrate the above two modules effectively to build an end-to-end learning pipeline from miRNA and disease feature representations learning to multiple-association prediction?

To answer these questions and solve the problem of MCMDA, in this paper we develop an end-to-end learning-based prediction model that operates directly on a miRNA–disease multirelational heterogeneous graph. Our proposed model has two main components: (i) an encoder that leverages a graph neural network on the miRNA–disease multirelational heterogeneous graph to learn latent representations of miRNAs and diseases, and (ii) a decoder that yields miRNA–disease association scores with the learned latent representations as input. In the following, we discuss the details of the two components.

Encoders for latent feature of MiRNA and disease

In this subsection, two kinds of encoders are proposed to obtain latent features of miRNAs and diseases. The first is GCN-encoder which maps the nodes of miRNAs and diseases to the latent features by using graph convolutional network on the miRNA–miRNA similarity network and disease–disease similarity network, respectively. The second is RGCN-encoder which yields latent features by both exploiting similarity networks and the known different categories of miRNA–disease associations.

GCN-encoder

Let us start by discussing the details of the GCN-encoder. The basic assumption for miRNA–disease association prediction is that functionally similar miRNAs are more likely to be associated with phenotypically similar diseases, and vice versa. Therefore, similarity information among miRNAs and among diseases is crucial for MDA. Here, the miRNA and disease latent feature representations are obtained by applying a Graph Convolutional Network (GCN) [39] to the miRNA (or disease) similarity network.

Let |${G}_m$| and |${G}_d$| be the miRNA functional similarity network and the disease semantic similarity network, respectively. |${\mathbf{A}}_{\mathrm{m}}$| denotes the adjacency matrix of |${G}_m$| and |${\mathbf{A}}_d$| that of |${G}_d$|. |$|{V}_m|=m$| and |$|{V}_d|=n$| denote the sizes of the node sets |${V}_m$| of |${G}_m$| and |${V}_d$| of |${G}_d$|, respectively. Let |$\mathbf{X}$| and |$\mathbf{Y}$| be the initial features on the node sets of the graphs |${G}_m$| and |${G}_d$|, respectively. A GCN learns the feature of a node |$i$| by hierarchically aggregating feature information from the neighborhood of |$i$|. Next, we introduce the method of learning features for miRNAs over |${G}_m$|. Features for diseases over |${G}_d$| are learned in the same way.

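Specifically, a one-layer feature aggregating operator for a node |$i$| in a GCN is defined as:
(13) |${\mathbf{x}}_i^{(l+1)}=\sigma \left({\sum}_{j=1}^m\frac{{[{\tilde{\mathbf{A}}}_m]}_{ij}}{c_{i,j}}\boldsymbol{\Theta}{\mathbf{x}}_j^{(l)}\right)$|
where |${\mathbf{x}}_i^{(l)}\in{\mathbb{R}}^{f_l}$| is the hidden representation of the miRNA node |$i$| in the |$l$|-th layer. |$\tilde{\mathbf{A}}_m={\mathbf{A}}_m+{\mathbf{I}}_m$| denotes the adjacency matrix of |${G}_m$| with self-loops, where |${\mathbf{I}}_m$| is the identity matrix. |$\tilde{\mathbf{D}}_m$| denotes the diagonal matrix with |${[\tilde{\mathbf{D}}_m]}_{ii}={\sum}_j{[{\tilde{\mathbf{A}}}_m]}_{ij}$|. |$\boldsymbol{\Theta} \in{\mathbb{R}}^{f_{l+1}\times{f}_l}$| is the filter parameter matrix with |${f}_l$| input channels and |${f}_{l+1}$| filters. |$\sigma (\cdot )$| is a nonlinear activation function, such as ReLU. |${c}_{i,j}=\sqrt{\tilde{\mathbf{D}}_m(i,i)\tilde{\mathbf{D}}_m(j,j)}$| is a normalization constant. The equivalent matrix form can be written as follows:
(14) |${\mathbf{X}}^{(l+1)}=\sigma \left(\tilde{\mathbf{D}}_m^{-1/2}{\tilde{\mathbf{A}}}_m\tilde{\mathbf{D}}_m^{-1/2}{\mathbf{X}}^{(l)}{\boldsymbol{\Theta}}^{\top}\right)$|
where the |$i$|-th row of |${\mathbf{X}}^{(l)}$| is |${\mathbf{x}}_i^{(l)}$|. We denote |$\tilde{\mathbf{D}}_m^{-1/2}{\tilde{\mathbf{A}}}_m\tilde{\mathbf{D}}_m^{-1/2}$| by |$\tilde{\mathbf{L}}_m$| to simplify notation. Then, multiple feature aggregation layers can be stacked into an |$L$|-layer GCN, denoted as follows:
(15) |${\mathbf{X}}^{(L)}={\mathrm{GCN}}_m^{(L)}(\mathbf{X};{\boldsymbol{\Theta}}_m)=\sigma \left(\tilde{\mathbf{L}}_m\cdots \sigma \left(\tilde{\mathbf{L}}_m\mathbf{X}{{\boldsymbol{\Theta}}_m^{(1)}}^{\top}\right)\cdots{{\boldsymbol{\Theta}}_m^{(L)}}^{\top}\right)$|
where |$\mathbf{X}$| is a randomly initialized feature matrix for all miRNAs and |${\boldsymbol{\Theta}}_m=\{{\boldsymbol{\Theta}}_m^{(1)},\dots, {\boldsymbol{\Theta}}_m^{(L)}\}$| is the set of all trainable parameters. Similarly, for diseases we have
(16) |${\mathbf{Y}}^{(L)}={\mathrm{GCN}}_d^{(L)}(\mathbf{Y};{\boldsymbol{\Theta}}_d)=\sigma \left(\tilde{\mathbf{L}}_d\cdots \sigma \left(\tilde{\mathbf{L}}_d\mathbf{Y}{{\boldsymbol{\Theta}}_d^{(1)}}^{\top}\right)\cdots{{\boldsymbol{\Theta}}_d^{(L)}}^{\top}\right)$|
where |$\tilde{\mathbf{L}}_d$| and |${\boldsymbol{\Theta}}_d$| are defined over |${G}_d$| in the same way.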

Thus, considering a miRNA functional similarity network and a disease semantic similarity network, starting from the randomly initialized embedding |$\mathbf{X}$| and |$\mathbf{Y}$|⁠, GCN transforms the features in a layer-by-layer manner and finally outputs |${\mathbf{X}}^{(L)}$| and |${\mathbf{Y}}^{(L)}$|⁠. These learned features will be used as the input for the downstream multicategory association prediction model.
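
The layer-by-layer transformation of the GCN-encoder can be sketched as a plain NumPy forward pass. This is a minimal sketch under stated assumptions: ReLU is used as the activation in every layer, node features are stored as rows, each weight matrix has shape (output channels, input channels), and the toy similarity matrix, layer sizes and function names are hypothetical. It shows only the forward computation, not training.

```python
import numpy as np

def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} of a similarity matrix."""
    a_tilde = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_encoder(adj: np.ndarray, features: np.ndarray, weights) -> np.ndarray:
    """Stack of GCN layers; `weights` is a list of (f_out, f_in) matrices."""
    l_tilde = normalize_adjacency(adj)
    hidden = features
    for theta in weights:
        # Aggregate neighbour features, transform them, then apply ReLU.
        hidden = np.maximum(l_tilde @ hidden @ theta.T, 0.0)
    return hidden

rng = np.random.default_rng(0)
# Toy miRNA similarity network with 3 nodes.
A_m = np.array([[0.0, 0.8, 0.1],
                [0.8, 0.0, 0.3],
                [0.1, 0.3, 0.0]])
X = rng.normal(size=(3, 4))                        # randomly initialized features
weights = [rng.normal(size=(8, 4)), rng.normal(size=(2, 8))]
X_L = gcn_encoder(A_m, X, weights)
print(X_L.shape)
```

The disease-side encoder follows by passing the disease similarity matrix and a disease feature matrix to the same function with its own list of weights.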

Note that GCN-encoder neglects the known multiple categories of associations between miRNA and disease when yielding the latent feature. Naturally, the quality of latent features may be further improved by considering these associations. In fact, miRNA, disease similarity networks, and the known miRNA–disease associations together form heterogeneous multiple relational networks. Hence, we can exploit the Relational Graph Convolutional Network [40] to yield the latent feature representations.

RGCN-encoder

RGCN-encoder is also a graph convolutional network-based encoder which yields latent features by exploiting both (miRNA, disease) similarity networks and the known different categories of miRNA–disease associations. Specifically, one-layer feature aggregating operator for a miRNA |$i$| by an RGCN-encoder is defined as:
(17)
where |${\mathbf{x}}_i^{(l)}\in{\mathbb{R}}^{f_l}$| is the current-layer hidden representation of the miRNA |$i$|, |${N}_i^r$| is the set of |$r$|-category neighbors of node |$i$|, |${c}_{i,r}=|{N}_i^r|$| is a normalization constant, and |${\boldsymbol{\Theta}}_m^{r,(l)}\in{\mathbb{R}}^{f_{l+1}\times{f}_l}$| is the filter parameter matrix for the |$r$|-category association relationship, which is shared by all |$r$|-category neighbors of |$i$|. |${\boldsymbol{\Theta}}_m^{(l)}$| is the filter parameter matrix applied to the node itself. Likewise, we also have a one-layer feature aggregating operator for a disease |$i$| as follows.
(18)

Similar to GCN-encoder, we can stack multilayer RGCN feature aggregation operations to form an RGCN encoder. The RGCN-encoders for miRNA and disease are denoted by |$\mathrm{R}{\mathrm{GCN}}_m^{(L)}(\mathbf{X};{\boldsymbol{\Theta}}_m)$| and |$\mathrm{R}{\mathrm{GCN}}_d^{(L)}(\mathbf{Y};{\boldsymbol{\Theta}}_d)$|⁠, respectively.

Unlike the GCN-encoder, the RGCN-encoder fully considers the different types of relations between miRNAs and diseases, such as miRNA–miRNA similarities, disease–disease similarities and multiple categories of miRNA–disease associations. It transforms the features into different feature spaces with relation-specific filter parameter matrices and finally aggregates the features of the different types. The experimental results will demonstrate that this improves the quality of representation learning.
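
A single relational aggregation step of this kind can be sketched as follows. The sketch follows the general relational GCN recipe: for each relation type, the neighbor features are averaged (normalization by the number of neighbors under that relation), transformed with a relation-specific matrix, summed over relations, and added to a self-connection term. Placing miRNAs and diseases in one shared node index space, using binary adjacency matrices, ReLU, and all names below are assumptions made for illustration.

```python
import numpy as np

def rgcn_layer(h, relations, rel_weights, self_weight):
    """One relational graph convolution step.

    h:           (num_nodes, f_in) node features (miRNAs and diseases together)
    relations:   list of (num_nodes, num_nodes) binary adjacency matrices
    rel_weights: list of (f_out, f_in) matrices, one per relation
    self_weight: (f_out, f_in) matrix applied to each node's own features
    """
    out = h @ self_weight.T
    for adj, theta_r in zip(relations, rel_weights):
        degree = adj.sum(axis=1, keepdims=True)      # number of neighbours per relation
        norm_adj = adj / np.maximum(degree, 1.0)     # rows without neighbours stay zero
        out = out + norm_adj @ h @ theta_r.T
    return np.maximum(out, 0.0)                      # ReLU

# Toy graph: nodes 0-1 are miRNAs, nodes 2-3 are diseases.
rel_a = np.zeros((4, 4))
rel_a[0, 2] = rel_a[2, 0] = 1.0                      # category A: miRNA 0 - disease 2
rel_b = np.zeros((4, 4))
rel_b[0, 3] = rel_b[3, 0] = 1.0                      # category B: miRNA 0 - disease 3
rel_b[1, 3] = rel_b[3, 1] = 1.0                      # category B: miRNA 1 - disease 3

h0 = np.eye(4)                                       # one-hot input features
identity = np.eye(4)
h1 = rgcn_layer(h0, [rel_a, rel_b], [identity, identity], identity)
print(h1)
```

With identity weights and one-hot inputs, each output row simply lists the node itself plus the averaged indicators of its neighbors under every relation, which makes the aggregation easy to inspect by hand.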

Decoders score multicategory miRNA–disease association

In our proposed prediction model, an association prediction score for the |$r$|-category is obtained by a decoder which is a parameterized score function |${\mathrm{Dec}}^{(r)}(\mathbf{X},\mathbf{Y};\mathbf{\mathcal{W}}):{\mathbb{R}}^m\times R\times{\mathbb{R}}^d\to \mathbb{R}$| where |$\mathbf{X},\mathbf{Y}$| are the encoded miRNA and disease features, respectively. In this paper, we introduce the following three different decoders.

DistMult decoder (denoted by |${\mathrm{Dec}}_{\mathrm{DistMult}}^{(r)}$|) adopts the DistMult factorization [41] as the scoring function, which is known to perform well on standard multirelational link prediction problems. In |${\mathrm{Dec}}_{\mathrm{DistMult}}^{(r)}$|, every category |$r$| is associated with a diagonal matrix |${\mathbf{D}}^{(r)}$|, and a miRNA–disease pair under category |$r$|, |$(\mathbf{x},r,\mathbf{y})$|, is scored as
(19) |${\mathrm{Dec}}_{\mathrm{DistMult}}^{(r)}(\mathbf{x},\mathbf{y})={\mathbf{x}}^{\top }{\mathbf{D}}^{(r)}\mathbf{y}$|

We observe that the diagonal matrix |${\mathbf{D}}^{(r)}$| in the DistMult decoder only captures the interactions between miRNAs and diseases under the specific |$r$| category. However, as shown in Supplementary Table 2 there may be associations across different categories. To this end, we extend |${\mathrm{Dec}}_{\mathrm{DistMult}}^{(r)}$| by incorporating a trainable parameter matrix|$\mathbf{G}$| into it and propose the following Linear Multi-Relational decoder.

LMR-decoder (denoted by |${\mathrm{Dec}}_{\mathrm{LMR}}^{(r)}$|⁠) is defined as follows.
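(20) |${\mathrm{Dec}}_{\mathrm{LMR}}^{(r)}(\mathbf{x},\mathbf{y})={\mathbf{x}}^{\top }{\mathbf{D}}^{(r)}\mathbf{G}{\mathbf{D}}^{(r)}\mathbf{y}$|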
where |$\mathbf{G}\in{\mathbb{R}}^{d\times d}$| is a parameter matrix which captures global interactions of the latent features of miRNAs and diseases across different categories. |${\mathbf{D}}^{(r)}\in{\mathbb{R}}^{d\times d}$| is still a trainable diagonal matrix which captures the importance of each dimension in latent representations toward |$r$| category association.
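
The two bilinear decoders can be compared on a toy example. The DistMult score is the standard weighted inner product with a category-specific diagonal matrix. For the LMR-decoder, the sketch assumes that the global matrix is sandwiched between two copies of the category-specific diagonal matrix; this form is an assumption based on the description above, and the function names and toy values are hypothetical.

```python
import numpy as np

def distmult_score(x: np.ndarray, y: np.ndarray, d_r: np.ndarray) -> float:
    """x^T diag(d_r) y, with d_r holding the diagonal for category r."""
    return float(np.sum(x * d_r * y))

def lmr_score(x: np.ndarray, y: np.ndarray, d_r: np.ndarray, g: np.ndarray) -> float:
    """Assumed form x^T diag(d_r) G diag(d_r) y with a global matrix G."""
    return float((x * d_r) @ g @ (d_r * y))

x = np.array([1.0, 2.0])          # latent feature of a miRNA
y = np.array([3.0, 4.0])          # latent feature of a disease
d_r = np.array([0.5, 2.0])        # diagonal of the category-specific matrix
g_swap = np.array([[0.0, 1.0],
                   [1.0, 0.0]])   # global matrix that mixes the two dimensions

print(distmult_score(x, y, d_r))
print(lmr_score(x, y, d_r, g_swap))
```

Because the diagonal matrix only rescales each latent dimension independently, the DistMult score never mixes dimensions, whereas the off-diagonal entries of the global matrix let one latent dimension of the miRNA interact with a different dimension of the disease.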

Both |${\mathrm{Dec}}_{\mathrm{DistMult}}^{(r)}$| and |${\mathrm{Dec}}_{\mathrm{LMR}}^{(r)}$| are bilinear decoders. Inspired by the idea of NIMCGCN [25], we proposed a novel method of neural multicategory association score model to capture the deeper and nonlinear interactions between the latent features of miRNAs and diseases.

NMR-decoder (denoted by |${\mathrm{Dec}}_{\mathrm{NMR}}^{(r)}$|⁠) represents a Neural Multi-Relational decoder. In the following, we will take miRNA as an example to show the idea of |${\mathrm{Dec}}_{\mathrm{NMR}}^{(r)}$|⁠. The same idea can be applied to diseases.

With GCN-output (or RGCN-output) feature |$\mathbf{X}$| as input, we establish a |$K$|-layer feedforward neural network |${\varphi}_m^{(r;K)}$|to further transform the features of miRNA for each category|$r$|⁠. Specifically,
(21) |${\mathbf{X}}^{(r;K)}={\varphi}_m^{(r;K)}(\mathbf{X};{\boldsymbol{\Psi}}_m^{(r)})$|

A nonlinear transformation from layer |$k$| to layer |$(k+1)$| in |${\varphi}_m^{(r;K)}$| is defined as |${\mathbf{X}}^{(r;k+1)}=\sigma \left({\mathbf{W}}_m^{(r;k)}{\mathbf{X}}^{(r;k)}+{\mathbf{b}}_m^{(r;k)}\right)$|, where |${\mathbf{X}}^{(r;k)}$| is the feature matrix at layer |$k$|, |${\mathbf{W}}_m^{(r;k)}$| is the transformation parameter matrix and |${\mathbf{b}}_m^{(r;k)}$| is the bias vector. |$\sigma (\cdot )$| is a nonlinear activation function. We denote all trainable parameters by |${\boldsymbol{\Psi}}_m=\{{\boldsymbol{\Psi}}_m^{(1)},\dots{\boldsymbol{\Psi}}_m^{(r)},\dots, {\boldsymbol{\Psi}}_m^{(R)}\}$|, where |${\boldsymbol{\Psi}}_m^{(r)}=\{{\mathbf{W}}_m^{(r;0)},\dots, {\mathbf{W}}_m^{(r;K)},{\mathbf{b}}_m^{(r;0)},\dots, {\mathbf{b}}_m^{(r;K)}\}$| denotes the parameters involved in |${\varphi}_m^{(r;K)}$|.

However, the above category-specific neural networks cannot capture the mutual interactions of latent features across different categories. To solve this problem, we establish a global |$H$|-layer feedforward neural network |${\psi}_m^{(H)}$| after category-specific neural networks to further capture the interactions of miRNA latent features across all possible different categories. Specifically,
(22)
where |${\mathbf{X}}^{(r;K)}$| is the feature matrix output by the |$r$|-th category-specific neural network. |${\mathbf{W}}_m=\{{\mathbf{W}}_m^{(1)},\dots, {\mathbf{W}}_m^{(H)}\}$| and |${\mathbf{b}}_m=\{{\mathbf{b}}_m^{(1)},\dots, {\mathbf{b}}_m^{(H)}\}$| are the global trainable parameter matrices and bias vectors shared by the input features of all categories.
After obtaining |${\hat{\mathbf{X}}}^{(r)}$| for all miRNAs and |${\hat{\mathbf{Y}}}^{(r)}$| for all diseases, the predicted association score matrix for category |$r$|, |${\hat{\mathbf{T}}}^{(r)}$|, is the dot product of |${\hat{\mathbf{X}}}^{(r)}$| and |${\hat{\mathbf{Y}}}^{(r)}$|, i.e. |${\hat{\mathbf{T}}}^{(r)}={\hat{\mathbf{X}}}^{(r)\top }{\hat{\mathbf{Y}}}^{(r)}$|.
(23)
Altogether, the aforementioned components can be integrated into a unified prediction model, and |${\hat{\mathbf{T}}}^{(r)}$| can be calculated by the following dot product.
(24)
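The decoder pipeline described above (category-specific layers, then shared global layers, then a dot product) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the ReLU activation, the layer widths, the random initialization and the helper names `mlp`, `init_layers` and `nmr_scores` are all assumptions made for illustration, and applying the activation after every layer (including the last) is likewise assumed.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def mlp(x, weights, biases, act=relu):
    """Apply X^(k+1) = act(W^(k) X^(k) + b^(k)) layer by layer.

    Columns of x are entities (miRNAs or diseases), rows are features.
    """
    for w, b in zip(weights, biases):
        x = act(w @ x + b[:, None])
    return x

def init_layers(rng, dims):
    """Hypothetical helper: random weights and zero biases for given layer widths."""
    ws = [rng.normal(scale=0.1, size=(o, i)) for i, o in zip(dims[:-1], dims[1:])]
    bs = [np.zeros(o) for o in dims[1:]]
    return ws, bs

def nmr_scores(x, y, local_m, local_d, global_m, global_d):
    """Return one (m, n) score matrix per category.

    local_m / local_d: per-category (weights, biases) pairs (category-specific nets).
    global_m / global_d: one (weights, biases) pair shared by all categories.
    """
    scores = []
    for (wm, bm), (wd, bd) in zip(local_m, local_d):
        x_hat = mlp(mlp(x, wm, bm), *global_m)  # category-specific, then shared
        y_hat = mlp(mlp(y, wd, bd), *global_d)
        scores.append(x_hat.T @ y_hat)          # dot product of transformed features
    return scores

rng = np.random.default_rng(0)
d, m, n, R = 8, 5, 4, 3                          # toy sizes, chosen arbitrarily
x, y = rng.normal(size=(d, m)), rng.normal(size=(d, n))
local_m = [init_layers(rng, [d, 16, 16, 16]) for _ in range(R)]  # K = 3 layers
local_d = [init_layers(rng, [d, 16, 16, 16]) for _ in range(R)]
global_m = init_layers(rng, [16, 16, 16])                         # H = 2 layers
global_d = init_layers(rng, [16, 16, 16])
t_hat = nmr_scores(x, y, local_m, local_d, global_m, global_d)
```

Sharing `global_m` and `global_d` across the loop is what lets features from different categories interact through common parameters, which is the role the text assigns to the global network.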

End-to-end learning models

The various encoders and decoders described in the previous subsection can be combined into specific MCMDA prediction models. For example, when using RGCN as encoder and NMR as decoder, we get NMR-RGCN. Similarly, we also have NMR-GCN, LMR-RGCN and so on. Next, we introduce a general loss function which can be used as the loss function of different encoder–decoder combinations.

Now, we present the details of the loss function of MCMDA defined in Equation (12). Specifically, the partially observed association tensor |$\mathbf{\mathcal{T}}$| can be described as a set of category-specific association matrices, i.e. |$\mathbf{\mathcal{T}}=[{\mathbf{T}}^{(1)},{\mathbf{T}}^{(2)},\dots, {\mathbf{T}}^{(R)}]$|, where |${\mathbf{T}}^{(r)}\in{\{0,1\}}^{m\times n}$| (|$r=1,\dots ,R$|) is the experimentally verified miRNA–disease association matrix for category |$r$|. For category |$r$|, |${\Omega}^{(r)}$| and |${\overline{\Omega}}^{(r)}$| denote the sets of observed and unobserved (unknown) miRNA–disease entries of the known association matrix |${\mathbf{T}}^{(r)}$|, respectively. The observed set |${\Omega}^{(r)}$| consists only of positive associations, i.e. |${\mathbf{T}}^{(r)}(i,j)=1$| for all |$(i,j)\in{\Omega}^{(r)}$|, whereas |${\mathbf{T}}^{(r)}(i,j)=0$| for all |$(i,j)\in{\overline{\Omega}}^{(r)}$|. With this notation, the MCMDA defined in Equation (12) can be reformulated as follows:
(25)
where |${\hat{\mathbf{T}}}^{(r)}$| is calculated by Equation (24) and |$\mathbf{\mathcal{W}}$| denotes all trainable parameters, including those of the encoder and the decoder.
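As a rough illustration of how observed and unobserved entries can be weighted against each other, the sketch below assumes a squared-error form in which the biased item |$\alpha$| (discussed in the parameter analysis) weights the unobserved entries and |$1-\alpha$| weights the observed ones. This form is an assumption chosen to agree with the statement that only positive samples are used when |$\alpha =0$| and only unobserved samples when |$\alpha =1$|; the exact expression of Equation (25), including any regularization term, is not reproduced here, and `weighted_loss` is a hypothetical name.

```python
import numpy as np

def weighted_loss(t_true, t_pred, alpha=0.2):
    """Assumed squared-error loss over R categories.

    t_true, t_pred: arrays of shape (R, m, n); t_true is binary.
    alpha weights the unobserved (zero) entries, 1 - alpha the observed ones.
    No regularization term is included in this sketch.
    """
    observed = t_true == 1                  # Omega^(r): known positive entries
    err = (t_true - t_pred) ** 2
    loss_observed = err[observed].sum()
    loss_unobserved = err[~observed].sum()  # complement of Omega^(r)
    return (1.0 - alpha) * loss_observed + alpha * loss_unobserved

# Toy example with R = 1 category, m = 2 miRNAs and n = 2 diseases.
t_true = np.array([[[1.0, 0.0], [0.0, 1.0]]])
t_pred = np.array([[[0.5, 0.2], [0.0, 1.0]]])
loss = weighted_loss(t_true, t_pred, alpha=0.2)
```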

It is worth mentioning that the encoder and the decoder are integrated into a unified end-to-end neural network learning framework. Specifically, a GNN encoder is first used to learn miRNA and disease features over the miRNA–disease heterogeneous information network. The decoder then receives the learned latent features and transforms them further. The final prediction scores are obtained through the dot product of the transformed features. All parameters |$\mathbf{\mathcal{W}}$| involved in the encoder and decoder are optimized simultaneously by gradient descent with adaptive moment estimation [42]. Figure 2 shows the flowchart of NMR-RGCN.

Figure 2

The flowchart of the NMR-RGCN, which is a specific model of Neural Multi-Category MiRNA-Disease Association prediction, i.e. NMCMDA with Neural Multi-Relational decoder and Relational Graph Convolutional Network encoder. NMR-RGCN operates directly on the miRNA–disease heterogeneous network and leverages RGCN to learn miRNA and disease latent representations, respectively. Then, neural multirelational decoder yields miRNA–disease association scores with the learned latent representations as input. The representation learning encoder and the prediction score decoder are integrated into a unified end-to-end neural network learning framework.

Results and discussion

Experimental setup

The experimental code is implemented with the open-source machine learning framework PyTorch (https://pytorch.org). Graph neural network encoders are implemented with the open-source Deep Graph Library (https://www.dgl.ai/). All experiments are carried out on a Dell Precision T5820 workstation running Windows 10, with an 8-core 3.7 GHz Intel W-2145 CPU and 64 GB of memory.

In this study, the following two evaluation settings are used.

(1) |${\mathbf{CV}}_{\mathbf{triplet}}{:}$| We randomly split all experimentally verified miRNA–disease-category triplets (positive samples) into 10 subsets. In each fold, one subset together with an equal-sized set of randomly sampled unknown triplets (negative samples) is used as the testing set, and the remaining subsets together with an equal-sized set of randomly sampled unknown triplets are used as the training set. We ensured that the training and testing sets did not overlap. The area under the precision-recall curve (AUPR) and the area under the receiver operating characteristic curve (AUC) were used to evaluate the prediction performance of all methods.

(2) |${\mathbf{CV}}_{\mathbf{category}}{:}$| In our modeling problem, every miRNA–disease pair is connected through zero, one or more relation types. We randomly split all miRNA–disease pairs that are connected by at least one type of association into 10 subsets. In each fold, one subset is used in turn as the testing set and the remaining subsets as the training set. For each miRNA–disease pair in the test set, all association categories are ranked according to their predicted scores. The category with the highest score is taken as the model's prediction for that test sample, and the Top-1 precision, Top-1 recall and Top-1 F1-score are calculated.
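The fold construction in setting (1) can be sketched in plain Python as follows. This is one possible reading, not the authors' code: it draws a single pool of negative triplets and partitions it alongside the positives, which keeps training and testing sets disjoint by construction. The function name `triplet_folds`, the integer indexing of miRNAs, diseases and categories, and the toy data are assumptions, and the sampling loop presumes that unknown triplets outnumber known ones.

```python
import random

def triplet_folds(positives, n_mirna, n_disease, n_category, n_folds=10, seed=0):
    """Yield (train, test) lists of (triplet, label) pairs, one per fold."""
    rng = random.Random(seed)
    positives = list(positives)
    rng.shuffle(positives)
    known = set(positives)

    # Sample as many distinct unknown triplets as there are positives.
    negatives = set()
    while len(negatives) < len(positives):
        cand = (rng.randrange(n_mirna), rng.randrange(n_disease),
                rng.randrange(n_category))
        if cand not in known:
            negatives.add(cand)
    negatives = sorted(negatives)
    rng.shuffle(negatives)

    for k in range(n_folds):
        test = [(t, 1) for t in positives[k::n_folds]]
        test += [(t, 0) for t in negatives[k::n_folds]]
        train = [(t, 1) for i, t in enumerate(positives) if i % n_folds != k]
        train += [(t, 0) for i, t in enumerate(negatives) if i % n_folds != k]
        yield train, test

# Toy data: 30 known triplets over 6 miRNAs, 5 diseases and 3 categories.
positives = [(i, j, (i + j) % 3) for i in range(6) for j in range(5)]
folds = list(triplet_folds(positives, n_mirna=6, n_disease=5, n_category=3))
```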

Compared with binary miRNA–disease association prediction, multirelational prediction places more emphasis on predicting the associated category, which is exactly what |${\mathbf{CV}}_{\mathbf{category}}$| tests. We therefore regard |${\mathbf{CV}}_{\mathbf{category}}$| as the primary experimental setting.
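A minimal sketch of Top-1 evaluation under this setting is given below. The text does not spell out how precision and recall are aggregated, so the micro-averaged definitions used here (hits divided by the number of predictions, and hits divided by the number of true pair-category links) are assumptions, as are the function name `top1_metrics` and the toy identifiers.

```python
def top1_metrics(scores, truth):
    """Top-1 precision, recall and F1 over a set of test pairs.

    scores: {pair: [predicted score per category]}
    truth:  {pair: set of true category indices}
    """
    hits = 0
    for pair, cat_scores in scores.items():
        best = max(range(len(cat_scores)), key=cat_scores.__getitem__)
        hits += best in truth[pair]              # True counts as 1
    n_pred = len(scores)                         # one prediction per test pair
    n_true = sum(len(truth[p]) for p in scores)  # all true pair-category links
    precision = hits / n_pred
    recall = hits / n_true
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# Toy test set with three pairs and three categories (names are illustrative).
scores = {
    ("mir-a", "dis-x"): [0.9, 0.2, 0.1],
    ("mir-b", "dis-x"): [0.3, 0.8, 0.4],
    ("mir-c", "dis-y"): [0.1, 0.2, 0.7],
}
truth = {
    ("mir-a", "dis-x"): {0, 1},
    ("mir-b", "dis-x"): {2},
    ("mir-c", "dis-y"): {2},
}
precision, recall, f1 = top1_metrics(scores, truth)
```

The F1 score here is the harmonic mean of precision and recall. As a consistency check, the harmonic mean of 0.5522 and 0.3967 (NMR-RGCN on MCD-6) is about 0.4617, which agrees with the F1 value reported for that row.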

Performance analysis of various encoder–decoder combinations

Our proposed model consists of two components, i.e. encoders and decoders. Various encoders are presented to obtain the feature representations of the input data. With the encoded features as input, several decoders are presented to produce association scores for multiple categories.

In this subsection, we conduct extensive experiments on the MCD-6 and MCD-20 datasets to systematically compare the performances of NMR-RGCN (NMR as the decoder and RGCN as the encoder; the other names follow the same convention), DistMult-RGCN, LMR-RGCN and NMR-GCN. The results are shown in Tables 2 and 3.

Table 2

Experiment results of the various encoder–decoder combinations on MCD-6 dataset

Model          CVtriplet AUPR  CVtriplet AUC  CVcategory Top-1 P  CVcategory Top-1 R  CVcategory Top-1 F1
NMR-GCN        0.9467          0.9494         0.5116              0.3690              0.4288
DistMult-RGCN  0.9423          0.9371         0.4059              0.2912              0.3391
LMR-RGCN       0.9525          0.9501         0.5232              0.3722              0.4350
NMR-RGCN       0.9533          0.9521         0.5522              0.3967              0.4617

Table 3

Experiment results of the various encoder–decoder combinations on MCD-20 dataset

Model          CVtriplet AUPR  CVtriplet AUC  CVcategory Top-1 P  CVcategory Top-1 R  CVcategory Top-1 F1
NMR-GCN        0.9419          0.9370         0.3593              0.2565              0.2933
DistMult-RGCN  0.9425          0.9374         0.1961              0.0516              0.0817
LMR-RGCN       0.9479          0.9505         0.3666              0.2729              0.3129
NMR-RGCN       0.9480          0.9527         0.4230              0.2893              0.3436

Figure 3

The effect of the biased item |$\boldsymbol{\alpha}$| in the loss function on the performance of NMR-RGCN.

Figure 4

The effect of the layer number of RGCN encoders |$\mathbf{L}$| on the performance of NMR-RGCN.

Table 4

The effect of the number of neural multicategory layer on the performance of NMR-RGCN

Decoder configuration                                 Top-1 P  Top-1 R  Top-1 F1
(|$K=1,\mathrm{LMR}$|) + (|$H=1,\mathrm{LMR}$|)       0.3854   0.3015   0.3383
(|$K=1,\mathrm{NMR}$|) + (|$H=1,\mathrm{NMR}$|)       0.5173   0.3841   0.4409
(|$K=1,\mathrm{NMR}$|)                                0.3501   0.2748   0.3079
(|$K=2,\mathrm{NMR}$|) + (|$H=1,\mathrm{NMR}$|)       0.5452   0.3998   0.4613
(|$K=3,\mathrm{NMR}$|) + (|$H=1,\mathrm{NMR}$|)       0.5531   0.4059   0.4682
(|$K=4,\mathrm{NMR}$|) + (|$H=1,\mathrm{NMR}$|)       0.5520   0.4043   0.4667
(|$K=3,\mathrm{NMR}$|) + (|$H=2,\mathrm{NMR}$|)       0.5576   0.4071   0.4706
(|$K=3,\mathrm{NMR}$|) + (|$H=3,\mathrm{NMR}$|)       0.5569   0.4063   0.4698

The empirical results in Tables 2 and 3 show the effect of different encoders and decoders on prediction performance. Specifically, RGCN is superior to GCN because it exploits more link information to obtain latent representations. In addition, when RGCN is used as the encoder, LMR is better than DistMult since LMR learns to capture global interactions of the latent features of miRNAs and diseases across different categories. Furthermore, NMR is superior to LMR since NMR extends LMR to a nonlinear neural network framework.

Since NMR-RGCN outperforms the other encoder–decoder combinations, especially under |${\mathbf{CV}}_{\mathbf{category}}$|, in the remainder of this paper we take NMR-RGCN as the main prediction model, discuss the influence of different parameters on its performance and compare it with other baselines.

Table 5

Comparisons with existing work

Dataset    Method           CVtriplet AUPR  CVtriplet AUC  CVcategory Top-1 P  CVcategory Top-1 R  CVcategory Top-1 F1
TDRC v3.2  NLPMMDA [34]     0.6564          0.7581         0.1844              0.1380              0.1579
TDRC v3.2  TDRC [35]        0.9284          0.9201         0.6178              0.4741              0.5365
TDRC v3.2  NMR-GCN (ours)   0.9224          0.9219         0.6278              0.4855              0.5476
TDRC v3.2  NMR-RGCN (ours)  0.9279          0.9257         0.6334              0.4891              0.5520
TDRC v2.0  NLPMMDA [34]     0.6610          0.7635         0.4397              0.3919              0.4144
TDRC v2.0  TDRC [35]        0.8663          0.8379         0.5609              0.4999              0.5286
TDRC v2.0  NMR-GCN (ours)   0.8882          0.8849         0.7144              0.6278              0.6683
TDRC v2.0  NMR-RGCN (ours)  0.8981          0.8889         0.7254              0.6330              0.6761
MCD-6      TDRC [35]        0.9446          0.9377         0.4067              0.2953              0.3422
MCD-6      NMR-GCN (ours)   0.9467          0.9494         0.5116              0.3690              0.4288
MCD-6      NMR-RGCN (ours)  0.9533          0.9521         0.5522              0.3967              0.4617

Analysis of parameters

The following parameters significantly affect the performance of NMR-RGCN: (i) |$\alpha$|: the biased item in the loss function defined by Equation (25); (ii) |$L$|: the number of layers in the RGCN encoders; (iii) |$K$|: the number of layers in the category-specific neural networks defined by Equation (21) and (iv) |$H$|: the number of layers in the global neural network across all categories defined by Equation (22). We performed experiments on the MCD-6 dataset with the metrics of Top-1 precision, Top-1 recall and Top-1 F1 to evaluate the effect of these parameters on the performance of NMR-RGCN.

First, the biased item |$\alpha$| is introduced to weigh observed against unobserved entries. The loss function is optimized using only positive samples when |$\alpha =0$| and using only unobserved samples when |$\alpha =1$|. In this experiment, we fix |$L=3$|, |$K=3$| and |$H=2$|. Figure 3 shows the effect of different values of |$\alpha$| on the prediction performance of NMR-RGCN; |$\alpha =0.2$| gives the best performance among the tested values.

Second, we analyze the effect of |$L$| on the prediction performance. In this experiment, we fix |$\alpha =0.2$|, |$K=3$| and |$H=2$|. The comparative results are shown in Figure 4. NMR-RGCN with |$L=3$| provides the best performance. We also note that as |$L$| increases beyond 3, the performance of the RGCN encoders decreases slightly, because stacking more graph convolutional layers oversmooths the encoded embeddings.

Finally, as described in the subsection Decoders score multicategory miRNA–disease association, the neural multirelational decoder consists of the local category-specific neural networks and the global neural networks across all categories. In the experiments, we fix |$\alpha =0.2$| and |$L=3$|⁠. We test the following situations:

(1) Local linear decoder + global linear decoder. In this case, we only consider the linear local and global decoders, i.e. |$K=1$|⁠, |$H=1$| and remove the nonlinear activation function. The results show that the performance of the linear decoder is far worse than that of the neural decoder.

(2) Local neural decoder. We remove the global decoder and only consider the local neural decoders. From the experimental results in Table 4, we found that the prediction performance of the model decreases when we removed the global decoder. It indicates that the global decoder is necessary for NMR-RGCN to improve the prediction performance.

(3) Local neural decoder + global neural decoder. We test the performance of several combinations of local neural decoders and global neural decoders. The results in Table 4 show that the prediction performance improves as the number of local neural decoder layers increases up to |$K=3$|; when |$K>3$|, it decreases slightly. Similarly, for the global neural network, the prediction performance improves as the number of layers increases up to |$H=2$| and decreases slightly for |$H=3$|. The NMR decoder with |$K=3$| and |$H=2$| achieves the best performance.

Comparisons with existing work

Based on the evaluation results in the previous subsection, we use |$\alpha =0.2$|, |$L=3$|, |$K=3$| and |$H=2$| as the experimental settings for NMR-RGCN and |$\alpha =0.2$|, |$L=1$|, |$K=3$| and |$H=2$| for NMR-GCN in the comparison experiments. The learning rate of adaptive moment estimation is 9e-4 and the regularization coefficient is 1e-8.
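For reference, these settings can be collected in a small configuration object. The sketch below only restates the values given above; the class name `RunConfig` and its field names are hypothetical and do not correspond to any released code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RunConfig:
    alpha: float                   # biased item weighting unobserved entries
    encoder_layers: int            # L: number of (R)GCN encoder layers
    local_layers: int              # K: layers of the category-specific networks
    global_layers: int             # H: layers of the shared global network
    learning_rate: float = 9e-4    # step size for adaptive moment estimation
    regularization: float = 1e-8   # regularization coefficient

NMR_RGCN = RunConfig(alpha=0.2, encoder_layers=3, local_layers=3, global_layers=2)
NMR_GCN = RunConfig(alpha=0.2, encoder_layers=1, local_layers=3, global_layers=2)
```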

Apart from three methods [33–35], we have not found other approaches developed for predicting multiple categories of miRNA–disease associations. Here, we compare NMR-RGCN (our method) with two of them: NLPMMDA [34] and TDRC [35]. NLPMMDA predicts multirelational miRNA–disease associations by label propagation, integrating miRNA similarity and disease similarity. NLPMMDA has two hyperparameters; we set |${\lambda}_m=0.2$| and |${\lambda}_d=0.2$|, as in the original paper, to obtain its best performance. TDRC introduces tensor decomposition with miRNA functional similarity and disease semantic similarity as relational constraints to predict multiple types of miRNA–disease associations. TDRC has four hyperparameters; we set |$\lambda =0.001$|, |$\gamma =4$|, |$\alpha =0.125$| and |$\beta =0.25$|, as in the original paper, to obtain its best performance. All comparison experiments were carried out on the TDRC v3.2, TDRC v2.0 and MCD-6 datasets.

We summarize the experimental comparison results of our proposed model and the baselines on three datasets in Table 5. All experiments are evaluated under the two aforementioned experimental settings. We start by comparing the results on the TDRC v2.0 and TDRC v3.2 datasets. The improvements of NMR-RGCN, NMR-GCN and TDRC over NLPMMDA are particularly obvious, which highlights the importance of considering the relationship between the types of association. We then compared NMR-RGCN with TDRC and NMR-GCN on the MCD-6 dataset. NMR-RGCN achieves relative performance gains over TDRC of 35.78% in terms of Top-1 precision and 1.51% in terms of AUC. This shows that capturing the nonlinear relationship between multiple types of miRNAs and diseases leads to better results, because TDRC is essentially a multilinear method and may not be sufficient to capture the complex and nonlinear interactions between the features of miRNAs and diseases. Compared with NMR-GCN, NMR-RGCN improves Top-1 precision by a relative 7.94%, because NMR-RGCN also takes into account heterogeneous information from other sources in the heterogeneous network when encoding node information.
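The relative gains in Top-1 precision quoted above follow directly from the MCD-6 rows of Table 5, as the short calculation below shows. The helper name `relative_gain` is an assumption; the input values are copied from the table.

```python
def relative_gain(ours, baseline):
    """Relative improvement of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0

# Top-1 precision on the MCD-6 dataset, taken from Table 5.
top1_p = {"TDRC": 0.4067, "NMR-GCN": 0.5116, "NMR-RGCN": 0.5522}

gain_vs_tdrc = relative_gain(top1_p["NMR-RGCN"], top1_p["TDRC"])
gain_vs_gcn = relative_gain(top1_p["NMR-RGCN"], top1_p["NMR-GCN"])
print(f"{gain_vs_tdrc:.2f}% {gain_vs_gcn:.2f}%")  # prints "35.78% 7.94%"
```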

Case studies

To further evaluate the accuracy of our proposed model for predicting unobserved miRNA–disease-type entries, we conduct case studies on two widespread human diseases, i.e. breast cancer and lung cancer (Supplementary Tables 3 and 4). We prioritized candidate miRNA-type pairs for a specific disease using the model trained on the MCD-4 dataset (all known HMDD v2.0 data), and then verified the top-50 predictions against the HMDD v3.2 dataset and recent literature. We also trained the NMR-RGCN model on the MCD-6 dataset (all known HMDD v3.2 data), and then prioritized all unknown miRNA–disease-type entries by their predicted scores. Table 6 shows the top-10 predicted results, seven of which are confirmed by recent literature. These results support the effectiveness of our proposed model.
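The prioritization step can be sketched as a simple ranking of predicted scores over entries that are not yet recorded as associations. The function name `top_unknown`, the dictionary layout and the toy identifiers and scores are illustrative assumptions, not data from the study.

```python
import heapq

def top_unknown(scores, known, k=10):
    """Return the k highest-scoring triplets that are not in `known`.

    scores: {(mirna, disease, category): predicted score}
    known:  set of triplets already recorded as associations
    """
    candidates = ((s, t) for t, s in scores.items() if t not in known)
    best = heapq.nlargest(k, candidates, key=lambda pair: pair[0])
    return [t for _, t in best]

# Toy scores (illustrative names and values only).
scores = {
    ("mir-a", "dis-x", "circulation"): 0.91,
    ("mir-a", "dis-x", "tissue"): 0.40,
    ("mir-b", "dis-x", "genetics"): 0.75,
    ("mir-b", "dis-y", "tissue"): 0.88,
}
known = {("mir-a", "dis-x", "circulation")}
ranked = top_unknown(scores, known, k=2)
```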

Table 6

The prediction and validation of top-10 miRNA–disease-category associations

MiRNA           Disease                         Category     Evidence (PMID)
hsa-mir-34c     Carcinoma, hepatocellular       Epigenetics  27165229
hsa-mir-34a     Carcinoma, hepatocellular       Epigenetics  27165229
hsa-mir-21      Multiple sclerosis              Other        Unconfirmed
hsa-mir-196a-2  Ovarian neoplasms               Genetics     30930933
hsa-mir-126     Breast neoplasms                Circulation  20801493
hsa-mir-21      Leukemia, lymphoblastic, acute  Circulation  Unconfirmed
hsa-mir-21      Alzheimer disease               Circulation  31592314
hsa-mir-146a    Osteosarcoma                    Genetics     Unconfirmed
hsa-mir-155     Bladder neoplasms               Tissue       31298320
hsa-mir-150     Breast neoplasms                Circulation  31963351

Conclusion

Identification of potential multicategory miRNA–disease associations using computational approaches is important, as it will improve our understanding of the pathogenesis of diseases and guide treatment. In this study, we develop NMCMDA, a novel data-driven end-to-end learning-based method of neural multiple-category miRNA–disease association prediction, to deal with the issue of MCMDA. NMCMDA shows excellent performance for the prediction of multicategory miRNA–disease associations and is significantly superior to the state-of-the-art method TDRC [35] in terms of Top-1 precision, Top-1 Recall and Top-1 F1. To our understanding, the advantages of NMCMDA over TDRC [35] may lie in the following two aspects. First, NMCMDA is an end-to-end learning framework in which the encoder operates directly on the miRNA–disease heterogeneous network and leverages a GNN to learn latent representations, and the decoder yields miRNA–disease association scores with the learned latent representations as input. All parameters involved in the encoder and decoder are simultaneously optimized via gradient descent. In contrast, TDRC is not an end-to-end learning algorithm and therefore cannot guarantee that the latent features are directly related to the prediction objective. Second, TDRC is essentially a multilinear method and may not be sufficient to capture the complex and nonlinear interactions between the features of miRNAs and diseases. Case studies are also provided for two high-risk human diseases (breast cancer and lung cancer), and we also provide the prediction and validation of the top-10 miRNA–disease-category associations based on all known data of HMDD v3.2, which further validate the effectiveness and feasibility of the proposed method.

There are several directions for future study. First, the structural information in the miRNA and disease similarity networks significantly affects the learned feature representations, which in turn affects the final prediction results. Second, other sources of biomedical information, such as miRNA–gene interactions and disease–gene interactions, might be relevant for modeling miRNA–disease associations, and we hope to investigate the utility of integrating them into the model. Finally, as our proposed NMCMDA is a general approach for multirelational link prediction in any heterogeneous network, it would be interesting to apply it to other domains and problems, for example, the identification of multirelational drug–drug interactions [43].

Key Points
  • We present a novel data-driven end-to-end learning-based method of neural multiple-category miRNA–disease association prediction (NMCMDA) for predicting multiple-category miRNA–disease associations.

  • NMCMDA with the encoder of Relational Graph Convolutional Network and the neural multirelational decoder (NMR-RGCN) achieves the best prediction performance.

  • The experimental results show that the NMR-RGCN is significantly superior to the state-of-the-art method TDRC in terms of Top-1 precision, Top-1 Recall, and Top-1 F1.

Acknowledgements

We thank anonymous reviewers for valuable suggestions.

Funding

National Natural Science Foundation of China (U1802271), the Science Foundation for Distinguished Young Scholars of Yunnan Province (2019FJ011), the Fundamental Research Project of Yunnan Province (201901BB050052).

Jingru Wang is a postgraduate student at Yunnan University. Her research focuses on bioinformatics and machine learning.

Jin Li is currently a professor at the School of Software, Yunnan University. His research interests include bioinformatics and machine learning.

Kun Yue is currently a professor at the School of Information, Yunnan University. His research interests include data and knowledge engineering.

Li Wang is a postgraduate student at Yunnan University. Her research focuses on bioinformatics and graph data analysis.

Yuyun Ma is a postgraduate student at Yunnan University. Her research focuses on graph neural network.

Qing Li is an associate chief physician at the First Affiliated Hospital of Kunming Medical University. His research focuses on pathology.

References

1. Ambros V. The functions of animal microRNAs. Nature 2004;431:350–5.

2. Kapranov P, Cheng J, Dike S, et al. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 2007;316:1484–8.

3. Taft RJ, Pang KC, Mercer TR, et al. Non-coding RNAs: regulators of disease. J Pathol 2010;220:126–39.

4. Bandyopadhyay S, Mitra R, Maulik U, et al. Development of the human cancer microRNA network. Silence 2010;1:6.

5. Zhang B, Wang Q, Pan X. MicroRNAs and their regulatory roles in animals and plants. J Cell Physiol 2007;210:279–89.

6. Wang WX, Rajeev BW, Stromberg AJ, et al. The expression of microRNA miR-107 decreases early in Alzheimer’s disease and may accelerate disease progression through regulation of beta-site amyloid precursor protein-cleaving enzyme 1. Neurobiol Dis 2008;28:1213–23.

7. Chen X, Yan CC, Zhang X, et al. Long non-coding RNAs and complex disease: from experimental results to computational models. Brief Bioinform 2017;18:558–76.

8. Chen X, Sun YZ, Zhang DH, et al. NRDTD: a database for clinically or experimentally supported non-coding RNAs and drug targets associations. Database 2017;2017:bax057.

9. Chen X, Huang L. LRSSLMDA: Laplacian regularized sparse subspace learning for MiRNA-disease association prediction. PLoS Comput Biol 2017;13:e1005912.

10. Chen X, Xie D, Zhao Q, et al. MicroRNAs and complex diseases: from experimental results to computational models. Brief Bioinform 2019;20:515–39.

11.

Goh
KI
,
Cusick
ME
,
Valle
D
, et al.
The human disease network
.
Proc Natl Acad Sci USA
2007
;
104
:
8685
90
.

12.

Huang
Z
,
Liu
L
,
Gao
Y
, et al.
Benchmark of computational methods for predicting microRNA-disease associations
.
Genome Biol
2019
;
20
:
202
.

13.

You
ZH
,
Huang
ZA
,
Zhu
Z
, et al.
PBMDA: a novel and effective path-based computational model for miRNA-disease association prediction
.
PLoS Comput Boil
2017
;
13
:
e1005455
.

14.

Zhang
X
,
Zou
Q
,
Paton
AR
, et al.
Meta-path methods for prioritizing candidate disease miRNAs
.
IEEE/ACM Trans Comput Biol Bioinform
2019
;
16
:
283
91
.

15.

Li
G
,
Luo
J
,
Xiao
Q
, et al.
Predicting microRNA-disease associations using label propagation based on linear neighborhood similarity
.
J Biomed Inform
2018
;
82
:
169
77
.

16.

Zhang
W
,
Li
Z
,
Guo
W
, et al.
A fast linear neighborhood similarity-based network link inference method to predict microRNA-disease associations
.
IEEE/ACM Trans Comput Biol Bioinform
2019
. doi: .

17.

Chen
X
,
Huang
L
,
Xie
D
, et al.
EGBMMDA: extreme gradient boosting machine for miRNA-disease association prediction
.
Cell Death Dis
2018
;
9
:
3
.

18.

Chen
X
,
Zhu
CC
,
Yin
J
, et al.
Ensemble of decision tree reveals potential miRNA-disease associations
.
PLoS Comput Biol
2019
;
15
:
e1007209
.

19. Zhao Y, Chen X, Yin J. Adaptive boosting-based computational model for predicting potential miRNA-disease associations. Bioinformatics 2019;35:4730–8.

20. Chen X, Wang L, Qu J, et al. Predicting miRNA-disease association based on inductive matrix completion. Bioinformatics 2018;34:4256–65.

21. Cheng L, Yu S, Luo J. Adaptive multi-view multi-label learning for identifying disease-associated candidate miRNAs. PLoS Comput Biol 2019;15:e1006931.

22. Peng J, Hui W, Li Q, et al. A learning-based framework for miRNA-disease association identification using neural networks. Bioinformatics 2019;35:4364–71.

23. Gong Y, Niu Y, Zhang W, et al. A network embedding-based multiple information integration method for the MiRNA-disease association prediction. BMC Bioinformatics 2019;20:468.

24. Li Z, Li J, Nie R, et al. A graph auto-encoder model for miRNA-disease associations prediction. Brief Bioinform 2020;bbaa240.

25. Li J, Zhang S, Liu T, et al. Neural inductive matrix completion with graph convolutional networks for miRNA-disease association prediction. Bioinformatics 2020;36:2538–46.

26. Li Y, Qiu C, Tu J, et al. HMDD v2.0: a database for experimentally supported human microRNA and disease associations. Nucleic Acids Res 2014;42:D1070–4.

27. Huang Z, Shi J, Gao Y, et al. HMDD v3.0: a database for experimentally supported human microRNA-disease associations. Nucleic Acids Res 2019;47:D1013–7.

28. Goh JN, Loo SY, Datta A, et al. MicroRNAs in breast cancer: regulatory roles governing the hallmarks of cancer. Biol Rev 2015;91:409–28.

29. Schrauder MG, Strick R, Schulz-Wendtland R, et al. Circulating micro-RNAs as potential blood-based markers for early stage breast cancer detection. PLoS One 2012;7:e29770.

30. Robbins ME, Dakhlallah D, Marsh CB, et al. Of mice and men: correlations between microRNA-17~92 cluster expression and promoter methylation in severe bronchopulmonary dysplasia. Am J Physiol Lung Cell Mol Physiol 2016;311:L981–4.

31. Zhang J, Liu J, Liu Y, et al. miR-101 represses lung cancer by inhibiting interaction of fibroblasts and cancer cells by down-regulating CXCL12. Biomed Pharmacother 2015;74:215–21.

32. Yan F, Shen N, Pang J, et al. Restoration of miR-101 suppresses lung tumorigenesis through inhibition of DNMT3a-dependent DNA methylation. Cell Death Dis 2014;5:1413.

33. Chen X, Yan CC, Zhang X, et al. RBMMMDA: predicting multiple types of disease-microRNA associations. Sci Rep 2015;5:13877.

34. Zhang X, Yin J, Zhang X. A semi-supervised learning algorithm for predicting four types MiRNA-disease associations by mutual information in a heterogeneous network. Genes 2018;9:139.

35. Huang F, Yue X, Xiong Z, et al. Tensor decomposition with relational constraints for predicting multiple types of microRNA-disease associations. Brief Bioinform 2020;bbaa140. Online ahead of print.

36. Lu M, Zhang Q, Deng M, et al. An analysis of human microRNA and disease associations. PLoS One 2008;3:e3420.

37. Wang D, Wang J, Lu M, et al. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics 2010;26:1644–50.

38. Chen M, Peng Y, Li A, et al. A novel information diffusion method based on network consistency for identifying disease related microRNAs. RSC Adv 2018;8:36675–90.

39. Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. In: The 5th International Conference on Learning Representations (ICLR), Toulon, France. 2017.

40. Schlichtkrull MS, Kipf TN, Bloem P, et al. Modeling relational data with graph convolutional networks. In: The 18th Extended Semantic Web Conference. Cham. Heraklion, Crete, Greece: Springer, 2018.

41. Yang B, Yih W, He X, et al. Embedding entities and relations for learning and inference in knowledge bases. In: The 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA. 2015.

42. Kingma DP, Ba J. Adam: a method for stochastic optimization. In: The 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA. 2015.

43. Ryu JY, Kim HU, Lee SY. Deep learning improves prediction of drug–drug and drug–food interactions. Proc Natl Acad Sci USA 2018;115:E4304–11.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)