Integration of pairwise neighbor topologies and miRNA family and cluster attributes for miRNA–disease association prediction

Xuan, Ping; Wang, Dong; Cui, Hui; Zhang, Tiangang; Nakaguchi, Toshiya

doi:10.1093/bib/bbab428

Abstract

Identifying disease-related microRNAs (miRNAs) assists the understanding of disease pathogenesis. Existing research methods integrate multiple kinds of data related to miRNAs and diseases to infer candidate disease-related miRNAs. The attributes of miRNA nodes including their family and cluster belonging information, however, have not been deeply integrated. Besides, the learning of neighbor topology representation of a pair of miRNA and disease is a challenging issue. We present a disease-related miRNA prediction method by encoding and integrating multiple representations of miRNA and disease nodes learnt from the generative and adversarial perspective. We firstly construct a bilayer heterogeneous network of miRNA and disease nodes, and it contains multiple types of connections among these nodes, which reflect neighbor topology of miRNA–disease pairs, and the attributes of miRNA nodes, especially miRNA-related families and clusters. To learn enhanced pairwise neighbor topology, we propose a generative and adversarial model with a convolutional autoencoder-based generator to encode the low-dimensional topological representation of the miRNA–disease pair and multi-layer convolutional neural network-based discriminator to discriminate between the true and false neighbor topology embeddings. Besides, we design a novel feature category-level attention mechanism to learn the various importance of different features for final adaptive fusion and prediction. Comparison results with five miRNA–disease association methods demonstrated the superior performance of our model and technical contributions in terms of area under the receiver operating characteristic curve and area under the precision-recall curve. The results of recall rates confirmed that our model can find more actual miRNA–disease associations among top-ranked candidates. Case studies on three cancers further proved the ability to detect potential candidate miRNAs.

miRNA-disease association prediction, generative adversarial network, pairwise neighbour topology, a bilayer heterogeneous network with node attributes, feature category-level attention

1 Introduction

MicroRNAs (miRNAs) are a class of endogenous non-coding RNAs [1–3] with a length of approximately 22–24 nucleotides. Accumulating evidence has demonstrated a close relationship between miRNAs and the occurrence and development of various diseases [4, 5]. Therefore, identifying candidate disease-related miRNA may contribute to exploring the pathogenesis of a disease.

Reliable disease-related candidate miRNAs can be obtained by biological experiments; however, these methods are costly and time consuming. In recent years, an increasing number of researchers have proposed computational prediction methods to screen potential miRNA–disease associations. Previous works can be divided into two main categories. The 1st category of methods is based on the biological premise that miRNAs with similar functions are usually associated with similar diseases [6]. Hence, diverse functional similarities of miRNAs are combined to infer candidate miRNAs associated with specific diseases. For instance, the functional similarity of two miRNAs is calculated from their two groups of associated diseases in [7] before constructing a similarity network of miRNAs. Xuan et al. [8] proposed a prediction method called Human Disease MiRNA Prediction (HDMP) based on weighted k-neighbor information of the most similar miRNA nodes. Random walk (RW) algorithms have been widely used to learn network topology for miRNA–disease association prediction [9, 10]. However, these methods rely on diseases with known associated miRNAs and are not suitable for new diseases without known related miRNAs. To overcome the limitations of the aforementioned methods, the 2nd category of methods introduces additional disease-related information and constructs an miRNA–disease heterogeneous network. Chen et al. [11] combined k-nearest neighbors and support vector machine to predict miRNA–disease associations. You et al. [12] obtained potential miRNA candidates by calculating the scores of the paths between each miRNA and disease. Several methods perform RW on heterogeneous networks to obtain topology information and then infer candidate miRNAs [13–16]. Wang et al. [17] used natural language-processing technology to extract the sequence features of miRNAs and then used a logical tree classifier to obtain the miRNA–disease correlation score. Non-negative matrix factorization [18–21] has also been explored to integrate multiple connections over miRNA–disease heterogeneous networks. Chen et al. [22] further proposed the prediction model MDHGI based on matrix decomposition and heterogeneous graph inference. There are many types of edges in the miRNA–disease heterogeneous network, and these edges have complex nonlinear relationships. However, the aforementioned methods are all shallow prediction models, which cannot discover deep relations between miRNA and disease nodes.

Recently, deep learning algorithms have made great progress. An increasing number of models can extract deep and complex features to improve the prediction performance. A couple of convolutional neural network (CNN)-based prediction models have been proposed to predict disease-related miRNAs [23, 24]. Ji et al. [25] constructed a prediction model based on a deep autoencoder to estimate the scores of miRNAs and diseases. A model based on a variational autoencoder was proposed to improve the performance of miRNA–disease association prediction [26]. Liu et al. [27] applied a stacked autoencoder to learn the latent features of miRNAs and diseases. Recently, graph CNNs (GCNNs) have been applied to heterogeneous miRNA–disease networks [28, 29]. Li et al. [30] combined GCNN and matrix decomposition to predict potential miRNA candidates. Wang et al. [31] constructed a model based on the combination of GCN autoencoder to infer the relationships between miRNAs and diseases. Chen et al. [32] proposed a deep belief network prediction method based on the inference of associations between miRNAs and diseases. However, most of the above models ignore specific attributes of miRNA nodes, including the family and cluster information to which miRNAs belong.

MiRBase [33] and RFam [34] databases contain abundant information on miRNA families. MiRNAs that belong to the same family usually have similar seed regions. These seed regions refer to the 2–8 bases at the 5|$^\prime $| ends of the mature miRNA, which is the key region for an miRNA to participate in the regulation of target genes. Hence, miRNAs belonging to identical families may interact with similar target genes and are more likely to participate in common disease processes [35]. Many miRNAs are located at relatively close locations in the human genome to form miRNA clusters, such as the chrXq27.3 cluster [36]. The miRNAs in a cluster are usually transcribed and coordinated simultaneously [37], and they are more likely to be involved in the same disease process [38]. Therefore, the family and cluster information of miRNAs can be regarded as node attributes of miRNAs, which would be helpful to improve the accuracy of disease-related miRNA prediction.

In this work, we propose a prediction method, Generative MiRNA Disease Association (GMDA), to fully integrate miRNA and disease-related data for disease-related candidate miRNAs screening. To integrate multiple node types and different connections between nodes, we first constructed a bilayer heterogeneous network with node attributes of miRNA–diseases. Then a generative adversarial network (GAN)-based prediction model is proposed to learn and encode enhanced pairwise neighbor topology, family and cluster belonging information. The output association score indicates the likelihood a pair of miRNA–disease is associated. The higher the score, the more likely they are to be associated. The unique contributions of our model are as follows:

|$\bullet $| Firstly, we construct a bilayer heterogeneous network with node attributes to facilitate the learning of the neighbor topology representations of miRNA–disease nodes. The network consists of multiple types of connections to embed the similarity and association between miRNAs and diseases, miRNAs-related family and cluster attributes. We also design an embedding mechanism to extract the pairwise neighbor topology from the network.
|$\bullet $| Secondly, we exploit the idea of generative and adversarial learning to learn enhanced representations of a pair of miRNA and disease nodes. The generator is an autoencoder, consisting of an encoder and a decoder, that generates a false neighbor topology feature embedding of the node pair. The encoder, based on a multi-layer CNN, encodes a neighbor topology representation of the node pair. A multi-layer transposed convolution decoder then reconstructs the neighbor topology embedding of the node pair.
|$\bullet $| The discriminator is based on a multi-layer CNN and discriminates between the false neighbor topology embedding generated by the generator and the original true topology embedding. This discriminant strategy pushes the generator to generate a neighbor topology embedding as close to the true topology embedding as possible, which yields the final neighbor topology representation of miRNA–disease node pairs.
|$\bullet $| Since the neighbor topology and attribute features of miRNAs have different contributions to the prediction of miRNA–disease associations, we propose a feature category-level attention module to discriminate the contributions of the two types of features for adaptive fusion. Comprehensive evaluations and comparisons on a public dataset demonstrate the improved performance and contributions of our technical innovations.

2 Materials and methods

2.1 Dataset

In this study, the miRNA–disease associations are extracted from a public database [39], which contains 7908 experimentally validated miRNA–disease associations covering 793 miRNAs and 341 diseases. A disease is usually described by several related terms. The term information for each disease is extracted from the American Library of Medicine [40]. The semantic similarities among diseases are calculated by using the term information to construct directed acyclic graphs (DAGs) [7]. We extract miRNA family information from miRBase [33] and acquire cluster information by requiring that the distance between two miRNAs in a cluster does not exceed 20 kb.
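The 20 kb rule for cluster information can be illustrated with a minimal sketch. It assumes that clusters are formed by chaining miRNAs whose neighbouring loci on the same chromosome are at most 20 kb apart; the text does not state whether chaining or an all-pairs criterion is used. The function name `cluster_mirnas` and the coordinates are hypothetical and are not taken from miRBase.

```python
from collections import defaultdict

MAX_GAP = 20_000  # 20 kb threshold stated in the text


def cluster_mirnas(coords, max_gap=MAX_GAP):
    """Chain miRNAs whose neighbouring loci on one chromosome are within max_gap bases.

    coords maps a miRNA name to (chromosome, start position).
    Returns the clusters that contain at least two miRNAs.
    """
    by_chrom = defaultdict(list)
    for name, (chrom, pos) in coords.items():
        by_chrom[chrom].append((pos, name))

    clusters = []
    for chrom in sorted(by_chrom):
        loci = sorted(by_chrom[chrom])
        current = [loci[0][1]]
        for (prev_pos, _), (pos, name) in zip(loci, loci[1:]):
            if pos - prev_pos <= max_gap:
                current.append(name)      # close enough: extend the cluster
            else:
                if len(current) > 1:
                    clusters.append(current)
                current = [name]          # too far: start a new candidate cluster
        if len(current) > 1:
            clusters.append(current)
    return clusters


# Hypothetical coordinates, for illustration only.
toy_coords = {
    "mir-a": ("chr1", 1_000),
    "mir-b": ("chr1", 15_000),
    "mir-c": ("chr1", 90_000),
    "mir-d": ("chrX", 5_000),
    "mir-e": ("chrX", 24_000),
}
clusters = cluster_mirnas(toy_coords)
```

In this toy input, `mir-c` lies 75 kb from its nearest neighbour and therefore joins no cluster.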

2.2 A bilayer heterogeneous network of miRNA–disease with node attributes

Since different connections reflect the miRNA–disease association from different perspectives, and the family and cluster attributes of miRNAs are also important auxiliary information, we construct an miRNA and disease bilayer heterogeneous network with node attributes, as shown in Figure 1. Let |$G={(V,E)}$| denote the bilayer miRNA–disease network. The node set |$V=\left \{V^{miRNA}\cup V^{disease} \right \}$| consists of a series of miRNA nodes |$V^{miRNA}$| and disease nodes |$V^{disease}$|. A node pair |${v_{i},v_{j}}\in V$| is connected through the edge |$e_{ij}\in E $|. The network includes a variety of connections between miRNAs and diseases, including disease–disease similarity, miRNA–miRNA similarity and miRNA–disease association. In addition, an miRNA node also carries its unique biological properties, i.e. the family and cluster information to which it belongs.

2.2.1 Construction of miRNA similarity network

In general, two miRNAs may be associated with similar diseases if they have similar functions. Hence, two groups of miRNA-associated diseases are used to calculate the similarity between miRNAs [7]. For instance, it is assumed that miRNA |${m_{1}}$| is associated with diseases |${d_{1}}$|⁠, |${d_{3}}$| and |${d_{6}}$|⁠, and miRNA |${m_{2}}$| is associated with diseases |${d_{3}}$|⁠, |${d_{5}}$|⁠, |${d_{6}}$| and |${d_{7}}$|⁠. The similarity of diseases set |$\left \{d_{1},d_{3},d_{6}\right \}$| and |$\left \{d_{3},d_{5},d_{6},d_{7}\right \}$| is taken as the similarity between |${m_{1}}$| and |${m_{2}}$| and denoted as |${M_{12}}$|⁠. Using the association between miRNA and disease, we use Wang’s method [7] to calculate the similarity of miRNAs. To construct the similarity network of miRNA, we calculate the similarity value between all miRNA node pairs and connect them if their similarity is greater than 0. Moreover, the similarity value is taken as the weight of the edges of the two nodes (Figure 2). The network is represented by a similarity matrix |${M=[M_{ij}]}\in{\mathbb{R}^{N_{m} \times{N_{m}}}}$|⁠, in which the similarity between miRNA |${m_{i}}$| and miRNA |${m_{j}}$| is represented by |${M_{ij}}$|⁠, and its value is between 0 and 1.
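The similarity between the two disease sets in the example above can be sketched as follows. The sketch assumes the best-match average that is commonly used to describe Wang's method [7]: each disease is matched to its most similar disease in the other set, and the matched similarities are averaged over both sets. The disease similarity values are hypothetical and serve only to make the example runnable.

```python
def set_similarity(diseases_a, diseases_b, disease_sim):
    """Best-match average similarity between two groups of diseases."""
    def best_match(disease, group):
        return max(disease_sim(disease, other) for other in group)

    total = sum(best_match(d, diseases_b) for d in diseases_a)
    total += sum(best_match(d, diseases_a) for d in diseases_b)
    return total / (len(diseases_a) + len(diseases_b))


# Hypothetical pairwise disease similarities; unlisted pairs are treated as 0.
toy_pairs = {
    frozenset(("d1", "d3")): 0.5,
    frozenset(("d1", "d5")): 0.2,
    frozenset(("d5", "d6")): 0.4,
    frozenset(("d6", "d7")): 0.6,
}


def toy_disease_sim(a, b):
    if a == b:
        return 1.0
    return toy_pairs.get(frozenset((a, b)), 0.0)


m1_diseases = ["d1", "d3", "d6"]        # diseases associated with m1
m2_diseases = ["d3", "d5", "d6", "d7"]  # diseases associated with m2
M_12 = set_similarity(m1_diseases, m2_diseases, toy_disease_sim)
```

With these toy values the best matches sum to 2.5 for the diseases of |${m_{1}}$| and 3.0 for the diseases of |${m_{2}}$|, so |${M_{12}}$| is 5.5/7.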

2.2.2 Construction of disease similarity network

The similarity among diseases needs to be calculated for the construction of a disease similarity network. We exploit Wang’s DAG method [7] to calculate the similarity between diseases. Briefly, a disease is represented by a DAG comprising all disease terms related to the disease. The more disease terms the DAGs of two diseases share, the more similar the diseases are.

We connect all disease pairs whose similarity value is greater than 0 and take the similarity as the weight of the edge, based on which the disease similarity network is constructed. It can be represented by a matrix |$D=[D_{ij}]\in{\mathbb{R}^{N_{d} \times{N_{d}}}}$|, where |$D_{ij}$| represents the similarity between disease |$d_i$| and disease |$d_j$|, with a similarity value between 0 and 1.
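The DAG-based disease similarity can be sketched as below. The sketch assumes the usual formulation of Wang's method [7]: a disease contributes 1 to itself, each step towards an ancestor term multiplies the contribution by a decay factor (0.5 is assumed here), and the similarity is the summed contribution of the shared terms divided by the summed contribution of all terms in both DAGs. The toy hierarchy and the function names are hypothetical.

```python
DELTA = 0.5  # assumed semantic contribution decay factor


def semantic_contributions(disease, parents, delta=DELTA):
    """Contribution of every term in the DAG of `disease` (the disease and its ancestors)."""
    contrib = {disease: 1.0}
    stack = [disease]
    while stack:
        term = stack.pop()
        for parent in parents.get(term, []):
            value = delta * contrib[term]
            if value > contrib.get(parent, 0.0):  # keep the largest contribution over all paths
                contrib[parent] = value
                stack.append(parent)
    return contrib


def dag_similarity(disease_a, disease_b, parents):
    ca = semantic_contributions(disease_a, parents)
    cb = semantic_contributions(disease_b, parents)
    shared = set(ca) & set(cb)
    numerator = sum(ca[t] + cb[t] for t in shared)
    return numerator / (sum(ca.values()) + sum(cb.values()))


# Hypothetical term hierarchy: child term -> list of parent terms.
toy_parents = {
    "breast neoplasms": ["neoplasms"],
    "lung neoplasms": ["neoplasms"],
    "neoplasms": ["diseases"],
}
sim = dag_similarity("breast neoplasms", "lung neoplasms", toy_parents)
```

Here the two diseases share the terms "neoplasms" and "diseases", which gives a similarity of 1.5/3.5.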

2.2.3 Construction of miRNA–disease association network

Given the miRNA similarity network and disease similarity network, we further construct an miRNA–disease association network. If an miRNA is known to be associated with a disease, we connect the miRNA node in the miRNA similarity network to the disease node in the disease similarity network. Based on this, we construct an association matrix |$A=[A_{ij}] \in{\mathbb{R}^{N_{m} \times{N_{d}}}}$| between |${N_m}$| miRNAs and |${N_d}$| diseases. Each row represents the associations between an miRNA and all diseases, and each column represents the associations between a disease and all miRNAs. |${A_{ij}=1}$| denotes that miRNA |${m_{i}}$| and disease |${d_{j}}$| are known to be associated with each other, and |${A_{ij}=0}$| denotes that no association has been observed between miRNA |${m_{i}}$| and disease |${d_{j}}$|.

2.2.4 MiRNA node attributes

If miRNA |${m_{i}}$| and miRNA |${m_{j}}$| share more families or clusters, they are more likely to be associated with the same disease [41]. Therefore, the family and cluster information of miRNAs plays an important role in predicting miRNA–disease associations. A matrix |$C \in{\mathbb{R}^{N_{m} \times{(N_{f} + N_{c})}}}$| is used to represent the family and cluster information of the miRNAs (Figure 2), where |${N_{f}}$| and |${N_{c}}$| are the numbers of families and clusters, respectively. |${C_{i}}$|, the |$i$|-th row of |$C$|, records which of the |${N_{f}}$| families and |${N_{c}}$| clusters the |$i$|-th miRNA belongs to. |${C_{ij}=1}$| denotes that the |$i$|-th miRNA belongs to the |$j$|-th family (or cluster); otherwise, |${C_{ij}=0}$|.
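A minimal sketch of how such an attribute matrix could be assembled is given below. It assumes that the first |${N_{f}}$| columns index families and the remaining |${N_{c}}$| columns index clusters; the column order, the function name `attribute_matrix` and the memberships are hypothetical.

```python
def attribute_matrix(mirnas, families, clusters, membership):
    """Build the binary matrix C with one row per miRNA and N_f + N_c columns."""
    columns = [("family", f) for f in families] + [("cluster", c) for c in clusters]
    index = {column: j for j, column in enumerate(columns)}
    C = [[0] * len(columns) for _ in mirnas]
    for i, mirna in enumerate(mirnas):
        for column in membership.get(mirna, []):
            C[i][index[column]] = 1  # the i-th miRNA belongs to this family or cluster
    return C


# Hypothetical memberships, for illustration only.
toy_membership = {
    "m1": [("family", "f1"), ("cluster", "c1")],
    "m2": [("family", "f1")],
    "m3": [("family", "f2"), ("cluster", "c1")],
}
C = attribute_matrix(["m1", "m2", "m3"], ["f1", "f2"], ["c1"], toy_membership)
```

In the toy matrix, rows 1 and 2 share the family column and rows 1 and 3 share the cluster column, which is the kind of overlap the attribute features are meant to expose.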

2.3 MiRNA–disease association prediction model

We propose a new GAN-based prediction method for the inference of potential disease-related candidate miRNAs (Figure 1). For a pair of miRNA and disease nodes, miRNAs in the same family or cluster or with higher functional similarity are more likely to be associated with similar diseases [11]. Therefore, we integrate miRNA similarity, disease similarity, miRNA–disease association, miRNA family and cluster information to construct an miRNA–disease pair association prediction model.

Figure 1

Framework of the proposed GMDA model. (A) Construction of bilayer heterogenous network with node attributes. (B) Generator based on CNN integrates the neighbor topology embedding of miRNA and disease nodes and produces neighbor topology representation. (C) Discriminator based on multi-layer CNN discriminates true topology embedding and the false one. (D) Integration of topology representation and miRNA node attributes to form enhanced feature matrix. (E) Final fusion and association score prediction.

Figure 2

Bilayer heterogeneous network with node attributes composed of miRNA–miRNA similarity, miRNA–disease association and disease–disease similarity.

As shown in Figure 1, the model is composed of a generator, a discriminator and a convolution module. The adversarial learning between the generator and discriminator produces a neighbor topology representation of miRNA |${m_{i}}$| and disease |${d_{j}}$|. The CNN module with the attention mechanism integrates the neighbor topology representation of |${m_{i}}$| and |${d_{j}}$| with the node attribute representation of |${m_{i}}$|, and the association prediction score of |${m_{i}}-{d_{j}}$| is then obtained through the fully connected layer and softmax layer.

2.3.1 GANs of miRNA–disease node pairs

The framework based on a GAN is mainly composed of a generator and a discriminator. Given a pair of miRNA and disease nodes, the main goal of the generator is to generate a false sample that is as close as possible to the true sample of the node pair. The discriminator tries to distinguish the false sample from the true sample. The better trained the discriminator, the more it helps the generator to generate more robust false samples, and the two are trained iteratively. In this iterative process, the performance of both the generator and the discriminator improves. The final discriminator can accurately distinguish true from false samples, and the encoder part of the generator can produce a more robust low-dimensional neighbor topology representation of miRNA–disease pairs.

Construction of neighbor topology embedding matrix. The process of constructing the embedding matrix of the node pair of miRNA |${m_{1}}$| and disease |${d_{2}}$| is shown in Figure 3A. |${M_{1}}$| is the 1st row of the miRNA similarity matrix |$M$|, which records the edges connecting |${m_{1}}$| with all miRNAs, and the weight of each edge is the similarity between |${m_{1}}$| and the corresponding miRNA. As shown in Figure 3A, |${m_{1}}$| has connected edges with |${m_{2}}$|, |${m_{3}}$| and |${m_{4}}$|, indicating that they have similar functions. |${A_{1}}$|, the 1st row of the miRNA–disease association matrix, represents the connections between |${m_{1}}$| and all diseases. A connection indicates a known association, and no connection denotes that no association exists or that none has been observed. We concatenate |${M_{1}}$| and |${A_{1}}$| to form the neighbor topology feature vector of |${m_{1}}$|, denoted as |${x_{m_{1}}}$|. Similarly, |${A_{2}} ^{\mathrm{T}}$| and |${D_{2}}$| record the association-based and similarity-based edges of |${d_{2}}$| with all miRNAs and all diseases, respectively, and their concatenation forms the neighbor topology feature vector |${x_{d_{2}}}$| of |${d_{2}}$|. Finally, |${x_{m_{1}}}$| and |${x_{d_{2}}}$| are stacked to form the neighbor topology feature matrix |$X \in{\mathbb{R}^{2 \times{(N_{m} + N_{d})}}}$| of the node pair |${m_{1}}-{d_{2}}$|. This representation is regarded as a true sample.
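The construction of |$X$| can be sketched in a few lines. The matrices below are small hypothetical examples with three miRNAs and two diseases, the indices are 0-based, and the function name `pair_topology_matrix` is introduced only for illustration.

```python
def pair_topology_matrix(M, A, D, i, j):
    """Stack the neighbor topology vectors of miRNA i and disease j into a 2-row matrix."""
    x_m = list(M[i]) + list(A[i])             # [M_i, A_i]: similarities and associations of miRNA i
    x_d = [row[j] for row in A] + list(D[j])  # [A_j^T, D_j]: associations and similarities of disease j
    return [x_m, x_d]


# Hypothetical toy matrices (N_m = 3 miRNAs, N_d = 2 diseases).
M = [[1.0, 0.6, 0.0],
     [0.6, 1.0, 0.3],
     [0.0, 0.3, 1.0]]
D = [[1.0, 0.4],
     [0.4, 1.0]]
A = [[1, 0],
     [0, 1],
     [1, 1]]

X = pair_topology_matrix(M, A, D, i=0, j=1)  # node pair m_1 - d_2 in the paper's 1-based notation
```

Each row of the result has |$N_{m} + N_{d} = 5$| entries, which matches the stated shape |$2 \times (N_{m} + N_{d})$|.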

Figure 3

An illustration of the proposed construction of topology embedding and formation enhancement matrix between a pair of miRNA and disease. (A) Topology embedding matrix |$X$| of |${m_{1}}-{d_{2}}$| by embedding mechanism. (B) An attention mechanism at the feature category level combines the neighbor topology representation, family and cluster attributes of |$m_{1}$| to establish pairwise enhanced feature matrix.

Generator based on convolutional autoencoder. The generator generates the reconstructed neighbor topology matrix |$\hat{X}$| of the miRNA–disease node pair, which is regarded as a false sample. The main goal of the generator is to make the generated |$\hat{X}$| as close to |$X$| as possible. Given a node pair |${m_{i}}-{d_{j}}$| whose neighbor topology is embedded as |$X$|, the generator |$G(\cdot ,\cdot ;{\theta }_G)$| encodes it as shown in Figure 1.

Encoder. The encoder of the generator consists of two convolutional layers, each followed by a max-pooling layer. To learn marginal information, |$X$| is zero padded before it is fed into the 1st hidden layer of the encoder. Let |$X_{en}^{(l)}$| denote the output of the |$l$|-th hidden layer of the encoder. Each hidden layer is represented as follows:

$$\begin{align}& X_{en}^{(1)} = max\left(f\left(W_{en}^{(1)} \ast{X}+b_{en}^{(1)}\right) \right) \end{align}$$

(1)

$$\begin{align}& X_{en}^{(l)} = max\left(f\left(W_{en}^{(l)}\ast X_{en}^{(l-1)}+b_{en}^{(l)}\right) \right),l=2,...,N_{en} \end{align}$$

(2)

where |$^{\prime}\ast ^{\prime}$| represents the convolution operation. |$W_{en}^{(l)}$| and |$b_{en}^{(l)}$| are the weight matrix and bias vector of the |$l$|-th convolutional layer, respectively. |$N_{en}$| represents the number of encoder layers. |$f(.)$| represents the nonlinear activation function |$ReLU$| [42], and |$max(.)$| represents the max-pooling operation, which downsamples each feature map after convolution to preserve the more important features.
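One encoder layer of the form in Equations (1) and (2) can be sketched without any deep learning library. The sketch assumes a single input channel and a single filter, a padding width of 1 and a |$2 \times 2$| pooling window; none of these values is given in the text, and the kernel is hypothetical. As in common CNN libraries, the convolution is computed as a cross-correlation, i.e. the kernel is not flipped.

```python
def zero_pad(x, pad):
    """Surround the matrix with `pad` rows and columns of zeros."""
    width = len(x[0]) + 2 * pad
    body = [[0.0] * pad + list(row) + [0.0] * pad for row in x]
    border = [[0.0] * width for _ in range(pad)]
    return border + body + [row[:] for row in border]


def conv2d(x, kernel, bias=0.0):
    """Valid 2D convolution (cross-correlation) with stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[bias + sum(kernel[a][b] * x[r + a][c + b]
                        for a in range(kh) for b in range(kw))
             for c in range(out_w)] for r in range(out_h)]


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def max_pool(x, size=2):
    """Non-overlapping max pooling; incomplete windows at the border are dropped."""
    return [[max(x[r + a][c + b] for a in range(size) for b in range(size))
             for c in range(0, len(x[0]) - size + 1, size)]
            for r in range(0, len(x) - size + 1, size)]


def encoder_layer(x, kernel, bias=0.0, pad=1):
    """max(f(W * X + b)): zero padding, convolution, ReLU, then max pooling."""
    return max_pool(relu(conv2d(zero_pad(x, pad), kernel, bias)))


# Hypothetical 2 x 4 input and 2 x 2 kernel, for illustration only.
toy_X = [[1, 0, 2, 1],
         [0, 1, 1, 0]]
toy_kernel = [[1, 0],
              [0, -1]]
encoded = encoder_layer(toy_X, toy_kernel)
```

The padded input is |$4 \times 6$|, the convolution output is |$3 \times 5$| and pooling reduces it to |$1 \times 2$|, which illustrates how the encoder compresses the pairwise topology matrix.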

Decoder. The framework based on a bilayer transposed CNN is used to reconstruct the neighbor topology feature matrix |$\hat{X}$| of |${m_{i}}-{d_{j}}$|⁠. To obtain a better |$\hat{X}$|⁠, we project it back to the original feature space to calculate the error between |$\hat{X}$| and the original neighbor topology feature matrix |$X$|⁠. The reconstructed feature map for each hidden layer is expressed as follows:

$$\begin{align}& {\hat X}_{de}^{(1)} = \sigma\left(W_{de}^{(1)} \cdot{X_{en}^{(N_{en})}}+b_{de}^{(1)}\right) \end{align}$$

(3)

$$\begin{align}& {\hat X}_{de}^{(l)} = \sigma\left(W_{de}^{(l)}\cdot{\hat X}_{de}^{(l-1)}+b_{de}^{(l)}\right),l=2,...,N_{de} \end{align}$$

(4)

where |$^{\prime}\cdot ^{\prime}$| represents the transposed convolution operation, |$\sigma (.)$| is the nonlinear activation function |$LeakyReLU$| and |$N_{de}$| is the number of decoder layers. |$W_{de}^{(l)}$| and |$b_{de}^{(l)}$| are the weight matrix and bias vector of the |$l$|-th hidden layer in the decoder, respectively.
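A decoder layer of the form in Equations (3) and (4) can be sketched in the same style. The stride of 2, the negative slope of 0.01 and the kernel values are assumptions made for illustration; the text does not specify them.

```python
def transposed_conv2d(x, kernel, stride=2, bias=0.0):
    """Transposed convolution: every input value scatters a scaled copy of the kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(x) - 1) * stride + kh
    out_w = (len(x[0]) - 1) * stride + kw
    out = [[bias] * out_w for _ in range(out_h)]
    for r, row in enumerate(x):
        for c, value in enumerate(row):
            for a in range(kh):
                for b in range(kw):
                    out[r * stride + a][c * stride + b] += value * kernel[a][b]
    return out


def leaky_relu(x, slope=0.01):
    return [[v if v > 0 else slope * v for v in row] for row in x]


def decoder_layer(x, kernel, stride=2, bias=0.0):
    """sigma(W . X + b) with a transposed convolution and LeakyReLU."""
    return leaky_relu(transposed_conv2d(x, kernel, stride, bias))


# Hypothetical 1 x 2 code and 2 x 2 kernel, for illustration only.
toy_code = [[0.0, 2.0]]
toy_kernel = [[1.0, -1.0],
              [0.5, 1.0]]
decoded = decoder_layer(toy_code, toy_kernel)
```

The |$1 \times 2$| code is upsampled to a |$2 \times 4$| matrix, the reverse of the size reduction performed by the encoder.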

Generator loss. When training the generator, we hope that the reconstructed neighbor topology feature matrix |$\hat{X}$| is as close to the original |$X$| as possible, so that it deceives the discriminator. In that case, the discriminator should give |$\hat{X}$| a higher score. The loss of the generator, |$\ell _G$|, is calculated as follows:

$$\begin{align}& \ell_G=\mathbb{E}_{X\sim{P_{data}},\hat{X}\sim{G({m_{i}},{d_{j}};{\theta}_G)}}-log D(\hat{X};\theta_D)+\lambda_G\parallel{\theta_G}{\parallel^{2}_{2}} \end{align}$$

(5)

where |$P_{data}$| represents the probability distribution of miRNA and disease node pairs in the miRNA–disease association network. |$\mathbb{E}_{X\sim{P_{data}}}$| indicates that a sample pair |${m_{i}}-{d_{j}}$| is drawn from this distribution and its neighbor topology feature matrix |$X$| is constructed. Similarly, |$\mathbb{E}_{\hat{X}\sim{G({m_{i}},{d_{j}};{\theta }_G)}}$| indicates that the reconstructed neighbor topology feature matrix |$\hat{X}$| of |${m_{i}}-{d_{j}}$| is generated by the generator. We use the Adam optimization algorithm to minimize |$\ell _G$| and update the parameter set |${\theta }_G$| of the generator. To avoid overfitting of the generator, an |$l{_2}$| norm constraint is added on the parameter set |${\theta }_G$|. |$\lambda _G$| is a parameter that balances the generator training loss and the regularization term.
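The generator loss in Equation (5) can be written as a short function once the discriminator scores of the reconstructed samples are available. The sketch approximates the expectation by a mini-batch average and treats the generator parameters as a flat list; both choices, as well as the numbers below, are assumptions made for illustration.

```python
import math


def generator_loss(d_scores_fake, theta_g, lambda_g):
    """-log D(X_hat) averaged over a batch, plus an l2 penalty on the generator parameters."""
    adversarial = -sum(math.log(s) for s in d_scores_fake) / len(d_scores_fake)
    l2_penalty = lambda_g * sum(w * w for w in theta_g)
    return adversarial + l2_penalty


# Hypothetical discriminator scores for two reconstructed samples and toy parameters.
loss_g = generator_loss([0.5, 0.25], theta_g=[1.0, -2.0], lambda_g=0.01)
```

The adversarial term shrinks as the discriminator assigns higher scores to the reconstructed samples, which is the behaviour the generator is trained to achieve.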

Discriminator based on multi-layer convolution neural networks. Given any node pair |${m_{i}}-{d_{j}}$|, its neighbor topology feature matrix |$X$| is regarded as a true sample, whereas |$\hat{X}$| generated by the CNN-based autoencoder is regarded as a false sample. The discriminator attempts to distinguish between the true sample |$X$| and the false sample |$\hat{X}$|, thereby promoting the autoencoder to obtain a better neighbor topology representation of |$m_i$| and |$d_j$|. The discriminator essentially evaluates the possibility that |$X$| and |$\hat{X}$| are true or false samples. Specifically, for the true sample |$X$|, the discriminator should give a high score, and for the false sample |$\hat{X}$| generated by the generator, the discriminator should output a lower score. Let |$H_{dis}^{(l)}$| be the output of the hidden layer after the |$l$|-th layer convolution and pooling in the discriminator.

$$\begin{align}& {H}_{dis}^{(1)} = \sigma\left(W_{dis}^{\left(1\right)} \ast{X}+b_{dis}^{\left(1\right)}\right) \end{align}$$

(6)

$$\begin{align}& {H}_{dis}^{(l)} = \sigma\left(W_{dis}^{(l)} \ast{H_{dis}^{(l-1)}}+b_{dis}^{(l)}\right),l=2,...,N_{dis} \end{align}$$

(7)

where |$W_{dis}^{(l)}$| and |$b_{dis}^{(l)}$| are the weight matrix and bias vector of the |$l$|-th layer, respectively. |$N_{dis}$| is the number of hidden layers in the discriminator, and |$\sigma (.)$| is the activation function |$LeakyReLU$|. The output of the last hidden layer, |$H_{dis}^{(N_{dis})}$|, is fed into the fully connected layer to obtain the score distribution over the true and false classes, |$D(X;\theta _D)$|.

$$\begin{align}& D\left(X;\theta_D\right)=softmax{\left(W_D{H}_{dis}^{\left(N_{dis}\right)}+b_D\right)} \end{align}$$

(8)

where |$\theta _D$| denotes the set of parameters of the discriminator model. |$W_D$| and |$b_D$| are the weight matrix and bias vector of the fully connected layer of the discriminator, respectively.

For the discriminator to distinguish the possibility of a pair of true and false samples, there are two possible inputs as follows:

Case 1: neighbor topology feature matrix |$X$| in miRNA–disease association network. The node pair |${m_{i}}-{d_{j}}$| is selected from miRNA–disease association network, and the neighbor topology feature matrix is defined as |$X$|⁠. We anticipate that the discriminator outputs a high score after the input |$X$|⁠, defining the loss as |$\ell _{D}^{(1)}$|⁠.

$$\begin{align}& \ell_{D}^{(1)}=\mathbb{E}_{X\sim{P_{data}}}-log D(X;\theta_D) \end{align}$$

(9)

Case 2: Reconstructed neighbor topology feature matrix generated by the generator |$\hat{X}$|⁠. The neighbor topology matrix |$X$| of the |${m_{i}}-{d_{j}}$| node pair is fed into the generator, and the reconstructed neighbor topology matrix |$\hat{X}$| is obtained. |$\hat{X}$| mimics the |$X$| neighbor topology information, and thus |$\hat{X}$| is very close to |$X$|⁠. When the discriminator inputs |$\hat{X}$|⁠, we aim to obtain a lower score; therefore, we define the loss of this part as |$\ell _{D}^{(2)}$|⁠.

$$\begin{align}& \ell_{D}^{(2)}=\mathbb{E}_{X\sim{P_{data}},\hat{X}\sim{G(m_{i},d_{j};\theta_G)}}-log \left(1-D\left(\hat{X};\theta_D\right)\right) \end{align}$$

(10)

The neighbor topology matrix |$\hat{X}$| of the false sample is produced by the generator, whose decoder applies transposed convolution to the encoder output, so that its dimensions are consistent with those of |$X$|. The optimization objective function of the discriminator is |$\ell _D$|.

$$\begin{equation} \min\limits_{\theta_D}\ell_D\left(\theta_D\right)=\ell_{D}^{\left(1\right)}+\ell_{D}^{\left(2\right)}+\lambda_D\parallel{\theta_D}{\parallel^{2}_{2}} \end{equation}$$

(11)

The 3rd term is a regularization term that reduces overfitting of the model, and |$\lambda _D$| is a trade-off parameter that balances the discriminator loss and the regularization term. We use the Adam optimization algorithm to minimize |$\ell _D$| and update the parameter set |$\theta _D$| of the discriminator.
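The discriminator objective in Equations (9)–(11) can be sketched in the same way. The expectations are again approximated by mini-batch averages, the parameters are a flat list, and all numbers are hypothetical.

```python
import math


def discriminator_loss(d_scores_true, d_scores_fake, theta_d, lambda_d):
    """Sum of the true-sample loss, the false-sample loss and the l2 penalty."""
    loss_true = -sum(math.log(s) for s in d_scores_true) / len(d_scores_true)        # Equation (9)
    loss_fake = -sum(math.log(1.0 - s) for s in d_scores_fake) / len(d_scores_fake)  # Equation (10)
    l2_penalty = lambda_d * sum(w * w for w in theta_d)
    return loss_true + loss_fake + l2_penalty                                        # Equation (11)


# A confident discriminator scores true samples high and false samples low.
confident = discriminator_loss([0.9, 0.8], [0.2, 0.1], theta_d=[0.5, -0.5], lambda_d=0.1)
# An undecided discriminator scores everything 0.5.
undecided = discriminator_loss([0.5, 0.5], [0.5, 0.5], theta_d=[0.5, -0.5], lambda_d=0.1)
```

With identical parameters, the confident discriminator obtains the lower loss, so minimizing |$\ell _D$| pushes the scores of |$X$| up and the scores of |$\hat{X}$| down.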

At the end of the optimization process, |${\left (X_{en}^{\left (N_{en}\right )}\right )}_1$| and |${\left (X_{en}^{\left (N_{en}\right )}\right )}_2$|, the 1st and 2nd rows of the encoder output |$X_{en}^{\left (N_{en}\right )}$|, are taken as the neighbor topology representations of the nodes |$m_i$| and |$d_j$|, respectively.

Table 1

Results of ablation experiments on our method

GAN	FC-attention	Average AUC	Average AUPR
✗	✓	0.882	0.197
✓	✗	0.917	0.237
✓	✓	0.928	0.250

The bold values indicate highest AUC and AUPR.

2.3.2 Attention mechanism at the feature category level

The more consistent the families and clusters to which miRNAs |$m_i$| and |$m_j$| belong, the more likely they are to be associated with similar diseases. Therefore, it is necessary to integrate the neighbor topology representation of |$m_i$| with its family and cluster information. The neighbor topology features and the family and cluster features of |$m_i$| contribute differently to the prediction of miRNA–disease associations. Therefore, we construct a feature category-level attention module (Figure 3B). |${\left (X_{en}^{\left (N_{en}\right )}\right )}_1$| is considered the 1st type of feature vector and is denoted as |$y_1$|. |$C_i$| describes the families and clusters to which |$m_i$| belongs, which is regarded as the 2nd type of feature vector and is denoted as |$y_2$|. The attention score of the |$i$|-th category feature vector is |$S_i$|.

$$\begin{align}& S_i=h_{att}tanh\left(W_{att}y_i+b_{att} \right) \end{align}$$

(12)

$$\begin{align}& \alpha_i=\frac{exp\left(S_i\right)}{\sum\nolimits_{j=1}^{2} exp\left(S_j\right)} \end{align}$$

(13)

where |$W_{att}$| is the weight matrix, |$b_{att}$| is the bias vector, |$h_{att}$| is the weight vector, |$exp\left (.\right )$| is the exponential function and |$\alpha _i$| is the normalized attention weight obtained from |$S_i$|. Finally, we obtain |$z_1$|, the weighted sum of the two types of feature vectors of |$m_i$|:

$$\begin{align}& z_1=\sum\limits_{i}\alpha_iy_i \end{align}$$

(14)
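Equations (12)–(14) can be sketched as one small function. The sketch assumes that |$y_1$| and |$y_2$| have already been brought to the same length, so that they can share |$W_{att}$| and be summed; the text does not state how the lengths are aligned. All parameter values are hypothetical, and the maximum score is subtracted before exponentiation only for numerical stability, which does not change |$\alpha _i$|.

```python
import math


def category_attention(features, W_att, b_att, h_att):
    """Return the attention weights alpha and the fused vector z for a list of feature vectors."""
    scores = []
    for y in features:
        hidden = [math.tanh(sum(w * v for w, v in zip(row, y)) + bias)
                  for row, bias in zip(W_att, b_att)]
        scores.append(sum(h * t for h, t in zip(h_att, hidden)))  # Equation (12)
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]                            # Equation (13)
    z = [sum(a * y[k] for a, y in zip(alphas, features))
         for k in range(len(features[0]))]                        # Equation (14)
    return alphas, z


# Hypothetical feature vectors and attention parameters.
y1 = [1.0, 0.0]  # neighbor topology feature of m_i
y2 = [0.0, 1.0]  # family and cluster feature of m_i
alphas, z1 = category_attention([y1, y2],
                                W_att=[[1.0, 0.0], [0.0, 1.0]],
                                b_att=[0.0, 0.0],
                                h_att=[1.0, -1.0])
```

In this toy setting the topology feature receives the larger weight, and |$z_1$| is the corresponding convex combination of the two vectors.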

Table 2

AUCs of GMDA and other methods for all the diseases and 15 well-characterized diseases

Disease name	AUC
	GMDA	Liu’s method	PBMDA	DMPred	GSTRW	NCMCMDA	DBNMDA	AEMDA
Average AUC on 341 diseases	0.928	0.891	0.857	0.890	0.807	0.905	0.907	0.916
Breast neoplasms	0.991	0.920	0.906	0.974	0.837	0.983	0.982	0.990
Hepatocellular carcinoma	0.985	0.929	0.910	0.931	0.791	0.967	0.974	0.973
Glioma	0.957	0.914	0.882	0.835	0.786	0.928	0.940	0.944
Acute myeloid leukemia	0.948	0.910	0.885	0.963	0.796	0.937	0.968	0.920
Lung neoplasms	0.985	0.906	0.862	0.944	0.813	0.947	0.955	0.960
Melanoma	0.975	0.893	0.849	0.910	0.758	0.954	0.962	0.969
Osteosarcoma	0.967	0.897	0.860	0.985	0.771	0.968	0.961	0.987
Ovarian neoplasms	0.972	0.918	0.888	0.967	0.844	0.955	0.968	0.970
Pancreatic neoplasms	0.966	0.902	0.879	0.821	0.833	0.904	0.898	0.963
Alzheimer disease	0.925	0.875	0.833	0.922	0.816	0.897	0.901	0.924
Carcinoma, renal cell	0.946	0.900	0.856	0.791	0.786	0.935	0.799	0.956
Diabetes mellitus, type 2	0.963	0.905	0.870	0.936	0.870	0.898	0.951	0.957
Glioblastoma	0.945	0.889	0.849	0.894	0.759	0.912	0.930	0.941
Heart failure	0.938	0.909	0.884	0.962	0.814	0.899	0.943	0.945
Atherosclerosis	0.929	0.911	0.891	0.955	0.824	0.961	0.959	0.924

The bold values indicate the highest AUCs.

Table 3

GMDA and other methods for the area under the PR curve of all the diseases and 15 well-characterized diseases

Disease name	AUPR
	GMDA	Liu’s method	PBMDA	DMPred	GSTRW	NCMCMDA	DBNMDA	AEMDA
Average AUPR on 341 diseases	0.250	0.099	0.090	0.086	0.040	0.166	0.187	0.219
Breast neoplasms	0.915	0.725	0.718	0.800	0.389	0.812	0.821	0.903
Hepatocellular carcinoma	0.886	0.750	0.767	0.715	0.482	0.831	0.845	0.877
Glioma	0.439	0.436	0.390	0.175	0.225	0.312	0.210	0.427
Acute myeloid leukemia	0.332	0.410	0.385	0.466	0.123	0.358	0.369	0.478
Lung neoplasms	0.792	0.596	0.562	0.620	0.370	0.685	0.741	0.768
Melanoma	0.602	0.524	0.483	0.366	0.205	0.493	0.512	0.581
Osteosarcoma	0.499	0.374	0.357	0.620	0.180	0.486	0.604	0.621
Ovarian neoplasms	0.669	0.556	0.528	0.366	0.395	0.480	0.486	0.637
Pancreatic neoplasms	0.583	0.485	0.458	0.569	0.333	0.824	0.531	0.519
Alzheimer disease	0.235	0.143	0.136	0.351	0.086	0.218	0.359	0.361
Carcinoma, renal cell	0.341	0.354	0.314	0.206	0.136	0.254	0.293	0.322
Diabetes mellitus, type 2	0.375	0.302	0.259	0.398	0.132	0.399	0.401	0.384
Glioblastoma	0.380	0.364	0.346	0.284	0.162	0.293	0.318	0.372
Heart failure	0.397	0.348	0.301	0.393	0.135	0.262	0.289	0.341
Atherosclerosis	0.276	0.298	0.306	0.309	0.084	0.289	0.310	0.287

The bold values indicate the highest AUPRs.

2.3.3 Predictive score evaluation based on convolution neural network

The neighbor topology representation |${\left (X_{en}^{\left (N_{en}\right )}\right )}_2$| associated with |$d_j$| is named |$z_2$|. |$z_1$| and |$z_2$| are stacked vertically to form an enhanced low-dimensional feature matrix |$Z$| of the pair |${m_{i}}-{d_{j}}$|. To learn the marginal information of |$Z$|, zero padding is applied before |$Z$| is fed into the convolutional neural network module. For the convolution layer, if the length of a filter is |$n_l$|, its width is |$n_w$| and the number of filters is |$n_{conv}$|, then the filters |$W \in{\mathbb{R}^{{n_w} \times{n_l} \times{n_{conv}}}}$| can be applied to the low-dimensional feature matrix. After |$Z$| passes through two convolution and pooling layers, the feature map |$P \in{\mathbb{R}^{2 \times \left ({N_m}+{N_d}-{n_l}+1\right ) \times{n_{conv}}}}$| is obtained. The flattened vector |$h_0$| is then input into the fully connected and |$softmax$| layers to obtain the final binary evaluation result |$score$|:

$$\begin{align}& score=softmax\left(W_{sco}h_0+b_{sco}\right) \end{align}$$

(15)

where |$W_{sco}$| and |$b_{sco}$| are the weight matrix and bias vector of the fully connected layer, respectively. |$score_1$| is the predicted score that |$m_i$| and |$d_j$| are associated, and |$score_2$| is the predicted score that they are not associated.
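Equation (15) can be sketched in a few lines of pure Python, under the assumption that the fully connected layer has two output units (one per class). The function name `association_score` and the toy weights below are hypothetical and serve only to make the computation concrete.

```python
import math

def association_score(h0, W_sco, b_sco):
    """Return softmax(W_sco h0 + b_sco) as [score_1, score_2] (Eq. 15)."""
    logits = [sum(w * h for w, h in zip(row, h0)) + b
              for row, b in zip(W_sco, b_sco)]
    top = max(logits)                      # shift by the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy flattened feature vector and layer parameters (assumed values).
h0 = [0.2, -0.4, 1.0, 0.5]
W_sco = [[0.6, -0.1, 0.9, 0.3],    # row producing the "associated" logit
         [-0.2, 0.4, -0.7, 0.1]]   # row producing the "not associated" logit
b_sco = [0.0, 0.0]
score = association_score(h0, W_sco, b_sco)
```

The two outputs sum to one, so ranking candidates by the first output is equivalent to ranking them by the difference between the two logits.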

3 Experimental results and discussions

3.1 Parameter setting

In GMDA, window sizes of |$3 \times 17$| and |$1 \times 2$| were applied to the convolution layer and the max-pooling layer, respectively. The transposed convolution used a window size of |$3 \times 18$|. The encoding part of the generator used 30 and 60 filters in the 1st and 2nd layers, respectively, whereas the decoding part used 60 and 30 filters. In the association score evaluation module, both convolutional layers used filters of size |$2\times 2$|, with 40 and 20 filters in the 1st and 2nd layers, respectively, and the max-pooling layers used windows of size |$2\times 2$|. We analyzed the time complexity of GMDA; the analysis, together with the training and testing times, is listed in Supplementary Table 4. In addition, we evaluated the sensitivity coefficients [43] of GMDA’s parameters by changing one parameter while fixing the remaining ones. The numbers of encoding layers |$L_{en}$| and decoding layers |$L_{de}$| were selected from {1, 2, 3}. For the area under the receiver operating characteristic (ROC) curve (AUC), the sensitivity coefficients of |$L_{en}$| and |$L_{de}$| were 0.04863 and 0.01790, respectively. For the area under the precision–recall curve (AUPR), the corresponding values were 0.09905 and 0.07145. This indicates that GMDA was not sensitive to variations of |$L_{en}$| and |$L_{de}$| in terms of AUC, but was slightly sensitive to them in terms of AUPR. The experimental results are given in Supplementary Table 5. We used the PyTorch library to train and optimize the neural network parameters, and a GPU (NVIDIA GeForce GTX 1080Ti) to speed up training.

3.2 Performance evaluation metrics

Five-fold cross-validation was performed to evaluate the performance of GMDA and the other models. All known miRNA–disease associations were taken as positive samples and randomly divided into five subsets. An equal number of unobserved miRNA–disease pairs was randomly selected as negative samples and divided into five negative subsets. In each fold, four positive and four negative subsets were used for training, and the remaining subsets were used for testing. The prediction scores of all test samples were sorted in descending order; the higher the positive samples are ranked, the better the performance of a model.
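The splitting procedure described above can be sketched as follows. This is a minimal illustration that follows the text literally (equal numbers of positive and negative pairs, five subsets each); the function name `five_fold_splits`, the fixed seed and the toy pair lists are assumptions, not details of the authors' code.

```python
import random

def five_fold_splits(positives, unobserved, n_folds=5, seed=0):
    """Yield (train, test) lists of ((miRNA, disease), label) tuples."""
    rng = random.Random(seed)
    pos = list(positives)
    rng.shuffle(pos)
    # Sample as many unobserved pairs as there are known associations.
    neg = rng.sample(list(unobserved), len(pos))
    pos_folds = [pos[i::n_folds] for i in range(n_folds)]
    neg_folds = [neg[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        # The k-th positive and negative subsets form the test set.
        test = [(p, 1) for p in pos_folds[k]] + [(p, 0) for p in neg_folds[k]]
        # The remaining four subsets of each kind form the training set.
        train = [(p, 1) for j in range(n_folds) if j != k for p in pos_folds[j]]
        train += [(p, 0) for j in range(n_folds) if j != k for p in neg_folds[j]]
        yield train, test

# Toy data: 20 "known" pairs and 80 "unobserved" pairs (hypothetical indices).
positives = [(m, d) for m in range(10) for d in range(2)]
unobserved = [(m, d) for m in range(10) for d in range(2, 10)]
splits = list(five_fold_splits(positives, unobserved))
```

Each known association appears in exactly one test fold, so every positive sample is scored once by a model that did not see it during training.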

Figure 4

GMDA and other methods for the area under ROC and PR curves of all diseases.

Table 4

Paired Wilcoxon test results between GMDA and the other seven methods in terms of AUC and AUPR

|$p$|-value between GMDA and other methods	Liu’s method	PBMDA	DMPred	GSTRW	NCMCMDA	DBNMDA	AEMDA
|$p$|-value of AUC	4.2496e-41	9.8477e-32	7.3596e-37	6.5331e-20	6.3647e-22	5.6742e-18	3.1421e-12
|$p$|-value of AUPR	1.2063e-13	1.988e-22	6.5496e-8	1.4698e-15	3.0124e-16	2.1854e-10	1.7412e-11

Given a threshold |$\lambda $|, a sample was considered positive if its predicted association score was greater than |$\lambda $|, and negative otherwise. By varying |$\lambda $|, the corresponding true positive rates (TPRs) and false positive rates (FPRs) were calculated as follows:

$$\begin{align}& TPR=\frac{TP}{TP+FN}, FPR=\frac{FP}{TN+FP} \end{align}$$

(16)

where |$TP$| and |$TN$| are the numbers of correctly identified positive and negative samples, respectively, |$FN$| is the number of positive samples misidentified as negative and |$FP$| is the number of negative samples misidentified as positive. The ROC curve was drawn according to these values, and the AUC was used as the criterion for evaluating the performance.
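To make the threshold sweep concrete, the sketch below computes the (FPR, TPR) points of Equation (16) and the trapezoidal area under them. The helper names `roc_points` and `auc` and the six toy scores are illustrative assumptions; the paper does not state how its curves were computed.

```python
def roc_points(scores, labels):
    """Sweep the threshold lambda and return the (FPR, TPR) points of Eq. (16)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Thresholds: every distinct score, then -inf so that all samples become positive.
    thresholds = sorted(set(scores), reverse=True) + [float("-inf")]
    points = []
    for lam in thresholds:
        # A sample is predicted positive when its score is greater than lambda.
        tp = sum(1 for s, y in zip(scores, labels) if s > lam and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > lam and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy predictions: label 1 marks a known association, 0 an unobserved pair.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
area = auc(points)
```

In this toy ranking, 8 of the 9 positive-negative pairs are ordered correctly, so the area equals 8/9.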

In our dataset, the ratio of positive to negative miRNA–disease samples was approximately |$1/33$|, indicating an imbalanced distribution. For such imbalanced data, the AUPR is more informative than the AUC [44]. |$Precision$| is the proportion of correctly identified positive samples among all samples predicted as positive, and |$Recall$| is the same as TPR. They are defined as follows:

$$\begin{align}& Precision=\frac{TP}{TP+FP}, Recall=\frac{TP}{TP+FN} \end{align}$$

(17)

The AUPR was also used to evaluate prediction performance.
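The following sketch evaluates Equation (17) at a single threshold and then summarizes the whole precision-recall curve with average precision, which is one common estimator of the AUPR. Whether the authors used this estimator is not stated, so treat it as an assumption; the function names and toy data are likewise hypothetical, and tied scores are not handled.

```python
def precision_recall(scores, labels, lam):
    """Precision and recall (Eq. 17) when scores greater than lam count as positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s > lam and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > lam and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s <= lam and y == 1)
    # Convention assumed here: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn)
    return precision, recall

def average_precision(scores, labels):
    """Mean of the precision values at the rank of each positive sample."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / rank   # precision at the moment recall increases
    return total / sum(labels)

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
prec, rec = precision_recall(scores, labels, lam=0.65)
ap = average_precision(scores, labels)
```

Unlike the FPR, precision does not involve the true negatives, which is why this curve stays informative when negatives greatly outnumber positives.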

Moreover, biologists often select the top-ranked candidate miRNAs from the prediction results for further biological experiments. It is therefore more useful to biologists when more positive samples appear among the top |$k$| predictions. We calculated the recall rate of the top |$k$| candidates, i.e. the ratio of positive samples ranked within the top |$k$| to all positive samples, as another criterion to evaluate performance.
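The top-|$k$| recall rate can be sketched for the candidates of a single disease as below. The function name `recall_at_k` and the toy scores are assumptions used only for illustration.

```python
def recall_at_k(scores, labels, k):
    """Fraction of all positive samples that are ranked within the top k."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = sum(y for _, y in ranked[:k])
    return hits / sum(labels)

# Toy candidate list for one disease: label 1 marks a true association.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
recall_top2 = recall_at_k(scores, labels, 2)
recall_top4 = recall_at_k(scores, labels, 4)
```

Here two of the three positives are in the top 2 and all three are in the top 4, so the recall rises from 2/3 to 1 as |$k$| grows.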

3.3 Ablation experiments

We performed ablation experiments to validate the contributions of the major innovations, namely the GAN and the feature category-level attention (FC-attention). As shown in Table 1, the complete GMDA, i.e. the baseline with both GAN and FC-attention, achieved the best performance. Without GAN, the AUC and AUPR dropped by 4.6% and 5.3%, respectively, compared with the baseline. The primary reason is that the GAN enhances the feature representations of miRNA–disease node pairs, which may improve the prediction performance. The AUC and AUPR of the model without the FC-attention mechanism decreased by 1.1% and 1.3%, respectively, indicating that it is essential to deeply integrate the attributes of miRNA nodes.

Figure 5

Recall rates of all diseases under different top |$k$|⁠.

3.4 Comparison with other methods

GMDA was compared with state-of-the-art models for predicting disease-related miRNA candidates, including Liu’s method [13], PBMDA [12], DMPred [18], GSTRW [15], NCMCMDA [21], DBNMDA [32] and AEMDA [25]. To obtain the best performance of each compared method, we set its hyperparameters to the optimal values reported in its original paper. Our method and the compared methods were trained and tested on the same training and testing datasets.

As shown in Figure 4A, GMDA achieved the highest average AUC (AUC = 0.928 ± 0.0041) over all 341 diseases tested. It exceeded Liu’s method by 3.7%, PBMDA by 7.1%, DMPred by 3.8%, GSTRW by 12.1%, NCMCMDA by 2.3%, DBNMDA by 2.1% and AEMDA by 1.2%. In addition, we listed the performance of the eight methods on 15 well-characterized diseases. Because each of these diseases has more than 80 associated miRNAs, their prediction results better reflect the performance of a model. GMDA achieved the highest AUC for 10 of the 15 diseases (Table 2).

As shown in Figure 4B, the average AUPR of GMDA over the 341 diseases was higher than those of the other models (AUPR = 0.250 ± 0.0036). It exceeded Liu’s method, PBMDA, DMPred, GSTRW, NCMCMDA, DBNMDA and AEMDA by 15.1%, 16%, 16.4%, 21%, 8.4%, 6.3% and 3.1%, respectively. Among the 15 well-characterized diseases, GMDA achieved the highest AUPR for 10 (Table 3).

GMDA achieved the best AUC and AUPR because our prediction model, based on GANs, deeply integrates the neighbor topology of miRNA–disease node pairs and the attribute information of miRNA families and clusters. AEMDA achieved the 2nd-best performance (AUC = 0.916). It adopts a fully connected autoencoder to infer potential associations between miRNAs and diseases. DBNMDA achieved the 3rd-best performance (AUC = 0.907). Its prediction algorithm is based on a deep belief network, which also belongs to the class of deep learning algorithms. This indicates that deep learning-based algorithms can deeply integrate the complex relationships between miRNAs and diseases, and that constructing a deep learning-based prediction model is therefore helpful. Both the AUC and AUPR of NCMCMDA were higher than those of Liu’s method, PBMDA, DMPred and GSTRW. The AUPR of Liu’s method was slightly higher than that of PBMDA, and its AUC was 3.4% higher. The AUPR of DMPred was slightly lower than that of PBMDA, but its AUC was higher. The AUC and AUPR of GSTRW were lower than those of the other methods. The main reason is that it uses only the similarities of diseases and miRNAs and does not consider the important association information between them. This confirms that it is necessary to make full use of the multiple types of connections between miRNAs and diseases.

To further verify whether GMDA’s performance was significantly better than that of the other methods, we conducted paired Wilcoxon tests over the 341 diseases. As shown in Table 4, the |$p$|-values between GMDA and each of the other methods were less than 0.05 in terms of both AUC and AUPR. This statistical result shows that the performance of GMDA was significantly better than that of the other methods.
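For readers unfamiliar with the test, the sketch below implements a paired Wilcoxon signed-rank test with the normal approximation, applied to per-disease metric values of two methods. It is a simplified illustration: it omits continuity and tie corrections, the function name and the toy AUC values are hypothetical, and the authors' actual implementation is not described. In practice a library routine such as `scipy.stats.wilcoxon` would normally be used.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Return (W+, two-sided p-value) using the normal approximation."""
    diffs = [a - b for a, b in zip(x, y) if a != b]   # zero differences are dropped
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1        # average 1-based rank of the tied run
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided tail probability
    return w_plus, p_value

# Toy per-disease AUCs for two methods over 20 diseases (made-up numbers).
method_a = [0.90 + 0.001 * i for i in range(20)]
method_b = [v - 0.01 - 0.0005 * i for i, v in enumerate(method_a)]
w_plus, p_value = wilcoxon_signed_rank(method_a, method_b)
```

Because the test uses only the signs and ranks of the paired differences, it does not assume that the per-disease AUC or AUPR differences are normally distributed.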

In addition, the higher the recall rate among the top |$k$| candidate miRNAs, the more disease-related miRNAs are correctly identified. The average recall rates of the eight methods in the top |$k$| candidates over the 341 diseases are shown in Figure 5. Under different values of |$k$|, the recall rates of GMDA were generally higher than those of the other methods. GMDA ranked 20.3% of the positive samples in the top 10, 45% in the top 30, 74.2% in the top 60 and 90.6% in the top 90. AEMDA also achieved decent recall rates, ranking 20.5%, 44.9%, 74.2% and 90.4% in the top 10, 30, 60 and 90, respectively. The recall rates of DMPred were very close to those of DBNMDA. DMPred ranked 18.9%, 44.7%, 67.4% and 79.1% in the top 10, 30, 60 and 90, and DBNMDA was slightly higher, with 19.9%, 44.8%, 69.1% and 80.2%. NCMCMDA ranked 18.4%, 45.2%, 68.7% and 79.8%. The recall rates of Liu’s method were higher than those of PBMDA: the former ranked 17.5%, 40.2%, 59.5% and 70.5%, and the latter 15.8%, 38.8%, 58.1% and 68.3%, in the top 10, 30, 60 and 90, respectively. The recall rates of GSTRW were the lowest, at 10.1%, 21.8%, 37.9% and 48.4%.

3.5 Case studies: lung neoplasms, breast neoplasms and pancreatic neoplasms

To further confirm that GMDA can detect high-quality potential disease-related candidate miRNAs, we performed case studies on lung tumors, breast tumors and pancreatic tumors. We used lung tumor as an example, and the results of the top 50 candidate miRNAs are given in Table 5. In addition, we listed the top 50 candidates for breast and pancreatic tumors in Supplementary Tables 1 and 2, respectively.

Table 5

The top 50 miRNA candidates related to lung neoplasms

Rank	MiRNA name	Evidence	Rank	MiRNA name	Evidence
1	hsa-mir-614	dbDEMC,miRCancer	26	hsa-mir-608	dbDEMC
2	hsa-mir-610	dbDEMC,miRCancer	27	hsa-mir-199a	dbDEMC,PhenomiR
3	hsa-mir-203b	dbDEMC	28	hsa-mir-22	dbDEMC,PhenomiR,miRCancer
4	hsa-mir-216b	dbDEMC,miRCancer	29	hsa-mir-376a	dbDEMC,PhenomiR,miRCancer
5	hsa-mir-378d	dbDEMC	30	hsa-mir-708	dbDEMC,miRCancer
6	hsa-mir-137	dbDEMC,PhenomiR,miRCancer	31	hsa-mir-410	dbDEMC,miRCancer
7	hsa-mir-574	dbDEMC,PhenomiR	32	hsa-mir-223	dbDEMC,PhenomiR,miRCancer
8	hsa-mir-517a	dbDEMC	33	hsa-mir-219	dbDEMC,PhenomiR,miRCancer
9	hsa-mir-190b	dbDEMC	34	hsa-mir-148a	dbDEMC,PhenomiR,miRCancer
10	hsa-mir-23b	dbDEMC,PhenomiR,miRCancer	35	hsa-mir-203	dbDEMC,PhenomiR,miRCancer
11	hsa-mir-32	dbDEMC,PhenomiR,miRCancer	36	hsa-mir-361	dbDEMC,PhenomiR,miRCancer
12	hsa-mir-187	dbDEMC,PhenomiR,miRCancer	37	hsa-mir-19	Literature [48]
13	hsa-mir-10b	dbDEMC,PhenomiR,miRCancer	38	hsa-mir-374a	dbDEMC,PhenomiR,miRCancer
14	hsa-mir-29a	dbDEMC,PhenomiR,miRCancer	39	hsa-mir-106b	dbDEMC,PhenomiR,miRCancer
15	hsa-mir-500	dbDEMC	40	hsa-mir-302a	dbDEMC,PhenomiR,miRCancer
16	hsa-mir-15b	dbDEMC,PhenomiR,miRCancer	41	hsa-mir-30d	dbDEMC,PhenomiR,miRCancer
17	hsa-mir-663	dbDEMC,miRCancer	42	hsa-mir-15a	dbDEMC,PhenomiR,miRCancer
18	hsa-mir-93	dbDEMC,PhenomiR,miRCancer	43	hsa-mir-208a	dbDEMC,PhenomiR,miRCancer
19	hsa-mir-27b	dbDEMC,PhenomiR,miRCancer	44	hsa-mir-30b	dbDEMC,PhenomiR,miRCancer
20	hsa-mir-96	dbDEMC,PhenomiR,miRCancer	45	hsa-mir-222	dbDEMC,PhenomiR,miRCancer
21	hsa-mir-33b	dbDEMC,miRCancer	46	hsa-mir-302c	dbDEMC,PhenomiR
22	hsa-mir-429	dbDEMC,miRCancer	47	hsa-mir-326	dbDEMC,PhenomiR,miRCancer
23	hsa-mir-140	dbDEMC,PhenomiR,miRCancer	48	hsa-mir-381	dbDEMC,PhenomiR,miRCancer
24	hsa-mir-127	dbDEMC,PhenomiR,miRCancer	49	hsa-mir-20b	dbDEMC,PhenomiR
25	hsa-mir-720	dbDEMC	50	hsa-mir-141	dbDEMC,PhenomiR,miRCancer

First, Xie |$et$| |$al.$| [45] used text mining techniques to extract experimentally validated associations between miRNAs and cancers. The associations were further verified manually and included in the miRCancer database, which contained 9080 miRNA–disease associations, covering 57984 miRNAs and 196 cancers. As shown in Table 5, miRCancer included 38 candidates, indicating that these miRNAs are associated with regulatory disorders in lung tumors. Second, dbDEMC [46] is an integrated miRNA database designed to show the differential expression of miRNAs in human cancers. The database covers 2224 miRNAs and 36 cancer types. Similarly, the PhenomiR [47] database provides information about differentially regulated miRNA expression in diseases and other biological processes. dbDEMC included 49 candidate miRNAs and PhenomiR included 34 candidates, indicating that their expression is upregulated or downregulated in lung cancer tissues. One candidate miRNA, labeled ‘Literature’, was supported by the published literature: compared with normal tissues, its expression in lung tumors was confirmed to be dysregulated [48].

Among the top 50 candidate miRNAs for breast tumors (Supplementary Table 1), 33 were recorded by miRCancer, indicating that they are indeed associated with the disease. PhenomiR and dbDEMC included 25 and 49 candidate miRNAs, respectively, indicating that their expression differs significantly between normal tissues and breast tumors. Supplementary Table 2 lists the top 50 candidate miRNAs for pancreatic tumors, of which 24 were contained in miRCancer, and 45 and 30 were recorded by dbDEMC and PhenomiR, respectively. One candidate supported by the literature was abnormally expressed in pancreatic tumor tissues.

3.6 Prediction of novel miRNAs related to diseases

After training the prediction model with all known miRNA–disease associations, we applied it to predict candidate miRNAs for each disease. To train the model, we randomly selected unobserved miRNA–disease pairs (negative samples) equal in number to the known miRNA–disease associations (positive samples). The top-ranked 50 candidate miRNAs are listed in Supplementary Table 3, which may help biologists identify actual miRNA–disease associations through wet laboratory experiments.

4 Conclusions

We proposed GMDA, a model that learns from a bilayer heterogeneous network to predict potential miRNA–disease associations. The model captures the similarities between miRNA and disease nodes and, more importantly, the intra-relations in terms of known miRNA–disease associations and the family and cluster information related to miRNA nodes. The generator based on a convolutional autoencoder and the discriminator based on a multi-layer CNN learned and integrated the neighbor topology representation and the miRNA node attribute representation. The novel feature category-level attention mechanism assigned different weights to the two types of miRNA features for adaptive fusion. Comparison with state-of-the-art methods demonstrated the improved performance of GMDA for miRNA–disease association prediction. Case studies on three cancers further confirmed our model’s ability to identify potential candidate disease-related miRNAs. GMDA can be used as a prioritization tool to generate reliable candidates, facilitating biological wet laboratory experiments to identify real associations between miRNAs and diseases.

Key Points

• A bilayer heterogeneous network with node attributes is constructed, which benefits the extraction and representation of miRNA–disease relations, as well as the family and cluster attributes of miRNA nodes.
• A novel pairwise neighbor topology representation reveals the common neighbors of an miRNA and a disease, and a module based on a generative adversarial network enhances pairwise neighbor topology learning.
• Newly proposed node attributes of an miRNA show the families and clusters to which the miRNA belongs. A newly proposed attention mechanism at the feature category level distinguishes the importance of different features of miRNA nodes for weighted fusion.
• Improved predictive performance is confirmed by comprehensive evaluations, including comparison with five state-of-the-art models on a public dataset, a paired Wilcoxon test, recall rates on 341 diseases and case studies of three cancers.

Funding

Natural Science Foundation of China (61972135, 62172143); Natural Science Foundation of Heilongjiang Province (LH2019F049 and LH2019A029); China Postdoctoral Science Foundation (2019M650069, 2020M670939); Heilongjiang Postdoctoral Scientific Research Staring Foundation (BHLQ18104); Fundamental Research Foundation of Universities in Heilongjiang Province for Technology Innovation (KJCX201805); Innovation Talents Project of Harbin Science and Technology Bureau (2017RAQXJ094); Fundamental Research Foundation of Universities in Heilongjiang Province for Youth Innovation Team (RCYJTD201805); Foundation of Graduate Innovative Research (YJSCX2021-199HLJU).

Ping Xuan, PhD (Harbin Institute of Technology), is a professor at the School of Computer Science and Technology, Heilongjiang University, Harbin, China. Her current research interests include computational biology, complex network analysis and medical image analysis.

Dong Wang is studying for his master’s degree in the School of Computer Science and Technology at Heilongjiang University, Harbin, China. His research interests include complex network analysis and deep learning.

Hui Cui, PhD (The University of Sydney), is a lecturer at the Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia. Her research interests lie in data-driven and computerized models for biomedical and health informatics.

Tiangang Zhang, PhD (The University of Tokyo), is an associate professor at the School of Mathematical Science, Heilongjiang University, Harbin, China. His current research interests include complex network analysis and computational fluid dynamics.

Toshiya Nakaguchi, PhD (Sophia University), is a professor at the Center for Frontier Medical Engineering, Chiba University, Chiba, Japan. His current research interests include medical image processing, machine learning, image-guided surgery and biomedical measurement.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
