Chun-Chun Wang, Tian-Hao Li, Li Huang, Xing Chen, Prediction of potential miRNA–disease associations based on stacked autoencoder, Briefings in Bioinformatics, Volume 23, Issue 2, March 2022, bbac021, https://doi.org/10.1093/bib/bbac021
Abstract
In recent years, a growing number of biological experiments and scientific studies have demonstrated that microRNAs (miRNAs) play an important role in the development of complex human diseases. Therefore, discovering miRNA–disease associations can contribute to accurate diagnosis and effective treatment of diseases. Identifying miRNA–disease associations through computational methods based on biological data has proven to be low-cost and highly efficient. In this study, we proposed a computational model named Stacked Autoencoder for potential MiRNA–Disease Association prediction (SAEMDA). In SAEMDA, all the miRNA–disease samples were used to pretrain a Stacked Autoencoder (SAE) in an unsupervised manner. Then, the positive samples and the same number of selected negative samples were used to fine-tune the SAE in a supervised manner after adding an output layer with a softmax classifier to the SAE. SAEMDA can make full use of the feature information of all unlabeled miRNA–disease pairs. Therefore, SAEMDA is suitable for our dataset, which contains few labeled samples and many unlabeled samples. As a result, SAEMDA achieved AUCs of 0.9210 and 0.8343 in global and local leave-one-out cross validation, respectively. In addition, SAEMDA obtained an average AUC and standard deviation of 0.9102 ± 0.0029 over 100 repetitions of 5-fold cross validation. These results were better than those of previous models. Moreover, we carried out three case studies to further demonstrate the predictive accuracy of SAEMDA. As a result, 82% (breast neoplasms), 100% (lung neoplasms) and 90% (esophageal neoplasms) of the top 50 predicted miRNAs were verified by databases. Thus, SAEMDA could be a useful and reliable model to predict potential miRNA–disease associations.
Introduction
MicroRNAs (miRNAs) are a class of endogenous small noncoding RNAs with the length of about 20–24 nucleotides and they play important regulatory roles in cells [1]. Studies have indicated the involvement of miRNAs in a number of important life processes including cell growth [2], cell differentiation [3], cell proliferation [4] and cell death [5]. In addition, miRNAs have become a research hotspot in the field of biomedicine due to their vital roles in the occurrence and development of human diseases [6, 7]. A series of studies have confirmed that miRNAs are associated with various diseases [8–12]. For example, He et al. [13] demonstrated that abnormal expression of the mir-17-92 cluster can induce B cell lymphoma. In addition, miR-200c can inhibit the clonal expansion of breast cancer cells in vitro and suppress the tumor formation driven by human breast cancer stem cells in vivo [14]. Furthermore, the expression levels of miR-17-3p and miR-92 were significantly elevated in plasma of patients with colorectal cancer and the levels were significantly decreased after surgical removal of the primary tumors, indicating that these miRNAs can be used as biomarkers for the diagnosis of colorectal cancer [15]. Moreover, Hu et al. [16] found that the expression levels of miR-30d, miR-499, miR-486 and miR-1 from the serum were significantly related with overall survival of patients with non–small-cell lung cancer. In addition to miRNAs mentioned above, many other miRNAs are associated with the diagnosis, treatment and prognosis of human complex diseases [17–19]. Therefore, it is very important to identify associations between miRNAs and human complex diseases. The traditional biological experiment for identifying miRNA–disease associations was adopted in the early days, but the experimental period is long and the cost is high. 
Nowadays, with the increasing amount of available biological data, computational methods for miRNA–disease association prediction have emerged as auxiliary tools for the traditional experiments. They can effectively reduce the time and cost of the traditional experiments by directing experimental verification toward the highly probable associations predicted by computational models.
Over the past few years, based on the assumption that miRNAs with similar functions tend to be associated with similar diseases [20], researchers have developed a variety of miRNA–disease association prediction models, which can be divided into three categories [21]. The first type of prediction models is the score function–based models that used probability distributions or statistical analysis to establish score functions. For example, Chen et al. [22] developed a computational model called Within and Between Score for MiRNA–Disease Association prediction (WBSMDA). They defined two different types of functions to calculate Within-score and Between-score of miRNA–disease pair and integrated these two scores to obtain the final association score. Mørk et al. [23] further proposed a model called miRNA–Protein–Disease Association prediction (miRPD) to infer potential miRNA–disease associations. They defined miRNA–disease association scoring function based on miRNA–protein and protein–disease association scores. Here, protein was introduced as a mediator for the miRNA–disease inference.
The second type of prediction models is network algorithm–based models, which take advantage of miRNA and disease similarity from different perspectives. For example, Chen et al. [24] proposed a novel model named Random Walk with Restart for MiRNA–Disease Association (RWRMDA) by implementing random walk on the miRNA functional similarity network to prioritize candidate miRNAs for the disease of interest. In addition, Shi et al. [25] constructed a protein–protein interaction (PPI) network and further implemented random walk on the PPI network with the disease genes and miRNA targets as seed nodes to obtain two gene rank lists, respectively. Then, the miRNA–disease association can be identified through investigating the functional link between miRNA targets and disease genes based on the gene set enrichment analysis for the sets of miRNA targets and disease genes on above two gene lists, respectively. Later, a new computational model named MiRNAs associated with Diseases Prediction (MIDP) was developed by Xuan et al. [26]. MIDP adopts random walk algorithm in the miRNA similarity network to predict potential associated miRNAs for diseases, which have some known related miRNAs. For diseases without any known related miRNAs, they exploited miRNA similarity network, disease similarity network and known miRNA–disease associations to construct an miRNA–disease bilayer network. Then, they performed random walk on this bilayer network and thus the model could work for diseases without any known related miRNAs. Furthermore, Yu et al. [27] proposed a network information flow model to predict miRNA–disease associations. First, they established a MicroRNAome–phenome network by integrating miRNA–disease association network, disease semantic and phenotypic similarity network as well as miRNA functional similarity network. 
Then, for a given disease, the information flow leaving the candidate miRNA can be computed based on network information flow model and further used as the association score between the miRNA and the given disease. In addition, Chen et al. [28] presented a model called Heterogeneous Graph Inference for MiRNA-Disease Association prediction (HGIMDA). They defined association score of unlabeled miRNA–disease pair by summarizing all paths connecting investigated miRNA and disease with a length of three in miRNA–disease heterogeneous network. You et al. [29] developed the Path-Based MiRNA-Disease Association (PBMDA) predictive method. First, all paths connecting the investigated miRNA and disease with the length less than or equal to three were searched in miRNA–disease heterogeneous network. Then, the association score between investigated miRNA and disease can be computed based on the number of paths and the length of each path. Chen et al. [30] further proposed a model named Triple Layer Heterogeneous Network Based Inference for MiRNA-Disease Association prediction (TLHNMDA). This model built a triple layer heterogeneous network containing miRNA, disease and long noncoding RNA (lncRNA) nodes. Based on this triple layer network, an iterative equation was constructed to obtain the miRNA–disease correlation probability. Besides, Chen et al. [31] developed a model of Matrix Decomposition and Heterogeneous Graph Inference (MDHGI) for miRNA–disease association prediction. Firstly, the Sparse Learning Method (SLM) was used to reconstruct a new miRNA–disease association adjacency matrix. Then, a heterogeneous graph was built based on the reconstructed adjacency matrix, miRNA similarity matrix and disease similarity matrix. Lastly, an iterative equation was formulated to infer the correlation probability of miRNA–disease pairs.
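Several of the network-based models above rely on a random walk with restart over a similarity network. The following is a minimal sketch of that general technique, not the implementation of RWRMDA or MIDP: the restart probability of 0.2, the convergence tolerance and the 4-node toy similarity matrix are all assumptions chosen for illustration.

```python
import numpy as np

def random_walk_with_restart(similarity, seeds, restart=0.2, tol=1e-10, max_iter=1000):
    """Steady-state visiting probability of every node, starting from the seed nodes."""
    W = np.asarray(similarity, dtype=float)
    col_sums = W.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for isolated nodes
    W = W / col_sums                       # column-normalised transition matrix
    p0 = np.zeros(W.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)     # the walker starts uniformly on the seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        converged = np.abs(p_next - p).sum() < tol
        p = p_next
        if converged:
            break
    return p

# Toy miRNA similarity network (assumed values); node 0 is the only known
# disease-related miRNA and therefore the seed.
similarity = np.array([
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.5, 0.1],
    [0.1, 0.5, 0.0, 0.8],
    [0.0, 0.1, 0.8, 0.0],
])
seeds = [0]
p = random_walk_with_restart(similarity, seeds)

# Candidate miRNAs (non-seeds) ranked by descending steady-state probability.
ranking = [int(i) for i in np.argsort(-p) if i not in seeds]
print(ranking)
```

Because the transition matrix is column-normalised, the probabilities keep summing to one at every step, and candidates that are strongly similar to the seed miRNAs end up with higher scores.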
The third type of prediction models is constructed based on machine learning algorithms. For example, Xuan et al. [32] proposed a model named Human-Disease-related MiRNA Prediction (HDMP) based on weighted k most similar neighbors. Firstly, they constructed the miRNA functional similarity matrix. Then, the association score between a given disease and a candidate miRNA can be obtained by summing subscores of the k neighbors of the candidate miRNA. Each neighbor’s subscore can be computed by inspecting two key metrics including the neighbor’s weight and the similarity between the candidate miRNA and its neighbor. In this study, the neighbor would be assigned higher weight if the neighbor and the candidate miRNA belong to the same family or cluster. Besides, Chen et al. [33] developed a model named Regularized Least Squares for MiRNA-Disease-Association (RLSMDA). They constructed the semisupervised classifier in the miRNA and disease space, respectively, under the framework of regularized least squares (RLS), and then combined the optimal classifiers in two different spaces to obtain the probability of miRNA–disease pair. Chen et al. [34] further proposed another model named Restricted Boltzmann Machine for Multiple types of MiRNA-Disease Association prediction (RBMMMDA). RBMMMDA was constructed based on the restricted Boltzmann machine, a two-layer undirected graphical model consisting of layers of visible units and hidden units, respectively. A visible unit represented a disease and a hidden unit stood for an unknown miRNA–disease pair. Innovatively, RBMMMDA could predict not only the probability of potential associations but also the types of associations. In addition, Pasquier and Gardes proposed a novel model named MiRAI by using distributional semantics to reveal information attached to miRNAs and diseases [35]. They firstly represented the distributional information on miRNAs and diseases in a high-dimensional vector space. 
Then, the association probability between miRNAs and diseases can be computed based on their vector similarity. Li et al. [36] used the singular value thresholding (SVT) algorithm to develop a model named Matrix Completion for MiRNA-Disease Association prediction (MCMDA). Matrix completion algorithm was adopted to update the miRNA–disease adjacency matrix to obtain the final miRNA–disease association matrix. Furthermore, Chen et al. [37] developed a model named Ranking-based K-Nearest Neighbors for MiRNA-Disease Association prediction (RKNNMDA). They utilized KNN algorithm to obtain the K nearest neighbors of investigated miRNA and employed support vector machine (SVM) to re-rank the K neighbors. Then, the association score between the investigated miRNA and candidate disease can be calculated by inspecting the association information between the K neighbors and the candidate disease. Similarly, the authors also computed association score from the perspective of disease. Finally, they integrated the association scores from two different perspectives to predict potential miRNA–disease associations. Moreover, Extreme Gradient Boosting Machine for MiRNA-Disease Association (EGBMMDA) was raised by Chen et al. [38]. EGBMMDA constructed three different types of features and connected them to generate composite feature vectors as input. The probability of potential miRNA–disease association was obtained by training a regression tree under the framework of gradient boosting. Recently, Zhu et al. [39] developed the model of Bayesian Ranking for MiRNA-Disease Association prediction (BRMDA). They improved Bayesian Personalized Ranking algorithm and defined a new optimization criterion by incorporating miRNA bias and adding similarity information of miRNA and disease to infer potential miRNA–disease associations. In addition, a neighborhood-based approach was utilized to predict associations for new diseases and miRNAs.
Although the above models show reliable performance to some extent, each still has its own limitations and needs further improvement. Since deep learning can learn better representations of data and has been successfully used in many domains such as genomics and drug discovery in recent years [40], we considered applying it to the prediction of miRNA–disease associations. In addition, only pairs with known labels can be used to train an ordinary multilayer perceptron network, so we pretrained the network on all miRNA–disease pairs to reduce the impact of too few known associations on the predictive accuracy. Inspired by Bahi et al. [41], we presented a model of Stacked Autoencoder for potential MiRNA-Disease Association prediction (SAEMDA) that took advantage of both deep learning and pretraining. We first pretrained the Stacked Autoencoder (SAE) using all miRNA–disease pairs in an unsupervised manner. Then, positive samples and the same number of randomly selected negative samples were used to fine-tune the SAE in a supervised manner. The predictive performance of our method was evaluated by three kinds of cross validation. As a result, SAEMDA obtained an AUC of 0.9210 in global leave-one-out cross validation (LOOCV), an AUC of 0.8343 in local LOOCV and an average AUC and standard deviation of 0.9102 ± 0.0029 over 100 repetitions of 5-fold cross validation. In addition, we carried out three case studies to demonstrate the prediction accuracy of SAEMDA. In the three different types of case studies for breast neoplasms (BN), lung neoplasms (LN) and esophageal neoplasms (EN), 41, 50 and 45 of the top 50 predicted potentially related miRNAs, respectively, were verified by databases.
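The two-stage training idea described above (unsupervised pretraining on all pairs, then supervised fine-tuning on a balanced labeled subset) can be sketched in plain NumPy. This is only an illustrative sketch under stated assumptions, not SAEMDA's actual architecture: the layer sizes (8, 6, 4), sigmoid activations, learning rates, epoch counts and the synthetic feature vectors are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def pretrain_layer(X, n_hidden, epochs=200, lr=0.5):
    """Train one autoencoder layer without labels; return encoder weights and loss history."""
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)                 # encode
        R = sigmoid(H @ W2 + b2)                 # decode (reconstruction)
        losses.append(np.mean((R - X) ** 2))
        dR = (R - X) * R * (1 - R) / n           # MSE gradient, up to a constant factor
        dH = (dR @ W2.T) * H * (1 - H)
        W2 -= lr * H.T @ dR; b2 -= lr * dR.sum(0)
        W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(0)
    return (W1, b1), losses

def fine_tune(X, y, layers, epochs=300, lr=0.5):
    """Add a softmax output layer to the pretrained encoders and train all weights with labels."""
    layers = [(W.copy(), b.copy()) for W, b in layers]
    Wo = rng.normal(0, 0.1, (layers[-1][0].shape[1], 2)); bo = np.zeros(2)
    Y = np.eye(2)[y]                             # one-hot labels
    for _ in range(epochs):
        acts = [X]
        for W, b in layers:
            acts.append(sigmoid(acts[-1] @ W + b))
        P = softmax(acts[-1] @ Wo + bo)
        delta = (P - Y) / len(X)                 # cross-entropy gradient at the logits
        grad_h = delta @ Wo.T
        Wo -= lr * acts[-1].T @ delta; bo -= lr * delta.sum(0)
        for i in reversed(range(len(layers))):   # backpropagate through the encoders
            W, b = layers[i]
            dz = grad_h * acts[i + 1] * (1 - acts[i + 1])
            grad_h = dz @ W.T
            W -= lr * acts[i].T @ dz; b -= lr * dz.sum(0)
    return layers, (Wo, bo)

def predict(X, layers, out):
    H = X
    for W, b in layers:
        H = sigmoid(H @ W + b)
    return softmax(H @ out[0] + out[1])[:, 1]    # probability of the "associated" class

# Synthetic stand-in for miRNA-disease pair features (assumption, not real data).
Z = rng.normal(size=(300, 2))
features = sigmoid(Z @ rng.normal(size=(2, 8)))
positive = np.flatnonzero(Z[:, 0] > 1.0)         # toy "known associations"
unlabeled = np.flatnonzero(Z[:, 0] <= 1.0)

# 1) Greedy layer-wise pretraining on ALL pairs, labeled or not.
enc1, loss1 = pretrain_layer(features, 6)
hidden1 = sigmoid(features @ enc1[0] + enc1[1])
enc2, loss2 = pretrain_layer(hidden1, 4)

# 2) Fine-tuning on the positives plus an equal number of sampled negatives.
negative = rng.choice(unlabeled, size=len(positive), replace=False)
idx = np.concatenate([positive, negative])
labels = np.concatenate([np.ones(len(positive), int), np.zeros(len(negative), int)])
layers, out = fine_tune(features[idx], labels, [enc1, enc2])

# 3) Score every pair with the fine-tuned network.
scores = predict(features, layers, out)
```

The point of the sketch is the data flow: the encoders see every pair during pretraining, while the labels are only needed for the much smaller balanced subset used in fine-tuning.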
Results
Performance evaluation
We first obtained the training data from HMDD v2.0 [42], which contains 5430 known associations between 495 miRNAs and 383 diseases, and then adopted global and local LOOCV to verify the accuracy of SAEMDA. In both global and local LOOCV, each known association was left out in turn as a test sample and the remaining known associations were regarded as training samples. In global LOOCV, all unlabeled miRNA–disease pairs were considered as candidate samples, whereas in local LOOCV the candidate samples were the unlabeled pairs between miRNAs and the investigated disease. For both global and local LOOCV, we scored the test sample and the candidate samples with SAEMDA and obtained the rank of the test sample by comparing its score with those of the candidate samples. Then, we evaluated the performance of SAEMDA by drawing a receiver operating characteristic (ROC) curve and calculating the area under the ROC curve (AUC). As shown in Figure 1, SAEMDA obtained an AUC of 0.9210 in global LOOCV, which was superior to PBMDA (0.9169), EGBMMDA (0.9123), MDHGI (0.8945), TLHNMDA (0.8795), MCMDA (0.8749), MaxFlow (0.8629), RLSMDA (0.8426), HDMP (0.8366) and WBSMDA (0.8030). In local LOOCV, the AUC of SAEMDA was 0.8343, better than those of all the other models: PBMDA (0.8341), EGBMMDA (0.8221), MDHGI (0.8240), TLHNMDA (0.7756), MCMDA (0.7718), MaxFlow (0.7774), RLSMDA (0.6953), HDMP (0.7702), WBSMDA (0.8031), MiRAI (0.6299) and MIDP (0.8196). It is worth mentioning that MIDP was not suitable for global LOOCV comparison, because it is a local ranking method based on random walk and cannot make predictions for all diseases simultaneously. Global LOOCV could not be applied to MiRAI either: its predicted association scores were positively correlated with the number of miRNAs known to be associated with each disease, so comparing predicted association scores across different diseases was unreasonable. 
It can also be seen that the AUC of MiRAI was significantly lower than those of the other methods, because the predictive accuracy of MiRAI is severely affected by data sparsity. Only 83 important diseases with at least 20 associated miRNAs were considered in the original literature [35]. In contrast, our dataset contains far more than 83 diseases, many of which have fewer associated miRNAs.

Figure 1. AUCs of SAEMDA under global and local LOOCV compared with some previous computational models.
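The difference between the global and local candidate sets, and how ranks turn into an AUC, can be sketched as follows. This is a simplified illustration under assumptions: a single fixed toy score matrix is reused for every round (in real LOOCV the model would be retrained with the test association hidden), and the AUC is computed as the average fraction of candidates scored below the test sample, which is a rank-based (Mann-Whitney style) formulation rather than necessarily the exact procedure used by the authors. The function names are hypothetical.

```python
import numpy as np

def candidate_scores(scores, adjacency, disease=None):
    """Scores of the candidate (unlabeled) pairs.

    disease=None -> global LOOCV: every unlabeled miRNA-disease pair.
    disease=d    -> local LOOCV: unlabeled miRNAs of disease d only.
    """
    unlabeled = adjacency == 0
    if disease is None:
        return scores[unlabeled]
    return scores[unlabeled[:, disease], disease]

def loocv_auc(test_scores, candidates_per_round):
    """Mean fraction of candidates ranked below each left-out test sample (ties count half)."""
    fractions = []
    for s, cand in zip(test_scores, candidates_per_round):
        cand = np.asarray(cand, dtype=float)
        fractions.append(np.mean(cand < s) + 0.5 * np.mean(cand == s))
    return float(np.mean(fractions))

# Toy example: 3 miRNAs (rows) x 2 diseases (columns); all values are made up.
scores = np.array([[0.9, 0.2],
                   [0.3, 0.25],
                   [0.1, 0.4]])
adjacency = np.array([[1, 0],
                      [0, 1],
                      [0, 0]])

known = list(zip(*np.nonzero(adjacency)))        # known associations: (0, 0) and (1, 1)
test_scores = [scores[m, d] for m, d in known]

global_auc = loocv_auc(test_scores, [candidate_scores(scores, adjacency) for _ in known])
local_auc = loocv_auc(test_scores, [candidate_scores(scores, adjacency, d) for _, d in known])
print(global_auc, local_auc)
```

In the global setting a test sample competes with unlabeled pairs of every disease, so the scores must be comparable across diseases; in the local setting it only competes with the other miRNAs of its own disease.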
In addition, we performed 5-fold cross validation to further evaluate the performance of SAEMDA. All known miRNA–disease associations were randomly divided into five equally sized subsets. Each subset was used as the test set in turn, while the other four subsets were used as training sets. We applied SAEMDA to score all unlabeled miRNA–disease pairs and test samples. Then we obtained the rank of each test sample by comparing its score with the scores of all unlabeled pairs. To reduce the bias caused by the random division of known miRNA–disease associations, we repeated 5-fold cross validation 100 times. As a result (Table 1), SAEMDA obtained an average AUC and standard deviation of 0.9102 ± 0.0029, which was higher than those of eight previous models and slightly lower than that of PBMDA. It is worth noting that all prediction models were compared with SAEMDA on the same dataset in LOOCV and 5-fold cross validation.
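The repeated random division described above can be sketched as follows. This is a minimal sketch under assumptions: `evaluate_fold` is a hypothetical callback standing in for training and scoring the model on one fold, and averaging the five fold AUCs within each repetition before taking the mean and standard deviation across repetitions is one plausible way to aggregate, not necessarily the authors' exact procedure.

```python
import numpy as np

def five_fold_splits(n_known, rng):
    """Randomly divide the indices of the known associations into five subsets."""
    order = rng.permutation(n_known)
    return np.array_split(order, 5)

def repeated_cv(n_known, evaluate_fold, repeats=100, seed=0):
    """Mean and standard deviation of the per-repetition AUC over `repeats` random divisions."""
    rng = np.random.default_rng(seed)
    repetition_aucs = []
    for _ in range(repeats):
        folds = five_fold_splits(n_known, rng)
        fold_aucs = []
        for k, test_idx in enumerate(folds):
            # The other four subsets together form the training set.
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
            fold_aucs.append(evaluate_fold(train_idx, test_idx))
        repetition_aucs.append(np.mean(fold_aucs))
    return float(np.mean(repetition_aucs)), float(np.std(repetition_aucs))

# With the 5430 known associations of HMDD v2.0, each subset holds 1086 associations.
folds = five_fold_splits(5430, np.random.default_rng(0))
print([len(f) for f in folds])
```

Repeating the division with a fresh permutation each time is what makes the reported standard deviation meaningful: it measures how much the AUC depends on the particular random split.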
Table 1. Performance comparison between SAEMDA and nine other models under 5-fold cross validation
Prediction model | AUC | Standard deviation |
---|---|---|
SAEMDA | 0.9102 | 0.0029 |
PBMDA | 0.9172 | 0.0007 |
EGBMMDA | 0.9048 | 0.0012 |
MDHGI | 0.8794 | 0.0021 |
TLHNMDA | 0.8795 | 0.0010 |
MCMDA | 0.8767 | 0.0011 |
MaxFlow | 0.8579 | 0.001 |
RLSMDA | 0.8569 | 0.0020 |
HDMP | 0.8342 | 0.0010 |
WBSMDA | 0.8185 | 0.0009 |
Case studies
In our work, we carried out three different types of case studies to further illustrate the predictive power of SAEMDA. In the first case study, we obtained known associations from the HMDD v2.0 database and then verified the predicted results against the dbDEMC [43] and miR2Disease [44] databases. We chose BN, the most common malignant disease in women, as the investigated disease. BN begins as a local disease and can spread to lymph nodes and other organs [45]. Clinical breast examination is one of the main methods to detect BN, and early diagnosis can greatly improve the cure rate of BN [46]. Studies have found that most BN patients have abnormal miRNA expression [47], implying that miRNAs could be potential biomarkers for the diagnosis of BN. For example, Heneghan et al. [48] found that the expression of miR-195 was significantly increased in BN patients. We used SAEMDA to reveal more miRNAs related to BN. As a result, 8 of the top 10 and 41 of the top 50 potential miRNAs were confirmed by the dbDEMC and miR2Disease databases (Table 2).
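The verification step of such a case study amounts to ranking the miRNAs that have no known association with the disease, taking the top k, and checking each one against external databases. A minimal sketch follows; the miRNA names, scores and database contents are placeholders invented for illustration, and the function names are hypothetical.

```python
def top_k_candidates(scores, known, names, k):
    """Rank miRNAs without a known association to the disease by descending score."""
    candidates = [i for i in range(len(names)) if not known[i]]
    candidates.sort(key=lambda i: -scores[i])
    return [names[i] for i in candidates[:k]]

def confirmation_rate(predicted, databases):
    """Fraction of predicted miRNAs found in at least one verification database."""
    confirmed = [m for m in predicted if any(m in db for db in databases.values())]
    return len(confirmed) / len(predicted)

# Placeholder data (not taken from any real database).
names = ["mir-A", "mir-B", "mir-C", "mir-D", "mir-E"]
scores = [0.91, 0.85, 0.40, 0.77, 0.10]
known = [True, False, False, False, False]       # mir-A is already a known association
databases = {"db1": {"mir-B"}, "db2": {"mir-B", "mir-C"}}

top3 = top_k_candidates(scores, known, names, 3)
rate = confirmation_rate(top3, databases)
print(top3, rate)
```

By the same arithmetic, 41 confirmed miRNAs among the top 50 correspond to a confirmation rate of 41/50 = 82%.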
Table 2. Validation of the top 50 miRNAs predicted to be associated with BN by SAEMDA based on the known associations in HMDD v2.0. The first column records the top 1–25 predicted miRNAs and the third column records the top 26–50 predicted miRNAs
miRNA | Evidence | miRNA | Evidence |
---|---|---|---|
hsa-mir-196a | dbDEMC; miR2Disease | hsa-mir-210 | dbDEMC; miR2Disease |
hsa-mir-1246 | unconfirmed | hsa-mir-101 | dbDEMC; miR2Disease |
hsa-mir-198 | dbDEMC | hsa-mir-125a | dbDEMC; miR2Disease |
hsa-mir-29a | dbDEMC | hsa-mir-99b | dbDEMC |
hsa-mir-205 | dbDEMC; miR2Disease | hsa-let-7f | dbDEMC; miR2Disease |
hsa-mir-200b | dbDEMC; miR2Disease | hsa-mir-590 | dbDEMC |
hsa-mir-200c | dbDEMC; miR2Disease | hsa-mir-7 | dbDEMC; miR2Disease |
hsa-mir-635 | unconfirmed | hsa-mir-144 | dbDEMC |
hsa-mir-27b | dbDEMC | hsa-mir-499a | unconfirmed |
hsa-mir-143 | dbDEMC; miR2Disease | hsa-mir-141 | dbDEMC; miR2Disease |
hsa-mir-103a | unconfirmed | hsa-mir-195 | dbDEMC; miR2Disease |
hsa-mir-19b | dbDEMC | hsa-mir-191 | dbDEMC; miR2Disease |
hsa-mir-93 | dbDEMC | hsa-mir-204 | dbDEMC; miR2Disease |
hsa-mir-363 | dbDEMC | hsa-mir-200a | dbDEMC; miR2Disease |
hsa-mir-133a | dbDEMC | hsa-mir-650 | dbDEMC |
hsa-let-7a | dbDEMC; miR2Disease | hsa-mir-10b | dbDEMC; miR2Disease |
hsa-mir-124 | dbDEMC | hsa-mir-125b | miR2Disease |
hsa-mir-29b | dbDEMC; miR2Disease | hsa-mir-30e | unconfirmed |
hsa-mir-30a | miR2Disease | hsa-mir-449a | unconfirmed |
hsa-mir-20a | miR2Disease | hsa-mir-1972 | unconfirmed |
hsa-mir-1273a | unconfirmed | hsa-mir-23b | dbDEMC |
hsa-mir-433 | dbDEMC | hsa-mir-34b | dbDEMC |
hsa-mir-31 | dbDEMC; miR2Disease | hsa-mir-95 | dbDEMC |
hsa-mir-221 | dbDEMC; miR2Disease | hsa-mir-1302 | unconfirmed |
hsa-mir-223 | dbDEMC | hsa-mir-505 | dbDEMC |
In the second case study, we sought to verify the performance of SAEMDA when it was applied to a disease without any known associated miRNAs and took LN as the investigated disease. The training data were also collected from the HMDD v2.0 database. We removed all association information for LN from the training data to simulate LN as a new disease. LN is one of the malignant tumors with the fastest increase in morbidity and mortality [49]. About 230 000 new cases of LN will be diagnosed in the United States in 2021 [49]. Although great progress has been made in imaging diagnostic techniques, there is still no satisfactory method to significantly improve the early detection rate of LN, which causes most patients to miss the optimal treatment period [50]. Therefore, it is very important to find an effective method of early screening and diagnosis for LN. Some studies have found that the occurrence of LN is closely related to miRNAs [21]. For example, the miR-17-92 cluster was found to be overexpressed in human LN [51]. In addition, the expression level of miR-224 in non–small cell lung cancer (NSCLC) is higher than that in normal lung tissue, and it can promote tumor progression in NSCLC [52]. We trained SAEMDA to infer potential LN-related miRNAs. The validation results showed that all of the top 50 predicted miRNAs were confirmed by HMDD v2.0, dbDEMC and miR2Disease (Table 3).
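Simulating a new disease, as described above, means hiding every known association of that disease before training, so that all miRNAs become candidates for it. A minimal sketch, assuming the associations are stored as a binary miRNA × disease adjacency matrix (the toy matrix and the function name are hypothetical):

```python
import numpy as np

def mask_disease(adjacency, disease_index):
    """Return a copy of the adjacency matrix with all associations of one disease removed."""
    masked = adjacency.copy()
    masked[:, disease_index] = 0
    return masked

# Toy adjacency matrix: rows are miRNAs, columns are diseases (made-up values).
adjacency = np.array([[1, 0, 1],
                      [0, 1, 1],
                      [1, 1, 0]])

training = mask_disease(adjacency, 2)            # treat disease 2 as a "new" disease

# Every miRNA is now a candidate for disease 2; the hidden associations can be
# used afterwards to check the predictions.
candidates = np.flatnonzero(training[:, 2] == 0)
hidden = np.flatnonzero(adjacency[:, 2] == 1)
print(candidates, hidden)
```

Because the original matrix is left untouched, the hidden associations remain available as one of the sources for validating the top-ranked predictions.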
Table 3. Validation of the top 50 miRNAs predicted to be associated with LN by SAEMDA based on the known associations in HMDD v2.0. Here, LN was treated as a new disease by removing all association information of LN from HMDD v2.0. The first column records the top 1–25 predicted miRNAs and the third column records the top 26–50 predicted miRNAs
miRNA | Evidence | miRNA | Evidence |
---|---|---|---|
hsa-mir-21 | dbDEMC; miR2Disease; HMDD | hsa-mir-223 | HMDD |
hsa-mir-155 | dbDEMC; miR2Disease; HMDD | hsa-mir-146b | miR2Disease; HMDD |
hsa-mir-92a | HMDD | hsa-mir-19a | dbDEMC; miR2Disease; HMDD |
hsa-mir-30a | miR2Disease; HMDD | hsa-mir-24 | miR2Disease; HMDD |
hsa-mir-19b | dbDEMC; HMDD | hsa-mir-125b | miR2Disease; HMDD |
hsa-mir-195 | dbDEMC; miR2Disease | hsa-mir-181a | dbDEMC; HMDD |
hsa-mir-17 | miR2Disease; HMDD | hsa-mir-125a | dbDEMC; miR2Disease; HMDD |
hsa-mir-29c | dbDEMC; miR2Disease; HMDD | hsa-mir-34a | dbDEMC; HMDD |
hsa-mir-210 | dbDEMC; miR2Disease; HMDD | hsa-mir-145 | dbDEMC; miR2Disease; HMDD |
hsa-mir-29a | dbDEMC; miR2Disease; HMDD | hsa-let-7c | dbDEMC; miR2Disease; HMDD |
hsa-mir-16 | dbDEMC; miR2Disease | hsa-mir-27a | dbDEMC; HMDD |
hsa-mir-126 | dbDEMC; miR2Disease; HMDD | hsa-mir-15b | dbDEMC |
hsa-mir-26a | dbDEMC; miR2Disease; HMDD | hsa-mir-1 | dbDEMC; miR2Disease; HMDD |
hsa-mir-142 | HMDD | hsa-mir-199b | dbDEMC; miR2Disease; HMDD |
hsa-mir-29b | dbDEMC; miR2Disease; HMDD | hsa-mir-9 | miR2Disease; HMDD |
hsa-mir-200c | dbDEMC; miR2Disease; HMDD | hsa-let-7e | miR2Disease; HMDD |
hsa-mir-146a | dbDEMC; miR2Disease; HMDD | hsa-mir-22 | miR2Disease; HMDD |
hsa-mir-150 | dbDEMC; miR2Disease; HMDD | hsa-let-7b | miR2Disease; HMDD |
hsa-mir-7 | miR2Disease; HMDD | hsa-mir-30e | miR2Disease; HMDD |
hsa-mir-15a | dbDEMC | hsa-mir-148a | dbDEMC; HMDD |
hsa-mir-106b | dbDEMC | hsa-let-7d | dbDEMC; miR2Disease; HMDD |
hsa-mir-18a | dbDEMC; miR2Disease; HMDD | hsa-mir-221 | dbDEMC; HMDD |
hsa-mir-23b | dbDEMC | hsa-mir-192 | dbDEMC; miR2Disease; HMDD |
hsa-let-7a | dbDEMC; miR2Disease; HMDD | hsa-mir-196a | dbDEMC; HMDD |
hsa-mir-20a | dbDEMC; miR2Disease; HMDD | hsa-mir-20b | dbDEMC |
In the third case study, to demonstrate the generalization ability of SAEMDA on different datasets, we obtained the training data from HMDD v1.0, which contains 1395 known associations between 271 miRNAs and 137 diseases. EN was selected for the case study. EN is one of the highest-risk cancers in the world, and its mortality rate ranks sixth among all cancers [53]. In recent years, the incidence of EN in Asia has gradually increased [54]. Although chemotherapy, radiotherapy and other technologies are developing rapidly, they cannot provide satisfactory treatment for patients with advanced EN [54]. Therefore, identifying biomarkers of EN for early diagnosis would significantly improve the prospects for diagnosis and treatment of EN. Current studies show that the occurrence, development and prognosis of EN are related to the abnormal regulation of miRNAs [55]. For example, miR-377 can suppress the initiation and progression of EN by inhibiting CD133 and VEGF [56]. In addition, miR-296 is overexpressed in esophageal squamous cell cancer tissues, and downregulation of miR-296 can suppress the growth of EN cells [57]. Here, we employed SAEMDA to predict EN-associated miRNAs based on known associations in HMDD v1.0. As a result, 45 of the top 50 predicted miRNAs (90%) were verified by the HMDD v2.0, dbDEMC and miR2Disease databases (Table 4).
Table 4. Validation of the top 50 miRNAs predicted to be associated with EN by SAEMDA based on the known associations in HMDD v1.0. The first column records the top 1–25 predicted miRNAs and the third column records the 26–50 predicted miRNAs
miRNA | Evidence | miRNA | Evidence |
---|---|---|---|
hsa-mir-155 | dbDEMC; HMDD | hsa-mir-208b | unconfirmed |
hsa-mir-365 | unconfirmed | hsa-mir-92b | dbDEMC |
hsa-mir-448 | dbDEMC | hsa-mir-200b | dbDEMC |
hsa-mir-221 | dbDEMC | hsa-let-7d | dbDEMC |
hsa-mir-146a | dbDEMC; HMDD | hsa-let-7i | dbDEMC |
hsa-let-7c | dbDEMC; HMDD | hsa-mir-29a | dbDEMC |
hsa-mir-222 | dbDEMC | hsa-mir-181b | dbDEMC |
hsa-mir-20a | dbDEMC; HMDD | hsa-mir-181a | dbDEMC |
hsa-mir-92a | HMDD | hsa-let-7g | dbDEMC |
hsa-mir-514 | unconfirmed | hsa-mir-125b | dbDEMC |
hsa-mir-338 | dbDEMC | hsa-mir-210 | dbDEMC; HMDD |
hsa-mir-137 | dbDEMC | hsa-mir-141 | dbDEMC; HMDD |
hsa-mir-18a | dbDEMC | hsa-mir-300 | unconfirmed |
hsa-mir-145 | dbDEMC; HMDD | hsa-mir-383 | dbDEMC |
hsa-mir-423 | dbDEMC | hsa-mir-515 | unconfirmed |
hsa-mir-19a | dbDEMC; HMDD | hsa-mir-602 | dbDEMC |
hsa-mir-29c | dbDEMC; HMDD | hsa-mir-196b | dbDEMC; HMDD |
hsa-mir-199b | dbDEMC | hsa-mir-135b | dbDEMC; HMDD |
hsa-mir-23b | dbDEMC | hsa-mir-206 | dbDEMC |
hsa-let-7b | dbDEMC; HMDD | hsa-mir-127 | dbDEMC |
hsa-mir-520b | dbDEMC | hsa-mir-98 | dbDEMC; HMDD |
hsa-mir-335 | dbDEMC | hsa-mir-9 | dbDEMC |
hsa-mir-330 | dbDEMC | hsa-mir-373 | dbDEMC; miR2Disease |
hsa-mir-223 | dbDEMC; miR2Disease; HMDD | hsa-mir-132 | dbDEMC |
hsa-mir-34a | dbDEMC; HMDD | hsa-mir-134 | dbDEMC |
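The validation summarized in the table above amounts to ranking candidate miRNAs by predicted score and checking the top entries against external evidence. The following is a minimal sketch of that bookkeeping, not the authors' script: the function name `top_k_verification`, the miRNA names and the scores are hypothetical, and an empty evidence list stands for "unconfirmed".

```python
def top_k_verification(scores, evidence, k=50):
    """Rank candidates by predicted score and count how many of the top k have database support."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    confirmed = [m for m in ranked if evidence.get(m)]  # empty or missing evidence counts as unconfirmed
    return ranked, len(confirmed), len(confirmed) / len(ranked)


# Hypothetical scores and evidence for four candidates (not values from the study).
scores = {"mir-a": 0.91, "mir-b": 0.40, "mir-c": 0.77, "mir-d": 0.65}
evidence = {"mir-a": ["dbDEMC", "HMDD"], "mir-c": [], "mir-d": ["dbDEMC"]}

ranked, n_confirmed, rate = top_k_verification(scores, evidence, k=3)
print(ranked, n_confirmed)  # ['mir-a', 'mir-c', 'mir-d'] 2
```

With the counts reported for EN, the same ratio is 45 / 50 = 0.9.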
Discussion
Predicting potential miRNA–disease associations enables researchers to better understand the mechanisms of diseases and promotes the diagnosis, treatment and prognosis of complex diseases. In this study, we developed SAEMDA, which can be an effective supplement to traditional biological experimental methods. In SAEMDA, all miRNA–disease samples were used to pretrain an SAE. Then, the SAE was fine-tuned with the positive samples and the same number of negative samples. SAEMDA outperformed other models in three types of cross validation, mainly because it makes full use of the information of all unlabeled samples in the training process. In addition, the results of three kinds of case studies further illustrated the reliable prediction performance of SAEMDA. Beyond miRNA–disease association prediction, there are many important link prediction problems in the field of bioinformatics, such as lncRNA–disease association prediction [58], circular RNA (circRNA)–disease association prediction [59] and protein–protein interaction prediction [60]. Because SAEMDA performs well on miRNA–disease association prediction, its framework could also be applied to these link prediction problems.
The reliable performance of SAEMDA is due to the following aspects. Firstly, the data used in our study contain 189 585 miRNA–disease pairs for 495 miRNAs and 383 diseases, with only 5430 known associations. SAEMDA is especially suitable for a dataset composed of a large amount of unlabeled data and a small amount of labeled data, because it combines unsupervised pretraining with supervised fine-tuning. The pretraining process enabled the model to learn the features of all miRNA–disease pairs and overcame the limitation that a traditional supervised learning model can be trained only with labeled samples. The fine-tuning process then enabled the model to learn the label information of the small amount of labeled data for further performance improvement. Secondly, SAEMDA integrated diverse similarity networks so that the features could better capture the information of all miRNA–disease pairs. Finally, we selected the Adam optimizer in the training process of SAEMDA, as it is more efficient than the traditional Stochastic Gradient Descent (SGD) optimizer.
However, SAEMDA still has some limitations. Firstly, the hyperparameters of the neural network (such as the number of hidden layers and the number of neurons per layer) were not well determined. Secondly, SAEMDA obtained a larger standard deviation than the comparison models in 100 times of 5-fold cross validation. Therefore, SAEMDA was slightly inferior to other models in terms of stability, which is a common problem in deep learning. Thirdly, positive and negative samples are needed in the process of fine-tuning, but randomly selecting unlabeled samples as negative samples introduces inaccurate information, because some unlabeled pairs may be true associations. Finally, there is room for improvement in concatenating the similarities of a disease and a miRNA as the features of a disease–miRNA pair. Therefore, how to construct and extract reliable features of miRNA–disease pairs would be a future research direction of prediction method design. Besides, appropriate methods for negative sample selection need to be designed; clustering algorithms could be used for this purpose [61–63]. In addition, designing effective methods to introduce other biological information may be an important direction for predicting potential miRNA–disease associations.
Materials and methods
Materials

Figure 2. Flowchart of SAEMDA to predict potential miRNA–disease associations.
SAEMDA
In this study, we proposed a new model named SAEMDA to predict potential miRNA–disease associations. The flowchart of SAEMDA is depicted in Figure 2. The first step of SAEMDA is data preparation, in which each miRNA–disease pair is represented as a feature vector. As presented in previous sections, we constructed the adjacency matrix A of miRNA–disease pairs (nm × nd), the integrated miRNA similarity matrix SM (nm × nm) and the integrated disease similarity matrix SD (nd × nd). From them, nm features were extracted for each miRNA and nd features for each disease. Concatenating the feature vectors of the investigated disease and miRNA yielded nm + nd features for each miRNA–disease pair. Among all miRNA–disease pairs, 5430 were known associations and the remaining pairs were unlabeled.
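The data preparation step can be sketched as follows. This is an illustrative sketch under two assumptions that the text does not state explicitly: the feature vector of a miRNA is its row of SM, and the feature vector of a disease is its row of SD. The function name `build_pair_features`, the concatenation order (miRNA part first) and the toy random matrices are hypothetical stand-ins.

```python
import numpy as np


def build_pair_features(SM: np.ndarray, SD: np.ndarray) -> np.ndarray:
    """Return one (nm + nd)-dimensional feature vector per miRNA-disease pair.

    Row i * nd + j holds the features of the pair (miRNA i, disease j).
    """
    nm, nd = SM.shape[0], SD.shape[0]
    mirna_part = np.repeat(SM, nd, axis=0)  # each miRNA row repeated nd times
    disease_part = np.tile(SD, (nm, 1))     # disease rows cycled nm times
    return np.hstack([mirna_part, disease_part])


# Toy stand-ins for the integrated similarity matrices (random, illustrative only).
rng = np.random.default_rng(0)
nm, nd = 4, 3
SM = rng.random((nm, nm))
SD = rng.random((nd, nd))
A = np.zeros((nm, nd), dtype=int)
A[0, 1] = A[2, 0] = 1                       # two toy "known associations"

X = build_pair_features(SM, SD)
labels = A.reshape(-1)                      # same row order as X
print(X.shape)                              # (12, 7)
```

With the dimensions reported in the paper (495 miRNAs and 383 diseases), the same construction would give 495 × 383 = 189 585 rows of 878 features each.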
In this study, the SAE was constructed by stacking three AEs, following previous research [41]. The second step of SAEMDA, unsupervised pretraining of the SAE, was carried out as follows:
1. An AE was trained using the feature vectors of all miRNA–disease pairs.
2. The decoder layer was removed from the AE. Then, a new AE was constructed with the feature vectors generated by the previous AE as input.
3. The new AE was trained, while the weights and biases of the previously trained AEs remained unchanged.
4. Steps 2 and 3 were repeated until three AEs were stacked.
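The pretraining procedure above can be sketched in NumPy as greedy layer-wise training. This is a simplified illustration, not the released implementation: the linear decoder, the mean squared reconstruction error, plain full-batch gradient descent, the toy layer sizes and the names `train_autoencoder` and `pretrain_sae` are all assumptions made to keep the example short.

```python
import numpy as np


def train_autoencoder(X, n_hidden, epochs=200, lr=0.1, seed=0):
    """Train one AE (tanh encoder, linear decoder, MSE loss); return the encoder (W, b)."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W, b = rng.normal(0, 0.1, (n_in, n_hidden)), np.zeros(n_hidden)
    V, c = rng.normal(0, 0.1, (n_hidden, n_in)), np.zeros(n_in)
    for _ in range(epochs):
        H = np.tanh(X @ W + b)            # encode
        R = H @ V + c                     # decode (reconstruction)
        dR = 2.0 * (R - X) / X.size       # gradient of the mean squared error
        dV, dc = H.T @ dR, dR.sum(axis=0)
        dZ = (dR @ V.T) * (1.0 - H ** 2)  # backpropagate through tanh
        dW, db = X.T @ dZ, dZ.sum(axis=0)
        W -= lr * dW
        b -= lr * db
        V -= lr * dV
        c -= lr * dc
    return W, b


def pretrain_sae(X, hidden_sizes):
    """Greedy layer-wise pretraining: each new AE is trained on the frozen codes of the previous one."""
    params, inputs = [], X
    for k, n_hidden in enumerate(hidden_sizes):
        W, b = train_autoencoder(inputs, n_hidden, seed=k)
        params.append((W, b))             # the decoder is discarded
        inputs = np.tanh(inputs @ W + b)  # codes become the input of the next AE
    return params


rng = np.random.default_rng(42)
X = rng.random((50, 10))                  # toy feature vectors
params = pretrain_sae(X, hidden_sizes=(8, 4, 2))
print([W.shape for W, _ in params])       # [(10, 8), (8, 4), (4, 2)]
```

Earlier layers stay frozen here simply because their weights are never touched again once their codes have been computed.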
After the unsupervised pretraining, we obtained the weight matrices W1, W2 and W3 as well as the bias vectors b1, b2 and b3 of the SAE. Then, the third step of SAEMDA is supervised fine-tuning of the SAE based on positive and negative samples. Here, the 5430 known miRNA–disease associations were taken as positive samples. In addition, 5430 negative samples were randomly selected from the unlabeled miRNA–disease pairs. The fine-tuning process contained the following steps:
1. An output layer was added to the SAE obtained in the pretraining process. Here, the weight matrix W4 and bias vector b4 between the output layer and the previous layer were randomly initialized.
2. The positive samples and the same number of selected negative samples were used to train the SAE.
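The balanced training set used for fine-tuning can be assembled as in the sketch below. It assumes a flat 0/1 label vector over all pairs, where 0 means unlabeled; the function name `balanced_training_indices` and the randomly placed toy positives are hypothetical.

```python
import numpy as np


def balanced_training_indices(labels, seed=0):
    """Return shuffled indices of all positives plus an equal number of random unlabeled pairs."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(labels == 1)
    unlabeled = np.flatnonzero(labels == 0)
    neg = rng.choice(unlabeled, size=pos.size, replace=False)  # sampled without repetition
    idx = np.concatenate([pos, neg])
    y = np.concatenate([np.ones(pos.size, dtype=int), np.zeros(neg.size, dtype=int)])
    perm = rng.permutation(idx.size)
    return idx[perm], y[perm]


# Toy label vector with the sizes reported in the paper; the positions are random.
rng = np.random.default_rng(1)
labels = np.zeros(495 * 383, dtype=int)
labels[rng.choice(labels.size, size=5430, replace=False)] = 1

idx, y = balanced_training_indices(labels)
print(idx.size, int(y.sum()))  # 10860 5430
```

Because negatives are drawn from unlabeled pairs, some of them may be undiscovered associations, so the labels produced this way are noisy.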
Finally, the trained SAE can be used to predict potential miRNA–disease associations. It is worth noting that SAEMDA used the tanh activation function in each hidden layer and the softmax classifier in the output layer. Besides, cross entropy was used as the loss function in the fine-tuning process and the Adam optimizer was utilized to optimize the SAE. In addition, we set the number of neurons in the hidden layers of the three AEs to 512, 256 and 128, respectively. After setting the hyperparameters of the model, we trained SAEMDA with a learning rate of 0.0001 to obtain the final miRNA–disease association score.
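The fine-tuning configuration described above (tanh hidden layers of 512, 256 and 128 neurons, softmax output, cross-entropy loss, Adam with a learning rate of 0.0001) can be illustrated with a small NumPy sketch. Several details are assumptions for illustration only: all weights are randomly initialized here, whereas in SAEMDA W1 to W3 and b1 to b3 would come from pretraining; the Adam constants other than the learning rate are the commonly used defaults; the data are random toy values; and the function names are hypothetical.

```python
import numpy as np


def init_params(sizes, seed=0):
    """Random weights for illustration (pretrained encoder weights would be used in practice)."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0, 0.1, (m, n)), np.zeros(n)) for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, X):
    """tanh hidden layers followed by a softmax output layer; returns activations and class probabilities."""
    acts = [X]
    for W, b in params[:-1]:
        acts.append(np.tanh(acts[-1] @ W + b))
    W, b = params[-1]
    logits = acts[-1] @ W + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return acts, e / e.sum(axis=1, keepdims=True)


def cross_entropy(probs, y):
    return -np.mean(np.log(probs[np.arange(y.size), y] + 1e-12))


def gradients(params, acts, probs, y):
    """Backpropagate the mean cross-entropy through the softmax and tanh layers."""
    delta = probs.copy()
    delta[np.arange(y.size), y] -= 1.0
    delta /= y.size
    grads = []
    for k in range(len(params) - 1, -1, -1):
        grads.append((acts[k].T @ delta, delta.sum(axis=0)))
        if k > 0:
            delta = (delta @ params[k][0].T) * (1.0 - acts[k] ** 2)
    return grads[::-1]


def adam_step(params, grads, state, t, lr=1e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for every weight matrix and bias vector; t is the 1-based step count."""
    new_params = []
    for i, (layer, layer_grads) in enumerate(zip(params, grads)):
        updated = []
        for j, (p, g) in enumerate(zip(layer, layer_grads)):
            m, v = state.get((i, j), (np.zeros_like(p), np.zeros_like(p)))
            m = b1 * m + (1 - b1) * g
            v = b2 * v + (1 - b2) * g * g
            state[(i, j)] = (m, v)
            m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)
            updated.append(p - lr * m_hat / (np.sqrt(v_hat) + eps))
        new_params.append(tuple(updated))
    return new_params


rng = np.random.default_rng(1)
X = rng.random((64, 20))                # toy features; the real input has nm + nd columns
y = rng.integers(0, 2, size=64)         # toy labels: 1 = associated, 0 = sampled negative
params = init_params([20, 512, 256, 128, 2])
state, losses = {}, []
for t in range(1, 51):
    acts, probs = forward(params, X)
    losses.append(cross_entropy(probs, y))
    params = adam_step(params, gradients(params, acts, probs, y), state, t)
scores = forward(params, X)[1][:, 1]    # probability of the "associated" class
```

In this sketch the association score of a pair is taken to be the softmax probability of the positive class, which is one natural reading of "final miRNA–disease association score".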
Key Points
SAEMDA is especially suitable for a dataset composed of a large amount of unlabeled data and a small amount of labeled data.
SAEMDA integrated diverse similarity networks. Therefore, the features could better capture the information of all miRNA–disease pairs.
We selected the Adam optimizer in the training process of SAEMDA, as it is more efficient than the traditional Stochastic Gradient Descent (SGD) optimizer.
Leave-one-out cross validation and case studies were implemented to evaluate the prediction performance of SAEMDA.
Data availability
The Python code and data for SAEMDA are available at https://github.com/xpnbs/SAEMDA.
Funding
This work was supported by the Fundamental Research Funds for the Central Universities (2019ZDPY01).
Chun-Chun Wang is a PhD student at the School of Information and Control Engineering, China University of Mining and Technology. His research interests include bioinformatics, complex network algorithms and machine learning.
Tian-Hao Li is a master’s student at the School of Information and Control Engineering, China University of Mining and Technology. His research interests include bioinformatics and machine learning.
Li Huang is a PhD student at the Academy of Arts and Design, Tsinghua University. His research interests include bioinformatics, complex network algorithms and machine learning.
Xing Chen, PhD, is a professor at China University of Mining and Technology. He is the associate dean of the Artificial Intelligence Research Institute, China University of Mining and Technology. He is also the founding director of the Institute of Bioinformatics and the Big Data Research Center, China University of Mining and Technology. His research interests include complex disease-related noncoding RNA biomarker prediction, computational models for drug discovery and early detection of human complex disease based on big data and artificial intelligence algorithms.