Abstract

Currently, there exist no generally accepted strategies for evaluating computational models of microRNA-disease associations (MDAs). Although K-fold cross validation and case studies appear to be must-have procedures, the value of K, the evaluation metrics and the choice of query diseases, as well as the inclusion of other procedures (such as parameter sensitivity tests, ablation studies and computational cost reports), are all decided on a case-by-case basis according to the researchers' preferences. In this review, we provide a comprehensive analysis of how 29 state-of-the-art models for predicting MDAs were evaluated. Based on the analytical results, we recommend a feasible evaluation workflow that would suit any future model and facilitate fair and systematic assessment of predictive performance.

Introduction

The quality of a model's predictions can be evaluated either by hold-out validation or by K-fold cross validation (CV) [1]. The former randomly splits the data samples into two subsets: a training set for building the model and a test set for measuring the fitted model's predictive performance. The latter randomly partitions the data into K equally sized subsets: each is used as the test set in turn, and the remaining subsets are used to train the model. K-fold CV is more appropriate when the data size is limited, as in the task of predicting microRNA-disease associations (MDAs), where the number of known associations is typically on the order of thousands. As indicated in the previous MDA review by Chen et al. [2], this validation approach, together with case studies on predictions made for specific diseases, has been used extensively for performance evaluation since the earliest MDA prediction models, such as the hypergeometric distribution method by Jiang et al. [3] in 2009 and random walk with restart for MDA (RWRMDA) by Chen et al. [4] in 2012.
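To make the procedure concrete, the sketch below runs a 5-fold CV loop with scikit-learn on synthetic miRNA-disease pair features; the feature matrix, labels and the random forest classifier are placeholders for illustration only and do not correspond to any reviewed model.

```python
# Minimal sketch of K-fold CV over miRNA-disease pairs (synthetic placeholder data).
import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.random((2000, 64))          # feature vectors for miRNA-disease pairs (placeholder)
y = rng.integers(0, 2, size=2000)   # 1 = known association, 0 = sampled negative (placeholder)

aucs = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X[train_idx], y[train_idx])                 # train on K-1 folds
    scores = clf.predict_proba(X[test_idx])[:, 1]       # score the held-out fold
    aucs.append(roc_auc_score(y[test_idx], scores))

print(f"5-fold CV AUC: {np.mean(aucs):.3f} +/- {np.std(aucs):.3f}")
```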

Recent works have also introduced other evaluation procedures, largely because of data fusion and model fusion (for a detailed discussion of data fusion and model fusion, refer to two other review articles of ours, respectively entitled 'Updated review of advances in microRNAs and diseases: experimental results, databases, webservers, and data fusion' and 'Updated review of advances in microRNAs and diseases: taxonomy, trends, and challenges of computational models'). The former means that diverse data sources (including but not limited to miRNA functional similarity [5], miRNA–target interactions [6, 7], miRNA sequences [8] and disease semantic similarity [5]) were integrated by MDA prediction models to gain a more comprehensive research perspective. The latter indicates that multiple computational modules (or sub-models) were usually fused to learn accurate representations of the complex multi-source data and hence improve predictive performance. Three model fusion schemes can be summarized from state-of-the-art MDA prediction models: sequential fusion, where sub-models were integrated in a sequential order (that is, one appended to another) to facilitate 'deeper' representation learning; parallel fusion, where the predictions of several classifiers were combined, typically by taking their weighted average, to obtain better prediction outcomes than those made by any single classifier; and hybrid fusion, where a model performed both parallel and sequential fusion to take advantage of both schemes. In addition to these three fusion schemes, we further propose the non-fusion scheme, where a model conducted no fusion operation but adopted an advanced machine learning algorithm to ensure satisfactory prediction results. Together, data fusion, model fusion and non-fusion (with advanced algorithms) raised interesting questions about (i) how the constructed model could be fine-tuned, (ii) which data sources or sub-models were more conducive to learning than others and (iii) how much computational cost was incurred by data/model fusion.

Devising additional evaluation schemes to answer these questions, alongside the mainstream K-fold CV and case studies, would help researchers better understand a model and strengthen the credibility of its claimed superiority over counterpart models. We therefore selected 29 state-of-the-art MDA prediction models for analysis in this review, all published in or after 2017 in Briefings in Bioinformatics, Bioinformatics, PLoS Computational Biology and other high-impact journals. Table 1 presents the models' names, associated literature references and corresponding categories (based on model fusion or non-fusion schemes as well as machine learning types, and elaborated in another review article of ours, entitled 'Updated review of advances in microRNAs and diseases: taxonomy, trends, and challenges of computational models').

Table 1

The two-dimensional taxonomy organized the 29 state-of-the-art MDA prediction models along two perspectives: fusion or non-fusion scheme (first column) and learning type (second column)

Fusion or non-fusion scheme | Learning type | Models
Sequential fusion | Matrix decomposition | MDHGI [61], MLPMDA [21]
Sequential fusion | Deep learning | DBNMDA [62], MDA–CNN [53], MDA–GCNFTG [32]
Sequential fusion | Other machine learning | NSEMDA [16]
Parallel fusion | Matrix decomposition | M2LFL [10]
Parallel fusion | Deep learning | MVMTMDA [15]
Parallel fusion | Decision tree | EGBMMDA [63], EDTMDA [47], RFMDA [38], ABMDA [64]
Parallel fusion | Other machine learning | LRSSLMDA [52], BNPMDA [65], ELLPMDA [66], BLHARMDA [39]
Hybrid fusion | Matrix decomposition | MDLPMDA [67]
Hybrid fusion | Matrix completion | NIMCGCN [28]
Hybrid fusion | Deep learning | AEMDA [17], NMCMDA [36], MMGCN [9]
Hybrid fusion | Decision tree | LMTRDA [48]
Non-fusion | Matrix decomposition | GRNMF [20], DMPred [42], BRMDA [68], TDRC [30]
Non-fusion | Matrix completion | NCMCMDA [69], IMCMDA [19]
Non-fusion | Deep learning | GAEMDA [37]

A fraction of the selected models carried out parameter sensitivity analysis to assess the impact of individual parameters on predictions. Several models also performed ablation studies to examine the contribution of the respective sequential/parallel modules (mainly depending on the fusion schemes). Moreover, time efficiency analysis was reported in a couple of works to demonstrate learning speed. Together with the nuances in the extensively conducted K-fold CVs and case studies, all of this implies that no consensus on model evaluation strategies has yet been reached among researchers: existing works differ in the chosen value of K, the evaluation metrics in K-fold CV, the diseases in case studies and the investigated modules in ablation, as well as in the decision on whether to conduct parameter sensitivity and time efficiency analyses at all. In this review, we comprehensively analyze the evaluation procedures carried out by the 29 state-of-the-art models and propose a possibly all-encompassing workflow to facilitate systematic evaluation of future models.

K-fold cross validation

As a resampling method, K-fold CV iteratively trained and tested a model with different folds (that is, subsets) of the known MDA data. In most studies, K was set to 5 or 10; alternatively, when K equalled the number of samples, so that each subset contained exactly one sample, the process became leave-one-out CV (LOOCV). The evaluated model would output association scores for the samples in the test set and, based on these scores, the test samples were ranked against the candidate samples in descending order. Whether a test miRNA-disease pair (MDP) was predicted to be associated or not depended on the rank threshold.

Given a threshold k, various metrics could be computed, including sensitivity (also known as recall), specificity, precision, accuracy (ACC), Matthews correlation coefficient (MCC) and F1-score. A metric reported for the top k ranking set was known as metric@k, such as recall@k and precision@k. However, it seems that metric@k was not mainstream, as only two of the 29 reviewed models used it in K-fold CV evaluation: the model of multi-view multichannel attention graph convolutional network (MMGCN) [9] computed precision and recall for the top 5% and 10% predictions, respectively, whereas the model of adaptive multi-source multi-view latent feature learning (M2LFL) [10] calculated precision for the top 10, 20, 30, 40 and 50 predictions, respectively. More models were evaluated in terms of precision and recall for all predictions (to assess the models’ overall performance), rather than for the top k ones.
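As an illustration of metric@k, the sketch below computes precision@k and recall@k from a vector of predicted association scores; the toy scores and labels are hypothetical.

```python
# Sketch of metric@k: precision and recall restricted to the top-k ranked predictions.
import numpy as np

def precision_recall_at_k(scores, labels, k):
    """scores: predicted association scores; labels: 1 for known MDAs, 0 otherwise."""
    order = np.argsort(scores)[::-1]        # rank candidates by descending score
    top_k = labels[order][:k]
    precision_k = top_k.sum() / k           # fraction of top-k predictions that are true MDAs
    recall_k = top_k.sum() / labels.sum()   # fraction of all true MDAs recovered in the top k
    return precision_k, recall_k

scores = np.array([0.9, 0.8, 0.4, 0.35, 0.1])
labels = np.array([1, 0, 1, 0, 0])
print(precision_recall_at_k(scores, labels, k=2))   # (0.5, 0.5)
```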

Apart from these threshold-specific metrics, the receiver operating characteristic (ROC) curve was created by plotting sensitivity against 1-specificity at different thresholds to obtain the area under the ROC curve (AUC). As an indicator of overall performance, AUC represented the probability that the model would rank a randomly selected positive sample (that is, an associated MDP) above a randomly selected negative one (that is, an unlabelled MDP) [11]. Furthermore, the precision-recall (PR) curve could also be created to complement AUC, by plotting precision against recall at different thresholds to derive the area under the PR curve (AUPR). This metric was proven to be particularly informative of a model's performance on imbalanced datasets [12], such as the MDA dataset with far more negative samples than positive ones. In this case, a large change in the number of false positives (FP) would result in only a small change of 1-specificity, which could not be effectively visualized by the ROC curve nor reflected in the value of AUC. AUPR, in contrast, used precision instead of 1-specificity to better capture the influence of a large negative sample size on the prediction performance.
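A minimal sketch of computing AUC and AUPR from predicted scores with scikit-learn follows, assuming binary labels where unlabelled MDPs are treated as negatives; the toy data are illustrative only.

```python
# Sketch: computing AUC and AUPR from predicted scores with scikit-learn.
import numpy as np
from sklearn.metrics import roc_auc_score, precision_recall_curve, auc

labels = np.array([1, 0, 0, 1, 0, 0, 0, 0, 1, 0])   # imbalanced toy labels
scores = np.array([0.91, 0.85, 0.10, 0.70, 0.30, 0.05, 0.40, 0.22, 0.65, 0.15])

auc_roc = roc_auc_score(labels, scores)
precision, recall, _ = precision_recall_curve(labels, scores)
aupr = auc(recall, precision)                       # area under the PR curve
print(f"AUC = {auc_roc:.3f}, AUPR = {aupr:.3f}")
```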

Lastly, a less frequently-used metric was normalized discounted cumulative gain (NDCG) [13], commonly adopted to assess the performance of recommendation systems like web search engines in the field of information retrieval. An MDA prediction model could be viewed as a recommender in the sense that possibly relevant microRNAs (miRNAs) were suggested for a query disease d. NDCG was calculated in four steps: first, the recommended miRNAs were ranked in terms of their relevance scores with d; second, each score was discounted via dividing it by the log of the corresponding miRNA’s rank; third, the discounted relevance scores were summed up to give the discounted cumulative gain (DCG) of the recommendations for d; fourth, DCG was normalized by the ideal DCG, computed based on the ground truth ranks of miRNAs. Both NDCG and AUC ranged between 0 and 1, with 1 indicating a perfect recommender.
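The sketch below follows the four steps described above for a single query disease, assuming the conventional log2(rank + 1) discount; the scores and relevance labels are hypothetical.

```python
# Sketch of NDCG for one query disease d, following the four steps described above.
import numpy as np

def ndcg(predicted_scores, true_relevance):
    """predicted_scores: model scores for candidate miRNAs; true_relevance: 1 if truly associated with d."""
    order = np.argsort(predicted_scores)[::-1]                       # step 1: rank miRNAs by predicted relevance
    ranks = np.arange(1, len(order) + 1)
    dcg = np.sum(true_relevance[order] / np.log2(ranks + 1))         # steps 2-3: discount each score and sum
    ideal_order = np.argsort(true_relevance)[::-1]
    idcg = np.sum(true_relevance[ideal_order] / np.log2(ranks + 1))  # ideal DCG from the ground-truth ranking
    return dcg / idcg if idcg > 0 else 0.0                           # step 4: normalize

scores = np.array([0.8, 0.3, 0.6, 0.1])
relevance = np.array([0, 1, 1, 0])
print(f"NDCG = {ndcg(scores, relevance):.3f}")
```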

Figure 1 shows the frequency of usage of each metric among the 29 computational models reviewed in this paper. Every model used AUC, implying that it has become the generally accepted metric for comparing performance. Recall was the second most frequently used metric (eight times), resonating with the fact that the rate of true positives is a primary concern in the MDA prediction task. Precision, F1-score and AUPR were each used seven times, perhaps because the latter two are derived from the former; all three can serve as an effective complement to AUC. ACC, a commonly used metric in classification problems, was applied to MDA prediction only five times. We reckon the reason to be that ACC can fail to evaluate MDA models properly: the dataset is highly imbalanced, so a naïve algorithm inferring all test samples to be negative would easily achieve a high ACC. MCC and NDCG were each used only once. Nonetheless, we expect more future studies to include the former in evaluation, because recent research proved that MCC takes advantage of all four elements of the confusion matrix (namely, TP, TN, FP and FN) in its calculation and is hence more informative than F1 and ACC [14]. As for the latter, we attribute its infrequent usage to a major drawback: it imposes no penalty on falsely recommended miRNAs for the query disease or on truly associated miRNAs omitted from the recommendation list. This makes NDCG less informative for depicting performance than other metrics. As shown in the model of multi-view multi-task MDA prediction (MVMTMDA) [15], the only model using the metric, the best NDCG achieved was 0.5030, while AUC and recall reached 0.8521 and 0.7603, respectively.

Figure 1

Frequency of evaluation metrics used by the reviewed 29 models.

In addition, by further analyzing the evaluation metrics used by models belonging to the five different learning types, we found that the eight deep learning models were more likely to use precision (five out of eight), recall (six out of eight), ACC (three out of eight), F1-score (five out of eight) and AUPR (four out of eight) than the remaining models based on matrix decomposition, matrix completion, decision trees or other machine learning. We reckon the reason to be that evaluation in terms of these metrics (together with AUC) is a common practice in the deep learning community for gaining a better understanding of the performance of complicated neural networks. This practice should also be encouraged for assessing models of the other four learning types in the future.

As a final point, when K > 1 in K-fold CV, the process was usually repeated 10, 20 or 100 times to mitigate the effect of randomness in data partitioning [16–18]. The metric obtained from multiple rounds of CV, whether AUC, AUPR or another metric, was reported and compared in the form of average ± SD. In addition, a paired t-test could be performed to determine the statistical significance of the difference between two models' predicted rankings [19] or between their metric values (such as AUCs) attained from multiple rounds of CV [19, 20].
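A sketch of repeated K-fold CV reported as average ± SD, with a paired t-test comparing two classifiers via SciPy, is given below; the two models and the synthetic data are placeholders rather than any reviewed method.

```python
# Sketch: repeated K-fold CV reported as mean +/- SD, plus a paired t-test between two models.
import numpy as np
from scipy.stats import ttest_rel
from sklearn.model_selection import RepeatedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X, y = rng.random((1000, 32)), rng.integers(0, 2, size=1000)   # placeholder features/labels

aucs_a, aucs_b = [], []
for tr, te in RepeatedKFold(n_splits=5, n_repeats=10, random_state=0).split(X):
    for model, store in [(LogisticRegression(max_iter=1000), aucs_a),
                         (RandomForestClassifier(n_estimators=50, random_state=0), aucs_b)]:
        model.fit(X[tr], y[tr])
        store.append(roc_auc_score(y[te], model.predict_proba(X[te])[:, 1]))

print(f"Model A: {np.mean(aucs_a):.3f} +/- {np.std(aucs_a):.3f}")
print(f"Model B: {np.mean(aucs_b):.3f} +/- {np.std(aucs_b):.3f}")
print("Paired t-test p-value:", ttest_rel(aucs_a, aucs_b).pvalue)
```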

Parameter sensitivity analysis

Analyzing sensitivity promotes understanding of the relationship between a model and its parameters, thereby identifying the optimal parameter combination that achieves the best predictive performance on the current dataset. It is worth noting that this optimum is data-dependent, and using the same parameters on a different dataset might not guarantee the best outcome. Usually, the parameter space was explored with the grid search technique as follows: researchers manually specified a vector of viable values for each parameter, formed a grid from the vectors of all parameters, and then exhaustively traversed the grid to observe how the model's evaluation metrics varied with different combinations of parameters.
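The sketch below illustrates such a grid search; the parameter names, value ranges and the evaluate_cv function are hypothetical stand-ins for a model's actual K-fold CV routine.

```python
# Sketch of grid search over a manually specified parameter grid (hypothetical evaluate_cv).
import itertools
import numpy as np

param_grid = {
    "alpha": [0.1, 0.5, 1.0],       # regularization weight (illustrative values)
    "dim":   [32, 64, 128],         # latent dimensionality
    "lr":    [0.1, 0.01, 0.001],    # learning rate
}

def evaluate_cv(params):
    """Placeholder: train the model with `params` and return its K-fold CV AUC."""
    return np.random.default_rng(hash(tuple(params.values())) % 2**32).random()

results = []
for values in itertools.product(*param_grid.values()):      # exhaustively traverse the grid
    params = dict(zip(param_grid.keys(), values))
    results.append((evaluate_cv(params), params))

best_auc, best_params = max(results, key=lambda r: r[0])
print("Best parameter combination:", best_params, "AUC:", round(best_auc, 3))
```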

Analyses performed by state-of-the-art models

Among the 29 state-of-the-art models, five matrix decomposition- or completion-based models and six deep learning-based models carried out parameter sensitivity analysis. Here, we present the analytical procedures of these models in groups according to their machine learning categories.

Matrix decomposition/completion-based models

Multi-layer linear projection for miRNA-disease association

The model named multi-layer linear projection for MDA (MLPMDA) prediction [21] contained n sequentially integrated computation layers, each projecting the input latent matrix (which was the output of the previous layer) onto a new latent one. The input of the first layer was a heterogeneous matrix H, constructed from the following three sources. First, the MDA adjacency matrix A was formulated based on the Human microRNA Disease Database (HMDD) [22] and updated with the similarity information of the top k neighbours of each miRNA and each disease [23]. Second, the integrated miRNA similarity matrix MS was attained by combining the miRNA functional similarity matrix MFS [5, 24], the miRNA semantic similarity matrix MSsem based on gene ontology (GO) annotations of miRNA target genes in miRTarBase [25], and the miRNA Gaussian interaction profile (GIP) kernel similarity matrix KM [24, 26]. Here, MFS and MSsem were first combined via weighted averaging into an intermediate similarity matrix, which was then integrated with KM to form MS, with α and β controlling the respective weights. Third, the integrated disease similarity matrix DS was obtained by integrating the disease semantic similarity matrix DSS [5] and the disease GIP kernel similarity matrix KD [24, 26], with the contributions of the two matrices controlled by the weight parameters w and o, respectively. In addition, the model used Zm and Zd as the balance coefficients of MS and DS, respectively, for building H. Lastly, the hyperparameter η controlled the contribution of the projection loss term to the objective function of each computation layer. Sensitivity analysis of these hyperparameters, including n, k, α, β, w, o, Zm and η, was performed based on 10-fold CV; the optimal value of each hyperparameter was determined by observing how its different values affected AUC while fixing the remaining hyperparameters at relatively satisfactory values [21].
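For reference, the sketch below computes a GIP kernel similarity matrix following the standard formulation cited by these models [24, 26] and combines it with a functional similarity matrix by a simple weighted average; the weighting line and all input matrices are illustrative and do not reproduce MLPMDA's exact integration equations.

```python
# Sketch of the Gaussian interaction profile (GIP) kernel similarity used as KM/KD,
# following the standard formulation; the weighted integration below is illustrative only.
import numpy as np

def gip_kernel(A, axis=0):
    """A: binary MDA adjacency matrix (miRNAs x diseases).
    axis=0 -> similarity between miRNAs (rows); axis=1 -> between diseases (columns)."""
    profiles = A if axis == 0 else A.T
    gamma = profiles.shape[0] / np.sum(np.linalg.norm(profiles, axis=1) ** 2)   # kernel bandwidth
    sq_dists = np.sum((profiles[:, None, :] - profiles[None, :, :]) ** 2, axis=2)
    return np.exp(-gamma * sq_dists)

A = np.random.default_rng(0).integers(0, 2, size=(50, 30))        # placeholder adjacency matrix
KM = gip_kernel(A, axis=0)                                        # miRNA GIP kernel similarity
MFS = np.random.default_rng(1).random((50, 50))                   # stand-in for functional similarity
alpha = 0.5
MS = alpha * MFS + (1 - alpha) * KM                               # illustrative weighted integration
```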

Multi-source multi-view latent feature learning

The model [10] formulated two objective functions from the perspectives of miRNAs and diseases, respectively. The former aimed to learn miRNA latent feature representations from six miRNA-related matrices, including the miRNA functional similarity matrix MFS, the miRNA sequence similarity matrix MseqS based on miRNA mature sequences from miRBase [8], the miRNA semantic similarity matrix MSS [27], the miRNA GIP kernel similarity matrix KM, the MDA adjacency matrix A (used as binary association features for miRNAs) and the latent matrix (with dimensionality controlled by the parameter σm) obtained by applying singular value decomposition (SVD) to KM. The latter sought to learn disease latent feature representations from four disease-related matrices, including the disease semantic similarity matrix DSS, the disease GIP kernel similarity matrix KD, A (used as binary association features for diseases) and the latent matrix (with dimensionality controlled by the parameter σd) obtained by applying SVD to KD. The objective functions contained three regularization coefficients λ, μ and η for respectively controlling the contributions of different terms to latent representation learning. The grid created to search for their optimal values was given by the following 3 × 6 matrix
(1)

Parameter sensitivity was analyzed by inspecting how the AUPR metric of 5-fold CV changed when fixing each of the three parameters in turn and varying the other two according to the values in Eq. (1). Moreover, the SVD dimensionality parameters σm and σd were further analyzed over the value range (0.1, 0.2, ..., 0.9) with λ, μ and η kept at their respective optimal values that collectively achieved the highest AUPR.

Nonlinear inductive matrix completion with graph convolutional network

The model named nonlinear inductive matrix completion with graph convolutional network (NIMCGCN) [28] used two l-layered GCN encoders to respectively learn miRNA and disease feature representations from the integrated miRNA similarity matrix MS and the integrated disease similarity matrix DS, then adopted two h-layered nonlinear neural networks to apply nonlinear transformations to the learnt representations, and finally fed the transformed results to an inductive matrix completion (IMC) model [29]. These modules were integrated in an end-to-end supervised learning framework, whose objective function contained a bias parameter α to control the weights of matrix projections on the positive and unlabelled sample sets. Sensitivity analysis of α, l and h was conducted by varying the value of one parameter while fixing the other two and testing its effect on 5-fold CV AUC. Specifically, α was analyzed within the range (0.1, 0.2, ..., 0.9), whereas both l and h were assessed within the range (1, 2, 3, 4).
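This one-parameter-at-a-time protocol can be sketched as follows; the parameter names mirror α, l and h above, but the default values, ranges and the evaluate_cv response surface are hypothetical.

```python
# Sketch of one-at-a-time sensitivity analysis: vary a single parameter while the
# others stay fixed at defaults, recording the CV metric (hypothetical evaluate_cv).
defaults = {"alpha": 0.5, "l": 2, "h": 2}
search_space = {
    "alpha": [round(0.1 * i, 1) for i in range(1, 10)],
    "l": [1, 2, 3, 4],
    "h": [1, 2, 3, 4],
}

def evaluate_cv(params):
    """Placeholder: return the model's 5-fold CV AUC for the given parameters."""
    return 0.85 + 0.01 * params["l"] - 0.02 * abs(params["alpha"] - 0.5)   # toy response surface

for name, values in search_space.items():
    curve = []
    for v in values:
        params = dict(defaults, **{name: v})     # fix all other parameters at their defaults
        curve.append((v, round(evaluate_cv(params), 4)))
    print(name, curve)
```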

Graph regularized nonnegative matrix factorization

The model named graph regularized nonnegative matrix factorization (GRNMF) for MDA prediction [20] incorporated a graph Laplacian regularization term and a Tikhonov (L2) term into the objective function of the original nonnegative matrix factorization (NMF) model. The input data included the miRNA functional similarity matrix MFS, the disease semantic similarity matrix DSS and the MDA adjacency matrix A updated with the similarity information of the top k neighbours of each miRNA and each disease. The updated version of A was factorized into two latent matrices whose dimensionality was determined by the parameter d. Multiplying the two latent matrices gave the predicted association score matrix Z. The sensitivity of k, d and the maximum number of iterations t for solving the objective function was analyzed by investigating how AUC was affected by the value of k, ranging from 20 to 140 in increments of 20, under constant d and t. Similarly, the impact of d on AUC was evaluated within the range (1, 2, ..., 10), followed by an analysis of the value of t ranging from 5 to 30 in increments of five.

Tensor decomposition with relational constraints

The model named tensor decomposition with relational constraints (TDRC) [30] was able to identify the type of association between miRNAs and diseases, rather than simply binary associations. Therefore, unlike other models taking the MDA adjacency matrix A as input, TDRC made predictions based on a 3D miRNA-disease-type tensor MDT, which was created by combining A, the miRNA functional similarity matrix MFS and the disease semantic similarity matrix DSS. The model aimed to reconstruct MDT in the form [Lm, Ld, Lt], where Lm, Ld and Lt were the latent representation matrices of miRNA, disease and type features, respectively; the rank of [Lm, Ld, Lt] was controlled by the parameter r. The reconstruction was achieved via an objective function of CANDECOMP/PARAFAC (CP) tensor decomposition [31], with additional regularization terms on the projection matrices, Lm and Ld. Sensitivity analysis was performed on the three corresponding regularization coefficients λ, α and β as well as the rank parameter r. The authors asserted that λ imposed less influence on the predictive performance, hence empirically setting it to 0.001. Then, α, β and r were traversed according to the following matrix
(2)

Two parameters were fixed in turn and the remaining one was varied to investigate its impact on precision@1 under the 10-fold CV experimental setting.

Deep learning-based models

miRNA-disease associations based on graph convolutional networks via graph sampling through the feature and topology graph

The model named MDAs based on graph convolutional networks via graph sampling through the feature and topology graph [32] built a homogeneous feature and topology graph G that encompassed both miRNA and disease similarity features and incorporated the topological information of the MDA adjacency matrix A. MDPs were considered as the nodes in G, and the node feature matrix F was obtained by combining the integrated miRNA similarity matrix MS and the integrated disease similarity matrix DS. An edge between two nodes was constructed if they were neighbours identified by a k-nearest neighbour (KNN) classifier. Specifically, the KNN classifier was used to predict the labels of the MDP nodes and, if the predicted label of a node matched its true label, the node was considered correctly predicted. For an investigated node u, each of its k nearest nodes (in terms of Euclidean distances calculated based on F) was predicted by KNN, and only the correctly predicted ones were regarded as the neighbours of u in G. Then, a novel GCN module based on graph sampling and normalization was trained with G and F to make MDA predictions. Sensitivity analysis was performed on the parameter k of the KNN classifier with values of (1, 3, 5, 7, 10, 15) in terms of ACC, precision, recall, AUC, F1-score and AUPR under the 5-fold CV experimental setting.
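A simplified sketch of this neighbour-filtering idea with scikit-learn is given below; the node features, labels and the exact filtering rule are illustrative assumptions rather than the authors' implementation.

```python
# Simplified sketch of the KNN-based edge construction described above: for each MDP node,
# keep only those of its k nearest neighbours whose label the KNN classifier predicts correctly.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors

rng = np.random.default_rng(0)
F = rng.random((200, 16))               # node feature matrix for MDP nodes (placeholder)
labels = rng.integers(0, 2, size=200)   # 1 = associated pair, 0 = unlabelled pair (placeholder)
k = 5

knn = KNeighborsClassifier(n_neighbors=k).fit(F, labels)
correctly_predicted = knn.predict(F) == labels          # nodes the KNN classifier gets right

nn = NearestNeighbors(n_neighbors=k + 1).fit(F)         # +1 because each node is its own nearest neighbour
_, neighbour_idx = nn.kneighbors(F)

edges = []
for u in range(F.shape[0]):
    for v in neighbour_idx[u][1:]:                      # skip the node itself
        if correctly_predicted[v]:
            edges.append((u, v))
print(f"{len(edges)} edges constructed for the feature and topology graph")
```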

Multi-view multi-task miRNA-disease association

The multi-view multi-task MDA prediction model [15] utilized associations between long non-coding RNAs (lncRNAs) and miRNAs [33] as auxiliary information for predicting MDAs. Three fully connected neural networks NNd, NNm and NNlnc were fitted in parallel to respectively learn latent representation matrices for diseases, miRNAs and lncRNAs. Subsequently, the matrices for diseases and miRNAs were combined to generate the predicted association score matrix Z. As a supervised learning model, MVMTMDA required not only positive instances, including MDAs and lncRNA–miRNA interactions (LMIs), but also negative instances of MDPs and lncRNA–miRNA pairs (LMPs). The latter were randomly sampled from the unlabelled data, with the sample size controlled by the negative sampling ratio γ. Parameter sensitivity analysis concerned γ and the number of neural network layers n. With the same size of negative samples, MVMTMDA was evaluated with n = 2, 3 or 4 in terms of AUPR, the ROC curve, hit ratio (HR) [15], NDCG and training loss under the 5-fold CV experimental setting. Then, the effect of γ = 1, 3 or 5 on the PR curve and AUPR was analyzed with a fixed value of n.
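The role of the negative sampling ratio can be sketched as follows, assuming negatives are drawn uniformly from the unlabelled pairs; the adjacency matrix is a synthetic placeholder.

```python
# Sketch of negative sampling from unlabelled miRNA-disease pairs with ratio gamma
# (gamma negatives per known positive), as controlled by the parameter described above.
import numpy as np

rng = np.random.default_rng(0)
A = (rng.random((100, 50)) < 0.05).astype(int)   # sparse toy MDA adjacency matrix
gamma = 3

pos = np.argwhere(A == 1)                        # known associations (positive instances)
unlabelled = np.argwhere(A == 0)                 # candidate negatives
neg_idx = rng.choice(len(unlabelled), size=gamma * len(pos), replace=False)
neg = unlabelled[neg_idx]                        # sampled negative miRNA-disease pairs

print(f"{len(pos)} positives, {len(neg)} sampled negatives (ratio {gamma}:1)")
```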

Auto-encoder for inferring miRNA-disease associations

The model named deep auto-encoder for inferring MDAs (AEMDA) [17] first used two regression models to learn disease and miRNA latent representation vectors from the integrated disease similarity matrix DS and the integrated miRNA similarity matrix MS, respectively. Here, DS (or MS) was created by taking the weighted average of the disease semantic similarity matrix DSS (or the miRNA functional similarity matrix MFS) and the disease GIP kernel similarity matrix KD (or the miRNA GIP kernel similarity matrix KM) with a weight parameter α (or β); the size of the latent representation vector for each disease (or miRNA) was determined by the parameter sd (or sm). Next, the vectors for a pair of disease and miRNA were concatenated and fed into a deep autoencoder. The overall objective function of AEMDA consisted of the loss term for reconstructing the input vectors by the autoencoder and a regularization term with coefficient λ to improve the robustness of learning. Sensitivity analysis was performed on the above five parameters in terms of 5-fold CV AUC: both α and β were evaluated within the range (0.0, 0.1, ..., 1.0); both sd and sm were assessed within (2^7, 2^8, ..., 2^13); λ was investigated with values of 1, 0.1 and 0.01.

Multi-view multichannel attention graph convolutional network

The model [9] deployed four n-layered GCNs to respectively learn miRNA and disease embeddings from the miRNA sequence similarity matrix MseqS, the miRNA functional similarity matrix MFS, the disease semantic similarity matrix DSS and the target-based disease similarity matrix TDS [34]. The size of each embedding was the same and determined by the parameter s. All miRNA (or disease) embeddings were concatenated to form a miRNA (or disease) feature tensor, which was then fed into a multichannel attention module [35] for dimensionality reduction and channel normalization. The resulting normalized miRNA (or disease) channels were aggregated by a convolutional neural network (CNN) module with f filters to yield the final miRNA (or disease) embeddings. Combining these miRNA and disease embeddings gave the predicted association score matrix Z. MMGCN was trained by gradient descent with the learning rate η controlling the step size. Sensitivity analysis of the above four parameters was conducted in terms of the average AUC of 5-fold CV repeated 10 times; the value ranges investigated were n = (2, 3, 4), s = (2^5, 2^6, 2^7, 2^8), f = (2^5, 2^6, 2^7, 2^8) and η = (0.1, 0.01, 0.001).

Neural multiple-category miRNA-disease association

Like TDRC [30], the neural multiple-category MDA prediction model [36] could also predict the association categories of MDAs. The model applied two n_RGCN-layered relational graph convolutional networks (RGCNs) to learn the latent representations of miRNA node features and disease node features in the heterogeneous graph H, respectively. Then, two neural multi-relational (NMR) decoders were used to respectively reconstruct the miRNA and disease node features. Each decoder comprised multiple n_csNN-layer MDA category-specific neural networks, all outputs of which were fed into an n_gNN-layer global neural network that produced the reconstructed features. The RGCN encoders and the NMR decoders formed an end-to-end learning framework, with a bias parameter α controlling the contributions of learning on positive and unlabelled samples in the objective function. The optimized miRNA and disease latent representations were combined to generate the predicted association score matrix. Sensitivity analysis of the four parameters was carried out under the 10-fold CV experimental setting using three evaluation metrics, namely precision@1, recall@1 and F1-score@1: both n_RGCN and n_csNN were investigated within the range (1, 2, 3, 4); n_gNN was evaluated within the range (0, 1, 2, 3); α was assessed within the range (0.0, 0.2, ..., 1.0).

Graph auto-encoder model for predicting potential miRNA-disease associations

The graph auto-encoder model for predicting potential MDAs [37] contained two GNN-based encoders for respectively learning s-dimensional embeddings of miRNAs and diseases from the MDA adjacency matrix A, the integrated miRNA similarity matrix MS and the integrated disease similarity matrix DS. Each encoder was constructed from an aggregator function and an n-layered multi-layer perceptron (MLP). Both miRNA and disease embeddings were then fed into a bilinear decoder to obtain the reconstructed A as the predicted association score matrix Z. The model's performance in terms of 5-fold CV AUC was evaluated with s = (2^5, 2^6, 2^7, 2^8) and n = (1, 2, ..., 7).

Types of analyzed parameters

Parameters involved in the aforementioned sensitivity analyses of the 11 state-of-the-art models can be classified into three types (see Table 2). First, matrix decomposition/completion-related parameters included the regularization coefficients in the objective function for matrix decomposition/completion (MLPMDA [21], M2LFL [10], NIMCGCN [28] and TDRC [30]), the dimensionality parameter for the latent feature representation matrices or vectors of miRNAs and diseases (M2LFL [10], GRNMF [20] and TDRC [30]), the number of computation layers for matrix decomposition/completion (MLPMDA [21]) and the maximum number of iterations for optimizing the objective function (GRNMF [20]). Second, deep learning-related parameters included the regularization coefficients in the loss function of neural networks (M2LFL [10] and GRNMF [20]), the size of miRNA and disease embeddings (AEMDA [17], MMGCN [9] and GAEMDA [37]), the learning rate (MMGCN [9]), the number of filters in the CNN (MMGCN [9]), the number of neural network layers (MVMTMDA [15], MMGCN [9], NMCMDA [36], GAEMDA [37] and NIMCGCN [28]; note that the matrix completion-based model NIMCGCN also analyzed the number of neural network layers because it involved GCNs to learn feature representations of miRNAs and diseases before performing IMC) and the negative sampling ratio (MVMTMDA [15]). Third, miRNA- and disease-related parameters included the k nearest neighbours of miRNAs and diseases, whose similarity features were used to update the MDA adjacency matrix (MLPMDA [21] and GRNMF [20]), the weights for constructing the integrated similarity matrices of miRNAs and diseases (MLPMDA [21] and AEMDA [17]), and the parameter k of the KNN classifier for constructing the edges between the MDP nodes in the homogeneous graph (MDA–GCNFTG [32]).

Table 2

Three types of parameters summarized from the procedures of parameter sensitivity analysis by 11 state-of-the-art computational models

Type of parameters analyzed | Models performing the analysis
Matrix decomposition/completion-related parameters
 The regularization coefficients in the objective function | MLPMDA [21], M2LFL [10], NIMCGCN [28], TDRC [30]
 The dimensionality parameter for latent feature representation matrices of miRNAs and diseases | M2LFL [10], GRNMF [20], TDRC [30]
 The number of computation layers for matrix decomposition/completion | MLPMDA [21]
 The maximum number of iterations for optimizing the objective function | GRNMF [20]
Deep learning-related parameters
 The regularization coefficients in the loss function | M2LFL [10], GRNMF [20]
 The size of miRNA and disease embeddings | AEMDA [17], MMGCN [9], GAEMDA [37]
 The learning rate | MMGCN [9]
 The number of filters in convolutional neural network | MMGCN [9]
 The number of neural network layers | MVMTMDA [15], MMGCN [9], NMCMDA [36], GAEMDA [37], NIMCGCN [28]
 The negative sampling ratio | MVMTMDA [15]
miRNA- and disease-related parameters
 MiRNAs and diseases' k nearest neighbours, whose similarity features were used to update the MDA adjacency matrix | MLPMDA [21], GRNMF [20]
 Weights for constructing integrated similarity matrices of miRNAs and diseases | MLPMDA [21], AEMDA [17]
 Parameter k of the KNN classifier for constructing the edges between the MDP nodes in the homogeneous graph | MDA–GCNFTG [32]

Different machine learning categories tended to analyze the sensitivity of different parameters: matrix decomposition/completion-based models were concerned with the dimensionality of the latent representation matrices, whereas deep learning-based models considered the size of the embeddings (vectors); the former tested the maximum number of iterations, while the latter assessed the learning rate; a matrix decomposition-based model performing sequential fusion, like MLPMDA [21], would investigate the number of matrix decomposition layers, whereas a deep learning-based model could evaluate the number of neural network layers regardless of the fusion scheme it carried out. Moreover, additional parameters could be analyzed by the latter type of models, such as the number of filters (if a model contained CNN modules) and the negative sampling ratio (owing to the typical supervised learning framework of this model type). As for similarities between the analyses, both matrix decomposition/completion- and deep learning-based models examined the regularization coefficients in the objective function and the loss function, respectively. Besides, the miRNA- and disease-related parameters could be commonly analyzed, because all models relied on the MDA adjacency matrix as the gold standard and most of them used the integrated miRNA and disease similarity matrices as auxiliary information for making predictions.

It should be noted that not all categories (referring to both the fusion or non-fusion schemes and the machine learning types in Table 1) involved sensitivity analysis of category-specific parameters. In particular, no parallel fusion model carried out an analysis of the ensemble weights. We deduce the reason to be that such models were typically based on homogeneous ensemble learning, combining the prediction outcomes of the same type of classifiers, such as the three neural networks in MVMTMDA [15] and the two similarly formulated matrix projection functions in M2LFL [10]. Thus, there might be limited added value in delving into the sensitivity of their weights. However, sensitivity analysis of ensemble weights will become necessary if more future models adopt heterogeneous ensemble learning. Moreover, it is noticeable that no parameter sensitivity analyses were performed on either decision tree models or other machine learning models. We conjecture the reason to be that all these models were based on popular classifiers, such as random forest in the model of random forest for MDA prediction (RFMDA) [38] and error correction-based k-nearest neighbour regression (ECkNN) in the model of bipartite local models and hubness-aware regression for MDA (BLHARMDA) [39], which have been well understood by existing research (with empirical findings on parameter tuning) and hence may not need further exploration of their parameters. Despite this deduction, we expect that more works will emerge in these learning types and serve as sources for summarizing the parameters involved in sensitivity analysis.

Ablation study

As a rule of thumb for MDA prediction, most computational models are devised by developing variants of existing computational modules and/or adding new ones to existing models. A natural question that arises is why the modified and/or additional sub-models are necessary, which can be answered by an ablation study: each investigated sub-model is removed to observe the change in predictive performance, thereby helping to understand its contribution to the overall performance. A model feasible for ablation study needs to have the property of graceful degradation [40], that is, the ability to function properly when a proportion of its sub-models fail or are absent. This resonates with the fact that only 12 of the 29 reviewed works included an ablation study in their evaluation process.

Ablation performed by state-of-the-art models

In this subsection, we present the 12 models in groups based on their machine learning types (as depicted in Table 1). Each model had one or more modules removed or replaced by simpler ones to form the model's variants, which were then compared with the full model. In each group, we further discuss the effectiveness of the ablated modules in improving the performance of the models. The experimental setting was usually 5-fold CV with AUC as the evaluation metric, and the effectiveness of the performance increase was measured by the percentage change of AUC between exclusion and inclusion of the module
Effectiveness (%) = (AUC_with module − AUC_without module) / AUC_without module × 100 (3)

It should be noted that, although most of the reviewed models were evaluated under similar experimental settings, the training samples and sample sizes might not be consistent. Hence, any effectiveness statistic reported in this subsection is only meant to reflect the impact of an investigated module on the specific model in which the ablation study was carried out, rather than on models in general. To safely conclude general patterns of a module's effectiveness, further analysis with the same benchmark dataset under the same experimental setting should be conducted across a large suite of models.
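A generic ablation loop can be sketched as follows, assuming the percentage change in Eq. (3) is taken relative to the model without the module; the module names and the evaluate_cv function are hypothetical.

```python
# Sketch of an ablation loop: drop each optional module in turn, re-evaluate, and report
# the percentage change of AUC as in Eq. (3) (hypothetical modules and evaluate_cv).
modules = ["gip_kernel", "knn_update", "attention", "extra_layer"]   # hypothetical module names

def evaluate_cv(active_modules):
    """Placeholder: train the model with the given modules and return its 5-fold CV AUC."""
    base = 0.88
    bonus = {"gip_kernel": 0.004, "knn_update": 0.005, "attention": 0.02, "extra_layer": 0.003}
    return base + sum(bonus[m] for m in active_modules)

auc_full = evaluate_cv(modules)
for m in modules:
    auc_without = evaluate_cv([x for x in modules if x != m])        # ablate module m
    effectiveness = (auc_full - auc_without) / auc_without * 100     # Eq. (3)
    print(f"{m}: +{effectiveness:.2f}% AUC")
```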

Matrix decomposition/completion-based models

Multi-layer linear projection for miRNA-disease association

As mentioned in the parameter sensitivity analysis, the model [21] contained a series of computation layers, each projecting the input latent matrix onto a new latent one. As the model input, the heterogeneous matrix H involved the MDA adjacency matrix A updated with the similarity information of the top k neighbours of each miRNA and each disease, the intermediate similarity matrix (calculated as the weighted average of the miRNA functional similarity matrix MFS and the miRNA semantic similarity matrix MSsem), the integrated miRNA similarity matrix MS (constructed from the intermediate similarity matrix and the miRNA GIP kernel similarity matrix KM) and the integrated disease similarity matrix DS (formed from the disease semantic similarity matrix DSS and the disease GIP kernel similarity matrix KD). Two balance coefficients Zm and Zd were used to respectively control the weights of MS and DS for building H. Moreover, additional computations were performed from the two single views of MFS and MSsem, respectively, to compensate for the information loss of miRNA similarity resulting from the computation layers taking H as input. The ablation study assessed the following procedures regarding their contribution to the improvement of 10-fold CV AUC; the assessment began with a simple model involving none of the procedures, gradually added each investigated procedure, and compared the performance of the model with and without that procedure. Specifically, the model taking the weighted average of MFS and MSsem outperformed that taking the simple average of the two matrices by 1.1%; the model involving KM and KD in building MS and DS, respectively, outperformed that without these two matrices by 0.4%; the model updating A with neighbourhood information outperformed that directly using A without any updates by 0.5%; the model containing Zm and Zd to balance the weights of MS and DS for building H outperformed that without any balancing operation by 2.3%; the model implementing stacked layers outperformed that with a single computation layer by 0.3%; and lastly, the model with additional single-view computation outperformed that without it by 0.3%.

Nonlinear inductive matrix completion with graph convolutional network

As described in the section on parameter sensitivity analysis, this model [28] used two GCN encoders to respectively learn miRNA and disease feature representations from the integrated miRNA similarity matrix MS and the integrated disease similarity matrix DS, then adopted two nonlinear neural networks to nonlinearly transform the learnt representations, and finally fed the transformed results to the IMC model. In the ablation study of NIMCGCN, the nonlinear neural networks were replaced by single-layer linear projection modules and the GCN encoders were changed to similarity matrix encoders [41]. Both variants were compared with the full model. Experimental results showed that, compared with their counterparts, the GCN encoders enhanced the 5-fold CV AUC by 3.6% and the nonlinear neural networks improved it by 4.9% over the linear projection modules.

Graph regularized nonnegative matrix factorization

As illustrated in the previous section, the model [20] updated the adjacency matrix A using the similarity information (based on the miRNA functional similarity matrix MFS and the disease semantic similarity matrix DSS) of the top k neighbours of each miRNA and each disease. Such information was known as weighted k nearest neighbour profiles (WKNNP). The updated version of A was factorized by GRNMF into two latent matrices, the product of which yielded the predicted association score matrix Z. The ablation study of this model removed WKNNP and directly used A in matrix factorization. Evaluated with 13 diseases, WKNNP was able to improve the model's 5-fold CV AUC by 5.8% on average.

Disease-miRNA association prediction

The disease-miRNA association prediction model [42] performed NMF on the heterogeneous graph H that was constructed from the miRNA functional similarity matrix MFS, the integrated disease similarity matrix DS and the MDA adjacency matrix A. The three matrices were modelled individually with respect to the predicted association score matrix Z, before being used to formulate the NMF objective function. A sparse penalty term imposed on Z was further added to the function to improve the learning outcome. In the ablation study, this term was removed to assess its effect on the predictive performance. Evaluated with 15 diseases, the sparse penalty could boost the 5-fold CV AUC by 5.2% on average.

Inductive matrix completion-based model for miRNA-disease association

The inductive matrix completion-based model for MDA prediction [19] relied on IMC to recover the missing entries in the MDA adjacency matrix A with the auxiliary matrices MS and DS. These three matrices were ablated in turn to investigate their respective influences on the model's performance. Under the experimental setting of LOOCV repeated 50 times, the averaged results indicated that the inclusion of A, MS and DS in computation improved AUC by 61.1%, 23.7% and 0.61%, respectively.

The abovementioned experimental results conveyed the effectiveness of each ablated module in improving the predictive performance of its corresponding model, though the extent of improvement varied. In MLPMDA [21], balancing MS and DS via the coefficients Zm and Zd for building the model input H was the most beneficial module (2.3%), whereas both the stacked computation layers and the single-view computation brought only marginal increases (0.3%). In NIMCGCN [28], the nonlinear neural networks for transforming latent representations (4.9%) were more effective than the GCN encoders for learning latent representations (3.6%). Moreover, the WKNNP information used for updating A effectively promoted the predictive performance of GRNMF [20] (5.8%). In the objective function of DMPred [42], the sparse penalty on the predicted association score matrix Z was shown to enhance the performance by 5.2%. Lastly, among the three data sources of IMCMDA [19], the MDA adjacency matrix A was the most important input with a 61.1% performance increase, followed by the integrated miRNA similarity matrix MS (23.7%) and the integrated disease similarity matrix DS (0.61%).

Deep learning-based models

miRNA-disease association prediction model based on convolutional neural networks

The MDA prediction model based on convolutional neural networks [43] first utilized a network-based feature extractor to learn effective miRNA and disease feature vectors from a three-layer miRNA–gene–disease graph MGD. The features were then fed into an autoencoder to generate lower-dimensional embeddings, which were finally used to fit a CNN predictor for inferring potential MDAs. The ablation study of the model concerned the feature extractor and the CNN predictor: the former was replaced by a simple generator of binary feature representations to create the BR–CNN model, and the latter was changed to a support vector machine (SVM) to form the MDA–SVM model. Under the 10-fold CV experimental setting, MDA–CNN outperformed BR–CNN and MDA–SVM by 4.2% and 2.9%, respectively, in terms of AUC.

miRNA-disease associations based on graph convolutional networks via graph sampling through the feature and topology graph

As introduced in the previous section, the model [32] made MDA predictions by constructing the homogeneous feature and topology graph G (whose nodes represented MDPs and whose edges were formed via the KNN classifier) and using the novel GCN module for sampling and normalizing subgraphs from G. The ablation study targeted the latter, which was replaced by a traditional GCN [44]. This variant was compared with the proposed model on several experimental tasks of MDA prediction for (i) both miRNAs and diseases with known associations, (ii) new miRNAs without known associated diseases and (iii) new diseases without known associated miRNAs; each type of task was further split into cases of balanced and unbalanced training examples. The maximum increase in 5-fold CV AUC resulting from the novel GCN module was 6.9%, on the task of predicting MDAs for new miRNAs with balanced training data.

Neural multiple-category miRNA-disease association

As presented in the previous section, this model [36] consisted of RGCN encoders for learning the latent node representations in the heterogeneous graph H and NMR decoders for reconstructing the node features. Various combinations of encoders and decoders were included in the ablation study: (i) DistMult-RGCN, containing decoders based on DistMult factorization [45] and RGCN encoders; (ii) LMR–RGCN, comprising linear multi-relational (LMR) decoders and RGCN encoders; and (iii) NMR–GCN, including NMR decoders and GCN encoders. The experiments were carried out with two multi-category MDA datasets based on the HMDD v3.2 database [46]. The results showed that NMCMDA with the NMR–RGCN architecture outperformed the three counterparts by at most 1.6%, 0.2% and 1.7%, respectively, in terms of 10-fold CV AUC.

Multi-view multichannel attention graph convolutional network

As elaborated in the previous section, this model [9] deployed four GCNs to respectively learn miRNA and disease embeddings from the miRNA sequence similarity matrix MseqS, the miRNA functional similarity matrix MFS, the disease semantic similarity matrix DSS and the target-based disease similarity matrix TDS. The embeddings were concatenated and fed into the multichannel attention module for dimensionality reduction and channel normalization. The resulting normalized channels were aggregated by the CNN module to produce the final miRNA and disease embeddings, the product of which was the predicted association score matrix Z. In the ablation study, two variants of MMGCN were created: MMGCN-nl, which only used the nth-layer outputs of the GCNs as the embeddings, and MMGCN-noatten, which omitted the attention module and directly deployed the CNN to combine the embeddings. In terms of 5-fold CV AUC, MMGCN outperformed MMGCN-nl and MMGCN-noatten by 5.1% and 0.8%, respectively. Furthermore, the four learning views corresponding to the input matrices MseqS, MFS, DSS and TDS were ablated in turn as follows: the full model containing all four views was compared with one covering the other three views and excluding the ablated one. For instance, to assess the contribution of MseqS to the predictive performance, MMGCN with MseqS, MFS, DSS and TDS was evaluated against that with MFS, DSS and TDS only; the former outperformed the latter by 0.9%, indicating that MseqS could increase AUC by 0.9%. Similarly, the percentage increases of AUC for MFS, DSS and TDS were 1.0%, 1.5% and 0.4%, respectively.

The ablation study was carried out in the above four deep learning-based models, each with different experimental results. In MDA–CNN [43], the network-based feature extractor was somewhat more effective in promoting AUC than the CNN predictor (4.2% versus 2.9%). The novel GCN module proposed in MDA–GCNFTG [32] outperformed the traditional one by at most 6.9%. In NMCMDA [36], the proposed autoencoder architecture of NMR–RGCN was superior to all three counterparts, indicating the strengths of the RGCN encoder (1.7%) and the NMR decoder (1.6% at most). Finally, in MMGCN, the GCNs for learning miRNA and disease embeddings (5.1%) seemed considerably more important than the multichannel attention module (0.8%); the latter may need further exploration of its potential. Besides, the four learning views yielded relatively similar performance increases.

Decision tree-based models

Ensemble of decision tree based miRNA-disease association

The model named Ensemble of Decision Tree based MDA prediction (EDTMDA) [47] extracted statistical features from the integrated miRNA similarity matrix MS and the integrated disease similarity matrix DS, graph theory-related features from DS and the MDA adjacency matrix |$A$|, and latent vectors from matrix factorization on |$A$|. A proportion of these features was randomly sampled to form a feature subset, on which principal component analysis (PCA) was further performed for dimensionality reduction. The final lower-dimensional feature set was used to fit a series of decision trees based on the classification and regression trees (CART) algorithm. The PCA module was ablated to assess its impact on performance. The 5-fold CV results show that PCA improved AUC by 1.3%.
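
The impact of an ablated PCA module can be illustrated with a minimal sketch (not EDTMDA's original implementation): two scikit-learn pipelines, with and without a PCA step before a CART-style decision tree, are compared by 5-fold CV AUC on hypothetical miRNA-disease pair features.

```python
# Minimal sketch (not EDTMDA's code): a CART-style decision tree with and
# without a preceding PCA step, compared by mean 5-fold CV AUC.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 60))      # hypothetical miRNA-disease pair features
y = rng.integers(0, 2, size=1000)    # hypothetical association labels

with_pca = Pipeline([("pca", PCA(n_components=20)),
                     ("cart", DecisionTreeClassifier(max_depth=5, random_state=0))])
without_pca = Pipeline([("cart", DecisionTreeClassifier(max_depth=5, random_state=0))])

for name, model in [("with PCA", with_pca), ("without PCA", without_pca)]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean 5-fold CV AUC = {auc:.3f}")
```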

Logistic model tree for predicting miRNA-disease association

The model of logistic model tree for predicting MDA (LMTRDA) [48] considered the integrated disease similarity matrix DS, the integrated miRNA similarity matrix MS and the miRNA sequence embedding matrix SEM (obtained by converting miRNA sequences into 6-mers [49] and then using the skip-gram model [50] to generate embeddings) as feature matrices. The final feature vector for a pair of miRNA and disease was obtained by concatenating the corresponding vectors from DS, MS and SEM, and was then fed into the logistic model tree (LMT) based on the LogitBoost algorithm [51]. Two variants of the final feature vectors were created in the ablation study: DescSeq, containing vectors only from DS and SEM, and DescSim, consisting of vectors only from DS and MS. Compared with DescSeq and DescSim, the full feature vectors enhanced the 5-fold CV AUC by 3.3% and 1.1%, respectively.
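
The sequence feature construction can be sketched as follows; the tokenization, the gensim skip-gram call and the averaging of 6-mer vectors are illustrative assumptions, since LMTRDA's exact embedding parameters are not reproduced here.

```python
# Minimal sketch (assumed parameters): overlapping 6-mer tokenization of miRNA
# sequences followed by skip-gram word2vec embeddings, averaged per sequence.
import numpy as np
from gensim.models import Word2Vec

def kmers(seq, k=6):
    """Overlapping k-mers of an RNA sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

# Hypothetical miRNA sequences, for illustration only.
sequences = ["UGAGGUAGUAGGUUGUAUAGUU", "UAGCAGCACGUAAAUAUUGGCG"]
corpus = [kmers(s) for s in sequences]

model = Word2Vec(corpus, vector_size=64, window=5, min_count=1, sg=1, epochs=50)  # sg=1: skip-gram

def embed(seq):
    # One simple way to embed a whole sequence: average its 6-mer vectors.
    return np.mean([model.wv[k] for k in kmers(seq)], axis=0)

sem = np.vstack([embed(s) for s in sequences])  # rows play the role of SEM feature vectors
print(sem.shape)
```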

Ablation study was not conducted as extensively in decision tree-based models as in the former two types of models. In EDTMDA, the PCA module for feature dimensionality reduction raised performance by 1.3%. In LMTRDA, the full feature vectors outperformed the descriptors DescSeq and DescSim by 3.3% and 1.1% in AUC, respectively, implying that well-designed feature engineering was essential for the success of this kind of model.

Other machine learning-based models

Laplacian regularized sparse subspace learning for miRNA-disease association

The model named Laplacian regularized sparse subspace learning for MDA prediction (LRSSLMDA) [52] predicted MDAs through three steps: (i) data preparation, including the extraction of statistical feature profiles and graph theoretical feature profiles from the integrated miRNA similarity matrix MS and the integrated disease similarity matrix DS, as well as the construction of graph Laplacian matrices corresponding to the profiles; (ii) model formation, which fitted a Laplacian regularized sparse subspace learning (LRSSL) model; (iii) optimization, which solved the objective function of LRSSL to obtain the predicted association score matrix |$Z$|. In the ablation study, the statistical feature profiles and graph theoretical feature profiles were removed in turn to examine their respective effects on performance. The 5-fold CV results indicate that the two profiles respectively contributed 0.07% and 0.04% increases in AUC.
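
For the Laplacian construction mentioned in step (i), a minimal sketch is given below; whether LRSSLMDA uses the unnormalized or the symmetric normalized form is not specified here, so both variants are shown under that assumption.

```python
# Minimal sketch: building a graph Laplacian from a (hypothetical) symmetric
# miRNA or disease similarity matrix S, as used for Laplacian regularization.
import numpy as np

def graph_laplacian(S, normalized=True):
    """Return L = D - S, or the symmetric normalized form D^{-1/2} L D^{-1/2}."""
    d = S.sum(axis=1)
    L = np.diag(d) - S
    if normalized:
        d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
        L = d_inv_sqrt @ L @ d_inv_sqrt
    return L

S = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.3],
              [0.1, 0.3, 1.0]])
print(graph_laplacian(S))
```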

The individual profiles enhanced performance only marginally. This is because each profile alone was informative enough to achieve a satisfactory AUC (0.9175 for the statistical profile and 0.9177 for the graph theoretical profile), while combining them yielded an AUC of 0.9181. Among the five other machine learning-based models (see Table 1), only LRSSLMDA carried out an ablation study. Further analysis can be conducted when more models of this kind include ablated modules in the future.

Trends of ablation study

The previous subsection discussed the effectiveness of ablated modules in improving the predictive performance of models under each machine learning category. Here, we focus on the other dimension of the taxonomy shown in Table 1, that is, the different fusion or non-fusion schemes, each of which exhibited distinct patterns of ablation study (as summarized in Table 3). The three sequential fusion models ablated sequentially integrated modules. MLPMDA [21] replaced the stacked computation layers with a single matrix projection layer. MDA–CNN [53] removed its sequentially integrated modules in turn to analyze their effect on performance: the feature representation learning module was replaced by the simple binary representation, and the CNN sub-model was replaced by the baseline SVM classifier. MDA–GCNFTG [32] changed the novel GCN module based on graph sampling and normalization to the traditional GCN. Although the two parallel fusion models, namely, EDTMDA [47] and LRSSLMDA [52], performed ablation on model-specific computational modules (the PCA module and the feature profiles, respectively), we expect more future models of this fusion scheme to analyze the contribution of the individual classifiers used in ensemble learning. The hybrid fusion models evaluated parallel and/or sequential sub-models: NIMCGCN [28] compared the GCN encoders with the baseline similarity matrix encoders and neural projection with the default linear projection; NMCMDA [36] assessed the performance gaps between the proposed NMR–RGCN architecture and the three variants DistMult-RGCN, LMR–RGCN and NMR–GCN; MMGCN [9] reduced the GCN encoders to a single layer to assess the effect of this module, then ablated the multichannel attention mechanism, and finally assessed the influence of individual miRNA/disease views on performance; LMTRDA [48] also analyzed the contribution of single miRNA/disease views. The three non-fusion models in Table 3 were GRNMF [20], DMPred [42] and IMCMDA [19], which respectively ablated the neighbourhood-based procedure for updating the MDA adjacency matrix, the sparse penalty term in the objective function, and the input data sources.

Table 3

Ablation studies carried out in 12 reviewed computational models

Sequential fusion
MLPMDA [21]:
• Input data sources including the miRNA similarity matrices and the disease similarity matrices
• Procedure of updating the MDA adjacency matrix using neighbourhood information
• Sequential computation layers
MDA-CNN [53]:
• Sequential feature extractor
• Sequential CNN module
MDA-GCNFTG [32]:
• Sequential GCN module

Parallel fusion
EDTMDA [47]:
• PCA for dimensionality reduction of input features
LRSSLMDA [52]:
• Statistical and graph theoretical profiles for miRNAs and diseases at input level

Hybrid fusion
NIMCGCN [28]:
• Sequential GCN encoder module
• Sequential neural projection module
NMCMDA [36]:
• Sequential RGCN encoder module
• Sequential NMR decoder module
MMGCN [9]:
• Sequential GCN encoder module
• Sequential multichannel attention mechanism
• Parallel miRNA and disease views
LMTRDA [48]:
• Parallel miRNA and disease feature descriptors

Non-fusion
GRNMF [20]:
• Procedure of updating the MDA adjacency matrix using neighbourhood information
DMPred [42]:
• Sparse penalty term in the objective function
IMCMDA [19]:
• Input data sources including the miRNA similarity matrix, the disease similarity matrix and the MDA adjacency matrix

Time efficiency

In the aforementioned evaluation procedures, metrics such as precision, recall, F1-score, AUC and AUPR were intensively used to depict a model's capability of making correct predictions. Time efficiency analysis, from another perspective, aimed to quantify the running time required to train a model that generated reliable outcomes. This was an informative but usually overlooked procedure, adopted by only two of the 29 reviewed models, namely, TDRC [30] and MDA–GCNFTG [32]. The two works measured time efficiency differently because they belong to different machine learning types.

The matrix decomposition-based model TDRC was compared with two counterparts of the same type, namely, the Candecomp/Parafac (CP) model [31] and the model of tensor factorization using auxiliary information (TFAI) [54], in terms of the average running time of 20 complete runs. Notably, TDRC converged within the shortest time (13.74 s), less than half of the time for CP (31.37 s) and 1/30 of the time for TFAI (431.66 s).

Deep learning usually exhibits higher computational complexity [55] than the other four machine learning types in Table 1; models of this category typically require more iterations to optimize their weights. Perhaps for this reason, the GCN-based model MDA–GCNFTG (In the work of MDA–GCNFTG, time efficiency was measured in three different MDA prediction tasks and on datasets of balanced and unbalanced classes. Here we report the time taken for the task of predicting new MDAs for known diseases and miRNAs using the balanced dataset. Please refer to the original manuscript for analyses of all cases.) was evaluated in terms of the running time of a single epoch. Its per-epoch training time was about half that of the traditional GCN counterpart [44] (13.45 versus 27.21 s).

Two conjectures can be made from the information gathered from the reviewed models to guide future research on time efficiency. First, it might be more appropriate to measure the average running time of multiple complete executions for matrix decomposition/completion-based models, and to adopt the one-epoch running time for models of the other three learning types (namely, deep learning, decision tree and other machine learning). Second, models with more fusion operations would be expected to need more running time than those with fewer, as illustrated in the following examples: a model with a larger number of sequential modules would be more time-consuming than one with fewer layers, and ensemble learning over more classifiers in a parallel fusion model would result in a longer training phase. Moreover, it would not be surprising to observe that hybrid fusion is substantially less efficient, or that non-fusion outperforms all other schemes.
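
The two measurement styles can be implemented as in the sketch below; train_full() and train_one_epoch() are hypothetical placeholders for a model's complete training run and a single training epoch, respectively.

```python
# Minimal timing sketch; train_full() and train_one_epoch() are hypothetical
# callables supplied by the model under evaluation.
import time

def average_runtime(train_full, n_runs=20):
    """Mean wall-clock time of n_runs complete training executions
    (suggested for matrix decomposition/completion-based models)."""
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        train_full()
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)

def epoch_runtime(train_one_epoch):
    """Wall-clock time of a single training epoch
    (suggested for the other three learning types)."""
    start = time.perf_counter()
    train_one_epoch()
    return time.perf_counter() - start
```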

Case studies

Like K-fold CV, case studies of predicting associated miRNAs for specific diseases have been extensively conducted for performance evaluation since Chen et al. [2] reviewed 20 iconic MDA prediction models in 2017. The 29 state-of-the-art models in the current review carried out 92 case studies on 14 different diseases; on average, each model was applied to predicting the associated miRNAs of roughly three diseases. Figure 2 shows the histogram for the investigated diseases. It is apparent that neoplasms with high incidence rates were emphasized: neoplasms of the breast (19), oesophagus (16), colon (15), lung (14), lymphoma (10), kidney (7), liver (6) and prostate (2) appeared more frequently in case studies than other diseases, and all of them are among the world's 20 most common cancers according to the Global Cancer Statistics 2020 [56].

Figure 2. Histogram for the 14 investigated diseases in case studies.

The 92 case studies fell into three categories. First, the standard out-of-sample test (55%, 50 case studies) used all MDAs from the HMDD database (version 2.0 [22] or 3.0 [46]) to train a matrix decomposition or completion model, or utilized all labelled MDPs (with a fraction of unlabelled ones) to learn a model based on deep learning, decision tree or other machine learning. The fitted model was then applied to computing the predicted association scores of the investigated disease's candidate (unlabelled) miRNAs, the top k of which were verified against other MDA databases, including miRCancer [57], dbDEMC [58] and miR2Disease [59], and/or relevant literature. The number of candidates confirmed by these resources was taken as the outcome of the case study. Second, the data-masking test (16%, 15 case studies) aimed to assess a model's applicability to a disease without known related miRNAs by removing (masking) all associations for that disease from the training dataset. Subsequently, the same training process as in the standard out-of-sample test was carried out to fit the model, followed by the same verification step that reported the number of confirmed top predictions. Third, the extensibility test (29%, 27 case studies) sought to demonstrate whether inferences made by a model learnt on a previous version of HMDD would be validated by a later version. Specifically, model training involved MDAs from HMDD v1.0 (or v2.0) and, among the top k predictions, the count of candidates confirmed by HMDD v2.0 (or v3.0) and other literature was regarded as the extensibility measure for the model.
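
The standard out-of-sample test reduces to a ranking-and-verification routine; the sketch below (with a hypothetical score matrix and hypothetical index sets) ranks a disease's unlabelled candidate miRNAs by predicted score and counts how many of the top k appear in an external confirmation set such as dbDEMC or miR2Disease.

```python
# Minimal sketch of a standard out-of-sample case study; Z, known_assoc and
# confirmed_by_dbs are hypothetical inputs, not data from any reviewed model.
import numpy as np

def case_study(Z, disease_idx, known_assoc, confirmed_by_dbs, k=50):
    """Z: predicted score matrix (miRNAs x diseases);
    known_assoc: miRNA indices already linked to the disease in training;
    confirmed_by_dbs: miRNA indices verified by external databases/literature."""
    scores = Z[:, disease_idx]
    # Candidates are the unlabelled miRNAs, ranked by predicted score.
    candidates = [m for m in np.argsort(-scores) if m not in known_assoc]
    return sum(1 for m in candidates[:k] if m in confirmed_by_dbs)

# Toy example: 500 miRNAs, 100 diseases, random scores.
rng = np.random.default_rng(1)
Z = rng.random((500, 100))
print(case_study(Z, disease_idx=3, known_assoc={0, 7}, confirmed_by_dbs={5, 9, 42}))
```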

A recommended evaluation workflow

As illustrated in Figure 3, a workflow encompassing the abovementioned five evaluations is recommended to facilitate extensive model assessment. Notably, the workflow may be treated as an adjustable reference manual: all evaluation strategies are study-dependent, and researchers designing future MDA prediction models can make reasonable deletions, additions and/or amendments to it.

Figure 3. The recommended evaluation workflow, involving K-fold CV, parameter sensitivity analysis, ablation study, time efficiency analysis and case studies from the learning type and fusion scheme perspectives.

The recommendation starts with K-fold CV: both LOOCV and CV with K > 1 folds (such as 5- or 10-fold CV) are suggested for a matrix decomposition or completion model. For a model of the other three learning types, if it involves deep neural networks, LOOCV becomes less feasible and is not recommended because of the high computational cost incurred by many neural layers; otherwise, both CV operations should be performed. The evaluation metrics should include AUC for depicting the model's capability of distinguishing positive samples from negatives and AUPR for better understanding the effect of imbalanced MDA labels on performance. In addition, recall@k, precision@k, F1-score@k, ACC@k and MCC@k can be calculated to describe performance at specific ranking thresholds.
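
These metrics can be computed with standard library routines; the snippet below is a generic sketch in which the logistic regression classifier and the simulated imbalanced data are placeholders rather than any reviewed model.

```python
# Generic sketch of the recommended K-fold CV metrics (AUC, AUPR, recall@k);
# the classifier and the random, imbalanced data are placeholders only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score, average_precision_score

def recall_at_k(y_true, y_score, k):
    top = np.argsort(-y_score)[:k]
    return y_true[top].sum() / max(y_true.sum(), 1)

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 32))
y = (rng.random(2000) < 0.1).astype(int)   # imbalanced labels, as in MDA data

aucs, auprs, rec50 = [], [], []
for tr, te in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
    s = clf.predict_proba(X[te])[:, 1]
    aucs.append(roc_auc_score(y[te], s))
    auprs.append(average_precision_score(y[te], s))
    rec50.append(recall_at_k(y[te], s, k=50))
print(f"AUC={np.mean(aucs):.3f}  AUPR={np.mean(auprs):.3f}  recall@50={np.mean(rec50):.3f}")
```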

Then, parameter sensitivity can be analyzed by adopting grid search on various parameters, which depend primarily on the learning type. As shown in Table 2, parameters of matrix decomposition/completion-based models include the regularization coefficients in the objective function, the dimensionality parameter for latent feature representation matrices of miRNAs and diseases, the number of computation layers for matrix decomposition/completion and the maximum number of iterations for optimizing the objective function. Regarding deep learning-based models, parameters include the regularization coefficients in the loss function, the size of miRNA and disease embeddings, the learning rate, the number of neural network layers (and/or filters in CNN if applicable), and the negative sampling ratio. For the other three learning types, algorithm-specific parameters should be analyzed. In addition, the miRNA- and disease-related parameters are suitable for all model types and include the k nearest neighbours of miRNAs and diseases (whose similarity features can be used to update the MDA adjacency matrix) and the weights for constructing integrated similarity matrices of miRNAs and diseases. Furthermore, the types of parameters to be analyzed are also related to the fusion or non-fusion schemes: sequential fusion covers the number of sequentially integrated modules; hybrid fusion includes regularization coefficients on modules fused in parallel and the number of sequentially integrated sub-models; weights for individual classifiers of ensemble learning are examined in the parallel fusion scheme; and lastly, non-fusion concerns parameters specific to a single model. It should be noted that the learning type and fusion scheme perspectives should be considered independently because they describe different characteristics of a model. For example, if the model is based on matrix decomposition/completion, then the dimensionality of latent representation matrices and their related regularization coefficients are recommended for parameter sensitivity analysis, regardless of which fusion or non-fusion scheme it belongs to.
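
In practice, the grid search can be organized as in the sketch below; the random data, the random forest classifier and its two hyperparameters merely stand in for an MDA model and whichever parameters apply to its learning type.

```python
# Generic grid-search sketch for parameter sensitivity analysis; the data and
# the classifier are stand-ins, not any reviewed MDA model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = rng.integers(0, 2, size=1000)

grid = ParameterGrid({"n_estimators": [50, 100, 200],
                      "max_depth": [3, 5, 10]})

sensitivity = {}
for params in grid:
    model = RandomForestClassifier(random_state=0, **params)
    sensitivity[tuple(sorted(params.items()))] = cross_val_score(
        model, X, y, cv=5, scoring="roc_auc").mean()

# Varying one parameter while fixing the others reveals how sensitive the
# model's CV AUC is to that parameter.
for setting, auc in sorted(sensitivity.items()):
    print(setting, round(auc, 3))
```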

Subsequently, ablation study can be conducted if the investigated model exhibits graceful degradation; otherwise, this step is skipped and the workflow proceeds to the next evaluation. Ablating the multi-source input datasets is recommended for all models (regardless of their categories) because of the data fusion paradigm. Ablating a model's computational modules then means removing them in turn to observe the extent to which predictive performance is affected. A diverse range of sub-models should be enrolled in the study according to the different fusion or non-fusion schemes: specifically, sequentially integrated modules for sequential fusion, individual classifiers of ensemble learning or individual feature views for parallel fusion, and all sub-models integrated sequentially or in parallel for hybrid fusion. Ablation for non-fusion is decided case by case depending on the model structure.
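
Input-level ablation can be organized as a leave-one-source-out loop, as in the sketch below; the similarity-derived feature blocks and the simple classifier are placeholders standing in for a model's actual data sources and learning algorithm.

```python
# Minimal leave-one-source-out ablation sketch; the feature blocks (MFS, DSS,
# SEM) and the classifier are illustrative placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 800
sources = {                      # e.g. similarity-derived feature blocks
    "MFS": rng.normal(size=(n, 16)),
    "DSS": rng.normal(size=(n, 16)),
    "SEM": rng.normal(size=(n, 16)),
}
y = rng.integers(0, 2, size=n)

def cv_auc(X):
    return cross_val_score(LogisticRegression(max_iter=1000), X, y,
                           cv=5, scoring="roc_auc").mean()

full_auc = cv_auc(np.hstack(list(sources.values())))
for name in sources:
    kept = [block for key, block in sources.items() if key != name]
    print(f"ablating {name}: delta AUC = {full_auc - cv_auc(np.hstack(kept)):+.3f}")
```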

The ensuing step is time efficiency analysis, which compares a model's running time with that of its counterparts. The measure should be calculated as the average running time of multiple complete executions if the model is based on matrix decomposition or completion, and otherwise as the running time of one epoch for the other three learning types. The recommended evaluation workflow ends with the three types of case studies, namely, the standard out-of-sample test, the data-masking test and the extensibility test. It is suggested to investigate the model's performance on diseases with high incidence and mortality rates, such as neoplasms of the breast, lung, prostate, colon, liver, oesophagus, lymphoma and kidney. This facilitates a fair comparison with other models, because these diseases have been widely inspected in case studies of previous works.

It should be noted that all evaluation procedures included in the recommended workflow are commonly adopted in the fields of machine learning and data analysis, and that each existing MDA prediction study implemented only a fraction of the full set of procedures. The value of the workflow is twofold, corresponding to these two facts.

First, despite being commonplace, the procedures are tied to the context of MDA prediction research via the workflow from the following perspectives: (i) the MDA dataset for training models is highly imbalanced, containing 3.3% positive instances according to the latest HMDD v3.2 database [46], and thus the metrics of AUPR and precision [12] are particularly useful in the MDA prediction task; (ii) a trend of computational model research is the increasing popularity of deep learning, which is highly computationally expensive, and hence the traditional LOOCV procedure becomes less favourable than K-fold CV with |$K>1$|; (iii) parameter sensitivity analysis is correlated with the machine learning types of MDA prediction models, as reflected in Table 2 and elaborated in the third section of this paper; (iv) ablation study exhibits trends related to the model fusion or non-fusion schemes, as displayed in Table 3 and described in the subsection entitled 'Trends of ablation study' under the fourth section; (v) the form of time efficiency analysis is also dependent on the machine learning types; (vi) existing works have primarily chosen fatal and common diseases (based on the Global Cancer Statistics [56]) to investigate in case studies.

Second, the fact that not all evaluation methods were adopted by an existing work is partly due to the model being unsuitable for certain methods. For instance, the deep learning-based model MDA–CNN [53] used 10-fold CV instead of LOOCV because of the high computational complexity of CNN; none of the matrix decomposition/completion-based models performed sensitivity analysis on the negative sampling ratio because they were based on semi-supervised learning and hence did not require negative samples for training; ablation study was not carried out for the non-fusion scheme-based GAEMDA model [37, 60] because it lacked the property of graceful degradation [40]. The procedures implemented (or omitted) by existing MDA prediction models can be explained by the yes–no flowcharts presented in the workflow (see Figure 3), which can also serve as a reference manual for researchers to conveniently and justifiably decide which evaluation procedures to undertake for a future model.

Conclusion

In this article, we analyzed the evaluation procedures of the 29 state-of-the-art MDA prediction models. Every model performed K-fold CV, yet with different K values and diverse performance metrics, ranging from AUC and AUPR (calculated across various thresholds), to ACC, MCC, F1-score, precision and recall (obtained at a specific threshold), and to NDCG measuring the ranking quality of diseases' candidate miRNAs. AUC was the most widely used metric, followed by recall, precision and AUPR, which provided additional information on the reliability of predictions given the imbalanced label distribution in the MDA prediction task. Each reviewed model also carried out case studies but selected different sets of query diseases; it is necessary to define a selection criterion to avoid the risk of cherry-picking. Parameter sensitivity analysis, ablation studies and time efficiency assessments were less frequently performed, depending on the algorithmic structure of the models and the researchers' choices. We concluded with a recommended evaluation workflow expressed as a series of binary yes–no flowcharts covering all possible procedures (see Figure 3). Future models can rely on the workflow to devise more appropriate validation schemes, thereby working towards systematic evaluation that benefits both the authors, in demonstrating, and the audience, in appreciating, a proposed model's advantages over its counterparts.

As a final point, more attention should be paid to the reproducibility of baseline models. Not all models were open-sourced, so bioinformaticians might only use the publicly available ones for baseline evaluation or re-implement the others from scratch, which can be time-consuming, particularly for today's increasingly advanced algorithms. As a result, each publication usually compared the proposed model with an incomplete list of state-of-the-art counterparts. It is hence suggested that researchers release their code alongside the paper for better reproducibility and more convenient evaluation in future works.

Key Points
  • Currently, there exist no generally accepted strategies of evaluating computational models for microRNA-disease associations.

  • K-fold cross validation and case studies on specific diseases have been extensively adopted for performance evaluation, but with different K values, metrics and choices of query diseases.

  • Parameter sensitivity analysis, ablation studies and time efficiency assessments have been less frequently performed, depending on the algorithmic structure of models and the researchers' choices.

  • We summarized and analyzed the evaluation procedures of the 29 state-of-the-art MDA prediction models.

  • Based on the analytical results, a possibly all-encompassing evaluation workflow was recommended to facilitate systematic performance comparison of prediction models.

Data availability

Source code and data for the following models are available at the listed repositories:
• MDA-CNN: https://github.com/Issingjessica/MDA-CNN
• MDA-GCNFTG: https://github.com/a96123155/MDA-GCNFTG
• MVMTMDA: https://github.com/yahuang1991polyu/MVMTMDA/
• EDTMDA: https://github.com/chiyoung1/EDTMDA
• ABMDA: https://github.com/githubcode007/ABMDA
• NIMCGCN: https://github.com/ljatynu/NIMCGCN/
• AEMDA: https://github.com/CunmeiJi/AEMDA
• NMCMDA: https://github.com/ljatynu/NMCMDA/
• MMGCN: https://github.com/Txinru/MMGCN
• GRNMF: https://github.com/XIAO-HN/GRNMF/
• BRMDA: https://github.com/xpnbs/BRMDA
• TDRC: https://github.com/BioMedicalBigDataMiningLab/TDRC
• IMCMDA: https://github.com/IMCMDAsourcecode/IMCMDA
• GAEMDA: https://github.com/chimianbuhetang/GAEMDA

Funding

National Natural Science Foundation of China (grant nos. 61972399 and 11931008 to X.C.).

Author Biographies

Li Huang is a PhD student of Academy of Arts and Design, Tsinghua University. His research interests include bioinformatics, complex network algorithms, machine learning and visual analytics.

Li Zhang is a PhD student of School of Information and Control Engineering, China University of Mining and Technology. His research interests include bioinformatics, drug discovery, neural networks and deep learning.

Xing Chen, PhD, is a professor of China University of Mining and Technology. He is the associate dean of Artificial Intelligence Research Institute, China University of Mining and Technology. He is also the founding director of Institute of Bioinformatics, China University of Mining and Technology and Big Data Research Center, China University of Mining and Technology. His research interests include complex disease-related non-coding RNA biomarker prediction, computational models for drug discovery, and early detection of human complex disease based on big data and artificial intelligence algorithms.

References

1. Yadav S, Shukla S. Analysis of k-fold cross-validation over hold-out validation on colossal datasets for quality classification. In: 2016 IEEE 6th International Conference on Advanced Computing (IACC). 2016, p. 78–83. IEEE, New York, NY, USA.
2. Chen X, Xie D, Zhao Q, et al. MicroRNAs and complex diseases: from experimental results to computational models. Brief Bioinform 2019;20:515–39.
3. Jiang Q, Hao Y, Wang G, et al. Prioritization of disease microRNAs through a human phenome-microRNAome network. BMC Syst Biol 2010;4:1–9.
4. Chen X, Liu M-X, Yan G-Y. RWRMDA: predicting novel human microRNA–disease associations. Mol Biosyst 2012;8:2792–8.
5. Wang D, Wang J, Lu M, et al. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics 2010;26:1644–50.
6. Huang H-Y, Lin Y-C-D, Li J, et al. miRTarBase 2020: updates to the experimentally validated microRNA–target interaction database. Nucleic Acids Res 2020;48:D148–54.
7. Karagkouni D, Paraskevopoulou MD, Chatzopoulos S, et al. DIANA-TarBase v8: a decade-long collection of experimentally supported miRNA–gene interactions. Nucleic Acids Res 2018;46:D239–45.
8. Kozomara A, Birgaoanu M, Griffiths-Jones S. miRBase: from microRNA sequences to function. Nucleic Acids Res 2019;47:D155–62.
9. Tang X, Luo J, Shen C, et al. Multi-view multichannel attention graph convolutional network for miRNA-disease association prediction. Brief Bioinform 2021;22:bbab174.
10. Xiao Q, Zhang N, Luo J, et al. Adaptive multi-source multi-view latent feature learning for inferring potential disease-associated miRNAs. Brief Bioinform 2021;22:2043–57.
11. Fawcett T. An introduction to ROC analysis. Pattern Recognit Lett 2006;27:861–74.
12. Davis J, Goadrich M. The relationship between Precision-Recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning. 2006, p. 233–40. ACM, New York, NY, USA.
13. He X, Chen T, Kan M-Y, et al. TriRank: review-aware explainable recommendation by modeling aspects. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. 2015, p. 1661–70. ACM, New York, NY, USA.
14. Chicco D, Jurman G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 2020;21:1–13.
15. Huang YA, Chan KCC, You ZH, et al. Predicting microRNA-disease associations from lncRNA-microRNA interactions via multiview multitask learning. Brief Bioinform 2021;22:bbaa133.
16. Wang CC, Chen X, Yin J, et al. An integrated framework for the identification of potential miRNA-disease association based on novel negative samples extraction strategy. RNA Biol 2019;16:257–69.
17. Ji C, Gao Z, Ma X, et al. AEMDA: inferring miRNA-disease associations based on deep autoencoder. Bioinformatics 2021;37:66–72.
18. Yang Y, Fu X, Qu W, et al. MiRGOFS: a GO-based functional similarity measurement for miRNAs, with applications to the prediction of miRNA subcellular localization and miRNA-disease association. Bioinformatics 2018;34:3547–56.
19. Chen X, Wang L, Qu J, et al. Predicting miRNA-disease association based on inductive matrix completion. Bioinformatics 2018;34:4256–65.
20. Xiao Q, Luo J, Liang C, et al. A graph regularized non-negative matrix factorization method for identifying microRNA-disease associations. Bioinformatics 2018;34:239–48.
21. Guo L, Shi K, Wang L. MLPMDA: multi-layer linear projection for predicting miRNA-disease association. Knowl Based Syst 2021;214:106718.
22. Li Y, Qiu C, Tu J, et al. HMDD v2.0: a database for experimentally supported human microRNA and disease associations. Nucleic Acids Res 2014;42:D1070–4.
23. Wei H, Liu B. iCircDA-MF: identification of circRNA-disease associations based on matrix factorization. Brief Bioinform 2020;21:1356–67.
24. Chen X, Yan CC, Zhang X, et al. WBSMDA: within and between score for MiRNA-disease association prediction. Sci Rep 2016;6:1–9.
25. Hsu SD, Tseng YT, Shrestha S, et al. miRTarBase update 2014: an information resource for experimentally validated miRNA-target interactions. Nucleic Acids Res 2014;42:D78–85.
26. Van Laarhoven T, Nabuurs SB, Marchiori E. Gaussian interaction profile kernels for predicting drug–target interaction. Bioinformatics 2011;27:3036–43.
27. Yu G, Li F, Qin Y, et al. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics 2010;26:976–8.
28. Li J, Zhang S, Liu T, et al. Neural inductive matrix completion with graph convolutional networks for miRNA-disease association prediction. Bioinformatics 2020;36:2538–46.
29. Natarajan N, Dhillon IS. Inductive matrix completion for predicting gene–disease associations. Bioinformatics 2014;30:i60–8.
30. Huang F, Yue X, Xiong Z, et al. Tensor decomposition with relational constraints for predicting multiple types of microRNA-disease associations. Brief Bioinform 2021;22:bbaa140.
31. Kolda TG, Bader BW. Tensor decompositions and applications. SIAM Rev 2009;51:455–500.
32. Chu Y, Wang X, Dai Q, et al. MDA-GCNFTG: identifying miRNA-disease associations based on graph convolutional networks via graph sampling through the feature and topology graph. Brief Bioinform 2021;22:bbab165.
33. Ning S, Yue M, Wang P, et al. LincSNP 2.0: an updated database for linking disease-associated SNPs to human long non-coding RNAs and their TFBSs. Nucleic Acids Res 2017;45:D74–8.
34. Hwang S, Kim CY, Yang S, et al. HumanNet v2: human gene networks for disease research. Nucleic Acids Res 2019;47:D573–80.
35. Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018, p. 7132–41. IEEE, New York, NY, USA.
36. Wang J, Li J, Yue K, et al. NMCMDA: neural multicategory MiRNA–disease association prediction. Brief Bioinform 2021;22:bbab074.
37. Li Z, Li J, Nie R, et al. A graph auto-encoder model for miRNA-disease associations prediction. Brief Bioinform 2021;22:bbaa240.
38. Chen X, Wang CC, Yin J, et al. Novel human miRNA-disease association inference based on random forest. Mol Ther Nucleic Acids 2018;13:568–79.
39. Chen X, Cheng J-Y, Yin J. Predicting microRNA-disease associations using bipartite local models and hubness-aware regression. RNA Biol 2018;15:1192–205.
40. González O, Shrikumar H, Stankovic JA, et al. Adaptive fault tolerance and graceful degradation under dynamic hard real-time scheduling. In: Proceedings Real-Time Systems Symposium. 1997, p. 79–89. IEEE, New York, NY, USA.
41. Chen M, Peng Y, Li A, et al. A novel information diffusion method based on network consistency for identifying disease related microRNAs. RSC Adv 2018;8:36675–90.
42. Zhong Y, Xuan P, Wang X, et al. A non-negative matrix factorization based method for predicting disease-associated miRNAs in miRNA-disease bilayer network. Bioinformatics 2018;34:267–77.
43. LeCun Y, Boser B, Denker JS, et al. Backpropagation applied to handwritten zip code recognition. Neural Comput 1989;1:541–51.
44. Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
45. Yang B, Yih W-T, He X, et al. Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575, 2014.
46. Huang Z, Shi J, Gao Y, et al. HMDD v3.0: a database for experimentally supported human microRNA-disease associations. Nucleic Acids Res 2019;47:D1013–7.
47. Chen X, Zhu CC, Yin J. Ensemble of decision tree reveals potential miRNA-disease associations. PLoS Comput Biol 2019;15:e1007209.
48. Wang L, You ZH, Chen X, et al. LMTRDA: using logistic model tree to predict MiRNA-disease associations by fusing multi-source information of sequences and similarities. PLoS Comput Biol 2019;15:e1006865.
49. Pan X, Shen H-B. Learning distributed representations of RNA sequences and its application for predicting RNA-protein binding sites with a convolutional neural network. Neurocomputing 2018;305:51–8.
50. Mikolov T, Chen K, Corrado G, et al. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
51. Friedman J, Hastie T, Tibshirani R. Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). Ann Stat 2000;28:337–407.
52. Chen X, Huang L. LRSSLMDA: Laplacian regularized sparse subspace learning for MiRNA-disease association prediction. PLoS Comput Biol 2017;13:e1005912.
53. Peng J, Hui W, Li Q, et al. A learning-based framework for miRNA-disease association identification using neural networks. Bioinformatics 2019;35:4364–71.
54. Narita A, Hayashi K, Tomioka R, et al. Tensor factorization using auxiliary information. Data Min Knowl Disc 2012;25:298–324.
55. Ng A. Machine Learning Yearning. Technical Strategy for AI Engineers (Draft), 2018. Available at: http://www.mlyearning.org.
56. Sung H, Ferlay J, Siegel RL, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2021;71:209–49.
57. Xie B, Ding Q, Han H, et al. miRCancer: a microRNA–cancer association database constructed by text mining on literature. Bioinformatics 2013;29:638–44.
58. Yang Z, Wu L, Wang A, et al. dbDEMC 2.0: updated database of differentially expressed miRNAs in human cancers. Nucleic Acids Res 2017;45:D812–8.
59. Jiang Q, Wang Y, Hao Y, et al. miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res 2009;37:D98–104.
60. Li Z, Li J, Nie R, et al. A graph auto-encoder model for miRNA-disease associations prediction. Brief Bioinform 2021;22:bbaa240.
61. Chen X, Yin J, Qu J, et al. MDHGI: matrix decomposition and heterogeneous graph inference for miRNA-disease association prediction. PLoS Comput Biol 2018;14:e1006418.
62. Chen X, Li T-H, Zhao Y, et al. Deep-belief network for predicting potential miRNA-disease associations. Brief Bioinform 2021;22:bbaa186.
63. Chen X, Huang L, Xie D, et al. EGBMMDA: extreme gradient boosting machine for MiRNA-disease association prediction. Cell Death Dis 2018;9:3.
64. Zhao Y, Chen X, Yin J. Adaptive boosting-based computational model for predicting potential miRNA-disease associations. Bioinformatics 2019;35:4730–8.
65. Chen X, Xie D, Wang L, et al. BNPMDA: bipartite network projection for MiRNA-disease association prediction. Bioinformatics 2018;34:3178–86.
66. Chen X, Zhou Z, Zhao Y. ELLPMDA: ensemble learning and link prediction for miRNA-disease association prediction. RNA Biol 2018;15:807–18.
67. Qu J, Chen X, Yin J, et al. Prediction of potential miRNA-disease associations using matrix decomposition and label propagation. Knowl Based Syst 2019;186:104963.
68. Zhu C-C, Wang C-C, Zhao Y, et al. Identification of miRNA–disease associations via multiple information integration with Bayesian ranking. Brief Bioinform 2021;22:bbab302.
69. Chen X, Sun LG, Zhao Y. NCMCMDA: miRNA-disease association prediction through neighborhood constraint matrix completion. Brief Bioinform 2021;22:485–96.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)