Abstract

MicroRNAs (miRNAs) are post-transcriptional regulators in humans that influence various physiological processes by regulating gene expression. The subcellular localization of miRNAs plays a crucial role in the discovery of their biological functions. Although several computational methods based on miRNA functional similarity networks have been presented to identify the subcellular localization of miRNAs, it remains difficult for these approaches to extract well-referenced miRNA functional representations because the miRNA–disease association representation and the disease semantic representation are insufficient. A significant amount of research on miRNA–disease associations is now available, making it possible to address the issue of insufficient miRNA functional representation. In this work, a novel model named DAmiRLocGNet is established, based on a graph convolutional network (GCN) and an autoencoder (AE), for identifying the subcellular localizations of miRNAs. DAmiRLocGNet constructs its features from miRNA sequence information, miRNA–disease association information and disease semantic information. The GCN gathers the information of neighboring nodes and captures the implicit information of network structures from miRNA–disease association information and disease semantic information. The AE captures sequence semantics from sequence similarity networks. The evaluation demonstrates that DAmiRLocGNet outperforms other competing computational approaches, benefiting from the implicit features captured by the GCNs. DAmiRLocGNet has the potential to be applied to the identification of the subcellular localization of other non-coding RNAs. Moreover, it can facilitate further investigation into the functional mechanisms underlying miRNA localization. The source code and datasets can be accessed at http://bliulab.net/DAmiRLocGNet.

INTRODUCTION

MicroRNAs (miRNAs) are a class of short non-coding RNAs [1–4]. In humans, miRNAs act as post-transcriptional regulators [1, 5–9], with many mature miRNAs regulating gene expression to impact various physiological activities [10–12]. The regulation of miRNAs is critical in processes such as stress response, disease physiology and normal metabolism. Meanwhile, there is increasing evidence of a correlation between miRNA abnormalities and many diseases [1, 13–15], including cancer [16], Alzheimer’s disease [15] and cardiovascular diseases [14]. Moreover, the subcellular location of miRNAs plays a critical role in their functionality, as different subcellular locations may be associated with different gene expression patterns [7, 11, 15]. Although there is evidence to suggest that miRNAs play specific physiological roles at different cellular sites [10, 17], the functional mechanism underlying subcellular localization [4] remains unclear [18–20]. In recent years, several computational methods have been established to identify the subcellular localizations of miRNAs. However, because miRNA sequences are short (∼22 nucleotides), the subcellular localization of miRNAs is hard to model and analyze [18, 21]. In addition, recent research suggests that an RNA may be present in multiple subcellular locations simultaneously [4, 22], which poses further challenges in predicting and understanding the functional mechanism of subcellular localization. The main computational difficulty in predicting miRNA subcellular localization is capturing well-referenced features with richer representations of miRNA functionality.

In recent years, several machine-learning-based methodologies have been proposed to identify the subcellular localizations of miRNAs. Most of these methodologies are based on the sequence information of miRNAs and use different sequence feature representation schemes to predict subcellular localization [23–26]. MiRLocator [22] is a machine-learning predictor that identifies miRNA subcellular localization using a deep encoder-decoder model based on miRNA sequence (MS) information. Prabina et al. [17] presented a computational methodology called miRNALoc, which uses structural properties, pseudo dinucleotide compositions and principal component scores of thermodynamics as MS-based features to classify subcellular localization. Muhammad et al. [18] proposed MirLocPredictor, a methodology based on a convolutional neural network (CNN) and a recurrent neural network, which uses KmerPR2vec, a novel representation scheme for MS information. Muhammad et al. [19] proposed a predictor of miRNA subcellular localization named L2S-MirLoc, which uses a feature encoding scheme that extracts an optimal vectorial representation of sequences through a lightweight physicochemical-property approach. Zhang et al. [10] established iLoc-miRNA, a multi-head self-attention-based deep bidirectional long short-term memory network, which represents whole MSs using one-hot encoding with post padding to differentiate extracellular miRNAs from intracellular miRNAs. In addition to sequence-based prediction methods, other methods use functional similarity networks to distinguish subcellular localizations, including MirGOFS [27], which is based on a GO semantic similarity network, and MiRLoc [21], which is based on a miRNA–disease association (MDA) similarity network. These methodologies take advantage of the functional similarities and associations between miRNAs and different subcellular locations, which can provide valuable information for predicting miRNA subcellular localization.

Although these computational approaches have shown promising results, both the sequence-based [10, 17–19] and functional-similarity-network-based [21, 27] predictors are limited by the lack of a valid and comprehensive representation of MSs and functional features, which can lead to potential biases. Moreover, current predictors rely solely on either sequence-based or functional-similarity-network-based representations [10, 21]. To improve prediction accuracy, an efficient approach is to combine features derived from both kinds of representation for identifying the subcellular localizations of miRNAs. To capture well-referenced functional representations, it is crucial to extract discriminative MDA information and disease semantic (DS) representations that comprehensively represent the functional features of miRNAs [13, 16, 28–35]. There is growing evidence that graph convolutional networks (GCNs) [2, 12, 13, 36–38] can capture complex nonlinear associations in bioinformatics [36, 38, 39]. The functional representations of miRNAs can therefore be extracted from MDA and DS information using a GCN.

Herein, we established a new prediction methodology called DAmiRLocGNet, which uses a GCN [36, 40] and an autoencoder (AE)-based methodology [16] to identify the subcellular localizations of miRNAs. DAmiRLocGNet considers both sequence information and functional representation information, enabling simultaneous learning for improved predictive performance. We used the GCN to gather information from neighboring nodes and extract implicit network structure information from the MDA and DS representations. The node representations extracted by the GCN can comprehensively represent the potential functional similarity network of miRNAs derived from MDA and DS information. In addition, we used the AE to extract sequence information from the MS similarity network. The experimental results showed that DAmiRLocGNet outperformed other competing methods, and the ablation study further demonstrated the advantages of DAmiRLocGNet. The datasets and source code are freely available at http://bliulab.net/DAmiRLocGNet.

METHODS

Overview

In this study, we propose a miRNA subcellular localization predictor called DAmiRLocGNet, which is based on an AE and a GCN. DAmiRLocGNet is carried out in five steps: (i) Data set, which involves collecting miRNA subcellular localization information, MDA information, DS information and MS information. (ii) Data preprocessing, which applies similarity measures to build similarity network representations from the DS, MDA and MS information. (iii) GCN- and AE-based feature learning, which uses the DS similarity network, the MDA network and the MS similarity network in two feature representation learning modules for predicting miRNA subcellular localization. (iv) Prediction, which combines the GCN- and AE-based feature representations to obtain comprehensive potential miRNA functional representations and sequence features for prediction. (v) Result, which involves evaluating and displaying the predicted miRNA subcellular localizations using the evaluation criteria. The architecture of DAmiRLocGNet is illustrated in Figure 1.

Figure 1

The overview of DAmiRLocGNet framework.

Data set

Differing from previous work [10, 18, 19, 21, 27], we used MS information, DS information and MDA information to construct the implicit functional representation of miRNAs. We obtained the miRNA subcellular localization dataset from the literature [21]; it was collected by Xu et al. from the latest RNALocate2.0 database (http://www.rna-society.org/rnalocate) [41] and covers seven subcellular localizations of miRNAs: exosome, nucleolus, microvesicle, nucleus, cytoplasm, mitochondrion and extracellular vesicle. This database contains >210 000 RNA-associated subcellular localization entries with experimental evidence. The miRNA subcellular localization datasets, in which a miRNA may be associated with one or more compartments simultaneously, are described in detail in Table 1. For the statistical information, see Supplementary Figure S1. In addition, we downloaded the sequences from miRBase (https://www.mirbase.org/), which provides comprehensive documentation of miRNA annotations.

Table 1

The miRNA subcellular localization datasets

| Component | 1 | 2 | 3 | 4 | 5 | 6 | 7 | Overall |
|---|---|---|---|---|---|---|---|---|
| Cytoplasm | 0 | 8 | 16 | 94 | 88 | 61 | 41 | 308 |
| Exosome | 25 | 331 | 191 | 130 | 91 | 61 | 41 | 870 |
| Nucleolus | 0 | 0 | 0 | 2 | 3 | 21 | 41 | 67 |
| Nucleus | 18 | 18 | 144 | 127 | 90 | 61 | 41 | 499 |
| Extracellular vesicle | 0 | 0 | 5 | 6 | 9 | 41 | 41 | 102 |
| Microvesicle | 2 | 321 | 179 | 130 | 91 | 61 | 41 | 825 |
| Mitochondrion | 1 | 2 | 41 | 31 | 83 | 60 | 41 | 259 |
| Overall locative samples | 47 | 682 | 579 | 524 | 460 | 372 | 294 | 2930 |
| Overall actual samples | | | | | | | | 1041 |

To obtain the network representations of miRNA functional similarity, we collected MDA data from the human miRNA disease database (HMDD) v3.2 [42], which documents experimentally supported associations between miRNAs and human diseases and can be accessed at http://www.cuilab.cn/hmdd. From HMDD v3.2, we collected the experimentally supported entities and diseases that were matched with the miRNAs [8, 43], resulting in 15 547 experimentally confirmed MDAs involving 640 diseases. These associations are shown in Supplementary Figure S2, from which we can see that a miRNA may be associated with one or more diseases. The information about subcellular localization and disease association is summarized in Table 2.

Table 2

Overview of the data used in this work

| miRNA | HMDD v3.2 terms positive | Disease | Localization |
|---|---|---|---|
| 1041 | 15 547 | 640 | 7 |

Data preprocessing

DS similarity

The Medical Subject Headings (MeSH) vocabulary is used for the analysis of biomedical and health-related information, and its descriptors can be downloaded at https://www.nlm.nih.gov/mesh/meshhhome.html. In MeSH, the disease information can be represented as directed acyclic graphs (DAGs) [44]. In a DAG, the vertices represent diseases and the edges are directed links from parent disease nodes to child disease nodes. In this study, MeSH was employed to describe the semantic similarity of diseases, which was used to compute the association between different diseases [45]. The contribution of disease |$t$| to the semantic value of disease |$A$|, denoted |${D}_A(t)$|, is defined as follows:

(1)

where |$\varDelta$| is the decay parameter, set to 0.5 [45]. The semantic value of disease |$A$| can be represented as |$DS(A)={\sum}_{t\in {P}_A}{D}_A(t)$|, where |${P}_A$| is the set of ancestors of disease |$A$|. Hence, the semantic similarity between two diseases is defined as follows:

(2)

where |${P}_A$| and |${P}_B$| represent the sets of ancestors of diseases |$A$| and |$B$|, respectively.
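As a minimal sketch of the semantic similarity described above, the following code assumes the standard formulation of Wang's method: a disease contributes 1 to its own semantic value, each ancestor receives the largest contribution of its children multiplied by the decay parameter, and the similarity is the summed contribution of shared terms divided by the sum of the two semantic values. The toy DAG, the function names and the choice to count a disease among its own ancestors are illustrative assumptions, not taken from the DAmiRLocGNet source code.

```python
# Hypothetical toy DAG (not real MeSH data): each disease maps to its parents.
PARENTS = {
    "neoplasms": [],
    "digestive system neoplasms": ["neoplasms"],
    "liver neoplasms": ["digestive system neoplasms"],
    "stomach neoplasms": ["digestive system neoplasms"],
}
DELTA = 0.5  # decay parameter, as stated in the text


def semantic_contributions(disease, parents=PARENTS, delta=DELTA):
    """Return {t: D_A(t)} for disease A = `disease` and each of its ancestors."""
    contrib = {disease: 1.0}  # assumption: a disease contributes fully to itself
    frontier = [disease]
    while frontier:
        node = frontier.pop()
        for parent in parents[node]:
            value = delta * contrib[node]
            # Keep the largest decayed contribution over all paths.
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                frontier.append(parent)
    return contrib


def semantic_similarity(a, b, parents=PARENTS, delta=DELTA):
    """Shared contributions divided by the semantic values DS(A) + DS(B)."""
    d_a = semantic_contributions(a, parents, delta)
    d_b = semantic_contributions(b, parents, delta)
    shared = set(d_a) & set(d_b)
    numerator = sum(d_a[t] + d_b[t] for t in shared)
    return numerator / (sum(d_a.values()) + sum(d_b.values()))


similarity = semantic_similarity("liver neoplasms", "stomach neoplasms")
```

In this toy example the two diseases share a parent and a grandparent, so their similarity is below 1 but well above 0, while any disease compared with itself scores exactly 1.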

MiRNA functional similarity

When two miRNAs are similar in sequence or function, they may exhibit similar patterns of function expression [2, 29]. This functional similarity is significant because it can be used to identify similarities between related diseases. In this study, Wang’s method [45] was used to obtain the functional similarity of miRNAs. The method assumes that functionally similar miRNAs tend to be associated with similar diseases [1]. Specifically, the similarity between a set of diseases |$DG$| and a disease |${d}_i$| is calculated as |$DS{G}_{d_i}(DG)={\max}_{B\in DG}\left( DS\left({d}_i,B\right)\right)$|. Assuming that |${\mathcal{D}}_i$| represents the set of diseases related to miRNA |${m}_i$|, the functional similarity between miRNA |${m}_1$| and miRNA |${m}_2$| is then represented as

(3)
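The functional similarity can be illustrated with a short sketch. It assumes the commonly used best-match-average form, in which the best-match scores of every disease of one miRNA against the disease set of the other are summed in both directions and divided by the total number of diseases. The disease names, the similarity values and the function names are hypothetical.

```python
# Hypothetical disease semantic similarity values DS(d_i, d_j) (symmetric).
DS = {
    "d1": {"d1": 1.0, "d2": 0.4, "d3": 0.2},
    "d2": {"d1": 0.4, "d2": 1.0, "d3": 0.6},
    "d3": {"d1": 0.2, "d2": 0.6, "d3": 1.0},
}


def best_match(disease, disease_group, ds=DS):
    """DSG_d(DG): highest similarity between `disease` and any group member."""
    return max(ds[disease][other] for other in disease_group)


def functional_similarity(diseases_1, diseases_2, ds=DS):
    """Best-match average over the disease sets of two miRNAs (assumed form)."""
    total = sum(best_match(d, diseases_2, ds) for d in diseases_1)
    total += sum(best_match(d, diseases_1, ds) for d in diseases_2)
    return total / (len(diseases_1) + len(diseases_2))


# Two hypothetical miRNAs that share disease d2.
score = functional_similarity(["d1", "d2"], ["d2", "d3"])
```

Because the score is built from best matches, two miRNAs with partially overlapping disease sets can still be highly similar when their remaining diseases are semantically close.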

The similarity of Gaussian interaction profile (GIP) kernel

As noted above, two similar miRNAs may exhibit similar patterns of function expression, and a miRNA may be associated with one or more diseases. The GIP kernel similarity [46] was used, based on the hypothesis that similar miRNAs show similar interaction patterns with diseases. Let |$MD$| denote the MDA matrix; the GIP kernel similarity between miRNA |${m}_i$| and miRNA |${m}_j$| is defined as follows:

(4)

where |$IP\left({m}_i\right)$| and |$IP\left({m}_j\right)$| represent the |$i\mathrm{th}$| and |$j\mathrm{th}$| rows of |$MD$|, respectively. The parameter |${\lambda}_m$| is calculated as |$\frac{n_m}{\sum_{i=1}^{n_m}\parallel IP\left({m}_i\right){\parallel}^2}$|, where |${n}_m$| represents the number of miRNAs.

Similarly, the GIP kernel similarity between disease |${d}_i$| and disease |${d}_j$| is defined as follows:

(5)

where |$IP\left({d}_i\right)$| and |$IP\left({d}_j\right)$| represent the |$i\mathrm{th}$| and |$j\mathrm{th}$| columns of |$MD$|, respectively. The parameter |${\lambda}_d$| is calculated as |$\frac{n_d}{\sum_{i=1}^{n_d}\parallel IP\left({d}_i\right){\parallel}^2}$|, where |${n}_d$| represents the number of diseases.
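The two kernels can be sketched in a few lines, assuming the usual Gaussian form in which the similarity is the exponential of the negative squared distance between two interaction profiles, scaled by the bandwidth parameter defined above. The association matrix below is a hypothetical toy example and the function name is illustrative.

```python
import math


def gip_kernel(profiles):
    """GIP kernel between the rows of `profiles` (one interaction profile per row)."""
    n = len(profiles)
    squared_norms = [sum(x * x for x in row) for row in profiles]
    bandwidth = n / sum(squared_norms)  # lambda = n / sum_i ||IP(i)||^2
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist2 = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-bandwidth * dist2)
    return kernel


# Hypothetical MDA matrix: rows are miRNAs, columns are diseases.
MD = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
gip_mirna = gip_kernel(MD)                                  # profiles = rows of MD
gip_disease = gip_kernel([list(col) for col in zip(*MD)])   # profiles = columns of MD
```

The same function serves both cases because the disease kernel only differs in using the columns of the association matrix as profiles. The sketch assumes that at least one association exists, otherwise the bandwidth is undefined.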

MS similarity

The sequence characteristics of miRNAs reflect their biological activities and can be used to understand them. Because miRNAs play an important role in regulating gene expression and are involved in various biological processes, MS similarity has been utilized for predicting MDAs [2, 16, 30, 47], and various approaches have been developed to measure sequence similarity [48] in a way that captures the sequence characteristics. The Smith–Waterman algorithm [36, 49], which is widely used for calculating similarities between pairs of MSs, was used to measure sequence similarity. It can be calculated as follows:

(6)

where |${SW}\left({m}_i,{m}_j\right)$| represents the similarity between miRNA |${m}_i$| and miRNA |${m}_j$|, and |${sp}\left({m}_i,{m}_j\right)$| represents the local alignment score between miRNA |${m}_i$| and miRNA |${m}_j$| based on the Smith–Waterman algorithm.
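The following sketch shows one way to turn Smith–Waterman local alignment scores into a similarity in the range 0 to 1. It assumes a common normalization, dividing the pairwise score by the geometric mean of the two self-alignment scores, together with a simple linear gap penalty. The scoring values (match 2, mismatch -1, gap -1), the function names and the example sequences are assumptions for illustration only.

```python
import math


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


def sequence_similarity(a, b):
    """Pairwise score normalized by the two self-alignment scores (assumed form)."""
    denominator = math.sqrt(local_alignment_score(a, a) * local_alignment_score(b, b))
    return local_alignment_score(a, b) / denominator


# Hypothetical short RNA fragments.
sw = sequence_similarity("UGAGGUAG", "GAGG")
```

With this normalization a sequence compared with itself scores 1, and a shorter sequence fully contained in a longer one scores below 1 because the longer sequence has the higher self-alignment score.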

Similarity matrix completion and feature fusion

The similarity measures described above, such as MS similarity, miRNA functional similarity and DS similarity, often produce sparse matrices. In addition, the precursor sequences of some miRNAs cannot be found in the miRBase database. In this study, we employed the GIP kernel similarity to supplement the missing entries of the DS similarity |$DS\left({d}_i,{d}_j\right)$|, the miRNA functional similarity |$MS\left({m}_i,{m}_j\right)$| and the MS similarity |$SW\left({m}_i,{m}_j\right)$|. The completed similarities are therefore calculated as follows:

(7)
(8)
(9)
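One plausible reading of this completion step is sketched below: wherever a similarity entry is missing, here assumed to be encoded as 0, the corresponding GIP kernel value is used instead, and all other entries are kept. Both the zero encoding of missing entries and the function name are assumptions, and the matrices are toy values.

```python
def complete_similarity(similarity, gip):
    """Fill missing (zero) similarity entries with the GIP kernel value."""
    return [
        [s if s != 0 else g for s, g in zip(s_row, g_row)]
        for s_row, g_row in zip(similarity, gip)
    ]


# Hypothetical sparse similarity matrix and a GIP kernel of the same shape.
sparse = [
    [1.0, 0.0, 0.7],
    [0.0, 1.0, 0.0],
    [0.7, 0.0, 1.0],
]
gip = [
    [1.0, 0.3, 0.5],
    [0.3, 1.0, 0.2],
    [0.5, 0.2, 1.0],
]
completed = complete_similarity(sparse, gip)
```

The same helper could be applied to the DS similarity, the miRNA functional similarity and the MS similarity, each paired with the GIP kernel of the matching node type.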

Random walk restart representation

To capture the overall structure of the similarity networks, random walk with restart (RWR) [50] was applied to the DS similarity network and the miRNA functional similarity network. The walk starts with an initial probability for each miRNA or disease node and transitions from the current node to its neighbors according to the weights of the edges in the similarity network. The similarity network information can be effectively preserved by random walk sequences, which convert the original similarity network data into a linear structure in the form of a sequence of nodes [36]. The RWR was implemented with the following iteration equation:

(10)

where |${P}_{{i},{j}}^{{k}}\left({i}\right)$| describes the probability of walking from miRNA or disease node |${p}_{i}$| to node |${p}_{j}$| after |$k$| steps, and |${e}_{i}$| describes the initial vector of miRNA or disease node |${p}_{i}$|, which is the corresponding row vector of an identity matrix.
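A minimal sketch of the RWR iteration is given below. It assumes the usual update in which, at every step, the walker follows the row-normalized similarity weights with probability 1 - α and returns to the start node with probability α. The restart probability of 0.5, the tolerance, the toy similarity matrix and the function names are illustrative assumptions.

```python
def row_normalize(matrix):
    """Turn similarity weights into transition probabilities (rows sum to 1)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def random_walk_with_restart(similarity, start, alpha=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - alpha) * p @ W + alpha * e_start until convergence."""
    transition = row_normalize(similarity)
    n = len(similarity)
    restart = [1.0 if k == start else 0.0 for k in range(n)]  # row of the identity
    p = restart[:]
    for _ in range(max_iter):
        nxt = [
            (1 - alpha) * sum(p[i] * transition[i][j] for i in range(n))
            + alpha * restart[j]
            for j in range(n)
        ]
        if sum(abs(x - y) for x, y in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical similarity network over three nodes.
S = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.2],
    [0.0, 0.2, 1.0],
]
profile = random_walk_with_restart(S, start=0)
```

Running the walk once per start node and stacking the resulting vectors gives a matrix that reflects multi-step proximity in the network rather than only direct similarity.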

GCN-based and AE-based feature learning

GCN-based feature learning

GCNs have been widely used in various fields [13, 36, 40, 43], owing to their remarkable ability to capture complex structural information and implicit interaction patterns. In this study, a GCN was employed to extract node representations. A key step in our approach was to extract MDA network information through the GCN, which can acquire information from neighboring nodes and capture the implicit information of the network structure, enabling it to effectively extract the distinguishing features of nodes.

According to the literature [36, 40], let |${H}^{{l}}\in{\mathrm{R}}^{{d}}$| represent the vertex representation of the |${l}$|th GCN layer; the vertex representation |${H}^{{l}+\mathrm{1}}\in{\mathrm{R}}^{{d}}$| of the |$\left({l}+\mathrm{1}\right)$|th GCN layer can then be calculated as follows:

(11)
(12)
(13)

where |$S$| represents the adjacency matrix of network node associations, |$I$| represents the identity matrix, |$\overset{\sim }{D}$| represents the degree matrix of |$\overset{\sim }{S}$|, |${W}^{{l}}$| represents the trainable parameter matrix of the model and |${\sigma} \left(\bullet \right)$| represents a nonlinear activation function.
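The propagation rule can be sketched with plain Python lists. The code assumes the standard GCN layer: self-loops are added to the adjacency matrix, the result is symmetrically normalized by the degree matrix, and the normalized matrix is multiplied by the node features and the weight matrix before the activation. ReLU as the activation, the toy graph and the helper names are assumptions.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adjacency, features, weights):
    """One GCN propagation step with self-loops and symmetric normalization."""
    n = len(adjacency)
    # S~ = S + I
    s_tilde = [
        [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    # D~ is the diagonal degree matrix of S~
    degree = [sum(row) for row in s_tilde]
    # D~^(-1/2) S~ D~^(-1/2)
    normalized = [
        [s_tilde[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]
    hidden = matmul(matmul(normalized, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]  # ReLU (assumed)


# Hypothetical two-node graph with one-hot input features.
S = [[0.0, 1.0], [1.0, 0.0]]
H0 = [[1.0, 0.0], [0.0, 1.0]]
W0 = [[1.0, -3.0], [0.5, 2.0]]
H1 = gcn_layer(S, H0, W0)
```

Each output row mixes a node's own features with those of its neighbors, which is how the layer gathers neighborhood information. In practice a library layer such as the one provided by PyTorch Geometric would replace this hand-written version.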

As shown in Figure 1, two GCN modules were employed to extract feature representations of miRNA and disease nodes. The association representation GCN was designed to capture node embedding representations from MDA networks, which are represented as |${{G}}_{{asso}}=\left\{{{V}}_{{p}},{{V}}_{{d}},{{E}}_{{p}-{d}}\right\}$|, where |${{V}}_{{p}}$| and |${{V}}_{{d}}$| represent miRNA and disease node representations, respectively, and |${{E}}_{{p}-{d}}$| represents the association between miRNA and disease nodes.

In this study, miRNA and disease node representations were generated from heterogeneous similarity networks using GCNs to capture miRNA functional similarity and DS features. Specifically, the MDA representation GCN module learned miRNA and disease node representations by combining information from neighboring heterogeneous nodes in the miRNA–disease interaction network. The miRNA and disease representation GCN modules further extracted node features by aggregating information from homogeneous nodes. Ultimately, the outputs of these two modules were combined by an inner product to redefine the miRNA functional feature representation.

AE-based feature learning

Sequence information is essential for analyzing cellular processes and has been applied in diverse bioinformatics fields, such as therapeutic peptide recognition [51, 52], identification of intrinsically disordered proteins/regions [53, 54], RNA-binding proteins [55–57], RNA and disease association [48, 58–60] and subcellular localization prediction [61–65]. To extract sequence features from MS similarity, an AE was employed. With the AE, implicit similarity can be preserved within the low-dimensional representation of the MS similarity network.

An AE consists of an encoder and a decoder. Given an input |${X}\in{\mathcal{X}}$| and a feature representation |${h}\in{\mathcal{F}}$|, the AE learns the mappings |${f}$| and |${g}$| between the two spaces by minimizing the reconstruction error.

(14)

As illustrated in Figure 1, the MS similarities were utilized as inputs and translated into low-dimensional vector representations to capture the discriminative features of the MSs within the input data. This was done to ensure that these discriminative features were preserved.
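The idea of compressing similarity rows into low-dimensional vectors can be illustrated with a deliberately small autoencoder. The sketch below assumes a linear encoder and decoder trained by stochastic gradient descent on the squared reconstruction error. The layer sizes, learning rate, number of epochs, toy similarity matrix and function names are all assumptions and do not reflect the architecture actually used in DAmiRLocGNet.

```python
import random


def train_autoencoder(rows, hidden_dim=2, lr=0.02, epochs=500, seed=0):
    """Fit a linear autoencoder; return the encoder and the loss per epoch."""
    rng = random.Random(seed)
    d = len(rows[0])
    enc = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(hidden_dim)]
    dec = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)] for _ in range(d)]

    def encode(x):
        return [sum(w * v for w, v in zip(w_row, x)) for w_row in enc]

    def decode(h):
        return [sum(w * v for w, v in zip(w_row, h)) for w_row in dec]

    def loss():
        # Mean squared reconstruction error per input row.
        return sum(
            sum((a - b) ** 2 for a, b in zip(decode(encode(x)), x)) for x in rows
        ) / len(rows)

    history = [loss()]
    for _ in range(epochs):
        for x in rows:
            h = encode(x)
            err = [a - b for a, b in zip(decode(h), x)]  # reconstruction minus input
            # Error propagated back to the hidden layer (uses the current decoder).
            back = [sum(dec[i][k] * err[i] for i in range(d)) for k in range(hidden_dim)]
            for i in range(d):
                for k in range(hidden_dim):
                    dec[i][k] -= lr * 2 * err[i] * h[k]
            for k in range(hidden_dim):
                for j in range(d):
                    enc[k][j] -= lr * 2 * back[k] * x[j]
        history.append(loss())
    return encode, history


# Hypothetical MS similarity matrix with two groups of similar sequences.
rows = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
]
encode, history = train_autoencoder(rows)
embedding = [encode(x) for x in rows]
```

Each row of `embedding` is a two-dimensional representation of one miRNA, and the decreasing values in `history` show the reconstruction error falling as training proceeds.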

Model training and test

In this study, we trained the DAmiRLocGNet model using GCN and AE, which considered both MS representations and MDA similarity network representations to predict subcellular localizations. The final prediction model, denoted by M, is a concatenation of the output of GCN and AE, which can be calculated as

(15)

where |${{M}}_{{seq}}$| is the MS similarity network, |${{M}}_{{mdas}}$| represents the implicit functional features generated from the MDA, DS similarity network and similarity network of miRNA function.

The proposed method used the GCN and the AE to predict the subcellular localization of miRNAs. The training loss considers both the sequence loss and the miRNA–disease similarity network loss. Binary cross-entropy (BCE) was used to measure the loss between the target labels and the predicted labels of subcellular localization, and the mean squared error (MSE) was used to measure the loss between the input and the output of the MS AE. The overall loss |${Loss}$| can be calculated as follows:

(16)
(17)
(18)

where |$p,y$| denote the probability value and ground truth value, respectively. N represents the number of samples in each batch, and M represents the index of subcellular locations.
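The combined objective can be sketched as follows, assuming that the overall loss is the unweighted sum of a BCE term averaged over samples and locations and an MSE term averaged over the reconstructed similarity entries. The equal weighting, the clipping constant and the function names are assumptions.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean BCE over all samples and subcellular locations."""
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1 - eps)  # avoid log(0)
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
            count += 1
    return total / count


def mean_squared_error(reconstructed, original):
    """Mean squared difference between AE output and AE input."""
    diffs = [
        (a - b) ** 2
        for r_row, o_row in zip(reconstructed, original)
        for a, b in zip(r_row, o_row)
    ]
    return sum(diffs) / len(diffs)


def total_loss(probs, labels, reconstructed, original):
    """Classification loss plus reconstruction loss (equal weights assumed)."""
    return binary_cross_entropy(probs, labels) + mean_squared_error(reconstructed, original)


# One hypothetical sample with two locations and a two-entry similarity row.
loss = total_loss([[0.5, 0.5]], [[1, 0]], [[1.0, 0.0]], [[0.0, 0.0]])
```

Treating every location as an independent binary decision is what allows a miRNA to be assigned to several compartments at once.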

We trained the prediction model and evaluated its performance using 10-fold cross-validation. The deep learning model was implemented using PyTorch 1.9.1, Scikit-learn and PyTorch Geometric. Model training and testing were performed on an Nvidia TITAN RTX v100 GPU. The hyperparameters, such as the number of layers, the restart probability |$\alpha$| of the RWR algorithm and the learning rate |$lr$|, were tuned using grid search (Supplementary Table S1).
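The cross-validation protocol can be sketched with a small helper that shuffles the sample indices once and then uses each of the ten folds in turn as the test set. The random seed and the function name are illustrative, and the sample count of 1041 is taken from Table 2.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(1041, k=10))
```

Every miRNA appears in exactly one test fold, so the averaged metrics cover the whole dataset while each model is evaluated only on samples it did not see during training. The `KFold` class of Scikit-learn offers the same behavior.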

Performance evaluation

To evaluate the prediction performance of the model [21], we used metrics including precision, sensitivity (Sn) and specificity (Sp) to assess the presented method and other competing methods [66, 67]. We also plotted the receiver operating characteristic (ROC) [68] curve and the precision-recall (PR) curve to visualize model performance. The area under the PR curve (AUPR) and the area under the ROC curve (AUC) were also used to assess model performance [69]. The evaluation criteria are calculated as follows:

(19)
(20)
(21)
(22)
(23)
(24)

where TN, TP, FN and FP represent the number of true negatives, true positives, false negatives and false positives, respectively.
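The threshold-based metrics named above follow directly from the four confusion counts. The sketch below uses their standard definitions, with precision as TP/(TP+FP), Sn as TP/(TP+FN) and Sp as TN/(TN+FP). The label vectors and function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels of one subcellular location."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def precision(tp, fp):
    return tp / (tp + fp)


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


# Hypothetical labels and thresholded predictions for one compartment.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
scores = (precision(tp, fp), sensitivity(tp, fn), specificity(tn, fp))
```

For a multi-label task such as this one, the counts would be computed separately for each compartment and the resulting metrics reported per location.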

RESULTS AND DISCUSSION

Performance comparison with different prediction methods

To assess the performance of DAmiRLocGNet in miRNA subcellular localization prediction, we compared DAmiRLocGNet with existing predictors using 10-fold cross-validation. Whereas existing methods rely solely on sequence information or miRNA functional similarity information (Supplementary Table S2), DAmiRLocGNet considers both. The remaining methods could not be directly compared with ours because their source code could not be used [10]. It should be noted that the sequences of some miRNAs in the dataset cannot be found; the results of MirLocPredictor were therefore measured on a subset of the MiRLoc dataset from which the miRNAs without sequences had been removed. In contrast, DAmiRLocGNet can predict the subcellular localization of miRNAs without sequences.

The performance comparison is shown in Tables 3 and 4, from which the following conclusions can be drawn: (i) DAmiRLocGNet outperforms MiRLoc, indicating that integrating DS information into the miRNA functional feature representation is a more effective way of predicting the subcellular localization of miRNAs. (ii) DAmiRLocGNet and MiRLoc are superior to the sequence-based methods because their embedded features incorporate miRNA functional feature representations. In particular, DAmiRLocGNet outperforms these methods at all subcellular localizations in terms of AUC. The AUC values for cytoplasm, exosome, nucleolus, nucleus, extracellular vesicle, microvesicle and mitochondrion are 0.8606, 0.7051, 0.9289, 0.7960, 0.8350, 0.6757 and 0.8332, respectively. Overall, DAmiRLocGNet also achieves promising results in terms of AUPR. It outperforms the competing methods for nucleolus, extracellular vesicle and mitochondrion, with AUPR values of 0.5739, 0.4619 and 0.6882, respectively, and achieves performance similar to that of the competing methods for the remaining subcellular locations. These results demonstrate that the predictor can effectively improve the predictive performance. In addition, a performance comparison with seven different basic predictors trained on sequence features extracted by iLearnPlus can be seen in Supplementary Tables S3–S24, which shows that DAmiRLocGNet outperforms the traditional methods.

Table 3

Performance comparison in terms of AUC

| Compartment | MiRLoc^a | MirLocPredictor^b | TextRNN^c | OUR |
|---|---|---|---|---|
| Cytoplasm | 0.8218 | 0.5741 | 0.5487 | **0.8606** |
| Exosome | 0.5751 | 0.5842 | 0.5414 | **0.7051** |
| Nucleolus | 0.8371 | 0.5286 | 0.5050 | **0.9289** |
| Nucleus | 0.7756 | 0.6752 | 0.6301 | **0.7960** |
| Extracellular vesicle | 0.8003 | 0.6335 | 0.6282 | **0.835** |
| Microvesicle | 0.5099 | 0.5973 | 0.5798 | **0.6757** |
| Mitochondrion | 0.7694 | 0.6758 | 0.6324 | **0.8332** |
| AVG | 0.7270 ± 0.12003 | 0.6098 ± 0.05053 | 0.5808 ± 0.04731 | **0.8049 ± 0.08755** |

Note: The bold value is the maximum value of the row.

^a The results from MiRLoc RWR algorithm with MS information.

^b The results from MirLocPredictor with sequence information, and dataset is subset of a.

^c The results from TextRNN with sequence information, and dataset is subset of a.

Table 4

Performance comparison in terms of AUPR

| Compartment | MiRLoc^a | MirLocPredictor^b | TextRNN^c | OUR |
|---|---|---|---|---|
| Cytoplasm | 0.662 | **0.8391** | 0.8267 | 0.7636 |
| Exosome | **0.974** | 0.8248 | 0.8072 | 0.9248 |
| Nucleolus | 0.2185 | 0.4925 | 0.4838 | **0.5739** |
| Nucleus | **0.8021** | 0.4349 | 0.3773 | 0.7961 |
| Extracellular vesicle | 0.2916 | 0.3434 | 0.3322 | **0.4619** |
| Microvesicle | **0.9203** | 0.2469 | 0.2258 | 0.8883 |
| Mitochondrion | 0.5277 | 0.3113 | 0.2567 | **0.6882** |
| AVG | 0.6280 ± 0.27452 | 0.4990 ± 0.22328 | 0.4728 ± 0.23108 | **0.7281 ± 0.16564** |

Note: The bold value is the maximum value of the row.

^a The results from MiRLoc RWR algorithm with MS information.

^b The results from MirLocPredictor with sequence information, and dataset is subset of a.

^c The results from TextRNN with sequence information, and dataset is subset of a.

Performance in each subcellular location

We used the PR curve and the ROC curve to evaluate the prediction performance at each subcellular location. The performance of DAmiRLocGNet at the different subcellular locations is shown in Figure 2. The results indicate that the AUC values for cytoplasm, exosome, nucleolus, nucleus, extracellular vesicle, microvesicle and mitochondrion are 0.8581, 0.6988, 0.9156, 0.7934, 0.8390, 0.6700 and 0.8287, respectively. The AUPR values for cytoplasm, exosome, nucleolus, nucleus, extracellular vesicle, microvesicle and mitochondrion are 0.7187, 0.8950, 0.4477, 0.7668, 0.3531, 0.8545 and 0.6588, respectively. Moreover, the effect of the random seed of the deep learning model on the stability of model performance, together with the standard deviation (SD), is shown in Supplementary Figure S3. The performance of the RWR algorithm and of 10-fold cross-validation can be seen in Supplementary Figures S4–S7.
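A per-location AUC of this kind can be computed without plotting the curve, using the fact that the AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below applies this to each column of a multi-label score matrix. The labels, scores and function names are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def per_location_auc(label_matrix, score_matrix):
    """Compute one AUC per subcellular location (one column per location)."""
    return [
        roc_auc(list(label_col), list(score_col))
        for label_col, score_col in zip(zip(*label_matrix), zip(*score_matrix))
    ]


# Four hypothetical miRNAs scored for two locations.
label_matrix = [[1, 0], [0, 1], [1, 1], [0, 0]]
score_matrix = [[0.9, 0.2], [0.8, 0.7], [0.4, 0.6], [0.1, 0.3]]
aucs = per_location_auc(label_matrix, score_matrix)
```

The pairwise form is quadratic in the number of samples, which is acceptable for a dataset of about a thousand miRNAs. The `roc_auc_score` function of Scikit-learn provides an equivalent and faster implementation.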

Figure 2

Performance on PR and ROC curve.
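
Because a miRNA may be annotated with several locations, AUC and AUPR are reported per location. As a minimal sketch of how such per-location scores can be computed, the code below treats each location as a separate binary problem. It is not the authors' evaluation code: AUC is computed as the pairwise ranking probability, AUPR is approximated by average precision (one common estimator, assumed here), and the toy labels and scores are made up for illustration.

```python
def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative.

    Tied scores count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Step-wise area under the PR curve: mean precision at the rank of each positive.

    Simplification: tied scores are ranked in input order.
    """
    order = sorted(range(len(y_score)), key=lambda i: -y_score[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("average precision needs at least one positive sample")
    return total / hits


def per_location_metrics(labels, scores, locations):
    """labels and scores hold one row per miRNA and one column per location."""
    result = {}
    for j, name in enumerate(locations):
        y = [row[j] for row in labels]
        s = [row[j] for row in scores]
        result[name] = (roc_auc(y, s), average_precision(y, s))
    return result


# Toy data, invented for illustration only (not from the paper).
locations = ["cytoplasm", "exosome"]
labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.4, 0.7], [0.6, 0.3], [0.1, 0.5]]
metrics = per_location_metrics(labels, scores, locations)
for name, (auc, aupr) in metrics.items():
    print(f"{name}: AUC={auc:.4f} AUPR={aupr:.4f}")
```

Reporting both metrics is useful when the label distribution is skewed, since AUPR depends on the fraction of positives at a location whereas AUC does not.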

Identifying DAmiRLocGNet’s essential component

To further assess the impact of each component of DAmiRLocGNet on its prediction performance, we carried out an ablation study in which we retrained the model after individually removing each of the following components: (i) the Sequence layer, which extracts sequence information from the MS similarity through the AE; (ii) the GCN layer, which extracts the miRNA functional representation from the miRNA functional similarity network, the DS similarity network and the MDA network; (iii) the Disease RWR layer, which re-represents the deep structural information of the DS similarity network through random walk with restart (RWR); and (iv) the MiRNA RWR layer, which re-represents the deep structural information of the miRNA functional similarity network through RWR.
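
The retraining procedure described above can be sketched as a loop over component subsets. This is only an illustration under stated assumptions: the component names, `leave_one_out_settings`, `run_ablation` and the `train_and_evaluate` callable are hypothetical and do not come from the DAmiRLocGNet code base, and the exact settings tabulated in the paper may not follow this strict leave-one-out pattern.

```python
# Hypothetical component identifiers for the four parts of the model.
COMPONENTS = ("sequence_layer", "gcn_layer", "mirna_rwr", "disease_rwr")


def leave_one_out_settings(components=COMPONENTS):
    """Yield (removed, kept) pairs: the full model first, then one component removed at a time."""
    yield None, tuple(components)
    for removed in components:
        yield removed, tuple(c for c in components if c != removed)


def run_ablation(train_and_evaluate, components=COMPONENTS):
    """Retrain once per setting and collect the scores.

    train_and_evaluate is a hypothetical callable that takes the tuple of
    kept components and returns a dict of metrics.
    """
    return {
        removed or "full": train_and_evaluate(kept)
        for removed, kept in leave_one_out_settings(components)
    }


# Stand-in evaluator for demonstration only: it just reports how many components were kept.
results = run_ablation(lambda kept: {"n_components": len(kept)})
for setting, metrics in results.items():
    print(setting, metrics)
```

In practice the stand-in lambda would be replaced by a function that builds the model from the kept components, trains it and returns the per-location AUC and AUPR.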

The AUC and AUPR results in Tables 5 and 6 suggest that the most important component of DAmiRLocGNet is the GCN layer, as it significantly improves model performance. Furthermore, incorporating the DS similarity network improves the overall predictive capability of the model, suggesting that miRNA–disease association networks play a crucial role. Although the miRNA and disease RWR layers are important, the results indicate that they are not as crucial as the GCN layer and the DS similarity layer. In conclusion, although each component of DAmiRLocGNet improves model performance, the layer associated with the DS similarity network is the essential ingredient. The average performance of the different feature components of DAmiRLocGNet at six subcellular locations can be seen in Supplementary Figure S8, from which the features rank in importance as follows: miRNA functional representation features extracted from MDAs, miRNA functional similarity features and MS features.

Table 5

The performance of DAmiRLocGNet on AUC

Ablation settings
a  b  c  d    Cytop   Exos    Nucleo  Nucle   EV      Microv  Mitochon
              0.5789  0.5611  0.4012  0.4654  0.469   0.4011  0.5108
              0.8517  0.5995  0.9206  0.7874  0.8338  0.5948  0.8236
              0.8632  0.6619  0.9357  0.7939  0.8467  0.6487  0.8384
              0.8492  0.5923  0.9292  0.7817  0.8414  0.5609  0.8303
              0.8641  0.6828  0.9329  0.7978  0.8384  0.6614  0.8336

a: Sequence layer; b: GCN layer; c: miRNA RWR layer; d: disease RWR layer. Cytop: cytoplasm, Exos: exosome, Nucleo: nucleolus, Nucle: nucleus, EV: extracellular vesicle, Microv: microvesicle and Mitochon: mitochondrion.

Table 6

The performance of DAmiRLocGNet on AUPR

Ablation settings
a  b  c  d    Cytop   Exos    Nucleo  Nucle   EV      Microv  Mitochon
              0.4863  0.8884  0.1581  0.5848  0.1502  0.8224  0.3812
              0.7441  0.8882  0.5082  0.7860  0.4490  0.8553  0.6737
              0.7681  0.9092  0.6005  0.7936  0.5013  0.8784  0.6968
              0.7381  0.8913  0.5359  0.7813  0.4381  0.8480  0.6774
              0.7687  0.9141  0.5784  0.7946  0.4613  0.8797  0.6963

a: Sequence layer; b: GCN layer; c: miRNA RWR layer; d: disease RWR layer. Cytop: cytoplasm, Exos: exosome, Nucleo: nucleolus, Nucle: nucleus, EV: extracellular vesicle, Microv: microvesicle and Mitochon: mitochondrion.

The effectiveness of GCN layers

The proposed method uses GCN layers to aggregate information from neighboring nodes in the MDA network and to obtain a comprehensive miRNA functional vector representation for miRNA subcellular localization prediction. However, the number of GCN layers can affect prediction performance. We therefore evaluated the effect of the number of GCN layers on AUC and AUPR; the results are shown in Tables 7 and 8.
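
To make the notion of "number of GCN layers" concrete, the sketch below stacks a configurable number of graph convolution layers using the widely used propagation rule H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W). It is a NumPy illustration with random, untrained weights on a toy graph. Whether DAmiRLocGNet uses exactly this normalization, these dimensions or this activation is an assumption, and the function names are hypothetical.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1)  # every degree is >= 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(adj, features, weights):
    """Apply len(weights) GCN layers: H <- ReLU(A_norm @ H @ W) for each W.

    With an empty weight list the input features are returned unchanged,
    which corresponds to a model without any graph aggregation.
    """
    a_norm = normalize_adjacency(adj)
    h = features
    for w in weights:
        h = np.maximum(a_norm @ h @ w, 0.0)
    return h


# Toy path graph with 4 nodes and 8-dimensional node features (illustrative only).
rng = np.random.default_rng(0)
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
features = rng.normal(size=(4, 8))

for n_layers in range(4):
    weights = [rng.normal(size=(8, 8)) for _ in range(n_layers)]
    out = gcn_forward(adj, features, weights)
    print(n_layers, out.shape)
```

Each additional layer multiplies by the normalized adjacency once more, so a node's output depends on nodes one hop further away.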

Table 7

The effectiveness of the GCN layers of DAmiRLocGNet in terms of AUC

GCN layer number  Cytoplasm  Exosome  Nucleolus  Nucleus  Extracellular vesicle  Microvesicle  Mitochondrion
0                 0.5294     0.5153   0.5406     0.4975   0.5468                 0.4905        0.5046
1                 0.8606     0.7051   0.9289     0.7960   0.8350                 0.6757        0.8332
2                 0.8407     0.5563   0.9040     0.7726   0.8279                 0.5331        0.8207
3                 0.5404     0.4527   0.4592     0.4983   0.4584                 0.4781        0.4876

Table 8

The effectiveness of the GCN layers of DAmiRLocGNet in terms of AUPR

GCN layer number  Cytoplasm  Exosome  Nucleolus  Nucleus  Extracellular vesicle  Microvesicle  Mitochondrion
0                 0.3725     0.8676   0.1075     0.5228   0.1692                 0.8210        0.2973
1                 0.7636     0.9248   0.5739     0.7961   0.4619                 0.8883        0.6882
2                 0.7434     0.8778   0.4567     0.7645   0.4091                 0.8327        0.6572
3                 0.3871     0.8637   0.1770     0.5702   0.3805                 0.8244        0.3501

Tables 7 and 8 indicate that: (i) DAmiRLocGNet is a poor predictor without the GCN module and performs better when the disease association network information extracted by the GCN module is added; and (ii) DAmiRLocGNet achieves its best performance with a single GCN layer, and performance decreases when more layers are added. A possible reason is that, in a network of miRNA and gene interactions, the information related to a specific miRNA and gene is dispersed within a limited neighborhood of nodes. This neighborhood reflects a unique set of interactions, including proximate and remote ones. The proximate nodes are directly related nodes, whereas the remote nodes outside this neighborhood may contain irrelevant and potentially misleading information. Therefore, DAmiRLocGNet with more GCN layers tends to draw on more remote nodes, ultimately compromising the predictive ability of the model.
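
The point about remote nodes can be illustrated by counting how many nodes can influence a given node as layers are added. The sketch below expands the neighborhood by one hop per layer on a toy path graph. The graph and the function name `receptive_field` are invented for illustration and are unrelated to the actual MDA network.

```python
def receptive_field(neighbors, node, n_layers):
    """Nodes whose features can influence `node` after n_layers rounds of neighbor aggregation."""
    reached = {node}
    for _ in range(n_layers):
        # One aggregation round pulls in the direct neighbors of everything reached so far.
        reached |= {nb for u in reached for nb in neighbors[u]}
    return reached


# Toy undirected path graph 0-1-2-3-4-5-6 (made up for illustration).
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}

# Size of the receptive field of the middle node for 0, 1, 2 and 3 layers.
sizes = [len(receptive_field(neighbors, 3, k)) for k in range(4)]
print(sizes)
```

On densely connected networks this growth is much faster than on a path, so even two or three layers can mix in information from a large share of the graph.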

CONCLUSION

In this study, we proposed DAmiRLocGNet, a GCN- and AE-based deep learning method for identifying the subcellular localizations of miRNAs. The results indicate that DAmiRLocGNet outperforms other competing approaches. The superior performance is attributed to three main factors: (i) the use of MDA and DS information, which allows us to construct a comprehensive miRNA functional representation that covers more discriminative interaction information; (ii) the efficient extraction of miRNA functional structure information and sequence correlations using the GCN- and AE-based method; and (iii) the use of fully connected networks to extract the MDA, DS and MS features, which enables us to effectively capture meaningful and discriminative vector representations for prediction. The datasets and source code of DAmiRLocGNet are available online at http://bliulab.net/DAmiRLocGNet.

Although DAmiRLocGNet has shown promising results in miRNA subcellular localization prediction, some limitations should be considered. (i) The performance of DAmiRLocGNet depends on the quality of the representations based on functional similarity networks. Improving the accuracy of these representations could lead to better prediction results. (ii) Incorporating additional sequence-based features, such as evolutionary conservation, physicochemical properties and amino acid composition, may help to improve the accuracy of subcellular localization prediction. We intend to address these limitations in future research.

Key Points
  • In this study, we proposed DAmiRLocGNet, a GCN- and AE-based method to identify the subcellular localizations of miRNAs.

  • DAmiRLocGNet incorporates sequence information, DS information and MDA information into the GCN for a comprehensive representation of potential miRNA function.

  • Experimental results show that DAmiRLocGNet is superior to other competing computational approaches. The datasets and source code of DAmiRLocGNet are available online at http://bliulab.net/DAmiRLocGNet.

FUNDING

National Natural Science Foundation of China (Nos. U22A2039, 62271049 and 62250028).

CODE AND DATA AVAILABILITY

The datasets and source code of DAmiRLocGNet are available online at http://bliulab.net/DAmiRLocGNet.

Author Biographies

Tao Bai is a doctoral candidate at the School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China. He is also a lecturer at the School of Mathematics & Computer Science, Yan’an University, Shaanxi, 716000, China. His research interests include bioinformatics, natural language processing and machine learning.

Ke Yan is currently an assistant professor at the School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China. His research interests include bioinformatics, pattern recognition and machine learning.

Bin Liu, PhD, is a professor at the School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China. His expertise is in bioinformatics, natural language processing and machine learning.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/pages/standard-publication-reuse-rights)