Abstract

Long noncoding RNAs (lncRNAs) participate in various biological processes and are closely linked with diseases. In vivo and in vitro experiments have validated many associations between lncRNAs and diseases. However, biological experiments are time-consuming and expensive. Here, we introduce LDA-VGHB, an lncRNA–disease association (LDA) identification framework that combines feature extraction based on singular value decomposition and a variational graph autoencoder with LDA classification based on a heterogeneous Newton boosting machine. LDA-VGHB was compared with four classical LDA prediction methods (i.e. SDLDA, LDNFSGB, IPCARF and LDASR) and four popular boosting models (XGBoost, AdaBoost, CatBoost and LightGBM) under 5-fold cross-validations on lncRNAs, diseases, lncRNA–disease pairs, and independent lncRNAs and independent diseases, respectively. It outperformed the other methods under the four different cross-validations on the lncRNADisease and MNDR databases. We further investigated potential lncRNAs for lung cancer, breast cancer, colorectal cancer and kidney neoplasms and inferred the top 20 lncRNAs associated with each of them among all their unobserved lncRNAs. The results showed that most of the predicted top 20 lncRNAs have been verified by biomedical experiments recorded in the Lnc2Cancer 3.0, lncRNADisease v2.0 and RNADisease databases as well as in publications. We found that HAR1A, KCNQ1DN, ZFAT-AS1 and HAR1B could associate with lung cancer, breast cancer, colorectal cancer and kidney neoplasms, respectively. These predictions need further biological experimental validation. We anticipate that LDA-VGHB is capable of identifying possible lncRNAs for complex diseases. LDA-VGHB is publicly available at https://github.com/plhhnu/LDA-VGHB.

INTRODUCTION

Long noncoding RNAs (lncRNAs) with more than 200 nucleotides are a key class of genes involved in various biological functions [1]. lncRNAs participate in multiple biological processes including gene transcription and expression, chromatin remodeling, and transcriptional and post-transcriptional regulation [2]. Diseases (e.g. immune responses and cancers) may arise when lncRNAs fail to regulate these biological processes. That is, lncRNAs are closely linked with tumorigenesis, progression and drug resistance [3, 4]. Thus, they are a class of potential diagnostic and prognostic biomarkers of complex diseases [5–10]. For example, lncRNA MALAT1 can sponge miR-106b-5p to induce the progression of colorectal cancers [11]. The oncogenic effect of lncRNA H19 can be inhibited through its down-regulation in renal carcinoma cells [12]. The lack of FARSA-AS1 can hinder tumor growth and metastasis [13]. The overexpression of MNX1-AS1 and MALAT1 demonstrates high sensitivity and specificity in multiple tumor tissues [14–16]. MEG3 rs3087918 has been associated with a reduced risk of breast cancer [17]. CRNDE promoted the proliferation and metastasis of hepatocellular carcinoma [18]. WWC2-AS1 was highly expressed in radiation-induced intestinal fibrosis [19]. In summary, there are complex associations between lncRNAs and diseases.

With the rapid advance of RNA sequencing technologies, many platforms provide massive RNA-relevant data resources, which have significantly improved association prediction for human cancers [20–24]. However, experimental techniques are costly, time-consuming and laborious [25–28]. Notably, LncRNADisease2.0 [29], Lnc2Cancer [30], NRED [31] and MNDR v2.0 [32] provide abundant lncRNA–disease association (LDA) information. Based on these databases, many computational methods have been developed. These methods include network-based methods and machine-learning-based methods [25, 33].

Network-based LDA prediction methods first construct a heterogeneous network, and then infer potential LDAs through random walk, label propagation or matrix decomposition. Chen et al. conducted a series of works for LDA prediction based on various biological information [34–37], for example, an lncRNA expression profile-based method [34], an lncRNA similarity and disease similarity-based method [35], KATZ [36] and a microRNA information-based method [37]. Xie et al. [38–41] presented several LDA prediction methods, namely HAUBRW [38], LDA-LNSUBRW [39], RWSF-BLP [40] and SSMF-BLNP [41]. HAUBRW [38] incorporated heat spread, probability diffusion and unbalanced bi-random walk. LDA-LNSUBRW [39] combined linear neighborhood similarity and unbalanced bi-random walk. RWSF-BLP [40] used random walk-based multi-similarity fusion with bidirectional label propagation. SSMF-BLNP [41] integrated selective similarity matrix fusion and bidirectional linear neighborhood label propagation. In addition, several other network-based methods have been developed to identify potential LDAs. These methods include a multi-layer network model (MHRWR) [42], Laplace normalized random walk with restart (LRWRHLDA) [43], weighted graph regularized collaborative matrix factorization (WGRCMF) [44], collaborative matrix factorization with the maximized correntropy (LDCMFC) [45], dual sparse collaborative matrix factorization (DSCMF) [44] and graph regularized nonnegative matrix factorization (LDGRNMF) [46]. Based on existing studies, Chen et al. [25, 26] summarized LDA identification algorithms and lncRNA function prediction models. Heterogeneous network-based methods can fuse diverse multi-relational data and encode various inter- and intra-relations between lncRNAs and diseases, and have thus attracted increasing attention in LDA prediction [47–49]. However, network-based methods rely heavily on the heterogeneous LDA network and fail to find potential associations for an orphan lncRNA or disease.

Machine learning techniques, especially deep learning, have been widely applied in bioinformatics due to their strong classification performance [50–55]. For LDA prediction, this type of method first extracts the features of lncRNAs and diseases, and then designs machine learning models to find possible LDAs [56–58]. These models include random forest regression [59], bidirectional generative adversarial network (BiGAN) [60], graph convolutional matrix completion (GCRFLDA) [2], graph autoencoder and random forest (GAERF) [61], graph attention network (GANLDA) [62], graph convolution network with conditional random field [63], a combination of deep learning and positive-unlabeled learning [64] and heterogeneous graph attention network with meta-paths [65]. Machine learning-based methods effectively improve LDA prediction; however, they are susceptible to noisy and irrelevant data. In addition, they need to extract the optimal features from the biological information and topological structures of lncRNAs and diseases.

To improve LDA prediction accuracy and identify potential associations for an orphan lncRNA or disease, in this manuscript, we developed LDA-VGHB, a novel method for identifying possible LDAs that combines LDA feature extraction based on singular value decomposition (SVD) and variational graph autoencoder (VGAE) with LDA classification based on a heterogeneous Newton boosting machine. The performance of LDA-VGHB was validated under 5-fold cross-validations (CVs) on lncRNAs, diseases, lncRNA–disease pairs, and independent lncRNAs and independent diseases. LDA-VGHB was able to accurately predict potential linkages between lncRNAs and diseases on the lncRNADisease and MNDR databases.

MATERIAL AND METHODS

Data preparation

Two human LDA datasets were collected [64]. The two datasets are from the lncRNADisease database [66] and the MNDR database [32], respectively. After excluding diseases with irregular names or without MeSH information and lncRNAs without sequence information from each dataset, we obtained two preprocessed LDA datasets. Detailed information about the datasets is shown in Table 1.

Table 1

The information of two LDA datasets

Dataset         lncRNAs   Diseases   LDAs
lncRNADisease   82        157        605
MNDR            89        190        1529

Consequently, an LDA network with |$n$| lncRNAs and |$m$| diseases is represented as |$\boldsymbol{Y} \in{\Re ^{n \times m}}$|⁠, where each element |$y_{i j}$| is defined by Eq. (1):

(1)
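
As an illustration of the matrix defined above, the following minimal sketch builds a binary association matrix from a list of known pairs. It assumes the usual encoding in which an entry is 1 for an observed lncRNA–disease association and 0 otherwise; the function name and the toy pairs are hypothetical and not taken from the datasets.

```python
def build_association_matrix(n, m, known_pairs):
    """Return an n x m nested list Y with Y[i][j] = 1 for each known pair (i, j)."""
    Y = [[0] * m for _ in range(n)]
    for i, j in known_pairs:
        Y[i][j] = 1  # observed association between lncRNA i and disease j
    return Y

# Toy example (hypothetical data): 3 lncRNAs, 4 diseases, 3 observed associations.
Y = build_association_matrix(3, 4, [(0, 1), (1, 1), (2, 3)])
```

All remaining entries stay 0, which marks a pair as unobserved rather than as a confirmed non-association.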

Based on the two datasets, we proposed a novel computational framework LDA-VGHB for predicting possible LDAs. As shown in Figure 1, first, lncRNA features and disease features are extracted by integrating lncRNA and disease similarity computation, linear feature extraction based on SVD and nonlinear feature extraction based on VGAE. Subsequently, unknown lncRNA–disease pairs are classified through a heterogeneous Newton boosting machine.

Figure 1

The pipeline for LDA prediction with SVD, VGAE and heterogeneous Newton boosting machine (LDA-VGHB). (i) Feature extraction. Features of lncRNAs and diseases are extracted by incorporating similarity computation, linear feature extraction based on SVD and nonlinear feature extraction based on VGAE. (ii) LDA classification. A heterogeneous Newton boosting machine is designed to classify unobserved LDAs.

Similarity computation

To measure disease similarity, we first calculate the semantic similarity matrix |$\boldsymbol{S}_d^{sem}$| from the MeSH descriptors of the diseases using the IDSSIM model [67]. Since several diseases lack a directed acyclic graph in the MeSH database, their semantic similarity cannot be computed. Thus, we use Gaussian association profile (GAP) kernel similarity [68] as a complement to disease semantic similarity. For two diseases |$d_i$| and |$d_j$| with GAPs |${\boldsymbol{Y}}_{.i}$| and |${\boldsymbol{Y}}_{.j}$|, the GAP kernel similarity is defined by Eq. (2):

(2)

where |${\boldsymbol{Y}}_{.i}$| and |${\boldsymbol{Y}}_{.j}$| denote the |$i$|-th and |$j$|-th columns of |$\boldsymbol{Y}$|⁠, respectively.
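
A GAP kernel of this kind can be sketched as below. This is a minimal illustration under a stated assumption: the bandwidth is taken as the reciprocal of the mean squared profile norm, a common choice for this kernel, and the normalization in Eq. (2) may differ. The function name `gap_kernel` and the toy matrix are hypothetical.

```python
import numpy as np

def gap_kernel(profiles):
    """Gaussian association profile kernel between the rows of `profiles`.

    Assumption: bandwidth gamma = 1 / (mean squared norm of the profiles).
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    mean_sq = sq_norms.mean()
    gamma = 1.0 / mean_sq if mean_sq > 0 else 1.0
    # Squared Euclidean distance between every pair of profiles.
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dist)

# Toy association matrix (hypothetical): 3 lncRNAs x 4 diseases.
Y = np.array([[0, 1, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1]])
G_d = gap_kernel(Y.T)  # disease profiles are the columns of Y
G_l = gap_kernel(Y)    # lncRNA profiles are the rows of Y
```

Applying the same function to the rows of the association matrix gives the lncRNA counterpart of the disease kernel.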

Semantic similarity and GAP kernel similarity measure disease similarity from the perspectives of biological significance and topological structure, respectively. Subsequently, the disease similarity matrix |$\boldsymbol{S}_d$| is constructed by combining the two types of similarity by Eq. (3):

(3)

where |$\alpha $| is a weight parameter.

Similarly, lncRNA functional similarity |$\boldsymbol{S}_l^{fun}$| is computed based on disease semantic similarities according to the IDSSIM model [67]. lncRNA GAP kernel similarity matrix |$\boldsymbol{G}_l$| is computed by Eq. (4):

(4)

where |${\boldsymbol{Y}}_{i.}$| and |${\boldsymbol{Y}}_{j.}$| denote the |$i$|-th and |$j$|-th rows of |$\boldsymbol{Y}$|⁠, respectively.

Consequently, lncRNA similarity matrix |$\boldsymbol{S}_l$| is constructed by Eq. (5):

(5)
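
The fusion of a biological similarity matrix with a GAP kernel matrix can be sketched as follows. The sketch assumes that the weight parameter enters as a convex combination, which is one natural reading of "weight parameter"; the exact form of Eqs. (3) and (5) may differ. The function name and the toy matrices are hypothetical.

```python
import numpy as np

def fuse_similarity(S_bio, G, alpha=0.5):
    """Weighted combination alpha * S_bio + (1 - alpha) * G (assumed form)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    S_bio = np.asarray(S_bio, dtype=float)
    G = np.asarray(G, dtype=float)
    if S_bio.shape != G.shape:
        raise ValueError("similarity matrices must have the same shape")
    return alpha * S_bio + (1.0 - alpha) * G

# Toy matrices (hypothetical values) for two diseases.
S_sem = np.array([[1.0, 0.2], [0.2, 1.0]])  # semantic similarity
G_d = np.array([[1.0, 0.6], [0.6, 1.0]])    # GAP kernel similarity
S_d = fuse_similarity(S_sem, G_d, alpha=0.5)
```

The same helper would apply to lncRNAs by passing the functional similarity and the lncRNA GAP kernel matrix.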

Feature Extraction

Linear feature extraction

The SVD technique is a generalization of the eigen decomposition [69] and has been widely applied to feature extraction. By eigen decomposition, SVD decomposes a rectangular matrix into two orthogonal matrices and one diagonal matrix. In this study, we use SVD to extract linear features for diseases and lncRNAs. First, the LDA matrix |$\boldsymbol{Y} \in{\Re ^{n \times m}}$| is factorized into three matrices by Eq. (6):

(6)

where |$\boldsymbol{U}\in \Re ^{n\times n}$| and |$\boldsymbol{V}\in \Re ^{m\times m}$| are two real orthogonal matrices, |$\boldsymbol{V}^{T}$| denotes the transpose of |$\boldsymbol{V}$| and |$\boldsymbol{\Sigma }\in \Re ^{n\times m}$| is a rectangular diagonal matrix whose |$i$|-th diagonal element |$\sigma _{i}$| denotes the |$i$|-th singular value of |$\boldsymbol{Y}$|, with |$\sigma _{1}\geq \sigma _{2}\geq \cdots \geq \sigma _{\min (n,m)}\geq 0$|.

Next, the |$k$| largest singular values are used to construct a low-rank approximation by Eq. (7):

(7)

Consequently, |$\boldsymbol{U}_{i}$| and |$\boldsymbol{V}_{j}^{T}$| are applied to characterize linear features of |$l_i$| and |$d_j$|⁠, respectively.
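
The truncated decomposition can be sketched with NumPy as follows. This is an illustration rather than the authors' code: it assumes that the first k left and right singular vectors are used directly as features, without scaling by the singular values. The function name and the toy matrix are hypothetical.

```python
import numpy as np

def svd_linear_features(Y, k):
    """Rank-k SVD features: one k-dim row per lncRNA and one per disease."""
    Y = np.asarray(Y, dtype=float)
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    k = min(k, s.size)
    lnc_features = U[:, :k]      # n x k, rows describe lncRNAs
    dis_features = Vt[:k, :].T   # m x k, rows describe diseases
    # Rank-k approximation of Y built from the k largest singular values.
    Y_k = lnc_features @ np.diag(s[:k]) @ dis_features.T
    return lnc_features, dis_features, Y_k

# Toy association matrix (hypothetical) of rank 2.
Y = np.array([[1, 0, 1, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1]])
F_l, F_d, Y_2 = svd_linear_features(Y, k=2)
```

Because the toy matrix has rank 2, the rank-2 approximation reproduces it up to floating-point error; on real data k trades off compression against reconstruction error.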

Nonlinear feature extraction

The variational graph autoencoder [70] combines a graph convolutional network (GCN) with a variational autoencoder. It exploits both the latent variables of the variational autoencoder and the interpretable latent representation ability of GCN. Thus, it is widely applied to graph-structured data by incorporating graph structure and data distribution [2]. In this section, we use VGAE to extract nonlinear features for diseases and lncRNAs.

GCN [71] implements convolutional operations on graph structures with non-Euclidean data. It can better extract node features by incorporating the characteristics of neighboring nodes and the graph structure. GCN methods fall into two main categories according to how the localized convolutional filter is defined: spatial-based methods and spectral-based methods. In comparison with spatial-based methods, spectral-based methods obtain better performance via the spectrum of the graph Laplacian [71, 72]. Thus, we use spectral-based methods [73] to extract features for lncRNAs and diseases from their similarity networks.

Let the similarity matrix |${\boldsymbol{S}}_l$| denote the adjacency matrix of |$n$| lncRNAs. The initial scalar features of each lncRNA are given by the corresponding row of the LDA matrix |$\boldsymbol{Y}$|. Consequently, we obtain the initial scalar feature matrix |$\boldsymbol{{X}}_{l}^{(0)}$| of |$n$| lncRNAs. Taking the lncRNA similarity matrix |${\boldsymbol{S}}_l$| and the initial scalar feature matrix |$\boldsymbol{{X}}_{l}^{(0)}$| as inputs, at the |${t}$|-th layer, GCN transforms the graph signal |$\boldsymbol{X}_{l}^{(t)}$| into a new signal |$\boldsymbol{X}_{l}^{(t+1)}$| for all lncRNAs by Eq. (8):

(8)

Here, |$ReLU(\cdot )=max(0,\cdot )$| is a nonlinear activation function, |$\tilde{\boldsymbol{S}}_l=\boldsymbol{S}_l+\boldsymbol{I}_N$| is the adjacency matrix of the undirected graph corresponding to |$\boldsymbol{S}_l$| with added self-loops and |$\boldsymbol{I}_N$| is an identity matrix. |$\tilde{\boldsymbol{A}}_{l}$| is the diagonal degree matrix with |$\left [\tilde{\boldsymbol{A}}_{l}\right ]_{i i}=\sum _{j}{[{\tilde{\boldsymbol{S}}_l}]_{ij}}$|, and |$\beta _{l}^{(t)}$| denotes the parameters in the |$t$|-th layer of GCN. |$\boldsymbol{{X}}_{l}^{(t)}\in \Re ^{n\times d}$| denotes the matrix of activations at the |$t$|-th layer with |$\boldsymbol{{X}}_{l}^{(0)}=\boldsymbol{Y}$|.
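
One propagation step of a spectral GCN layer can be sketched as follows. The sketch assumes the standard symmetric normalization of the self-loop-augmented adjacency matrix followed by ReLU; the weight matrix is random and untrained, and all names and toy values are hypothetical.

```python
import numpy as np

def normalize_adjacency(S):
    """Return A^{-1/2} (S + I) A^{-1/2}, with A the degree matrix of S + I."""
    S_tilde = np.asarray(S, dtype=float) + np.eye(len(S))
    d_inv_sqrt = 1.0 / np.sqrt(S_tilde.sum(axis=1))
    return S_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(Q, X, W):
    """One propagation step: ReLU(Q X W)."""
    return np.maximum(0.0, Q @ X @ W)

# Toy inputs (hypothetical): similarity among 3 lncRNAs and their rows of Y.
S_l = np.array([[1.0, 0.5, 0.0],
                [0.5, 1.0, 0.2],
                [0.0, 0.2, 1.0]])
X0 = np.array([[0, 1, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 0, 1]], dtype=float)
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 2))  # untrained layer parameters
Q = normalize_adjacency(S_l)
X1 = gcn_layer(Q, X0, W)                # 3 x 2 activations
```

Since similarities are nonnegative and each node has a self-loop, every degree is positive and the inverse square root is well defined.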

In the encoder, VGAE takes |$\boldsymbol{S}_l$| and |$\boldsymbol{X}_l$| as inputs and outputs a latent variable through a two-layer GCN. The first layer is used to generate a low-dimensional feature matrix |$\tilde{\boldsymbol{X}}_l$| by Eq. (9):

(9)

where |$\boldsymbol{Q}=\tilde{\boldsymbol{A}}_{l}^{-\frac{1}{2}} \tilde{\boldsymbol{S}}_l \tilde{\boldsymbol{A}}_{l}^{-\frac{1}{2}}$|⁠, and |$\boldsymbol{W}_{0}$| denotes the parameters in the first GCN layer.

The second layer is used to generate the data distribution by Eq. (10):

(10)

where |$\mu $| and |$\sigma $| denote the mean and variance of the node vector representation, and |$\boldsymbol{W}_{\mu }$| and |$\boldsymbol{W}_{\sigma }$| denote the corresponding parameters.

Consequently, supposing that |$\varepsilon $| follows the standard normal distribution |$N$|(0, 1), the latent variable |$\boldsymbol{Z}_l$| is obtained by Eq. (11):

(11)

In the decoder, VGAE reconstructs the adjacency matrix |$\boldsymbol{\hat{S}}_{l}$| from the latent variable |$\boldsymbol{Z}_l$| using the sigmoid function by Eq. (12):

(12)

During the learning, we define the following loss function by Eq. (13):

(13)

where the first term denotes the binary cross-entropy between |${\boldsymbol{S}}_{l}$| and |$\hat{\boldsymbol{S}}_{l}$|, the second term denotes the Kullback–Leibler divergence between the posterior probability distribution |$q(\boldsymbol{Z}_l|{\boldsymbol{X}}_l, {\boldsymbol{S}}_l)$| and the standard Gaussian distribution |$p(\boldsymbol{Z}_l)$|, and |$p(\boldsymbol{S}_l|\boldsymbol{Z}_l)$| denotes the probability of a link between two nodes computed from their embedded vectors in the graph. Finally, the obtained lncRNA latent variable matrix |$\boldsymbol{Z}_l$| is used to represent the nonlinear features of lncRNAs.
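
The encoder, reparameterization, decoder and loss described above can be sketched as a single forward pass. This is a minimal illustration with random, untrained weights and no optimizer. It assumes that the second GCN layer outputs the mean and the log standard deviation, that the decoder is an inner product followed by a sigmoid, and that both loss terms are averaged; the relative scaling of the two terms in Eq. (13) may differ. All names and toy values are hypothetical.

```python
import numpy as np

def normalized_adjacency(S):
    """A^{-1/2} (S + I) A^{-1/2}, with A the degree matrix of S + I."""
    S_tilde = np.asarray(S, dtype=float) + np.eye(len(S))
    d_inv_sqrt = 1.0 / np.sqrt(S_tilde.sum(axis=1))
    return S_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def vgae_forward(S, X, W0, W_mu, W_sigma, rng):
    """Two-layer GCN encoder, reparameterization and inner-product decoder."""
    Q = normalized_adjacency(S)
    H = np.maximum(0.0, Q @ X @ W0)        # first layer: low-dimensional features
    mu = Q @ H @ W_mu                      # second layer: mean
    log_sigma = Q @ H @ W_sigma            # second layer: log std (assumption)
    eps = rng.standard_normal(mu.shape)    # eps ~ N(0, 1)
    Z = mu + np.exp(log_sigma) * eps       # latent variable
    S_hat = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))  # sigmoid of inner products
    return Z, S_hat, mu, log_sigma

def vgae_loss(S, S_hat, mu, log_sigma, tiny=1e-10):
    """Binary cross-entropy reconstruction term plus KL divergence to N(0, I)."""
    S = np.asarray(S, dtype=float)
    bce = -np.mean(S * np.log(S_hat + tiny)
                   + (1.0 - S) * np.log(1.0 - S_hat + tiny))
    kl = -0.5 * np.mean(np.sum(1.0 + 2.0 * log_sigma - mu ** 2
                               - np.exp(2.0 * log_sigma), axis=1))
    return bce + kl

# Toy inputs (hypothetical): 3 lncRNAs, 4 diseases, 2 latent dimensions.
rng = np.random.default_rng(0)
S_l = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.0]])
X_l = np.array([[0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]], dtype=float)
W0 = rng.normal(scale=0.1, size=(4, 3))
W_mu = rng.normal(scale=0.1, size=(3, 2))
W_sigma = rng.normal(scale=0.1, size=(3, 2))
Z_l, S_hat, mu, log_sigma = vgae_forward(S_l, X_l, W0, W_mu, W_sigma, rng)
loss = vgae_loss(S_l, S_hat, mu, log_sigma)
```

In practice the three weight matrices would be learned by minimizing this loss with a gradient-based optimizer, after which the rows of the latent matrix serve as nonlinear features.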

Similarly, we characterize the initial scalar features of each disease as the corresponding column of the LDA matrix |$\boldsymbol{Y}$| and obtain the initial scalar feature matrix |$\boldsymbol{X}_d^{(0)}$| of |$m$| diseases. Taking the disease similarity matrix |${\boldsymbol{S}}_d$| and the initial scalar feature matrix |$\boldsymbol{{X}}_{d}^{(0)}$| as inputs, the disease nonlinear feature matrix |$\boldsymbol{Z}_d$| is computed through VGAE.

Feature Integration

The linear and nonlinear features of each lncRNA are concatenated as an |$a$|-dimensional vector, and the linear and nonlinear features of each disease are concatenated as a |$b$|-dimensional vector. Finally, an lncRNA–disease pair is represented as a |$k$|-dimensional vector with |$k=a+b$|.
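
The concatenation step can be sketched as follows. The row ordering of the pair matrix (pair (i, j) stored at row i*m + j) is an assumption made for this illustration, and the function name and random toy features are hypothetical.

```python
import numpy as np

def pair_features(lnc_lin, lnc_non, dis_lin, dis_non):
    """Concatenate per-node features and expand them to all lncRNA-disease pairs."""
    lnc = np.hstack([lnc_lin, lnc_non])  # n x a
    dis = np.hstack([dis_lin, dis_non])  # m x b
    n, m = lnc.shape[0], dis.shape[0]
    # Row i*m + j holds [features of lncRNA i, features of disease j].
    X = np.hstack([np.repeat(lnc, m, axis=0), np.tile(dis, (n, 1))])
    return lnc, dis, X

# Toy features (hypothetical): 3 lncRNAs, 4 diseases.
rng = np.random.default_rng(0)
lnc, dis, X = pair_features(rng.random((3, 2)), rng.random((3, 2)),
                            rng.random((4, 2)), rng.random((4, 3)))
```

Here a = 4 and b = 5, so every one of the 12 pairs is described by a 9-dimensional vector.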

LDA prediction

For a given LDA dataset |$D = (\hat{\boldsymbol{X}}, {\hat{\boldsymbol{Y}}})$| with |$p\,\, (p=n \times m)$| samples (i.e. lncRNA–disease pairs), let |$\hat{\boldsymbol{x}}_i\in \hat{\boldsymbol{X}}$| denote the |$i$|-th training sample with |$k$|-dimensional features, and |$\hat{\boldsymbol{y}}_i\in \hat{\boldsymbol{Y}}$| denote its label. |$\hat{\boldsymbol{y}}_i=1$| if the |$i$|-th lncRNA–disease pair is associated; otherwise |$\hat{\boldsymbol{y}}_i=0$|. Inspired by the heterogeneous Newton boosting machine [20, 74], we developed an LDA prediction model based on it. We first build an objective function by Eq. (14):

(14)

where |${\hat{\boldsymbol{y}}}_i$| and |$f(\hat{\boldsymbol{x}}_i)$| indicate the true label and the predicted label of |$\hat{\boldsymbol{x}}_i$|, respectively. The loss function |$l({\hat{\boldsymbol{y}}}_i,f(\hat{ \boldsymbol{x}}_i))$| is twice differentiable with respect to |$f(\hat{ \boldsymbol{x}}_i)$|, and |$l^{^{\prime}}({\hat{\boldsymbol{y}}}_i,f(\hat{ \boldsymbol{x}}_i))$| and |$l^{^{\prime\prime}}({\hat{\boldsymbol{y}}}_i,f(\hat{ \boldsymbol{x}}_i))$| represent its first and second derivatives, respectively.

At each boosting iteration, let |$\mathcal{H}^{(c)}$| represent the |$c$|-th subclass from |$C$| distinct subclasses defined by Eq. (15):

(15)

where |$\overline{\mathcal{H}}^{(c)}$| indicates a finite class with respect to |$b(\hat{ \boldsymbol{x}}_i)$|⁠: |$\mathbb{R}^{d} \to \mathbb{R}$| satisfying |${\sum \nolimits _{i = 1}^n {b({\hat{ \boldsymbol{x}}_i})} ^2} = 1$|⁠.

For the domain |$\mathcal{F}$| defined by Eq. (16):

(16)

one subclass is randomly selected to construct multiple binary decision trees. Let |$d_{min}$| and |$d_{max}$| denote the minimum and maximum depths among these decision trees; the maximum depth of each tree is drawn uniformly at random between |$d_{min}$| and |$d_{max}$|. Consequently, we obtain |$C=R_d+1$| (|${R_d} = d_{max} - d_{min} + 1$|) unique choices for the subclass, and the probability mass function |$\Phi $| is represented by Eq. (17):

(17)

At the |$k$|-th iteration, let |$O_{k}$| (|$O_k=1,2,...,C$|) denote the index of the sampled subclass; the base hypothesis is built by Eq. (18):

(18)

where |$g_i=l^{^{\prime}}({\hat{\boldsymbol{y}}}_i,f_{k-1}(\hat{ \boldsymbol{x}}_i))$| and |$h_i=l^{^{\prime\prime}}({\hat{\boldsymbol{y}}}_i,f_{k-1}(\hat{ \boldsymbol{x}}_i)) $|⁠.

Lastly, for the |$i$|-th lncRNA–disease pair |$\hat{ \boldsymbol{x}}_i$|, its association probability |$f_k(\hat{ \boldsymbol{x}}_i)$| is computed by iteratively updating the model via Eq. (19) with a learning rate |$\beta> 0$|:

(19)
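
The boosting loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the logistic loss, fits each base learner by weighted least squares to the Newton target -g_i/h_i with weights h_i, draws each tree depth uniformly from the interval between d_min and d_max, and uses only tree subclasses. The names `fit_tree`, `predict_tree` and `newton_boost` and the toy data are hypothetical.

```python
import numpy as np

def fit_tree(X, target, weight, depth):
    """Weighted least-squares regression tree, returned as a nested tuple."""
    leaf_value = float(np.sum(weight * target) / np.sum(weight))
    if depth == 0 or len(target) < 2:
        return ("leaf", leaf_value)
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:  # both children stay non-empty
            left = X[:, j] <= thr
            err = 0.0
            for mask in (left, ~left):
                w, t = weight[mask], target[mask]
                err += np.sum(w * (t - np.sum(w * t) / np.sum(w)) ** 2)
            if best is None or err < best[0]:
                best = (err, j, thr, left)
    if best is None:  # every feature is constant: no valid split
        return ("leaf", leaf_value)
    _, j, thr, left = best
    return ("node", j, thr,
            fit_tree(X[left], target[left], weight[left], depth - 1),
            fit_tree(X[~left], target[~left], weight[~left], depth - 1))

def predict_tree(tree, X):
    """Evaluate a tree produced by fit_tree on the rows of X."""
    if tree[0] == "leaf":
        return np.full(X.shape[0], tree[1])
    _, j, thr, left_tree, right_tree = tree
    out = np.empty(X.shape[0])
    left = X[:, j] <= thr
    out[left] = predict_tree(left_tree, X[left])
    out[~left] = predict_tree(right_tree, X[~left])
    return out

def newton_boost(X, y, n_rounds=20, d_min=1, d_max=3, lr=0.3, seed=0):
    """Newton boosting with trees whose depth is sampled at every round."""
    rng = np.random.default_rng(seed)
    f = np.zeros(len(y))                       # raw scores f_0 = 0
    trees = []
    for _ in range(n_rounds):
        p = 1.0 / (1.0 + np.exp(-f))
        g = p - y                              # first derivative of logistic loss
        h = np.maximum(p * (1.0 - p), 1e-6)    # second derivative, kept positive
        depth = int(rng.integers(d_min, d_max + 1))  # uniform over depths
        tree = fit_tree(X, -g / h, h, depth)   # Newton regression step
        f = f + lr * predict_tree(tree, X)     # update with learning rate lr
        trees.append(tree)
    return trees, f

# Toy data (hypothetical): the label depends only on the first feature.
X = np.array([[i / 10, (i * 7 % 10) / 10] for i in range(10)])
y = (X[:, 0] >= 0.5).astype(float)
trees, scores = newton_boost(X, y)
prob = 1.0 / (1.0 + np.exp(-scores))           # predicted association probability
```

Sampling the depth at each round is what makes the ensemble heterogeneous: shallow and deep trees are mixed rather than fixing one tree size for all iterations.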

RESULTS

Evaluation metrics and experimental setup

Precision, recall, accuracy, F1-score, area under the ROC curve (AUC) and area under the precision-recall curve (AUPR) [52] were used to compare LDA-VGHB with four classical LDA prediction models (i.e. SDLDA, LDNFSGB, IPCARF and LDASR) and four boosting algorithms (i.e. XGBoost, AdaBoost, CatBoost and LightGBM). Four different 5-fold CVs [75] were each repeated 20 times:

5-fold CV on lncRNAs (|$CV_l$|): random rows in the LDA matrix |$\boldsymbol{Y}$| were masked for testing, i.e. 80% of lncRNAs were randomly selected as the training set and the remaining 20% were used as the test set in each round.

5-fold CV on diseases (|$CV_d$|): random columns in the LDA matrix |$\boldsymbol{Y}$| were masked for testing, i.e. 80% of diseases were randomly selected as the training set and the remaining 20% were used as the test set in each round.

5-fold CV on lncRNA–disease pairs (|$CV_{ld}$|): random lncRNA–disease pairs in the LDA matrix |$\boldsymbol{Y}$| were masked for testing, i.e. 80% of lncRNA–disease pairs were randomly selected as the training set and the remaining 20% were used as the test set in each round.

5-fold CV on independent lncRNAs and independent diseases (⁠|$CV_{ind}$|⁠): First, 20% of lncRNAs and 20% of diseases were randomly selected to construct a ‘node test set’. Next, the remaining lncRNAs and diseases were taken as a ‘node train set’. Third, all edges linking a node in the ‘node train set’ with a node in the ‘node test set’ were removed. Finally, one learner was trained only on the ‘node train set’ to find potential LDAs within the ‘node test set’.

The above four CVs correspond to association identification for (1) new lncRNAs without any associated disease, (2) new diseases without any associated lncRNA, (3) new lncRNA–disease pairs and (4) new independent lncRNAs and independent diseases, respectively. The average over the 20 repetitions is reported as the final performance.
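
The four splitting schemes can be sketched as boolean masks over the association matrix. This is an illustrative sketch only: the function names, the fold construction by random permutation and the rounding of the 20% fraction are assumptions, not details given in the text.

```python
import numpy as np

def kfold_indices(n_items, n_folds, rng):
    """Randomly partition range(n_items) into n_folds index arrays."""
    return np.array_split(rng.permutation(n_items), n_folds)

def cv_test_masks(shape, mode, n_folds=5, seed=0):
    """Yield one boolean test mask over Y per fold for row, column or pair CV."""
    n, m = shape
    rng = np.random.default_rng(seed)
    if mode == "lncRNA":        # mask whole rows
        for rows in kfold_indices(n, n_folds, rng):
            mask = np.zeros(shape, dtype=bool)
            mask[rows, :] = True
            yield mask
    elif mode == "disease":     # mask whole columns
        for cols in kfold_indices(m, n_folds, rng):
            mask = np.zeros(shape, dtype=bool)
            mask[:, cols] = True
            yield mask
    elif mode == "pair":        # mask individual entries
        for flat in kfold_indices(n * m, n_folds, rng):
            mask = np.zeros(n * m, dtype=bool)
            mask[flat] = True
            yield mask.reshape(shape)
    else:
        raise ValueError("mode must be 'lncRNA', 'disease' or 'pair'")

def independent_split(shape, test_fraction=0.2, seed=0):
    """Train/test masks for held-out lncRNAs and diseases.

    Entries linking a training node with a test node belong to neither mask.
    """
    n, m = shape
    rng = np.random.default_rng(seed)
    test_rows = rng.choice(n, size=max(1, round(test_fraction * n)), replace=False)
    test_cols = rng.choice(m, size=max(1, round(test_fraction * m)), replace=False)
    row_is_test = np.zeros(n, dtype=bool)
    col_is_test = np.zeros(m, dtype=bool)
    row_is_test[test_rows] = True
    col_is_test[test_cols] = True
    test_mask = np.outer(row_is_test, col_is_test)
    train_mask = np.outer(~row_is_test, ~col_is_test)
    return train_mask, test_mask

# Toy usage on a 10 x 6 association matrix (hypothetical size).
row_masks = list(cv_test_masks((10, 6), "lncRNA"))
train_mask, test_mask = independent_split((10, 6))
```

In the row and column schemes every pair involving a held-out node is hidden during training, which is what makes them tests of prediction for new lncRNAs or new diseases.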

Baseline methods

LDA prediction models: SDLDA [76] extracts linear and nonlinear features for lncRNAs and diseases by combining SVD and deep learning and then uses a fully connected layer with the sigmoid function to classify unknown lncRNA–disease pairs. LDNFSGB [77] first extracts the global and local features of lncRNAs and diseases, then uses an autoencoder to reduce the feature dimensions, and finally implements LDA prediction through the gradient boosting algorithm. IPCARF [78] uses an incremental principal component analysis method to select LDA features and a random forest to predict potential LDAs. LDASR [79] employs an autoencoder to obtain the optimal lncRNA and disease features and uses rotation forest to predict new LDAs. SDLDA [76], LDNFSGB [77], IPCARF [78] and LDASR [79] are state-of-the-art LDA prediction methods.

Boosting algorithms: XGBoost [20, 80] is an extreme gradient boosting model. AdaBoost [81] shows good generalization ability and low computational complexity. CatBoost [82] is a categorical boosting algorithm. LightGBM [23, 83] integrates gradient-based one-side sampling and exclusive feature bundling into gradient boosting decision trees. XGBoost [80], AdaBoost [81], CatBoost [82] and LightGBM [83] are powerful boosting models and achieve good predictions in diverse practical tasks.

Performance comparison

To evaluate the performance of LDA-VGHB, we compared it with four classical LDA prediction methods (i.e. SDLDA [76], LDNFSGB [77], IPCARF [78] and LDASR [79]) and four popular boosting models (i.e. XGBoost [80], AdaBoost [81], CatBoost [82] and LightGBM [83]). We randomly selected negative LDAs from the unlabeled lncRNA–disease pairs, equal in number to the known positive LDAs. Tables 2–5 show the performance of LDA-VGHB, SDLDA, LDNFSGB, LDASR and IPCARF on the lncRNADisease and MNDR databases under the four different 5-fold CVs. Figure 2 depicts their receiver operating characteristic (ROC) and precision-recall (PR) curves under the four 5-fold CVs. In addition, Tables S1–S4 in Supplementary Materials give the results under the four different 10-fold CVs.
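
The balanced negative sampling mentioned above can be sketched as follows. The sketch assumes uniform sampling without replacement from the unlabeled entries of the association matrix; the function name and the toy matrix are hypothetical.

```python
import numpy as np

def sample_balanced_pairs(Y, seed=0):
    """Return index pairs and labels with as many sampled negatives as positives."""
    Y = np.asarray(Y)
    rng = np.random.default_rng(seed)
    positives = np.argwhere(Y == 1)   # known associations
    unlabeled = np.argwhere(Y == 0)   # candidate negatives
    if len(unlabeled) < len(positives):
        raise ValueError("not enough unlabeled pairs to balance the classes")
    chosen = rng.choice(len(unlabeled), size=len(positives), replace=False)
    negatives = unlabeled[chosen]
    pairs = np.vstack([positives, negatives])
    labels = np.concatenate([np.ones(len(positives), dtype=int),
                             np.zeros(len(negatives), dtype=int)])
    return pairs, labels

# Toy association matrix (hypothetical): 3 positives among 12 pairs.
Y = np.array([[0, 1, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1]])
pairs, labels = sample_balanced_pairs(Y, seed=0)
```

Because unlabeled pairs may contain undiscovered associations, the sampled negatives are only presumed negatives, and results can vary with the random seed.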

Table 2

The performance comparison of five LDA prediction methods under 5-fold |$CV_l$|

Metric     Dataset        SDLDA            LDNFSGB          IPCARF           LDASR            LDA-VGHB
Precision  lncRNADisease  0.8514 ± 0.0509  0.7004 ± 0.0639  0.4878 ± 0.1309  0.6726 ± 0.1200  0.8741 ± 0.0484
Precision  MNDR           0.9399 ± 0.0154  0.8552 ± 0.0393  0.6615 ± 0.0966  0.8405 ± 0.0300  0.9250 ± 0.0201
Recall     lncRNADisease  0.6521 ± 0.0732  0.6092 ± 0.0790  0.5721 ± 0.1580  0.5129 ± 0.0946  0.7180 ± 0.0713
Recall     MNDR           0.8239 ± 0.0437  0.8021 ± 0.0498  0.6434 ± 0.1545  0.7358 ± 0.0562  0.8602 ± 0.0395
Accuracy   lncRNADisease  0.7799 ± 0.0341  0.6769 ± 0.0423  0.4906 ± 0.0951  0.6417 ± 0.0597  0.8123 ± 0.0384
Accuracy   MNDR           0.8857 ± 0.0283  0.8323 ± 0.0230  0.6526 ± 0.0775  0.7972 ± 0.0268  0.8947 ± 0.0258
F1-score   lncRNADisease  0.7365 ± 0.0563  0.6462 ± 0.0451  0.5125 ± 0.1100  0.5668 ± 0.0536  0.7852 ± 0.0412
F1-score   MNDR           0.8775 ± 0.0278  0.8260 ± 0.0230  0.6401 ± 0.1017  0.7827 ± 0.0260  0.8908 ± 0.0227
AUC        lncRNADisease  0.8023 ± 0.0477  0.7346 ± 0.0465  0.5096 ± 0.1432  0.7057 ± 0.0420  0.8814 ± 0.0425
AUC        MNDR           0.9366 ± 0.0195  0.8839 ± 0.0270  0.7104 ± 0.0997  0.8641 ± 0.0256  0.9541 ± 0.0200
AUPR       lncRNADisease  0.8461 ± 0.0553  0.7239 ± 0.0626  0.5336 ± 0.1423  0.6775 ± 0.0971  0.8949 ± 0.0322
AUPR       MNDR           0.9533 ± 0.0129  0.8832 ± 0.0307  0.7128 ± 0.1012  0.8671 ± 0.0252  0.9617 ± 0.0131

Table 3

The performance comparison of five LDA prediction methods under 5-fold |$CV_d$|

DatasetSDLDALDNFSGBIPCARFLDASRLDA-VGHB
| Metric | Dataset | SDLDA | LDNFSGB | IPCARF | LDASR | LDA-VGHB |
| --- | --- | --- | --- | --- | --- | --- |
| Precision | lncRNADisease | 0.8854 ± 0.0377 | 0.7548 ± 0.0639 | 0.5583 ± 0.0910 | 0.7462 ± 0.0613 | 0.8917 ± 0.0316 |
| Precision | MNDR | 0.9232 ± 0.0331 | 0.8005 ± 0.0625 | 0.5557 ± 0.1473 | 0.7625 ± 0.0749 | 0.9300 ± 0.0251 |
| Recall | lncRNADisease | 0.7182 ± 0.0694 | 0.7309 ± 0.0646 | 0.7538 ± 0.1067 | 0.6431 ± 0.0757 | 0.8415 ± 0.0449 |
| Recall | MNDR | 0.8579 ± 0.0655 | 0.6936 ± 0.0794 | 0.5279 ± 0.1969 | 0.5758 ± 0.0894 | 0.9190 ± 0.0397 |
| Accuracy | lncRNADisease | 0.8187 ± 0.0282 | 0.7552 ± 0.0291 | 0.5766 ± 0.0740 | 0.7165 ± 0.0339 | 0.8737 ± 0.0177 |
| Accuracy | MNDR | 0.9043 ± 0.0174 | 0.7670 ± 0.0432 | 0.5593 ± 0.1159 | 0.7010 ± 0.0463 | 0.9305 ± 0.0153 |
| F1-score | lncRNADisease | 0.7917 ± 0.0519 | 0.7407 ± 0.0526 | 0.6339 ± 0.0715 | 0.6873 ± 0.0512 | 0.8651 ± 0.0304 |
| F1-score | MNDR | 0.8886 ± 0.0475 | 0.7402 ± 0.0577 | 0.5190 ± 0.1434 | 0.6485 ± 0.0555 | 0.9242 ± 0.0298 |
| AUC | lncRNADisease | 0.8788 ± 0.0274 | 0.8329 ± 0.0273 | 0.6402 ± 0.1004 | 0.7951 ± 0.0317 | 0.9406 ± 0.0154 |
| AUC | MNDR | 0.9559 ± 0.0160 | 0.8603 ± 0.0363 | 0.5992 ± 0.1601 | 0.8045 ± 0.0362 | 0.9741 ± 0.0106 |
| AUPR | lncRNADisease | 0.8934 ± 0.0387 | 0.8163 ± 0.0537 | 0.6355 ± 0.1217 | 0.7914 ± 0.0542 | 0.9429 ± 0.0233 |
| AUPR | MNDR | 0.9561 ± 0.0354 | 0.8292 ± 0.0680 | 0.6040 ± 0.1476 | 0.7630 ± 0.0717 | 0.9728 ± 0.0204 |

Table 4

The performance comparison of five LDA prediction methods under 5-fold |$CV_{ld}$|

| Metric | Dataset | SDLDA | LDNFSGB | IPCARF | LDASR | LDA-VGHB |
| --- | --- | --- | --- | --- | --- | --- |
| Precision | lncRNADisease | 0.8782 ± 0.0306 | 0.7782 ± 0.0270 | 0.7069 ± 0.0478 | 0.7695 ± 0.0393 | 0.8597 ± 0.0269 |
| Precision | MNDR | 0.9178 ± 0.0154 | 0.8548 ± 0.0156 | 0.7693 ± 0.0850 | 0.8553 ± 0.0189 | 0.9270 ± 0.0143 |
| Recall | lncRNADisease | 0.7256 ± 0.0376 | 0.8169 ± 0.0408 | 0.6155 ± 0.0652 | 0.6836 ± 0.0342 | 0.8388 ± 0.0332 |
| Recall | MNDR | 0.8824 ± 0.0198 | 0.8818 ± 0.0204 | 0.5034 ± 0.1469 | 0.8204 ± 0.0238 | 0.9088 ± 0.0169 |
| Accuracy | lncRNADisease | 0.8120 ± 0.0216 | 0.7916 ± 0.0256 | 0.6793 ± 0.0403 | 0.7385 ± 0.0283 | 0.8504 ± 0.0189 |
| Accuracy | MNDR | 0.9015 ± 0.0114 | 0.8658 ± 0.0127 | 0.6793 ± 0.0753 | 0.8405 ± 0.0129 | 0.9185 ± 0.0110 |
| F1-score | lncRNADisease | 0.7939 ± 0.0260 | 0.7965 ± 0.0262 | 0.6563 ± 0.0492 | 0.7233 ± 0.0289 | 0.8485 ± 0.0198 |
| F1-score | MNDR | 0.8996 ± 0.0119 | 0.8679 ± 0.0129 | 0.5995 ± 0.1312 | 0.8371 ± 0.0137 | 0.9177 ± 0.0112 |
| AUC | lncRNADisease | 0.8774 ± 0.0200 | 0.8578 ± 0.0234 | 0.7384 ± 0.0466 | 0.8133 ± 0.0218 | 0.9271 ± 0.0144 |
| AUC | MNDR | 0.9560 ± 0.0081 | 0.9346 ± 0.0074 | 0.7680 ± 0.0882 | 0.9143 ± 0.0112 | 0.9722 ± 0.0056 |
| AUPR | lncRNADisease | 0.8952 ± 0.0177 | 0.8489 ± 0.0289 | 0.7409 ± 0.0515 | 0.8131 ± 0.0277 | 0.9364 ± 0.0157 |
| AUPR | MNDR | 0.9639 ± 0.0063 | 0.9273 ± 0.0098 | 0.7689 ± 0.0924 | 0.9100 ± 0.0136 | 0.9761 ± 0.0051 |

Table 5

The performance comparison of five LDA prediction methods under 5-fold |$CV_{ind}$|

| Metric | Dataset | SDLDA | LDNFSGB | IPCARF | LDASR | LDA-VGHB |
| --- | --- | --- | --- | --- | --- | --- |
| Precision | lncRNADisease | 0.8185 ± 0.0923 | 0.6743 ± 0.117 | 0.4995 ± 0.0998 | 0.6747 ± 0.1091 | 0.8958 ± 0.0744 |
| Precision | MNDR | 0.9314 ± 0.0441 | 0.7690 ± 0.1019 | 0.5101 ± 0.1187 | 0.7529 ± 0.0753 | 0.9216 ± 0.0453 |
| Recall | lncRNADisease | 0.6348 ± 0.1593 | 0.4921 ± 0.1329 | 0.6623 ± 0.1743 | 0.4112 ± 0.1314 | 0.7214 ± 0.1518 |
| Recall | MNDR | 0.8073 ± 0.1106 | 0.5685 ± 0.1274 | 0.5610 ± 0.1941 | 0.5030 ± 0.1222 | 0.8346 ± 0.0834 |
| Accuracy | lncRNADisease | 0.7422 ± 0.0746 | 0.6242 ± 0.0812 | 0.5029 ± 0.1254 | 0.6077 ± 0.0748 | 0.8150 ± 0.0744 |
| Accuracy | MNDR | 0.8731 ± 0.0553 | 0.7007 ± 0.0764 | 0.5197 ± 0.1214 | 0.6683 ± 0.0615 | 0.8799 ± 0.0374 |
| F1-score | lncRNADisease | 0.7001 ± 0.1217 | 0.5599 ± 0.1171 | 0.5664 ± 0.1221 | 0.5034 ± 0.1177 | 0.7869 ± 0.1092 |
| F1-score | MNDR | 0.8600 ± 0.0744 | 0.6482 ± 0.1192 | 0.5286 ± 0.1452 | 0.5958 ± 0.0946 | 0.8723 ± 0.0463 |
| AUC | lncRNADisease | 0.7749 ± 0.1115 | 0.6836 ± 0.0899 | 0.5159 ± 0.1679 | 0.6642 ± 0.0862 | 0.8924 ± 0.0666 |
| AUC | MNDR | 0.9247 ± 0.0419 | 0.7851 ± 0.0756 | 0.5289 ± 0.1616 | 0.7638 ± 0.0745 | 0.9576 ± 0.0218 |
| AUPR | lncRNADisease | 0.8285 ± 0.0834 | 0.6928 ± 0.0901 | 0.5490 ± 0.1283 | 0.6603 ± 0.0955 | 0.9056 ± 0.0579 |
| AUPR | MNDR | 0.9431 ± 0.0333 | 0.7593 ± 0.0898 | 0.5365 ± 0.1225 | 0.7430 ± 0.0711 | 0.9621 ± 0.0194 |

Figure 2

The ROC and PR curves of LDA-VGHB and the other four LDA prediction methods. A-B and C-D, E-F and G-H, I-J and K-L, and M-N and O-P show the ROC and PR curves of the five methods on the lncRNADisease and MNDR databases under 5-fold |$CV_{l}$|, |$CV_{d}$|, |$CV_{ld}$| and |$CV_{ind}$|, respectively.

To assess how well LDA-VGHB and the other four LDA prediction methods predict potential diseases for a new lncRNA, we performed 5-fold CV on lncRNAs. Under 5-fold CV on lncRNAs (|$CV_l$|), all five LDA prediction methods randomly selected 80% of lncRNAs as the training set and used the remaining 20% as the test set. As shown in Table 2 and Figure 2, LDA-VGHB obtained the best performance on the lncRNADisease and MNDR databases, followed by SDLDA, LDNFSGB, LDASR and IPCARF. In particular, LDA-VGHB achieved the best AUCs of 0.8814 and 0.9541 on the two datasets, 8.97% and 1.83% higher than those of SDLDA, respectively. It also obtained the best AUPRs of 0.8949 and 0.9617, 5.45% and 0.87% higher than those of SDLDA, respectively. In general, LDA-VGHB efficiently found potential associated diseases for a new lncRNA.
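
The entity-wise split described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `entity_kfold`, its `axis` parameter and the toy identifiers are all assumptions made for the example. It partitions lncRNAs into five groups and, for each fold, puts every pair that involves a held-out lncRNA into the test set.

```python
import random

def entity_kfold(pairs, axis, k=5, seed=0):
    """Yield (train_pairs, test_pairs) so that held-out entities never occur in training.

    pairs: list of (lncRNA, disease) tuples
    axis:  0 holds out lncRNAs; 1 would hold out diseases instead
    """
    entities = sorted({p[axis] for p in pairs})
    random.Random(seed).shuffle(entities)
    folds = [entities[i::k] for i in range(k)]  # k near-equal groups of entities
    for held_out in folds:
        held = set(held_out)
        train = [p for p in pairs if p[axis] not in held]
        test = [p for p in pairs if p[axis] in held]
        yield train, test

# Toy data with hypothetical identifiers (not taken from either database)
pairs = [(f"lnc{i}", f"dis{j}") for i in range(10) for j in range(3)]
for train, test in entity_kfold(pairs, axis=0):
    train_lncs = {l for l, _ in train}
    test_lncs = {l for l, _ in test}
    print(len(train_lncs), len(test_lncs), train_lncs.isdisjoint(test_lncs))
```

With ten toy lncRNAs, each fold holds out two of them (20%) and trains on the other eight, so no test lncRNA is seen during training.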

To evaluate how well LDA-VGHB and the other four LDA prediction methods predict potential lncRNAs for a new disease, we performed 5-fold CV on diseases. Under 5-fold CV on diseases (|$CV_d$|), all five LDA prediction methods randomly selected 80% of diseases as the training set and used the remaining 20% as the test set. As shown in Table 3 and Figure 2, LDA-VGHB outperformed SDLDA, LDNFSGB, LDASR and IPCARF on every metric on the two LDA datasets. For example, LDA-VGHB achieved the highest AUCs of 0.9406 and 0.9741 on the two datasets, 6.57% and 1.87% higher than those of SDLDA, respectively. It also achieved the best AUPRs of 0.9429 and 0.9728, 5.25% and 1.72% higher than those of SDLDA, respectively. These results suggest that LDA-VGHB can accurately predict possible lncRNAs for a new disease.
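
The percentages quoted above are consistent with a gain measured relative to LDA-VGHB's own score, i.e. (ours - baseline) / ours. This is inferred from the Table 3 values rather than stated in the text, so the short check below should be read as an assumption; `relative_gain` is a hypothetical helper name.

```python
def relative_gain(ours, baseline):
    """Improvement of `ours` over `baseline`, as a percentage of `ours`."""
    return 100.0 * (ours - baseline) / ours

# Mean AUC and AUPR under CV_d, copied from Table 3 as (LDA-VGHB, SDLDA)
table3 = {
    ("AUC", "lncRNADisease"): (0.9406, 0.8788),
    ("AUC", "MNDR"): (0.9741, 0.9559),
    ("AUPR", "lncRNADisease"): (0.9429, 0.8934),
    ("AUPR", "MNDR"): (0.9728, 0.9561),
}
for (metric, dataset), (ours, sdlda) in table3.items():
    print(f"{metric} {dataset}: {relative_gain(ours, sdlda):.2f}%")
```

Under this definition the four values come out as 6.57%, 1.87%, 5.25% and 1.72%, matching the figures in the paragraph.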

To assess how well LDA-VGHB and the other four LDA prediction methods predict potential LDAs for lncRNA–disease pairs, we performed 5-fold CV on lncRNA–disease pairs. Under 5-fold CV on lncRNA–disease pairs (|$CV_{ld}$|), all five methods randomly selected 80% of lncRNA–disease pairs as the training set and used the remaining 20% as the test set. As shown in Table 4 and Figure 2, LDA-VGHB improved LDA identification over SDLDA, LDNFSGB, LDASR and IPCARF under the majority of conditions. It achieved AUC values of 0.9271 and 0.9722 on the two datasets, 5.36% and 1.67% higher than those of the second-best method, respectively. It also achieved AUPR values of 0.9364 and 0.9761, 4.40% and 1.25% higher than those of the second-best method, respectively. Consequently, LDA-VGHB more accurately predicted possible LDAs based on known LDAs.
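
For comparison with the entity-wise settings, a pair-wise split shuffles the lncRNA–disease pairs themselves, so the same lncRNA or disease may appear in both training and test data. The sketch below is illustrative only: `pair_kfold` and the toy pairs are assumptions, and it does not cover how negative samples are chosen.

```python
import random

def pair_kfold(pairs, k=5, seed=0):
    """Yield (train_pairs, test_pairs); every pair is tested exactly once."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k near-equal groups of pairs
    for i, test in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, test

# Toy data with hypothetical identifiers
pairs = [(f"lnc{i}", f"dis{j}") for i in range(10) for j in range(5)]
for train, test in pair_kfold(pairs):
    print(len(train), len(test))
```

Each fold here trains on 40 of the 50 toy pairs and tests on the remaining 10.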

Lastly, to evaluate how well LDA-VGHB and the other four LDA prediction methods predict potential LDAs for independent lncRNAs and independent diseases, we performed 5-fold CV on independent lncRNAs and independent diseases (|$CV_{ind}$|). First, all five LDA prediction methods randomly selected 20% of lncRNAs and 20% of diseases to construct a ‘node test set’. Next, they took the remaining lncRNAs and diseases as a ‘node train set’ and removed all edges linking a node in the ‘node train set’ with a node in the ‘node test set’. Finally, the five methods were trained only on the ‘node train set’ and evaluated on the ‘node test set’. As shown in Table 5 and Figure 2, LDA-VGHB achieved the best recall, accuracy, F1-score, AUC and AUPR on the two LDA datasets. It achieved the highest AUCs of 0.8924 and 0.9576, 13.17% and 3.44% higher than those of SDLDA, respectively. It also achieved the best AUPRs of 0.9056 and 0.9621, 8.51% and 1.97% higher than those of SDLDA, respectively. The results show that LDA-VGHB delivered the best LDA prediction performance on independent data.
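
The three steps above (choose test nodes, keep the rest as training nodes, drop the edges that cross between the two sets) can be sketched for a single split as follows. The function name `independent_split`, the rounding of the 20% fraction and the toy data are assumptions for illustration, not details taken from the paper.

```python
import random

def independent_split(pairs, test_frac=0.2, seed=0):
    """Split pairs into node-train and node-test sets that share no nodes."""
    rng = random.Random(seed)
    lncs = sorted({l for l, _ in pairs})
    diss = sorted({d for _, d in pairs})
    # Step 1: sample the lncRNAs and diseases that form the node test set
    test_l = set(rng.sample(lncs, max(1, round(test_frac * len(lncs)))))
    test_d = set(rng.sample(diss, max(1, round(test_frac * len(diss)))))
    train, test, dropped = [], [], []
    for l, d in pairs:
        if l in test_l and d in test_d:
            test.append((l, d))      # both endpoints are test nodes
        elif l not in test_l and d not in test_d:
            train.append((l, d))     # both endpoints are training nodes
        else:
            dropped.append((l, d))   # edge crosses the two sets, so it is removed
    return train, test, dropped

# Toy data with hypothetical identifiers: 10 lncRNAs x 5 diseases
pairs = [(f"lnc{i}", f"dis{j}") for i in range(10) for j in range(5)]
train, test, dropped = independent_split(pairs)
print(len(train), len(test), len(dropped))
```

On this toy grid, 2 lncRNAs and 1 disease become test nodes, leaving 32 training pairs, 2 test pairs and 16 removed cross edges. The large share of removed edges suggests why this setting is the hardest of the four.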

Furthermore, boosting is one of the most popular ensemble learning tools and significantly improves classification performance [84, 85]. To evaluate the LDA classification performance of various boosting models, we compared LDA-VGHB with four other popular boosting algorithms, i.e. XGBoost [86], AdaBoost [81], CatBoost [82] and LightGBM [87], under the four different CVs. The four boosting algorithms used the same similarity computation and feature extraction procedures as LDA-VGHB and differed only in the boosting model used to classify unknown lncRNA–disease pairs. Each experiment was repeated 20 times. Tables 6–9 show their LDA prediction performance under 5-fold CVs on lncRNAs, diseases, lncRNA–disease pairs, and independent lncRNAs and independent diseases, respectively. The results demonstrate that LDA-VGHB achieved the best LDA identification accuracy on the two LDA databases under the four CVs in the majority of conditions, thereby illustrating the strong LDA classification performance of the heterogeneous Newton boosting machine.
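
Because every experiment is repeated, each table entry is reported as a mean with a spread. A minimal sketch of that aggregation is given below. Whether the paper uses the sample or the population standard deviation, and whether it pools folds or runs, is not stated, so the choice of `statistics.stdev` and the simulated scores are assumptions.

```python
import random
import statistics

def summarize(scores):
    """Format repeated-run scores as 'mean ± std' with four decimals."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample standard deviation (assumed choice)
    return f"{mean:.4f} ± {std:.4f}"

# Hypothetical AUC values standing in for 20 repeated runs
rng = random.Random(0)
auc_runs = [0.90 + rng.uniform(-0.02, 0.02) for _ in range(20)]
print(summarize(auc_runs))
```

Swapping in `statistics.pstdev` would give the population standard deviation instead; with 20 runs the two differ only slightly.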

Table 6

The LDA prediction performance comparison of five boosting models under |$CV_{l}$|

| Metric | Dataset | XGBoost | AdaBoost | CatBoost | LightGBM | LDA-VGHB |
| --- | --- | --- | --- | --- | --- | --- |
| Precision | lncRNADisease | 0.8410 ± 0.0576 | 0.7497 ± 0.0604 | 0.8592 ± 0.0534 | 0.8245 ± 0.0553 | 0.8741 ± 0.0484 |
| Precision | MNDR | 0.9248 ± 0.0243 | 0.8835 ± 0.0312 | 0.9161 ± 0.0288 | 0.9255 ± 0.0205 | 0.9250 ± 0.0201 |
| Recall | lncRNADisease | 0.7300 ± 0.0778 | 0.7898 ± 0.1048 | 0.6876 ± 0.0835 | 0.7013 ± 0.0814 | 0.7180 ± 0.7180 |
| Recall | MNDR | 0.8451 ± 0.0417 | 0.8244 ± 0.0725 | 0.8440 ± 0.0548 | 0.8479 ± 0.0377 | 0.8602 ± 0.0395 |
| Accuracy | lncRNADisease | 0.8034 ± 0.0413 | 0.7747 ± 0.0357 | 0.7969 ± 0.0420 | 0.7839 ± 0.0409 | 0.8123 ± 0.0384 |
| Accuracy | MNDR | 0.8876 ± 0.0305 | 0.8567 ± 0.0378 | 0.8832 ± 0.0320 | 0.8899 ± 0.0267 | 0.8947 ± 0.0258 |
| F1-score | lncRNADisease | 0.7788 ± 0.0540 | 0.7664 ± 0.0757 | 0.7609 ± 0.0582 | 0.7540 ± 0.0506 | 0.7852 ± 0.0412 |
| F1-score | MNDR | 0.8826 ± 0.0282 | 0.8505 ± 0.0371 | 0.8774 ± 0.0324 | 0.8845 ± 0.0237 | 0.8908 ± 0.0227 |
| AUC | lncRNADisease | 0.8785 ± 0.0337 | 0.8373 ± 0.0426 | 0.8831 ± 0.0275 | 0.8466 ± 0.0397 | 0.8814 ± 0.0425 |
| AUC | MNDR | 0.9527 ± 0.0207 | 0.9095 ± 0.0369 | 0.9601 ± 0.0123 | 0.9542 ± 0.0191 | 0.9541 ± 0.0200 |
| AUPR | lncRNADisease | 0.8720 ± 0.0441 | 0.8595 ± 0.0667 | 0.8890 ± 0.0516 | 0.8322 ± 0.0520 | 0.8949 ± 0.0322 |
| AUPR | MNDR | 0.9604 ± 0.0144 | 0.9310 ± 0.0231 | 0.9657 ± 0.0114 | 0.9562 ± 0.0322 | 0.9617 ± 0.0131 |

Table 7

The LDA prediction performance comparison of five boosting models under |$CV_{d}$|

| Metric | Dataset | XGBoost | AdaBoost | CatBoost | LightGBM | LDA-VGHB |
| --- | --- | --- | --- | --- | --- | --- |
| Precision | lncRNADisease | 0.8687 ± 0.0383 | 0.7471 ± 0.0485 | 0.8813 ± 0.0366 | 0.8786 ± 0.0455 | 0.8917 ± 0.0316 |
| Precision | MNDR | 0.9220 ± 0.0318 | 0.8734 ± 0.0523 | 0.9153 ± 0.0280 | 0.9157 ± 0.0385 | 0.9300 ± 0.0251 |
| Recall | lncRNADisease | 0.8027 ± 0.0515 | 0.8292 ± 0.0625 | 0.7700 ± 0.0762 | 0.8071 ± 0.0478 | 0.8415 ± 0.0449 |
| Recall | MNDR | 0.8930 ± 0.0400 | 0.8001 ± 0.0855 | 0.9052 ± 0.0361 | 0.8890 ± 0.0476 | 0.9190 ± 0.0397 |
| Accuracy | lncRNADisease | 0.8446 ± 0.0232 | 0.7791 ± 0.0324 | 0.8393 ± 0.0251 | 0.8518 ± 0.0231 | 0.8737 ± 0.0177 |
| Accuracy | MNDR | 0.9154 ± 0.0162 | 0.8552 ± 0.0233 | 0.9155 ± 0.0157 | 0.9124 ± 0.0181 | 0.9305 ± 0.0153 |
| F1-score | lncRNADisease | 0.8334 ± 0.0370 | 0.7848 ± 0.0473 | 0.8198 ± 0.0510 | 0.8403 ± 0.0370 | 0.8651 ± 0.0304 |
| F1-score | MNDR | 0.9070 ± 0.0329 | 0.8332 ± 0.0654 | 0.9099 ± 0.0280 | 0.9019 ± 0.0406 | 0.9242 ± 0.0298 |
| AUC | lncRNADisease | 0.9075 ± 0.0214 | 0.8440 ± 0.0379 | 0.9148 ± 0.0221 | 0.9118 ± 0.0234 | 0.9406 ± 0.0154 |
| AUC | MNDR | 0.9663 ± 0.0108 | 0.8915 ± 0.0329 | 0.9671 ± 0.0108 | 0.9651 ± 0.0132 | 0.9741 ± 0.0106 |
| AUPR | lncRNADisease | 0.9063 ± 0.0316 | 0.8704 ± 0.0471 | 0.9243 ± 0.0319 | 0.9125 ± 0.0383 | 0.9429 ± 0.0233 |
| AUPR | MNDR | 0.9639 ± 0.0254 | 0.9051 ± 0.0562 | 0.9684 ± 0.0186 | 0.9615 ± 0.0347 | 0.9728 ± 0.0204 |

Table 8

The LDA prediction performance comparison of five boosting models under |$CV_{ld}$|

| Metric | Dataset | XGBoost | AdaBoost | CatBoost | LightGBM | LDA-VGHB |
| --- | --- | --- | --- | --- | --- | --- |
| Precision | lncRNADisease | 0.8852 $\pm$ 0.0241 | 0.7751 $\pm$ 0.0261 | 0.8758 $\pm$ 0.0323 | 0.8655 $\pm$ 0.0288 | 0.8597 $\pm$ 0.0269 |
| Precision | MNDR | 0.9192 $\pm$ 0.0127 | 0.8955 $\pm$ 0.0178 | 0.9079 $\pm$ 0.0145 | 0.9258 $\pm$ 0.0152 | 0.9270 $\pm$ 0.0143 |
| Recall | lncRNADisease | 0.8407 $\pm$ 0.0340 | 0.8297 $\pm$ 0.0286 | 0.8070 $\pm$ 0.0374 | 0.8303 $\pm$ 0.8303 | 0.8388 $\pm$ 0.0332 |
| Recall | MNDR | 0.9060 $\pm$ 0.0167 | 0.8113 $\pm$ 0.0242 | 0.9137 $\pm$ 0.0165 | 0.8993 $\pm$ 0.0200 | 0.9088 $\pm$ 0.0169 |
| Accuracy | lncRNADisease | 0.8655 $\pm$ 0.0199 | 0.7940 $\pm$ 0.0222 | 0.8458 $\pm$ 0.0258 | 0.8501 $\pm$ 0.0208 | 0.8123 $\pm$ 0.0384 |
| Accuracy | MNDR | 0.9131 $\pm$ 0.0102 | 0.8581 $\pm$ 0.0119 | 0.9104 $\pm$ 0.0115 | 0.9134 $\pm$ 0.0111 | 0.9185 $\pm$ 0.0110 |
| F1-score | lncRNADisease | 0.8619 $\pm$ 0.0215 | 0.8011 $\pm$ 0.0211 | 0.8394 $\pm$ 0.0274 | 0.8470 $\pm$ 0.0214 | 0.8485 $\pm$ 0.0198 |
| F1-score | MNDR | 0.9124 $\pm$ 0.0105 | 0.8510 $\pm$ 0.0135 | 0.9106 $\pm$ 0.0115 | 0.9121 $\pm$ 0.0116 | 0.9177 $\pm$ 0.0112 |
| AUC | lncRNADisease | 0.9182 $\pm$ 0.0175 | 0.8542 $\pm$ 0.0178 | 0.9195 $\pm$ 0.0175 | 0.9154 $\pm$ 0.0147 | 0.9271 $\pm$ 0.0144 |
| AUC | MNDR | 0.9661 $\pm$ 0.0070 | 0.9038 $\pm$ 0.0130 | 0.9665 $\pm$ 0.0068 | 0.9716 $\pm$ 0.0058 | 0.9722 $\pm$ 0.0056 |
| AUPR | lncRNADisease | 0.9186 $\pm$ 0.0192 | 0.8824 $\pm$ 0.0155 | 0.9301 $\pm$ 0.0152 | 0.9146 $\pm$ 0.0225 | 0.9364 $\pm$ 0.0157 |
| AUPR | MNDR | 0.9690 $\pm$ 0.0067 | 0.9255 $\pm$ 0.0094 | 0.9701 $\pm$ 0.0061 | 0.9742 $\pm$ 0.0138 | 0.9761 $\pm$ 0.0051 |

Table 9

The LDA prediction performance comparison of five boosting models under |$CV_{ind}$|

| Metric | Dataset | XGBoost | AdaBoost | CatBoost | LightGBM | LDA-VGHB |
| --- | --- | --- | --- | --- | --- | --- |
| Precision | lncRNADisease | 0.8531 $\pm$ 0.0849 | 0.7961 $\pm$ 0.0945 | 0.8797 $\pm$ 0.0862 | 0.8555 $\pm$ 0.0812 | 0.8958 $\pm$ 0.0744 |
| Precision | MNDR | 0.9187 $\pm$ 0.0445 | 0.9081 $\pm$ 0.0560 | 0.9108 $\pm$ 0.0504 | 0.9151 $\pm$ 0.0495 | 0.9216 $\pm$ 0.0453 |
| Recall | lncRNADisease | 0.6894 $\pm$ 0.1520 | 0.7625 $\pm$ 0.1592 | 0.6482 $\pm$ 0.1872 | 0.7040 $\pm$ 0.1518 | 0.7214 $\pm$ 0.1518 |
| Recall | MNDR | 0.8234 $\pm$ 0.0920 | 0.7812 $\pm$ 0.1339 | 0.8259 $\pm$ 0.0849 | 0.8141 $\pm$ 0.1288 | 0.8346 $\pm$ 0.0834 |
| Accuracy | lncRNADisease | 0.7812 $\pm$ 0.0773 | 0.7749 $\pm$ 0.0707 | 0.7762 $\pm$ 0.0880 | 0.7872 $\pm$ 0.0662 | 0.8150 $\pm$ 0.0744 |
| Accuracy | MNDR | 0.8745 $\pm$ 0.0498 | 0.8483 $\pm$ 0.0605 | 0.8707 $\pm$ 0.0439 | 0.8678 $\pm$ 0.0639 | 0.8799 $\pm$ 0.0374 |
| F1-score | lncRNADisease | 0.7503 $\pm$ 0.1118 | 0.7645 $\pm$ 0.0974 | 0.7260 $\pm$ 0.1566 | 0.7596 $\pm$ 0.0972 | 0.7869 $\pm$ 0.1092 |
| F1-score | MNDR | 0.8654 $\pm$ 0.0594 | 0.8313 $\pm$ 0.0834 | 0.8627 $\pm$ 0.0546 | 0.8540 $\pm$ 0.0945 | 0.8723 $\pm$ 0.0463 |
| AUC | lncRNADisease | 0.8597 $\pm$ 0.0646 | 0.8257 $\pm$ 0.0844 | 0.8754 $\pm$ 0.0687 | 0.8736 $\pm$ 0.0630 | 0.8924 $\pm$ 0.0666 |
| AUC | MNDR | 0.9454 $\pm$ 0.0327 | 0.9025 $\pm$ 0.0492 | 0.9436 $\pm$ 0.0335 | 0.9448 $\pm$ 0.04125 | 0.9576 $\pm$ 0.0218 |
| AUPR | lncRNADisease | 0.8669 $\pm$ 0.0690 | 0.8611 $\pm$ 0.0677 | 0.8893 $\pm$ 0.0665 | 0.8804 $\pm$ 0.0578 | 0.9056 $\pm$ 0.0579 |
| AUPR | MNDR | 0.9501 $\pm$ 0.0307 | 0.9248 $\pm$ 0.0365 | 0.9488 $\pm$ 0.0292 | 0.9490 $\pm$ 0.0370 | 0.9621 $\pm$ 0.0194 |
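
Each cell in the boosting-model comparison tables reports a central value and a spread for one metric. The following is a minimal sketch of how such entries could be produced for the threshold-based metrics, assuming per-fold confusion counts are available and that the spread is the sample standard deviation (which estimator the reported values use is not stated here). The function names and the counts are hypothetical and are not data from the paper; AUC and AUPR are omitted because they require ranked prediction scores rather than counts.

```python
from statistics import mean, stdev


def fold_metrics(tp, fp, tn, fn):
    """Precision, recall, accuracy and F1-score from one fold's confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


def summarize(per_fold):
    """Collapse a list of per-fold metric dicts into {metric: (mean, std)}."""
    summary = {}
    for name in per_fold[0]:
        values = [m[name] for m in per_fold]
        # Assumption: sample standard deviation (n - 1 in the denominator).
        summary[name] = (mean(values), stdev(values))
    return summary


# Hypothetical (tp, fp, tn, fn) counts for five folds; illustrative only.
folds = [(84, 12, 88, 16), (80, 10, 90, 20), (86, 14, 86, 14),
         (82, 11, 89, 18), (85, 13, 87, 15)]
summary = summarize([fold_metrics(*counts) for counts in folds])

for name, (mu, sd) in summary.items():
    print(f"{name}: {mu:.4f} +/- {sd:.4f}")
```

Swapping `stdev` for `statistics.pstdev` gives the population standard deviation instead; with only a handful of folds the two can differ noticeably.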

The other performance comparison

In the proposed LDA-VGHB model, linear and nonlinear features were extracted to represent each lncRNA–disease pair based on SVD and VGAE, respectively. We analyzed their effects on the LDA identification performance. Figure 3 shows the LDA-VGHB performance on the two LDA datasets under the four different 5-fold CVs when using linear features, nonlinear features or their combination. In most cases, combining linear and nonlinear features improved the LDA prediction performance.
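
The three feature settings compared here can be illustrated with a short sketch. It assumes that the linear (SVD-derived) and nonlinear (VGAE-derived) embeddings of each lncRNA and disease have already been computed, and that a pair is represented by simple concatenation; the concatenation scheme, the function name `pair_features` and the toy vectors are assumptions for illustration, not the confirmed implementation.

```python
def pair_features(lnc_lin, dis_lin, lnc_non, dis_non, mode="combined"):
    """Build one lncRNA-disease pair vector from precomputed embeddings.

    lnc_lin, dis_lin: linear features of the lncRNA and the disease.
    lnc_non, dis_non: nonlinear features of the lncRNA and the disease.
    mode: "linear", "nonlinear" or "combined".
    """
    linear = list(lnc_lin) + list(dis_lin)
    nonlinear = list(lnc_non) + list(dis_non)
    if mode == "linear":
        return linear
    if mode == "nonlinear":
        return nonlinear
    if mode == "combined":
        # Assumption: the combined setting concatenates both feature types.
        return linear + nonlinear
    raise ValueError(f"unknown mode: {mode!r}")


# Toy two-dimensional embeddings; the values are hypothetical.
lnc_lin, dis_lin = [0.1, 0.2], [0.3, 0.4]
lnc_non, dis_non = [0.5, 0.6], [0.7, 0.8]

for mode in ("linear", "nonlinear", "combined"):
    vector = pair_features(lnc_lin, dis_lin, lnc_non, dis_non, mode=mode)
    print(mode, len(vector), vector)
```

Running the same classifier on each of the three vectors is the kind of ablation the comparison describes.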

Figure 3

Effects of linear features, nonlinear features and their combination on performance. A–D denote the performance of LDA-VGHB when using the three types of features on the lncRNADisease database under |$CV_l$|, |$CV_d$|, |$CV_{ld}$| and |$CV_{ind}$|, respectively. E–H denote the performance of LDA-VGHB when using the three types of features on the MNDR database under |$CV_l$|, |$CV_d$|, |$CV_{ld}$| and |$CV_{ind}$|, respectively.

The parameter |$\alpha $| was used to measure the importance on the LDA identification performance. Thus, we analyzed the effect of |$\alpha $| on the LDA prediction performance over the range [0, 1] with a step size of 0.1. As shown in Figure 4, LDA-VGHB achieved the best AUC and AUPR on the lncRNADisease and MNDR datasets under the four different CVs when |$\alpha $| was set to 0.5. Consequently, we set |$\alpha $| to 0.5. Tables S5–S8 in the Supplementary Materials show the LDA-VGHB performance for different |$\alpha $| values under the four different 5-fold CVs.
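
A parameter sweep of this kind can be sketched as follows. The sketch assumes an evenly spaced grid over [0, 1] with a positive step (0.1 is used here as an assumption) and selects the value with the highest AUC, breaking ties by AUPR. The helper names and the scores are hypothetical; the scores are synthetic values shaped to peak at 0.5 and are not the results from the Supplementary Materials.

```python
def alpha_grid(start=0.0, stop=1.0, step=0.1):
    """Return the evenly spaced grid start, start + step, ..., stop."""
    if step <= 0:
        raise ValueError("step must be positive")
    n_steps = round((stop - start) / step)
    # Rounding removes floating-point noise such as 0.30000000000000004.
    return [round(start + i * step, 10) for i in range(n_steps + 1)]


def best_alpha(scores):
    """Pick the alpha with the highest (AUC, AUPR) pair.

    scores maps alpha -> (auc, aupr); tuples compare by AUC first, then AUPR.
    """
    return max(scores, key=lambda a: scores[a])


grid = alpha_grid()

# Synthetic scores for illustration only, constructed to peak at alpha = 0.5.
scores = {a: (0.94 - 0.05 * abs(a - 0.5), 0.94 - 0.04 * abs(a - 0.5))
          for a in grid}

print(grid)
print(best_alpha(scores))
```

In practice each entry of `scores` would come from a full cross-validation run at that value of the parameter.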

Figure 4

The effect of the parameter |$\alpha $| on the LDA prediction performance. A–B, C–D, E–F and G–H denote the AUC and AUPR of LDA-VGHB for different |$\alpha $| values on the lncRNADisease and MNDR databases under |$CV_l$|, |$CV_d$|, |$CV_{ld}$| and |$CV_{ind}$|, respectively.

The optimal feature dimensions of lncRNAs and diseases are not known in advance. We did not apply dimension reduction to the extracted features because they were not high-dimensional. Instead, we analyzed the effects of different dimensions (i.e. 5, 10, 16, 32, 50 and 64) on the LDA identification performance. By comprehensively considering the six evaluation metrics, we selected different dimensions for different datasets, as shown in Table 10.
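
One way to weigh six metrics jointly when choosing a dimension is to rank the candidates per metric and pick the one with the lowest rank sum. The sketch below illustrates that idea only; the aggregation rule is an assumption (the exact selection rule is not specified), `select_dimension` is a hypothetical helper, and the metric values are invented for the example.

```python
METRICS = ("precision", "recall", "accuracy", "f1", "auc", "aupr")


def select_dimension(results):
    """Return the dimension with the lowest rank sum over all six metrics.

    results maps dimension -> {metric: value}; higher values are better.
    Ties keep the order in which dimensions were listed.
    """
    dims = list(results)
    rank_sum = {d: 0 for d in dims}
    for metric in METRICS:
        ordered = sorted(dims, key=lambda d: results[d][metric], reverse=True)
        for rank, d in enumerate(ordered, start=1):
            rank_sum[d] += rank
    return min(dims, key=lambda d: rank_sum[d])


# Hypothetical metric values for the six candidate dimensions.
raw = {
    5:  (0.85, 0.80, 0.83, 0.82, 0.90, 0.90),
    10: (0.87, 0.82, 0.85, 0.84, 0.92, 0.92),
    16: (0.88, 0.83, 0.86, 0.85, 0.93, 0.93),
    32: (0.89, 0.84, 0.87, 0.86, 0.94, 0.94),
    50: (0.88, 0.85, 0.86, 0.86, 0.93, 0.93),
    64: (0.87, 0.83, 0.85, 0.85, 0.92, 0.93),
}
results = {dim: dict(zip(METRICS, values)) for dim, values in raw.items()}

print(select_dimension(results))
```

Rank aggregation avoids letting a single metric with a wide numeric range dominate the choice, at the cost of ignoring how large the gaps between candidates are.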

Table 10

Feature dimensions under four different 5-fold CVs

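| CV | Dataset | Linear | Nonlinear |
| --- | --- | --- | --- |
| $CV_l$ | lncRNADisease | 32 | 32 |
| $CV_l$ | MNDR | 32 | 32 |
| $CV_d$ | lncRNADisease | 10 | 10 |
| $CV_d$ | MNDR | 16 | 16 |
| $CV_{ld}$ | lncRNADisease | 16 | 16 |
| $CV_{ld}$ | MNDR | 10 | 10 |
| $CV_{ind}$ | lncRNADisease | 10 | 10 |
| $CV_{ind}$ | MNDR | 50 | 50 |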

Case Study

Lung cancer, breast cancer, colorectal cancer and kidney cancer are four of the most frequent cancers worldwide and show high morbidity and mortality. Having verified the performance of LDA-VGHB in the sections above, we used it to identify potential lncRNAs for the four cancers. Figure 5 illustrates the top 20 lncRNAs associated with each of the four cancers on the lncRNADisease and MNDR databases. Tables 11–14 list the top 20 lncRNAs for each query cancer, ranked by their association scores.
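
The ranking step of such a case study can be sketched as below, assuming the trained model has already produced an association score for every lncRNA–disease pair. Only pairs that are not already observed in the database are kept as candidates, and they are sorted by descending score. The function name, the data layout and all names and scores in the example are hypothetical.

```python
def top_k_candidates(scores, known, disease, k=20):
    """Rank unobserved lncRNAs for one query disease.

    scores: {(lncRNA, disease): association score} for all pairs.
    known:  set of (lncRNA, disease) pairs already recorded as associated.
    Returns up to k (lncRNA, score) tuples, highest score first.
    """
    candidates = [(lnc, score)
                  for (lnc, dis), score in scores.items()
                  if dis == disease and (lnc, dis) not in known]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates[:k]


# Placeholder lncRNA names and scores; not predictions from the paper.
scores = {
    ("lncA", "lung cancer"): 0.91,
    ("lncB", "lung cancer"): 0.87,
    ("lncC", "lung cancer"): 0.95,
    ("lncD", "lung cancer"): 0.60,
    ("lncA", "breast cancer"): 0.40,
}
known = {("lncC", "lung cancer")}

for rank, (lnc, score) in enumerate(
        top_k_candidates(scores, known, "lung cancer", k=2), start=1):
    print(rank, lnc, score)
```

Here `lncC` has the highest score but is excluded because its association is already observed, which mirrors ranking only among unobserved lncRNAs.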

Figure 5

The predicted top 20 lncRNAs associated with lung cancer (A and B), breast cancer (C and D), colorectal cancer (E and F) and kidney neoplasms (G and H) on the lncRNADisease and MNDR databases. Solid and dashed lines denote predicted LDAs that can and cannot be validated, respectively.

Table 11

The predicted top 20 lncRNAs associated with lung cancer on lncRNADisease and MNDR

| Rank | lncRNA (lncRNADisease) | Evidence (lncRNADisease) | lncRNA (MNDR) | Evidence (MNDR) |
| --- | --- | --- | --- | --- |
| 1 | PSORS1C3 | RNADisease | HAR1A | Unknown |
| 2 | HIF1A-AS1 | Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0 | BOK-AS1 | Unknown |
| 3 | DSCAM-AS1 | Lnc2Cancer 3.0, RNADisease | SNHG3 | Lnc2Cancer 3.0, RNADisease |
| 4 | WT1-AS | RNADisease | KCNQ1DN | Unknown |
| 5 | DAOA-AS1 | Unknown | IGF2-AS | Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0 |
| 6 | SNHG16 | Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0 | KCNQ1OT1 | Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0 |
| 7 | NAMA | Unknown | DNM3OS | Unknown |
| 8 | HCP5 | Lnc2Cancer 3.0, RNADisease | HULC | Lnc2Cancer 3.0, RNADisease |
| 9 | GHET1 | Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0 | LINC00271 | Unknown |
| 10 | KCNQ1DN | Unknown | LINC00162 | Unknown |
| 11 | EPB41L4A-AS1 | Unknown | EPB41L4A-AS1 | Unknown |
| 12 | WRAP53 | Unknown | ESRG | Unknown |
| 13 | MIR31HG | Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0 | LINC00032 | Unknown |
| 14 | DANCR | Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0 | IFNG-AS1 | Unknown |
| 15 | IFNG-AS1 | Unknown | GHET1 | Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0 |
| 16 | HAR1A | Unknown | HIF1A-AS1 | Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0 |
| 17 | BC040587 | Unknown | ATXN8OS | Unknown |
| 18 | BACE1-AS | Unknown | WRAP53 | Unknown |
| 19 | BCAR4 | Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0 | ZFAT-AS1 | Unknown |
| 20 | DISC2 | Unknown | TUG1 | Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0 |
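
The evidence column of Table 11 can be tallied programmatically. The sketch below transcribes the lung cancer rows of Table 11 and splits each top-20 list into lncRNAs with database evidence and lncRNAs marked "Unknown". The variable and function names are illustrative choices, and treating every non-"Unknown" entry as validated is an assumption about how the evidence column should be read.

```python
ALL3 = "Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0"
L2R = "Lnc2Cancer 3.0, RNADisease"
RD = "RNADisease"
U = "Unknown"

# (lncRNA, evidence) in rank order, transcribed from Table 11 (lung cancer).
lncrnadisease_top20 = [
    ("PSORS1C3", RD), ("HIF1A-AS1", ALL3), ("DSCAM-AS1", L2R), ("WT1-AS", RD),
    ("DAOA-AS1", U), ("SNHG16", ALL3), ("NAMA", U), ("HCP5", L2R),
    ("GHET1", ALL3), ("KCNQ1DN", U), ("EPB41L4A-AS1", U), ("WRAP53", U),
    ("MIR31HG", ALL3), ("DANCR", ALL3), ("IFNG-AS1", U), ("HAR1A", U),
    ("BC040587", U), ("BACE1-AS", U), ("BCAR4", ALL3), ("DISC2", U),
]
mndr_top20 = [
    ("HAR1A", U), ("BOK-AS1", U), ("SNHG3", L2R), ("KCNQ1DN", U),
    ("IGF2-AS", ALL3), ("KCNQ1OT1", ALL3), ("DNM3OS", U), ("HULC", L2R),
    ("LINC00271", U), ("LINC00162", U), ("EPB41L4A-AS1", U), ("ESRG", U),
    ("LINC00032", U), ("IFNG-AS1", U), ("GHET1", ALL3), ("HIF1A-AS1", ALL3),
    ("ATXN8OS", U), ("WRAP53", U), ("ZFAT-AS1", U), ("TUG1", ALL3),
]


def split_by_evidence(rows):
    """Return (validated, unvalidated) lncRNA names, preserving rank order."""
    validated, unvalidated = [], []
    for name, evidence in rows:
        # Case-insensitive match so variants such as "unknown" are handled.
        if evidence.strip().lower() == "unknown":
            unvalidated.append(name)
        else:
            validated.append(name)
    return validated, unvalidated


for label, rows in (("lncRNADisease", lncrnadisease_top20),
                    ("MNDR", mndr_top20)):
    validated, unvalidated = split_by_evidence(rows)
    print(label, len(validated), "with evidence;", len(unvalidated), "unknown")
```

The same helper applies unchanged to the other cancers once their rows are transcribed.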

Table 12

The predicted top 20 lncRNAs associated with breast cancer on lncRNADisease and MNDR

| Rank | lncRNA (lncRNADisease) | Evidence (lncRNADisease) | lncRNA (MNDR) | Evidence (MNDR) |
| --- | --- | --- | --- | --- |
| 1 | WRAP53 | Unknown | ZFAT-AS1 | PMID: 21460236 |
| 2 | ATXN8OS | Lnc2Cancer 3.0, RNADisease | HAR1A | PMID: 26942882 |
| 3 | DNM3OS | RNADisease | BOK-AS1 | Unknown |
| 4 | ATP6V1G2-DDX39B | Unknown | RRP1B | Unknown |
| 5 | CBR3-AS1 | Lnc2Cancer 3.0, RNADisease | SCAANT1 | Unknown |
| 6 | DAOA-AS1 | Unknown | KCNQ1DN | Unknown |
| 7 | 7SK | Unknown | IGF2-AS | RNADisease |
| 8 | DLEU1 | Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0 | TUG1 | Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0 |
| 9 | DGCR5 | RNADisease | PTENP1 | Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0 |
| 10 | SNHG3 | Lnc2Cancer 3.0, RNADisease | DNM3OS | Unknown |
| 11 | SNHG4 | Unknown | HULC | Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0 |
| 12 | HCP5 | Lnc2Cancer 3.0 | LINC00162 | Unknown |
| 13 | TCL6 | Lnc2Cancer 3.0 | EPB41L4A-AS1 | Lnc2Cancer 3.0, RNADisease |
| 14 | KCNQ1DN | Unknown | ESRG | Unknown |
| 15 | HAR1B | Unknown | LINC00032 | Unknown |
| 16 | HNF1A-AS1 | RNADisease | CASC2 | Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0 |
| 17 | PINK1-AS | lncRNADisease v2.0, RNADisease | GHET1 | Lnc2Cancer 3.0, RNADisease |
| 18 | IGF2-AS | RNADisease | HIF1A-AS1 | Unknown |
| 19 | HIF1A-AS1 | Unknown | ATXN8OS | Unknown |
| 20 | PSORS1C3 | Unknown | MIR31HG | Lnc2Cancer 3.0, RNADisease |

Table 13

The predicted top 20 lncRNAs associated with colorectal cancer on lncRNADisease and MNDR

| Rank | lncRNA (lncRNADisease) | Evidence (lncRNADisease) | lncRNA (MNDR) | Evidence (MNDR) |
| --- | --- | --- | --- | --- |
| 1 | ZFAT-AS1 | Unknown | ESRG | PMID: 34896077, 31905146 |
| 2 | WRAP53 | Unknown | DGCR5 | Lnc2Cancer 3.0, RNADisease |
| 3 | HIF1A-AS1 | Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0 | KCNQ1DN | Unknown |
| 4 | DSCAM-AS1 | Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0 | BOK-AS1 | Unknown |
| 5 | DAOA-AS1 | Unknown | WRAP53 | Unknown |
| 6 | SNHG3 | Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0 | DISC2 | Unknown |
| 7 | SNHG4 | RNADisease | ATP6V1G2-DDX39B | Unknown |
| 8 | HCP5 | Lnc2Cancer 3.0, RNADisease | DNM3OS | Unknown |
| 9 | KCNQ1DN | Unknown | HAR1A | Unknown |
| 10 | EPB41L4A-AS1 | RNADisease | IGF2-AS | Unknown |
| 11 | WT1-AS | Unknown | HIF1A-AS1 | Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0 |
| 12 | TCL6 | Unknown | LINC00162 | Unknown |
| 13 | IFNG-AS1 | Unknown | LINC00032 | Unknown |
| 14 | HAR1A | Unknown | SRA1 | Unknown |
| 15 | SNHG11 | Lnc2Cancer 3.0, RNADisease | EPB41L4A-AS1 | RNADisease |
| 16 | BC040587 | Unknown | PTENP1 | Unknown |
| 17 | BACE1-AS | Unknown | NRON | Unknown |
| 18 | DISC2 | Unknown | DLEU1 | Lnc2Cancer 3.0, RNADisease |
| 19 | HNF1A-AS1 | Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0 | 7SK | Unknown |
| 20 | DNM3OS | Unknown | ZFAT-AS1 | Unknown |

Table 14

The predicted top 20 lncRNAs associated with kidney neoplasms on lncRNADisease and MNDR

| Rank | lncRNA (lncRNADisease) | Evidence (lncRNADisease) | lncRNA (MNDR) | Evidence (MNDR) |
| --- | --- | --- | --- | --- |
| 1 | CBR3-AS1 | Unknown | SRA1 | Unknown |
| 2 | BOK-AS1 | Unknown | DLEU1 | Lnc2Cancer 3.0, RNADisease |
| 3 | IFNG-AS1 | Unknown | CCAT1 | Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0 |
| 4 | BC040587 | Unknown | HCP5 | Unknown |
| 5 | GHET1 | Unknown | HAR1B | Unknown |
| 6 | HULC | Unknown | DISC2 | Unknown |
| 7 | HAR1B | Unknown | SNHG3 | Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0 |
| 8 | DSCAM-AS1 | Unknown | TUG1 | Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0 |
| 9 | HCP5 | Unknown | UCA1 | Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0 |
| 10 | SNHG16 | Lnc2Cancer 3.0, RNADisease | LINC00271 | Unknown |
| 11 | WRAP53 | Unknown | ESRG | Unknown |
| 12 | RMST | Unknown | IFNG-AS1 | Unknown |
| 13 | SNHG11 | RNADisease | SNHG11 | RNADisease |
| 14 | BCYRN1 | Unknown | SNHG16 | Lnc2Cancer 3.0, RNADisease |
| 15 | PDZRN3-AS1 | Unknown | WRAP53 | Unknown |
| 16 | TERC | Unknown | SCAANT1 | Unknown |
| 17 | TRAF3IP2-AS1 | RNADisease, lncRNADisease v2.0 | SPRY4-IT1 | Lnc2Cancer 3.0, RNADisease |
| 18 | WT1-AS | Unknown | TRAF3IP2-AS1 | RNADisease, lncRNADisease v2.0 |
| 19 | XIST | Unknown | GHET1 | Unknown |
| 20 | HNF1A-AS1 | Unknown | LINC00162 | Unknown |

The predicted top 20 lncRNAs associated with kidney neoplasms on lncRNADisease and MNDR

lncRNADiseaseMNDR
RanklncRNAEvidenceRanklncRNAEvidence
1CBR3-AS1Unknown1SRA1Unknown
2BOK-AS1Unknown2DLEU1Lnc2Cancer 3.0, RNADisease
3IFNG-AS1Unknown3CCAT1Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0
4BC040587Unknown4HCP5Unknown
5GHET1Unknown5HAR1BUnknown
6HULCUnknown6DISC2Unknown
7HAR1BUnknown7SNHG3Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0
8DSCAM-AS1Unknown8TUG1Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0
9HCP5Unknown9UCA1Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0
10SNHG16Lnc2Cancer 3.0, RNADisease10LINC00271Unknown
11WRAP53Unknown11ESRGUnknown
12RMSTUnknown12IFNG-AS1unknow
13SNHG11RNADisease13SNHG11RNADisease
14BCYRN1Unknown14SNHG16Lnc2Cancer 3.0, RNADisease
15PDZRN3-AS1Unknown15WRAP53Unknown
16TERCUnknown16SCAANT1Unknown
17TRAF3IP2-AS1RNADisease, lncRNADiseasev2.017SPRY4-IT1Lnc2Cancer 3.0, RNADisease
18WT1-ASUnknown18TRAF3IP2-AS1RNADisease, lncRNADisease v2.0
19XISTUnknown19GHET1Unknown
20HNF1A-AS1Unknown20LINC00162Unknown
lncRNADiseaseMNDR
RanklncRNAEvidenceRanklncRNAEvidence
1CBR3-AS1Unknown1SRA1Unknown
2BOK-AS1Unknown2DLEU1Lnc2Cancer 3.0, RNADisease
3IFNG-AS1Unknown3CCAT1Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0
4BC040587Unknown4HCP5Unknown
5GHET1Unknown5HAR1BUnknown
6HULCUnknown6DISC2Unknown
7HAR1BUnknown7SNHG3Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0
8DSCAM-AS1Unknown8TUG1Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0
9HCP5Unknown9UCA1Lnc2Cancer 3.0, RNADisease, lncRNADisease v2.0
10SNHG16Lnc2Cancer 3.0, RNADisease10LINC00271Unknown
11WRAP53Unknown11ESRGUnknown
12RMSTUnknown12IFNG-AS1unknow
13SNHG11RNADisease13SNHG11RNADisease
14BCYRN1Unknown14SNHG16Lnc2Cancer 3.0, RNADisease
15PDZRN3-AS1Unknown15WRAP53Unknown
16TERCUnknown16SCAANT1Unknown
17TRAF3IP2-AS1RNADisease, lncRNADiseasev2.017SPRY4-IT1Lnc2Cancer 3.0, RNADisease
18WT1-ASUnknown18TRAF3IP2-AS1RNADisease, lncRNADisease v2.0
19XISTUnknown19GHET1Unknown
20HNF1A-AS1Unknown20LINC00162Unknown

In the lncRNADisease and MNDR databases, 10 and 7 of the predicted top 20 lncRNAs associated with lung cancer, respectively, have been verified by existing databases (Lnc2Cancer 3.0 [88], lncRNADisease v2.0 [29] and RNADisease [89]). We predicted that HAR1A may associate with lung cancer, with rankings of 16 and 1 on the two databases, respectively. lncRNA HAR1A may be a tumor suppressor in many cancers, including oral cancer, hepatocellular carcinoma, breast cancer and glioma [90–92]. Its knockdown boosted ALPK1 expression and downregulated BRD7, which further induced the progression of oral cancer [93]. Its expression has been examined in glioma and was markedly lower in hepatocellular cancer than in chronic hepatitis B [90, 91]. Its decreased expression may be involved in the poor prognosis of hepatocellular cancer [90]. Thus, we predicted that HAR1A could associate with lung cancer; this prediction needs further experimental validation.

In the lncRNADisease and MNDR databases, 11 and 8 of the predicted top 20 lncRNAs associated with breast cancer, respectively, have been verified by the three existing databases (Lnc2Cancer 3.0, lncRNADisease v2.0 and RNADisease). We inferred that KCNQ1DN may associate with breast cancer, with rankings of 14 and 6, respectively. KCNQ1DN is an lncRNA with 1109 nucleotides. It is downregulated in renal cell carcinoma tissues and could inhibit the growth and progression of renal cell carcinoma cells [94]. Xin et al. [95] detected its expression in Wilms’ tumors and found that it may be linked with Wilms’ tumorigenesis along with IGF2.

In the lncRNADisease and MNDR databases, eight and four of the predicted top 20 lncRNAs associated with colorectal cancer, respectively, have been verified by three publicly available databases (Lnc2Cancer 3.0, lncRNADisease v2.0 and RNADisease). In the two databases, we found that ZFAT-AS1 could associate with colorectal cancer. lncRNA ZFAT-AS1 is an antisense transcript of the gene ZFAT, which encodes a protein that functions as a transcriptional regulator with respect to apoptosis and cell survival [96]. ZFAT-AS1 is prominently downregulated in glioma. Its overexpression could inhibit proliferation, migration and invasion, and accelerate apoptosis, in glioma [97–99]. Its expression is downregulated in breast cancer [100], upregulated in hepatocellular, gastric, bladder and ovarian cancers, and dysregulated in multiple malignant tumors [101]. In general, ZFAT-AS1 acts as a tumor-suppressive gene in many cancers, including colorectal cancer, and its association with colorectal cancer needs further in vivo or in vitro experimental validation.

In the lncRNADisease and MNDR databases, two and nine of the predicted top 20 lncRNAs associated with kidney neoplasms, respectively, have been reported by three publicly available databases (Lnc2Cancer 3.0, lncRNADisease v2.0 and RNADisease). In the two databases, we inferred that HAR1B could associate with kidney neoplasms. HAR1B helps the formation of stable RNA structures in the human body [102]. Its expression is markedly lower in hepatocellular carcinoma patients [90]. It can serve as a potential biomarker in bone and soft-tissue sarcomas [103]. Deregulated HAR1B shows a much higher expression profile in aggressive colorectal cancers [104]. The association between kidney neoplasms and HAR1B needs further experimental confirmation.
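
The case-study procedure used above (score every lncRNA for a disease, discard those already observed, keep the top 20 and check each against external databases) can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: the function names `top_k_unobserved` and `count_verified`, the placeholder lncRNA names, the scores and the evidence entries are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Ranked = List[Tuple[int, str, float]]  # (rank, lncRNA name, score)


def top_k_unobserved(scores: Dict[str, float], known: Set[str], k: int = 20) -> Ranked:
    """Rank lncRNAs not yet linked to the disease by descending score."""
    candidates = [(name, s) for name, s in scores.items() if name not in known]
    # Sort high to low; break ties by name so the order is reproducible.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [(rank, name, s) for rank, (name, s) in enumerate(candidates[:k], start=1)]


def count_verified(ranked: Ranked, evidence: Dict[str, List[str]]) -> int:
    """Count ranked lncRNAs with at least one supporting database entry."""
    return sum(1 for _, name, _ in ranked if evidence.get(name))


# Hypothetical association scores for one disease (made up for illustration).
scores = {"lnc_a": 0.91, "lnc_b": 0.74, "lnc_c": 0.88, "lnc_d": 0.35, "lnc_e": 0.99}
known = {"lnc_e"}  # already associated in the training data, so excluded
evidence = {"lnc_a": ["database_1", "database_2"], "lnc_c": ["database_2"]}

ranked = top_k_unobserved(scores, known, k=3)
verified = count_verified(ranked, evidence)
```

Candidates that match no database entry correspond to the rows marked "Unknown" in the tables; they are predictions awaiting validation rather than negatives.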

CONCLUSION

In this study, we developed a computational model, LDA-VGHB, to identify potential LDAs. LDA-VGHB first extracted features of each lncRNA–disease pair by incorporating similarity computation, linear feature extraction based on SVD and nonlinear feature extraction based on VGAE. Subsequently, it used a heterogeneous Newton boosting machine to classify unobserved lncRNA–disease pairs. LDA-VGHB was compared with four classical LDA prediction methods (i.e. SDLDA [76], LDNFSGB [77], IPCARF [78] and LDASR [79]) and four popular boosting models (i.e. XGBoost [80], AdaBoost [81], CatBoost [82] and LightGBM [83]) under four 5-fold CVs on lncRNAs, diseases, lncRNA–disease pairs, and independent lncRNAs and independent diseases, respectively. It significantly outperformed the eight methods, achieving the best performance on the lncRNADisease and MNDR databases under the four different CVs. We further conducted case studies for lung cancer, breast cancer, colorectal cancer and kidney neoplasms and predicted the top 20 lncRNAs associated with them among all their unobserved lncRNAs. The results showed that most of the predicted top 20 lncRNAs have been verified by biomedical experiments provided by the Lnc2Cancer 3.0, lncRNADisease v2.0 and RNADisease databases. We inferred that HAR1A, KCNQ1DN, ZFAT-AS1 and HAR1B could associate with lung cancer, breast cancer, colorectal cancer and kidney neoplasms, respectively. These results need further biological experimental validation.
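
To illustrate how 5-fold CV on lncRNAs differs from CV on individual lncRNA–disease pairs, the sketch below holds out whole lncRNAs in each fold, so that every pair involving a held-out lncRNA lands in the test set. It is only an illustration under stated assumptions: the function name `lncrna_wise_folds`, the round-robin fold assignment and the toy sizes are hypothetical, and the exact splitting protocol used by LDA-VGHB may differ.

```python
import random
from typing import Iterator, List, Tuple

Pair = Tuple[int, int]  # (lncRNA index, disease index)


def lncrna_wise_folds(
    n_lncrnas: int, n_diseases: int, n_folds: int = 5, seed: int = 0
) -> Iterator[Tuple[List[Pair], List[Pair]]]:
    """Yield (train_pairs, test_pairs) with whole lncRNAs held out per fold."""
    order = list(range(n_lncrnas))
    random.Random(seed).shuffle(order)  # fixed seed keeps the split reproducible
    for f in range(n_folds):
        held_out = set(order[f::n_folds])  # every n_folds-th shuffled lncRNA
        train = [(i, j) for i in range(n_lncrnas) if i not in held_out
                 for j in range(n_diseases)]
        test = [(i, j) for i in sorted(held_out) for j in range(n_diseases)]
        yield train, test


# Toy problem: 10 lncRNAs, 3 diseases, 5 folds (sizes chosen for illustration).
folds = list(lncrna_wise_folds(n_lncrnas=10, n_diseases=3, n_folds=5))
```

Because no lncRNA appears on both sides of a fold, the classifier is evaluated on lncRNAs it has never seen, which is a stricter setting than splitting pairs at random. CV on diseases follows by swapping the roles of the two indices.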

Key Points
  • LDA-VGHB is developed to identify potential LDAs by incorporating feature extraction based on SVD and variational graph autoencoder and LDA classification based on heterogeneous Newton boosting machine.

  • Unlike evaluations that rely only on traditional CV on lncRNA–disease pairs, LDA-VGHB was assessed against four classical LDA prediction methods and four popular boosting models under 5-fold CVs on lncRNAs, diseases, lncRNA–disease pairs, and independent lncRNAs and independent diseases.

  • Most of the predicted top 20 lncRNAs for lung cancer, breast cancer, colorectal cancer and kidney neoplasms have been verified by biomedical experiments provided by the Lnc2Cancer 3.0, lncRNADisease v2.0 and RNADisease databases. HAR1A, KCNQ1DN, ZFAT-AS1 and HAR1B could associate with the four cancers, respectively.

ACKNOWLEDGEMENTS

We would like to thank three anonymous reviewers and all authors of the cited references.

FUNDING

L.H.P. was supported by National Natural Science Foundation of China under Grant No. 61803151 and Natural Science Foundation of Hunan Province of China under Grant 2023JJ50201. M.C. was supported by National Natural Science Foundation of China under Grant No. 62172158. G.S.H. was supported by Natural Science Foundation of Hunan Province of China Grant 2021JJ30684 and Hunan Provincial Key Research Program (Grant No. 2022WK2009).

AUTHOR CONTRIBUTION STATEMENT

L.H.P. and L.L.H.: conceptualization; L.H.P., Q.L.S., G.T., M.C. and G.S.H.: funding acquisition; L.H.P., L.L.H., Q.L.S., G.T., M.C. and G.S.H.: project administration; L.L.H.: writing-original draft; L.H.P., L.L.H. and G.S.H.: writing-review and editing; L.H.P., L.L.H., Q.L.S., G.T., M.C. and G.S.H.: investigation; L.H.P. and L.L.H.: methodology; L.L.H.: software; L.L.H., Q.L.S., G.T., M.C., G.S.H.: validation. All authors contributed to the article and approved the submitted version.

DATA AVAILABILITY STATEMENT

Datasets and codes can be downloaded at https://github.com/plhhnu/LDA-VGHB.

Author Biographies

Lihong Peng is working in Hunan University of Technology as an associate professor. She received a PhD in College of Information Science and Engineering, Hunan University, China. Her research interests include Machine Learning, Data Mining and Bioinformatics.

Liangliang Huang is a postgraduate student in the School of Computer Science, Hunan University of Technology, China. His research interests include Machine Learning and Bioinformatics.

Qiongli Su is working in the Department of Pharmacy, the Affiliated Zhuzhou Hospital Xiangya Medical College CSU. Her research interests include Cardiovascular, tumor and thrombotic diseases.

Geng Tian is the chief executive officer in Geneis (Beijing) Co. Ltd, China. His research interests include Tumor Precise Medicine and Bioinformatics.

Min Chen is a professor at Hunan Institute of Technology and holds a PhD from the College of Information Science and Engineering, Hunan University, China, with research interests including Machine Learning, Data Mining and Bioinformatics.

Guosheng Han is working in Xiangtan University as an associate professor. He received a PhD in School of Mathematics and Computational Science, Xiangtan University, China. His research interests include Machine Learning, Data Mining and Bioinformatics.

References

1. Wang KC, Chang HY. Molecular mechanisms of long noncoding RNAs. Mol Cell 2011;43(6):904–14.

2. Fan Y, Chen M, Pan X. GCRFLDA: scoring lncRNA-disease associations using graph convolution matrix completion with conditional random field. Brief Bioinform 2022;23(1):bbab361.

3. Schwarzmueller L, Bril O, Vermeulen L, Léveillé N. Emerging role and therapeutic potential of lncRNAs in colorectal cancer. Cancer 2020;12(12):3843.

4. Wang Y, Guoxian Y, Wang J, et al. Weighted matrix factorization on multi-relational data for lncRNA-disease association prediction. Methods 2020;173:32–43.

5. Statello L, Guo C-J, Chen L-L, Huarte M. Gene regulation by long non-coding RNAs and its biological functions. Nat Rev Mol Cell Biol 2021;22(2):96–118.

6. Olivero CE, Martínez-Terroba E, Zimmer J, et al. p53 activates the long noncoding RNA Pvt1b to inhibit Myc and suppress tumorigenesis. Mol Cell 2020;77(4):761–774.e8.

7. Qingsong H, Ye Y, Chan L-C, et al. Oncogenic lncRNA downregulates cancer cell antigen presentation and intrinsic tumor suppression. Nat Immunol 2019;20(7):835–51.

8. Yao J, Kong D, Ye C, et al. The long noncoding RNA TTTY15, which is located on the Y chromosome, promotes prostate cancer progression by sponging let-7. Eur Urol 2019;76(3):315–26.

9. Zhuo W, Liu Y, Li S, et al. Long noncoding RNA GMAN, up-regulated in gastric cancer tissues, is associated with metastasis in patients and promotes translation of ephrin A1 by competitively binding GMAN-AS. Gastroenterology 2019;156(3):676–691.e11.

10. Guangyuan F, Wang J, Domeniconi C, Guoxian Y. Matrix factorization-based data fusion for the prediction of lncRNA–disease associations. Bioinformatics 2018;34(9):1529–37.

11. Zhuang M, Zhao S, Jiang Z, et al. MALAT1 sponges miR-106b-5p to promote the invasion and metastasis of colorectal cancer via SLAIN2 enhanced microtubules mobility. EBioMedicine 2019;41:286–98.

12. Wang L, Cai Y, Zhao X, et al. Down-regulated long non-coding RNA H19 inhibits carcinogenesis of renal cell carcinoma. Neoplasma 2015;62(3):412–8.

13. Zhou T, Lili W, Ma N, et al. SOX9-activated FARSA-AS1 predetermines cell growth, stemness, and metastasis in colorectal cancer through upregulating FARSA and SOX9. Cell Death Dis 2020;11(12):1071.

14. Shen S, Zhou H. Clinical effects and molecular mechanisms of lncRNA MNX1-AS1 in malignant tumors. Am J Transl Res 2020;12(11):7593–602.

15. Li Q, Dai Y, Wang F, Hou S. Differentially expressed long non-coding RNAs and the prognostic potential in colorectal cancer. Neoplasma 2016;63(6):977–83.

16. Amodio N, Raimondi L, Juli G, et al. MALAT1: a druggable long non-coding RNA for targeted anti-cancer approaches. J Hematol Oncol 2018;11:1–19.

17. Zheng Y, Wang M, Wang S, et al. LncRNA MEG3 rs3087918 was associated with a decreased breast cancer risk in a Chinese population: a case-control study. BMC Cancer 2020;20(1):1–8.

18. Liu D, Wang Y, Zhao Y, Xiao G. LncRNA SNHG5 promotes nasopharyngeal carcinoma progression by regulating miR-1179/HMGB3 axis. BMC Cancer 2020;20(1):1–11.

19. Zhou J-M, Rong Liang S-Y, Zhu HW, et al. LncRNA WWC2-AS1 functions as a novel competing endogenous RNA in the regulation of FGF2 expression by sponging miR-16 in radiation-induced intestinal fibrosis. BMC Cancer 2019;19:1–10.

20. Peng L, Tan J, Xiong W, et al. Deciphering ligand–receptor-mediated intercellular communication based on ensemble deep learning and the joint scoring strategy from single-cell transcriptomic data. Comput Biol Med 2023;163:107137.

21. Huan H, Feng Z, Lin H, et al. Gene function and cell surface protein association analysis based on single-cell multiomics data. Comput Biol Med 2023;157:106733.

22. Zhang P, Zhang H, Hao W. iPro-WAEL: a comprehensive and robust framework for identifying promoters in multiple species. Nucleic Acids Res 2022;50(18):10278–89.

23. Peng L, Yuan R, Han C, et al. CellEnBoost: a boosting-based ligand-receptor interaction identification model for cell-to-cell communication inference. IEEE Trans Nanobioscience 2023;22:705–15.

24. Zhou X, Shi Z, Yingfu W, Zhao J, Hao W. scHiCSC: a novel single-cell Hi-C clustering framework by contact-weight-based smoothing and feature fusion. In: 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 44–50. IEEE, 2022.

25. Chen X, Yan CC, Zhang X, You Z-H. Long non-coding RNAs and complex diseases: from experimental results to computational models. Brief Bioinform 2017;18(4):558–76.

26. Chen X, Sun Y-Z, Guan N-N, et al. Computational models for lncRNA function prediction and functional similarity calculation. Brief Funct Genomics 2019;18(1):58–82.

27. Sun F, Sun J, Zhao Q. A deep learning method for predicting metabolite–disease associations via graph neural network. Brief Bioinform 2022;23(4):bbac266.

28. Zhang P, Yingfu W, Zhou H, et al. CLNN-loop: a deep learning model to predict CTCF-mediated chromatin loops in the different cell lines and CTCF-binding sites (CBS) pair types. Bioinformatics 2022;38(19):4497–504.

29. Bao Z, Yang Z, Huang Z, et al. LncRNADisease 2.0: an updated database of long non-coding RNA-associated diseases. Nucleic Acids Res 2019;47(D1):D1034–7.

30. Ning S, Zhang J, Wang P, et al. Lnc2Cancer: a manually curated database of experimentally supported lncRNAs associated with various human cancers. Nucleic Acids Res 2016;44(D1):D980–5.

31. Dinger ME, Pang KC, Mercer TR, et al. NRED: a database of long noncoding RNA expression. Nucleic Acids Res 2009;37(suppl_1):D122–6.

32. Cui T, Zhang L, Huang Y, et al. MNDR v2.0: an updated resource of ncRNA–disease associations in mammals. Nucleic Acids Res 2018;46(D1):D371–4.

33. Chen X, Huang L. Computational model for ncRNA research. In:, Vol. 23, 2022.

34. Chen X, Yan G-Y. Novel human lncRNA–disease association inference based on lncRNA expression profiles. Bioinformatics 2013;29(20):2617–24.

35. Chen X, Yan CC, Luo C, et al. Constructing lncRNA functional similarity network based on lncRNA-disease associations and disease semantic similarity. Sci Rep 2015;5(1):1–12.

36. Chen X. KATZLDA: Katz measure for the lncRNA-disease association prediction. Sci Rep 2015;5(1):1–11.

37. Chen X. Predicting lncRNA-disease associations and constructing lncRNA functional similarity network based on the information of miRNA. Sci Rep 2015;5(1):1–11.

38. Xie G, Changhai W, Guosheng G, Huang B. HAUBRW: hybrid algorithm and unbalanced bi-random walk for predicting lncRNA-disease associations. Genomics 2020;112(6):4777–87.

39. Xie G, Jiang J, Sun Y. LDA-LNSUBRW: lncRNA-disease association prediction based on linear neighborhood similarity and unbalanced bi-random walk. IEEE/ACM Trans Comput Biol Bioinform 2020;19(2):989–97.

40. Xie G, Huang B, Sun Y, et al. RWSF-BLP: a novel lncRNA-disease association prediction model using random walk-based multi-similarity fusion and bidirectional label propagation. Mol Genet Genomics 2021;296:473–83.

41. Xie G-B, Chen R-B, Lin Z-Y, et al. Predicting lncRNA–disease associations based on combining selective similarity matrix fusion and bidirectional linear neighborhood label propagation. Brief Bioinform 2023;24(1):bbac595.

42. Zhao X, Yang Y, Yin M. MHRWR: prediction of lncRNA-disease associations based on multiple heterogeneous networks. IEEE/ACM Trans Comput Biol Bioinform 2020;18(6):2577–85.

43. Wang L, Shang M, Dai Q, He P-a. Prediction of lncRNA-disease association based on a Laplace normalized random walk with restart algorithm on heterogeneous networks. BMC Bioinformatics 2022;23(1):1–20.

44. Liu J-X, Cui Z, Gao Y-L, Kong X-Z. WGRCMF: a weighted graph regularized collaborative matrix factorization method for predicting novel lncRNA-disease associations. IEEE J Biomed Health Inform 2020;25(1):257–65.

45. Xi W-Y, Zhou F, Gao Y-L, et al. LDCMFC: predicting long non-coding RNA and disease association using collaborative matrix factorization based on correntropy. IEEE/ACM Trans Comput Biol Bioinform 2022.

46. Wang M-N, You Z-H, Wang L, et al. LDGRNMF: lncRNA-disease associations prediction based on graph regularized non-negative matrix factorization. Neurocomputing 2021;424:236–45.

47. Guoxian Y, Wang Y, Wang J, et al. Attributed heterogeneous network fusion via collaborative matrix tri-factorization. Information Fusion 2020;63:153–65.

48. Qiu S, Wang M, Yang Y, et al. Meta multi-instance multi-label learning by heterogeneous network fusion. Information Fusion 2023;94:272–83.

49. Wang Y, Guoxian Y, Domeniconi C, Wang J, Zhang X, Guo M. Selective matrix factorization for multi-relational data fusion. In: International Conference on Database Systems for Advanced Applications, pages 313–29. Springer, 2019.

50. Hao W, Yingfu W, Jiang Y, et al. scHiCStackL: a stacking ensemble learning-based method for single-cell Hi-C classification using cell embedding. Brief Bioinform 2022;23(1):bbab396.

51. Wang T, Sun J, Zhao Q. Investigating cardiotoxicity related with hERG channel blockers using molecular fingerprints and graph attention mechanism. Comput Biol Med 2023;153:106464.

52. Shen L, Liu F, Huang L, et al. VDA-RWLRLS: an anti-SARS-CoV-2 drug prioritizing framework combining an unbalanced bi-random walk and Laplacian regularized least squares. Comput Biol Med 2022;140:105119.

53. Zhang Z, Junlin X, Yanan W, et al. CapsNet-LDA: predicting lncRNA-disease associations using attention mechanism and capsule network based on multi-view data. Brief Bioinform 2023;24(1):bbac531.

54. Zhang P, Hao W. iChrom-Deep: an attention-based deep learning model for identifying chromatin interactions. IEEE J Biomed Health Inform 2023;27:4559–456.

55. Peng L, He X, Peng X, et al. STGNNks: identifying cell types in spatial transcriptomics data based on graph neural network, denoising auto-encoder, and k-sums clustering. Comput Biol Med 2023;166:107440.

56. Peng L, Tan J, Tian X, Zhou L. EnANNDeep: an ensemble-based lncRNA–protein interaction prediction framework with adaptive k-nearest neighbor classifier and deep models. Interdiscip Sci 2022;14(1):209–32.

57. Lihong P, Wang C, Tian X, et al. Finding lncRNA-protein interactions based on deep learning with dual-net neural architecture. IEEE/ACM Trans Comput Biol Bioinform 2022;19(6):1–68.

58. Peng L, Yuan R, Shen L, et al. LPI-EnEDT: an ensemble framework with extra tree and decision tree classifiers for imbalanced lncRNA-protein interaction data classification. BioData Mining 2021;14:1–22.

59. Yao D, Zhan X, Zhan X, et al. A random forest based computational model for predicting novel lncRNA-disease associations. BMC Bioinformatics 2020;21:1–18.

60. Yang Q, Li X. BiGAN: lncRNA-disease association prediction based on bidirectional generative adversarial network. BMC Bioinformatics 2021;22:1–17.

61. Qing-Wen W, Xia J-F, Ni J-C, Zheng C-H. GAERF: predicting lncRNA-disease associations by graph auto-encoder and random forest. Brief Bioinform 2021;22(5):bbaa391.

62. Lan W, Wu X, Chen Q, et al. GANLDA: graph attention network for lncRNA-disease associations prediction. Neurocomputing 2022;469:384–93.

63. Wang W, Zhang L, Sun J, et al. Predicting the potential human lncRNA-miRNA interactions based on graph convolution network with conditional random field. Brief Bioinform 2022;23(6):bbac463.

64. Peng L, Huang L, Yuankang L, Liu G, Chen M, Han G. Identifying possible lncRNA-disease associations based on deep learning and positive-unlabeled learning. In: 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 168–73. IEEE, 2022.

65. Zhao X, Zhao X, Yin M. Heterogeneous graph attention network based on meta-paths for lncRNA–disease association prediction. Brief Bioinform 2022;23(1):bbab407.

66. Chen G, Wang Z, Wang D, et al. LncRNADisease: a database for long-non-coding RNA-associated diseases. Nucleic Acids Res 2012;41(D1):D983–6.

67. Fan W, Shang J, Li F, et al. IDSSIM: an lncRNA functional similarity calculation model based on an improved disease semantic similarity method. BMC Bioinformatics 2020;21(1):1–14.

68. Wang D, Wang J, Ming L, et al. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics 2010;26(13):1644–50.

69. Abdi H. Singular value decomposition (SVD) and generalized singular value decomposition. Encyclopedia of Measurement and Statistics 2007;907:912.

70. Kipf TN, Welling M. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308. 2016.

71. Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. 2016.

72. Bruna J, Zaremba W, Szlam A, LeCun Y. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203. 2013.

73. Ding Y, Lei X, Liao B, Fang-Xiang W. Predicting miRNA-disease associations based on multi-view variational graph auto-encoder with matrix factorization. IEEE J Biomed Health Inform 2021;26(1):446–57.

74. Parnell T, Anghel A, Lazuka M, et al. SnapBoost: a heterogeneous boosting machine. Adv Neural Inf Process Syst 2020;33:11166–77.

75. Lihong P, Wang C, Tian X, et al. Finding lncRNA-protein interactions based on deep learning with dual-net neural architecture. IEEE/ACM Trans Comput Biol Bioinform 2021;1.

76. Zeng M, Chengqian L, Zhang F, et al. SDLDA: lncRNA-disease association prediction based on singular value decomposition and deep learning. Methods 2020;179:73–80.

77. Zhang Y, Ye F, Xiong D, Gao X. LDNFSGB: prediction of long non-coding RNA and disease association using network feature similarity and gradient boosting. BMC Bioinformatics 2020;21(1):1–27.

78. Zhu R, Wang Y, Liu J-X, Dai L-Y. IPCARF: improving lncRNA-disease association prediction using incremental principal component analysis feature selection and a random forest classifier. BMC Bioinformatics 2021;22(1):1–17.

79. Guo Z-H, You Z-H, Wang Y-B, et al. A learning-based method for lncRNA-disease association identification combing similarity information and rotation forest. iScience 2019;19:786–95.

80. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794, 2016.

81. Zhou L, Duan Q, Tian X, et al. LPI-HyADBS: a hybrid framework for lncRNA-protein interaction prediction integrating feature selection and classification. BMC Bioinformatics 2021;22(1):1–31.

82. Prokhorenkova L, Gusev G, Vorobev A, et al. CatBoost: unbiased boosting with categorical features. Adv Neural Inf Process Syst 2018;31.

83. Ke G, Meng Q, Finley T, et al. LightGBM: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst 2017;30.

84. Sagi O, Rokach L. Ensemble learning: a survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2018;8(4):e1249.

85. Peng L, Wang F, Wang Z, et al. Cell–cell communication inference and analysis in the tumour microenvironments from single-cell transcriptomics: data resources and computational strategies. Brief Bioinform 2022;23(4):bbac234.

86. Wang X, Zhang Y, Bin Y, et al. Prediction of protein-protein interaction sites through extreme gradient boosting with kernel principal component analysis. Comput Biol Med 2021;104516.

87. Chen C, Zhang Q, Ma Q, Bin Y. LightGBM-PPI: predicting protein-protein interactions through LightGBM with multi-information fusion. Chemom Intel Lab Syst 2019;191:54–64.

88. Gao Y, Shang S, Guo S, et al. Lnc2Cancer 3.0: an updated resource for experimentally supported lncRNA/circRNA cancer associations and web tools based on RNA-seq and scRNA-seq data. Nucleic Acids Res 2021;49(D1):D1251–8.

89. Chen J, Lin J, Yongfei H, et al. RNADisease v4.0: an updated resource of RNA-associated diseases, providing RNA-disease analysis, enrichment and prediction. Nucleic Acids Res 2023;51(D1):D1397–404.

90. Shi Z, Luo Y, Minghui Zhu Y, et al. Expression analysis of long non-coding RNA HAR1A and HAR1B in HBV-induced hepatocellular carcinoma in Chinese patients. Lab Med 2019;50(2):150–7.

91. Zou H, Lan-Xiang W, Yang Y, et al. LncRNAs PVT1 and HAR1A are prognosis biomarkers and indicate therapy outcome for diffuse glioma patients. Oncotarget 2017;8(45):78767–80.

92. Liao H-F, Lee H-H, Chang Y-S, et al. Down-regulated and commonly mutated ALPK1 in lung and colorectal cancers. Sci Rep 2016;6(1):27350.

93. Lee C-P, Ko AM-S, Nithiyanantham S, et al. Long noncoding RNA HAR1A regulates oral cancer progression through the alpha-kinase 1, bromodomain 7, and myosin IIA axis. J Mol Med 2021;99(9):1323–34.

94. Yang F, Qingjian W, Zhang L, et al. The long noncoding RNA KCNQ1DN suppresses the survival of renal cell carcinoma cells through downregulating c-Myc. J Cancer 2019;10(19):4662–70.

95. Xin Z, Soejima H, Higashimoto K, et al. A novel imprinted gene, KCNQ1DN, within the WT2 critical region of human chromosome 11p15.5 and its reduced expression in Wilms’ tumors. J Biochem 2000;128(5):847–53.

96. Metsalu T, Viltrop T, Tiirats A, et al. Using RNA sequencing for identifying gene imprinting and random monoallelic expression in human placenta. Epigenetics 2014;9(10):1397–409.

97. Zhang F, Ruan X, Ma J, et al. DGCR8/ZFAT-AS1 promotes CDX2 transcription in a PRC2 complex-dependent manner to facilitate the malignant biological behavior of glioma cells. Mol Ther 2020;28(2):613–30.

98. Lv Q-L, Chen S-H, Zhang X, et al. Upregulation of long noncoding RNA zinc finger antisense 1 enhances epithelial–mesenchymal transition in vitro and predicts poor prognosis in glioma. Tumor Biol 2017;39(3):1010428317695022.

99. Gao K, Ji Z, She K, et al. Long non-coding RNA ZFAS1 is an unfavourable prognostic factor and promotes glioma cell progression by activation of the Notch signaling pathway. Biomed Pharmacother 2017;87:555–60.

100. Askarian-Amiri ME, Crawford J, French JD, et al. SNORD-host RNA Zfas1 is a regulator of mammary development and a potential marker for breast cancer. RNA 2011;17(5):878–91.

101. Jiang X, Yang Z, Li Z. Zinc finger antisense 1: a long noncoding RNA with complex roles in human cancers. Gene 2019;688:26–33.

102. Pollard KS, Salama SR, Lambert N, et al. An RNA gene expressed during cortical development evolved rapidly in humans. Nature 2006;443(7108):167–72.

103. Yamada H, Takahashi M, Watanuki M, et al. LncRNA HAR1B has potential to be a predictive marker for pazopanib therapy in patients with sarcoma [corrigendum in /10.3892/ol.2021.12959]. Oncol Lett 2021;21(6):1–14.

104. Khajehdehi M, Khalaj-Kondori M, Feizi MAH. Expression profiling of cancer-related long non-coding RNAs revealed upregulation and biomarker potential of HAR1B and JPX in colorectal cancer. Mol Biol Rep 2022;49(7):6075–84.

Author notes

Lihong Peng and Liangliang Huang contributed equally to this work and share first authorship.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]