Abstract

Long non-coding RNAs (lncRNAs) can disrupt the biological functions of protein-coding genes (PCGs) and thereby contribute to cancer. However, the relationship between lncRNAs and PCGs remains unclear and is difficult to predict. Machine learning has performed well in association prediction, but, to our knowledge, it has rarely been applied to lncRNA–PCG association prediction. We therefore introduce GAE-LGA, a deep learning model built on graph autoencoders, to identify potential lncRNA–PCG associations. GAE-LGA jointly performs lncRNA–PCG association learning and cross-omics correlation learning for effective lncRNA–PCG association identification. The functional similarity and multi-omics similarity of lncRNAs and PCGs are combined and encoded by graph autoencoders to extract feature representations of lncRNAs and PCGs, which are then decoded to obtain candidate lncRNA–PCG pairs. Comprehensive evaluation demonstrated that GAE-LGA captures lncRNA–PCG associations robustly and outperforms other machine learning-based identification methods. Furthermore, multi-omics features were shown to improve the performance of lncRNA–PCG association identification. In conclusion, GAE-LGA is an efficient tool for lncRNA–PCG association prediction with the following advantages: it fuses multi-omics information into the similarity network, making the feature representations more accurate; it can predict lncRNA–PCG associations for new lncRNAs; and it identifies potential lncRNA–PCG associations with high accuracy.

Introduction

Long non-coding RNAs (lncRNAs) are transcripts longer than 200 nucleotides with little or no protein-coding potential[1–3]. The functions of most lncRNAs remain unclear, and only 17 754 of them are annotated by GENCODE V35[4]. These annotations suggest that lncRNAs can perturb the expression of protein-coding genes (PCGs) at multiple levels and participate in several important biological processes[1, 5–7]. Given this biological significance, together with the large number of lncRNAs and the diversity of their mechanisms of action, it is necessary to explore the relationship between lncRNAs and PCGs.

Biological experiments have been designed to verify lncRNA–PCG associations[8–11], but time and cost constraints make them difficult to apply at scale. A reliable computational tool for identifying lncRNA–PCG associations from existing experimental data is therefore needed. Various computational approaches have been developed for this task, and they fall into three categories: sequence-based, expression-based and machine learning-based. Sequence-based approaches use the base-pairing free energy between sequences to predict lncRNA–PCG associations[12–14]. They excel at predicting direct physical interactions between lncRNAs and PCGs but fail to identify potential lncRNA–PCG associations that do not involve direct binding. Expression-based methods calculate the degree of association between lncRNAs and PCGs from their expression levels[15–17]. Because lncRNA and PCG expression is specific to samples and stages, these methods can only predict lncRNA–PCG associations in specific samples at specific stages. Machine learning-based approaches, a more recent category, identify candidate lncRNA–PCG pairs by learning from known lncRNA–PCG pairs[18, 19]. They can identify both direct and potential lncRNA–PCG associations without being affected by the expression specificity of lncRNAs and PCGs.

Machine learning methods provide an efficient solution for predicting associations between lncRNAs and other entities, including lncRNA–disease associations[20–28], lncRNA–miRNA associations[29–31] and lncRNA–protein associations[32–35]. Most of these methods fall into three categories: traditional machine learning methods[20, 24, 25, 30, 33], matrix completion methods[21–23, 29, 35] and deep learning methods[26–28, 31, 32, 34]. Inspired by these studies, machine learning-based approaches for lncRNA–PCG association prediction have been proposed. A predictor based on support vector machines (SVMs), logistic regression (LR) and random forests (RF) was first constructed to investigate the relationship between lncRNAs and PCGs[19]. Subsequently, a deep learning-based method was proposed to screen target PCGs of lncRNAs[18]. These two approaches demonstrate the effectiveness of machine learning in identifying lncRNA–PCG associations, but there is still much room to improve identification performance. It is therefore worthwhile to design a new machine learning-based model that accurately identifies lncRNA–PCG associations and helps explore how lncRNAs regulate PCGs.

lncRNA–PCG association identification can be viewed as a link prediction problem, a task for which graph convolutional networks (GCNs) have proven effective[36]. In addition, lncRNAs can regulate PCG expression at multiple levels[1] and play synergistic multi-omics regulatory roles in organisms. Building on these two observations, we combined GCN encoders with multi-omics pan-cancer data to propose GAE-LGA, a new method for efficiently identifying candidate lncRNA–PCG associations. We first integrated functional similarity and multi-omics similarity to build similarity networks for lncRNAs and PCGs, respectively. Here, the functional similarity was inferred from known lncRNA–PCG associations, and the multi-omics similarity was inferred from the multi-omics data of patients. Then, feature representations of lncRNAs and PCGs were learned from the constructed similarity networks using graph autoencoders. Finally, we constructed a decoder that decodes the feature representations to identify potential lncRNA–PCG associations. Compared with existing lncRNA–PCG association prediction methods, GAE-LGA has three new features:

  • it fuses multi-omics information into the similarity network, making the feature representation more accurate;

  • it can predict lncRNA–PCG associations for a new lncRNA;

  • it identifies potential lncRNA–PCG associations with high accuracy and without expression specificity.

Materials and methods

We designed GAE-LGA for lncRNA–PCG association prediction. As shown in Figure 1, the prediction process consists of three steps. First, we collected and preprocessed the multi-omics data and the lncRNA–PCG association network to obtain the required association information and multi-omics information (Figure 1A). Second, we used GCN encoders to aggregate information and learn feature representations for lncRNAs and PCGs (Figure 1B). Finally, a decoder combined the feature representations to predict candidate lncRNA–PCG associations (Figure 1C).

Overview of GAE-LGA. LNC represents lncRNA and PCG represents protein coding gene. (A) In the first step, we performed a preprocessing operation to get multi-omics features and association matrix. (B) To generate feature representations of lncRNAs and PCGs, the association information (functional similarity) and multi-omics information were fused for encoding. (C) Finally, we computed the association score and activated it.
Figure 1

Data collection and preprocessing

Multi-omics data

We downloaded multi-omics data from the TCGA database (https://portal.gdc.cancer.gov/). The multi-omics data comprise single nucleotide variation (SNV), copy number variation (CNV), DNA methylation (DNA Methy) and transcription profiling (TP) data; their relationship to gene expression is shown in Table S1 (see Supplementary material). After preprocessing, each lncRNA/PCG has 312 multi-omics features: 66 SNV features, 132 CNV features, 48 DNA Methy features and 66 TP features. Details are given in Table 1.

Table 1

Details of lncRNA/PCG-related multi-omics features

Feature type | NC | Attribute | NM
SNV | 33 | Chromosome, Position | 66
CNV | 33 | Chromosome, Start, End, Aber | 132
DNA Methy | 12 | Chromosome, Start, End, Beta | 48
TP | 33 | Mean, Standard deviation | 66

Note: NC and NM represent the number of cancer types and multi-omics features, respectively. SNV, CNV, DNA Methy and TP are the abbreviations for single nucleotide variation, copy number variation, DNA methylation and transcription profiling, respectively. Aber and Beta refer to mutation type and methylation intensity, respectively.
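
The NM column in Table 1 is the number of cancer types (NC) multiplied by the number of attributes recorded per cancer type. The short Python check below reproduces the per-type counts and the 312-feature total; it reproduces only the arithmetic and makes no assumption about how the features are ordered within a feature vector.

```python
# Multi-omics feature layout from Table 1: (number of cancer types NC, attributes per cancer type)
FEATURE_LAYOUT = {
    "SNV": (33, ["Chromosome", "Position"]),
    "CNV": (33, ["Chromosome", "Start", "End", "Aber"]),
    "DNA Methy": (12, ["Chromosome", "Start", "End", "Beta"]),
    "TP": (33, ["Mean", "Standard deviation"]),
}


def feature_counts(layout):
    """Return NM = NC x (number of attributes) for each omics type."""
    return {name: nc * len(attrs) for name, (nc, attrs) in layout.items()}


counts = feature_counts(FEATURE_LAYOUT)
total = sum(counts.values())
print(counts)  # {'SNV': 66, 'CNV': 132, 'DNA Methy': 48, 'TP': 66}
print(total)   # 312
```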

Human lncRNA–PCG associations

We collected lncRNA–PCG associations from three gold-standard datasets: LncRNA2Target[9], LncTarD[8] and NPInter[10]. Only lncRNAs and PCGs with multi-omics information were used in the experiments. Dataset statistics are shown in Table 2. LncRNA2Target contains 773 lncRNA–PCG associations involving 263 lncRNAs and 498 PCGs. LncTarD contains 1444 lncRNA–PCG associations involving 238 lncRNAs and 716 PCGs. For NPInter, the numbers of lncRNAs, PCGs and lncRNA–PCG associations are 308, 256 and 369, respectively.

Table 2

Statistics of lncRNA–PCG association datasets

Dataset | LncRNA | PCG | lncRNA–PCG
LncRNA2Target | 263 | 498 | 773
LncTarD | 238 | 716 | 1444
NPInter | 308 | 256 | 369

Note: lncRNA–PCG represents the number of lncRNA–PCG associations, and PCG represents protein-coding gene

Training and testing datasets

We used 10-fold cross-validation to obtain training and testing datasets, generating positive and negative samples in two ways. In method 1, all known lncRNA–PCG associations in the dataset were used as positive samples, and all lncRNA–PCG pairs without a known association were used as negative samples. In method 2, positive samples were selected as in method 1, and the same number of negative samples was randomly drawn from the pairs without a known association.
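
The two sampling schemes and the 10-fold split can be sketched as follows. This is a minimal illustration rather than the authors' code: the helper names `make_samples` and `kfold_indices` are hypothetical, negatives in method 2 are assumed to be drawn uniformly at random without replacement, and folds are assumed to be split over individual lncRNA–PCG pairs.

```python
import numpy as np


def make_samples(A, method=1, rng=None):
    """Hypothetical helper: build (pairs, labels) from a binary lncRNA x PCG matrix A.

    method=1: every known pair (A == 1) is positive, every unknown pair is negative.
    method=2: known pairs are positive; an equal number of unknown pairs is drawn
              uniformly at random (assumption) as negatives.
    """
    rng = np.random.default_rng(rng)
    pos = np.argwhere(A == 1)
    neg = np.argwhere(A == 0)
    if method == 2:
        # Raises ValueError if there are fewer unknown pairs than known pairs
        keep = rng.choice(len(neg), size=len(pos), replace=False)
        neg = neg[keep]
    pairs = np.vstack([pos, neg])
    labels = np.concatenate([np.ones(len(pos), dtype=int), np.zeros(len(neg), dtype=int)])
    return pairs, labels


def kfold_indices(n_samples, k=10, rng=None):
    """Yield (train_idx, test_idx) for a shuffled k-fold split over sample indices."""
    rng = np.random.default_rng(rng)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


# Toy example: 5 lncRNAs x 8 PCGs with 3 known associations (k=3 because there are only 6 samples)
A = np.zeros((5, 8), dtype=int)
A[0, 1] = A[2, 3] = A[4, 7] = 1
pairs, labels = make_samples(A, method=2, rng=0)
for train_idx, test_idx in kfold_indices(len(labels), k=3, rng=0):
    print(len(train_idx), len(test_idx))
```

In a full pipeline, test-fold positives would presumably also have to be hidden from the association matrix used for training, because both the functional similarity and the reconstruction target are derived from the known associations.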

Similarity feature

The similarity feature consists of two parts: the functional similarity feature and the multi-omics similarity feature. These details are shown in Table S2 (See Supplementary material). For lncRNA, the similarity feature was calculated as follows:
|$S_l(l,l^{\prime})=\mathrm{accum}\left(F_l(l,l^{\prime}),\,M_l(l,l^{\prime})\right)$| (1)
where |$l$| and |$l^{\prime}$| both represent lncRNAs and |$\mathrm{accum}(\cdot )$| represents an accumulation operation (the concatenation of vectors). |$F_l(l,l^{\prime})$| is the functional similarity between |$l$| and |$l^{\prime}$|, which reflects how similar |$l$| and |$l^{\prime}$| are in their associations with PCGs and is defined as follows:
(2)
where |$A_{lg}$| represents the matrix of known lncRNA–PCG associations. |$M_l(l,l^{\prime})$| is the multi-omics similarity between |$l$| and |$l^{\prime}$|, which represents how similar |$l$| and |$l^{\prime}$| are in cross-omics regulation and is defined as follows:
(3)
where |$O_l$| represents the multi-omics features of lncRNAs. Analogously to |$S_l$|, we computed the similarity feature |$S_g$| for PCGs, whose functional similarity feature and multi-omics similarity feature were calculated by Equations S1 and S2 (see Supplementary material), respectively. |$S_l$| and |$S_g$| were then used for feature representation learning.
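
Because Equations 2 and 3 are not reproduced here, the sketch below uses cosine similarity as a stand-in for both the functional similarity (computed on rows of the association matrix) and the multi-omics similarity (computed on rows of the multi-omics feature matrix), and implements accum(·) as row-wise concatenation. The function names are hypothetical and the choice of cosine similarity is an assumption; the paper's definitions may differ.

```python
import numpy as np


def cosine_similarity(X, eps=1e-12):
    """Row-wise cosine similarity: X is (n x d), the result is (n x n)."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, eps)   # all-zero rows stay zero instead of dividing by 0
    return Xn @ Xn.T


def similarity_features(assoc, omics):
    """Hypothetical S = accum(F, M): concatenate functional and multi-omics similarity.

    assoc : (m x n) binary association matrix (A_lg for lncRNAs, A_lg.T for PCGs)
    omics : (m x d) multi-omics feature matrix (e.g. d = 312), ideally standardized
    """
    F = cosine_similarity(assoc)   # stand-in for Equation 2 (assumed cosine)
    M = cosine_similarity(omics)   # stand-in for Equation 3 (assumed cosine)
    return np.hstack([F, M])       # row i = [F(i, :), M(i, :)], shape (m x 2m)


A_lg = np.array([[1, 0, 1], [1, 0, 0], [0, 1, 0]])
O_l = np.random.default_rng(0).normal(size=(3, 312))
S_l = similarity_features(A_lg, O_l)
print(S_l.shape)  # (3, 6)
```

In practice, heterogeneous multi-omics attributes such as genomic positions and methylation intensities would likely need to be standardized before any similarity is computed, otherwise large-valued attributes dominate.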

Feature representation

We used GCNs as encoders to learn feature representations of lncRNAs and PCGs from their similarity features. GCN is a general framework for applying neural networks to graph-structured data[36]. Its layer-wise propagation rule is defined as follows:
|$H^{(l+1)}=\sigma\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}H^{(l)}W^{(l)}\right)$| (4)
where |$A$| represents the adjacency matrix of the graph, |$\tilde{A}=A+I$| with |$I$| the identity matrix, |$\tilde{D}$| represents the degree matrix of |$\tilde{A}$| with |$\tilde{D}_{ii} = \sum _{j}\tilde{A}_{ij}$|, |$H^{(l)}$| is the matrix of activations in the |$l$|-th layer, |$W^{(l)}$| represents a layer-specific weight matrix and |$\sigma(\cdot)$| is an activation function. Note that |$H^{(0)} = X$| represents the initial feature matrix.
We used a two-layer GCN to learn feature representations for lncRNAs. We first calculated |$\hat{A_l} = \tilde{D_l}^{-\frac{1}{2}} \tilde{A_l} \tilde{D_l}^{-\frac{1}{2}}$| in a pre-processing step. Then the GCN encoder model for lncRNA takes the following simple form:
(5)
Similarly, we used a two-layer GCN to learn feature representations for PCGs. We first calculated |$\hat{A_g} = \tilde{D_g}^{-\frac{1}{2}} \tilde{A_g} \tilde{D_g}^{-\frac{1}{2}}$| in a pre-processing step. Our GCN encoder model for PCG then takes the simple form as follows:
(6)
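
Equations 4 to 6 can be illustrated with a small NumPy sketch of a two-layer GCN encoder. It assumes the Kipf–Welling graph autoencoder form Z = Â ReLU(Â X W0) W1, with the similarity features as input X, a symmetric non-negative similarity graph as A, and the hidden size (200) and embedding size (10) from the Experimental setup section. Training of W0 and W1 is omitted, and the toy graph is hypothetical.

```python
import numpy as np


def normalize_adjacency(A):
    """A_hat = D~^{-1/2} (A + I) D~^{-1/2}; assumes a symmetric, non-negative A."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(A_hat, H, W, activation=lambda x: np.maximum(x, 0.0)):
    """One application of Equation 4, with ReLU as sigma by default."""
    return activation(A_hat @ H @ W)


def gcn_encoder(A_hat, X, W0, W1):
    """Two-layer encoder Z = A_hat ReLU(A_hat X W0) W1 (second layer left linear)."""
    H1 = gcn_layer(A_hat, X, W0)
    return gcn_layer(A_hat, H1, W1, activation=lambda x: x)


rng = np.random.default_rng(0)
m = 4                                              # toy number of lncRNAs
A_l = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)        # hypothetical similarity graph
X = rng.random((m, 2 * m))                         # e.g. the rows of S_l
W0 = rng.normal(scale=0.1, size=(2 * m, 200))      # 200 hidden features
W1 = rng.normal(scale=0.1, size=(200, 10))         # 10-dimensional embedding
Z_l = gcn_encoder(normalize_adjacency(A_l), X, W0, W1)
print(Z_l.shape)  # (4, 10)
```

The same functions apply to PCGs by substituting the PCG similarity graph and |$S_g$|.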

lncRNA–PCG association prediction

lncRNA–PCG association reconstruction

Having learned feature representations for lncRNAs and PCGs, we used the following decoder to rebuild the lncRNA–PCG association network:
(7)
where |$Z$| represents the predicted lncRNA–PCG associations and |$\mathrm{DEC}(\cdot )$| represents a reconstruction function. Known lncRNA–PCG associations are represented by |$A$|. We trained the model so that |$Z$| is as close to |$A$| as possible, with the difference between |$A$| and |$Z$| measured as follows:
(8)
where |$m$| and |$n$| represent the number of lncRNAs and PCGs, respectively.
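
Equations 7 and 8 are not reproduced here, so the decoder and loss below are assumptions: an inner-product decoder followed by a sigmoid activation (consistent with Figure 1C, where the association score is computed and then activated) and a mean squared error over all m × n entries. Binary cross-entropy would be an equally common choice for the loss.

```python
import numpy as np


def sigmoid(x):
    # Clip to avoid overflow warnings in np.exp for very negative inputs
    return 1.0 / (1.0 + np.exp(-np.clip(x, -500, 500)))


def inner_product_decoder(Z_l, Z_g):
    """Hypothetical DEC(Z_l, Z_g): sigmoid of all lncRNA-PCG inner products, shape (m x n)."""
    return sigmoid(Z_l @ Z_g.T)


def reconstruction_loss(A, Z):
    """Assumed form of Equation 8: mean squared error over all m x n entries."""
    m, n = A.shape
    return float(np.sum((A - Z) ** 2) / (m * n))


rng = np.random.default_rng(0)
Z_l = rng.normal(size=(4, 10))   # toy lncRNA embeddings
Z_g = rng.normal(size=(6, 10))   # toy PCG embeddings
A = np.zeros((4, 6))
A[0, 2] = 1.0                    # one known association
Z = inner_product_decoder(Z_l, Z_g)
print(Z.shape, reconstruction_loss(A, Z))
```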

Association information for a new lncRNA

We predicted lncRNA–PCG associations for new lncRNAs by integrating the relationships between their neighboring nodes and PCGs. For a new lncRNA |$l$|, its association with PCGs is calculated as follows:
(9)
where |$N(l)$| represents the neighbor nodes of |$l$| and it is determined by comparing the similarity features between |$l$| and other nodes as follows:
(10)
where |$l\in\{1,2,\cdots ,m\}$|, |$l_p\in\{1,2,\cdots ,m\}$|, |$p\in\{1,2,\cdots ,k\}$|, |$k=|N(l)|$| represents the number of neighbor nodes of |$l$|, and |$\mathrm{mean}(\cdot )$| represents the mean operation. Note that |$k \le m-1$|.
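
Equations 9 and 10 can be sketched as a k-nearest-neighbor average. The sketch assumes that the neighbors are the k known lncRNAs whose similarity-feature vectors have the highest cosine similarity to the new lncRNA, and that the new lncRNA's association scores are the mean of those neighbors' association rows; the function name and the default k = 5 are hypothetical.

```python
import numpy as np


def predict_new_lncrna(s_new, S_known, A_known, k=5, eps=1e-12):
    """Hypothetical sketch of Equations 9-10.

    s_new   : (d,) similarity-feature vector of the new lncRNA
    S_known : (m x d) similarity features of the m known lncRNAs
    A_known : (m x n) known (or predicted) associations of those lncRNAs with n PCGs
    Returns a length-n vector of association scores for the new lncRNA.
    """
    k = min(k, S_known.shape[0])           # cannot use more neighbors than known lncRNAs
    sims = S_known @ s_new / (np.linalg.norm(S_known, axis=1) * np.linalg.norm(s_new) + eps)
    neighbors = np.argsort(-sims)[:k]       # N(l): indices of the k most similar lncRNAs
    return A_known[neighbors].mean(axis=0)  # mean(.) of the neighbors' association rows


S_known = np.eye(3)
A_known = np.array([[1, 0], [0, 1], [1, 1]], dtype=float)
s_new = np.array([1.0, 0.0, 0.1])
print(predict_new_lncrna(s_new, S_known, A_known, k=2))  # [1.  0.5]
```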

Complexity analysis

The computational complexity of GAE-LGA consists of the computational complexity of the graph autoencoder (Equations 5 and 6) and that of the decoder (Equation 7). It is defined as follows:
(11)
where |$r_1$|, |$r_2$| and |$h$| represent the number of edges in the lncRNA similarity network, the number of edges in the PCG similarity network and the number of hidden-layer nodes of the GCN, respectively. The computational complexity of GAE-LGA thus depends mainly on the numbers of lncRNAs and PCGs in the association network. In the existing datasets, these numbers are below 1000 (Table 2), so the computational cost of GAE-LGA is acceptable.

Results

Experimental setup

We performed 10-fold cross-validation on three independent datasets, NPInter, LncTarD and LncRNA2Target, to evaluate the performance of GAE-LGA. The division of positive and negative samples in each dataset is shown in Table S3 (see Supplementary material). Four important hyperparameters, namely the number of GCN layers, the embedding size, the number of hidden-layer features and the learning rate, were determined empirically (see Parameter analysis) and set to 2, 10, 200 and 0.001, respectively. To evaluate model performance, we calculated the area under the ROC curve (AUC), the area under the precision-recall curve (AUPR), accuracy, F1-score, precision, recall and the Matthews correlation coefficient (MCC) of the prediction results.
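
The evaluation metrics listed above can be computed as in the dependency-free NumPy sketch below. AUC uses the Mann–Whitney rank statistic and AUPR is approximated by average precision (assuming no tied scores); the 0.5 decision threshold for the threshold-based metrics is an assumption, since the text does not state one. scikit-learn's `roc_auc_score`, `average_precision_score`, `f1_score` and `matthews_corrcoef` provide equivalent, better-tested implementations.

```python
import numpy as np


def roc_auc(y, s):
    """AUC via the Mann-Whitney statistic (ties between classes count as 1/2)."""
    y, s = np.asarray(y), np.asarray(s, dtype=float)
    pos, neg = s[y == 1], s[y == 0]
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg)))


def average_precision(y, s):
    """AUPR estimated as step-wise average precision over the ranked scores."""
    y = np.asarray(y)
    order = np.argsort(-np.asarray(s, dtype=float), kind="mergesort")
    y_sorted = y[order]
    precision_at_k = np.cumsum(y_sorted) / np.arange(1, len(y) + 1)
    return float(np.sum(precision_at_k * y_sorted) / y_sorted.sum())


def threshold_metrics(y, s, threshold=0.5):
    """Accuracy, precision, recall, F1-score and MCC at a fixed (assumed) threshold."""
    y = np.asarray(y)
    p = (np.asarray(s, dtype=float) >= threshold).astype(int)
    tp = int(np.sum((p == 1) & (y == 1)))
    tn = int(np.sum((p == 0) & (y == 0)))
    fp = int(np.sum((p == 1) & (y == 0)))
    fn = int(np.sum((p == 0) & (y == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": (tp + tn) / len(y), "precision": precision,
            "recall": recall, "f1": f1, "mcc": float(mcc)}


y = [1, 0, 1, 0]
scores = [0.9, 0.6, 0.3, 0.2]
print(roc_auc(y, scores), average_precision(y, scores), threshold_metrics(y, scores))
```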

We also analyzed the relationships between the lncRNAs, PCGs and lncRNA–PCG associations of the three datasets (Figure 2). Only 77 lncRNAs (Figure 2A), 68 PCGs (Figure 2B) and 30 lncRNA–PCG associations (Figure 2C) are shared by all three datasets. Furthermore, we compared model performance with and without these overlapping associations (see Figure S1, Supplementary material). Because the overlapping associations slightly improved model performance, we distributed them evenly across the training and test sets in our experiments.

Relationship between lncRNAs, protein coding genes (PCGs), and lncRNA–PCG associations among three datasets. Dataset1, Dataset2 and Dataset3 represent NPInter, LncTarD and LncRNA2Target, respectively. (A) Relationship between lncRNAs. (B) Relationship between PCGs. (C) Relationship between lncRNA–PCG associations.
Figure 2

Comparison experiments

Effectiveness comparison

To verify the effectiveness of the features learned by GCN, we compared GCN with several other representation methods, in each case using the lncRNA–PCG association reconstruction described in the Materials and methods section as the decoder. As shown in Table S4 (see Supplementary material), GCN learned the most effective lncRNA and PCG features, followed by DeepWalk and then Node2vec. Although combining GCN with DeepWalk or Node2vec can improve the feature representation, its performance is still inferior to that of GCN alone. We attribute this to the ability of GCN to capture global information about the lncRNA/PCG-related networks and thus represent node features well.

Subsequently, we compared GAE-LGA with DeepLGP[18], Convolutional Neural Network (CNN)[37], Autoencoder[38] and CNN+Autoencoder[37, 38] on the three datasets (Table 3). GAE-LGA performs best on the NPInter and LncRNA2Target datasets, outperforming both the state-of-the-art baseline and the other deep learning methods. Relative to the best competing method on each metric, GAE-LGA improves AUC, AUPR, F1-score and MCC by 5.70%, 11.92%, 0.18% and 0.06%, respectively, on NPInter, and by 1.03%, 2.56%, 0.63% and 0.54%, respectively, on LncRNA2Target. On LncTarD, DeepLGP achieves the highest AUPR and CNN+Autoencoder the highest MCC, but GAE-LGA outperforms the best competing method in AUC and F1-score, by 1.22% and 3.57%, respectively. We attribute these improvements to two factors. First, GAE-LGA combines the association information and multi-omics information of lncRNAs and PCGs, which raises the upper bound of achievable lncRNA–PCG association prediction performance. Second, GCN learns the feature representations of lncRNAs and PCGs more accurately than other representation methods, bringing the actual prediction performance closer to that upper bound.

Table 3

Comparison results of the proposed GAE-LGA and other deep learning methods on the NPInter, LncTarD and LncRNA2Target datasets under the same experimental setup

Method | NPInter AUC | NPInter AUPR | NPInter F1-score | NPInter MCC | LncTarD AUC | LncTarD AUPR | LncTarD F1-score | LncTarD MCC | LncRNA2Target AUC | LncRNA2Target AUPR | LncRNA2Target F1-score | LncRNA2Target MCC
GAE-LGA | 0.9743 | 0.6730 | 0.8468 | 0.7262 | 0.9308 | 0.6345 | 0.8831 | 0.6123 | 0.8956 | 0.5295 | 0.7787 | 0.7871
DeepLGP | 0.9218 | 0.6013 | 0.7526 | 0.7216 | 0.9196 | 0.6501 | 0.7508 | 0.6102 | 0.8657 | 0.5058 | 0.7582 | 0.7815
CNN | 0.9102 | 0.5961 | 0.8197 | 0.7135 | 0.9103 | 0.6254 | 0.8003 | 0.6025 | 0.8802 | 0.5062 | 0.7713 | 0.7658
Autoencoder | 0.9013 | 0.5936 | 0.7761 | 0.6897 | 0.9042 | 0.6188 | 0.7952 | 0.5901 | 0.8752 | 0.5108 | 0.7652 | 0.7726
Autoencoder+CNN | 0.9205 | 0.6009 | 0.8453 | 0.7258 | 0.9158 | 0.6291 | 0.8527 | 0.6136 | 0.8865 | 0.5163 | 0.7738 | 0.7829

Note: The bold value corresponds to the best performance method for each metric.
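
The relative improvements quoted in the text compare GAE-LGA with the best competing method for each dataset and metric. The short script below recomputes them from the Table 3 values as (GAE-LGA − best other) / best other × 100; negative values mark the metrics on which another method is ahead.

```python
METRICS = ("AUC", "AUPR", "F1-score", "MCC")

# Table 3 values: method -> dataset -> (AUC, AUPR, F1-score, MCC)
TABLE3 = {
    "GAE-LGA": {"NPInter": (0.9743, 0.6730, 0.8468, 0.7262),
                "LncTarD": (0.9308, 0.6345, 0.8831, 0.6123),
                "LncRNA2Target": (0.8956, 0.5295, 0.7787, 0.7871)},
    "DeepLGP": {"NPInter": (0.9218, 0.6013, 0.7526, 0.7216),
                "LncTarD": (0.9196, 0.6501, 0.7508, 0.6102),
                "LncRNA2Target": (0.8657, 0.5058, 0.7582, 0.7815)},
    "CNN": {"NPInter": (0.9102, 0.5961, 0.8197, 0.7135),
            "LncTarD": (0.9103, 0.6254, 0.8003, 0.6025),
            "LncRNA2Target": (0.8802, 0.5062, 0.7713, 0.7658)},
    "Autoencoder": {"NPInter": (0.9013, 0.5936, 0.7761, 0.6897),
                    "LncTarD": (0.9042, 0.6188, 0.7952, 0.5901),
                    "LncRNA2Target": (0.8752, 0.5108, 0.7652, 0.7726)},
    "Autoencoder+CNN": {"NPInter": (0.9205, 0.6009, 0.8453, 0.7258),
                        "LncTarD": (0.9158, 0.6291, 0.8527, 0.6136),
                        "LncRNA2Target": (0.8865, 0.5163, 0.7738, 0.7829)},
}


def relative_improvement(table, method, dataset, metric):
    """Percent change of `method` over the best competing method for one metric."""
    i = METRICS.index(metric)
    ours = table[method][dataset][i]
    best_other = max(v[dataset][i] for name, v in table.items() if name != method)
    return (ours - best_other) / best_other * 100


for dataset in ("NPInter", "LncTarD", "LncRNA2Target"):
    row = [f"{m} {relative_improvement(TABLE3, 'GAE-LGA', dataset, m):+.2f}%" for m in METRICS]
    print(dataset, ", ".join(row))
```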

Next, we evaluated the traditional machine learning methods, namely the SVM-based, LR-based and RF-based models[19], under the same experimental setup (see Table S5, Supplementary material). On all three datasets, the average performance metrics of the deep learning methods are much higher than those of the traditional machine learning methods, and the advantage is most pronounced in AUC. We attribute these improvements to three factors: (i) neural networks have strong function approximation ability, which gives them strong learning capacity; (ii) deep neural networks can represent functions more compactly and with higher sample efficiency than shallow networks; (iii) deep learning models tend to generalize well, i.e. a model with small training error also tends to have small test error. In conclusion, our model has clear advantages in predicting lncRNA–PCG associations and substantially improves prediction performance compared with existing machine learning methods.

Robustness comparison

We randomly dropped known lncRNA–PCG pairs in each dataset, retaining a fraction |$r \in \{0.8,0.85,0.9,0.95,1\}$| of them, and compared how the performance of each method changed (Figure 3). The performance of all other methods drops markedly as the number of known lncRNA–PCG pairs decreases, whereas GAE-LGA yields the most stable and the highest performance across all sample sizes. Furthermore, we used the standard deviation of AUC and AUPR across the sample-size groups to evaluate model robustness (Figure 4). The standard deviations of AUC of GAE-LGA on NPInter, LncTarD and LncRNA2Target are 0.0091, 0.0081 and 0.0077, respectively (Figure 4A), which are 7.2479, 11.3529 and 17.8180 times better than those of the best comparison method, and 8.0703, 12.5664 and 19.8222 times better than those of the worst comparison method (Figure 4C).

Comparison of model performance changes. (A) AUC changes on NPInter. (B) AUC changes on LncTarD. (C) AUC changes on LncRNA2Target. (D) AUPR changes on NPInter. (E) AUPR changes on LncTarD. (F) AUPR changes on LncRNA2Target.
Figure 3

Standard deviation analysis of model performance changes. Here, SD, Dataset1, Dataset2 and Dataset3 represent standard deviation, NPInter, LncTarD and LncRNA2Target, respectively. (A) AUC changes. (B) AUPR changes. (C) GAE-LGA improvement over other deep learning methods. AUC-I1 and AUPR-I1 represent the improvement of GAE-LGA in AUC change and AUPR change over the best deep learning method, respectively. AUC-I2 and AUPR-I2 represent the improvement of GAE-LGA in AUC change and AUPR change over the worst deep learning method, respectively.
Figure 4

The standard deviations of AUPR of GAE-LGA on NPInter, LncTarD and LncRNA2Target are 0.0282, 0.0129 and 0.0118, respectively (Figure 4B), which are 5.3156, 9.1521 and 8.5238 times better than those of the best comparison method, and 5.7036, 13.9338 and 12.4349 times better than those of the worst comparison method (Figure 4C). We attribute this improvement to the fact that GCN performs end-to-end learning on the feature information and the network structure of lncRNAs and PCGs simultaneously, and can therefore capture global information about the related networks to represent lncRNA and PCG features well. As a result, the GCN-based GAE-LGA model is robust, with a lower standard deviation of performance than the other methods.
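
The subsampling procedure and the robustness measure described above can be sketched as follows. The sketch reads r as the fraction of known pairs that is kept (so r = 1 uses the full dataset), treats dropped pairs as unknown, and uses the population standard deviation (NumPy's default, ddof = 0); the paper does not specify these details, and the helper names are hypothetical. Model training and evaluation at each ratio are omitted.

```python
import numpy as np


def retain_known_pairs(A, r, rng=None):
    """Hypothetical helper: keep a random fraction r of the known pairs (A == 1).

    Dropped pairs are set to 0, i.e. treated as unknown associations.
    """
    rng = np.random.default_rng(rng)
    A_sub = A.copy()
    pos = np.argwhere(A == 1)
    n_drop = int(round((1 - r) * len(pos)))
    drop = pos[rng.choice(len(pos), size=n_drop, replace=False)]
    A_sub[drop[:, 0], drop[:, 1]] = 0
    return A_sub


def robustness_sd(metric_values):
    """Standard deviation of a metric (e.g. AUC) across the retention ratios."""
    return float(np.std(metric_values))


A = np.zeros((10, 10), dtype=int)
A.flat[:20] = 1                       # toy matrix with 20 known pairs
for r in (0.8, 0.85, 0.9, 0.95, 1.0):
    A_r = retain_known_pairs(A, r, rng=0)
    print(r, int(A_r.sum()))          # train and evaluate a model on A_r here (omitted)
```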

Ablation experiment

To evaluate the importance of multi-omics features in lncRNA–PCG association prediction, we conducted comparative experiments on the NPInter, LncTarD and LncRNA2Target datasets using different features of lncRNAs and PCGs: functional similarity features, multi-omics similarity features and aggregated similarity features (functional similarity plus multi-omics similarity). As Table 4 shows, the functional similarity group generally outperforms the multi-omics similarity group, and the aggregated similarity group performs best on most metrics: it achieves the highest MCC on all three datasets and the highest AUC on NPInter and LncRNA2Target, although functional similarity alone gives a higher AUPR on LncRNA2Target. This suggests that lncRNAs may regulate PCG expression at the multi-omics level and that their multi-omics features help improve prediction performance.

Table 4

Comparison results of the effect of different features on model performance

Method | NPInter AUC | NPInter AUPR | NPInter F1-score | NPInter MCC | LncTarD AUC | LncTarD AUPR | LncTarD F1-score | LncTarD MCC | LncRNA2Target AUC | LncRNA2Target AUPR | LncRNA2Target F1-score | LncRNA2Target MCC
GAE+S1 | 0.9633 | 0.6538 | 0.8601 | 0.7198 | 0.9310 | 0.6209 | 0.8729 | 0.6109 | 0.8950 | 0.6012 | 0.7523 | 0.7691
GAE+S2 | 0.9231 | 0.6136 | 0.8022 | 0.7053 | 0.9022 | 0.6374 | 0.8830 | 0.5926 | 0.8901 | 0.5306 | 0.7021 | 0.7528
GAE+S1+S2 | 0.9743 | 0.6730 | 0.8468 | 0.7262 | 0.9308 | 0.6345 | 0.8831 | 0.6123 | 0.8956 | 0.5295 | 0.7787 | 0.7871

Note: The bold value corresponds to the best performance method for each metric. GAE, S1 and S2 represent graph autoencoder, functional similarity features and multi-omics similarity features, respectively.
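
To see which feature set is best for each dataset and metric, the Table 4 values can be compared programmatically, as in the short script below. It shows that the aggregated features (GAE+S1+S2) give the highest MCC on all three datasets, while single-feature variants lead on a few metrics, such as AUPR on LncRNA2Target.

```python
METRICS = ("AUC", "AUPR", "F1-score", "MCC")

# Table 4 values: feature set -> dataset -> (AUC, AUPR, F1-score, MCC)
TABLE4 = {
    "GAE+S1": {"NPInter": (0.9633, 0.6538, 0.8601, 0.7198),
               "LncTarD": (0.9310, 0.6209, 0.8729, 0.6109),
               "LncRNA2Target": (0.8950, 0.6012, 0.7523, 0.7691)},
    "GAE+S2": {"NPInter": (0.9231, 0.6136, 0.8022, 0.7053),
               "LncTarD": (0.9022, 0.6374, 0.8830, 0.5926),
               "LncRNA2Target": (0.8901, 0.5306, 0.7021, 0.7528)},
    "GAE+S1+S2": {"NPInter": (0.9743, 0.6730, 0.8468, 0.7262),
                  "LncTarD": (0.9308, 0.6345, 0.8831, 0.6123),
                  "LncRNA2Target": (0.8956, 0.5295, 0.7787, 0.7871)},
}


def best_feature_set(table, dataset, metric):
    """Name of the feature set with the highest value for one dataset and metric."""
    i = METRICS.index(metric)
    return max(table, key=lambda name: table[name][dataset][i])


for dataset in ("NPInter", "LncTarD", "LncRNA2Target"):
    print(dataset, {m: best_feature_set(TABLE4, dataset, m) for m in METRICS})
```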

Case studies

We conducted a case study of GAE-LGA prediction results using published literature that reports experimentally validated lncRNA–PCG associations (Table 5). Twelve candidate lncRNA–PCG pairs are supported by existing studies. Specifically, it has been found that CDKN2B-AS1 inhibits MMP9 expression in renal clear cell carcinoma[39], CRNDE accelerates cervical cancer progression by altering ZEB1 expression[40], H19 increases CDKN1A expression by reducing HUVEC growth[41] and MALAT1 promotes osteosarcoma development by upregulating CCND1 expression[42]. In addition, MALAT1 upregulates c-Myc expression in thymic epithelial tumors[43], upregulation of MEG3 in human osteosarcoma cells correlates with low MMP9 expression[39], MEG3 inhibits macrophage apoptosis by regulating CDKN2A expression[44] and NEAT1 induces cell differentiation in hepatoblastoma by regulating MMP9 expression[45]. Moreover, PVT1 promotes fracture healing and esophageal cancer progression by regulating the expression of HMGA2 and ZEB1, respectively[46, 47], the correlation between TUG1 and P53 affects cellular activities in non-small cell lung cancer[48] and UCA1 can antagonize cell cycle arrest by destabilizing EZH2[49].

Table 5

Case studies of identified lncRNA–PCG associations

LncRNA | PCG | Disease | Evidence
CDKN2B-AS1 | MMP9 | Renal clear cell carcinoma | PMID: 33608495
CRNDE | ZEB1 | Cervical cancer | PMID: 33469312
H19 | CDKN1A | Hypoxia-related diseases | PMID: 27063004
MALAT1 | CCND1 | Osteosarcoma | PMID: 30365098
MALAT1 | MYC | Thymic epithelial tumors | PMID: 34530916
MEG3 | MMP9 | Osteosarcoma | PMID: 29434890
MEG3 | CDKN2A | Atherosclerosis | PMID: 30672051
NEAT1 | MMP9 | Hepatoblastoma | PMID: 35300348
PVT1 | HMGA2 | Fragility fracture | PMID: 34592894
PVT1 | ZEB1 | Esophageal cancer | PMID: 33848670
TUG1 | TP53 | Non-small cell lung cancer | PMID: 24853421
UCA1 | EZH2 | Urothelial cancer | PMID: 32537408
Table 5

Case studies of identified lncRNA–PCG associations

LncRNAPCGDiseaseEvidenceLncRNAPCGDiseaseEvidence
CDKN2B-AS1MMP9renal clear cell carcinomaPMID: 33608495MEG3CDKN2AAtherosclerosisPMID: 30672051
CRNDEZEB1Cervical CancerPMID: 33469312NEAT1MMP9HepatoblastomaPMID: 35300348
H19CDKN1Ahypoxia-related diseasesPMID: 27063004PVT1HMGA2Fragility fracturePMID: 34592894
MALAT1CCND1osteosarcomaPMID: 30365098PVT1ZEB1esophageal cancerPMID: 33848670
MALAT1MYCthymic epithelial tumorsPMID: 34530916TUG1TP53non-small cell lung cancerPMID: 24853421
MEG3MMP9OsteosarcomaPMID: 29434890UCA1EZH2urothelial cancerPMID: 32537408
LncRNAPCGDiseaseEvidenceLncRNAPCGDiseaseEvidence
CDKN2B-AS1MMP9renal clear cell carcinomaPMID: 33608495MEG3CDKN2AAtherosclerosisPMID: 30672051
CRNDEZEB1Cervical CancerPMID: 33469312NEAT1MMP9HepatoblastomaPMID: 35300348
H19CDKN1Ahypoxia-related diseasesPMID: 27063004PVT1HMGA2Fragility fracturePMID: 34592894
MALAT1CCND1osteosarcomaPMID: 30365098PVT1ZEB1esophageal cancerPMID: 33848670
MALAT1MYCthymic epithelial tumorsPMID: 34530916TUG1TP53non-small cell lung cancerPMID: 24853421
MEG3MMP9OsteosarcomaPMID: 29434890UCA1EZH2urothelial cancerPMID: 32537408

Parameter analysis

GCN layer

We compared the AUCs of GCN encoders with two, three, four and five layers to determine how the number of GCN layers affects model performance (Figure 5A). The AUC varies little across these settings. However, fewer layers speed up convergence, whereas more layers make the model prone to overfitting. We therefore used two GCN layers in this study.
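To make the layer count concrete, the sketch below stacks GCN layers with the symmetric-normalized propagation rule of Kipf and Welling[36], H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W). As in Kipf and Welling's graph autoencoder, the final layer is left linear. Whether GAE-LGA uses exactly this normalization, activation and output layer is an assumption here, and the graph, features and weights are toy values. The output width of the first weight matrix corresponds to the number of hidden layer features, and the output width of the last corresponds to the embedding size.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b (columns of a must equal rows of b)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, with self-loops added to A."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, x) for x in row] for row in m]


def gcn_encoder(adj: Matrix, features: Matrix, weights: List[Matrix]) -> Matrix:
    """Apply len(weights) GCN layers: ReLU after hidden layers, linear last layer (assumed)."""
    a_norm = normalized_adjacency(adj)
    h = features
    for layer, w in enumerate(weights):
        h = matmul(matmul(a_norm, h), w)  # propagate over the graph, then transform
        if layer < len(weights) - 1:
            h = relu(h)
    return h


# Toy 3-node path graph, 2 input features, 3 hidden features, embedding size 2.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]   # 2 -> 3 (hidden layer features)
w2 = [[0.2, 0.1], [-0.4, 0.6], [0.7, 0.3]]  # 3 -> 2 (embedding size)
z = gcn_encoder(adj, x, [w1, w2])
print(len(z), len(z[0]))  # one 2-dimensional embedding per node
```

Adding a layer means appending another weight matrix. Each extra layer mixes information from one more hop of neighbors, which is one reason deeper encoders can over-smooth or overfit on small graphs.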

Figure 5

Parameter analysis. (A) Analysis of the number of GCN layers. (B) Analysis of embedding size. (C) Analysis of the number of hidden layer features. (D) Analysis of learning rate.

Embedding size

The embedding size is the length of the feature vector that the model extracts for each lncRNA or PCG. During training, we varied the embedding size over {10, 50, 100, 200} to assess its impact on the prediction performance of GAE-LGA (Figure 5B). The AUC differs little among the four settings, and an embedding size of 10 yields a slightly higher final AUC than the others. We therefore set the embedding size to 10.
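In Kipf and Welling's graph autoencoder, a pair of nodes is scored by the logistic sigmoid of the inner product of their embeddings. Under that decoder, the embedding size d is simply the length of the two vectors being multiplied. The sketch assumes this decoder for illustration only, because this section does not specify GAE-LGA's decoder, and the embeddings shown are toy values.

```python
import math
from typing import List, Sequence


def pair_score(z_lnc: Sequence[float], z_pcg: Sequence[float]) -> float:
    """Sigmoid of the inner product of an lncRNA embedding and a PCG embedding."""
    if len(z_lnc) != len(z_pcg):
        raise ValueError("embeddings must have the same size")
    dot = sum(a * b for a, b in zip(z_lnc, z_pcg))
    # Numerically stable logistic sigmoid (avoids overflow for large |dot|).
    if dot >= 0:
        return 1.0 / (1.0 + math.exp(-dot))
    e = math.exp(dot)
    return e / (1.0 + e)


def score_matrix(z_lncs: List[Sequence[float]], z_pcgs: List[Sequence[float]]) -> List[List[float]]:
    """Score every lncRNA (rows) against every PCG (columns)."""
    return [[pair_score(l, p) for p in z_pcgs] for l in z_lncs]


# Toy embeddings of size 10 for two lncRNAs and three PCGs.
lnc_emb = [[0.1 * i for i in range(10)], [0.0] * 10]
pcg_emb = [[0.05] * 10, [-0.05] * 10, [0.0] * 10]
for row in score_matrix(lnc_emb, pcg_emb):
    print([round(s, 3) for s in row])
```

Smaller embeddings mean fewer decoder inputs per pair. Whether that helps depends on the data, which is what the sweep in Figure 5B measures.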

Number of hidden layer features

The number of hidden layer features in the GCN encoder is an important parameter that can affect the performance of GAE-LGA in lncRNA–PCG association prediction. We varied this number over {10, 50, 100, 200} (Figure 5C). GAE-LGA performs best when it is set to 200, so we used 200 hidden layer features.
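Each of these sweeps keeps the setting with the highest AUC. The sketch below computes the AUC with the rank-based (Mann–Whitney) formulation, which gives the probability that a randomly chosen positive pair outranks a randomly chosen negative pair, with ties counting one half. It then picks the best value of a sweep. The labels, scores and AUC values are toy data, not the results in Figure 5, and the training run that would produce them is not shown.

```python
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly, ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:          # O(|pos| * |neg|); fine for a sketch, slow for large test sets
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_setting(auc_by_value: Dict[int, float]) -> int:
    """Return the hyperparameter value with the highest AUC."""
    return max(auc_by_value, key=lambda value: auc_by_value[value])


print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
# Hypothetical AUCs for hidden-layer sizes (not the values plotted in Figure 5C).
print(best_setting({10: 0.91, 50: 0.93, 100: 0.92, 200: 0.95}))
```

The same selection applies unchanged to the embedding-size and learning-rate sweeps. Only the keys of the dictionary change.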

Learning rate

As a hyperparameter, the learning rate determines whether and how quickly the objective function converges to a minimum. We varied the learning rate over {0.1, 0.01, 0.001, 0.0001} (Figure 5D). The model performs best with a learning rate of 0.001. With a learning rate of 0.1 or 0.01, training is prone to exploding gradients, the loss oscillates with a large amplitude and the model has difficulty converging. With a learning rate of 0.0001, the model becomes trapped in a local optimum and cannot reach optimal performance. We therefore set the learning rate to 0.001.
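The instability at large learning rates and the slow progress at small ones reflect a general property of gradient descent. On a quadratic loss with curvature k, a step size lr is stable only if 0 < lr < 2/k. The toy below is not GAE-LGA's loss. It uses k = 250, chosen arbitrarily so that the stability limit of 0.008 falls inside the grid, and runs 50 steps of plain gradient descent at each of the four learning rates. The toy illustrates step-size stability only. A convex quadratic has no local optima, so it does not reproduce the local-optimum behavior reported for 0.0001.

```python
from typing import List


def gd_losses(lr: float, curvature: float, w0: float = 1.0, steps: int = 50) -> List[float]:
    """Gradient descent on the toy loss f(w) = 0.5 * curvature * w**2; returns the loss per step."""
    w = w0
    losses = []
    for _ in range(steps):
        grad = curvature * w   # f'(w)
        w = w - lr * grad      # plain gradient-descent update: w *= (1 - lr * curvature)
        losses.append(0.5 * curvature * w * w)
    return losses


CURVATURE = 250.0  # arbitrary toy value; stable only for lr < 2 / CURVATURE = 0.008
initial_loss = 0.5 * CURVATURE * 1.0 ** 2

for lr in (0.1, 0.01, 0.001, 0.0001):
    final = gd_losses(lr, CURVATURE)[-1]
    status = "diverged" if final > initial_loss else "decreased"
    print(f"lr={lr:<7} final loss={final:.3e} ({status})")
```

The two larger rates diverge because each update overshoots the minimum by more than it corrects. At 0.001 the loss collapses toward zero, and at 0.0001 it is still far from zero after 50 steps.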

Discussion

Despite the importance of lncRNA–PCG associations for dissecting the pathogenic mechanisms of lncRNAs, methods for identifying these associations remain limited. Apart from some sequence-based and expression-based methods and two recently proposed machine learning-based methods, little effort has been made to identify lncRNA–PCG associations at scale. In this study, motivated by recent progress on the synergistic multi-omics regulation of lncRNAs and PCGs, we examined the mechanisms of action of lncRNAs and PCGs from a multi-omics perspective and found that their multi-omics features are related to lncRNA–PCG associations. Based on this finding, we designed a new computational model, GAE-LGA, that predicts lncRNA–PCG associations using graph autoencoders combined with multi-omics information of genes. Comparisons of performance and robustness between GAE-LGA and other deep learning methods on three gold standard datasets showed that GAE-LGA is an effective lncRNA–PCG association prediction model.

GAE-LGA can provide meaningful insights for future studies on the regulation of PCG expression by lncRNAs. Traditional sequence-based methods for predicting lncRNA–PCG associations focus on the binding sites between an lncRNA and its target PCGs. However, free-energy calculations may have a high error rate, and a base-pairing strategy can identify only direct physical interactions between lncRNAs and PCGs. Expression-based methods instead analyze the relationship between lncRNAs and PCGs using their expression profiles. In contrast, GAE-LGA predicts lncRNA–PCG associations through lncRNA–PCG learning and cross-omics correlation learning. Two learned similarities, functional similarity and multi-omics similarity, are accumulated and encoded by graph autoencoders to obtain embeddings of lncRNAs and PCGs. By combining and decoding these embeddings, GAE-LGA can score the association of every lncRNA–PCG pair, which gives it broad applicability: (i) it fuses multi-omics information into the similarity network for efficient feature representation, providing a new approach for lncRNA-related analysis; (ii) more importantly, it identifies potential lncRNA–PCG associations with high accuracy and without expression specificity, helping to uncover how lncRNAs regulate PCGs and to guide the treatment of diseases.
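The text says the functional and multi-omics similarities are accumulated before encoding but does not spell out the operation in this section. A minimal sketch follows. It assumes that accumulation means an element-wise average of same-shaped similarity matrices, that is, a sum rescaled so that values stay in [0, 1] when the inputs do. The exact fusion rule in GAE-LGA may differ and should be checked against the released source code. The matrices below are toy values.

```python
from typing import List

Matrix = List[List[float]]


def accumulate_similarities(*sims: Matrix) -> Matrix:
    """Element-wise average of several same-shaped similarity matrices (assumed fusion rule)."""
    if not sims:
        raise ValueError("at least one similarity matrix is required")
    n_rows, n_cols = len(sims[0]), len(sims[0][0])
    for s in sims:
        if len(s) != n_rows or any(len(row) != n_cols for row in s):
            raise ValueError("similarity matrices must share the same shape")
    return [[sum(s[i][j] for s in sims) / len(sims) for j in range(n_cols)]
            for i in range(n_rows)]


# Toy 2 x 2 similarities between two lncRNAs.
functional = [[1.0, 0.2], [0.2, 1.0]]
multi_omics = [[1.0, 0.6], [0.6, 1.0]]
print(accumulate_similarities(functional, multi_omics))
```

The fused matrix can then serve as the weighted graph that a GCN encoder propagates over. A weighted sum would be a natural variant if one similarity source is considered more reliable than the other.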

GAE-LGA can provide preliminary information for two of our future studies: lncRNA–protein association prediction and prediction of the PCG-binding capacity of lncRNAs. On the one hand, lncRNAs have been reported to drive RNA–protein binding by promoting PCG expression[50, 51]. Because GAE-LGA identifies lncRNA–PCG associations, lncRNA–protein associations could be inferred by combining predicted lncRNA–PCG associations with lncRNA-driven PCG–protein associations. On the other hand, lncRNAs may compete for binding to PCGs to exert their regulatory functions[52], and their binding capacities are unequal and difficult to predict. GAE-LGA can be used to measure how competitively different lncRNAs regulate a specific PCG. It outputs an association score for each lncRNA–PCG pair, and associations with larger scores could be considered more biologically significant, suggesting that the corresponding lncRNAs may preferentially regulate that PCG.
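The competitiveness analysis described above reduces to sorting, for one PCG, all lncRNAs by their predicted association score. The sketch below illustrates that ranking. The identifiers (`LNC_A`, `PCG_X` and so on) and the scores are hypothetical, and the dictionary of pair scores stands in for whatever output format the released code actually produces.

```python
from typing import Dict, List, Tuple


def rank_regulators(scores: Dict[Tuple[str, str], float], pcg: str,
                    top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank lncRNAs by predicted association score with one PCG (higher = stronger candidate)."""
    candidates = [(lnc, s) for (lnc, target), s in scores.items() if target == pcg]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates[:top_k]


# Hypothetical (lncRNA, PCG) -> score mapping.
scores = {
    ("LNC_A", "PCG_X"): 0.91,
    ("LNC_B", "PCG_X"): 0.35,
    ("LNC_C", "PCG_X"): 0.78,
    ("LNC_A", "PCG_Y"): 0.60,
}
print(rank_regulators(scores, "PCG_X", top_k=2))
```

Scores ranked this way are relative. They suggest which lncRNA is more likely to dominate regulation of a PCG, but they are not calibrated binding affinities.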

Despite these advantages, GAE-LGA has several limitations. Although it is more robust than other machine learning methods, it may still suffer from computational biases caused by imbalanced training samples. Well-studied lncRNAs and PCGs tend to receive better predictions because they have more connections in the known lncRNA–PCG association network. In addition, because multi-omics features contribute to the synergistic regulation captured by the model, the predictive performance of GAE-LGA for new lncRNAs with unknown multi-omics characteristics is lower than for lncRNAs with known multi-omics characteristics.

Key Points
  • This study proposed a graph autoencoder-based deep learning model, GAE-LGA, to identify lncRNA-related PCGs.

  • GAE-LGA jointly explored lncRNA–PCG learning and cross-omics correlation learning to make feature representations more accurate.

  • GAE-LGA can successfully capture lncRNA–PCG associations with strong robustness and outperformed other machine learning-based identification methods.

Availability and implementation

The source code of GAE-LGA is available at: https://github.com/meihonggao/GAE-LGA.

Funding

This work has been supported by the National Natural Science Foundation of China [NO. 61772426, U1811262].

Author Biographies

Meihong Gao is a PhD student in the School of Computer Science and Engineering at Northwestern Polytechnical University, Xi’an, China. Her research interests include computational biology and machine learning.

Shuhui Liu is a PhD student in the School of Computer Science and Engineering at Northwestern Polytechnical University, Xi’an, China. Her research interests include computational biology and machine learning.

Yang Qi is a PhD student in the School of Computer Science and Engineering at Northwestern Polytechnical University, Xi’an, China. Her research interests include bioinformatics and machine learning.

Xinpeng Guo is a PhD student in the School of Computer Science and Engineering at Northwestern Polytechnical University, Xi’an, China. His research interests include bioinformatics and machine learning.

Xuequn Shang is a professor in the School of Computer Science and Engineering at Northwestern Polytechnical University, Xi’an, China. Her research interests include data mining and bioinformatics.

References

1. Statello L, Guo C-J, Chen L-L, et al. Gene regulation by long non-coding RNAs and its biological functions. Nat Rev Mol Cell Biol 2021;22(2):96–118.

2. Palazzo AF, Koonin EV. Functional long non-coding RNAs evolve from junk transcripts. Cell 2020;183(5):1151–61.

3. Yao R-W, Wang Y, Chen L-L. Cellular functions of long noncoding RNAs. Nat Cell Biol 2019;21(5):542–51.

4. Frankish A, Diekhans M, Jungreis I, et al. GENCODE 2021. Nucleic Acids Res 2021;49(D1):D916–23.

5. Uszczynska-Ratajczak B, Lagarde J, Frankish A, et al. Towards a complete map of the human long non-coding RNA transcriptome. Nat Rev Genet 2018;19(9):535–48.

6. Gao M, Liu S, Yang Q, et al. ImReLnc: identifying immune-related lncRNA characteristics in human cancers based on heuristic correlation optimization. Front Genet 2021;12. https://doi.org/10.3389/fgene.2021.792541.

7. Senft AD, Macfarlan TS. Transposable elements shape the evolution of mammalian development. Nat Rev Genet 2021;22(11):691–711.

8. Zhao H, Shi J, Zhang Y, et al. LncTarD: a manually-curated database of experimentally-supported functional lncRNA-target regulations in human diseases. Nucleic Acids Res 2020;48(D1):D118–26.

9. Cheng L, Wang P, Tian R, et al. LncRNA2Target v2.0: a comprehensive database for target genes of lncRNAs in human and mouse. Nucleic Acids Res 2019;47(D1):D140–4.

10. Teng X, Chen X, Xue H, et al. NPInter v4.0: an integrated database of ncRNA interactions. Nucleic Acids Res 2020;48(D1):D160–5.

11. Chang K-C, Diermeier SD, Allen TY, et al. MaTAR25 lncRNA regulates the Tensin1 gene to impact breast cancer progression. Nat Commun 2020;11(1):1–19.

12. Fukunaga T, Hamada M. RIblast: an ultrafast RNA–RNA interaction prediction system based on a seed-and-extension approach. Bioinformatics 2017;33(17):2666–74.

13. Gawronski AR, Uhl M, Zhang Y, et al. MechRNA: prediction of lncRNA mechanisms from RNA–RNA and RNA–protein interactions. Bioinformatics 2018;34(18):3101–10.

14. Mann M, Wright PR, Backofen R. IntaRNA 2.0: enhanced and customizable prediction of RNA–RNA interactions. Nucleic Acids Res 2017;45(W1):W435–9.

15. Liao Q, Liu C, Yuan X, et al. Large-scale prediction of long non-coding RNA functions in a coding–non-coding gene co-expression network. Nucleic Acids Res 2011;39(9):3864–78.

16. Zhang J, Le TD, Liu L, et al. Inferring and analyzing module-specific lncRNA–mRNA causal regulatory networks in human cancer. Brief Bioinform 2019;20(4):1403–19.

17. Li Z, Liu L, Jiang S, et al. LncExpDB: an expression database of human long non-coding RNAs. Nucleic Acids Res 2021;49(D1):D962–8.

18. Zhao T, Yang H, Peng J, et al. DeepLGP: a novel deep learning method for prioritizing lncRNA target genes. Bioinformatics 2020;36(16):4466–72.

19. Zhang Y, Yi T, Ji H, et al. Designing a general method for predicting the regulatory relationships between long noncoding RNAs and protein-coding genes based on multi-omics characteristics. Bioinformatics 2020;36(7):2025–32.

20. Chen X. KATZLDA: Katz measure for the lncRNA-disease association prediction. Sci Rep 2015;5(1):1–11.

21. Zeng M, Chengqian L, Zhang F, et al. SDLDA: lncRNA-disease association prediction based on singular value decomposition and deep learning. Methods 2020;179:73–80.

22. Zhang Z-C, Zhang X-F, Min W, et al. A graph regularized generalized matrix factorization model for predicting links in biomedical bipartite networks. Bioinformatics 2020;36(11):3474–81.

23. Xiao Q, Luo J, Liang C, et al. A graph regularized non-negative matrix factorization method for identifying microRNA-disease associations. Bioinformatics 2018;34(2):239–48.

24. Chen Q, Lai D, Lan W, et al. ILDMSF: inferring associations between long non-coding RNA and disease based on multi-similarity fusion. IEEE/ACM Trans Comput Biol Bioinform 2019;18(3):1106–12.

25. Jingwen Y, Ping P, Wang L, et al. A novel probability model for lncRNA–disease association prediction based on the naïve Bayesian classifier. Genes 2018;9(7):345.

26. Xuan P, Gong Z, Cui H, et al. Fully connected autoencoder and convolutional neural network with attention-based method for inferring disease-related lncRNAs. Brief Bioinform 2022;23(3):bbac089.

27. Sheng N, Huang L, Wang Y, et al. Multi-channel graph attention autoencoders for disease-related lncRNAs prediction. Brief Bioinform 2022;23(2):bbab604.

28. Lan W, Wu X, Chen Q, et al. GANLDA: graph attention network for lncRNA-disease associations prediction. Neurocomputing 2022;469:384–93.

29. Liu H, Ren G, Chen H, et al. Predicting lncRNA–miRNA interactions based on logistic matrix factorization with neighborhood regularized. Knowledge-Based Systems 2020;191:105261.

30. Yang L, Li L-P, Yi H-C. DeepWalk based method to predict lncRNA-miRNA associations via lncRNA-miRNA-disease-protein-drug graph. BMC Bioinformatics 2022;22(12):1–16.

31. Yang S, Yan Wang Y, Lin DS, et al. LncMirNet: predicting lncRNA–miRNA interaction based on deep learning of ribonucleic acid sequences. Molecules 2020;25(19):4372.

32. Wekesa JS, Luan Y, Chen M, et al. A hybrid prediction method for plant lncRNA-protein interaction. Cell 2019;8(6):521.

33. Zhao Q, Zhang Y, Huan H, et al. IRWNRLPI: integrating random walk and neighborhood regularized logistic matrix factorization for lncRNA-protein interaction prediction. Front Genet 2018;9:239.

34. Wekesa JS, Meng J, Luan Y. Multi-feature fusion for deep learning to predict plant lncRNA-protein interaction. Genomics 2020;112(5):2928–36.

35. Ma Y, He T, Jiang X. Projection-based neighborhood non-negative matrix factorization for lncRNA-protein interaction prediction. Front Genet 2019;10:1148.

36. Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. 2016.

37. Wang L, You Z-H, Huang Y-A, et al. An efficient approach based on multi-sources information to predict circRNA–disease associations using deep convolutional neural network. Bioinformatics 2020;36(13):4038–46.

38. Ji C, Zhen Gao X, Ma QW, et al. AEMDA: inferring miRNA–disease associations based on deep autoencoder. Bioinformatics 2021;37(1):66–72.

39. Shi Y, Lv C, Shi L, et al. MEG3 inhibits proliferation and invasion and promotes apoptosis of human osteosarcoma cells. Oncol Lett 2018;15(2):1917–23.

40. Ren L, Yang S, Cao Q, et al. CRNDE contributes cervical cancer progression by regulating miR-4262/ZEB1 axis. Onco Targets Ther 2021;14:355.

41. Voellenkle C, Garcia-Manteiga JM, Pedrotti S, et al. Implication of long noncoding RNAs in the endothelial cell response to hypoxia revealed by RNA-sequencing. Sci Rep 2016;6(1):1–13.

42. Duan G, Zhang C, Changke X, et al. Knockdown of MALAT1 inhibits osteosarcoma progression via regulating the miR-34a/cyclin D1 axis. Int J Oncol 2019;54(1):17–28.

43. Iaiza A, Tito C, Ianniello Z, et al. METTL3-dependent MALAT1 delocalization drives c-Myc induction in thymic epithelial tumors. Clin Epigenetics 2021;13(1):1–15.

44. Yan L, Liu Z, Yin H, et al. Silencing of MEG3 inhibited ox-LDL-induced inflammation and apoptosis in macrophages via modulation of the MEG3/miR-204/CDKN2A regulatory axis. Cell Biol Int 2019;43(4):409–20.

45. Hu Y, Zai H, Jiang W, et al. Hepatoblastoma-derived exosomal lncRNA NEAT1 induces BMSCs differentiation into tumor-supporting myofibroblasts via modulating the miR-132/MMP9 axis. J Oncol 2022;2022. https://doi.org/10.1155/2022/7630698.

46. Ji X, Li Z, Wang W, et al. Downregulation of long non-coding RNA PVT1 enhances fracture healing via regulating microRNA-497-5p/HMGA2 axis. Bioengineered 2021;12(1):8125–34.

47. Jing H, Gao W. Long noncoding RNA PVT1 promotes tumour progression via the miR-128/ZEB1 axis and predicts poor prognosis in esophageal cancer. Clin Res Hepatol Gastroenterol 2021;45(4):101701.

48. Zhang EB, Yin DD, Sun M, et al. P53-regulated long non-coding RNA TUG1 affects cell proliferation in human non-small cell lung cancer, partly through epigenetically regulating HOXB7 expression. Cell Death Dis 2014;5(5):e1243.

49. Dong Z, Gao M, Li C, et al. LncRNA UCA1 antagonizes arsenic-induced cell cycle arrest through destabilizing EZH2 and facilitating NFATc2 expression. Advanced Science 2020;7(11):1903630.

50. Munschauer M, Nguyen CT, Sirokman K, et al. The NORAD lncRNA assembles a topoisomerase complex critical for genome stability. Nature 2018;561(7721):132–6.

51. Hentze MW, Castello A, Schwarzl T, et al. A brave new world of RNA-binding proteins. Nat Rev Mol Cell Biol 2018;19(5):327–41.

52. He J, Qiaozhu Z, Hu BO, et al. A novel, liver-specific long noncoding RNA LINC01093 suppresses HCC progression by interaction with IGF2BP1 to facilitate decay of GLI1 mRNA. Cancer Lett 2019;450:98–109.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

Supplementary data