Kai Zheng, Xin-Lu Zhang, Lei Wang, Zhu-Hong You, Zhao-Hui Zhan, Hao-Yuan Li, Line graph attention networks for predicting disease-associated Piwi-interacting RNAs, Briefings in Bioinformatics, Volume 23, Issue 6, November 2022, bbac393, https://doi.org/10.1093/bib/bbac393
Abstract
PIWI proteins and Piwi-interacting RNAs (piRNAs) are commonly detected in human cancers, especially in germline and somatic tissues, and correlate with poorer clinical outcomes, suggesting that they play a functional role in cancer. As the combinatorial explosion of candidate ncRNA-disease pairs becomes increasingly apparent, new bioinformatics methods for large-scale identification and prioritization of potential associations are of interest. In the real world, however, the network of interactions between molecules is enormously intricate and noisy, which makes efficient graph mining difficult. Line graphs can extend many heterogeneous networks to replace dichotomous networks. In this study, we present a new graph neural network framework, line graph attention networks (LGAT), and apply it to piRNA-disease association prediction in a model called GAPDA. In our experiments, GAPDA performs excellently in 5-fold cross-validation with an AUC of 0.9038 and, moreover, outperforms methods based on collaborative filtering and attribute features. The experimental results demonstrate the promise of graph neural networks for such problems and suggest that GAPDA can be an excellent complement to future biomedical research.
Introduction
Piwi-interacting RNA (piRNA) is a small, non-coding RNA, typically 24–32 nucleotides in length, that clusters at transposon loci in the genome [1–5]. piRNA interacts exclusively with PIWI proteins, which belong to the germline-specific subclade of the Argonaute family [6–10]. Depletion of PIWI, and with it piRNA function, results in a sharp increase in transposon messenger RNA expression. Thus, its best-known role is to repress transposons and maintain germline genome integrity through DNA methylation [11, 12]. Compared with other small RNAs such as microRNA (miRNA) and small interfering RNA (siRNA), piRNA has the following characteristics: (i) it is longer than miRNA or siRNA; (ii) it exists only in animals; (iii) it has more diverse sequences and constitutes the largest class of non-coding RNA and (iv) it is testis-specific [6, 12–15].
Recently, emerging evidence has suggested that piRNA and PIWI proteins are abnormally expressed in various cancers [16–21]. The function and potential mechanism of piRNA in cancer have therefore become an important research direction in tumor diagnosis and treatment. For example, in 2015 Fu et al. [22] found that aberrant expression of piR-021285 promoted methylation of ARHGAP11A at the 5'-UTR/first exon CpG site, thereby attenuating its mRNA expression and inhibiting apoptosis of breast cancer cells. Subsequently, Tan et al. [23] found that downregulation of piRNA-36712 in breast cancer increases SLUG levels while reducing P21 and E-cadherin levels, thereby promoting the malignant phenotype of cancer. In addition, Liu et al. [24] discovered in 2017 that piR-30188 binds to OIP5-AS1 to inhibit glioma cell progression, whereas low expression of OIP5-AS1 reduces CEBPA levels and promotes the malignant phenotype of glioma cells. Also in glioblastoma, Jacobs et al. [25] found that piR-8041 can inhibit expression of the tumor cell marker ALCAM/CD166, pointing to a potential role in targeted therapy. piRNA is also directly or indirectly involved in the formation of liver cancer: in 2016, Rizzo et al. [26] found that hsa_piR_013306 accumulates only in hepatocellular carcinomas.
piRNA is gaining enormous attention: tens of thousands of piRNAs have been identified in mammals and are rapidly accumulating. To accelerate research in this field and provide access to piRNA data and annotations, multiple databases, such as piRNABank [27], piRBase [28] and piRNAQuest [29], have been successively established. Subsequently, the roles of piRNA and PIWI proteins in the epigenetics of cancer have been continually uncovered, and some of them can serve as novel biomarkers and therapeutic targets. Taking this as an opportunity, an experimentally supported piRNA-disease association database called piRDisease [30] was built, which made it possible to predict potential associations on a large scale. Although many disease-related ncRNA prediction models have been proposed and gradually developed, predictors for disease-related piRNA remain relatively unexplored [31–34].
In this paper, we propose a piRNA-disease association predictor based on a line graph attention network, called GAPDA. This study makes two main contributions: (i) a new graph neural network framework, the line graph attention network, is proposed that can extend many heterogeneous networks to replace dichotomous networks; (ii) different from traditional collaborative filtering and attribute-based methods, the proposed method integrates disease semantic information and piRNA sequence information, which improves prediction accuracy and gives higher coverage. On the piRDisease association dataset, GAPDA achieves an area under the receiver operating characteristic (ROC) curve (AUC) of 0.9038 with an accuracy of 85.69%. Overall, this method can provide new ideas for cancer mechanism research and indicates a good application prospect for attention-based graph neural networks on this kind of problem. We also hope that this work will encourage more research on association prediction based on graph neural networks. Our source code and data can be downloaded from GitHub (https://github.com/kaizheng-academic/GAPDA/tree/main).
Methods
Dataset
The construction of new piRNA-disease association networks
The construction of the line graph
The attribute features of nodes


The network features of nodes
Graph attention layer
Line graph attention networks
In this paper, we propose a line graph attention network, applied in a model called GAPDA, to predict biologically significant but as yet unmapped piRNA-disease associations. The main idea of GAPDA is to treat each association as a node and to aggregate the properties of neighboring nodes with graph attention networks to compute the hidden state of each association. As shown in Figure 3B and C, the prediction process of the proposed computational model is separated into three major steps: preprocessing the data, training the model and scoring the potential associations.
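As a concrete illustration of the line-graph idea, the sketch below builds, from a list of piRNA-disease association pairs, a graph whose nodes are the associations themselves and whose edges connect two associations that share a piRNA or a disease. This is a minimal sketch of the standard line-graph construction for a bipartite association network; the function name and the plain adjacency-matrix output are illustrative, not taken from the GAPDA code, and the actual model additionally attaches attribute and network features to each node as described in the Methods.

```python
import numpy as np

def line_graph_adjacency(associations):
    """Build the line-graph adjacency matrix for a list of (piRNA, disease) pairs.

    Each known association becomes a node; two nodes are connected when the
    underlying associations share either the piRNA or the disease.
    """
    n = len(associations)
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            p_i, d_i = associations[i]
            p_j, d_j = associations[j]
            if p_i == p_j or d_i == d_j:   # shared endpoint -> adjacent in the line graph
                adj[i, j] = adj[j, i] = 1
    return adj

# Toy example: three associations, two of which share a disease.
pairs = [("DQ577854", "Breast cancer"),
         ("DQ580140", "Gastric cancer"),
         ("DQ571174", "Gastric cancer")]
print(line_graph_adjacency(pairs))
```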

The flowchart of GAPDA for predicting piRNA-disease association.
Then, the model is trained with the calculated final descriptors. The details are as follows: (i) the node (association) feature F is linearly mapped by the shared weight matrix W to obtain an enhanced high-dimensional feature; (ii) the high-dimensional features of the node and of each neighboring node (association) are concatenated; (iii) a single-layer feed-forward neural network maps the concatenated high-dimensional features to a real number that quantifies the correlation between associations; (iv) softmax is applied to obtain the attention coefficients and (v) based on the calculated attention coefficients, the features of the association and its neighboring associations are fused into a new association representation. In particular, the linear feature mapping changes the feature dimension but not the number of nodes (associations).
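To make steps (i)–(v) concrete, the following is a minimal NumPy sketch of a single graph attention layer operating on the line-graph adjacency matrix. It follows the standard graph attention formulation of Velickovic et al.; the LeakyReLU before the softmax, the function and variable names and the single attention head are illustrative assumptions rather than the exact GAPDA implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def graph_attention_layer(F, adj, W, a, alpha=0.2):
    """One attention head over the line graph.

    F   : (n, f_in)     node (association) features
    adj : (n, n)        line-graph adjacency matrix (1 = neighboring association)
    W   : (f_in, f_out) shared weight matrix, step (i)
    a   : (2 * f_out,)  weights of the single-layer feed-forward network, step (iii)
    """
    H = F @ W                                   # (i) shared linear mapping
    n = H.shape[0]
    out = np.zeros_like(H)
    for i in range(n):
        neigh = np.where(adj[i] > 0)[0]
        neigh = np.append(neigh, i)             # include the node itself
        # (ii)+(iii) concatenate feature pairs and score them with a single-layer network
        scores = np.array([np.concatenate([H[i], H[j]]) @ a for j in neigh])
        scores = np.where(scores > 0, scores, alpha * scores)   # LeakyReLU (standard GAT, assumed)
        att = softmax(scores)                   # (iv) attention coefficients
        out[i] = att @ H[neigh]                 # (v) attention-weighted fusion of neighbors
    return out

# Toy usage with random features on a 3-node line graph.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
F = rng.normal(size=(3, 4))
W = rng.normal(size=(4, 8))
a = rng.normal(size=(16,))
print(graph_attention_layer(F, adj, W, a).shape)   # -> (3, 8)
```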
Finally, the data required for the potential association is fed into the trained model to obtain the prediction score.
The hyperparameter settings in this study are as follows: the output size of the first graph attention layer is 8; the number of attention heads in the first GAT layer is 8; the dropout rate is 0.02; the factor for L2 regularization is 1.25e-4 and the learning rate is 5e-2. In addition, a visualization of the neural network architecture can be seen in Figure 3B. It is worth noting that the line graph to be constructed differs depending on the association being predicted. Therefore, predicting a specific potential association requires retraining the model (~5800 parameters, 100 epochs, ~20 min).
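The framework used for training is not specified in the text above, so purely for illustration, the stated hyperparameters could be wired into a two-layer attention model as in the following PyTorch Geometric sketch; the second layer's size, the sigmoid scoring head and the use of GATConv are assumptions, not the released GAPDA code.

```python
import torch
from torch_geometric.nn import GATConv

class TwoLayerGAT(torch.nn.Module):
    def __init__(self, in_dim, hidden=8, heads=8, dropout=0.02):
        super().__init__()
        # First GAT layer: output size 8 per head, 8 heads, dropout 0.02 (as reported above).
        self.gat1 = GATConv(in_dim, hidden, heads=heads, dropout=dropout)
        # Second layer collapses to a single score per association node (assumed).
        self.gat2 = GATConv(hidden * heads, 1, heads=1, concat=False, dropout=dropout)

    def forward(self, x, edge_index):
        x = torch.nn.functional.elu(self.gat1(x, edge_index))
        return torch.sigmoid(self.gat2(x, edge_index))

model = TwoLayerGAT(in_dim=64)  # in_dim is a placeholder for the association feature size
# Learning rate 5e-2 and L2 factor 1.25e-4 as stated in the text.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-2, weight_decay=1.25e-4)
```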
Experimental results
The performance of GAPDA on the benchmark dataset
In this part, we use 5-fold cross-validation to evaluate the performance of GAPDA on the benchmark dataset; the results for each fold are listed below.

(A) ROC curves performed by GAPDA on the benchmark dataset. (B) PR curves performed by GAPDA on the benchmark dataset.
| Testing set | Accuracy | Sensitivity | Precision | Specificity | F1-score |
| --- | --- | --- | --- | --- | --- |
| 1 | 0.8642 | 0.8391 | 0.9012 | 0.8933 | 0.8690 |
| 2 | 0.8395 | 0.8022 | 0.9012 | 0.8873 | 0.8488 |
| 3 | 0.8636 | 0.8729 | 0.8512 | 0.8548 | 0.8619 |
| 4 | 0.8554 | 0.9095 | 0.7893 | 0.8139 | 0.8451 |
| 5 | 0.8616 | 0.8514 | 0.8760 | 0.8723 | 0.8635 |
| Average | 0.8569 ± 0.0092 | 0.8550 ± 0.0356 | 0.8638 ± 0.0416 | 0.8619 ± 0.0318 | 0.8577 ± 0.0091 |
In addition, we collected an additional test set, independent of the cross-validation process, to demonstrate the applicability and performance of the model in a 'real' use case. We manually collected 22 experimentally validated piRNA-disease associations as an independent test set. Specifically, we used all data from the cross-validation process as the training set and tested the trained model on the independent test set. The proposed model obtained an accuracy of 86.36% on the independent test set (Table 2). The results indicate that GAPDA has practical application value.
| No. | piRNA | Disease | References (PMID) | Predictions |
| --- | --- | --- | --- | --- |
| 1 | DQ577854 | Breast cancer | 27177224 | |
| 2 | DQ588919 | Dysplastic liver nodules and hepatocellular carcinoma | 27429044 | |
| 3 | DQ580140 | Gastric cancer | 25779424 | |
| 4 | DQ571174 | Gastric cancer | 25779424 | |
| 5 | DQ585247 | Cardiovascular diseases (CDC, CF, CCS), cardiac regeneration | 28289238 | |
| 6 | DQ590749 | Gastric cancer | 25779424 | |
| 7 | DQ571873 | Dysplastic liver nodules and hepatocellular carcinoma | 29789629 | |
| 8 | DQ591522 | Gastric cancer | 25779424 | |
| 9 | DQ593431 | Alzheimer disease | 28127595 | |
| 10 | DQ595023 | Alzheimer disease | 28127595 | |
| 11 | DQ596276 | Cardiovascular diseases (CDC, CF, CCS), cardiac regeneration | 28289238 | |
| 12 | DQ596465 | Cardiovascular diseases (CDC, CF, CCS), cardiac regeneration | 28289238 | |
| 13 | DQ596465 | Dysplastic liver nodules and hepatocellular carcinoma | 29789629 | |
| 14 | DQ597397 | Renal cell carcinoma | 25998508 | |
| 15 | DQ597945 | Gastric cancer | 25779424 | |
| 16 | DQ597997 | Dysplastic liver nodules and hepatocellular carcinoma | 29789629 | |
| 17 | DQ600116 | Gastric cancer | 25779424 | |
| 18 | DQ600269 | Gastric cancer | 25779424 | |
| 19 | DQ600689 | Gastric cancer | 25779424 | |
| 20 | DQ574391 | Gastric cancer | 25779424 | |
| 21 | DQ576200 | Alzheimer disease | 28127595 | |
| 22 | DQ570485 | Gastric cancer | 25779424 | |
Ablation experiment
To better evaluate the performance of the proposed method, we compare it with two methods that use only attribute information or only network information. The results are shown in Table 3. The evaluation indicators of GAPDA are higher than those of the two traditional methods, especially the accuracy, and the remaining metrics are likewise above the average performance. The attention-based approach therefore performs better than traditional attribute-based and collaborative filtering-based approaches. There are several reasons for the superior performance of GAPDA. First, the two traditional methods consider only attribute information or network information and do not combine the two sources of heterogeneous knowledge, whereas the proposed method combines four kinds of information into an attribute network that can effectively quantify the characteristics of an association. Second, the introduction of the attention mechanism allows the hidden representation of a node to be computed from the behavior of its neighbors, which effectively improves the performance of the model. Third, the new abstract network topology we built also helps to improve performance. In the real world, networks are often heterogeneous; this method abstracts existing networks into adjacency matrices of uniform size, which facilitates the fusion of heterogeneous networks.
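As a rough illustration of how attribute and network information can be combined per association node before the attention layers, the sketch below simply concatenates piRNA sequence features, disease semantic features and a network-derived feature vector for each association; the feature sources, dimensions and helper names are hypothetical placeholders, not the exact four information sources used in GAPDA.

```python
import numpy as np

def fuse_association_features(pirna_feat, disease_feat, net_feat, pairs, index_p, index_d):
    """Concatenate attribute and network features for each (piRNA, disease) association node.

    pirna_feat   : (n_pirna, d_p)   piRNA sequence-derived features
    disease_feat : (n_disease, d_d) disease semantic features
    net_feat     : (n_assoc, d_n)   network-derived features, one row per association
    pairs        : list of (piRNA id, disease id) associations
    index_p, index_d : dicts mapping ids to row indices in the feature matrices
    """
    rows = []
    for k, (p, d) in enumerate(pairs):
        rows.append(np.concatenate([pirna_feat[index_p[p]],
                                    disease_feat[index_d[d]],
                                    net_feat[k]]))
    return np.vstack(rows)   # (n_assoc, d_p + d_d + d_n) attribute-network node features
```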
| Method | AUC | AUPR | Accuracy | Precision | Specificity | Recall | F1-score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Att-based | 0.8725 | 0.8465 | 0.8200 | 0.8247 | 0.8230 | 0.8143 | 0.8189 |
| CF-based | 0.9032 | 0.8822 | 0.8280 | 0.8329 | 0.8312 | 0.8260 | 0.8272 |
| GAPDA | 0.9038 | 0.8944 | 0.8569 | 0.8550 | 0.8555 | 0.8638 | 0.8577 |
The sensitivity analysis of line graphs
As shown in Tables 4 and 5 and Figure 5: (i) based on any abstract network topology, the performance of the proposed method is higher than the average of the traditional methods. This shows that the attribute network constructed with an abstract network topology can combine multiple knowledge sources to restore the true state of the network. This can improve model performance. (ii) Most evaluation criteria of
The 5-fold cross-validation results performed by GAPDA (
| Testing set | Accuracy | Sensitivity | Precision | Specificity | F1-score |
| --- | --- | --- | --- | --- | --- |
| 1 | 0.8230 | 0.7754 | 0.9095 | 0.8906 | 0.8371 |
| 2 | 0.7798 | 0.7208 | 0.9136 | 0.8820 | 0.8058 |
| 3 | 0.8657 | 0.8865 | 0.8388 | 0.8470 | 0.8620 |
| 4 | 0.8512 | 0.8102 | 0.9174 | 0.9048 | 0.8605 |
| 5 | 0.7831 | 0.8827 | 0.6529 | 0.7246 | 0.7506 |
| Average | 0.8206 ± 0.0348 | 0.8151 ± 0.0635 | 0.8464 ± 0.1010 | 0.8171 ± 0.0731 | 0.8232 ± 0.0416 |
The 5-fold cross-validation results performed by GAPDA (
| Testing set | Accuracy | Sensitivity | Precision | Specificity | F1-score |
| --- | --- | --- | --- | --- | --- |
| 1 | 0.8807 | 0.9004 | 0.8560 | 0.8628 | 0.8776 |
| 2 | 0.8395 | 0.8022 | 0.9012 | 0.8873 | 0.8488 |
| 3 | 0.8368 | 0.8147 | 0.8719 | 0.8622 | 0.8423 |
| 4 | 0.8533 | 0.8132 | 0.9174 | 0.9053 | 0.8621 |
| 5 | 0.8182 | 0.7831 | 0.8802 | 0.8632 | 0.8288 |
| Average | 0.8457 ± 0.0208 | 0.8227 ± 0.0404 | 0.8853 ± 0.0217 | 0.8754 ± 0.0194 | 0.8519 ± 0.0167 |
| | Methods | GAPDA | iPiDi-PUL | iPiDA-sHN | piRDA |
| --- | --- | --- | --- | --- | --- |
| Five-fold cross-validation | AUC | 0.9038 | 0.8541 | 0.8859 | – |
| Independent test set | Acc. | 0.8636 | 0.6818 | – | 0.7727 |
| | FDR | 0.1739 | 0.3181 | – | 0.1904 |

(A) ROC curves performed by GAPDA (
Comparison with other existing methods
Relevant computational models have been proposed, and we choose three that use attribute information and network information as features to compare with GAPDA [52, 53]. The proposed method outperforms the existing methods in the 5-fold cross-validation experiment (Table 6). To test the robustness of the model, we added an equal number of negative samples, randomly selected from unlabeled association pairs, to the independent test set and calculated the false discovery rate (FDR). Because the three models use similar raw information, we attribute the better pattern-recognition performance of the proposed method to the line graph, which is consistent with the conclusions of previous work [35]. In addition, we compared the performance of the existing methods on the independent test set. Because piRDA [54] does not provide code and was evaluated with 10-fold cross-validation, we compare with it only on the independent test set; we chose an optimal set of parameters to obtain the predictions of the piRDA online model. As shown in Table 6, the proposed method clearly outperforms iPiDi-PUL and piRDA on the independent test set, which indicates that GAPDA performs better than the existing methods in a 'real' case. Since no code is available for iPiDA-sHN and only its 5-fold cross-validation results are reported, iPiDA-sHN is not included in the independent test experiment.
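For clarity, the accuracy and FDR reported in Table 6 can be computed from thresholded prediction scores as in the short sketch below; the 0.5 threshold is an assumption for illustration, not a documented GAPDA setting.

```python
import numpy as np

def accuracy_and_fdr(scores, labels, threshold=0.5):
    """Accuracy and false discovery rate (FDR = FP / (FP + TP)) on a labeled test set."""
    pred = (np.asarray(scores) >= threshold).astype(int)
    labels = np.asarray(labels)
    tp = np.sum((pred == 1) & (labels == 1))
    fp = np.sum((pred == 1) & (labels == 0))
    acc = np.mean(pred == labels)
    fdr = fp / (fp + tp) if (fp + tp) > 0 else 0.0
    return acc, fdr

# Toy example: 4 positive and 4 negative pairs.
print(accuracy_and_fdr([0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1],
                       [1, 1, 1, 1, 0, 0, 0, 0]))
```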
Conclusion
Since the network of interactions between molecules in the real world is enormously intricate and noisy, efficient graph mining has become a research hotspot. In this study, we propose a piRNA-disease association prediction framework based on the line graph attention network, which captures graph features and computes the hidden representations of associations in the network from their neighboring nodes. Supported by the line graph, GAPDA shows encouraging results in predicting piRNA-disease associations: in 5-fold cross-validation, GAPDA obtains an AUC of 0.9038, an AUPR of 0.8774 and an accuracy of 0.8569. In addition, we compared two traditional methods and different strategies for generating abstract network topologies. The experiments show that GAPDA can be an excellent complement to future biomedical research and demonstrate the promise of graph neural networks on such problems. We hope that the proposed method can provide powerful candidate piRNA biomarkers and can be extended to other graph-based tasks. However, GAPDA still has limitations. First, the transformed line graph is larger than the original network, which makes training time-consuming. In addition, the model is not applicable to all diseases and piRNAs: it can only predict associations between piRNAs with RNA sequences and diseases with MeSH IDs and requires data collection for new piRNAs and diseases. In future work, we will focus on improving the applicability of the model as well as its computational efficiency.
A new graph neural network framework, line graph attention networks (LGAT), with associations as nodes, is proposed; it can extend many heterogeneous networks to replace dichotomous networks.
Applying LGAT to piRNA-disease association prediction, a new prediction model, GAPDA, is proposed. This GAT-based approach brings together the advantages of representation learning and network-based approaches.
Different from traditional collaborative filtering and attribute-based methods, the proposed method integrates disease semantic information and piRNA sequence information, which improves prediction accuracy and has higher coverage.
Data availability
The GAPDA prediction code, together with example datasets and input data files, is available at https://github.com/kaizheng-academic/GAPDA/tree/main. Further code written for and used in this study is available from the corresponding author upon reasonable request.
Acknowledgements
The authors would like to thank all anonymous reviewers for their constructive advice.
Funding
Science and Technology Innovation 2030-‘Brain Science and Brain-like Research’ Major Project (Grant 2021ZD0200403); National Natural Science Foundation of China (Grants 62172355, 61702444); Qingtan scholar talent project of Zaozhuang University; Fundamental Research Funds for the Central Universities of Central South University (2021zzts0206).
Author Biographies
Kai Zheng is a PhD student in the Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Hunan, China. Her current research interests include pattern recognition, intelligent information processing and its applications in bioinformatics.
Xin-Lu Zhang is an engineer at the 36th Research Institute of China Electronics Technology Group Corporation; he received his PhD in computer application technology from the Chinese Academy of Sciences in 2021. His research interests include pattern recognition, machine learning and neural machine translation.
Lei Wang is a professor at the Guangxi Academy of Science. His research interests include data mining, machine learning, deep learning, computational biology and bioinformatics.
Zhu-Hong You is a professor at the Guangxi Academy of Science. His research interests include neural networks, intelligent information processing, sparse representation and its applications in bioinformatics.
Zhao-Hui Zhan is a PhD candidate at the City University of Hong Kong. Her research interests include machine learning and pattern recognition.
Yao-Yuan Li is a PhD student in the Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Hunan, China. His current research interests include bioinformatics, machine learning and heterogeneous network.
References
Velickovic P, Cucurull G, Casanova A, et al. Graph attention networks. In: International Conference on Learning Representations (ICLR), 2018.