Line graph attention networks for predicting disease-associated Piwi-interacting RNAs

Zheng, Kai; Zhang, Xin-Lu; Wang, Lei; You, Zhu-Hong; Zhan, Zhao-Hui; Li, Hao-Yuan

doi:10.1093/bib/bbac393

Abstract

PIWI proteins and Piwi-interacting RNAs (piRNAs) are commonly detected in human cancers, especially in germline and somatic tissues, and correlate with poorer clinical outcomes, suggesting that they play a functional role in cancer. As the combinatorial explosion of candidate ncRNA-disease pairs becomes increasingly apparent, new bioinformatics methods for the large-scale identification and prioritization of potential associations are of interest. However, real-world networks of molecular interactions are enormously intricate and noisy, which makes efficient graph mining difficult. Line graphs can extend many heterogeneous networks to replace dichotomous networks. In this study, we present a new graph neural network framework, line graph attention networks (LGAT), and apply it to predict piRNA-disease associations (GAPDA). In 5-fold cross-validation, GAPDA performs excellently, with an AUC of 0.9038. It also outperforms methods based on collaborative filtering and on attribute features. These results demonstrate the promise of graph neural networks for such problems and suggest that GAPDA can be a useful complement to future biomedical research.

PIWI-interacting RNA, disease, piRNA-disease association, line graph attention network, self-attention mechanism

Issue Section: Problem solving protocol

Introduction

Piwi-interacting RNAs (piRNAs) are small non-coding RNAs, typically 24–32 nucleotides long, that cluster at transposon loci in the genome [1–5]. piRNAs interact exclusively with PIWI proteins, which belong to the germline-specific subclade of the Argonaute family [6–10]. Because piRNAs act through PIWI proteins, depletion of PIWI results in a sharp increase in transposon messenger RNA expression. Thus, the best-known role of piRNAs is to repress transposons and maintain germline genome integrity through DNA methylation [11, 12]. Compared with the other small RNAs, microRNA (miRNA) and small interfering RNA (siRNA), piRNA has the following characteristics: (i) it is longer than miRNA or siRNA; (ii) it exists only in animals; (iii) its sequences are more diverse, and it constitutes the largest class of non-coding RNA; and (iv) it is testis-specific [6, 12–15].

Recently, emerging evidence has suggested that piRNAs and PIWI proteins are abnormally expressed in various cancers [16–21]. The function and potential mechanisms of piRNAs in cancer have therefore become an important research direction in tumor diagnosis and treatment. For example, in 2015 Fu et al. [22] found that abnormal expression of piR-021285 promoted methylation of ARHGAP11A at a 5’-UTR/first-exon CpG site, thereby altering ARHGAP11A mRNA expression and inhibiting apoptosis of breast cancer cells. Subsequently, Tan et al. [23] found that downregulation of piRNA-36712 in breast cancer increases SLUG levels while reducing P21 and E-cadherin levels, thereby promoting the malignant phenotype of the cancer. In 2017, Liu et al. [24] discovered that piR-30188 binds to OIP5-AS1 to inhibit glioma cell progression, whereas low expression of OIP5-AS1 reduces CEBPA levels and promotes the malignant phenotype of glioma cells. Also in glioblastoma, Jacobs et al. [25] found that piR-8041 can inhibit expression of the tumor cell marker ALCAM/CD166, suggesting clinical value for targeted therapy. piRNAs are also directly or indirectly involved in the formation of liver cancer: in 2016, Rizzo et al. [26] found that hsa_piR_013306 accumulates only in hepatocellular carcinomas.

piRNAs are attracting enormous attention: tens of thousands have been identified in mammals, and their number is growing rapidly. To accelerate research in this field and provide access to piRNA data and annotations, multiple databases such as piRNABank [27], piRBase [28] and piRNAQuest [29] have been established. Meanwhile, new roles of piRNAs and PIWI proteins in cancer epigenetics continue to be discovered, and some of them can serve as novel biomarkers and therapeutic targets. Building on these findings, piRDisease [30], a database of experimentally supported piRNA-disease associations, was released, making large-scale prediction of potential associations possible. Although many models for predicting disease-related ncRNAs have been proposed and refined, predictors of disease-related piRNAs remain relatively unexplored [31–34].

In this paper, we propose a piRNA-disease association predictor based on a line graph attention network, called GAPDA. This study makes two main contributions: (i) a new graph neural network framework, the line graph attention network, which can extend many heterogeneous networks to replace dichotomous networks; and (ii) unlike traditional collaborative filtering and attribute-based methods, the proposed method integrates disease semantic information and piRNA sequence information, which improves prediction accuracy and yields higher coverage. On the piRDisease association dataset, GAPDA achieves an area under the receiver operating characteristic (ROC) curve (AUC) of 0.9038 and an accuracy of 85.69%. Overall, this method can provide new ideas for research on cancer mechanisms and shows that attention-based graph neural networks have good application prospects for this kind of problem. We hope this work will promote further research on association prediction based on graph neural networks. Our source code and data can be downloaded from GitHub (https://github.com/kaizheng-academic/GAPDA/tree/main).

Methods

Dataset

With the gradual emergence of the role of piRNAs in disease diagnosis and prognosis, piRNAs have become a research hotspot, and making effective use of data from piRNA-related experiments is an urgent problem. The piRDisease database [30], proposed by Azhar et al. in 2019, collects experimentally supported piRNA-disease associations from over 2500 articles. After removing piRNAs that could not be verified in piRBase, we obtained a positive sample set $T^{p}$ of 1212 associations involving 501 piRNAs and 22 diseases. We refer to the processed dataset as GPRD. The training dataset $T$ is defined as

$$T = T^{p} \cup T^{n}, \tag{1}$$

where $T^{n}$ is a negative subset of 1212 associations randomly drawn from the 9810 unconfirmed piRNA-disease pairs. Known associations thus account for about 11% (1212 of 501 × 22 = 11,022) of all possible pairs.
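The sampling scheme in Eq. (1) can be sketched as follows. This is a minimal illustration, assuming uniform sampling without replacement and a fixed seed; the text only says the negatives are "randomly extracted", and the function name `build_training_set` is hypothetical. The last lines reproduce the pair counts quoted above.

```python
import random

def build_training_set(positives, pirnas, diseases, seed=0):
    """Return (T, known_fraction) with T = T^p ∪ T^n as in Eq. (1).

    positives: set of (piRNA, disease) pairs with experimental support (T^p).
    pirnas, diseases: all piRNA and disease identifiers in the dataset.
    T^n is drawn uniformly at random, without replacement, from the pairs
    not in `positives`, and has the same size as T^p.
    Each element of T is ((piRNA, disease), label) with label 1 or 0.
    """
    unconfirmed = [(p, d) for p in pirnas for d in diseases
                   if (p, d) not in positives]
    rng = random.Random(seed)
    negatives = rng.sample(unconfirmed, len(positives))
    labelled = [(pair, 1) for pair in sorted(positives)]
    labelled += [(pair, 0) for pair in negatives]
    known_fraction = len(positives) / (len(pirnas) * len(diseases))
    return labelled, known_fraction

# Arithmetic behind the GPRD counts: all pairs, unconfirmed pairs, known fraction
n_pairs = 501 * 22
print(n_pairs, n_pairs - 1212, round(1212 / n_pairs, 4))  # 11022 9810 0.11
```

Balancing $T^{n}$ to the size of $T^{p}$ keeps the classification task balanced, at the cost of using only a small fraction of the unconfirmed pairs in any one run.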

The construction of new piRNA-disease association networks

The construction of the line graph

So far, experimentally validated piRNA-disease associations are limited, and the large number of piRNAs makes the piRNA-disease association network sparse. Because biological data are complex, a network representation computed from this sparse network cannot capture all of the behavior seen in the real world. We therefore construct a network in which the associations themselves are nodes, with the aim of enriching the hidden representations that can be extracted from the sparse network [35]. The transformation into a line graph is shown in Figure 3A. Specifically, each edge of the original graph becomes a node of the line graph. In this study, the nodes of the original graph are piRNAs and diseases, and there is an edge between a piRNA and a disease if they are associated. The nodes of the line graph are therefore piRNA-disease associations, and its edges represent relationships between associations: if two associations share the same piRNA or the same disease, there is an edge between the two nodes. This turns the problem from link prediction into node prediction. Given $n$ associations, the association adjacency matrix is

$$A = \begin{bmatrix} a_{1,1} & \cdots & a_{1,n} \\ \vdots & \ddots & \vdots \\ a_{n,1} & \cdots & a_{n,n} \end{bmatrix}, \tag{2}$$

where $n$ is the number of associations and the element $a_{i,j}$ is set to 1 if node $i$ (the $i$th association) and node $j$ (the $j$th association) are related, and 0 otherwise. The relationships between associations can take various forms. For example, associations sharing the same piRNA, or the same disease, form a binary (unweighted) network, whereas associations of similar piRNAs, or of similar diseases, form a weighted network. In this paper, we use shared piRNAs and shared diseases as the linking criteria, respectively, defined as follows:

$$\alpha_{i,j}^{R} = \begin{cases} 1 & \text{if } \mathrm{association}(i).\mathrm{piRNA} = \mathrm{association}(j).\mathrm{piRNA} \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

$$\alpha_{i,j}^{D} = \begin{cases} 1 & \text{if } \mathrm{association}(i).\mathrm{disease} = \mathrm{association}(j).\mathrm{disease} \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

Both the original graph and the line graph are undirected, with edge weights of 0 or 1. $A^{R}$ denotes the line graph with links between associations that share the same piRNA, whose elements are $\alpha_{i,j}^{R}$. Similarly, the line graph with links between associations that share the same disease, and the line graph with links between associations that share either the piRNA or the disease, are denoted $A^{D}$ and $A^{RD}$, respectively.
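A minimal sketch of how the three line graphs $A^{R}$, $A^{D}$ and $A^{RD}$ could be built from a list of (piRNA, disease) pairs is given below. The function name `line_graph_adjacency` and the mode strings are hypothetical, and the diagonal is set to 1 because Eqs. (3) and (4), read literally, link each association to itself.

```python
def line_graph_adjacency(associations, mode="R"):
    """0/1 adjacency matrix of the association line graph.

    associations: list of (piRNA, disease) pairs; node i is associations[i].
    mode "R":  link i and j if they share the piRNA           (A^R, Eq. 3)
    mode "D":  link i and j if they share the disease         (A^D, Eq. 4)
    mode "RD": link i and j if they share piRNA or disease    (A^RD)
    """
    n = len(associations)
    A = [[0] * n for _ in range(n)]
    for i, (p_i, d_i) in enumerate(associations):
        for j, (p_j, d_j) in enumerate(associations):
            same_p = p_i == p_j
            same_d = d_i == d_j
            if mode == "R":
                linked = same_p
            elif mode == "D":
                linked = same_d
            elif mode == "RD":
                linked = same_p or same_d
            else:
                raise ValueError(f"unknown mode: {mode!r}")
            A[i][j] = int(linked)  # symmetric by construction (undirected graph)
    return A

assocs = [("p1", "d1"), ("p1", "d2"), ("p2", "d1")]
for mode in ("R", "D", "RD"):
    print(mode, line_graph_adjacency(assocs, mode))
```

The double loop mirrors the equations directly; for a few thousand associations, grouping them by piRNA or by disease first would avoid visiting every pair.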

The attribute features of nodes

The attribute features are obtained by fusing piRNA sequence features and disease semantic features. In this study, piRNA sequences, which carry genetic information, were selected as the data source for describing piRNAs, and the final piRNA descriptor is calculated with the k-mer algorithm [36]. The $i$th entry of the piRNA attribute feature descriptor $\mathit{Feature}(p_{a})$ is defined as

$$F_{ps}(p_{a})[i] = \frac{\text{count of the } i\text{th } k\text{-mer in } p_{a}}{\mathrm{length}(p_{a}) - k + 1}, \tag{5}$$

where $F_{ps}(p_{a})[i]$ is the relative frequency with which the $i$th k-mer appears in the sequence, $p_{a}$ is the $a$th piRNA, the k-mer count is the number of times the $i$th k-mer appears in the sequence, and $\mathrm{length}(p_{a})$ is the sequence length of piRNA $p_{a}$. The process is shown in Figure 1.

Figure 1

The flowchart for calculating piRNA sequence features.
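Eq. (5) amounts to counting overlapping windows of length $k$. The sketch below assumes k = 3 and the DNA alphabet ACGT (pass `alphabet="ACGU"` for sequences written with U); the value of $k$ is not stated in this section, and `kmer_features` is a hypothetical name.

```python
from itertools import product

def kmer_features(seq, k=3, alphabet="ACGT"):
    """Relative k-mer frequencies of one sequence, following Eq. (5).

    Entry i = (occurrences of the i-th k-mer) / (len(seq) - k + 1), with
    k-mers enumerated in the lexicographic order produced by itertools.product.
    """
    seq = seq.upper()
    windows = len(seq) - k + 1  # denominator of Eq. (5)
    if windows < 1:
        raise ValueError("sequence is shorter than k")
    kmers = ["".join(chars) for chars in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    for start in range(windows):
        kmer = seq[start:start + k]
        if kmer in index:  # windows containing other symbols (e.g. N) are skipped
            counts[index[kmer]] += 1
    return [c / windows for c in counts]

features = kmer_features("ACGTA", k=2)
print(len(features))  # 16
```

With k = 3 each piRNA maps to a 64-dimensional vector regardless of its length, which is what allows sequences of 24 to 32 nucleotides to share one feature space.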


Describing disease attribute features is a vital and difficult task. To date, approaches that build directed acyclic graphs (DAGs) guided by Medical Subject Headings (MeSH) have been widely employed to quantify the relationships between diseases [37, 38]. MeSH is the authoritative controlled vocabulary produced by the U.S. National Library of Medicine. Because it classifies diseases strictly, it can be used to deconstruct the semantic relationships between diseases. Taking Lip Neoplasms (LN) as an example, its MeSH tree numbers are ‘C04.588.443.591.550; C07.465.409.640; C07.465.565.550’, and its parent nodes are Mouth Neoplasms and Lip Disease, whose MeSH tree numbers are ‘C04.588.443.591; C07.465.565’ and ‘C07.465.409’, respectively, as shown in Figure 2. Mouth Neoplasms and Lip Disease in turn have their own parent nodes, Mouth Disease and Head and Neck Neoplasms. Accordingly, Lip Neoplasms and its related diseases can be expressed as $DAG_{LN} = (LN, T_{LN}, E_{LN})$, where $T_{LN}$ is the set of nodes in $DAG_{LN}$, namely LN and its ancestors such as ‘Head and Neck Neoplasms’ and ‘Mouth Disease’, and $E_{LN}$ is the set of edges between these nodes, such as the edge between ‘Stomatognathic Disease’ and ‘Mouth Disease’. Following previous work [34], the semantic contribution $C$ of disease $w$ to disease $d$ is calculated as

$$\begin{cases} C_{d}(w) = 1 & \text{if } w = d \\ C_{d}(w) = \max\{\nabla \cdot C_{d}(w') \mid w' \in \text{children of } w\} & \text{if } w \neq d \end{cases} \tag{6}$$

where the semantic decay factor $\nabla$ is set to 0.5 and $w$ is the parent node of $w'$. A larger contribution of disease $w$ to disease $d$ indicates that $w$ is closer to $d$ in the DAG. Based on this formula, the semantic value $V$ of disease $d$ is

$$V(d) = \sum_{w \in T_{d}} C_{d}(w). \tag{7}$$

Figure 2

The directed acyclic graphs (DAG) of Lip Neoplasms.
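Eqs. (6) and (7) can be evaluated by propagating contributions upward from $d$ and keeping the maximum at each ancestor. The following is a minimal sketch, assuming the MeSH hierarchy is supplied as a child-to-parents dictionary; the function names and the toy hierarchy are hypothetical and do not reproduce the real MeSH tree.

```python
def semantic_contributions(d, parents, decay=0.5):
    """C_d(w) for every node w of DAG_d, following Eq. (6).

    parents: dict mapping a disease term to the list of its parent terms.
    C_d(d) = 1; for an ancestor w, C_d(w) is the maximum of decay * C_d(w')
    over the children w' of w that lie in DAG_d.
    """
    contrib = {d: 1.0}
    frontier = [d]
    while frontier:
        next_frontier = []
        for child in frontier:
            for parent in parents.get(child, ()):
                candidate = decay * contrib[child]
                if candidate > contrib.get(parent, 0.0):  # keep the maximum
                    contrib[parent] = candidate
                    next_frontier.append(parent)
        frontier = next_frontier
    return contrib

def semantic_value(d, parents, decay=0.5):
    """V(d): sum of C_d(w) over all nodes w of DAG_d (Eq. 7)."""
    return sum(semantic_contributions(d, parents, decay).values())

# Toy hierarchy loosely modelled on the Lip Neoplasms example (not the real MeSH tree)
parents = {"LN": ["MN", "LD"], "MN": ["HNN", "MD"], "LD": ["MD"], "MD": ["SD"]}
print(semantic_value("LN", parents))  # 2.625
```

Because the decay factor is below 1, the maximum in Eq. (6) is always reached along a shortest upward path, so each ancestor's contribution is $\nabla$ raised to its distance from $d$.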


Based on the hypothesis that two diseases are more closely related the more ancestor nodes they share, the semantic similarity score $SS$ of diseases $a$ and $b$ is defined as

$$SS(a,b) = \frac{\sum_{w \in N_{a} \cap N_{b}} \left(C_{a}(w) + C_{b}(w)\right)}{V(a) + V(b)}, \tag{8}$$

where $N_{a}$ and $N_{b}$ denote the node sets of $DAG_{a}$ and $DAG_{b}$ (i.e., $T_{a}$ and $T_{b}$).

Although this measure quantifies the correlation to some extent, its performance is limited. If a disease node is shared by the DAGs of all diseases, it intuitively has no specificity for any of them, and its weight should be lower. Therefore, the semantic similarity $SS'$, which introduces specificity, is defined as

$$C'_{d}(w) = -\log\left(\frac{\mathrm{num}(\text{DAGs including } w)}{\mathrm{num}(\text{diseases})}\right), \tag{9}$$

$$SS'(a,b) = \frac{\sum_{w \in N_{a} \cap N_{b}} \left(C'_{a}(w) + C'_{b}(w)\right)}{V(a) + V(b)}. \tag{10}$$

Although $SS'$ takes specificity into account, it does so at the cost of losing some information. Therefore, a comprehensive similarity $S$ is calculated to combine the advantages of the two measures:

$$S(a,b) = \max\left(SS(a,b), SS'(a,b)\right). \tag{11}$$
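Given the contribution dictionaries of Eq. (6) for every disease, Eqs. (8)-(11) reduce to set intersections. This sketch assumes that num(disease) in Eq. (9) is the number of diseases supplied, and keeps the denominator $V(a) + V(b)$ of Eq. (10) as written; all function names, including `disease_descriptor`, are hypothetical.

```python
import math

def ss(a, b, dags):
    """Eq. (8). dags maps each disease to its {node: C_d(node)} dict from Eq. (6)."""
    ca, cb = dags[a], dags[b]
    shared = ca.keys() & cb.keys()
    numerator = sum(ca[w] + cb[w] for w in shared)
    return numerator / (sum(ca.values()) + sum(cb.values()))

def ss_specific(a, b, dags):
    """Eqs. (9)-(10): shared nodes weighted by C'(w) = -log(fraction of DAGs with w)."""
    n_diseases = len(dags)  # assumption: num(disease) = number of diseases in `dags`

    def c_prime(w):
        n_dags_with_w = sum(1 for dag in dags.values() if w in dag)
        return -math.log(n_dags_with_w / n_diseases)

    ca, cb = dags[a], dags[b]
    shared = ca.keys() & cb.keys()
    # C'_a(w) and C'_b(w) coincide because Eq. (9) does not depend on the disease
    numerator = sum(c_prime(w) + c_prime(w) for w in shared)
    return numerator / (sum(ca.values()) + sum(cb.values()))

def combined_similarity(a, b, dags):
    """Eq. (11): S(a, b) = max(SS(a, b), SS'(a, b))."""
    return max(ss(a, b, dags), ss_specific(a, b, dags))

def disease_descriptor(b, dags):
    """Hypothetical helper: S between disease b and every disease, sorted by name."""
    return [combined_similarity(b, other, dags) for other in sorted(dags)]

# Toy contribution dictionaries (hypothetical diseases A, B, C)
dags = {
    "A": {"A": 1.0, "P": 0.5, "R": 0.25},
    "B": {"B": 1.0, "P": 0.5, "R": 0.25},
    "C": {"C": 1.0, "R": 0.5},
}
print(disease_descriptor("A", dags))
```

In the toy example the root-like node R appears in every DAG, so Eq. (9) gives it zero weight, which is exactly the loss of specificity that $SS'$ is designed to penalize.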

The vector of similarities $S$ between disease $d_{b}$ and the other diseases is used as the disease descriptor $\mathit{Feature}(d_{b})$.

The network features of nodes

The Gaussian interaction profile kernel similarity (GIP) is a collaborative filtering algorithm that is widely used for association prediction [39–45]. The similarity between $p_{a}$ and $p_{b}$ based on interaction information is calculated as

$$F_{pg}(p_{a}, p_{b}) = \exp\left(-\psi_{p} \lVert V(p_{a}) - V(p_{b}) \rVert^{2}\right), \tag{12}$$

where $V(p_{a})$ is the $a$th row of the piRNA-disease association matrix $A$, i.e., the interactions of piRNA $p_{a}$ with all diseases, and $\psi_{p}$ is the kernel width coefficient, defined as

$$\psi_{p} = \frac{1}{\frac{1}{num_{p}} \sum_{k=1}^{num_{p}} \lVert V(p_{k}) \rVert^{2}}, \tag{13}$$

where $num_{p}$ denotes the total number of piRNAs. The similarity between diseases is calculated with the GIP algorithm in the same way:

$$F_{dg}(d_{a}, d_{b}) = \exp\left(-\psi_{d} \lVert V(d_{a}) - V(d_{b}) \rVert^{2}\right), \tag{14}$$

$$\psi_{d} = \frac{1}{\frac{1}{num_{d}} \sum_{k=1}^{num_{d}} \lVert V(d_{k}) \rVert^{2}}, \tag{15}$$

where $V(d_{a})$ is the $a$th column of $A$ (the interactions of disease $d_{a}$ with all piRNAs) and $num_{d}$ denotes the total number of diseases. The GIP similarity of disease $d_{b}$ to the other diseases is used as the disease GIP descriptor $\mathit{Feature}'(d_{b})$ [46–50], and the GIP similarity of piRNA $p_{a}$ to the other piRNAs is used as the piRNA GIP descriptor $\mathit{Feature}'(p_{a})$.
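Eqs. (12)-(15) can be computed for all pairs at once from the association matrix. The following is a minimal NumPy sketch, assuming a dense 0/1 matrix with piRNAs as rows and diseases as columns; `gip_kernel` is a hypothetical name.

```python
import numpy as np

def gip_kernel(profiles):
    """Gaussian interaction profile kernel, Eqs. (12)-(15).

    profiles: (m, n) 0/1 matrix with one interaction profile per row.
    Pass the piRNA-disease association matrix for piRNA similarity (rows are
    V(p_a)) and its transpose for disease similarity (rows are V(d_a)).
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)      # ||V(x_k)||^2 for every row
    mean_sq_norm = sq_norms.mean()
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    psi = 1.0 / mean_sq_norm                      # kernel width, Eq. (13) or (15)
    # ||u - v||^2 = ||u||^2 + ||v||^2 - 2 u.v, for all pairs at once
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dist = np.maximum(sq_dist, 0.0)            # clip round-off negatives
    return np.exp(-psi * sq_dist)

A = np.array([[1, 0], [1, 1], [0, 1]])  # 3 piRNAs x 2 diseases (toy data)
K_pirna = gip_kernel(A)                  # Eqs. (12)-(13)
K_disease = gip_kernel(A.T)              # Eqs. (14)-(15): columns of A
print(K_pirna.shape, K_disease.shape)    # (3, 3) (2, 2)
```

Row $a$ of `K_pirna` is then the GIP descriptor of piRNA $p_{a}$, and row $b$ of `K_disease` that of disease $d_{b}$.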

Graph attention layer

The attention mechanism is widely used in many sequence-based tasks. Whereas the Graph Convolutional Network (GCN) treats all of a node's neighbors equally, the Graph Attention Network (GAT) incorporates the self-attention mechanism into the propagation process, computing each node's hidden state from its own features and those of its neighbors [51]. A GAT is composed of basic graph attention layers, which compute the attention coefficients as

$$a_{i,j} = \frac{\exp\left(\mathrm{LeakyReLU}\left(a^{T} [W h_{i} \,\|\, W h_{j}]\right)\right)}{\sum_{t \in N_{i}} \exp\left(\mathrm{LeakyReLU}\left(a^{T} [W h_{i} \,\|\, W h_{t}]\right)\right)}, \tag{16}$$

where $a_{i,j}$ is the attention coefficient of node $i$ to node $j$ and $N_{i}$ denotes the neighbors of node $i$. The input node features are $h = [h_{1}, h_{2}, \dots, h_{NN}]$ with $h_{i} \in \mathbb{R}^{NF}$, where $NN$ denotes the number of nodes and $NF$ the feature dimension. The output node features are $h' = [h'_{1}, h'_{2}, \dots, h'_{NN}]$ with $h'_{i} \in \mathbb{R}^{NF'}$. $W \in \mathbb{R}^{NF' \times NF}$ is the linear transformation weight matrix shared by all nodes, and $a \in \mathbb{R}^{2NF'}$ is the attention weight vector. The softmax over $N_{i}$ is used for normalization, and LeakyReLU provides the nonlinearity.
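One attention head of Eq. (16), followed by the attention-weighted aggregation $h'_{i} = \sum_{j \in N_{i}} a_{i,j} W h_{j}$ used in the original GAT, can be sketched in NumPy as below. The LeakyReLU slope of 0.2 is an assumption taken from the original GAT formulation rather than from this study, the output nonlinearity is left to the caller, and every node must have at least one neighbor (the self-loops implied by Eqs. (3) and (4) guarantee this). The names `gat_layer` and `leaky_relu` are hypothetical.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    """LeakyReLU with negative-side slope `slope`."""
    return np.where(x > 0, x, slope * x)

def gat_layer(H, adj, W, a, slope=0.2):
    """Single-head graph attention layer (Eq. 16 plus weighted aggregation).

    H:   (N, F)   input features, row i is h_i
    adj: (N, N)   0/1 adjacency; adj[i, j] == 1 means j is in N_i (needs self-loops)
    W:   (F2, F)  shared linear transform
    a:   (2*F2,)  attention vector
    Returns the (N, F2) aggregated features and the (N, N) coefficients a_ij.
    """
    Wh = H @ W.T                                  # row i is W h_i
    f2 = Wh.shape[1]
    # a^T [W h_i || W h_j] = a[:f2] . W h_i + a[f2:] . W h_j
    scores = leaky_relu((Wh @ a[:f2])[:, None] + (Wh @ a[f2:])[None, :], slope)
    scores = np.where(adj > 0, scores, -np.inf)   # only t in N_i compete
    scores = scores - scores.max(axis=1, keepdims=True)  # stabilise exp
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)     # softmax over N_i
    return alpha @ Wh, alpha                      # h'_i = sum_j a_ij W h_j

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 5))
adj = np.array([[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 1]])
out, alpha = gat_layer(H, adj, rng.normal(size=(3, 5)), rng.normal(size=6))
print(out.shape)  # (4, 3)
```

Splitting $a$ into two halves avoids materializing the $N \times N \times 2NF'$ tensor of concatenated pairs while computing the same scores.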

Line graph attention networks

In this paper, we propose a model based on line graph attention networks, called GAPDA, to predict biologically significant but as yet unidentified piRNA-disease associations. The main idea of GAPDA is to use graph attention networks to aggregate the properties of neighboring nodes and compute the hidden state of each association. As shown in Figure 3B and C, the prediction process of the proposed model is divided into three major steps: data preprocessing, model training and scoring of potential associations.

Figure 3

The flowchart of GAPDA for predicting piRNA-disease association.


First, the preprocessed data include the new association network, the piRNA sequence features $F_{ps}$, the piRNA Gaussian interaction profile kernel similarity $F_{pg}$, the disease semantic features $F_{ds}$ and the disease Gaussian interaction profile kernel similarity $F_{dg}$. From these, we build the following final descriptor of a piRNA-disease association:

$$F(p_{a}, d_{b}) = [F_{ps}(p_{a}) \,\|\, F_{pg}(p_{a}) \,\|\, F_{ds}(d_{b}) \,\|\, F_{dg}(d_{b})], \tag{17}$$

where $\|$ denotes vector concatenation.
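Eq. (17) is a plain concatenation of four per-entity vectors. A minimal sketch, assuming each feature family is stored as a dictionary from identifier to vector; `association_descriptor` is a hypothetical name.

```python
def association_descriptor(p, d, f_ps, f_pg, f_ds, f_dg):
    """Eq. (17): F(p, d) = [F_ps(p) || F_pg(p) || F_ds(d) || F_dg(d)].

    Each f_* argument maps an identifier to a sequence of floats:
    f_ps: k-mer features, f_pg: piRNA GIP row,
    f_ds: disease semantic-similarity row, f_dg: disease GIP row.
    """
    return [*f_ps[p], *f_pg[p], *f_ds[d], *f_dg[d]]

F = association_descriptor(
    "p1", "d1",
    f_ps={"p1": [0.1, 0.2]}, f_pg={"p1": [1.0]},
    f_ds={"d1": [0.5]}, f_dg={"d1": [0.3, 0.4]},
)
print(F)  # [0.1, 0.2, 1.0, 0.5, 0.3, 0.4]
```

Under the assumption k = 3, a GPRD descriptor would have 4^3 + 501 + 22 + 22 = 609 entries, since the GIP rows span all 501 piRNAs and the disease rows span all 22 diseases.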

The model is then trained on these final descriptors as follows: (i) the node (association) feature $F$ is linearly mapped by the shared weight matrix $W$ to an enhanced high-dimensional feature; (ii) the high-dimensional features of the node and of each neighboring node (association) are concatenated; (iii) a single-layer feedforward neural network maps each concatenated feature to a real number that quantifies the correlation between the two associations; (iv) softmax is applied to obtain the attention coefficients; (v) the features of the association and its neighboring associations are fused into a new association representation, weighted by the attention coefficients (in particular, the number of nodes whose feature $F$ is mapped to an enhanced feature is 2424); and (vi) the new association representation is fed into a fully connected layer.

Finally, the descriptors of candidate associations are fed into the trained model to obtain their prediction scores.

The hyperparameters in this study are set as follows: the output size of the first graph attention layer is 8; the number of attention heads in the first GAT layer is 8; the dropout rate is 0.02; the L2 regularization factor is 1.25e−4; and the learning rate is 5e−2. The architecture of the neural network is visualized in Figure 3B. Note that the line graph to be constructed depends on the associations being predicted, so predicting specific potential associations requires retraining the model (~5800 parameters, 100 epochs, ~20 min).
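Steps (i)-(vi) and the reported head configuration (8 heads, 8 outputs per head) can be combined into a single forward pass. The sketch below is an illustration under stated assumptions, not the released GAPDA code: it assumes one multi-head attention layer whose heads are concatenated, an ELU nonlinearity after aggregation, and a two-class softmax output; dropout, L2 regularization and training are omitted, and `lgat_forward` is a hypothetical name.

```python
import numpy as np

def lgat_forward(X, adj, Ws, att, W_out, b_out, slope=0.2):
    """Forward-pass sketch: K-head graph attention layer + fully connected output.

    X:     (N, D)     association descriptors F(p_a, d_b), one row per line-graph node
    adj:   (N, N)     0/1 line-graph adjacency with self-loops (e.g. A^R)
    Ws:    (K, H, D)  per-head shared weight matrices W
    att:   (K, 2H)    per-head attention vectors a
    W_out: (K*H, C), b_out: (C,)  fully connected output layer
    Returns (N, C) class probabilities.
    """
    K, H, _ = Ws.shape
    Wh = np.einsum("khd,nd->knh", Ws, X)               # (i) W h_i for every head
    left = np.einsum("knh,kh->kn", Wh, att[:, :H])     # (ii)-(iii) a^T [W h_i || W h_j]
    right = np.einsum("knh,kh->kn", Wh, att[:, H:])    #   split into two dot products
    e = left[:, :, None] + right[:, None, :]           # (K, N, N) raw scores
    e = np.where(e > 0, e, slope * e)                  # LeakyReLU
    e = np.where(adj[None, :, :] > 0, e, -np.inf)      # restrict to neighbours N_i
    e = e - e.max(axis=2, keepdims=True)
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=2, keepdims=True)   # (iv) softmax, Eq. (16)
    heads = np.einsum("kij,kjh->kih", alpha, Wh)       # (v) attention-weighted fusion
    heads = np.where(heads > 0, heads, np.expm1(np.minimum(heads, 0.0)))  # ELU (assumed)
    hidden = heads.transpose(1, 0, 2).reshape(X.shape[0], K * H)  # concatenate heads
    logits = hidden @ W_out + b_out                    # (vi) fully connected layer
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
N, D, K, H = 6, 10, 8, 8  # toy sizes; K and H follow the reported settings
X = rng.normal(size=(N, D))
adj = np.eye(N, dtype=int)
adj[0, 1] = adj[1, 0] = 1
probs = lgat_forward(X, adj, rng.normal(size=(K, H, D)), rng.normal(size=(K, 2 * H)),
                     rng.normal(size=(K * H, 2)), np.zeros(2))
print(probs.shape)  # (6, 2)
```

Because every line-graph node is scored in one pass over the whole adjacency matrix, changing the set of candidate associations changes the graph itself, which is why the text notes that the model must be retrained for new predictions.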

Experimental results

The performance of GAPDA on the benchmark dataset

In this section, we use $\alpha_{i,j}^{R}$ as the element of the abstract network topology (i.e., the line graph $A^{R}$). To evaluate the performance of the proposed method, we apply it to the benchmark dataset GPRD. GAPDA achieves an average AUC of 0.9038 (Figure 4), with per-fold AUCs of 0.9115, 0.8943, 0.9109, 0.9167 and 0.8859. Table 1 lists the detailed evaluation criteria: the average accuracy (Acc.) is 0.8569, the recall (Rec., i.e., sensitivity) is 0.8550, the precision (Pre.) is 0.8638 and the F1-score is 0.8577, with standard deviations of 0.92, 3.56, 4.16 and 0.91%, respectively. Across the five experiments, the accuracy ranges from 0.8395 to 0.8642. The specificity is 0.8933, 0.8873, 0.8548, 0.8139 and 0.8723, with a mean value of 0.8619. Because this experiment relies on the network structure to make predictions, the results obtained with different attribute networks differ. Overall, our approach yields convincing results, suggesting that GAPDA can provide strong candidate piRNA biomarkers and has the potential to aid disease diagnosis and the identification of disease mechanisms.
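The criteria in Table 1 follow the standard confusion-matrix definitions, with recall and sensitivity being the same quantity. The sketch below computes them and summarizes the folds; the reported ± values appear to be population standard deviations (ddof = 0), which reproduces the accuracy column of Table 1. The function names are hypothetical.

```python
import statistics

def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (= recall), precision, specificity and F1 from counts."""
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "precision": precision,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }

def summarize(values):
    """Mean and population standard deviation across folds."""
    return statistics.mean(values), statistics.pstdev(values)

fold_accuracy = [0.8642, 0.8395, 0.8636, 0.8554, 0.8616]  # Table 1
acc_mean, acc_std = summarize(fold_accuracy)
print(f"{acc_mean:.4f} ± {acc_std:.4f}")  # 0.8569 ± 0.0092
```

Using the sample standard deviation (`statistics.stdev`) instead would give a slightly larger spread (about 0.0103 for the accuracy column).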

Figure 4

(A) ROC curves of GAPDA on the benchmark dataset. (B) PR curves of GAPDA on the benchmark dataset.


Table 1


The 5-fold cross-validation results of GAPDA on the benchmark dataset

Testing set	Accuracy	Sensitivity	Precision	Specificity	F1-score
1	0.8642	0.8391	0.9012	0.8933	0.8690
2	0.8395	0.8022	0.9012	0.8873	0.8488
3	0.8636	0.8729	0.8512	0.8548	0.8619
4	0.8554	0.9095	0.7893	0.8139	0.8451
5	0.8616	0.8514	0.8760	0.8723	0.8635
Average	0.8569 ± 0.0092	0.8550 ± 0.0356	0.8638 ± 0.0416	0.8619 ± 0.0318	0.8577 ± 0.0091

In addition, we collected a test set independent of the cross-validation process to demonstrate the applicability and performance of the model in ‘real’ use cases. We manually collected 22 experimentally validated piRNA-disease associations as this independent test set. Specifically, we used all data from the cross-validation process as the training set and tested the trained model on the independent test set. The proposed model obtained an accuracy of 86.36% (19 of 22 associations correctly predicted) on the independent test set (Table 2), indicating that GAPDA has practical application value.

Table 2


The performance of GAPDA on the independent test set

No.	piRNA	Disease	References	Predictions
1	DQ577854	Breast cancer	27177224	✓
2	DQ588919	Dysplastic liver nodules and hepatocellular carcinoma	27429044	✓
3	DQ580140	Gastric cancer	25779424	✓
4	DQ571174	Gastric cancer	25779424	✓
5	DQ585247	Cardiovascular diseases (CDC,CF,CCS) cardiac regeneration	28289238	✓
6	DQ590749	Gastric cancer	25779424	✓
7	DQ571873	Dysplastic liver nodules and hepatocellular carcinoma	29789629	✓
8	DQ591522	Gastric cancer	25779424	✗
9	DQ593431	Alzheimer disease	28127595	✗
10	DQ595023	Alzheimer disease	28127595	✓
11	DQ596276	Cardiovascular diseases (CDC,CF,CCS) cardiac regeneration	28289238	✓
12	DQ596465	Cardiovascular diseases (CDC,CF,CCS) cardiac regeneration	28289238	✓
13	DQ596465	Dysplastic liver nodules and hepatocellular carcinoma	29789629	✓
14	DQ597397	Renal cell carcinoma	25998508	✓
15	DQ597945	Gastric cancer	25779424	✓
16	DQ597997	Dysplastic liver nodules and hepatocellular carcinoma	29789629	✓
17	DQ600116	Gastric cancer	25779424	✓
18	DQ600269	Gastric cancer	25779424	✓
19	DQ600689	Gastric cancer	25779424	✓
20	DQ574391	Gastric cancer	25779424	✓
21	DQ576200	Alzheimer disease	28127595	✗
22	DQ570485	Gastric cancer	25779424	✓

Ablation experiment

To further evaluate the proposed method, we compare it with two methods that use only attribute information or only network information. The results are shown in Table 3. GAPDA scores higher than both traditional methods on every evaluation criterion, most notably accuracy, although its AUC is only marginally higher than that of the CF-based method (0.9038 vs. 0.9032). This indicates that the attention-based approach performs better than traditional attribute-based and collaborative filtering-based approaches. There are several reasons for the superior performance of GAPDA. First, each of the two traditional methods considers only attribute information or only network information and does not combine these two heterogeneous knowledge sources, whereas the proposed method combines four kinds of information into an attribute network that characterizes each association well. Second, the attention mechanism allows the hidden representation of each node to be computed from the behavior of its neighbors, which can effectively improve model performance. Third, the new abstract network topology also helps improve performance: real-world networks are often heterogeneous, and this method abstracts existing networks into adjacency matrices of uniform size, which facilitates fusion between heterogeneous networks.

Table 3


Comparison of different types of prediction methods on the benchmark dataset

Method	AUC	AUPR	Accuracy	Precision	Specificity	Recall	F1-score
Att-based	0.8725	0.8465	0.8200	0.8247	0.8230	0.8143	0.8189
CF-based	0.9032	0.8822	0.8280	0.8329	0.8312	0.8260	0.8272
GAPDA	0.9038	0.8944	0.8569	0.8550	0.8555	0.8638	0.8577

The sensitivity analysis of line graphs

In the section ‘The construction of the line graph’, we proposed an abstract network topology that reconstructs the association network and designed three strategies for generating it. The results for $\alpha_{i,j}^{R}$ were described in the section ‘The performance of GAPDA on the benchmark dataset’. In this section, we therefore evaluate the other two strategies to further assess the abstract network topology approach. For comparison, we integrate the heterogeneous networks by fusing the adjacency matrices $A^{R}$ and $A^{D}$ into $A^{RD}$, whose elements $\alpha_{i,j}^{RD}$ are

$$\alpha_{i,j}^{RD} = \begin{cases} 1 & \text{if } \alpha_{i,j}^{R} = 1 \text{ or } \alpha_{i,j}^{D} = 1 \\ 0 & \text{otherwise} \end{cases} \tag{18}$$

As shown in Tables 4 and 5 and Figure 5: (i) with any abstract network topology, the performance of the proposed method is higher than the average of the traditional methods. This shows that an attribute network built on an abstract network topology can combine multiple knowledge sources to better reflect the true state of the network, which improves model performance. (ii) Most evaluation criteria of the $A^{D}$ and $A^{RD}$ strategies are inferior to those of $A^{R}$, most obviously for $A^{D}$. The reason is that the elements equal to 1 in the adjacency matrix $A^{D}$ are too dense, so its abstract network topology lacks specificity; $A^{RD}$ behaves similarly. Together, these two observations show that different abstract network topologies affect model performance to different degrees, suggesting that weighting them differently could improve effectiveness.

Table 4

The 5-fold cross-validation results of GAPDA ($A^{D}$) on the benchmark dataset

Testing set	Accuracy	Sensitivity	Precision	Specificity	F1-score
1	0.8230	0.7754	0.9095	0.8906	0.8371
2	0.7798	0.7208	0.9136	0.8820	0.8058
3	0.8657	0.8865	0.8388	0.8470	0.8620
4	0.8512	0.8102	0.9174	0.9048	0.8605
5	0.7831	0.8827	0.6529	0.7246	0.7506
Average	0.8206 ± 0.0348	0.8151 ± 0.0635	0.8464 ± 0.1010	0.8171 ± 0.0731	0.8232 ± 0.0416

Table 5

The 5-fold cross-validation results of GAPDA ($A^{RD}$) on the benchmark dataset

Testing set	Accuracy	Sensitivity	Precision	Specificity	F1-score
1	0.8807	0.9004	0.8560	0.8628	0.8776
2	0.8395	0.8022	0.9012	0.8873	0.8488
3	0.8368	0.8147	0.8719	0.8622	0.8423
4	0.8533	0.8132	0.9174	0.9053	0.8621
5	0.8182	0.7831	0.8802	0.8632	0.8288
Average	0.8457 ± 0.0208	0.8227 ± 0.0404	0.8853 ± 0.0217	0.8754 ± 0.0194	0.8519 ± 0.0167

Table 6

Comparison of existing piRNA-disease prediction methods

Evaluation	Metric	GAPDA	iPiDi-PUL	iPiDA-sHN	piRDA
Five-fold cross-validation	AUC	0.9038	0.8541	0.8859	–
Independent test set	Acc.	0.8636	0.6818	–	0.7727
Independent test set	FDR	0.1739	0.3181	–	0.1904

Figure 5

(A) ROC curves obtained by GAPDA ($A^{D}$) on the benchmark dataset. (B) PR curves obtained by GAPDA ($A^{D}$) on the benchmark dataset. (C) ROC curves obtained by GAPDA ($A^{RD}$) on the benchmark dataset. (D) PR curves obtained by GAPDA ($A^{RD}$) on the benchmark dataset.

Comparison with other existing methods

Several relevant computational models have been proposed. We choose three of them, which use attribute information and network information as features, for comparison with GAPDA [52, 53]. As shown in Table 6, the proposed method outperforms the existing methods in five-fold cross-validation. To test the robustness of the model, we add to the independent test set an equal number of negative samples randomly selected from unlabeled association pairs and calculate the false discovery rate (FDR). Because the three models use similar original information, we attribute the advantage of the proposed method to its stronger pattern recognition ability. We believe that line graphs have a positive impact on the predictive performance of the models, which is consistent with the conclusions of previous work [35]. In addition, we compare the existing methods on the independent test set. Because piRDA [54] does not provide code and was evaluated with ten-fold cross-validation, we compare with it only on the independent test set, choosing an optimal set of parameters for the piRDA online model to obtain its predictions. As shown in Table 6, the proposed method outperforms iPiDi-PUL and piRDA on the independent test set. It achieves a higher accuracy (0.8636 versus 0.6818 and 0.7727) and a lower FDR (0.1739 versus 0.3181 and 0.1904), which indicates that GAPDA also performs better than the existing methods in a more realistic setting. Because no code is available for iPiDA-sHN, only its five-fold cross-validation results are available, and it is therefore not evaluated on the independent test set.
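The two independent-test-set metrics in Table 6 follow the usual definitions: accuracy = (TP + TN)/N and FDR = FP/(TP + FP). This sketch shows one way to assemble a balanced test set and score binary predictions. It draws as many unlabeled pairs as there are known associations and treats them as negatives. The helper names, the fixed seed and the toy labels are hypothetical, and this is not the evaluation script used for Table 6.

```python
import random


def balanced_test_set(positives, unlabeled, seed=0):
    """Pair known associations with an equal number of randomly drawn unlabeled pairs.

    The unlabeled pairs are treated as negatives; `unlabeled` must be a list.
    """
    rng = random.Random(seed)
    negatives = rng.sample(unlabeled, len(positives))
    pairs = list(positives) + negatives
    labels = [1] * len(positives) + [0] * len(negatives)
    return pairs, labels


def accuracy_and_fdr(labels, predictions):
    """Accuracy = (TP + TN) / N and FDR = FP / (TP + FP) for 0/1 labels and predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    accuracy = (tp + tn) / len(labels)
    fdr = fp / (tp + fp) if tp + fp > 0 else 0.0
    return accuracy, fdr


# Hypothetical toy predictions: 3 TP, 1 FN, 1 FP, 3 TN.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
print(accuracy_and_fdr(labels, predictions))  # (0.75, 0.25)
```

FDR is reported alongside accuracy because it isolates how many predicted associations are spurious. With a balanced set, accuracy alone does not show this.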

Conclusion

Because real-world molecular interaction networks are enormously intricate and noisy, efficient graph mining has become a research hotspot. In this study, we propose a piRNA-disease association prediction framework based on the line graph attention network. It captures graph features and computes hidden representations of the associations in the network from their neighboring nodes. Supported by the line graph, GAPDA shows encouraging results in predicting piRNA-disease associations. In 5-fold cross-validation, GAPDA achieves an AUC of 0.9038, an AUPR of 0.8774 and an accuracy of 0.8569. In addition, we compare GAPDA with two traditional methods and evaluate different strategies for generating abstract network topologies. The experiments show that GAPDA can be an excellent complement to future biomedical research, and they demonstrate the promise of graph neural networks for such problems. We hope that the proposed method can provide powerful candidates for piRNA biomarkers and can be extended to other graph-based tasks. However, GAPDA still has limitations. First, the transformed line graph is larger than the original network, because each association in the original network becomes a node of the line graph; this makes training time-consuming. Second, the model is not applicable to all diseases and piRNAs. It can only predict associations between piRNAs with RNA sequences and diseases with MeSH IDs, and it requires data collection for new piRNAs and diseases. In future work, we will focus on improving the applicability of the model as well as its computational efficiency.
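To see why the line graph grows, count its size under the standard line-graph definition. Every distinct association becomes a node. Two nodes are joined when their associations share a piRNA or a disease, so an endpoint of degree d contributes C(d, 2) edges. Two distinct associations share at most one endpoint, so no edge is counted twice. This sketch assumes that definition; GAPDA's individual strategies (for example $A^{R}$ alone) may keep only part of these edges. The function name and toy data are hypothetical.

```python
from collections import Counter
from math import comb


def line_graph_size(pairs):
    """Node and edge counts of the standard line graph of a piRNA-disease network.

    Each distinct (piRNA, disease) association is one node; two nodes are adjacent
    when their associations share a piRNA or a disease.
    """
    associations = set(pairs)  # ignore duplicate association records
    degree = Counter()
    for pirna, disease in associations:
        degree[("piRNA", pirna)] += 1       # tag node type so names cannot collide
        degree[("disease", disease)] += 1
    n_edges = sum(comb(d, 2) for d in degree.values())
    return len(associations), n_edges


# Hypothetical toy network: 5 associations, one disease linked to 4 piRNAs.
pairs = [("piR-1", "D1"), ("piR-1", "D2"), ("piR-2", "D2"), ("piR-3", "D2"), ("piR-4", "D2")]
print(line_graph_size(pairs))  # (5, 7)

# One disease linked to 100 piRNAs: 100 original edges become a 4950-edge clique.
hub = [(f"piR-{i}", "D1") for i in range(100)]
print(line_graph_size(hub))  # (100, 4950)
```

The edge count grows quadratically with the degree of highly connected diseases or piRNAs. This is consistent with the longer training time noted above.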

Key Points

We propose a new graph neural network framework, line graph attention networks (LGAT), which treats associations as nodes and can extend many heterogeneous networks in place of bipartite networks.
By applying LGAT to piRNA-disease association prediction, we propose a new prediction model, GAPDA. This GAT-based approach combines the advantages of representation learning and network-based approaches.
Unlike traditional collaborative filtering and attribute-based methods, the proposed method integrates disease semantic information and piRNA sequence information, which improves prediction accuracy and coverage.

Data availability

The GAPDA prediction code, together with example datasets and input data files, is available at [https://github.com/kaizheng-academic/GAPDA/tree/main]. Further code written for and used in this study is available from the corresponding author upon reasonable request.

Acknowledgements

The authors would like to thank all anonymous reviewers for their constructive advice.

Funding

Science and Technology Innovation 2030-‘Brain Science and Brain-like Research’ Major Project (Grant 2021ZD0200403); National Natural Science Foundation of China (Grants 62172355, 61702444); Qingtan scholar talent project of Zaozhuang University; Fundamental Research Funds for the Central Universities of Central South University (2021zzts0206).

Author Biographies

Kai Zheng is a PhD student in the Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Hunan, China. Her current research interests include pattern recognition, intelligent information processing and its applications in bioinformatics.

Xin-Lu Zhang is an engineer at the 36th Research Institute of China Electronics Technology Group Corporation. He received his PhD in computer application technology from the Chinese Academy of Sciences in 2021. His research interests include pattern recognition, machine learning and neural machine translation.

Lei Wang is a professor at the Guangxi Academy of Science. His research interests include data mining, machine learning, deep learning, computational biology and bioinformatics.

Zhu-Hong You is a professor at the Guangxi Academy of Science. His research interests include neural networks, intelligent information processing, sparse representation and its applications in bioinformatics.

Zhao-Hui Zhan is a PhD candidate at the City University of Hong Kong. Her research interests include machine learning and pattern recognition.

Hao-Yuan Li is a PhD student in the Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Hunan, China. His current research interests include bioinformatics, machine learning and heterogeneous networks.

References

1. Yin H, Lin H. An epigenetic activation role of Piwi and a Piwi-associated piRNA in Drosophila melanogaster. Nature 2007;450:304–8.

2. Iwasaki YW, Siomi MC, Siomi H. PIWI-interacting RNA: its biogenesis and functions. Annu Rev Biochem 2015;84:405–33.

3. Grimson A, Srivastava M, Fahey B, et al. Early origins and evolution of microRNAs and Piwi-interacting RNAs in animals. Nature 2008;455:1193–7.

4. Aravin AA, Hannon GJ, Brennecke J. The Piwi-piRNA pathway provides an adaptive defense in the transposon arms race. Science 2007;318:761–4.

5. Malone CD, Brennecke J, Dus M, et al. Specialized piRNA pathways act in germline and somatic tissues of the Drosophila ovary. Cell 2009;137:522–35.

6. Leslie M. The immune system's compact genomic counterpart. Am Assoc Adv Sci 2013;339:25–7.

7. Pall GS, Codony-Servat C, Byrne J, et al. Carbodiimide-mediated cross-linking of RNA to nylon membranes improves the detection of siRNA, miRNA and piRNA by northern blot. Nucleic Acids Res 2007;35:e60.

8. Marcon E, Babak T, Chua G, et al. miRNA and piRNA localization in the male mammalian meiotic nucleus. Chromosome Res 2008;16:243–60.

9. Armisen J, Gilchrist MJ, Wilczynska A, et al. Abundant and dynamically expressed miRNAs, piRNAs, and other small RNAs in the vertebrate Xenopus tropicalis. Genome Res 2009;19:1766–75.

10. Moyano M, Stefani G. piRNA involvement in genome stability and human cancer. J Hematol Oncol 2015;8:38.

11. Brennecke J, Aravin AA, Stark A, et al. Discrete small RNA-generating loci as master regulators of transposon activity in Drosophila. Cell 2007;128:1089–103.

12. Siomi MC, Sato K, Pezic D, et al. PIWI-interacting small RNAs: the vanguard of genome defence. Nat Rev Mol Cell Biol 2011;12:246–58.

13. Rajasethupathy P, Antonov I, Sheridan R, et al. A role for neuronal piRNAs in the epigenetic control of memory-related synaptic plasticity. Cell 2012;149:693–707.

14. Houwing S, Kamminga LM, Berezikov E, et al. A role for Piwi and piRNAs in germ cell maintenance and transposon silencing in Zebrafish. Cell 2007;129:69–82.

15. Moazed D. Small RNAs in transcriptional gene silencing and genome defence. Nature 2009;457:413–20.

16. Zou AE, Zheng H, Saad MA, et al. The non-coding landscape of head and neck squamous cell carcinoma. Oncotarget 2016;7:51211–22.

17. Chu H, Hui G, Yuan L, et al. Identification of novel piRNAs in bladder cancer. Cancer Lett 2015;356:561–7.

18. Cheng J, Guo J-M, Xiao B-X, et al. piRNA, the new non-coding RNA, is aberrantly expressed in human cancer cells. Clin Chim Acta 2011;412:1621–5.

19. Assumpcao CB, Calcagno DQ, Araújo TMT, et al. The role of piRNA and its potential clinical implications in cancer. Epigenomics 2015;7:975–84.

20. Ng KW, Anderson C, Marshall EA, et al. Piwi-interacting RNAs in cancer: emerging functions and clinical utility. Mol Cancer 2016;15:5.

21. Romano G, Veneziano D, Acunzo M, et al. Small non-coding RNA and cancer. Carcinogenesis 2017;38:485–91.

22. Fu A, Jacobs DI, Hoffman AE, et al. PIWI-interacting RNA 021285 is involved in breast tumorigenesis possibly by remodeling the cancer epigenome. Carcinogenesis 2015;36:1094–102.

23. Tan L, Mai D, Zhang B, et al. PIWI-interacting RNA-36712 restrains breast cancer progression and chemoresistance by interaction with SEPW1 pseudogene SEPW1P RNA. Mol Cancer 2019;18:9.

24. Liu X, Zheng J, Xue Y, et al. PIWIL3/OIP5-AS1/miR-367-3p/CEBPA feedback loop regulates the biological behavior of glioma cells. Theranostics 2018;8:1084–105.

25. Jacobs DI, Qin Q, Fu A, et al. piRNA-8041 is downregulated in human glioblastoma and suppresses tumor growth in vitro and in vivo. Oncotarget 2018;9:37616–26.

26. Rizzo F, Rinaldi A, Marchese G, et al. Specific patterns of PIWI-interacting small noncoding RNA expression in dysplastic liver nodules and hepatocellular carcinoma. Oncotarget 2016;7:54650–61.

27. Sai Lakshmi S, Agrawal S. piRNABank: a web resource on classified and clustered Piwi-interacting RNAs. Nucleic Acids Res 2007;36:D173–7.

28. Wang J, Zhang P, Lu Y, et al. piRBase: a comprehensive database of piRNA sequences. Nucleic Acids Res 2018;47:D175–80.

29. Sarkar A, Maji RK, Saha S, et al. piRNAQuest: searching the piRNAome for silencers. BMC Genomics 2014;15:555.

30. Muhammad A, Waheed R, Khan NA, et al. piRDisease v1.0: a manually curated database for piRNA associated diseases. Database (Oxford) 2019;2019:baz052. https://doi.org/10.1093/database/baz052.

31. Wang L, Wang H-F, Liu S-R, et al. Predicting protein-protein interactions from matrix-based protein sequence using convolution neural network and feature-selective rotation forest. Sci Rep 2019;9:9848.

32. Zheng K, You ZH, Wang L, et al. MISSIM: improved miRNA-disease association prediction model based on chaos game representation and broad learning system. In: Huang DS, Huang ZK, Hussain A (eds). Intelligent Computing Methodologies. ICIC, Lecture Notes in Computer Science 2019;11645. Springer, Cham. https://doi.org/10.1007/978-3-030-26766-7_36.

33. Zheng K, You Z-H, Wang L, et al. MLMDA: a machine learning approach to predict and validate MicroRNA–disease associations by integrating of heterogenous information sources. J Transl Med 2019;17:1–14.

34. Wang L, You Z-H, Chen X, et al. LMTRDA: Using logistic model tree to predict MiRNA-disease associations by fusing multi-source information of sequences and similarities. PLoS Comput Biol 2019;15:e1006865.

35. Cai L, Li J, Wang J, et al. Line graph neural networks for link prediction. IEEE Trans Pattern Anal Mach Intell 2021;44:1–5113.

36. Kirk JM, Kim SO, Inoue K, et al. Functional classification of long non-coding RNAs by k-mer content. Nat Genet 2018;50:1474–82.

37. Xiang Z, Qin T, Qin ZS, et al. A genome-wide MeSH-based literature mining system predicts implicit gene-to-gene relationships and networks. BMC Syst Biol 2013;7:S9.

38. Xuan P, Han K, Guo M, et al. Prediction of microRNAs associated with human diseases based on weighted k most similar neighbors. PLoS One 2013;8:e70204.

39. van Laarhoven T, Nabuurs SB, Marchiori E. Gaussian interaction profile kernels for predicting drug–target interaction. Bioinformatics 2011;27:3036–43.

40. Zheng K, You Z-H, Li J-Q, et al. iCDA-CGR: Identification of circRNA-disease associations based on Chaos Game Representation. PLoS Comput Biol 2020;16:e1007872.

41. Wang Y-B, You Z-H, Yang S, et al. A deep learning-based method for drug-target interaction prediction based on long short-term memory neural network. BMC Med Inform Decis Mak 2020;20:1–9.

42. Wang M-N, You Z-H, Wang L, et al. LDGRNMF: LncRNA-disease associations prediction based on graph regularized non-negative matrix factorization. Neurocomputing 2020;424:236–45.

43. Zheng K, You Z-H, Wang L, et al. DBMDA: a unified embedding for sequence-based miRNA similarity measure with applications to predict and validate miRNA-disease associations. Mol Ther Nucleic Acids 2020;19:602–11.

44. Fan C, Lei X, Wu F-X. Prediction of CircRNA-disease associations using KATZ model based on heterogeneous networks. Int J Biol Sci 2018;14:1950–9.

45. Zheng K, You Z-H, Wang L, et al. SPRDA: a matrix completion approach based on the structural perturbation to infer disease-associated Piwi-Interacting RNAs. bioRxiv 2020.07.02.185611. 2020. https://doi.org/10.1101/2020.07.02.185611.

46. Wang L, You Z-H, Li L-P, et al. Predicting circRNA-disease associations using deep generative adversarial network based on multi-source fusion information. In: 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). San Diego, CA, USA: IEEE, 2019, 145–52.

47. Zheng K, You Z-H, Wang L, et al. MISSIM: an incremental learning-based model with applications to the prediction of miRNA-disease association. IEEE/ACM Trans Comput Biol Bioinform 2020;18(5):1733–42.

48. You Z-H, Li X, Chan KC. An improved sequence-based prediction protocol for protein-protein interactions using amino acids substitution matrix and rotation forest ensemble classifiers. Neurocomputing 2017;228:277–82.

49. Wang Y-B, You Z-H, Li X, et al. Predicting protein–protein interactions from protein sequences by a stacked sparse autoencoder deep neural network. Mol Biosyst 2017;13:1336–44.

50. Zheng K, You Z-H, Wang L, et al. iMDA-BN: identification of miRNA-disease associations based on the biological network and graph embedding algorithm. Comput Struct Biotechnol J 2020;18:2391–400.

51. Velickovic P, Cucurull G, Casanova A, et al. Graph attention networks. Stat 2017;1050:20.

52. Wei H, Ding Y, Liu B. iPiDA-sHN: Identification of Piwi-interacting RNA-disease associations by selecting high quality negative samples. Comput Biol Chem 2020;88:107361.

53. Wei H, Xu Y, Liu B. iPiDi-PUL: identifying Piwi-interacting RNA-disease associations based on positive unlabeled learning. Brief Bioinform 2021;22(3):bbaa058.

54. Ali SD, Tayara H, Chong KT. Identification of piRNA disease associations using deep learning. Comput Struct Biotechnol J 2022;20:1208–17.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
