GVDTI: graph convolutional and variational autoencoders with attribute-level attention for drug–protein interaction prediction

Xuan, Ping; Fan, Mengsi; Cui, Hui; Zhang, Tiangang; Nakaguchi, Toshiya

doi:10.1093/bib/bbab453

Abstract

Motivation

Identifying proteins that interact with drugs plays an important role in the initial period of developing drugs, which helps to reduce the development cost and time. Recent methods for predicting drug–protein interactions mainly focus on exploiting various data about drugs and proteins. These methods failed to completely learn and integrate the attribute information of a pair of drug and protein nodes and their attribute distribution.

Results

We present a new prediction method, GVDTI, to encode multiple pairwise representations, including attention-enhanced topological representation, attribute representation and attribute distribution. First, a framework based on a graph convolutional autoencoder is constructed to learn attention-enhanced topological embeddings that integrate the topology structure of a drug–protein network for each drug and protein node. The topological embeddings of each drug and each protein are then combined and fused by multi-layer convolutional neural networks to obtain the pairwise topological representation, which reveals the hidden topological relationships between drug and protein nodes. The proposed attribute-wise attention mechanism learns and adjusts the importance of individual attributes in each topological embedding of drug and protein nodes. Second, a tri-layer heterogeneous network composed of drug, protein and disease nodes is created to associate the similarities, interactions and associations across the heterogeneous nodes. The attribute distribution of the drug–protein node pair is encoded by a variational autoencoder. The pairwise attribute representation is learned via a multi-layer convolutional neural network to deeply integrate the attributes of drug and protein nodes. Finally, the three pairwise representations are fused by convolutional and fully connected neural networks for drug–protein interaction prediction. The experimental results show that GVDTI outperformed seven other state-of-the-art methods in comparison. The improved recall rates indicate that GVDTI retrieved more actual drug–protein interactions among the top-ranked candidates than conventional methods. Case studies on five drugs further confirm GVDTI’s ability to discover potential candidate drug-related proteins.

Contact

[email protected]

Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.

Keywords: Drug–protein interaction prediction, Attention-enhanced topological representation, Pairwise attribute representation, Pairwise attribute distribution, Graph convolutional and variational autoencoders

1 Introduction

Drugs usually perform their functions by interacting with a variety of molecular targets, among which proteins are a major class [1, 2]. In the early stage of drug development, the identification of drug–target interactions (DTIs) is particularly important [3–6]. However, the identification of DTIs is a time-consuming and costly process [7]. Therefore, various computational methods have been developed to infer possible DTIs, providing biologists with information regarding drug-related protein candidates and reducing the workload of wet-lab experiments [8–11].

Early computational methods for determining drug–protein interactions are mainly divided into two categories. The first category comprises molecular-docking-based methods [12–14], which use the three-dimensional structure of the protein to predict DTIs. However, the three-dimensional structure of many proteins, such as the membrane protein GPCR [15, 16], is not available; this limits the performance of such approaches. The other category comprises ligand-based methods [17], which compare proteins with unknown ligands and those with known ligands. However, when the number of known binding ligands is small, these methods do not work well.

Over the years, computational methods based on machine learning have been proposed to predict drug–protein interactions. Ding et al. established a drug–protein interaction prediction model based on support vector machines, which mainly used the substructure fingerprint of drugs, the physical and chemical properties of the target organism, and the relationships between drugs and target proteins [18]. A support vector machine (SVM) framework based on bipartite local models (BLMs) was proposed by Bleakley and Yamanishi to predict drug–protein interactions [19]. DDR is established based on the random forest algorithm, which mainly utilizes the similarity information and interaction information of drugs and proteins for DTI prediction [20]. Xuan et al. proposed an approach, DTIGBDT, establishing a drug–protein interaction prediction model based on a gradient boosting decision tree (GBDT) [21].

However, most of these methods only use the similarity information and interaction information of drugs and proteins, and do not use other data sources. An information flow-based method is proposed to predict drug-related proteins [22]. DTINet, which mainly utilizes multifarious relationships among drugs, proteins and diseases to study the low-dimensional vector representation of nodes for predicting DTIs, has also been constructed [23]. There are complex correlations between various data regarding drugs and proteins. However, because most of the aforementioned approaches are shallow predictive models, it is difficult to learn such correlations using these methods.

Several recent prediction methods focus on building models based on deep learning to enhance the accuracy of the prediction of drug-related proteins. Sun et al. established a drug–protein interaction prediction model based on generative adversarial networks [24]. An ensemble learning method based on non-negative matrix factorization and GBDT was proposed to infer candidate proteins interacting with drugs [25]. A drug–protein interaction prediction model based on the Weisfeiler-Lehman neural network was also established; this model deeply fuses the similarity and interaction information of drugs and proteins [26]. The associations among drugs, proteins and diseases also represent essential ancillary information for predicting drug–protein interactions. However, these methods fail to take advantage of information regarding drug-related and protein-related diseases. Zhang et al. established a model for drug–protein interaction prediction based on a bidirectional gated recurrent unit, which deeply integrated multiple data related to drugs, proteins and diseases [27]. However, this method failed to take into account the attribute distribution of node pairs.

To tackle the limitations in existing conventional methods for drug–protein interaction prediction, we propose a new model, GVDTI, to learn and integrate three pairwise representations from multi-source data, including the attention-enhanced topological representations, attribute distributions and attribute representations. The contributions of our model include:

To extract attention-enhanced topological structure and node attributes, we propose a graph convolutional autoencoder (GCA) based framework and an attribute-level attention mechanism. GCA extracts and embeds the hidden topological structure from drug similarity, drug–disease and drug–protein sub-networks. Since the individual attributes in a node’s attribute vector contribute differently to the topological embedding, we propose a new attention mechanism at the node attribute level to adaptively learn and reflect the discriminative contributions of each sub-network’s node attributes.
To facilitate the extraction of attribute distribution and attribute representation of drug–protein node pairs, we first construct a drug–protein–disease heterogeneous network and an embedding strategy to associate the similarities, interactions and associations of pairs of nodes. The embedding strategy reflects the biological premise that a pair of drug and protein nodes is more likely to interact with each other if they share more common drugs, proteins or diseases.
We propose a novel convolutional variational autoencoder (CVAE) based approach to learn pairwise attribute distributions. The attribute distribution reveals the underlying drug–protein relationship in the established drug–protein–disease heterogeneous network by a convolutional variational encoding and decoding process to foster the prediction of drug-related proteins.
To extract the drug–protein pairwise attribute representation, we design a new encoding strategy based on the multi-layer convolutional neural network (MCNN). The pairwise attribute representation integrates the similarities, interactions and correlations of a pair of drug and protein nodes. The ability of the proposed model and of the learnt attention-enhanced topological representations, attribute representations and attribute distributions to predict drug–protein interactions is demonstrated by comprehensive comparison with recently published models and case studies of five drugs.

2 Materials and Methods

Figure 1 demonstrates the framework of our model for predicting drug–protein interactions. First, we constructed a module based on graph convolutional networks with attention to capture the topology structures of multiple subnets, and the module learned the topology representation of each drug node and that of each protein node. Second, a tri-layer heterogeneous network composed of drug nodes, protein nodes and disease nodes was constructed. The attribute distribution representation and the attribute representation of a drug–protein node pair were encoded separately. Finally, these three representations were deeply fused to get the interaction score of the node pair.

2.1 Dataset

In this paper, the protein–disease associations, the drug–disease associations, the drug–protein interactions, the drug similarities and the protein similarities were obtained from a previously published paper [23]; the dataset comprises 708 drugs, 1512 proteins, 5603 diseases, 199 214 known drug–disease associations, 1 596 745 known protein–disease associations and 1923 known drug–protein interactions. The associations among proteins, drugs and diseases were originally obtained from the Comparative Toxicogenomics Database (CTD), which mainly provides information about the relationships among chemicals, genes and diseases [28].

2.2 Calculation and representation of multi-source data

Five types of matrices are defined to represent data regarding drugs, proteins and diseases, including similarity matrices of drugs and proteins, drug–disease association matrix, protein–disease association matrix and drug–protein interaction matrix.

2.2.1 Association and interaction matrices

As shown in Figure 2, we use matrix |$A^{drug} \in{\mathbb{R}^{n_{r} \times{n_{d}}}}$| to represent the associations between |$n_{r}$| drugs and |$n_{d}$| diseases, where if drug |${r_{i}}$| is observed to be associated with disease |${d_{j}}$|⁠, then |$A_{ij}^{drug}$| is 1 (otherwise, it is 0). The protein–disease association matrix is denoted by |$A^{protein} \in{\mathbb{R}^{n_{p} \times{n_{d}}}}$|⁠, and the matrix element |$A_{ij}^{protein}$| is 0 or 1. 1 indicates that the protein is related to disease, while 0 indicates the opposite. The matrix |$Y \in{\mathbb{R}^{n_{r} \times{n_{p}}}}$| represents the interactions between drugs and proteins. When |$Y_{ij}$| is 1, it means that the interaction between drug |${r_{i}}$| and protein |${p_{j}}$| is observed, otherwise, it is 0.

Figure 1

The framework of the proposed GVDTI model, taking |${r_{1}}$| and |${p_{2}}$| as examples. The topological embedding vector of each drug or protein node is learned by a graph convolutional autoencoder (a) and (b), and the multi-layer convolutional neural network is used to fuse the topology embedding of |${r_{1}}$| - |${p_{2}}$| (c). (d) A tri-layer heterogeneous network is constructed, and the proposed embedding strategy is used to form an attribute-embedding matrix of |${r_{1}}$| - |${p_{2}}$|. (e) Pairwise attribute distribution representation is extracted using a convolutional variational autoencoder. (f) Pairwise attribute representation is obtained by multi-layer convolutional coding. (g) Fusion of the three pairwise representations: attention-enhanced topological representation, attribute distribution and attribute representation.

2.2.2 Similarity matrices

Based on the chemical substructure information of drugs, the authors of a previous study calculated the similarity between drugs using the Tanimoto coefficient [29]. In Figure 2, matrix |$S^{drug} \in{\mathbb{R}^{n_{r} \times{n_{r}}}}$| is used to represent the similarity matrix of drugs, and |$S_{ij}^{drug} \in [0,1]$| is the similarity value between drug |${r_{i}}$| and drug |${r_{j}}$|. The larger the |$S_{ij}^{drug}$|, the higher the similarity between drug |${r_{i}}$| and drug |${r_{j}}$|. As shown in Figure 2, the protein similarity matrix |$S^{protein} \in{\mathbb{R}^{n_{p} \times{n_{p}}}}$|, which was described in [30], was constructed from the Smith–Waterman scores of the primary sequences of the targets. |$S_{ij}^{protein}$| indicates the similarity value between protein |${p_{i}}$| and protein |${p_{j}}$|.
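As a point of reference, the Tanimoto coefficient between two binary substructure fingerprints is the size of their intersection divided by the size of their union. The sketch below is illustrative only: the function name `tanimoto` and the toy fingerprints are assumptions, not the data or code used in [29].

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        # Two empty fingerprints: this sketch defines their similarity as 0.0.
        return 0.0
    return len(a & b) / len(union)


# Toy fingerprints (hypothetical substructure bit indices).
r_i = {1, 4, 7, 9}
r_j = {1, 4, 8}
s_ij = tanimoto(r_i, r_j)  # 2 shared bits out of 5 distinct bits
```

The value always lies in [0, 1], which matches the range stated for the entries of the drug similarity matrix.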

Figure 2

Similarity matrices, interaction matrices and association matrices derived from corresponding networks of drugs, proteins and diseases. The detailed process of the proposed embedding strategy. Considering drug |${r_{1}}$| and protein |${p_{2}}$| as examples, the attribute-embedding matrix of |${r_{1}}$| - |${p_{2}}$| is constructed.

2.3 Pairwise attention-enhanced topological representation learning

2.3.1 Attention mechanism at attribute level

The drug–disease association matrix |$A^{drug}$| and the drug–protein interaction matrix |$Y$| are concatenated side by side to form a matrix |$X^{drug}$|. Its |$i$|-th row records the associations of drug |${r_{i}}$| with all diseases and its interactions with all proteins. Such a row can be used as the attribute vector of |${r_{i}}$|, and each of its attributes contributes differently to the low-dimensional topological embedding vector of the drug. Therefore, we established an attribute-level attention mechanism to learn which attributes of the drug and protein nodes are the most informative for their low-dimensional topological embedding vectors, as shown in Figure 3. Each attribute |$X_{ij}^{drug}$| of the drug node |${r_{i}}$| is assigned a different weight |${\alpha _{ij}^{r}}$|,

$$\begin{align}& { s_{i}^{r} = H^{r} tanh\left({W^{r}}{(X_{i}^{drug})}^{T} + b^{r}\right) } \end{align}$$

(1)

$$\begin{align}& \alpha_{ij}^{r} = \frac{{exp\left( {s_{ij}^{r}} \right)}}{{\sum\nolimits_{k} {exp\left( {{s_{ik}^{r}}} \right)}}} \end{align}$$

(2)

where |$W^{r}$| is a weight matrix, and |$b^{r}$| is a bias vector. |$X_{i}^{drug}\in{\mathbb{R}^{{1} \times{(n_{d}+n_{p})}}}$| is the attribute vector of the drug node |$r_{i}$|⁠, |$H^{r}\in{\mathbb{R}^{{(n_{d}+n_{p})} \times{n_{f}}}}$| is used to capture contextual relationships among different drugs, and |$n_{f}$| is the number of low-dimensional features for a drug node. |$s_{i}^{r} = \big \{s_{i1}^{r},s_{i2}^{r},...,s_{ik}^{r},...,s_{i(n_{d}+n_{p})}^{r}\big \}$| is the vector that records the attention score and |$s_{ij}^{r}$| is the score of the |$j$|-th attribute |$X_{ij}^{drug}$| of the drug |$r_{i}$|⁠. |$\alpha _{i}^{r} = \big \{\alpha _{i1}^{r},\alpha _{i2}^{r},...,\alpha _{ik}^{r},...,\alpha _{i(n_{d}+n_{p})}^{r}\big \}$| is the result of normalizing the attention scores of all attributes of |${r_{i}}$| and |$\alpha _{ij}^{r}$| is the attention weight of the attribute node |$X_{ij}^{drug}$|⁠. Therefore, the enhancement vector of the drug node |${r_{i}}$| can be expressed as |$y_{i}^{r}$|⁠,

$$\begin{align}& { y_{i}^{r} = {\alpha_{i}^{r}}\circ{(X_{i}^{drug})}^{T}} \end{align}$$

(3)

|$\circ $| is the element-wise product operator. |$y_{i}^{r}$| is the enhanced attribute vector of the drug node |$r_{i}$|, and its transpose is the |$i$|-th row of the enhanced attribute matrix |$\widetilde{X}^{drug}$|. Similarly, |$X^{protein}$| was obtained by concatenating the protein–disease association matrix |$A^{protein}$| and the drug–protein interaction matrix |$Y$| (transposed so that its rows index proteins), and |$\widetilde{X}^{protein}$| was obtained by the attention enhancement of all attributes of each protein |${p_{i}}$|. Finally, we obtained the enhanced drug attribute matrix |$\widetilde{X}^{drug}$| and the enhanced protein attribute matrix |$\widetilde{X}^{protein}$|.
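Equations (1)–(3) can be traced for a single attribute vector with a few lines of plain Python. This is a minimal sketch under stated assumptions: the helper names (`matvec`, `attribute_attention`) and all numeric values are hypothetical, the weights are fixed toy numbers rather than learned parameters, and the maximum score is subtracted before exponentiation purely for numerical stability (it does not change the softmax result).

```python
import math


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m_k * v_k for m_k, v_k in zip(row, v)) for row in M]


def attribute_attention(x, W, b, H):
    """Score, normalise and reweight one attribute vector x, following Eqs. (1)-(3)."""
    hidden = [math.tanh(h + b_k) for h, b_k in zip(matvec(W, x), b)]  # tanh(W x^T + b)
    s = matvec(H, hidden)                                             # Eq. (1): one score per attribute
    s_max = max(s)
    e = [math.exp(v - s_max) for v in s]
    total = sum(e)
    alpha = [v / total for v in e]                                    # Eq. (2): softmax over attributes
    y = [a * x_j for a, x_j in zip(alpha, x)]                         # Eq. (3): element-wise product
    return alpha, y


# Toy example (assumed sizes): 4 attributes, n_f = 2 hidden features.
x = [1.0, 0.0, 1.0, 1.0]
W = [[0.1, -0.2, 0.3, 0.0], [0.0, 0.5, -0.1, 0.2]]          # n_f x 4
b = [0.0, 0.1]
H = [[0.2, -0.1], [0.4, 0.3], [-0.3, 0.1], [0.0, 0.5]]      # 4 x n_f
alpha, y = attribute_attention(x, W, b, H)
```

Because the weights multiply the original attribute values, an attribute that is 0 in the input stays 0 in the enhanced vector, while the non-zero attributes are rescaled by their attention weights.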

Figure 3

Extraction and fusion of the topological embedding of |${r_{1}}$| - |${p_{2}}$|⁠.

2.3.2 Pairwise attention-enhanced topology extraction by graph convolutional autoencoder

|$\widetilde{X}^{drug}$| is the enhanced drug attribute matrix, and |$S^{drug}$| is the drug similarity matrix showing the similarity between the drugs. |$\hat{S}^{drug}=D^{\frac{-1}{2}}{S}^{drug} D^{\frac{-1}{2}}$|⁠, where |$D$| is a diagonal matrix and |$D_{ii}=\sum \nolimits _{j} {S}_{ij}^{drug}$|⁠. |$\hat{S}^{drug}$| is multiplied by |$\widetilde{X}^{drug}$| to fuse the properties of the drug node with the topological structures within the drugs. The result of matrix multiplication is the input of the module based on the graph convolution autoencoder [31–33]. Then, following multiplication with the weight matrix |$W_{e}^{1}$|⁠, the drug node is mapped to a potential low-dimensional space, and the low-dimensional topology representation matrix of the drug, |$Z_{e}^{r}$| is obtained. Similarly, we performed graph convolution on |$Z_{e}^{r}$| again to obtain the low-dimensional topology representation matrix |$Z^{r}$| of the drug,

$$\begin{align}& Z_{e}^{r} = Softmax\left(\hat{S}^{drug} \widetilde{X}^{drug} W_{e}^{1} \right) \end{align}$$

(4)

$$\begin{align}& Z^{r} = Softmax\left(\hat{S}^{drug} Z_{e}^{r} W_{e}^{2} \right) \end{align}$$

(5)

We decoded the matrix |$Z^{r}$| back to the original feature space and obtained |$\hat{X}^{drug}$|⁠,

$$\begin{align}& \hat{X}^{drug} = Sigmoid\left(\hat{S}^{drug} \left( Sigmoid \left( \hat{S}^{drug} Z^{r} W_{d}^{1}\right) \right)W_{d}^{2} \right) \end{align}$$

(6)

where |$W_{d}^{1}$| and |$W_{d}^{2}$| are weight matrices. The gap between the original matrix |$X^{drug}$| and the reconstructed matrix |$\hat{X}^{drug}$| should be minimized, so the mean squared error is used as the loss function for this module [34].

$$\begin{align}& loss_{t}^{r} = min\sum\nolimits\parallel{{X^{drug}} - {\hat{X}^{drug}}}\parallel^{2} \end{align}$$

(7)
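The normalization of the similarity matrix and the two encoding layers of Equations (4) and (5) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the helper names and toy matrices are hypothetical, the weights are fixed rather than trained, every row of the similarity matrix is assumed to have a positive sum, and the softmax is assumed to be applied row-wise (the text does not state the axis).

```python
import math


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def normalize_similarity(S):
    """S_hat = D^{-1/2} S D^{-1/2}, where D_ii is the i-th row sum of S."""
    d = [sum(row) for row in S]
    n = len(S)
    return [[S[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]


def row_softmax(M):
    out = []
    for row in M:
        top = max(row)
        e = [math.exp(v - top) for v in row]
        total = sum(e)
        out.append([v / total for v in e])
    return out


def gcn_layer(S_hat, X, W):
    """One encoding layer: Softmax(S_hat X W), softmax taken over each row (assumption)."""
    return row_softmax(matmul(matmul(S_hat, X), W))


# Toy data: 3 drugs, 4 attributes, hidden sizes 3 and 2 (all values hypothetical).
S = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.4], [0.2, 0.4, 1.0]]
X = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 1.0]]
W1 = [[0.1, 0.2, -0.1], [0.0, 0.3, 0.2], [-0.2, 0.1, 0.4], [0.3, -0.3, 0.0]]
W2 = [[0.5, -0.5], [0.1, 0.2], [-0.3, 0.4]]

S_hat = normalize_similarity(S)
Z_e = gcn_layer(S_hat, X, W1)   # Eq. (4)
Z = gcn_layer(S_hat, Z_e, W2)   # Eq. (5)
```

Each row of `Z` is the low-dimensional embedding of one drug; the decoder of Equation (6) would apply two further propagation steps with sigmoid activations to map it back to the attribute space.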

Similarly, we fuse the protein attribute matrix |$\widetilde{X}^{protein}$| and the similarity matrix |$S^{protein}$| and extract the low-dimensional topological representation of the proteins to obtain matrix |$Z^{p}$|.

Let |$z_{1}^{r}$| be the first row of |$Z^{r}$| and |$z_{2}^{p}$| be the second row of |$Z^{p}$|. Taking drug |$r_{1}$| and protein |$p_{2}$| as an example, we stacked the topological embedding vector |$z_{1}^{r}$| of the drug node on top of the topological embedding vector |$z_{2}^{p}$| of the protein node to obtain |$x=\begin{bmatrix} z_{1}^{r} \\ z_{2}^{p} \end{bmatrix}$|. |$x$| then passes through two convolution-pooling layers to fuse the topological embedding vectors of |${r_{1}}$|-|${p_{2}}$|, yielding the topology representation |$u_{topology}$|.

2.4 Construction of attribute-embedding matrix

The biological premise of our embedding strategy is that if a pair of drug and protein nodes have interactions, associations or similarities with more of the same drugs, proteins or diseases, the said pairs of nodes are more likely to interact. Based on this biological premise, we established pairwise attribute-embedding matrices, for example, the attribute-embedding matrix of |${r_{1}}$|-|${p_{2}}$|⁠.

Figure 2 shows the interaction matrix |$Y$|, the association matrices |$A^{protein}$| and |$A^{drug}$|, and the similarity matrices |$S^{protein}$| and |$S^{drug}$|. First, when |${r_{1}}$| and |${p_{2}}$| have interactions or similarities with more of the same proteins, |${r_{1}}$| is more likely to interact with |${p_{2}}$|. The first row of |$Y$|, |$Y_{1,*}$|, records the interactions of |${r_{1}}$| with all proteins, and the second row of |$S^{protein}$|, |$S_{2,*}^{protein}$|, records the similarities of |${p_{2}}$| to all proteins; thus, we stacked them vertically to obtain matrix |${f_{1}}$|,

$$\begin{align}& f_{1}=\begin{bmatrix} Y_{1,*} \\ S_{2,*}^{protein} \end{bmatrix} \end{align}$$

(8)

Secondly, when |${r_{1}}$| and |${p_{2}}$| have similarities and interactions with more of the same drugs, |${r_{1}}$| and |${p_{2}}$| are more likely to interact; thus, the first row of matrix |$S^{drug}$|⁠, |$S_{1,*}^{drug}$|⁠, and the transpose of the second column of matrix |$Y$|⁠, |$Y_{*,2}^{T}$|⁠, were spliced to form a matrix |${f_{2}}$|⁠,

$$\begin{align}& f_{2}=\begin{bmatrix} S_{1,*}^{drug} \\ Y_{*,2}^{T} \end{bmatrix} \end{align}$$

(9)

where |$Y_{*,2}^{T}$| and |$S_{1,*}^{drug}$|, respectively, show the connections of |${p_{2}}$| and |${r_{1}}$| to all drugs. Similarly, when |${r_{1}}$| and |${p_{2}}$| are associated with more common diseases, |${r_{1}}$| is more likely to interact with |${p_{2}}$|; thus, their disease association rows |$A_{1,*}^{drug}$| and |$A_{2,*}^{protein}$| were stacked to form the matrix |${f_{3}}$|,

$$\begin{align}& f_{3}=\begin{bmatrix} A_{1,*}^{drug} \\ A_{2,*}^{protein} \end{bmatrix} \end{align}$$

(10)

Finally, we obtained the attribute-embedding matrix |$F \in{\mathbb R^{ 2 \times (n_{p}+n_{r}+n_{d})}}$| of |${r_{1}}$| and |${p_{2}}$| by concatenating |${f_{1}}$|⁠, |${f_{2}}$| and |${f_{3}}$| from end to end,

$$\begin{align}& F=\begin{bmatrix} f_{1} & f_{2} & f_{3}\end{bmatrix} = \begin{bmatrix} Y_{1,*} & S_{1,*}^{drug} & A_{1,*}^{drug} \\ S_{2,*}^{protein} & Y_{*,2}^{T} & A_{2,*}^{protein} \end{bmatrix} \end{align}$$

(11)

|$n_p$|⁠, |$n_r$| and |$n_d$| represent the numbers of proteins, drugs and diseases, respectively.
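The construction of the attribute-embedding matrix in Equations (8)–(11) amounts to slicing rows and one column out of the five matrices and joining them. The sketch below uses hypothetical toy matrices (2 drugs, 3 proteins, 2 diseases) and a hypothetical function name `attribute_embedding`; indices are 0-based, so the pair |$r_{1}$|-|$p_{2}$| corresponds to `i = 0`, `j = 1`.

```python
def attribute_embedding(i, j, Y, S_drug, S_protein, A_drug, A_protein):
    """Build the 2 x (n_p + n_r + n_d) matrix F of Eq. (11) for drug i and protein j."""
    y_col = [row[j] for row in Y]  # Y_{*,j}^T: interactions of protein j with all drugs
    drug_row = list(Y[i]) + list(S_drug[i]) + list(A_drug[i])
    protein_row = list(S_protein[j]) + y_col + list(A_protein[j])
    return [drug_row, protein_row]


# Toy matrices (hypothetical values): n_r = 2 drugs, n_p = 3 proteins, n_d = 2 diseases.
Y = [[1, 0, 1], [0, 1, 0]]
S_drug = [[1.0, 0.3], [0.3, 1.0]]
S_protein = [[1.0, 0.2, 0.5], [0.2, 1.0, 0.1], [0.5, 0.1, 1.0]]
A_drug = [[1, 0], [1, 1]]
A_protein = [[0, 1], [1, 1], [1, 0]]

F = attribute_embedding(0, 1, Y, S_drug, S_protein, A_drug, A_protein)
```

Column by column, the two rows of `F` describe how the drug and the protein relate to the same protein, drug or disease, which is what lets the subsequent convolutions pick up shared neighbours.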

2.5 Pairwise attribute distribution learning by CVAE

A CVAE is a deep generative model [35]. Unlike traditional autoencoders [36], which describe the low-dimensional feature representation of |${r_{1}}$|-|${p_{2}}$| numerically, a CVAE describes this in a probabilistic way, and finally yields the pairwise attribute distribution representation |$m$|⁠.

Variational encoder. The encoding part of the CVAE takes the embedding matrix |$F$| of |${r_{1}}$| and |${p_{2}}$| as input and learns the pairwise attribute distribution representation |$m$|. The inference network consists of two hidden layers, each of which contains a convolution layer and a pooling layer. The specific parameter settings are shown in Figure 4, where k is the filter size, s is the stride of the convolution and pooling operations, and p is the zero-padding. The output of each hidden layer in the encoding process is,

$$\begin{align}& X_{en}^{l}=max \big( \eta \big( W_{en}^{l}*X_{en}^{l-1} + b_{en}^{l} \big) \big), \quad l=1,...,L_{en} \end{align}$$

(12)

where |${L_{en}}$| is the number of layers of the inference network and |$X_{en}^{0} = F$|⁠. |$W_{en}^{l}$| and |$b_{en}^{l}$| are the weight matrix and bias vector of the l-th layer, respectively. * represents the convolution operation, |$\eta $| is the nonlinear activation function |$LeakyRelu$|⁠, and |$max$| represents the maximum pooling. We flattened the output |$X_{en}^{L_{en}}$| of the last layer of convolution into a vector |$x_{en}$|⁠, and then performed two fully connected mappings on |$x_{en}$|⁠, as described previously [37], to obtain two parameters: |$\mu $| and |$\sigma $|⁠,

$$\begin{align}& \mu= \eta \big(W_{\mu} x_{en} + b_{\mu}\big),\sigma= \eta \big(W_{\sigma} x_{en} + b_{\sigma}\big) \end{align}$$

(13)

where |$W_{\mu }$| and |$W_{\sigma }$| are the weight matrices of the linear layers that produce |$\mu $| and |$\sigma $|, respectively, |$b_{\mu }$| and |$b_{\sigma }$| are the corresponding bias vectors and |$\eta $| is the nonlinear activation function |$LeakyRelu$|. Following the reparameterization scheme in [38, 39], we considered |$\mu $| and |$\sigma $| the mean and variance, respectively, and constructed the attribute distribution representation of |${r_{1}}$|-|${p_{2}}$|, |$m$|, as follows,

$$\begin{align}& m = exp \big(\sigma \big) \circ \xi + \mu \end{align}$$

(14)

$$\begin{align}& p \big( F|m\big) = f_{de}\big( m\big) \end{align}$$

(15)

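In Equation (14), |$\xi $| denotes the auxiliary noise of the reparameterization scheme, typically sampled from a standard Gaussian distribution. In Equation (15), |$f_{de}\big ( m\big )$| indicates that the attribute distribution representation |$m$| is first linearly mapped, and then reshaped into a feature map on which a deconvolution operation is performed. The details of this process are shown in Figure 4. The result of the linear mapping is as follows,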

$$\begin{align}& x_{de} = \eta \big( W_{lin}m + b_{lin}\big) \end{align}$$

(16)

where |$W_{lin}$| and |$b_{lin}$| are the weight matrix and bias vector to be learnt by the linear layer, respectively. We reshaped |$x_{de}$| into a feature map form, and then used it as the input of deconvolution to obtain |$X_{de}^{l}$|⁠,

$$\begin{align}& X_{de}^{l} = \eta \big( W_{de}^{l} \star X_{de}^{l-1} + b_{de}^{l}\big), \quad l=1,...,L_{de} \end{align}$$

(17)

where |$X_{de}^{0} = x_{de}$| and |$L_{de}$| is the number of layers of the generative network. |$W_{de}^{l}$| is the weight matrix of layer l, |$b_{de}^{l}$| is the corresponding bias vector, |$\star $| represents the deconvolution operation and |$X_{de}^{l}$| is the feature map produced by layer l.

Loss calculation based on the CVAE. We optimized the representation of the attribute distribution based on the following loss,

$$\begin{align}& loss_{v} = E_{q \big( m|F\big)} [log(p \big( F|m\big))] - KL[q \big( m|F\big) || p \big(m\big)] \end{align}$$

(18)

where |$p \big (m\big )$| is the prior distribution, which makes the target distribution of |$m$| a Gaussian distribution. |$KL$| is the KL divergence, which is used to measure the distance between the posterior distribution |$q \big ( m|F\big )$| and the prior distribution |$p \big (m\big )$|⁠. We used the Adam algorithm to optimize |$loss_{v}$| [40]. After the training is completed, the pairwise attribute distribution representation |$m$| can be obtained, which is defined as |$u_{distribution}$|⁠.
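The sampling step of Equation (14) and the KL term of Equation (18) can be illustrated numerically. The sketch below rests on two assumptions that the text does not spell out: the noise is drawn from a standard Gaussian, and |$exp(\sigma)$| acts as the standard deviation of each latent dimension (so that |$\sigma$| plays the role of a log standard deviation) with a standard Gaussian prior. Under those assumptions the KL divergence has the closed form used in `kl_to_standard_normal`. The function names and values are hypothetical.

```python
import math
import random


def reparameterize(mu, sigma, rng):
    """Eq. (14): m = exp(sigma) * xi + mu, with xi ~ N(0, 1) drawn per dimension."""
    return [math.exp(s) * rng.gauss(0.0, 1.0) + u for u, s in zip(mu, sigma)]


def kl_to_standard_normal(mu, sigma):
    """KL[N(mu, exp(sigma)^2) || N(0, 1)], summed over dimensions.

    Assumes exp(sigma) is the standard deviation, as the form of Eq. (14) suggests.
    """
    return sum(0.5 * (math.exp(2.0 * s) + u * u - 1.0 - 2.0 * s) for u, s in zip(mu, sigma))


rng = random.Random(0)          # fixed seed so the sketch is repeatable
mu = [0.5, -1.0]
sigma = [0.0, -0.5]
m = reparameterize(mu, sigma, rng)
kl = kl_to_standard_normal(mu, sigma)
```

The KL term is zero only when the encoder outputs exactly the prior, and it grows as the mean moves away from zero or the spread moves away from one, which is what keeps the learned distribution close to the Gaussian prior.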

Figure 4

Extraction of the pairwise attribute distribution and attribute representation of |${r_{1}}$|-|${p_{2}}$|.

2.6 Pairwise attribute representation learning by multi-layer convolutional neural network

The embedding matrix |$F$| of |${r_{1}}$| and |${p_{2}}$| is fed into the MCNN to learn the pairwise attribute representation of |${r_{1}}$| and |${p_{2}}$|, which assists the entire model in predicting drug–protein interactions. The convolution module contains two convolution layers and two max-pooling layers, as shown in Figure 4. To learn the edge information of |$F$| during convolution, we applied zero-padding to the input of each convolution layer. The output feature map of each hidden layer is

$$\begin{align}& c^{l} = max \big(\eta \big(W_{cn}^{l} * c^{l-1} + b_{cn}^{l}\big)\big), \quad l= 1,2 \end{align}$$

(19)

where |$c^{0} = F$|, |$b_{cn}^{l}$| is the bias vector of the l-th layer and |$W_{cn}^{l}$| is the corresponding weight matrix. |$c^{l}$| is the feature map output by the l-th hidden layer. We flattened the feature map |$c^{2}$| output by the last hidden layer into a vector |$c^{^{\prime}}$|, which is the pairwise attribute representation, and defined it as |$u_{attribute}$|.
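One hidden layer of Equation (19), that is zero-padded convolution, LeakyReLU and max pooling, can be sketched for a single channel as below. The actual filter sizes, strides and padding are given in Figure 4 and are not reproduced here, so the 3 × 3 kernel, the pooling along the width only, the LeakyReLU slope of 0.01 and the function names are all assumptions made for illustration.

```python
def leaky_relu(v, slope=0.01):
    return v if v > 0 else slope * v


def conv2d_same(X, K, bias=0.0):
    """Single-channel 2D cross-correlation with zero padding, followed by LeakyReLU.

    The output has the same height and width as X. Assumes odd kernel dimensions.
    """
    kh, kw = len(K), len(K[0])
    ph, pw = kh // 2, kw // 2
    height, width = len(X), len(X[0])
    out = []
    for r in range(height):
        row = []
        for c in range(width):
            acc = bias
            for a in range(kh):
                for b in range(kw):
                    rr, cc = r + a - ph, c + b - pw
                    if 0 <= rr < height and 0 <= cc < width:  # positions outside X count as zero
                        acc += K[a][b] * X[rr][cc]
            row.append(leaky_relu(acc))
        out.append(row)
    return out


def max_pool_width(X, size=2):
    """Max pooling along the width only, stride equal to size; a trailing remainder is dropped."""
    return [[max(row[c:c + size]) for c in range(0, len(row) - size + 1, size)] for row in X]


# Toy 2 x 4 input standing in for F, and a hypothetical 3 x 3 kernel.
F_toy = [[1.0, 0.0, 1.0, 1.0], [0.0, 1.0, 0.0, 1.0]]
K = [[0.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 0.0]]
c1 = max_pool_width(conv2d_same(F_toy, K))
```

Zero padding lets the kernel be centred on border entries, so the first and last columns of the input still contribute to the feature map instead of being cropped away.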

2.7 Integration of the multiple pairwise representations

The attention-enhanced topological representation, attribute distribution and attribute representation of drug |$r_{1}$| and protein |$p_{2}$| are |$u_{topology}$|, |$u_{distribution}$| and |$u_{attribute}$|, respectively. We concatenated these three representations end to end to obtain the vector |$p$|,

$$\begin{align}& p = [u_{topology}, u_{distribution}, u_{attribute}] \end{align}$$

(20)

To obtain the interaction probability of |$r_{1}$|-|$p_{2}$|, |$p$| goes through two convolution-pooling layers to obtain a feature map, which is flattened into a vector |$p^{^{\prime}}$|. |$p^{^{\prime}}$| passes through a fully connected layer and a |$softmax$| layer to obtain the probability distribution |$o$| over the two classes [41],

$$\begin{align}& { o = softmax \big(W_{f}p^{^{\prime}} + b_{f}\big) } \end{align}$$

(21)

|$o = [o_{1}, o_{2}]$| and |$o_{1}$| and |$o_{2}$| represent the probability that |$r_{1}$| and |$p_{2}$| are determined to have an interaction relationship and the probability that there is no interaction relationship, respectively. We used the cross-entropy loss function to optimize the above process,

$$\begin{align}& loss = -[y log(o_{1}) + (1-y)log(o_{2})] \end{align}$$

(22)

where |$y$| is the real label. The loss function (22) is optimized using the Adam algorithm [40]. The module is trained by the backpropagation (BP) algorithm [42].
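For a single drug–protein pair, Equations (21) and (22) reduce to a two-way softmax followed by the binary cross-entropy. In the sketch below the two input values stand for the output of the fully connected layer, |$W_{f}p^{^{\prime}} + b_{f}$|; the function names and numbers are hypothetical.

```python
import math


def softmax(z):
    top = max(z)                      # subtract the maximum for numerical stability
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return [v / total for v in e]


def interaction_loss(logits, y):
    """Eqs. (21)-(22) for one pair: logits has two entries, y is the label in {0, 1}."""
    o1, o2 = softmax(logits)          # o1: interaction, o2: no interaction
    return -(y * math.log(o1) + (1 - y) * math.log(o2))


o = softmax([2.0, -1.0])
loss_if_positive = interaction_loss([2.0, -1.0], y=1)
loss_if_negative = interaction_loss([2.0, -1.0], y=0)
```

When the first output dominates, the loss is small for a true interaction and large for a non-interacting pair, which is the signal backpropagated through the fusion layers.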

Figure 5

ROC curves and PR curves of all the methods in comparison over all 708 drugs.

3 Experimental evaluations and discussions

3.1 Evaluation metrics

In this article, we treated the known drug–protein interactions as positive samples, and the unobserved drug–protein pairs as negative samples. In our dataset, there are 1923 positive samples and |$708 \times 1512 - 1923 = 1\,068\,573$| negative samples, so there is a serious class imbalance between the positive and negative samples. Therefore, we randomly selected as many negative samples as there are positive samples and combined them with the positive samples to form set A. Set B contains the remaining |$1\,068\,573-1923=1\,066\,650$| negative samples.

We utilized 5-fold cross-validation to evaluate the performance of GVDTI and several other state-of-the-art prediction methods. The same training data and test data were used for all methods. In each cross-validation run, we randomly divided the samples in set A into five equal subsets, four of which are used for training; the fifth subset is combined with set B as the test set.
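The sample counts and the fold construction can be checked with a short script. The counts come from the text; the helper `five_fold_indices`, the seed and the use of integer indices as stand-ins for samples are assumptions. Since set A has 3846 samples, which is not divisible by five, the folds in this sketch differ in size by at most one.

```python
import random

n_drugs, n_proteins, n_pos = 708, 1512, 1923
n_neg = n_drugs * n_proteins - n_pos   # unobserved pairs treated as negatives
n_set_b = n_neg - n_pos                # negatives left over after balancing set A


def five_fold_indices(n_samples, seed=0):
    """Shuffle sample indices and deal them into five near-equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[k::5] for k in range(5)]


# Set A: 1923 positives plus 1923 randomly sampled negatives.
folds = five_fold_indices(2 * n_pos)
test_fold = folds[0]
train_idx = [i for fold in folds[1:] for i in fold]
```

In each round, the held-out fold would additionally be merged with all of set B, so the test set stays heavily imbalanced even though training is balanced.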

Given a threshold |$\omega $|⁠, when the drug–protein node pair has a known interaction relationship in the sample, and its interaction score is greater than |$\omega $|⁠, we consider the sample as a positive sample that is successfully identified. Otherwise, it is judged as a negative sample. We calculated the true positive rates (TPRs) and false positive rates (FPRs) by changing |$\omega $|⁠, and plotted the receiver operating characteristic (ROC) curve [43]. The TPR and FPR are defined as follows,

$$\begin{align}& TPR = \frac{TP}{TP+FN}, FPR = \frac{FP}{TN+FP} \end{align}$$

(23)

where TP and TN are the numbers of positive and negative samples that were correctly identified, respectively. FN is the number of positive samples that were incorrectly identified as negative, and FP is the number of negative samples that were incorrectly identified as positive.
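The threshold sweep behind the ROC curve can be sketched as below. The helper names `roc_points` and `auc` are assumptions made for illustration, and the trapezoidal rule is one common way to approximate the area; the article does not state which numerical procedure was used.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by lowering the threshold
    step by step from the highest predicted score to the lowest."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Process tied scores together so one threshold yields one point.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points

def auc(points):
    """Area under a curve given as (x, y) points, by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For instance, `auc(roc_points(scores, labels))` gives the AUC of one drug when `scores` and `labels` hold the predictions and true labels of its test samples.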

AUC is the area under the ROC curve [44]; it is utilized to evaluate the predictive performance of the model. Nevertheless, the number of negative samples is much larger than that of the positive samples. In this case, the area under the precision-recall (PR) curve (AUPR) is more informative with regard to evaluating the overall performance of the prediction method [45]. Precision and Recall are defined as

$$\begin{align}& Precision= \frac{TP}{TP+FP}, Recall = \frac{{{TP}}}{TP+FN} \end{align}$$

(24)

Precision is the percentage of correctly identified positive samples among those predicted to be positive, and Recall is identical to TPR. Since biologists usually select the top-ranked candidate proteins and then verify their interactions with the drugs experimentally, we also calculated the recall among the top k candidates.
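Equation (24) and the top-k recall can be illustrated with the short sketch below. The function names and the conventions for empty denominators are assumptions for illustration only; the sketch treats one drug's candidate list at a time, and the per-drug values would then be averaged.

```python
def precision_recall(scores, labels, omega):
    """Precision and Recall of Equation (24) at threshold omega."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s > omega and y == 1)
    fp = sum(1 for s, y in pairs if s > omega and y == 0)
    fn = sum(1 for s, y in pairs if s <= omega and y == 1)
    # Convention (assumed): precision is 1 when nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def recall_at_k(scores, labels, k):
    """Fraction of all positive samples ranked among the k highest scores."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    return sum(y for _, y in ranked[:k]) / n_pos if n_pos else 0.0
```

Sweeping `omega` over all observed scores and plotting the resulting (Recall, Precision) pairs yields the PR curve.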

Table 1

The statistical results of the paired Wilcoxon test on the AUCs and AUPRs over all the 708 drugs by comparing GVDTI and all other seven methods.

	DTIP	GANDTI	NGDTP	DTINet	GRMF	DDR	Lee’s method
p-value of AUC	1.2215e-153	0.2123e-112	2.2055e-133	5.0918e-62	3.5449e-75	2.5239e-92	5.1732e-89
p-value of AUPR	5.1432e-294	7.6154e-134	6.6362e-261	8.5746e-224	2.9768e-249	1.5273e-104	1.0503e-114

Table 2

The top 10 candidate proteins of five drugs

Drug name	Rank	Targets	Evidence	Rank	Targets	Evidence
Quetiapine	1	HTR6	DrugBank/STITCH	6	DRD5	DrugBank
	2	CHRM2	DrugBank	7	ADRA2C	DrugBank
	3	CHRM4	DrugBank	8	ADRA1B	DrugBank
	4	HTR2C	DrugBank/STITCH	9	CHRM5	DrugBank
	5	ADRA1D	DrugBank	10	DRD2	DrugBank/STITCH
Clozapine	1	HTR2C	DrugBank/STITCH	6	NR1I2	Unconfirmed
	2	HTR7	DrugBank/STITCH	7	CHRM1	DrugBank/STITCH
	3	HTR1D	DrugBank	8	ADRA1A	DrugBank
	4	HTR6	DrugBank/STITCH	9	CHRM5	DrugBank
	5	HTR1B	DrugBank	10	HRH4	DrugBank
Verapamil	1	KCNJ11	DrugBank	6	CACNA1A	DrugBank
	2	ADRA2B	Unconfirmed	7	CACNA1B	DrugBank
	3	KCNH2	DrugBank/STITCH	8	CACNB4	Literature [49]
	4	CACNB2	Literature [49]	9	CACNA1C	DrugBank/STITCH
	5	CACNA1S	Literature [49]	10	CACNA1F	Literature [49]
Amitriptyline	1	CHRM2	DrugBank	6	SLC6A2	DrugBank
	2	NTRK1	DrugBank	7	HTR2A	DrugBank/STITCH
	3	KCNQ2	DrugBank	8	KCND2	Literature [50]
	4	KCNA1	DrugBank	9	OPRD1	DrugBank
	5	ADRA1A	DrugBank/STITCH	10	KCND3	Literature [50]
Ziprasidone	1	HTR2C	DrugBank/STITCH	6	HTR3A	DrugBank
	2	DRD2	DrugBank	7	CHRM3	DrugBank
	3	DRD5	DrugBank	8	ADRA2C	Literature [51]
	4	ADRA2A	DrugBank	9	HRH2	Unconfirmed
	5	HTR1D	DrugBank/STITCH	10	HTR6	DrugBank

3.2 Comparison with other methods

We compared GVDTI with several state-of-the-art methods for drug–protein interaction prediction: DTIP [27], GANDTI [24], NGDTP [25], DTINet [23], GRMF [46], DDR [20] and Lee’s method [47]. As shown in Figure 5(A), GVDTI achieved the highest average AUC over all 708 tested drugs (AUC = 0.983), which is 0.2% higher than that of the second-best method, DTIP, 2.8% higher than that of NGDTP, 4.8% higher than that of GANDTI, 6.2% higher than that of DTINet, 8.9% higher than that of GRMF, 10.4% higher than that of DDR and 17.1% higher than that of the worst-performing method, Lee’s method. GVDTI also achieved the best AUPR (0.435), exceeding DTIP, NGDTP, GRMF, DTINet, GANDTI, Lee’s method and DDR by 3.6%, 7.4%, 11.6%, 26%, 36.1%, 37.6% and 38.5%, respectively. These percentages are absolute differences in AUC or AUPR; for example, the AUC gap to DTINet is 0.983 − 0.921 = 0.062.

DTIP showed the second-best performance. Based on bidirectional GRU, this method learns the multi-scale neighbor topology of drugs and proteins, and deeply explores the potential relationship between drugs and proteins. NGDTP also showed a good performance. Based on non-negative matrix factorization, this method fuses multiple connection data between drugs and proteins, learning the topological representation of drugs and proteins. These findings indicate that it is necessary to integrate various information about drugs and proteins to obtain the topological representation of nodes.

Figure 6

The average recalls over all the drugs at different top |$k$| values.

DTINet performed well in terms of AUC (AUC = 0.921) but achieved a poor AUPR (AUPR = 0.175). Conversely, GRMF achieved a good AUPR (AUPR = 0.319), but its AUC was somewhat lower (AUC = 0.894). NGDTP, DTINet and GRMF are all shallow prediction models based on matrix factorization, which cannot deeply learn the complex associations among the various kinds of drug and protein information. GANDTI is a drug–protein interaction prediction model based on a generative adversarial network; however, it does not utilize drug–disease associations or protein–disease associations. DDR and Lee’s method performed even worse, because the former does not use network topology information and the latter ignores the attribute information of the nodes. Our method not only deeply fuses a variety of information related to the drugs and proteins, but also makes full use of the attribute information of the nodes.

In addition, each prediction method yields 708 AUCs and 708 AUPRs, one for each of the 708 drugs. We performed a paired Wilcoxon test on the 708 paired AUCs (or AUPRs) of GVDTI and each of the other seven methods to assess whether the AUCs and AUPRs of GVDTI were significantly greater. Table 1 shows that GVDTI was significantly better than all the other methods in terms of both AUC and AUPR (|$p$|-value < 0.05).
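A paired, one-sided Wilcoxon signed-rank test of this kind can be sketched as follows. This is a simplified illustration, not the procedure used by the authors: the function name `wilcoxon_greater` is hypothetical, and the p-value relies on the normal approximation without tie or continuity correction, so dedicated statistical software should be preferred for reporting.

```python
import math

def wilcoxon_greater(x, y):
    """One-sided paired Wilcoxon signed-rank test of H1: x tends to exceed y.

    Uses the normal approximation without tie or continuity correction.
    Returns (W_plus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]   # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Tied absolute differences share the average of their ranks.
        j = i
        while j < n and abs(diffs[order[j]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + 1 + j) / 2
        for k in range(i, j):
            ranks[order[k]] = avg_rank
        i = j
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = 0.5 * math.erfc(z / math.sqrt(2))       # upper-tail probability
    return w_plus, p_value
```

Here `x` would hold the 708 per-drug AUCs (or AUPRs) of GVDTI and `y` those of one competing method, in the same drug order.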

The higher the recall among the top k protein candidates, the more real drug-related proteins are correctly identified. For all values of k, GVDTI outperformed the other methods (Figure 6), ranking 89.7% of the positive samples in the top 30, 91.8% in the top 60 and 94.9% in the top 120. DTIP ranked second, with 85.3%, 89.4% and 93.5% of the positive samples in the top 30, 60 and 120, respectively. GANDTI identified 38.2%, 62.9% and 86.1% of the positive samples in the top 30, 60 and 120, respectively. NGDTP identified 85.2%, 87.1% and 89.8% of the positive samples in the top 30, 60 and 120, respectively. DTINet identified 74.8%, 81.5% and 85.4% of the positive samples in the top 30, 60 and 120, respectively; these recalls were higher than those of GRMF in the top 60 and 120 (79.5% and 82.6%) but lower in the top 30 (77.5%). Lee’s method was inferior to the other methods, identifying 23%, 33.4% and 51.9%, respectively.

3.3 Case studies on five drugs

To further demonstrate the ability of GVDTI to discover potential drug–protein interactions, we conducted case studies on five drugs (quetiapine, verapamil, amitriptyline, clozapine and ziprasidone), each of which has at least 14 known drug–protein interactions. We collected and analyzed the top 10 candidate proteins for each drug (Table 2). In addition, we conducted case studies on five drugs (imipramine, triazolam, desipramine, clonazepam and diazepam), each of which has fewer than 14 known drug–protein interactions. We collected their top five candidates and listed them in supplementary Table ST1.

The DrugBank database not only contains detailed drug data, such as chemical data and pharmacological data, but also includes comprehensive drug–protein data, such as information regarding their sequence, structure and pathway of action [48]. STITCH (Search Tool for Interacting Chemicals) is a database built on Compartments (cellular localizations), eggNOG (gene orthology) and STRING (protein–protein networks); it contains detailed protein-related data, drug-related data and information regarding drug–protein interactions. As shown in Table 2, 40 candidate proteins were recorded by DrugBank and 13 candidate proteins were identified by STITCH. This result supports the conclusion that these candidate proteins do interact with the corresponding drugs.

Four candidate proteins of verapamil, two candidate proteins of amitriptyline and one candidate protein of ziprasidone are labeled as ‘literature’. They have been confirmed by several published articles; this validates their mutual interaction. Of the 50 candidate proteins, three were marked as ‘unconfirmed’.

The top five candidates for the five drugs that each have fewer than 14 known drug–protein interactions are listed in supplementary Table ST1. The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a database that contains detailed drug data, protein data and drug–protein interactions. The Comparative Toxicogenomics Database (CTD) is also a public database that includes drug–protein interactions. Five candidate proteins were recorded by KEGG, and four were included in CTD. DrugBank and STITCH covered 16 and 10 candidates, respectively. This indicates that these candidates indeed interact with their corresponding drugs. Of these 25 candidates, only two were marked as ‘unconfirmed’, meaning that there is no evidence to confirm their interactions. The above analysis indicates that GVDTI is able to discover potential drug–protein interactions.

3.4 Prediction of novel proteins related to drugs

After training the prediction model on all known drug–protein interactions, we used it to predict the top 10 ranked protein candidates for each drug and provide them in supplementary Table ST2 (https://github.com/pingxuan-hlju/GVDTI). These candidates may help biologists identify actual drug-related proteins through wet laboratory experiments.
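Selecting the top-ranked candidates from the trained model's scores can be sketched as below. The function `top_candidates`, the dictionary-based score table and the exclusion of already known pairs are all assumptions made for illustration; the article does not describe how the ranking was implemented.

```python
def top_candidates(scores, known, k=10):
    """Rank candidate proteins for every drug.

    scores: dict mapping (drug, protein) to a predicted interaction score
    known : set of (drug, protein) pairs already recorded as interacting
    Returns {drug: [protein, ...]} with the k best-scoring unknown proteins.
    """
    per_drug = {}
    for (drug, protein), score in scores.items():
        # Assumption: known interactions are not reported as candidates.
        if (drug, protein) not in known:
            per_drug.setdefault(drug, []).append((score, protein))
    return {drug: [p for _, p in sorted(cands, key=lambda c: -c[0])[:k]]
            for drug, cands in per_drug.items()}
```

With `k=10` this produces one list of ten candidate proteins per drug, in descending order of predicted score.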

4 Conclusions

We propose a novel prediction method, GVDTI, which extracts and integrates the topological structure of multiple sub-networks of drugs and proteins, as well as the attribute distribution and attribute representation of drug–protein node pairs, to predict drug-related candidate proteins. GVDTI captures the various intra-relationships among drugs and among proteins, i.e. drug similarities and protein similarities. Simultaneously, it captures the inter-relationships among drugs, proteins and diseases, i.e. drug–protein interactions, drug–disease associations and protein–disease associations. The developed graph convolutional autoencoder-based framework learns pairwise topological representation, attribute distribution and attribute representation. The node attribute attention mechanism distinguishes the contributions of different attributes of a drug or protein node from its topological embedding vector. The tri-layer heterogeneous network is conducive to the formulation of pairwise attribute-embedding and further promotes the learning of pairwise attribute distribution and attribute representation. The experimental results demonstrated that GVDTI improved both the prediction of drug–protein interaction candidates and the identification of top-ranked candidate proteins. Our model can be used as a tool to screen potential candidate proteins, whose interactions with drugs can then be verified through wet laboratory experiments.

Key Points

A newly proposed attention-enhanced pairwise topological representation to embed the topology structure of drug and protein nodes and reveal the underlying topological relationship of drug–protein sub-networks. The attribute-level attention mechanism distinguishes the different contributions of various attributes of each drug or protein node from its topological embedding vector.
A heterogeneous network to facilitate the association of similarities, interactions and associations across drug, protein and disease, which assists the modeling of further pairwise attribute distribution and attribute representation.
The novel drug–protein pairwise attribute distribution modeled by convolutional variational autoencoder reveals the deep underlying relationship among drug, protein and disease data sources.
The biological premise driven pairwise attribute representation infers the drug–protein interactions through their common drugs, proteins and diseases. The improved performance for drug–protein interaction prediction was demonstrated by comparing with seven state-of-the-art prediction methods. The improved recall rate and five drug case studies further prove the ability of the proposed model.

Funding

The work was supported by the Natural Science Foundation of China (61972135, 62172143); Natural Science Foundation of Heilongjiang Province (LH2019F049 and LH2019A029); China Postdoctoral Science Foundation (2019M650069, 2020M670939); Heilongjiang Postdoctoral Scientific Research Starting Foundation (BHLQ18104); Fundamental Research Foundation of Universities in Heilongjiang Province for Technology Innovation (KJCX201805); Innovation Talents Project of Harbin Science and Technology Bureau (2017RAQXJ094); Fundamental Research Foundation of Universities in Heilongjiang Province for Youth Innovation Team (RCYJTD201805).

Ping Xuan, PhD (Harbin Institute of Technology), is a professor at the School of Computer Science and Technology, Heilongjiang University, Harbin, China. Her current research interests include computational biology, complex network analysis and medical image analysis.

Mengsi Fan is studying for her master’s degree in the School of Computer Science and Technology at Heilongjiang University, Harbin, China. Her research interests include complex network analysis and deep learning.

Hui Cui, PhD (The University of Sydney), is a lecturer at Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia. Her research interests lie in data-driven and computerized models for biomedical and health informatics.

Tiangang Zhang, PhD (The University of Tokyo), is an associate professor of the School of Mathematical Science, Heilongjiang University, Harbin, China. His current research interests include complex network analysis and computational fluid dynamics.

Toshiya Nakaguchi, PhD (Sophia University), is a professor at the Center for Frontier Medical Engineering, Chiba University, Chiba, Japan. His current research interests include complex network analysis, medical image processing and biometrics measurement.

References

1. Huang K, Xiao C, Glass LM, et al. MolTrans: Molecular Interaction Transformer for drug-target interaction prediction. Bioinformatics 2021;37(6):830–6.
2. Sun C, Cao Y, Wei J-M, et al. Autoencoder-based drug-target interaction prediction by preserving the consistency of chemical properties and functions of drugs. Bioinformatics 2021;btab384.
3. Chu Y, Kaushik AC, Wang X, et al. DTI-CDF: a cascade deep forest model towards the prediction of drug-target interactions based on hybrid features. Brief Bioinform 2021;22(1):451–62.
4. Verma N, Qu X, Trozzi F, et al. SSnet: A Deep Learning Approach for Protein-Ligand Interaction Prediction. Int J Mol Sci 2021;22(3):1392.
5. Chen Z-H, You Z-H, Guo Z-H, et al. Prediction of Drug-Target Interactions From Multi-Molecular Network Based on Deep Walk Embedding Model. Front Bioeng Biotechnol 2020;8:338.
6. Ding Y, Tang J, Guo F. Identification of drug-target interactions via multiple information integration. Inform Sci 2017;418-419:546–60.
7. Bagherian M, Sabeti E, Wang K, et al. Machine learning approaches and databases for prediction of drug-target interaction: a survey paper. Brief Bioinform 2021;22(1):247–69.
8. Lee I, Keum J, Nam H. DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences. PLoS Comput Biol 2019;15(6):e1007129.
9. Chen X, Yan CC, Zhang X, et al. Drug-target interaction prediction: databases, web servers and computational models. Brief Bioinform 2016;17(4):696–712.
10. Ding Y, Tang J, Fei G. Identification of Protein-Ligand Binding Sites by Sequence Information and Ensemble Classifier. J Chem Inf Model 2017;57(12):3149–61.
11. Whitebread S, Hamon J, Bojanic D, et al. Keynote review: In vitro safety pharmacology profiling: an essential tool for successful drug development. Drug Discov Today 2005;10(21):1421–33.
12. Morris G, Huey R. AutoDock4 and AutoDockTools4: automated docking with selective receptor flexibility. J Comput Chem 2009;48:443–53.
13. Shoichet BK, McGovern SL, Wei B, et al. Lead discovery using molecular docking. Curr Opin Chem Biol 2002;6(4):439–46.
14. Donald BR. Algorithms in Structural Molecular Biology. The MIT Press 2011;1:1–429.
15. Ballesteros JA, Palczewski K. G protein-coupled receptor drug discovery: Implications from the crystal structure of rhodopsin. Current Opinion in Drug Discovery and Development 2001;4(5):561–74.
16. Zheng X, Wu LY, Zhou X, et al. Semi-supervised drug-protein interaction prediction from heterogeneous biological spaces. BMC Syst Biol 2010;4(Suppl 2):S6.
17. Keiser MJ, Roth BL, Armbruster BN, et al. Relating Protein Pharmacology by Ligand Chemistry. Nat Biotechnol 2007;25(2):197–206.
18. Ding Y, Tang J, Guo F. Identification of drug-target interactions via multiple information integration. Information Sciences 2017;546–60.
19. Bleakley K, Yamanishi Y. Supervised prediction of drug-target interactions using bipartite local models. Bioinformatics 2009;25(18):2397–403.
20. Olayan RS, Ashoor H, Bajic VB. DDR: Efficient computational method to predict drug-target interactions using graph mining and machine learning approaches. Bioinformatics 2018;34(7):1164–73.
21. Xuan P, Sun C, Zhang T, et al. Gradient Boosting Decision Tree-Based Method for Predicting Interactions Between Target Genes and Drugs. Front Genet 2019;10:459.
22. Wang W, Yang S, Zhang X, et al. Drug repositioning by integrating target information through a heterogeneous network model. Bioinformatics 2014;30(20):2923–30.
23. Luo Y, Zhao X, Zhou J, et al. A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nat Commun 2017;8(1):573.
24. Sun C, Xuan P, Zhang T, et al. Graph convolutional autoencoder and generative adversarial network-based method for predicting drug-target interactions. IEEE/ACM Trans Comput Biol Bioinform 2020;1:1–11.
25. Xuan P, Chen B, Zhang T, et al. Prediction of drug-target interactions based on network representation learning and ensemble learning. IEEE/ACM Trans Comput Biol Bioinform 2020;4:1–12.
26. Manoochehri HE, Kadiyala SS, Nourani M. Predicting Drug-Target Interactions Using Weisfeiler-Lehman Neural Network. IEEE EMBS International Conference on Biomedical and Health Informatics (BHI) IEEE 2019;88(11):1–4.
27. Xuan P, Zhang Y, Cui H, Zhang T, Guo M, Nakaguchi T. Integrating multi-scale neighbouring topologies and cross-modal similarities for drug-protein interaction prediction. Brief Bioinform 2021;bbab119:1–10.
28. Davis AP, et al. The Comparative Toxicogenomics Database: update 2013. Nucleic Acids Research 2013;41(D1):D1104–14.
29. Iorio F, Bosotti R, Scacheri E, et al. Discovery of drug mode of action and drug repositioning from transcriptional responses. Proc Natl Acad Sci U S A 2010;107(8):14621–6.
30. Wang W, Yang S, Zhang X, et al. Drug repositioning by integrating target information through a heterogeneous network model. Bioinformatics 2014;20:2923–30.
31. Kipf TN, Welling M. Variational graph auto-encoders. Conference and Workshop on Neural Information Processing Systems NIPS 2016;1050:1–3.
32. Schlichtkrull M, Kipf TN, Bloem P, et al. Modeling Relational Data with Graph Convolutional Networks. European semantic web conference 2018;1:593–607.
33. Kipf TN, Welling M. Semisupervised classification with graph convolutional networks. International Conference on Learning Representations 2016;1609:1–14.
34. Ma T, Cao X, Zhou J, et al. Drug Similarity Integration Through Attentive Multi-view Graph Auto-Encoders. International Joint Conference on Artificial Intelligence IJCAI 2018;1804:1–7.
35. Chen Y, Rijke MD. A Collective Variational Autoencoder for Top-N Recommendation with Side Information. Association for Computing Machinery 2018;1807:3–9.
36. Vincent P, Larochelle H, Lajoie I, et al. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion. Journal of Machine Learning Research 2010;11(12):3371–408.
37. Gligorijevic V, et al. deepNF: deep network fusion for protein function prediction. Bioinformatics 2018;34:3873–81.
38. Zeng X, Zhu S, Liu X, et al. deepDR: a network-based deep learning approach to in silico drug repositioning. Bioinformatics 2019;35(24):5191–8.
39. Kingma DP, Welling M. Auto-encoding variational Bayes. arXiv 2013;1312:6114.
40. Kingma D, Ba J. Adam: A Method for Stochastic Optimization. International Conference for Learning Representations 2015;1412:1–15.
41. Bahdanau D, Cho K, Bengio Y. Neural Machine Translation by Jointly Learning to Align and Translate. International Conference on Learning Representations ICLR 2015;1409:1–15.
42. Leonard J, Kramer MA. Improvement of the backpropagation algorithm for training neural networks. Computers and Chemical Engineering 1990;14(3):337–41.
43. Karimollah HT. Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation. Caspian J Intern Med 2013;4:627–35.
44. Ling CX, Huang J, Zhang H. AUC: a better measure than accuracy in comparing learning algorithms. Conference of the Canadian Society for Computational Studies of Intelligence 2003;2671:329–41.
45. Takaya S, Marc R, Guy B. The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets. PLoS ONE 2015;10(3):e0118432.
46. Ezzat A, Zhao P, Wu M, et al. Drug-target interaction prediction with graph regularized matrix factorization. IEEE/ACM Trans Comput Biol Bioinform 2016;14(3):646–56.
47. Li Z-C, Huang M-H, Zhong W-Q, et al. Identification of drug-target interaction from interactome network with ‘guilt-by-association’ principle and topology features. Bioinformatics 2015;32(7):1057–64.
48. Wishart DS, Feunang YD, An CG, et al. DrugBank 5.0: A major update to the DrugBank database for 2018. Nucleic Acids Res 2017;46(D1):D1074–82.
49. Tfelt-Hansen P, Tfelt-Hansen J. Verapamil for Cluster Headache. Clinical Pharmacology and Possible Mode of Action. The Journal of Head and Face Pain 2009;49(1):117–25.
50. Casis O, Sánchez-Chapula JA. Disopyramide, imipramine, and amitriptyline bind to a common site on the transient outward K+ channel. J Cardiovasc Pharmacol 1998;32(4):521–6.
51. Nasrallah HA. Atypical antipsychotic-induced metabolic side effects: insights from receptor-binding profiles. Mol Psychiatry 2008;13(1):27.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
