Linyuan Guo, Jianxin Wang, GSScore: a novel Graphormer-based shell-like scoring method for protein–ligand docking, Briefings in Bioinformatics, Volume 25, Issue 3, May 2024, bbae201, https://doi.org/10.1093/bib/bbae201
Abstract
Protein–ligand interactions (PLIs) are essential for cellular activities and drug discovery. However, because experimental methods are complex and costly, there is a great demand for computational approaches to recognize PLI patterns, such as protein–ligand docking. In recent years, more and more machine learning-based models have been developed to directly predict the root mean square deviation (RMSD) of a ligand docking pose with reference to its native binding pose. However, more accurate scoring methods for RMSD prediction are still urgently needed. We present GSScore, a new deep learning-based scoring method for predicting the RMSD of protein–ligand docking poses, built on Graphormer and a shell-like graph architecture. To recognize near-native conformations from a set of poses, GSScore takes atoms as nodes and models the protein–ligand docking interface as multiple bipartite graphs within different shell ranges. Benefiting from the Graphormer and shell-like graph architecture, GSScore can effectively capture the subtle differences between energetically favorable near-native conformations and unfavorable non-native poses without extra information. GSScore was extensively evaluated on diverse test sets, including a subset of PDBBind version 2019, CASF2016 and DUD-E, and obtained significant improvements over existing methods in terms of RMSE, R (Pearson correlation coefficient), Spearman correlation coefficient, Docking power, Hit rate and EF (enrichment factor).
INTRODUCTION
Protein–ligand interactions (PLIs) play a crucial role in various biological activities and processes [1, 2]. In fact, over 70% of FDA-approved drugs from 2015 to 2020 rely on PLIs for their clinical efficacy [3, 4]. Understanding the complex structures formed by proteins and ligands is essential for unraveling cellular pathways. However, experimental methods for determining these complex structures are often technically challenging and expensive [5], leading to a growing interest in computational approaches, such as protein–ligand docking, to unravel patterns in PLIs. Protein–ligand docking aims to predict the complex structure of a protein–ligand interaction from the individual structures of the protein and ligand [6, 7]. The process typically involves two main steps: sampling and ranking [8]. In the sampling step, docking algorithms explore the potential binding modes of the ligand with respect to the protein, generating a set of poses. These poses represent different configurations of the ligand within the protein's binding site. Subsequently, a scoring function is applied to evaluate and rank the sampled poses. A reliable scoring function assigns higher scores to poses that closely resemble the native conformation, enabling accurate prediction of the near-native complex structure.
With the increasing number of PLIs being discovered in biological functions, protein–ligand docking has made significant progress in various aspects. This includes the construction of docking benchmarks [9–11], the design of sampling algorithms [12, 13] and the development of docking programs such as Autodock Vina [14], DOCK [15], GOLD [16], MOE [17], Glide (in Schrodinger) [18], UCSF Dock [19], Surflex (in Sybyl) [20] and MDock [21]. However, accurately identifying near-native conformations from a large set of sampled decoys using an appropriate scoring function remains a long-standing challenge. Previous studies have classified scoring functions into four categories: force field-based, empirical, knowledge-based and machine learning-based [22, 23]. Force field-based scoring functions calculate the weighted sum of several physics-related energy terms, including intermolecular electrostatic energy, van der Waals energy and desolvation potential [19]. Empirical scoring functions each stand on an energy factorization model and estimate the binding affinity by a weighted sum of different terms in the model [22]. Knowledge-based scoring functions convert distance-dependent pairwise contact distributions into potentials using an inverse Boltzmann relationship [21]. Some hybrid scoring functions combine force field-based energy terms with knowledge-based energy terms, such as KECSA [24] and SMoG2016 [25]. With the rapid advancement of machine learning algorithms and their applications in bioinformatics, many machine learning-based scoring functions have been developed [26–33]. These machine learning-based scoring functions have shown remarkable improvements in the accuracy of binding free energy prediction on experimentally determined structures. Unlike traditional scoring functions that consider simple relationships between interface conformations and interacting energies, machine learning-based scoring functions can uncover complex nonlinear combinations of features for protein–ligand interfaces. This makes them more powerful than traditional scoring functions. However, many machine learning-based scoring functions still struggle to capture the details or high-order interaction patterns of the spatial arrangement at interfaces. To improve their accuracy, researchers have proposed various methods, such as incorporating additional features, using more sophisticated machine learning algorithms and exploring different types of data. Furthermore, combining different machine learning-based and traditional scoring functions has been suggested to further enhance the accuracy of binding free energy prediction [34].
Graph neural networks (GNNs) have been effectively applied to graph data [35, 36], for example in modeling molecular structures [37]. A GNN learns from the structural features of a graph and the relationships between its nodes: it iteratively aggregates information from each node and its neighboring nodes to update the node representations. Through this message-passing-like mechanism, in which every node updates its representation at each iteration step using information from its neighbors, the model captures both the complex relationships between nodes and the global contextual information in the graph. A typical model is the Graph Transformer [38], a graph neural network that incorporates the attention mechanism by combining the ideas of GNNs and the Transformer. Rather than relying on fixed neighbor sampling or graph convolution operations, the Graph Transformer uses self-attention to learn dependencies and contextual information between nodes, computing the relevance between each node and the others and updating node representations accordingly. Graphormer [39] is an enhanced graph representation learning method built upon the Graph Transformer. It injects additional internal structural information of the graph into the self-attention module, specifically centrality measures of nodes, spatial properties of node pairs and edge feature vectors. By integrating these additional features, Graphormer captures more diverse and informative aspects of the graph's structure, which allows the model to better understand the relationships between nodes and their surrounding topology and leads to more effective graph representation learning. In addition, the shell-like, or radial, protein environment representation has been shown to be a powerful feature for protein function prediction [40]. This approach partitions the protein's environment into multiple concentric shells with the ligand atom at the center; each shell models the ligand atom's interactions with its surroundings at a different Euclidean distance, so different concentric shells represent different molecular environments at varying distances. Compared with traditional graph-based modeling, this representation distinguishes different Euclidean distance ranges more effectively, resulting in a more robust model with enhanced discriminative power.
In this paper, we propose a model called GSScore, which combines Graphormer with a shell-like architecture. The aim of GSScore is to identify near-native conformations of protein–ligand complexes from a large number of docking poses. GSScore partitions the protein environment into multiple shell regions centered around the ligand. Each shell region forms a separate graph in which the protein atoms within the shell interact with the ligand atoms. The resulting graphs are processed by separate Graphormer modules to learn graph representations and extract features. These features are fed into a multi-layer perceptron (MLP) to predict the root mean square deviation (RMSD) of the current pose. We evaluate the performance of GSScore on multiple datasets, including the PDBBind refined set [41], CASF2016 [11] and DUD-E [42]. The model's overall performance is assessed using metrics such as RMSE, the Pearson (R) and Spearman correlation coefficients, Docking power, Hit rate and the enrichment factor (EF).
MATERIALS AND METHODS
Datasets
The main dataset utilized in this study was the PDBbind database version 2019 [41] (PDBBind2019), comprising 17 679 protein–ligand complexes along with their corresponding RMSD values. It is worth noting that the RMSD values were calculated with the spyrmsd program [43].
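As a concrete illustration, the snippet below computes a symmetry-corrected heavy-atom RMSD between a docked pose and the native ligand. It uses RDKit's CalcRMS as a stand-in for the spyrmsd program used in this work, and the file names are placeholders.

```python
# Minimal sketch: symmetry-corrected heavy-atom RMSD between a docked pose and
# the native ligand. RDKit's CalcRMS is used here as a stand-in for spyrmsd;
# file names are placeholders.
from rdkit import Chem
from rdkit.Chem import rdMolAlign

ref = Chem.MolFromMolFile("ligand_crystal.sdf", removeHs=True)   # native pose
probe = Chem.MolFromMolFile("ligand_pose.sdf", removeHs=True)    # docked pose

# CalcRMS accounts for symmetry-equivalent atom mappings and, unlike GetBestRMS,
# does not superimpose the pose onto the reference, matching the docking
# convention of computing RMSD in the receptor frame.
rmsd = rdMolAlign.CalcRMS(probe, ref)
print(f"RMSD = {rmsd:.3f} Å")
```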
To ensure consistency with the training/test split strategy employed by DeepBSP [44], we used the protein–ligand complexes from CASF2016 as a test dataset, consisting of a total of 285 complexes. Since these 285 complexes are already present in the PDBbind2019 data, and the poses in PDBbind2019 and CASF2016 are generated with different docking programs, we separated the complexes in PDBbind2019 that share names with those in CASF2016 to form the Primary Test Set; the remaining complexes were assigned to the training set. In other words, complexes overlapping between training and testing were removed from the training set. Additionally, CASF2016 was used in its entirety as the Secondary Test Set. As a result, a total of 177 300 poses from 11 925 complexes were employed for training. Furthermore, the training set (PDBBind2019) was evenly divided into four parts, and each part was trained on separately, resulting in four models.
To ensure that each complex has at least one native ligand in every partition of the training set, we replicated the native ligand of each complex four times. In PDBbind2019, complexes that were flagged as redundant, lacked ligand fragments, were missing protein pockets, involved covalent bonds between ligand and protein, or contained ligands with special elements or isomers were excluded from both the training and testing data. We also filtered out complexes with a resolution greater than 3.0 Å, as well as NMR structures, to ensure the reliability of the initial structures. Additionally, complexes with too few heavy atoms or excessive rotatable bonds were excluded, and only complexes whose ligands have 5–60 atoms were retained, as these complexes are more suitable for the calculations performed by the docking software.
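The ligand-level filters can be expressed with RDKit as in the sketch below; the rotatable-bond cutoff is an illustrative placeholder rather than a value reported in this work, and the file name is hypothetical.

```python
# Minimal sketch of the ligand-level filters described above. MAX_ROT_BONDS is an
# assumed cutoff for "excessive rotatable bonds", not a value from the paper.
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

MAX_ROT_BONDS = 20  # assumption: illustrative threshold only

def keep_ligand(mol: Chem.Mol) -> bool:
    """Return True if the ligand passes the size and flexibility filters."""
    if mol is None:
        return False
    if not (5 <= mol.GetNumHeavyAtoms() <= 60):        # 5-60 ligand atoms retained
        return False
    return rdMolDescriptors.CalcNumRotatableBonds(mol) <= MAX_ROT_BONDS

ligand = Chem.MolFromMol2File("1abc_ligand.mol2")      # hypothetical PDBbind file
print(keep_ligand(ligand))
```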
Although this study primarily focused on the small molecules labeled as ligands in the PDBbind database, some complexes contain additional small molecules surrounding these ligands. Such small molecules can influence the binding conformation of the ligand and should be considered part of the receptor. However, the receptor files provided in the PDBbind database do not include them, so we needed the HETATM records present in the protein PDB file. Since the PDBbind data do not provide these atoms, we instead used the RCSB PDB files [45]. Consequently, rather than directly using the receptor and ligand files provided by the PDBbind database, we worked with the complex PDB files downloaded directly from the PDB database, whose original files provide the accurate complex assembly states. In cases where there was no corresponding SDF file in the PDB database or the files were clearly erroneous (e.g. the number of atoms or the coordinates was inconsistent with those in the PDB file), we used the ligand Mol2 files provided by the PDBbind database as a substitute.
Once the complexes in PDBbind2019 were prepared, AutoDock Vina [14] was used to generate docking poses for each ligand in the complexes. In this process, the receptor PDB files and ligand Mol2 files were converted to PDBQT files using the MGLTools scripts of AutoDockTools [46]. Before docking, all water molecules in the complexes were removed.
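A minimal sketch of this re-docking step is given below, invoking the AutoDock Vina executable on prepared PDBQT files; the box center, box size, number of modes and file paths are placeholders rather than the settings used in this work.

```python
# Minimal sketch of re-docking with the AutoDock Vina executable on prepared
# PDBQT files. Box parameters, pose count and paths are assumed placeholders.
import subprocess

cmd = [
    "vina",
    "--receptor", "receptor.pdbqt",
    "--ligand", "ligand.pdbqt",
    "--center_x", "10.0", "--center_y", "12.5", "--center_z", "-3.2",
    "--size_x", "22", "--size_y", "22", "--size_z", "22",
    "--num_modes", "20",          # number of poses to keep per ligand (assumed)
    "--exhaustiveness", "8",
    "--out", "poses.pdbqt",
]
subprocess.run(cmd, check=True)   # raises if Vina exits with an error
```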
Additionally, we utilized a third test dataset, DUD-E [42]. This dataset comprises 102 protein–ligand complexes with their crystal structures, along with additional ligand molecules that do not bind to the respective protein. DUD-E is specifically designed for evaluating molecular docking scoring functions. For DUD-E, we separated the protein and ligand molecules from the crystal structures and performed re-docking using AutoDock Vina [14], generating 100 poses for each complex. These poses were then assessed with the spyrmsd program to calculate their RMSD values, which were used for comparative testing.
Furthermore, we utilized the CASF2016 screening decoy set as the fourth test dataset for the screening power evaluation. According to the definition of screening power, these test data are based on cross-docking, in which one protein target corresponds to multiple different ligands. We therefore use the name CASF2016-CrossDocking to distinguish this set from the secondary test dataset. More details are given in Part 3 of the Supplementary Materials.
RMSD distribution
To analyze and compare the experimental results in more detail, we separately calculated the RMSD distributions of the training set and the primary test, CASF2016 and DUD-E datasets, as shown in Supplementary Figure S1a–d, respectively. More details are given in Part 1 of the Supplementary Materials.
To quantitatively compare the differences between the primary test, CASF2016 and DUD-E datasets and the training data, we used the Jensen–Shannon divergence [47]. The Jensen–Shannon divergence ranges from 0 to 1, where 0 indicates identical distributions and 1 indicates completely different distributions. We calculated the divergence over the range of RMSD less than 2.0 Å (near-native) and RMSD less than 10.0 Å, respectively, and the results are shown in Table 1. Within the range of RMSD less than 2.0 Å, the primary test and DUD-E datasets are considerably closer to the training data than CASF2016, with the primary test showing the highest agreement with the training data in this range. Within the range of RMSD less than 10.0 Å, the primary test still exhibits a higher level of proximity than CASF2016 and DUD-E. The analysis of the Jensen–Shannon divergence indicates that CASF2016 and DUD-E better reflect the model's generalization capability.
Table 1. Jensen–Shannon divergence between the RMSD distributions of each test set and the training set

| Dataset | RMSD < 2 Å | RMSD < 10 Å |
|---|---|---|
| Primary test | 0.031 | 0.101 |
| CASF2016 | 0.228 | 0.136 |
| DUD-E | 0.102 | 0.132 |
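The divergence values in Table 1 can be reproduced from RMSD histograms as sketched below; the bin edges and the random input arrays are illustrative only.

```python
# Minimal sketch of the base-2 Jensen-Shannon divergence between two RMSD
# distributions (0 = identical, 1 = completely different). Bin edges and the
# random example data are illustrative placeholders.
import numpy as np
from scipy.stats import entropy

def js_divergence(rmsd_a, rmsd_b, upper=10.0, n_bins=50):
    bins = np.linspace(0.0, upper, n_bins + 1)
    p, _ = np.histogram(rmsd_a, bins=bins)
    q, _ = np.histogram(rmsd_b, bins=bins)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)
    # JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), in bits
    return 0.5 * entropy(p, m, base=2) + 0.5 * entropy(q, m, base=2)

train_rmsd = np.random.uniform(0.0, 10.0, 10000)    # placeholder data
test_rmsd = np.random.uniform(0.0, 10.0, 1000)
print(js_divergence(train_rmsd, test_rmsd, upper=2.0))    # "RMSD < 2 Å" column
print(js_divergence(train_rmsd, test_rmsd, upper=10.0))   # "RMSD < 10 Å" column
```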
Node and edge features
The node features used in this paper are presented in Table 2. The molecular features are adapted from LigPose [48], with the slight adjustment of removing the covalent feature. All features in Table 2 are computed using RDKit. In the graph construction process, the atoms of the protein and ligand are treated as nodes. For protein atoms there are seven feature types, while for ligand atoms there are eight, and each categorical feature is encoded with a one-hot scheme. For example, protein atoms are divided into seven categories by atom type, plus an additional 'other' category, so a one-hot vector of length eight represents the atom-type category of a protein atom.
Table 2. Protein node, ligand node and edge features used in GSScore

| Protein node features | Ligand node features | Edge features |
|---|---|---|
| atom type (7) | atom type (7) | bond type (7) |
| atom degree (10) | atom degree (10) | distance (1) |
| implicit valence (6) | implicit valence (6) | |
| neighboring hydrogen (6) | neighboring hydrogen (6) | |
| hybridization (9) | hybridization (9) | |
| amino acid type (22) | formal charge (22) | |
| atom name (37) | ring size (12) | |
| | aromatic (2) | |
The categorization of protein atoms and ligand atoms is different. Protein atoms are classified into H, C, N, O, P, S, metal (Ca, Fe, K, Mg, Mn, Na, Zn) and other categories. Ligand atoms are classified into H, C, N, O, P, S, halogen (F, Cl, Br, I) and other categories. In the protein features, the Atom type represents the element category of the atom, while the atom name represents the string representation of the atom in columns 13–16 of the PDB file.
The features Atom degree, Implicit valence, Neighboring hydrogen and Hybridization for ligand atoms are similar to those of protein atoms. The feature ’formal charge(22)’ for ligand atoms represents the formal charge of the atom. The feature ’ring size(12)’ indicates the size of the ring that the atom belongs to. If the atom is part of a fused benzene ring, the largest ring is selected. Currently, support is provided for rings with a maximum of 10 atoms, and anything beyond that is categorized as ’other’. The feature ’aromatic(2)’ represents whether the atom is aromatic or not.
In addition, we have defined seven categories for the interactions between atoms (edges between nodes): Single, Double, Triple, Aromatic, Non-covalent, Other and Unknown. These categories represent different types of interactions or bonds between atoms in the molecular structure. The Single, Double and Triple categories indicate single, double and triple covalent bonds, respectively. The Aromatic category represents aromatic interactions, often found in conjugated systems. The Non-covalent category encompasses non-covalent interactions such as hydrogen bonds, van der Waals interactions and electrostatic interactions. The Other category is used for any other types of interactions not covered by the previous categories. The Unknown category is assigned when the nature of the interaction is not known or cannot be determined. These interaction categories provide important information about the connectivity and bonding patterns between atoms in the molecular system.
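As an illustration, the sketch below computes RDKit-based one-hot features for a ligand atom in the spirit of Table 2; the category vocabularies and vector lengths are simplified placeholders and do not reproduce the exact encodings used in this work.

```python
# Minimal sketch of ligand-atom featurization with RDKit. Vocabularies and
# vector lengths are simplified assumptions, not the paper's exact encodings.
from rdkit import Chem

LIG_ELEMENTS = ["H", "C", "N", "O", "P", "S"]
HALOGENS = {"F", "Cl", "Br", "I"}

def one_hot(value, choices):
    vec = [0] * (len(choices) + 1)            # last slot = "other"
    vec[choices.index(value) if value in choices else -1] = 1
    return vec

def ligand_atom_features(atom: Chem.Atom):
    sym = atom.GetSymbol()
    elem = sym if sym in LIG_ELEMENTS else ("HAL" if sym in HALOGENS else "other")
    feats = []
    feats += one_hot(elem, LIG_ELEMENTS + ["HAL"])             # atom type
    feats += one_hot(atom.GetDegree(), list(range(9)))         # atom degree
    feats += one_hot(atom.GetImplicitValence(), list(range(5)))
    feats += one_hot(atom.GetTotalNumHs(), list(range(5)))     # neighboring hydrogens
    feats += one_hot(str(atom.GetHybridization()), ["SP", "SP2", "SP3"])
    feats += one_hot(atom.GetFormalCharge(), [-2, -1, 0, 1, 2])
    # largest ring the atom belongs to (fused systems take the largest ring)
    ring = max((k for k in range(3, 11) if atom.IsInRingSize(k)), default=0)
    feats += one_hot(ring, list(range(3, 11)))                 # ring size
    feats += one_hot(int(atom.GetIsAromatic()), [0, 1])        # aromatic flag
    return feats

mol = Chem.MolFromSmiles("c1ccccc1C(=O)O")
print(len(ligand_atom_features(mol.GetAtomWithIdx(0))))
```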
Shell-like graph
The shell-like partitioning of the molecular environment in GSScore is inspired by the work of Torng et al. [40] and DeepRMSD [34]. However, unlike DeepRMSD, we do not treat each atom as the center of a spherical shell. Instead, we use the entire ligand molecule as the center, so a shell in GSScore is not necessarily spherical and can have an irregular shape. As shown in Figure 1, the entire ligand molecule in the protein pocket serves as the center of the shells, and multiple layers of shells are defined, each bounded by a pair of distance cut-offs measured from the ligand atoms.

The network architecture of GSScore. For a given protein–ligand conformation, GSScore divides it into multiple layers of shells with different distances, using the ligand molecule as the central reference point. Each shell includes protein atoms that are in proximity to the ligand, and a separate graph is constructed for each shell. These graphs are then individually inputted into a Graphormer to extract their respective feature vectors. The resulting feature vectors from all the graphs are concatenated into a single long vector, which is then passed through an MLP layer to predict the RMSD. Graphormer is an enhancement of the traditional Transformer model, incorporating spatial topological features and edge features based on the shortest path. This integration of graph topology and edge features helps to improve the predictive power of the model.
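The shell assignment itself can be sketched as follows: every protein atom is placed in a shell according to its smallest Euclidean distance to any ligand atom, so each shell wraps irregularly around the whole ligand. The distance cut-offs below are assumed placeholders, not the values used in GSScore.

```python
# Minimal sketch of shell-like partitioning around the whole ligand. SHELL_EDGES
# are assumed placeholder cut-offs (in Å), not the values used in GSScore.
import numpy as np

SHELL_EDGES = [0.0, 3.0, 5.0, 7.0, 9.0, 11.0]

def assign_shells(protein_xyz, ligand_xyz):
    """Return, per shell, the indices of protein atoms whose nearest-ligand-atom
    distance falls inside that shell's range."""
    diff = protein_xyz[:, None, :] - ligand_xyz[None, :, :]
    d_min = np.linalg.norm(diff, axis=-1).min(axis=1)
    return [np.where((d_min >= lo) & (d_min < hi))[0]
            for lo, hi in zip(SHELL_EDGES[:-1], SHELL_EDGES[1:])]

protein_xyz = np.random.rand(500, 3) * 30.0   # placeholder coordinates
ligand_xyz = np.random.rand(30, 3) * 10.0 + 10.0
for i, idx in enumerate(assign_shells(protein_xyz, ligand_xyz), start=1):
    print(f"shell {i}: {idx.size} protein atoms")  # these atoms + the ligand form one graph per shell
```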
When processing the
Network architecture of GSScore
GSScore inputs graph
where
Here, we only used two additional graph topological features from the original Graphormer paper, namely Spatial encoding and Edge encoding, but we omitted the Centrality encoding feature [39]. In our specific experiments, we found that the Centrality encoding feature did not improve the performance of the model. A more detailed analysis of this will be presented in the ablation experiments.
For Spatial encoding, we use the shortest path between nodes
For Edge encoding, we encode each edge along the shortest path between nodes
where
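Concretely, following the original Graphormer formulation [39], the Spatial and Edge encodings enter the self-attention as additive bias terms:

\[
A_{ij} \;=\; \frac{(h_i W_Q)(h_j W_K)^{\top}}{\sqrt{d}} \;+\; b_{\phi(v_i, v_j)} \;+\; c_{ij},
\qquad
c_{ij} \;=\; \frac{1}{N}\sum_{n=1}^{N} x_{e_n}\,(w_n^{E})^{\top},
\]

where $\phi(v_i, v_j)$ is the shortest-path distance between nodes $v_i$ and $v_j$, $b_{\phi(v_i, v_j)}$ is a learnable scalar indexed by that distance (the Spatial encoding), $e_1, \dots, e_N$ are the edges along the shortest path, $x_{e_n}$ is the feature vector of the $n$-th such edge and $w_n^{E}$ is a learnable weight vector (the Edge encoding); the notation follows [39].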
In addition, in some conformations the docking algorithm may place the ligand far away from the protein, for example with a shortest ligand–protein distance of more than 20 Å. We therefore define the predicted RMSD of a conformation to be infinity whenever the multi-shell graph constructed from the protein–ligand complex contains no protein atom nodes.
Training GSScore
During training, the AdamW optimizer with an initial learning rate of 0.001 was used to minimize the mean square error (MSE) loss between the predicted RMSD and the true RMSD of each pose, as in Equation 3 below. During the training process, we use a variable learning rate, in which the learning rate is halved at a fixed epoch interval.
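In its standard form, the MSE loss over a batch of $N$ poses can be written as

\[
\mathcal{L}_{\mathrm{MSE}} \;=\; \frac{1}{N}\sum_{i=1}^{N}\left(\widehat{\mathrm{RMSD}}_i - \mathrm{RMSD}_i\right)^{2},
\]

where $\widehat{\mathrm{RMSD}}_i$ is the value predicted by GSScore for the $i$-th pose and $\mathrm{RMSD}_i$ is its true RMSD; this is the standard expression of the MSE loss described above.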
Evaluation metrics
Six basic evaluation metrics were used: RMSE, the Pearson correlation coefficient (R), the Spearman correlation coefficient, Docking power, Hit rate and the enrichment factor EF(re-docking).
Docking power refers to the ability of a scoring function to identify the native ligand binding pose among computer-generated decoys [11]. Ideally, the native binding pose should be identified as the top-ranked one. Docking power is usually reported as the percentage of complexes for which a near-native pose appears among the top-ranked poses, and can be described by the following equation:
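One common way to write this, consistent with the description above and the 2 Å near-native criterion used in this work, is

\[
\mathrm{Docking\ power} \;=\; \frac{1}{M}\sum_{j=1}^{M}\mathbb{1}\!\left[\min_{i \le k}\ \mathrm{RMSD}_{j,i} < 2\,\text{Å}\right],
\]

where $M$ is the number of complexes in the test set, $k$ is the number of top-ranked poses considered and $\mathrm{RMSD}_{j,i}$ is the RMSD of the $i$-th ranked pose of complex $j$.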
where
Hit rate measures the fraction of near-native poses among the top-ranked poses relative to all near-native decoys in the entire pose set of a given protein–ligand complex. Since the test sets contain more than one protein–ligand complex, we use the mean Hit rate to measure performance, which can be described as follows:
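Written out under the same notation (a reconstruction consistent with the description above), the mean Hit rate is

\[
\overline{\mathrm{Hit\ rate}} \;=\; \frac{1}{M}\sum_{j=1}^{M}\frac{N_{j}^{\mathrm{near}}(\text{top-}k)}{N_{j}^{\mathrm{near}}(\text{all})},
\]

where $N_{j}^{\mathrm{near}}(\text{top-}k)$ is the number of near-native poses of complex $j$ among its top-$k$ ranked poses and $N_{j}^{\mathrm{near}}(\text{all})$ is the number of near-native poses among all of its poses.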
where
Enrichment factor (EF) is regarded as the second quantitative indicator of screening power for docking poses [11]; we use 'EF' as shorthand for the enrichment factor. Screening power refers to the ability of a scoring function to identify the true binders to a given target protein among a pool of random molecules. The EF can be described as follows:
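A common formulation, consistent with this description, is

\[
\mathrm{EF}_{\alpha} \;=\; \frac{N^{\mathrm{hit}}(\text{top }\alpha)\,/\,N(\text{top }\alpha)}{N^{\mathrm{hit}}(\text{all})\,/\,N(\text{all})},
\]

where $\alpha$ is the fraction of top-ranked molecules (or poses) considered, a 'hit' is a true binder in the screening setting or a near-native pose in the re-docking setting, $N^{\mathrm{hit}}(\cdot)$ counts hits and $N(\cdot)$ counts all entries in the corresponding set.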
where
In addition, we also consider Screening power as an evaluation indicator. This indicator is divided into two parts, the Success rate and the EF; its specific definition can be found in the literature. Different from the previous six basic evaluation metrics, Screening power is aimed at cross-docking. In the docking process described above, a protein target corresponds to only one ligand, and the docking software searches for multiple poses of that ligand. In the screening power setting, a protein target corresponds to multiple different ligands, which are derived from the ligand molecules of different protein complexes; screening power is therefore used for cross-docking evaluation. It is worth noting that the EF differs between re-docking and cross-docking, and we use 'EF(re-docking)' and 'EF(cross-docking)' to distinguish them. The results of EF(cross-docking) are given in Part 3 of the Supplementary Materials.
EXPERIMENTS AND RESULTS
Scoring functions to be compared
We extensively evaluated GSScore on the primary test set, the CASF2016 test set and the DUD-E test set, and compared it with three RMSD prediction functions for protein–ligand docking poses: DeepBSP [44], DeepRMSD [34] and ViTScore [50]. We chose these functions because all of them were trained on the PDBBind2019 general set [34, 44, 50], allowing a fair comparison. DeepBSP was the first RMSD prediction function based on a 3D-CNN; its network architecture is similar to KDEEP [28], which is widely used for binding affinity prediction of protein–ligand docking poses. We also compared our previous RMSD prediction function ViTScore [50], which is based on the 3D Vision Transformer. In addition, we compared our method with DeepRMSD [34], the first method to apply the shell-like modeling approach to RMSD prediction. However, we argue that DeepRMSD+Vina should not be used here for two main reasons. First, during its computation DeepRMSD+Vina fine-tunes the spatial structure of poses, which differs from the conventional definition of a scoring function: structural changes of poses should be part of the search process, and an independent scoring function should not alter the spatial structure of a pose. Second, DeepRMSD+Vina progressively optimizes the spatial structure of each pose during its iterations, so that some non-native poses are transformed into near-native poses; this inflates evaluation metrics such as Docking power, Hit rate and EF because the numerator increases, and such an evaluation approach may not be suitable for comparing traditional scoring functions. Therefore, for fairness, we compared our method with DeepRMSD alone.
Evaluation of GSScore on the primary test set
As shown in Table 3, GSScore performs the best among the four RMSD prediction methods in terms of RMSE. Specifically, GSScore achieves an RMSE as low as 1.531, which is 4.25% lower than our previous work ViTScore and 4.97% lower than DeepBSP. However, DeepRMSD gives an RMSE of 6.971, which is far higher than expected. In the experimental process, although we excluded some poses that produced errors when run through DeepRMSD, some poses still yielded exceptionally large predicted RMSD values despite running without errors. We therefore additionally excluded poses with predicted RMSD values greater than 15 Å to avoid misleading RMSE calculations. Note that there is no RMSE entry for Vina, because Vina outputs a binding affinity for each conformation rather than an RMSD value, unlike the other methods.
Table 3. Comparison of GSScore with DeepBSP, ViTScore, Vina and DeepRMSD for the primary test set

| Metric | DeepBSP | DeepRMSD | ViTScore | Vina | GSScore |
|---|---|---|---|---|---|
| RMSE | 1.611 | 6.971 | 1.599 | N/A | 1.531 |
| R | 0.826 | −0.011 | 0.828 | −0.068 | 0.835 |
| Spearman | 0.821 | 0.273 | 0.843 | 0.042 | 0.881 |
| Docking power | 0.902 | 0.697 | 0.996 | 0.674 | 0.895 |
| Hit rate | 0.128 | 0.106 | 0.161 | 0.086 | 0.139 |
| EF(re-docking) | 12.412 | 10.282 | 15.675 | 8.160 | 13.550 |
In Table 3, R represents the Pearson correlation coefficient between the predicted RMSD values and the true RMSD values over all poses; for Vina, R instead denotes the Pearson correlation between the predicted conformational affinity and the conformational RMSD value.
In Table 3, Spearman represents the Spearman correlation coefficient between the predicted RMSD values and the true RMSD values over all poses; for Vina, it represents the Spearman correlation between the predicted conformational affinity and the conformational RMSD value. It can be observed that GSScore achieves a Spearman value of 0.881, which is 0.038 higher than our previous work ViTScore and 0.06 higher than DeepBSP.
For the Docking power, Hit rate and EF(re-docking) shown in Table 3, GSScore does not perform optimally on the primary test set. This could be attributed to the model sacrificing some performance on test data with a similar distribution (or inductive bias) in order to improve generalization. This contrasts with our previous work ViTScore, which shows strong performance on test data with a similar distribution and therefore achieves excellent results on the primary test set. It is important to note, however, that GSScore does not perform poorly either: it achieves a Docking power of 0.895, close to DeepBSP's 0.902; a Hit rate of 0.139, surpassing DeepBSP's 0.128; and an EF(re-docking) of 13.550, which also exceeds DeepBSP's 12.412.
Evaluation of GSScore on CASF2016 test set
From Table 4, GSScore still achieves the lowest RMSE and the highest R, Spearman, Hit rate and EF(re-docking) on CASF2016, although its Docking power is slightly lower than those of ViTScore, Vina and DeepBSP.
Table 4. Comparison of GSScore with DeepBSP, ViTScore, Vina and DeepRMSD for the CASF2016 test set

| Metric | DeepBSP | DeepRMSD | ViTScore | Vina | GSScore |
|---|---|---|---|---|---|
| RMSE | 1.601 | 2.239 | 1.764 | N/A | 1.586 |
| R | 0.821 | 0.637 | 0.789 | 0.604 | 0.831 |
| Spearman | 0.808 | 0.657 | 0.843 | 0.528 | 0.846 |
| Docking power | 0.898 | 0.580 | 0.905 | 0.902 | 0.862 |
| Hit rate | 0.046 | 0.029 | 0.041 | 0.043 | 0.047 |
| EF(re-docking) | 3.665 | 2.284 | 3.277 | 3.635 | 3.789 |
Furthermore, our previous work ViTScore does not perform as well as DeepBSP in terms of RMSE and R on CASF2016, whereas GSScore surpasses both methods on these metrics.
Evaluation of GSScore on DUD-E test set
The results on the DUD-E test dataset are shown in Table 5. GSScore outperforms all other methods in all six evaluation metrics. In terms of RMSE, GSScore is the only method with an RMSE below 2 Å, reaching as low as 1.668. It also has the highest R (0.817), Spearman (0.835), Docking power (0.850), Hit rate (0.161) and EF(re-docking) (16.316) among the compared methods.
Table 5. Comparison of GSScore with DeepBSP, ViTScore, Vina and DeepRMSD for the DUD-E test set

| Metric | DeepBSP | DeepRMSD | ViTScore | Vina | GSScore |
|---|---|---|---|---|---|
| RMSE | 2.258 | 14.151 | 2.010 | N/A | 1.668 |
| R | 0.596 | −0.105 | 0.703 | 0.166 | 0.817 |
| Spearman | 0.559 | 0.345 | 0.718 | 0.267 | 0.835 |
| Docking power | 0.620 | 0.681 | 0.720 | 0.636 | 0.850 |
| Hit rate | 0.101 | 0.113 | 0.108 | 0.091 | 0.161 |
| EF(re-docking) | 10.147 | 11.455 | 10.926 | 9.216 | 16.316 |
From the analysis of the RMSD distributions in the Supplementary Materials, DUD-E has the lowest proportion of native poses among all test datasets. Correctly predicting the RMSD of native poses is therefore a challenging task for all methods and should not be overlooked. The comprehensive superiority of GSScore over the other methods on this dataset reflects its generalization ability, and demonstrates on a broader range of data that the shell-like design approach helps improve the generalization performance of the model.
It is worth noting that DeepRMSD has an abnormally high RMSE and a negative R on the DUD-E test set.
DISCUSSION
Interpretability analysis
Here, we use the DeepLIFT [51] program to perform an in-depth analysis of the latent space of the model. Figure 2 shows the results of the DeepLIFT analysis, in which the horizontal axis indexes the latent weights and the vertical axis corresponds to the different test datasets.

Visualization of DeepLIFT weights. The greater the weight, the more it tends to be red, and the reverse tends to be blue. The horizontal coordinate indicates the indexes of weights, and the vertical coordinate indicates different test data.
We have intentionally divided the
To visually demonstrate the embedding effect of GSScore’s latent space, we used t-SNE [52] to visualize over 10 000 data points. These data points are derived from the primary test, CASF2016 and DUD-E datasets. We combined the three datasets at the compound level and randomized the order. Then, we extracted multiple compounds until the total number of poses exceeded 10 000.
In Figure 3A, we distinguished between near-native poses and non-native poses, where near-native poses were defined as those with an RMSD less than 2 Å. In Figure 3B, we divided the poses into nine 1 Å intervals, with native poses labeled as RMSD equal to 0 and poses with an RMSD greater than or equal to 9 Å grouped into a single category.

Visualization of latent space with t-SNE. (A) represents the case where only near-native poses and non-native poses are distinguished. Near-native poses are defined as poses with an RMSD less than 2A. (B) represents the case where poses are divided into 10 types. Each interval has a range of 1A, where poses with an RMSD of 0 represent native poses, and poses with an RMSD of 9 or above are grouped together in one category.
From the visualization in Figure 3A, it can be observed that the majority of near-native poses are concentrated in the upper right corner, while non-native poses are distributed in other regions apart from it. To further visualize the model’s performance, we created the visualization in Figure 3B. It can be seen that poses from different intervals transition smoothly from the upper right corner to the lower left corner, showing a gradual change in color depth. This to some extent demonstrates that GSScore is not only capable of distinguishing near-native from non-native poses but also able to identify RMSD values of poses in multiple fine-grained intervals, which aligns with the design of GSScore using MSE loss.
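A minimal sketch of this visualization step is shown below; the latent embeddings and RMSD labels are random placeholders standing in for GSScore's concatenated shell features and the true pose RMSDs.

```python
# Minimal sketch of the t-SNE latent-space visualization. `latent` and `rmsd`
# are random placeholders for the model's pose embeddings and true RMSD labels.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

latent = np.random.rand(10000, 256)               # pose embeddings (placeholder)
rmsd = np.random.uniform(0.0, 12.0, 10000)        # true RMSD per pose (placeholder)

bins = np.clip(np.floor(rmsd).astype(int), 0, 9)  # 0 = native-like, 9 = ">= 9 Å"
xy = TSNE(n_components=2, init="pca", random_state=0).fit_transform(latent)

plt.scatter(xy[:, 0], xy[:, 1], c=bins, cmap="viridis", s=2)
plt.colorbar(label="RMSD bin (Å)")
plt.savefig("tsne_latent.png", dpi=300)
```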
Figure 4 is a visual example of GSScore's RMSD predictions for the protein–ligand complex with PDB ID 2zcr. Panels (A)–(D) correspond to four different poses, with the true and predicted RMSD values given below each pose. In each image the cyan pose is the native pose, the gold ribbon in the background is the target protein, and the four poses are shown in blue, orange, yellow and fuchsia, respectively. As can be seen from the figure, GSScore gives accurate predictions both for the near-native pose in subfigure (A) and for the non-native poses in the other subfigures.

Visualization of RMSD predictions. The PDB ID: 2zcr. There are four different poses with different RMSD values. The native pose is represented by cyan in each subfigure, while other poses are represented by four different colors. The true RMSD and predicted RMSD values are under each subfigure.
Ablation study
We conducted an ablation analysis of GSScore, comparing the effects of the different features and network structures used in the model. Owing to the relatively long training time, we limited the ablation experiments to the primary test set and tested only the model trained on a single partition of the training data. The purpose of the ablation analysis was to gain a deeper understanding of the contributions of individual components and design choices in GSScore. Through these experiments, we aimed to assess the impact of different features and network structures on the model's performance and to identify the key factors that influence its effectiveness.
Table 6 presents the results of the ablation experiments on the Centrality, Edge and Spatial encodings in Graphormer. The first row corresponds to the absence of any Graphormer encoding, while the subsequent rows represent different combinations of encodings. From the table, it is evident that the Centrality encoding is generally not a beneficial feature and can even be detrimental, which is why we did not include it in the model. When the Edge encoding and Spatial encoding are used individually, each contributes some performance improvement, and their combined usage results in a more pronounced enhancement. Surprisingly, when the Centrality, Edge and Spatial encodings are used together, the performance of the model deteriorates. It is worth noting that while RMSE and R measure the accuracy of the predicted RMSD values themselves, the Docking power, Hit rate and EF(re-docking) metrics reflect how well near-native poses are ranked.
Table 6. Ablation of the Centrality, Edge and Spatial encodings on the primary test set (an asterisk marks an encoding that is enabled)

| Centrality Encoding | Edge Encoding | Spatial Encoding | RMSE | R | Docking power | Hit rate | EF(re-docking) |
|---|---|---|---|---|---|---|---|
| | | | 1.757 | 0.802 | 0.758 | 0.116 | 11.207 |
| * | | | 1.915 | 0.754 | 0.719 | 0.111 | 10.684 |
| | * | | 1.654 | 0.809 | 0.810 | 0.130 | 12.703 |
| | | * | 1.663 | 0.809 | 0.764 | 0.120 | 11.698 |
| * | * | | 1.828 | 0.742 | 0.726 | 0.130 | 12.013 |
| * | | * | 1.831 | 0.733 | 0.722 | 0.111 | 10.802 |
| | * | * | 1.529 | 0.835 | 0.889 | 0.139 | 13.580 |
| * | * | * | 1.762 | 0.775 | 0.778 | 0.112 | 10.855 |
Additionally, we analyzed the impact of the number of shells on the model's performance, as shown in Figure 5, in which the horizontal axis gives the number of shells and the vertical axis gives the RMSE on the primary test set.

RMSE results of primary test within different number of shells in GSScore. The horizontal coordinate indicates the number of shells, and the vertical coordinate indicates the RMSE result of the primary test.
From the figure, it can be observed that even with just one shell, which corresponds to the distance range of
CONCLUSION
We have developed GSScore, a deep learning model that combines the Graphormer features and network architecture with a shell-like design approach to predict the RMSD values of protein–ligand conformations. GSScore focuses on the environment of the ligand and divides it into multiple shells based on different distance ranges. Within each shell, the protein atoms are used to construct a graph, resulting in multiple graphs that are input into different Graphormer modules. These modules generate multiple embedding vectors, which are concatenated into one long vector and fed into an MLP layer to predict the conformational RMSD value.
The features and network architecture of Graphormer enable GSScore to effectively capture the topological characteristics of atomic graphs in protein–ligand conformations, and the shell-like design approach further enhances its generalization performance. The results on the primary test, CASF2016 and DUD-E datasets show that GSScore predicts conformational RMSD values more accurately than existing methods in terms of RMSE, R and the Spearman correlation coefficient.
Currently, GSScore utilizes only the raw chemical feature information as node and edge features and does not incorporate external information such as sequence conservation [53] and co-evolution [54]. Therefore, future work will consider integrating such information and differentiating between coarse-grained and fine-grained graph construction approaches to further enhance the prediction performance of protein–ligand conformational RMSD values.
KEY POINTS

- We proposed a novel protein–ligand scoring function that ranked the docking poses based on the predicted RMSD value of each pose relative to the native ligand structure.
- The Graphormer model can effectively recognize the interaction patterns between protein atoms and ligand atoms in different distance ranges around ligand molecules.
- The shell-like architecture enabled multi-subgraph construction between ligand atoms and surrounding protein atoms in different distance ranges, allowing our model to handle interactions over long distances.
ACKNOWLEDGEMENT
We are grateful to the High Performance Computing Center of Central South University for partial support of this work.
FUNDING
National Key Research and Development Program of China (2021YFF1201200); National Natural Science Foundation of China (62332020, 62072473, U22A2041) and the Science Foundation for Distinguished Young Scholars of Hunan Province (NO. 2023JJ10080).
Author Biographies
Linyuan Guo is a doctoral student at Central South University.
Jianxin Wang is a Professor at Central South University and the director of the Hunan Provincial Key Lab on Bioinformatics.
References