Abstract

Accurately predicting protein–ligand binding affinities can substantially facilitate the drug discovery process, but it remains as a difficult problem. To tackle the challenge, many computational methods have been proposed. Among these methods, free energy-based simulations and machine learning-based scoring functions can potentially provide accurate predictions. In this paper, we review these two classes of methods, following a number of thermodynamic cycles for the free energy-based simulations and a feature-representation taxonomy for the machine learning-based scoring functions. More recent deep learning-based predictions, where a hierarchy of feature representations are generally extracted, are also reviewed. Strengths and weaknesses of the two classes of methods, coupled with future directions for improvements, are comparatively discussed.

Introduction

Molecular recognition is of central importance to many biological functions and systems, such as enzyme catalysis and intracellular signal transduction. Small-molecule ligands can modulate key intracellular signaling pathways by inhibiting protein functions or triggering protein conformational changes, which makes the identification and design of bioactive molecules a topic of interest. The specificity and strength of a ligand binding to a target protein determines its therapeutic effect, and therefore the accurate prediction of protein–ligand binding affinity plays a crucial role in drug discovery. Owing to the rapid advances in computer power and methodology, computer-aided drug design (CADD) has thrived in the past decade, aiming to vigorously assist the traditional expensive drug trials. An important step in CADD is structure-based virtual screening (VS), whose goal is to computationally test a large pool of ligands and recognize bioactive ligands for further experimental screening. The underlying technique is molecular docking that comprises two major tasks of pose identification and scoring. Current algorithms are capable of identifying ligand poses that are close to the co-crystallized structures (holo structures). However, accurately scoring or predicting the binding affinity remains a challenging problem.

A conjunction of all-atom molecular dynamics (MD) simulation and efficient free-energy sampling algorithms can provide accurate prediction of binding free energies of ligands to proteins, which results in a broad category of prediction methods. By analyzing the simulations, these methods attempt to determine the free-energy difference upon binding or between two similar ligands via a thermodynamic path. Recognized as the most rigorous option for providing free-energy estimates, free energy perturbation (FEP) or thermodynamic integration (TI) considers the relaxation process (by different windows) in the protein/solvent system and applies statistical mechanisms in the calculations [8, 48, 71, 78]. Unfortunately, these methods often struggle in achieving a speed–accuracy balance [91]. Biased MD simulations (umbrella sampling) provide free energies along a reaction coordinate between two thermodynamic states and efficiently sample the whole coordinate by introducing a bias potential to the system [7, 54, 74]. Biased simulations might be preferred over FEP/TI methods due to the integration errors in those methods. However, new free parameters in biased simulations further increase the model complexity [54]. The microscopic all-atom linear response approximation (LRA) and its variants are an alternative method, which approximates the electrostatic free energy by LRA and the nonelectrostatic part by different strategies [90]. Variants, such as the linear interaction energy (LIE) method, attempt to approximate FEP/TI calculations while efficiently reducing the high computational cost [35]. Another broadly used method is the molecular mechanics Poisson–Boltzmann/surface area (MM-PB/SA). In this method, the binding free energy is contributed by the molecular mechanics (MM) energy, the solvation free energy with Poisson–Boltzmann (PB) or generalized Born (GB) calculations, and the entropy terms [58, 87]. Although much faster than FEP/TI methods, MM-PB/SA is challenged by some researchers with its lack of theoretical foundation [84]. These expensive free-energy calculations are considered accurate, but they are often limited to family-specific applications.

Apart from the rigorous but expensive free energy-based simulations, a wide variety of scoring functions for fast predicting protein–ligand binding affinity have been proposed. Although different from free energy-based simulations in accounting for necessary physical processes in molecular recognition, these scoring functions can be applied efficiently to large-scale calculations. Classical scoring functions can be categorized into force field, knowledge-base or empirical [5, 32, 42, 109]. Force field scoring functions measure the binding in a complex by summing up different energy terms that stem from bonded and non-bonded interactions, with the parameters estimated from the experiment data or QM [41]. Entropy contributions are generally neglected in these force fields. Knowledge-based scoring functions are pseudo-energy functions, which arise from a knowledge base comprising the 3D coordinates of a large set of complexes and evaluate the binding in a new complex according to the similarity of its features to those in the knowledge base [31]. A typical knowledge-based potential concerns the ratio of the probability distributions of atom-type pairs spanning a specific distance to a reference distribution and is processed with a reverse Boltzmann method. Empirical scoring functions each stand on an energy factorization model and estimate the binding affinity by a weighted sum of different terms in the model [27, 32, 115]. The weights are calibrated from experiment data, mostly through multivariate regression analysis. It is noteworthy that these classical scoring functions mostly predefine an additive model for different contributors (characterization variables) and thereafter assume a linear relation between such contributors and the predicted binding affinity [5]. However, the assumptions or predefined functional forms may limit their prediction or generalization capability. Without assuming a specific functional form, machine learning-based scoring functions have become an emerging class and burgeoned during the past decade. These functions can implicitly capture different binding contributors arising from any possible kind of interaction, by a linear or nonlinear fit to the experiment data. Still, these scoring functions rely heavily on feature engineering, which uses expert knowledge or statistical inference to define rules for feature representations. In the studies of protein–ligand affinity prediction, such feature representations can vary from simple geometrical and physico-chemical descriptors (e.g. counts of atom pairs, counts of hydrogen bonds, etc.) to more complicated forms (e.g. molecular fingerprints, proteochemometric characterization, etc.). Allowing for simplified feature engineering, deep-learning models learn a hierarchy of feature representations that are relevant for the given task from a simple input representation, leading to a vast number of applications in many research fields. Deep-learning techniques have also been applied to the structure-based studies of protein–ligand binding. These data-driven methods are mostly efficient, while may be questioned about their interpretability or generalization capability.

In this paper, we aim to provide a comprehensive review on the more rigorous free energy-based simulations and the data-driven machine learning-based scoring functions. The wide variety of classical scoring functions, with reviews that can be found elsewhere [32, 42], are not the focus of this work. In what follows, we begin with a description of binding affinity and its experimental measurements. Next, we provide an overview of the free energy-based simulations according to various thermodynamic cycles and introduce a series of featurizers tailored for machine learning-based scoring functions. A discussion section is provided at the end of this review.

Binding affinity and experiment measurements

Ligand–protein binding affinity is affected by non-covalent intermolecular interactions, such as hydrogen bonding, electrostatic interactions, hydrophobic and van der Waals (vdW) forces, between the two molecules and the high concentrations of other molecules. Such ligand–protein affinity can be quantified according to a dissociation equilibrium constant |$K_d$| (in moles/liter) that generally relies on the concentrations of the protein and ligand in the solution,
(1)
where |$P$| is the protein, |$L$| the ligand, |$LP$| the complex and |$[x]$| the equilibrium concentration of |$x$| in the solution. |$K_d$| is a thermodynamic constant and can be easily manipulated by solution conditions (e.g. temperature, pH and salt concentration). A simple interpretation of |$K_d$| is the ligand concentration at which half of the proteins are complexed with ligands at equilibrium. Alternatively, it can be interpreted kinetically as the ratio of the dissociation rate constant and the association rate constant (⁠|$\frac{k_{-1}}{k_1}$|⁠). A small magnitude of |$K_d$| indicates tighter bound complexes and therefore a higher binding affinity. The standard Gibbs free energy of formation of |$RL$| (⁠|$\Delta G^\ominus $|⁠), namely the change in Gibbs free energy that accompanies the formation of |$RL$| (1 mole) in its standard state from |$R$| and |$L$| in their standard states, can be associated with the equilibrium constant |$K_d$| through
(2)
where |$R$| is the ideal gas constant, |$T$| the absolute temperature, |$\Gamma $| the quotient of activity coefficients |$\frac{\gamma (LP)}{\gamma (P)\gamma (L)}$| that is generally set to 1 with the dimension of moles/liter (⁠|$\frac{K_d}{\Gamma }$| is accordingly dimensionless). Methods for measuring protein–ligand binding affinity and dissociation constants include equilibrium dialysis [88], analytical ultracentrifugation [96], spectroscopic assays [47, 105] and isothermal titration calorimetry (ITC) [19].

Protein–ligand binding affinity can also be measured according to quantities like inhibition constant |$K_i$| (dissociation constant of the enzyme-inhibitor complex) or |$IC_{50}$| (concentration of inhibitor required to reduce ligand binding by half). In the absence of competition with the substrate of an enzyme, |$K_i$| is regarded as equal to |$K_d$|⁠. Although |$IC_{50}$| is more easily to be determined, comparisons among different experiments are intractable as the |$IC_{50}$| values depend on the initial amount of ligand available to the protein. In this regard, the binding constants (⁠|$K_d$| and |$K_i$|⁠) are more comprehensive for theoretical studies.

Prediction by free energy-based simulations

To evaluate the free energy changes in any biological process, defining a relevant thermodynamic cycle is of central importance. The free energy difference depends on the end states of the cycle. For the free energy of binding, such end states can be the bound (complex |$LP$| in solvent environment) and unbound (free molecules |$L$| and |$P$| in solvent environment) configurations,
(3)
The superscript |$^\ominus $| from all thermodynamic quantities are removed for easier interpretation. However, such a direct path from the unbound state to the bound state in the solution is practically intractable or complicated; therefore, indirect paths are broadly used in the acquisition of |$\Delta G_{bind}$|⁠.
On the other hand, any free energy difference (⁠|$\Delta G$|⁠) is contributed by the changes in enthalpy (⁠|$\Delta H$|⁠) and entropy (⁠|$\Delta S$|⁠) as
(4)
where |$T$| is the temperature of the system. The calculation of binding free energy generally involves a deeper energy factorization by assuming an additive model, such as the model summarized in [80] that covers the solvation effects, conformational changes, interaction energy and motion changes (⁠|$\Delta G_{bind}=\Delta G_{sol}+\Delta G_{conf}+\Delta G_{int}+\Delta G_{mot}$|⁠). Based on specific thermodynamic cycles, a wide variety of free-energy models, mostly upon MD simulations, have been proposed in earlier studies. As follows, we list a number of well-acknowledged models.

FEP and TI

FEP applies statistical mechanisms to the computation of free energy differences from MD or Monte Carlo (MC) simulations [52]. According to the Zwanzig equation [125], the free energy difference between an initial state A and a final state B (A|$\rightarrow $|B) can be expressed as
(5)
Here, Gibbs free energies (⁠|$G$|⁠) are used to indicate standard experimental conditions and isothermal-isobaric ensemble simulations, |$T$| is the temperature, |$k_B$| is the Boltzmann’s constant, |$V_i$| indicates the potential energy of a state |$i$| (A or B) and |$\langle \rangle _{\boldsymbol{A}}$| indicates an average evaluated by propagating trajectories over state A. The potential energy for a state can be measured according to MM or quantum mechanics (QM) [8], aided by the development of accurate force fields (e.g. Amber force fields [73, 110], OPLS force fields [37, 53, 113], polarized force fields (for charged systems) [49, 117] and bespoke force fields [17]). FEP calculations converge properly only if |$\Delta G$| is small enough; hence, it is essential to divide the calculation into a series of small windows (individual simulations) as |$\Delta G_{\boldsymbol{A}\rightarrow \boldsymbol{B}}=\sum \limits _{i=0}^{n-1} \Delta G_{\lambda _i\rightarrow \lambda _{i+1}}$|⁠. Suppose |$\boldsymbol{V}_{\boldsymbol{A}}$| and |$\boldsymbol{V}_{\boldsymbol{B}}$| represent the potential surfaces of the two states, a path between A and B can be constructed as a set of intermediate energy functions |$ \{\boldsymbol{V}_i|i=0,\ldots ,n\}$| by linearly combining |$\boldsymbol{V}_{\boldsymbol{A}}$| and |$\boldsymbol{V}_{\boldsymbol{B}}$| as follows:
(6)
This leads the calculation of |$\Delta G_{\boldsymbol{A}\rightarrow \boldsymbol{B}}$| to
(7)
With sufficient small |$\lambda _i$| values (⁠|$\lambda \rightarrow 0$|⁠), Equation 7 can be further written as an integral |$\Delta G_{\boldsymbol{A}\rightarrow \boldsymbol{B}}=\int _{0}^{1} \langle \frac{\partial V(\lambda )}{\partial \lambda }\rangle _{\lambda } d\lambda $|⁠, which is the TI formula of free energy [77].

In FEP, |$\{\lambda _i| 0\leq \lambda _i\leq 1, i = 0, \ldots , n\}$| can be a series with equal increments (⁠|$\lambda _{i+1}-\lambda _i=\Delta \lambda $|⁠) or with any pattern of spacing, while non-uniform spacing of |$\lambda $|-points were proposed to be a better choice in some studies [13]. Generally, a compromise between the number of windows (precision) and computational cost is required, and |$50\sim 100$| windows were recommended in [13]. The sampling techniques for selecting |$\lambda _i$| in a particular FEP series include classic modeling methods (e.g. double-wide sampling [51], double-ended sampling [9] and overlap sampling [71]) and modern enhanced-sampling methods (e.g. replica exchange [100], orthogonal space tempering [122], replica exchange with solute tempering (REST) [68] and REST2 [112]). Taking double-ended (DE) sampling as an example, one should perform both the forward perturbation (⁠|$\boldsymbol{A}\rightarrow \boldsymbol{B}$|⁠) and its reverse (⁠|$\boldsymbol{B}\rightarrow \boldsymbol{A}$|⁠), and take |$\Delta G^{DE}_{\boldsymbol{A}\rightarrow \boldsymbol{B}}=\frac{1}{2}(\Delta G_{\boldsymbol{A}\rightarrow \boldsymbol{B}}-\Delta G_{\boldsymbol{B}\rightarrow \boldsymbol{A}})$|⁠. Since TI is confined to small |$\lambda $|-steps, it is advisable to use enough small windows (⁠|$\lambda $|-steps) in a discretized TI [13]. Differently, |$\lambda $| is treated as a dynamic variable in |$\lambda $|-dynamics simulations, which enables a simultaneous evaluation of multiple thermodynamic states in a single simulation [57].

For applications to ligand binding, FEP is mostly used to derive the relative binding free energy |$\Delta \Delta G_{bind}$| of two ligands binding to a protein (thermodynamic cycle in Figure 1A), as follows:
(8)
Thermodynamic cycles of calculating protein–ligand binding free energy in FEP methods. (A) Calculating the relative binding free energy of two ligands ($L$ and $L^{\prime}$) binding to a protein ($P$) in water. (B) Calculating the absolute binding free energy between a ligand $L$ and a protein $P$ in water.
Figure 1

Thermodynamic cycles of calculating protein–ligand binding free energy in FEP methods. (A) Calculating the relative binding free energy of two ligands (⁠|$L$| and |$L^{\prime}$|⁠) binding to a protein (⁠|$P$|⁠) in water. (B) Calculating the absolute binding free energy between a ligand |$L$| and a protein |$P$| in water.

where |$\Delta G_{bind}^L$| and |$\Delta G_{bind}^{L^{\prime}}$| are the binding free energies between the two ligands and the protein, |$\Delta G_{L\rightarrow L^{\prime}}^{water}$| indicates the free energy difference of alchemically modifying a ligand (⁠|$L$|⁠) to another (⁠|$L^{\prime}$|⁠) in the solution and |$\Delta G_{L\rightarrow L^{\prime}}^{protein}$| is the energy difference of changing the complex |$LP$| to |$L^{\prime}P$| in the solution. FEP can estimate such alchemical transformations by modeling the paths of |$L\rightarrow L^{\prime}$| and |$LP\rightarrow L^{\prime}P$| in the solution. In the modeling of the paths, the problems caused by vanishing/appearing atoms in the end-point states can be mitigated by strategies such as softening potentials and using more |$\lambda $|-points near end points [10, 13, 120]. On the other hand, FEP can also be used to calculate the absolute binding free energy of a ligand for a protein, although depending on more complicated thermodynamic cycles. An example of such cycles [2] is displayed in Figure 1B, where a white ligand means no interactions with the environment and a clip indicates the presence of restraints. In [2], FEP was applied to the calculations of free energy differences of (i) decoupling the ligand from water (⁠|$\Delta G_{elec+vdW}^{water}$|⁠, scaling vdW and electrostatic interactions to zero), (ii) turning back on the electrostatic and vdW interactions for the complex |$LP_{decoup+restr}$| in water (⁠|$-\Delta G_{elec+vdW}^{protein}$|⁠) and (iii) removing the ligand restraints from the complex |$LP_{restr}$| in water (⁠|$-\Delta G_{restr}^{protein}$|⁠). Restraining the position/orientation of the ligand was accomplished by one distance, two angles and three dihedral harmonic potentials [2], and the restraints-induced free energy difference |$\Delta G_{restr}^{water}$| was calculated analytically [12]. Overall, the binding free energy can be estimated by collecting all the contributions as
(9)
With the great advances in force fields accuracy, enhanced sampling techniques and fast GPU simulations, FEP techniques have been broadly applied to the accurate and reliable calculations of protein–ligand binding free energies [48, 78, 111, 113]. Canonical workflows, such as FEP+ developed in the past few years, attempt to implement large-scale FEP calculations and begin to play an increasing role in computer-aided lead optimization [64, 111, 113].

LRA and LIE methods

FEP has been acknowledged as the most rigorous approach for free energy estimates. However, the FEP calculations may frequently encounter the slow convergence problem and a high computational complexity when modeling large macromolecular assemblies [91]. Instead of implementing an FEP series to derive the free energy difference (Equation 7), researchers have also attempted to expand the energy difference between two states in a power series around one state, and thus to reduce the computational time [1, 125]. Suppose |$\Delta V=V_{\boldsymbol{B}}-V_{\boldsymbol{A}}$| is a small perturbing potential, the power series can be expressed as |$\Delta G_{\boldsymbol{A}\rightarrow \boldsymbol{B}}=\langle \Delta V\rangle _{\boldsymbol{A}}-\frac{1}{2k_BT}\langle (\Delta V-\langle \Delta V\rangle _{\boldsymbol{A}})^2\rangle _{\boldsymbol{A}}+\frac{1}{6k_B^2T^2}\langle (\Delta V-\langle \Delta V\rangle _{\boldsymbol{A}})^3\rangle _{\boldsymbol{A}}-\ldots $| [1]. Truncation in such a series has been explored in some studies, while the approximations based on a low-order truncation were not expected to work in general (1st-order approximation in [29]) and those retaining the higher order (⁠|$\ge 2$|⁠) terms led to extremely slow convergence [92]. Nevertheless, when the system obeys linear response, it turns out to be more efficient to calculate the free energy difference from a truncated series (1st-order approximation with two simulations of the two end states), encouraging LRA and its variants to serve as an efficient approximation to FEP calculations [62]. The general LRA expression is shown as Equation 10.
(10)
Relying on a thermodynamic cycle such as the one in Figure 2 [90], LRA can be applied to the calculation of protein–ligand absolute binding free energy.
Thermodynamic cycle of calculating protein–ligand binding free energy in LRA method.
Figure 2

Thermodynamic cycle of calculating protein–ligand binding free energy in LRA method.

The binding free energy is accordingly expressed as
(11)
Here, |$\Delta G_{elec}^{water}-\Delta G_{elec}^{protein}+\Delta G_{bind}^{^{\prime}}$| explains the upper cycle in Figure 2, where we first annihilate the ligand’s electrostatic interactions in water by scaling all the residual charges to zero (⁠|$\Delta G_{elec}^{water}$|⁠), then bind the non-polar ligand to the protein (⁠|$\Delta G_{bind}^{^{\prime}}$|⁠) and finally recharge the ligand in the protein binding site (⁠|$-\Delta G_{elec}^{protein}$|⁠). |$\Delta G_{bind}^{^{\prime}}$| is further decomposed as |$\Delta G_{L^{\prime}\rightarrow L^{\prime\prime}}^{water}-\Delta G_{L^{\prime}\rightarrow L^{\prime\prime}}^{protein}+\Delta G_{bind}^{^{\prime\prime}}$| according to the lower cycle in Figure 2, where |$L^{\prime\prime}$| indicates a zero-sized ligand that is shrunk from the non-polar ligand |$L^{\prime}$| and |$\Delta G_{bind}^{^{\prime\prime}}$| is only contributed by an entropic term (⁠|$\approx 0$| in some studies [91]). Taking the Equation 10 into consideration, the electrostatic contribution to the binding free energy |$\Delta G^{elec}_{bind}$| can be derived as
(12)
where |$V_{elec}$| denotes the electrostatic interactions between the ligand and its environment (protein binding site or water) [90]. |$\Delta G_{bind}^{^{\prime}}$| is contributed by the vdW effect, hydrophobic effect, water penetration and entropic term associated with moving |$L^{\prime}$| from water to protein in [91] as follows:
(13)
where the vdW (⁠|$\Delta G_{vdW}^{^{\prime}}$|⁠), hydrophobic (⁠|$\Delta G_{hyd}^{^{\prime}}$|⁠) and water penetration (⁠|$\Delta G_{penetr}^{^{\prime}}$|⁠) contributions are treated by the protein dipoles Langevin dipoles (PDLD) or scaled PDLD (PDLD/S) method [62] and the entropic contribution |$T\Delta \Delta S^{\prime}$| can be estimated using consecutive FEP steps [39, 90] or neglected [62]. The final estimation of binding free energy is accordingly expressed as
(14)

One efficient variant of LRA is the PDLD/S-LRA method [62, 79], which replaces the microscopic potentials |$V_{elec}$| by the effective PDLD/S-LRA potentials |$\bar{V}_{elec}$| [90] in the calculation of electrostatic free energy as |$(\Delta G^{elec}_{bind})^{PDLD/S-LRA}=\frac{1}{2}(\langle \bar{V}_{elec}^{protein}\rangle _{L}+\langle \bar{V}_{elec}^{protein}\rangle _{L^{\prime}}-\langle \bar{V}_{elec}^{water}\rangle _{L}-\langle \bar{V}_{elec}^{water}\rangle _{L^{\prime}})$|⁠. A number of LRA variants involve the estimation of |$\Delta G_{bind}^{^{\prime}}$| by scaled vdW terms (vdW interaction energy between the ligand and its environment), including the LRA/|$\beta $| model with |$(\Delta G_{bind})^{LRA/\beta }=(\Delta G^{elec}_{bind})^{LRA}+\beta (\langle V_{vdW}^{protein}\rangle _{L}-\langle V_{vdW}^{water}\rangle _{L})$| [45] and the PDLD/S-LRA/|$\beta $| model with |$(\Delta G_{bind})^{PDLD/S-LRA/\beta }=(\Delta G^{elec}_{bind})^{PDLD/S-LRA}+\beta (\langle V_{vdW}^{protein}\rangle _{L}-\langle V_{vdW}^{water}\rangle _{L})$| [91]. The LIE method further simplifies the model by neglecting the potentials associated with the non-polar ligand state (⁠|$\langle V_{elec}\rangle _{L^{\prime}}$|⁠) in the calculation of electrostatic free energy [3]. This leads to |$(\Delta G_{bind})^{LIE}=\alpha (\langle V_{elec}^{protein}\rangle _{L}-\langle V_{elec}^{water}\rangle _{L})+\beta (\langle V_{vdW}^{protein}\rangle _{L}-\langle V_{vdW}^{water}\rangle _{L})$|⁠, where |$\alpha $| is a constant around |$\frac{1}{2}$| and |$\beta $| is an empirical parameter [91]. The hydrophobic contribution (e.g. solvent accessible surface area) was additionally considered in [94], yielding |$\Delta G_{bind}=\alpha \Delta G_{bind}^{elec}+\beta \Delta G_{bind}^{vdW}+\gamma \Delta G_{bind}^{hyd}$|⁠. LRA and it variants have been proved to be effective in many protein–ligand affinity predictions [3, 35, 76, 91, 123].

Umbrella sampling methods

Umbrella sampling (biased MD) is an alternative technique to provide free energy differences between two thermodynamic states [103]. Consider the Gibbs free energy |$G$|⁠, one can relate it to a canonical partition function |$Q$| of a system according to
(15)
where |$r$| indicates the whole phase space (e.g. configuration space and momentum space), |$V(r)$| is the associated potential energy and |$N$| is the degrees of freedom of the system. In order to distinguish between two thermodynamic states, a reaction coordinate parameter (⁠|$\theta $|⁠) can be introduced, frequently standing on the geometric grounds (e.g. distance). Then, the partition function |$Q$| evolves into a probability distribution function |$Q(\theta )=\int \delta [\theta (r)-\theta ]\ exp[-\frac{V(r)}{k_BT}]\ d^Nr\ /\ Q$| with the indicator function |$\delta (x)$| and the free energy [or potential of mean force (PMF)] is accordingly calculated as |$G(\theta )=-k_BT\ lnQ(\theta )$|⁠. During computer simulations, such an ensemble average |$Q(\theta )$| is commonly replaced by a time average |$P(\theta )$| for infinite sampling in an ergodic system, due to the infeasibility of directly calculating |$Q(\theta )$| [54]. For finite-time simulations, those high-energy regions or barrier-separated regions in the configuration space of a system are poorly sampled, while required to profile the free energy |$G(\theta )$|⁠. FEP and TI, as described earlier, both represent efficient sampling techniques for covering such regions, and umbrella sampling also achieves this goal by introducing a bias potential |$\omega (\theta )$| to bridge the energy gaps. Such bias potentials include window-dependent harmonic biases [54, 56], adaptive biases [7, 74] and specialized biases (e.g. local elevation bias) [36]. Along the reaction coordinate, multiple simulations (windows) centering at different coordinates are normally required to profile each small energy barrier. For a window |$i$|⁠, the bias potential |$\omega _i(\theta )$| connects the unbiased (⁠|$V_i(r)$|⁠) and biased (⁠|$V^b_i(r)$|⁠) potential energies, as follows:
(16)
Considering an ergodic system, the biased distribution along the reaction coordinate |$P_i^b(\theta )$| can be modeled by biased MD simulations, and the unbiased free energy is accordingly recovered as
(17)
where |$F_i$| is an additive constant (⁠|$-k_BT\ ln\langle exp[-\frac{\omega _i(\theta )}{k_BT}]\rangle $|⁠) but cannot directly be sampled during simulations. Instead, |$F_i$| can be calculated via methods such as the weighted histogram analysis method (WHAM) [59] and umbrella integration [55], which leads to the combination of results from different umbrella-sampling windows into an overall PMF |$G(\theta )$| along the reaction coordinate. The energy barrier for this PMF curve (⁠|$\Delta G=max\{G(\theta )\}-min\{G(\theta )\}$|⁠) can be treated as the free energy difference between the two end states. When applying a slowly varied bias potential along the reaction coordinate, one can pull a system from one state to another, and such pulling simulations are well acknowledged as steered MD [46]. In other simulation techniques such as metadynamics, the bias potential can be defined as a function of a few collective variables (⁠|$\omega (\boldsymbol{s})$|⁠) and continuously updated by adding a new bias function (e.g. Gaussian) at a specific rate, which promotes the accumulated bias potential to ultimately converge to the opposite of the free energy (⁠|$G(\boldsymbol{s})$|⁠) [6].

During the past few years, umbrella sampling has become a promising approach for determining protein–ligand binding free energies [63, 75, 81]. Normally, the ligand is pulled out of protein active site by an external force (Figure 3), and a series of configurations are generated along the pulling pathway (reaction coordinate) and subjected to umbrella sampling simulations. During the pulling simulation, the protein serves as a reference and the ligand is placed at increasing center-of-mass distance (windows) from the reference with a harmonic bias potential applied to its position. Window-wise simulations based on different starting configurations are conducted with the PMF curves parsed, and a continuous PMF curve over the entire reaction coordinate can be generated by combining the individual PMF curves using WHAM. The absolute binding free energy is then determined as the difference between the peak and valley of this curve [63].

Pulling of a ligand from its protein active site along the reaction coordinate.
Figure 3

Pulling of a ligand from its protein active site along the reaction coordinate.

MM-PB/SA methods

MM-PB/SA method estimates the binding free energy of a ligand |$L$| to a protein |$P$| on the basis of a thermodynamic cycle (Figure 4), which decomposes the binding energy in solution into a gas-phase free energy and a solvation free energy difference (Equation 18).
(18)
Thermodynamic cycle of MM-PB/SA calculations.
Figure 4

Thermodynamic cycle of MM-PB/SA calculations.

where |$\Delta G_{sol}^{x}$| is the solvation free energy of transferring |$x$| (⁠|$LP$|⁠, |$P$| or |$L$|⁠) from gas to solution, |$\Delta H_{gas}$| the gas-phase enthalpy change, |$\Delta S_{gas}$| the gas-phase entropy change and |$T_{gas}$| the temperature of the system in the gas phase. Generally, |$\Delta H_{gas}$| is approximated according to a MM force field and |$\Delta G_{sol}$| is contributed by the polar and non-polar components, as shown below.
(19)
Here, |$E_{bond}$|⁠, |$E_{angle}$| and |$E_{dihedral}$| terms indicate the bonded interactions (internal energy of molecule), |$E_{elec}$| (electrostatic interactions) and |$ E_{vdW}$| (vdW forces) terms are the non-bonded interactions, |$\Delta G_{sol}^{polar}$| is calculated by solving the PB equation or by the GB approach, and |$\Delta G_{sol}^{nonpolar}$| is dependent on the solvent-accessible surface area (SASA) via |$\Delta G_{sol}^{nonpolar}=\gamma SASA+b$| [87]. MM-PB/SA normally adopts ensemble averages of the gas-phase enthalpies and solvation free energy differences in the calculation, where the configurations are sampled from MD simulations with explicit solvent. |$\Delta G_{sol}$| is yielded from PB/SA or GB/SA (implicit solvent models) with high computational efficiency. Optionally, the entropy contribution |$\Delta S_{gas}$| can be approximated through normal mode analysis (NMA). The ligand-binding affinities yielded by the MM-PB/SA method have shown to agree well with experimental data in many studies [43, 58, 72, 97, 106, 108]. Besides ensemble averages, a single, relaxed complex structure can be employed in such MM-PB/SA calculations [58]. Alternatively, Huang et al. [41] replaced the MD simulations with energy minimizations (OPLS all-atom force field and a GB implicit solvent model) and used the optimized structures for calculating the binding energies. Later, combining MM-PB/SA with single energy-minimized protein–ligand structures (S-MMPB(GB)/SA) has been successfully applied to VS to discriminate between true binders and decoys for protein targets [87]. Depending on energy-minimized structures, implicit-solvent simulations and various distance/orientation parameterization, Suenaga et al. [99] generated multiple random conformations for protein–ligand complexes and scored the affinities by a conjunction of MM-GB/SA calculations and Jarzynski identity, for post-docking evaluation. MM-GB/SA has also been united with an interaction entropy method to yield the residue-specific protein–ligand binding free energy under computational alanine scanning and the integrated residue-specific energies agreed well with the experimental binding free energy [70].

Comparisons in a case study

Modern computer simulations (e.g. MD or MC simulations) reproduce molecular geometry and properties according to MM or QM force fields, which provide the functional form and parameters for calculating the potential energy |$V$| of a molecular system. Such calculations are the foundation of free energy-based simulations. Methods like FEP/TI and umbrella sampling also require intensive sampling (windows) of free energy, while the LRA-related and MM-PB/SA methods simplify such intensive sampling to end-state simulations by introducing linear-response theory or designing specialized thermodynamic cycles (Figure 4). FEP/TI is frequently used for predicting relative binding free energies of two or multiple ligands binding to the same protein (Figure 1A), but the calculations of absolute binding free energies normally rely on more complicated thermodynamic cycles and converge quite slowly. MM-PB/SA calculations without entropy estimation can be efficiently used to predict relative binding free energies, as the entropy terms are assumed to be similar in the systems only differing in small ligands. However, involving the entropy estimation heavily slows down the computations, especially for large molecular systems. LRA-related methods such as LIE have been broadly applied to the free-energy prediction (both absolute and relative values), but they are semi-empirical and thus require parameter tuning (⁠|$\alpha $| and |$\beta $| in LIE). Umbrella sampling can be used for the prediction of absolute or relative binding free energies but requires even more computational time than the FEP/TI method in many cases.

In order to compare the procedures and results of several representative simulation methods (TI, LIE, umbrella sampling, MM-PB/SA and MM-GB/SA), we considered a case study where the relative binding free energies of small molecules binding to the bacteriophage T4 lysozyme were predicted. Lysozyme is an enzyme that breaks down bacterial cell walls by hydrolyzing the polysaccharide component and has been found in high concentration in a number of secretions and egg white. T4 lysozyme with a point mutation L99A has shown to provide a suitable cavity for binding small molecules like benzene [28]. In this case study, benzene and two analogs (toluene and phenol) binding to this T4 lysozyme mutant were investigated. Benzene (C6H6) binds to this protein with the free energy of |$-5.16$| kcal/mol and toluene (C7H8) binds slightly tighter than benzene (⁠|$-5.48$| kcal/mol), while phenol (C6H6O) does not bind at all (currently no experimental data available, should be |$>-5.16$| kcal/mol) [24]. Specifically, we considered two pairs of ligands (benzene|$\rightarrow $|toluene and benzene|$\rightarrow $|phenol) for the prediction of relative binding free energy |$\Delta \Delta G_{bind}^{L\rightarrow L^{\prime}}$|⁠. To provide a fair comparison among the simulation methods, we utilized uniform protein crystal structure [entry of 4W52 in RCSB protein data bank (PDB)] and simulation platform (Amber 16 [15]). Treating 4W52 (benzene–lysozyme complex) as a template structure, toluene (PDB:4W53) and phenol (PDB:1LI2) were separately aligned to the ligand position to form the initial complex structures. These initial structures were fed to Amber for simulations and calculations of relative binding free energies. In the simulations, we adopted the explicit-solvent model uniformly. As a canonical procedure, prior to the production simulation (for energy estimation) of a solvated complex, equilibration of the system needs to be implemented. The equilibration stage normally includes a short energy-minimization step to remove any bad contacts, a phase for heating the system to a given temperature with weak positional restraints on the solute (as an NVT ensemble), and an equilibration phase at constant temperature/pressure to relax solvent density and gradually remove the restraints on solute (as an NPT ensemble). Specific settings for different free energy-based simulation methods are listed as follows.

  • (i) TI. To calculate |$\Delta \Delta G_{bind}^{L\rightarrow L^{\prime}}$| using TI, the free energy differences associated with two alchemical transformations of |$L\rightarrow L^{\prime}$| (⁠|$\Delta G_{L\rightarrow L^{\prime}}^{water}$|⁠) and |$LP\rightarrow L^{\prime}P$| (⁠|$\Delta G_{L\rightarrow L^{\prime}}^{protein}$|⁠) in water need to be calculated in advance. We applied the one-step protocol to each alchemical transformation. Before conducting window-wise TI simulations for each transformation, we simply equilibrated the systems (⁠|$\lambda =0.5$|⁠), including 500-step energy minimization, 50-ps heating to 300 K (restraints on solute with the weight of 5 kcal/mol|$\cdot \mathring{A}^2$|⁠, Langevin dynamics for temperature scaling), and 100-ps equilibration at 300 K and 1 bar (solute restraints gradually reduced to 0). The equilibrated systems were fed to different windows (⁠|$\lambda =0.0,0.1,\ldots ,1.0$|⁠) for TI simulations, and the window-wise simulations each include a heating phase (50 ps) and a production simulation (200 ps). SHAKE was turned on in these equilibration/TI simulations, and the transforming ligands (⁠|$L$| and |$L^{\prime}$|⁠) were considered as the TI regions, where appearing/disappearing atoms were treated using softcore potentials [15] and the remaining atoms were linearly transformed (Equation 6). By combining the estimated |$\frac{\partial V^i(\lambda )}{\partial \lambda }$| values from the TI windows (⁠|$\Delta G^i$| in FEP calculations), one can estimate the total free energy difference of the alchemical transformation (⁠|$\Delta G_{L\rightarrow L^{\prime}}^{protein}$| or |$\Delta G_{L\rightarrow L^{\prime}}^{water}$|⁠). Lastly, the predicted value of relative binding free energy can be derived as |$\Delta \Delta G_{bind}^{L\rightarrow L^{\prime}}=\Delta G_{L\rightarrow L^{\prime}}^{protein}-\Delta G_{L\rightarrow L^{\prime}}^{water}$|⁠. The diagram of above calculations is shown in Figure 5A. All the simulations were run with 20 CPU parallel processes (CPU: Inte® Xeon® E5-2690 v2 @ 3.00GHz).

  • (ii) LIE. LIE predicts the absolute binding free energy (⁠|$\Delta G_{bind}$|⁠) by considering the differences in electrostatic (⁠|$\Delta G_{bind}^{elec}$|⁠) and vdW (⁠|$\Delta G_{bind}^{vdW}$|⁠) interaction energies caused by alterations of ligand environment (water to protein binding site). In this regard, simulations for the two end states (⁠|$LP$| in water and |$L$| in water) should be conducted for calculating the energy differences. For each state, we implemented a short energy minimization (500 steps), a heating step (50 ps, NVT ensemble with SHAKE on, restraints on solute), an equilibration step (500 ps, NPT ensemble with SHAKE on, gradually reduced solute restraints) and a production simulation (3 ns, NPT ensemble with SHAKE on, no restraints on solute). Upon the production simulation results (100 trajectory frames, abandoning data for the first 1 ns) of the two end states, one can calculate the interaction energy differences |$\Delta G_{bind}^{elec}$| and |$\Delta G_{bind}^{vdW}$|⁠, based on which the binding free energy can be derived (⁠|$\Delta G_{bind}=\alpha \Delta G_{bind}^{elec}+\beta \Delta G_{bind}^{vdW}$|⁠, |$\alpha =0.5$| and |$\beta =0.16$| selected from [116]). Henceforth, the relative binding free energy can be calculated as |$\Delta \Delta G_{bind}^{L\rightarrow L^{\prime}}=\Delta G_{bind}^{L^{\prime}}-\Delta G_{bind}^{L}$|⁠. The flow diagram is displayed in Figure 5B. All the simulations were GPU-accelerated (single GPU: Tesla 65C).

  • (iii) Umbrella sampling. Umbrella sampling was first used to estimate the |$\Delta G_{bind}$| of a ligand binding to T4 lysozyme. In order to calculate |$\Delta G_{bind}$|⁠, we conducted the simulations including a ligand-pulling process (from the protein binding site to water) and a series of window-wise simulations (starting at various positions in the pulling process). Prior to the pulling process, a large solvent environment was constructed (with the minimum distance of |$20\mathring{A}$| between the complex and the edge of water box) and the ligand was placed at the center-of-mass (COM) of the protein binding site. The system was equilibrated with a short energy minimization (500 steps), a heating step (50 ps) and an equilibration step (100 ps, restraining the ligand at the COM of binding site with force constant of 2.5 kcal/mol|$\cdot \mathring{A}^2$|⁠), after which the ligand was pulled from the COM to |$20\mathring{A}$| far from it (in z-direction) with the rate of |$1 \mathring{A}/ns$| and force constant of 1.1 kcal/mol|$\cdot \mathring{A}^2$|⁠. Positions with 2|$\mathring{A}$| spacing along the z-axis were extracted to start a series of windows, within each we conducted a 5-ns production simulation (NPT ensemble, restraining the ligand with 2.5 kcal/mol|$\cdot \mathring{A}^2$|⁠). The results from different windows were combined by WHAM (http://membrane.urmc.rochester.edu/content/wham) for PMF profiling, which yielded the estimation of |$\Delta G_{bind}$|⁠. The relative binding free energy was then calculated as |$\Delta \Delta G_{bind}^{L\rightarrow L^{\prime}}=\Delta G_{bind}^{L^{\prime}}-\Delta G_{bind}^{L}$|⁠. The whole prediction procedure is presented in Figure 5C. All the simulations were GPU-accelerated (single GPU: Tesla 65C).

  • 4) MM-PB(GB)/SA. In MM-PB(GB)/SA calculations, |$\Delta G_{bind}$| is contributed by the gas-phase enthalpy change (⁠|$\Delta H_{gas}$|⁠) and the solvation free energy difference (⁠|$\Delta G_{sol}^{LP}-\Delta G_{sol}^P-\Delta G_{sol}^L$|⁠), with the polar part of the solvation free energy yielded by the PB (MM-PB/SA) or GB (MM-GBSA) approach. For similar systems, the entropy contributions can be ignored, leaving the effective part of the binding free energy |$\Delta \tilde{G}_{bind}$|⁠. In these methods, only one state (⁠|$LP$| in water) needs to be considered for simulations. Similarly, a short energy minimization (500 steps), a heating step (50 ps), an equilibration step (500 ps) and a production simulation (3 ns) were conducted. The last 2-ns simulation data (100 frames) were used to calculate the effective binding free energy (⁠|$\Delta \tilde{G}_{bind}$|⁠). Then, the predicted value of relative binding free energy can be obtained as |$\Delta \Delta G_{bind}^{L\rightarrow L^{\prime}}=\Delta \tilde{G}_{bind}^{L^{\prime}}-\Delta \tilde{G}_{bind}^{L}$|⁠. The flow diagram is shown in Figure 5D. All the simulations were GPU-accelerated (single GPU: Tesla 65C).

The experimental and predicted |$\Delta \Delta G_{bind}^{L\rightarrow L^{\prime}}$| are listed in Table 1. For the small |$\Delta \Delta G_{bind}^{benzene\rightarrow toluene}$|⁠, TI produced the results (⁠|$0.02\pm 0.14$| kcal/mol) closest to the experimental data (⁠|$-0.32$| kcal/mol), followed by LIE (⁠|$-0.65\pm 0.68$| kcal/mol), MM-PB/SA (⁠|$-1.33\pm 2.46$| kcal/mol), MM-GB/SA (⁠|$-3.09\pm 1.97$| kcal/mol) and umbrella sampling (6.05 kcal/mol). However, TI costed significantly longer running time (73.67 hours) than the others (⁠|$2.8\sim 3.57$| hours) except umbrella sampling (129.7 hours). The experimental |$\Delta \Delta G_{bind}^{benzene\rightarrow phenol}$| is not available, while it should be at least |$>0$| and a larger value is preferred. Accordingly, TI (⁠|$1.35\pm 0.44$| kcal/mol) and umbrella sampling (7.15 kcal/mol) achieved the best performance, followed by MM-PB/SA (⁠|$1.24\pm 2.93$| kcal/mol), MM-GB/SA (⁠|$-0.62\pm 2.20$| kcal/mol) and LIE (⁠|$-1.55\pm 1.13$| kcal/mol). Compared with |$\Delta \Delta G_{bind}^{benzene\rightarrow toluene}$|⁠, |$\Delta \Delta G_{bind}^{benzene\rightarrow phenol}$| costed much shorter TI-calculation time (10.05 hours) due to the handling of less appearing atoms in the TI regions, while the costs were still much more expensive than most of the other methods (⁠|$2.81\sim 3.58$| hours). If only considering the sign, all methods except TI and umbrella sampling captured the correct sign of |$\Delta \Delta G_{bind}^{benzene\rightarrow toluene}$|⁠, whereas TI, umbrella sampling and MM-PB/SA produced the correct sign of |$\Delta \Delta G_{bind}^{benzene\rightarrow phenol}$|⁠. Overall, TI performed the best but were computationally very expensive. In this simple case study, we conducted short TI simulations (200 ps for each window), but much longer simulations (nanosecond-level) will be needed by more complicated systems. Additionally, the one-step protocol for each alchemical transformation can be replaced by multi-step protocols (such as treating electrostatic and vdW interactions separately). it may increase the prediction accuracy but require more computing time. In this study, MM-PB/SA served as an accurate (predictions with correct signs) and quite efficient (⁠|$\sim 3$| hours per prediction) tool, largely ascribed to the ignorance of entropy contribution in the calculations of |$\Delta G_{bind}$|⁠. MM-GB/SA was less accurate but generally faster than MM-PB/SA. For LIE, we simply adopted fixed parameters [116] for the calculations of |$\Delta G_{bind}$|⁠, which might result in the underperformance of LIE to TI and MM-PB/SA. In specific studies, one may need to adjust such parameters carefully. Umbrella sampling costed extremely long time to predict |$\Delta \Delta G_{bind}^{L\rightarrow L^{\prime}}$|⁠, while did not result in predictions accurate enough to compensate such computational costs. The ligand-pulling direction may be a potential factor that influences the prediction performance, but determining such directions will require more sophisticated mechanisms.

Different diagrams for calculating relative binding free energy of two ligands ($L$ and $L^{\prime}$) binding to a protein ($P$). (A) TI method. (B) LIE method. (C) Umbrella sampling method. (D) Molecular mechanics Poisson-Boltzmann (generalized Born)/surface area (MM-PB(GB)/SA) method.
Figure 5

Different diagrams for calculating relative binding free energy of two ligands (⁠|$L$| and |$L^{\prime}$|⁠) binding to a protein (⁠|$P$|⁠). (A) TI method. (B) LIE method. (C) Umbrella sampling method. (D) Molecular mechanics Poisson-Boltzmann (generalized Born)/surface area (MM-PB(GB)/SA) method.

Machine learning-based scoring functions

Benchmark databases

Developing a trustworthy scoring function requires a set of protein–ligand complexes, with experimentally determined structures and affinities of high quality, for calibrating the parameters in the function. The size and quality of this set will affect the calibration or even bias the results. Efforts have been made in the past two decades to develop comprehensive databases of protein–ligand complexes, by consolidating information, such as the structural and binding data, from different sources. Below, we list a number of highly accessed databases, upon which both generic scoring functions and family-specific functions can be developed.

BindingDB: Launched in 2000 (http://www.bindingdb.org) and with a few updates thereafter [16, 30], BindingDB has collected over 1.8 million data entries of experimental protein–ligand interaction data mostly from scientific publications and patents. Drug targets or the potential candidates, coupled with small drug-like molecules, are the main focus of BindingDB. With the rapid expansion of BindingDB, currently the collected entries have covered over 7000 protein targets and 0.8 million ligands. Aside from the affinity measurements (⁠|$K_d$|⁠, |$K_i$| or |$IC_{50}$|⁠), key experimental conditions (e.g. temperature, pH and buffer composition) have also curated for each entry. These entries have additionally been navigated to data sources such as the PDB, pathway databases and compound databases, according to data availability. Users can download the queries upon a free registration in the format of SDfile or TSV file, or access the database programmatically (e.g. RESTful API). BindingDB also provides a series of data sets, such as those validation sets designed for evaluation and parameterization of models in computer-aided drug design, for specialized uses.

PDBbind: PDBbind or PDBbind-CN (http://www.pdbbind.org.cn) was first released to the public in 2004 and updated annually until recently (newest release of version 2019) [114]. Differing from BindingDB, PDBbind aims to provide a solid link between molecular structural information and binding data. It was developed based on PDB, by screening for complexes of interests (e.g. protein–ligand complexes) and scraping the binding data from relevant references (⁠|$K_d$|⁠, |$K_i$| or |$IC_{50}$| value, preferably in a standard or near-standard experimental setting). Currently, PDBbind has covered over 21 000 biomolecular complexes, where around 18 000 are protein–ligand complexes. Additionally, the developers have filtered the collected data into a refined set by multi-step quality control and further condensed it into a core set by clustering the complexes using protein similarity, both of which have served as convenient entries into molecular docking and structure-based scoring of molecular binding affinities. Users can download the structural files together with the binding data as a package upon a free registration.

Binding MOAD: Originally published in 2005 [40] and with regular updates, Binding MOAD is a database storing high-quality, experimentally determined, structural and binding data of protein–ligand complexes (http://bindingmoad.org). Quite similar to PDBbind, Binding MOAD was developed using a top-down procedure that started from the PDB. The entry criteria for Binding MOAD include a higher resolution (⁠|$<2.5\mathring{A}$|⁠) of the crystal structure of the complex, at least one protein chain, a ligand that non-covalently binds to the protein, and valid binding data (⁠|$K_a$|⁠,|$K_d$|⁠, |$K_i$| or |$IC_{50}$| value, order of preference: |$K_d>K_i>IC_{50}$|⁠) curated from literature. Redundancy was further addressed by sequence alignment coupled with unified binding sites, and similarity data for the ligands and binding sites were also supplied as a part of the database. Currently, Binding MOAD (release of 2018) contains |$\sim $| 36 000 protein–ligand structures, where over 17 000 unique ligands and more than 13 000 binding data are uncovered. Binding MOAD provides different subsets from PDBbind for specialized applications, and such data can be downloaded as a package.

CSAR benchmarks: The community structure activity resource (CSAR) aims to collect high-quality data from industry and academia and make them publicly available for development and benchmarking of computational methods for ligand docking and scoring (http://www.csardock.org). The dataset released in the 2010 CSAR benchmark exercise [26] was developed for effective evaluation of different docking and scoring methods. Starting from Binding MOAD and PDBbind, two initial datasets with PDB structures deposited before 2006 (set 2) and between 2007 and 2008 (set 1) were identified. Under a series of refining processes according to high structure quality, ligand validity, preferable binding data and manual inspection, the pruned datasets were released as the CSAR high-quality (CSAR-HiQ) sets. Later with a remediation based on improved quality-assessment criteria and a further refinement work by the National Research Council of Canada (NRC), the CSAR-NRC HiQ sets (set 1: 176 complexes, set 2: 167 complexes) were released. The benchmark exercise continued annually from 2012 to 2014, with more data collected from the industry and a focus on target-specific docking/scoring [14, 25, 93]. Another major data set of CSAR, named CSAR 2012 set, was released in 2012 and covered 6 protein targets with 647 compounds and 82 crystal structures [25]. Although not all protein–ligand crystal structures were available in this set, it still serves as a valuable data source for the prediction of binding affinity (generic or target-specific). Details of all the datasets adopted for the benchmark exercises (CSAR-HiQ and CSAR 2012|$\sim $|2014) can be downloaded from the CSAR website.

Platinum: Platinum, publicly accessible at http://structure.bioc.cam.ac.uk/platinum, is a specialized database that aims at studying the effects of structural mutations on protein–ligand affinities [86]. It started with a search of valid triplets, each containing a wild-type (WT) protein, a mutant and a ligand, from PubMed articles and the PDB. The entry criteria include the release of crystal structures of the complexes (WT-ligand, mutant-ligand or both) and valid binding data for both the complexes (⁠|$K_d$|⁠, |$K_i$| or |$K_m$|⁠). For each pair of WT-ligand and mutant-ligand complexes, the binding affinities were measured under the same experimental conditions by the same technique. Aside from the structural and binding data, the experimental conditions coupled with a number of ligand properties have also been supplied by Platinum. Although lacking regular updates, this database currently covers over 1000 triples and can potentially be developed into a benchmark for predicting relative biding affinities.

Scoring functions

Upon characterizing a protein–ligand complex as a set of features relevant to the binding affinity, machine-learning techniques can reveal the nonlinear relationship between such features and the binding affinity (generally log-transformed as |$pK_{d/i}=-log\ K_{d/i}$|⁠). In the evaluation of prediction performance, Pearson’s correlation coefficient (⁠|$R=\frac{cov(\boldsymbol{y}^{true},\boldsymbol{y}^{prediction})}{\sigma (\boldsymbol{y}^{true})\sigma (\boldsymbol{y}^{prediction})}$|⁠), mean absolute error (⁠|$MAE=\frac{1}{N}\sum \limits _{n=1}^{N}|y^{true}_n-y^{prediction}_n|$|⁠) and the root mean square error (⁠|$RMSE=\sqrt{\frac{1}{N}\sum \limits _{n=1}^{N}(y^{true}_n-y^{prediction}_n)^2}$|⁠) are commonly used metrics. A higher correlation and a lower error indicate a better performance. In what follows, we briefly describe a number of scoring models, grouped according to different feature representations, for the prediction of protein–ligand binding affinity.

Geometrical descriptors

RF-SCORE: RF-SCORE is among the earlier studies of machine learning-based scoring functions that showed a better performance than classical scoring functions [5]. Based on a selected group of heavy-atom types |$\boldsymbol{A}=\{C,N,O,F.P,S,Cl,Br,I\}$|⁠, the counts for interacting atom pairs (⁠|$i$| and |$j$|⁠) with a specific atom-type pair (⁠|$A_p$| and |$A_l$|⁠), expressed as |$c_{(A_p,A_l)}=\sum _{i=A_p}\sum _{j=A_l}I_{d_{ij}<d_{cut}}$|⁠, were obtained. The interacting atoms, between the protein and ligand, were defined as pairs of atoms within a specific distance threshold (⁠|$d_{cut}=12\mathring{A}$| in [5]). The feature set is accordingly the counts for all atom-type pairs, as |$\boldsymbol{x}=\{c_{(A_p,A_l)}|A_p\in \boldsymbol{A}, A_l\in \boldsymbol{A}\}$|⁠. Random forests (RFs) were employed as a regression tool to map such features of each complex (⁠|$\boldsymbol{x}_n$|⁠, |$n=1,\ldots ,N$|⁠) to its binding affinity (⁠|$y_n$|⁠, |$n=1,\ldots ,N$|⁠). The predicted RF-SCORE can be derived by averaging the predictions from multiple trees (⁠|$T_p$|⁠, |$p=1,\ldots ,P$|⁠;|$P=500$| in [5]), as |$f_{\boldsymbol{RF-SCORE}}(\boldsymbol{x})=\frac{1}{P}\sum _{p=1}^{P}T_p(\boldsymbol{x})$|⁠. RF-SCORE was trained on the refined set of PDBbind (version 2007, with the core set excluded) and validated on the core set (⁠|$R=0.776$| and |$RMSE=1.58$|⁠).

Occurrences of atom-type pairs: The occurrences of atom-type pairs were also considered as features in an earlier study [23], while with a larger atom-type group (⁠|$\boldsymbol{A}=\{C.3,C.2,C.ar,C.cat,N.3$||$,N.ar,N.am,N.pl3,$||$O.3,O.2,O.co2,F.P.3,S.3,Cl,Br,Ca,Zn,Ni,Fe\}$|⁠) and several distances bins (⁠|$1\mathring{A}-6\mathring{A}$| at an interval of |$1\mathring{A}$|⁠) instead of a single threshold. After a feature-selection step, the kernel-partial least square (K-PLS) regression was applied to map these features to the binding affinities, with the radial basis function kernel used for distance calculation. As report in this work, a pruned descriptor set rather than the full set allowed |$R^2$| to reach 0.6. However, it has not been validated on a well-recognized benchmark set.

Quasi-fragmental descriptors: In another earlier work of Artemenko’s [4], such occurrences of atom-type pairs in multiple distance bins were named as quasi-fragmental descriptors. Specifically, these descriptors were extracted based on the atom types from Amber force field and a number of intervals (⁠|$0.5\mathring{A}$|⁠, |$1\mathring{A}$|⁠, |$1.5\mathring{A}$|⁠) for cutting distance bins (⁠|$<6\mathring{A}$|⁠), with the overlapping bins included or excluded. Besides, another group of physico-chemical descriptors (close contacts, metal-ligand interactions, flexible bonds, vdW interaction energy and electrostatic interaction energy) were included to complete the initial feature set. With a pruning of this feature set, an ensemble model of shallow neural networks was developed for a regression analysis, resulting in an average performance of |$R=0.7625$| and |$RMSE=1.79$| for a test set. Interestingly, the descriptors that frequently survived from the pruning process were primarily quasi-fragmental but not physico-chemical, demonstrating the validity of these simple geometrical descriptors in this prediction work. This method has not been validated on a well-recognized benchmark set.

CScore: In the work of Ouyang et al.[82], the atom-type pairs were further classified as repulsive or attractive by two fuzzy membership functions. These membership functions softened the boundary between repulsion and attraction for each atom-type pair, with the initial boundary defined as the sum of vdW radii of the two component atoms. Then, the occurrences of repulsive and attractive atom-type pairs were counted (⁠|$\{c_{(A_p,A_l)}^{repulsive},c_{(A_p,A_l)}^{attractive}|A_p\in \boldsymbol{A}^P, A_l\in \boldsymbol{A}^L\}$|⁠) and constituted the primary features in this work. Different atom types were considered for the ligand (⁠|$\boldsymbol{A}^L=\{H,C,N,O,F,P,S,Cl,Br,I\}$|⁠) and the protein (⁠|$\boldsymbol{A}^P=\{H,C,N,O,S\}$|⁠). Neural network models were employed to map the features to the binding affinities. CScore was trained on the PDBbind core set (version 2009) and tested on the refined set (with core set excluded), which had a performance of |$R=0.7668$| and |$RMSE=1.454$|⁠.

Knowledge-based descriptors

SVR-KB score: The prototypes of the above-mentioned quasi-fragmental descriptors (occurrences of atom-type pairs with a specific distance cutoff or within distance bins) are the knowledge-based pairwise potentials [31]. Such a pairwise potential can be defined based on a pair of atom types (⁠|$A_p$| and |$A_l$|⁠) and a distance cutoff (⁠|$d_{cut}$|⁠), as
(20)
where |$N_{obs}(A_p, A_l, d)$| and |$N_{exp}(A_p, A_l, d)$| are the observed (in training set) and expected (in a reference state) occurrences of |$(A_p, A_l)$| pair within the distance bin |$[d-\frac{\Delta d}{2},d+\frac{\Delta d}{2}]$|[66]. Upon Equation 20, a potential can be defined for each atom-type pair as |$V_{(A_p, A_l)}=\sum _{d}v(A_p, A_l, d)$|⁠. In [66], parameters as follows were adopted for calculating the pairwise potentials.
  • (i) |$d_{cut}=12\mathring{A}$|

  • (ii)

    $$\Delta d=\begin{cases}2\mathring{A},\ d\le 2\mathring{A}\\0.5\mathring{A},\ 2\mathring{A}<d\le 8\mathring{A}\\1\mathring{A},\ d>8\mathring{A}\end{cases}$$

  • (iii) |$\boldsymbol{A}^L=\{C.3,C.2,C.ar,C.cat,N.4,N.am,N.pl3,N.2,O.3,O.2,O.co2,$||$ P.3,S.3,F,Cl,Br,I\}$|

  • (iv) |$\boldsymbol{A}^P=\{C.3,C.2,C.ar,C.cat,N.4,N.am,N.pl3,N.2,O.3,O.2,O.co2,$||$P.3,S.3,metals\}$|⁠.

Such pairwise potentials were used as descriptors in different scenarios for protein–ligand binding, and then absorbed by support vector regression (SVR) models to predict the binding affinity. This SVR-KB score was trained on the PDBbind refined set (version 2010) and tested on the CSAR-NRC HiQ sets, leading to the performances of |$R=0.67$| (set 1) and |$R=0.59$| (set 2).

Mutated AutoDock Vina: A number of attempts have been made to re-score protein–ligand complexes based on contributors of existing scoring functions, using advanced machine-learning methods [11, 22]. As an example, Bitencourt-Ferreira et al. employed the energy terms from the AutoDock Vina scoring functions and fitted such terms to the binding free energies through a polynomial scoring function. AutoDock Vina absorbs the advantages of both knowledge-based potentials (pairwise atomic potentials) and empirical scoring functions (parameterization) [104]. Specifically, it follows a general functional form based on an atom typing scheme, as
(21)
where |$r_{ij}$| is the distance between a pair of atoms |$i$| and |$j$| that are assigned the types of |$T_i$| and |$T_j$| and |$f_{T_iT_j}$| denotes a set of interaction functions. The predicted binding free energy is calculated from the intermolecular part of the conformation with the minimum score |$G^p_{bind}=h(\hat{V}_{inter})$|⁠, generally without defining an explicit functional form of |$f_{T_iT_j}$| or |$h$|⁠. In AutoDock Vina, |$f_{T_iT_j}$| is defined on the surface distance |$d_{ij}$| (with the sum of vdW radii of the two atom types subtracted from |$r_{ij}$|⁠) and contributed by the steric interactions, hydrophobic interactions and hydrogen bonds. Besides |$\hat{V}_{inter}$|⁠, |$h$| also concerns the active rotatable bonds between heavy atoms in the ligand. In the work of Bitencourt-Ferreira et al., the interaction terms from AutoDock Vina were used to train a polynomial scoring function, which achieved a performance of |$R=0.886$| on the test set. Aside from the good performance on the selected dataset (training: 34 complexes; test: 14 complexes), this method lacks a validation on a well-recognized benchmark set.

Physico-chemical descriptors

SVR-EP score: In the work of Li et al.[66], a group of physico-chemical descriptors were considered in the binding affinity prediction. These descriptors included ligand molecular weight, vdW interaction energy, hydrogen bonds, ligand deformation effects (rotatable bonds), hydrophobic effects and several terms concerning the SASA (buried/unburied, polar/nonpolar and ratios). Upon a feature selection step, the remaining descriptors were adopted by SVR models for binding affinity prediction. This SVR-EP score was trained on the PDBbind refined set (version 2010) and tested on the CSAR-NRC HiQ sets, with the performances of |$R=0.55$| (set 1) and |$R=0.50$| (set 2).

B2BScore: Liu et al.[69] developed the B2BScore by exploring two physical–chemical properties (B factors and |$\beta $| contacts) and learning them through RF regression models, in order to predict protein–ligand binding affinity. B factor quantifies the mobility of an atom in a dynamic protein structure, and a larger value generally suggests a higher flexibility of the atom. A normalized form of B factors for each atom was used in (⁠|$\hat{B}_i\in [0,2]$|⁠) [69]. A |$\beta $| contact between atoms |$i$| and |$j$|⁠, defined upon a distance threshold (⁠|$d_{cut}=2.8\mathring{A}$| in [69]) and an angle threshold (⁠|$\angle \beta \in [75,90]$|⁠, depending on |$d_{ij}$| in [69]), guarantees a forbidden region for enough direct contact between |$i$| and |$j$| (no other atoms in this region). A |$\beta $|-contacts graph can be constructed for a given protein–ligand complex, with each point representing a heavy atom and each edge a |$\beta $| contact. Depending on the |$\beta $|-contacts graph, a series of descriptors were generated based on the B factors of involved atoms. These descriptors characterized the cross-interface |$\beta $| contacts (for each atom-type pair), the within-binding-site |$\beta $| contacts (for each atom-type pair), the |$\pi $|-center involved |$\beta $| contacts, the backbone-atom involved |$\beta $| contacts, the metal-ion involved |$\beta $| contacts, the interacting atoms in ligand and the interfacial backbone atoms. Later, the descriptors were adopted by RFs for the prediction of binding affinity. B2BScore was trained on the PDBbind refined set (version 2009, with core set excluded) and tested on the core set, outperforming a number of scoring functions with |$R=0.746$| and |$RMSE=1.593$|⁠. It was also tested under the leave-one-cluster-out cross-validation scheme by grouping the proteins into 26 clusters in the PDBbind refined set, and it outperformed the RF-SCORE with |$\bar{R}=0.518$|⁠.

ID-Score: Li et al.[65] developed ID-Score by involving 50 descriptors relevant to protein-ligand interactions and employing SVR models for affinity prediction. Those 50 descriptors, mostly physico-chemical, were classified into nine categories. Specifically, the descriptors characterized the vdW interactions ascribed to different atom-type pairs, the hydrogen-bonding interactions for a number of acceptor–donor pairs, the electrostatic interactions, the |$\pi $|-system interactions (⁠|$\pi $|-|$\pi $| interactions, halogen-|$\pi $| interactions and electronegative atoms-|$\pi $| interactions), the metal-ligand interactions (O/N atoms in ligand with metal cations), the desolvation effects concerning the ligand (hydrophilicity, polar surface area, volume and SASA) and protein (ratio of the polar surface area to the non-polar surface area of the protein binding pocket, and the binding-site SASA), the entropy effects concerning ligand rotatable bonds, the shape complementarity between a ligand and its target and the property complementarity between the ligand surface and the binding-site surface (ascribed to electrostatic-property pairs). SVR models based on these descriptors were trained on the PDBbind refined set (version 2011, with core set excluded) and tested on the core set, with the performance of |$R=0.7525$| and |$RMSE=1.6282$|⁠. Besides, ID-Score also performed well in target-specific predictions, with |$R$| ranging from 0.627 to 0.851 (⁠|$\bar{R}=0.764$|⁠) for a carefully collected test set covering 20 diverse targets.

SFCscore|$\boldsymbol{^{RF}}$|: In the work of Zilian et al.[124], the conjunction of a rich set of descriptors developed earlier [95] and RF regression models evolved into the efficient |$\boldsymbol{SFCscore^{RF}}$| for the prediction of protein–ligand binding affinity. These descriptors are mostly physico-chemical and were also used in the construction of SFCscore[95]. In details, they described the simple ligand characteristics (molecular weight and number of atoms), the hydrogen bonds and scores (total number, total score, scores of charged and neutral hydrogen bonds), the metal-ligand interactions, the aromatic ring interactions (ring-ring, ring-metal and ring-lysine side chain), the hydrophobic effects (complementarity between the protein and ligand, and percentage of buried carbon atoms of the ligand), the entropy effects concerning rotatable bonds and scores (total number, total score and score of |$sp^3-sp^3/sp^2$| bonds) and the properties (hydrophobic, polar and aromatic) of several types of surfaces (total, buried, exposed and contact surfaces of the ligand and protein). By training RF regression models on the PDBbind refined set (version 2007, with the core set and CSAR-NRC HiQ sets excluded) using such descriptors, the derived |$\boldsymbol{SFCscore^{RF}}$| achieved the performances of |$R=0.779/RMSE=1.56$| and |$R=0.730/RMSE=1.53$| on the core set and CSAR-NRC HiQ set (combining sets 1 and 2), respectively. Applying |$\boldsymbol{SFCscore^{RF}}$| to several target-specific prediction tasks also achieved a good performance.

Proteochemometric characterization

Proteochemometric approaches, first proposed by Wikberg et al.[61], indicate using structural features of both the protein (the binding site and neighborhood) and the ligand in the construction of predictive models.

PESD signatures: On the basis of the proteochemometric concept, Das et al.[20] developed the property-encoded shape distribution (PESD) signatures for protein–ligand interaction surfaces, which were used as features by SVR models for binding affinity prediction. The protein interaction surface (PIS) and ligand interaction surface (LIS) for each complex were defined based on distance cutoffs for protein–ligand atom pairs (⁠|$4.5\mathring{A}$| for PISs and |$3\mathring{A}$| for LISs). These PISs and LISs were further encoded with physico-chemical properties by EP (electrostatic potential) and ALP (hydrogen bonds, polarization and hydrophobicity) surface maps [60]. In total, 100 000 point pairs with various distances on a property-encoded interaction surface (PIS or LIS), which was triangulated for simplicity, were sampled. Specifically, the distance was binned (⁠|$\boldsymbol{D}=\{[0,1\mathring{A}],[1\mathring{A},2\mathring{A}],\ldots ,[23\mathring{A},24\mathring{A}]\}$|⁠) and the property magnitudes were coarse-grained into several bins (9 for EP properties: |$\boldsymbol{P}^{EP}=\{P^{EP}_1,\ldots ,P^{EP}_9\}$| and 14 for ALP properties: |$\boldsymbol{P}^{ALP}=\{P^{ALP}_1,\ldots ,P^{ALP}_{14}\}$|⁠). Each point pair was assigned to a two-dimensional bin, with the bin populations (⁠|$\{c_{d,(p_1,p_2)}^{EP}|d\in \boldsymbol{D},p_1\in \boldsymbol{P}^{EP},p_2\in \boldsymbol{P}^{EP}\}$| or |$\{c_{d,(p_1,p_2)}^{ALP}|d\in \boldsymbol{D},p_1\in \boldsymbol{P}^{ALP},p_2\in \boldsymbol{P}^{ALP}\}$|⁠) counted. Such bin populations regarding the EP and ALP properties for both the PIS and LIS, |$\{c_{d,(p_1,p_2)}^{EP,PIS}, c_{d,(p_1,p_2)}^{EP,LIS}, c_{d,(p_1,p_2)}^{ALP,PIS}, c_{d,(p_1,p_2)}^{ALP,LIS}\}$|⁠, constituted the PESD signatures for a protein–ligand complex and were fed into an SVR model for affinity prediction. This method was trained on the PDBbind refined set (version 2005, with the core set excluded) and tested on the core set, yielding a better prediction performance of |$R=0.638/MAE=1.45$| compared with classical empirical scoring functions.

QSAR descriptors: Machine-learning approaches have been extensively applied to decoding quantitative structure-activity relationships (QSAR) [34]. The QSAR bioactivity predictions primarily focus on ligand-only properties/signatures (e.g. extended-connectivity fingerprints (ECFPs) [89]), protein-only features (mostly of amino acid sequences, e.g. amino acid composition descriptors and autocorrelation descriptors) and any combinations thereof. This differs from typical machine learning-based scoring functions in exploiting the protein structures or protein–ligand relative structures. (i) In the work of Li et al.[67], QSAR descriptors for the protein sequence, the ligand and the binding pocket (e.g. charged partial surface area) were adopted in the prediction of protein–ligand binding affinity. This method can also be considered as proteochemometric, although the proteins were characterized using the sequence properties instead of structural features. The initial 3286 descriptors were pruned by checking the redundancy and weighting the relevance to the target, upon which the least squares support vector machines were employed for a regression analysis. Two separate models were built for predicting |$K_d$| and |$K_i$| affinities in the PDBbind refined set (version 2007). For |$K_d$| or |$K_i$| samples in this set, the training and testing samples were partitioned as |$4:1$|⁠. The testing performances of |$R=0.833/RMSE=1.118$| and |$R=0.742/RMSE=1.471$| were derived in the two scenarios. However, the cutoffs for redundancy checking and relevance weighting in this work seem quite arbitrary, and the resulting descriptors may in turn bias the prediction performance. (ii) In another work of Kundu et al.[60] that adopted a similar proteochemometric concept, features describing the whole protein and the ligand were extracted for a set of protein–ligand complexes (a subset of PDBbind of version 2015), upon which machine-learning methods such as RFs were trained. In detail, descriptors of the proteins included the amino acid compositions of the sequence, SASA, hydrogen bonds for describing different secondary structures, number of chains, number of |$S-S$| bridges and number of residues. For ligands, the atom counts for different atom types, bound types and ring types and a number of physico-chemical properties (e.g. molecular weight and hydrogen-bond accepor/donor) were considered. The 10-fold cross-validation performance on the PDBbind subset reached |$R^2=0.765/RMSE=1.31$|⁠, and the performance of a blind-set validation (⁠|$4:1$| for training and testing samples in the PDBbind subset) achieved |$R^2=0.75/RMSE=1.38$|⁠.

PLEC fingerprint: W|$\acute{o}$|jcikowski et al.[118] proposed the protein–ligand extended connectivity (PLEC) fingerprint to encode protein-ligand interactions, and employed such fingerprints in the construction of machine-learning models tailored for binding-affinity prediction. PLEC fingerprints belong to the QSAR community, and the related predictive models are proteochemometric as they adopt the fingerprints of fragments from both the protein and ligand. In the generation of PLEC fingerprints, interacting atom pairs were first identified (⁠|$d_{ij}<4.5\mathring{A}$|⁠) with |$n$|-depth neighborhoods of each atom defined, and then pairings between the ligand and protein atomic neighborhoods based on depth thresholds (⁠|$n^P$| for protein and |$n^L$| for ligand) were conducted with each pairing hashed to a bit position in the PLEC fingerprint. The following depth pairs of the ligand and protein atomic neighborhoods were used in the pairings.

  • |$\{(n^p,n^l)|n^p=n^l\le \min (n^P,n^L)\}$|

  • $$\{(n^p,n^l)|\begin{cases}n^l=n^L,n^L<n^p\le n^P;\ \ \ n^L<n^P\\ n^p=n^P,n^P<n^l\le n^L;\ \ \ n^P<n^L\end{cases}\}$$

These atomic neighborhoods were encoded by the ECFPs, which concern the features of atomic number, isotope, number of neighboring heavy atoms, number of hydrogen atoms, formal charge, ring membership and aromaticity [89]. Later, a raw PLEC fingerprint was folded to a smaller length. Comparisons among the PLEC fingerprints can be relied on common similarity measures such as the Tanimoto coefficients. Using these PLEC fingerprints, machine-learning models were trained on the PDBbind general set (version 2016, with the core set and Astex diverse set [38] excluded), with the neighborhood depths (⁠|$n^P$| and |$n^L$|⁠) and fingerprint folding size tuned. The best models yielded promising prediction results on the core set (⁠|$R=0.817$|⁠) and the Astex diverse set (⁠|$R\in [0.65,0.7]$|⁠), outperforming a number of other approaches. A similar structural protein–ligand interaction fingerprint (SPLIF), which was developed earlier with |$n^P=n^L=1$|[18], also resulted in a good affinity prediction in PDBbind[119].

Featurizers tailored for deep-learning methods

Traditional machine learning-based scoring functions rely heavily on representing molecules with hand-curated, fixed-length vector features. This limitation can be mitigated by the flexibility of deep neural networks, which make data-driven decisions to learn a hierarchy of feature representations from the simplest representations. Recently, such deep-learning models have begun to thrive in the fields of computational biology, chemistry and physics, including the studies of protein–ligand binding [33]. Below, we list several promising deep-learning models that have been applied to the prediction of protein–ligand binding affinity.

ACNN: Atomic convolutional neural networks (ACNNs) were proposed in the work of Gomes et al.[33] to predict protein–ligand binding affinity. An ACNN is composed of atom-type convolution layers, radial pooling layers and atomistic fully connected layers (Figure 6A). Based on the atomic coordinates and atom-type list of a molecule, a distance matrix |$R$| (⁠|$N\times M$|⁠) and atom-type matrix |$Z$| (⁠|$N\times M$|⁠) can be constructed. Here, |$N$| is the number of atoms in the molecule, |$M$| (⁠|$M=12$| in [33]) is the number of neighbors for each atom, |$R_{ij}$| is the distance between atom |$i$| and its |$j$|-th neighbor, and |$Z_{ij}$| is the atom type of the |$j$|-th neighbor of atom |$i$|⁠. An atom-type convolution operation with a filter |$f=(f_1,\ldots ,f_{N_{at}})$| of size |$1\times 1\times N_{at}$| and a stride of 1 can be defined as
(22)
The architecture of ACNNs and the application to protein–ligand ($P-L$) affinity prediction. (A) ACNN architecture. (B) Combining ACNNs with a thermodynamic cycle for affinity prediction.
Figure 6

The architecture of ACNNs and the application to protein–ligand (⁠|$P-L$|⁠) affinity prediction. (A) ACNN architecture. (B) Combining ACNNs with a thermodynamic cycle for affinity prediction.

where the depth |$N_{at}$| of the filter is the number of unique atom types in the molecule, and |$f_i$| (⁠|$i=1,\ldots ,N_{at}$|⁠) is an atom type. This layer expands |$R$| (⁠|$N\times M$|⁠) into a matrix |$C$| (⁠|$N\times M\times N_{at}$|⁠). |$C$| is then down-sampled through feature binning in the radial-pooling layer, with filters |$g_l(x)$| (⁠|$l=1,\ldots ,N_r$|⁠) of size |$1\times M\times 1$| and a stride of 1 (Equation 23).
(23)
Each pooling filter is imposed on slices of |$C$| and attempts to summarize a type of radial interaction by summing up the pairwise interactions between an atom (type |$a_i$|⁠) and all its neighbors of type |$a_j$|⁠. In this regard, ACNN follows the idea of pairwise atomic potentials (knowledge-based scoring). Subsequently, the pooling results |$P$| (⁠|$N\times N_{at}\times N_r$|⁠) are flattened (⁠|$N\times N_{at}\cdot N_r$|⁠) and fed row-wise into an atomistic fully connected layer (same weights and biases for the rows/atoms), which outputs the atomic energies |$E_i$| (⁠|$i=1,\ldots ,N$|⁠). Such atomistic fully connected layers only concern the number of features (⁠|$N_{at}\times N_r$|⁠), and thus can generalize to larger molecular systems. The total energy of the molecule is derived by an accumulation |$E=\sum _iE_i$|⁠. For applying ACNNs to the prediction of protein–ligand binding affinity, a larger network is constructed as a conjunction of three weight-sharing ACNNs for the complex, protein and ligand, respectively (Figure 6b). This implicitly appends the thermodynamic cycle to the network structure (⁠|$y^{prediction}=\Delta E=E^{complex}-E^{protein}-E^{ligand}$|⁠) and aims to minimize the training loss of |$L=(y^{prediction}-y^{true})^2$|⁠. ACNNs have been trained and tested on the PDBbind benchmarks (version 2015, train/test splits followed 80/20 ratio and several partition schemes), with the best performances of |$R^2=0.529/MAE=0.668$| (temporal split) for the refined set and |$R^2=0.448/MAE=0.774$| (random split) for the core set. This outperformed a number of other structure-based or ligand-based prediction models.

|$\boldsymbol{K_{DEEP}}$|⁠: In [50], |$\boldsymbol{K_{DEEP}}$|⁠, which integrated 3D voxel representations of molecules and 3D convolutional neural networks (CNNs), was developed to predict protein–ligand binding affinity. The voxel representation indicates a 3D-grid featurizer, where each voxel or grid point is described by the properties (e.g. atom type or physico-chemical types) of the atoms nearby. A 3D-grid of a fixed size (⁠|$24\mathring{A}\times 24\mathring{A}\times 24\mathring{A}$| with |$1\mathring{A}$| intervals at each side in [50]) is generally placed at the geometric center of the ligand, in order to capture a neighborhood of the binding site. The atomic properties including hydrophobicity, hydrogen-bond donor and acceptor, aromaticity, positive ionizable or negative ionizable, metallicity and excluded volume for both the protein and ligand in a complex were considered in [50], leading to 16 channels in the representation. Such a voxel representation can be summarized as a 4D-tensor |$(x,y,z,(f_1,\ldots ,f_{16}))$|⁠, where the first three dimensions indicate the Cartesian coordinates of the center of a voxel and the last dimension is a vector of properties describing this voxel (Figure 7). In each property channel, the contribution of an atom to a voxel is measured according to the vdW radius |$r_{vdW}$| of this atom and the atom-voxel distance |$r$|⁠, as |$n(r) = 1-exp(-(\frac{r_{vdW}}{r})^{12})$|⁠. To train a model insensitive to protein–ligand orientation, each complex structure was augmented to 24 structures with different orientations by rotated 3D-grids (all possible combinations of 90|$^{\circ }$| rotations). The 3D CNNs, comprising multiple convolutional/pooling layers and a fully connected layer, were applied to the learning of such representations. The network architecture was initially inspired by AlexNet [44]. |$\boldsymbol{K_{DEEP}}$| was trained on the PDBbind refined set (version 2016, with the core set excluded) and tested on the core set, having a performance of |$R=0.82/RMSE=1.27$|⁠. It was also tested on a number of other benchmarks such as the CSAR sets, resulting in superior or competitive performances compared with other predictions.

Voxel representation focused on the geometrical center of the ligand in a protein-ligand complex, with $M$ property channels.
Figure 7

Voxel representation focused on the geometrical center of the ligand in a protein-ligand complex, with |$M$| property channels.

Pafnucy: Stepniewska-Dziubinska et al.[98] developed Pafnucy, a similar model to |$\boldsymbol{K_{DEEP}}$| that uses CNNs to process voxel representations of protein–ligand complexes for affinity prediction. In their study, each complex was cropped to a cubic box with the size of |$20\mathring{A}\times 20\mathring{A}\times 20\mathring{A}$| and the focus at the geometric center of the ligand. A 3D-grid was generated for the cubic box with |$1\mathring{A}$| spacing, and all heavy atoms were placed in the grids by discretizing their positions, differing from |$\boldsymbol{K_{DEEP}}$| that measures the contributions of each atom to each grid point. Pafnucy covers more atomic properties, including nine atom types, atom hybridization, bonds to heavy atoms, bonds to other heteroatoms, hydrophobicity, aromaticity, hydrogen-bond donor and acceptor, ring membership, partial charge and the sign for protein/ligand. The properties for each voxel were derived by summing up the values of all colliding atoms in that voxel. The voxel representations were subsequently learned by CNNs, with an architecture composed of three 3D convolutional layers, each followed by a max pooling layer, and three fully connected layers, followed by a final single output neuron for predicting the binding affinity. Pafnucy was trained on the PDBbind general set (version 2016, with the validation/test sets excluded) with a validation on a random set selected from the refined set (1000 complexes) for parameter tuning. Then, the trained model was tested on the PDBbind core sets (version 2016 and 2013) and the Astex diverse set [38], with the performances of |$R=0.78/RMSE=1.42$|⁠, |$R=0.70/RMSE=1.62$| and |$R=0.57/RMSE=1.43$|⁠, respectively.

DeepDTA: In [83], the DeepDTA model, which describes a protein–ligand complex by a 1D representation and employs CNNs to learn these representations, was proposed for the prediction of protein–ligand binding affinity. The 1D representation was derived by a simple label encoding (64 labels for ligand and 25 labels for protein) of the ligand SMILES sequence and the protein sequence into fixed lengths (by truncation or 0-padding). Then, a network architecture composed of two convolution blocks (one for the ligand and the other for the protein) and one block of fully connected layers was adopted for processing these representations. The convolution blocks each concatenate three convolutional layers and one max-pooling layer, and the final block contains three fully connected layers (with dropouts used to avoid overfitting). DeepDTA was trained and tested on two data sets (Davis[21] and KIBA[101]), with |$\frac{5}{6}$| of each set used for training (5-fold cross-validation scheme) and |$\frac{1}{6}$| for testing. The concordance index (⁠|$CI$|⁠), which measures whether the predicted affinities were in the same order as the true affinities were, and mean square error (⁠|$MSE=RMSE^2$|⁠) were used for performance evaluation. The performances were |$CI=0.878/MSE=0.261$| (Davis set) and |$CI=0.863/MSE=0.194$| (KIBA set), respectively. However, this work has not been validated on a well-recognized benckmark or compared with other state-of-the-art scoring functions. Furthermore, the constructed models seemed dataset-specific (e.g. different dimensions for inputs), which may be questioned for the generalization capabilities to other datasets.

DeepBindRG: Zhang et al.[121] developed DeepBindRG, which uses a deep-learning model to learn protein–ligand contacts, for predicting the binding affinity. In the characterization of protein–ligand complexes, the atom types for the ligand and protein were first defined referring to the generalized Amber force field (84 types for ligand) and the Amber99 force field (41 types for protein). Protein–ligand interacting atom pairs were identified based on a distance threshold (⁠|$4\mathring{A}$|⁠), and the atoms in each pair were one-hot encoded according to the ligand atom types or the protein atom types. The interacting atoms in each protein were further clustered into five groups, for gathering the atom pairs that had atoms in the same group in the contact matrix |$C$|⁠. |$C$| is the final feature representation of a protein–ligand complex, with 1000 rows (by truncation or 0-padding) and each row indicating the one-hot representation of an interacting atom pair. Clustering the protein interacting atoms seemed to retain the local spatial information, which can be useful in convolutions. Such extracted 2D features were learned by a ResNet, consisting of seven convolutional blocks (three convolutional layers for each), a max pooling layer and a fully connected layer, to predict protein–ligand binding affinity. The PDBbind general set (version 2018, with several independent test sets excluded) was partitioned for training, validation and testing of DeepBindRG, with a performance of |$R=0.5993/RMSE=1.497$| on the test set. Furthermore, the trained model was tested on several independent sets and had prediction performances of |$R=0.6394/RMSE=1.817$| for the core set (version 2013), |$R=0.6585/RMSE=1.7239$| for the CSAR-NRC HiQ set and |$R=0.4657/RMSE=1.6209$| for the Astex diverse set [38]. However, this model underperformed Pafnucy (described above) in the predictions for several test sets and lacked a thorough comparison with other state-of-the-art scoring functions.

Table 1

Experimental and predicted binding free energies of small molecules binding to bacteriophage T4 lysozyme. Prediction methods include TI, LIE, umbrella sampling (US) and molecular mechanics Poisson-Boltzmann (generalized Born)/surface area (MM-PB(GB)/SA)

Predicted |$\Delta\Delta G_{bind}^{[b]}$| (computational costs|$^{[c]}$|⁠)
Ligand pair (formula)Experimental |$\Delta\Delta G_{bind}^{[a]}$|TILIEUSMM-PB/SAMM-GB/SA
benzene|$\rightarrow$|toluene (C6H6|$\rightarrow$|C7H8)−0.320.02|$\pm$|0.14 (73.67)−0.65|$\pm$|0.68 (3.57)6.05 (129.7)−1.33|$\pm$|2.46 (3.25)−3.09|$\pm$|1.97 (2.8)
benzene|$\rightarrow$|phenol (C6H6|$\rightarrow$|C6H6O)|$>0$|1.35|$\pm$|0.44 (10.05)−1.55|$\pm$|1.13 (3.58)7.15 (128.68)1.24|$\pm$|2.93 (3.26)−0.62|$\pm$|2.20 (2.81)
Predicted |$\Delta\Delta G_{bind}^{[b]}$| (computational costs|$^{[c]}$|⁠)
Ligand pair (formula)Experimental |$\Delta\Delta G_{bind}^{[a]}$|TILIEUSMM-PB/SAMM-GB/SA
benzene|$\rightarrow$|toluene (C6H6|$\rightarrow$|C7H8)−0.320.02|$\pm$|0.14 (73.67)−0.65|$\pm$|0.68 (3.57)6.05 (129.7)−1.33|$\pm$|2.46 (3.25)−3.09|$\pm$|1.97 (2.8)
benzene|$\rightarrow$|phenol (C6H6|$\rightarrow$|C6H6O)|$>0$|1.35|$\pm$|0.44 (10.05)−1.55|$\pm$|1.13 (3.58)7.15 (128.68)1.24|$\pm$|2.93 (3.26)−0.62|$\pm$|2.20 (2.81)

[a]Calculated based on experimental |$K_d$| values. |$\Delta \Delta G_{bind}^{L\rightarrow L^{\prime}}=RT\ ln(\frac{K_d^{L\prime}}{K_d^L})$| with |$T=300$| K.

[b] In kcal/mol. |$K_d$||$\sigma_{\Delta\Delta G_{bind}^{L\rightarrow L\prime}}=\sqrt{\sigma_{\Delta G_{L\rightarrow L\prime}^{protein}}^2\pm\sigma_{\Delta G_{L\rightarrow L\prime}^{water}}^2}$| (TI) or |$\sigma_{\Delta\Delta G_{bind}^{L\rightarrow L\prime}}=\sqrt{\sigma_{\Delta G_{bind}^{L}}^2\pm\sigma_{\Delta G_{bind}^{L\prime}}^2}$| (LIE and MM-PB(GB)/SA).

[c]In hours. Calculated as the sum of simulation time and energy-estimation time.

[d]Entropy contributions not included

Table 1

Experimental and predicted binding free energies of small molecules binding to bacteriophage T4 lysozyme. Prediction methods include TI, LIE, umbrella sampling (US) and molecular mechanics Poisson-Boltzmann (generalized Born)/surface area (MM-PB(GB)/SA)

Predicted |$\Delta\Delta G_{bind}^{[b]}$| (computational costs|$^{[c]}$|⁠)
Ligand pair (formula)Experimental |$\Delta\Delta G_{bind}^{[a]}$|TILIEUSMM-PB/SAMM-GB/SA
benzene|$\rightarrow$|toluene (C6H6|$\rightarrow$|C7H8)−0.320.02|$\pm$|0.14 (73.67)−0.65|$\pm$|0.68 (3.57)6.05 (129.7)−1.33|$\pm$|2.46 (3.25)−3.09|$\pm$|1.97 (2.8)
benzene|$\rightarrow$|phenol (C6H6|$\rightarrow$|C6H6O)|$>0$|1.35|$\pm$|0.44 (10.05)−1.55|$\pm$|1.13 (3.58)7.15 (128.68)1.24|$\pm$|2.93 (3.26)−0.62|$\pm$|2.20 (2.81)
Predicted |$\Delta\Delta G_{bind}^{[b]}$| (computational costs|$^{[c]}$|⁠)
Ligand pair (formula)Experimental |$\Delta\Delta G_{bind}^{[a]}$|TILIEUSMM-PB/SAMM-GB/SA
benzene|$\rightarrow$|toluene (C6H6|$\rightarrow$|C7H8)−0.320.02|$\pm$|0.14 (73.67)−0.65|$\pm$|0.68 (3.57)6.05 (129.7)−1.33|$\pm$|2.46 (3.25)−3.09|$\pm$|1.97 (2.8)
benzene|$\rightarrow$|phenol (C6H6|$\rightarrow$|C6H6O)|$>0$|1.35|$\pm$|0.44 (10.05)−1.55|$\pm$|1.13 (3.58)7.15 (128.68)1.24|$\pm$|2.93 (3.26)−0.62|$\pm$|2.20 (2.81)

[a]Calculated based on experimental |$K_d$| values. |$\Delta \Delta G_{bind}^{L\rightarrow L^{\prime}}=RT\ ln(\frac{K_d^{L\prime}}{K_d^L})$| with |$T=300$| K.

[b] In kcal/mol. |$K_d$||$\sigma_{\Delta\Delta G_{bind}^{L\rightarrow L\prime}}=\sqrt{\sigma_{\Delta G_{L\rightarrow L\prime}^{protein}}^2\pm\sigma_{\Delta G_{L\rightarrow L\prime}^{water}}^2}$| (TI) or |$\sigma_{\Delta\Delta G_{bind}^{L\rightarrow L\prime}}=\sqrt{\sigma_{\Delta G_{bind}^{L}}^2\pm\sigma_{\Delta G_{bind}^{L\prime}}^2}$| (LIE and MM-PB(GB)/SA).

[c]In hours. Calculated as the sum of simulation time and energy-estimation time.

[d]Entropy contributions not included

Case study of interpretable scoring functions

Since many machine-learning methods involve ‘black boxes’, obtaining interpretable scoring functions is an important goal in machine learning-based affinity prediction. In this regard, we conducted a case study of several interpretable scoring functions, based on simple geometrical descriptors that have led to good prediction performance and are closed tied to a number of scoring functions (e.g. RF-Score, Quasi-fragmental descriptors, CScore, SVR-KB and ACNN). Specifically, the latest PDBbind refined set (version 2019, with the core set excluded) of 4586 protein–ligand complexes were used for training the scoring functions, and the core set (CASF 2016) of 285 complexes for validation. These datasets mostly contain drug-like ligands and a variety of protein targets, which serve as useful data sources for scoring-function development and computer-aided drug discovery. The geometrical descriptors in RF-Score (Section 4.2.1; [5]) were used to characterize each protein–ligand complex. These descriptors |$\{c_{(A_p,A_l)}\}$| (36 types) indicate the occurrences of atom-type pairs (⁠|$A_p$|⁠, |$A_l$|⁠) in the interaction sites. For example, |$c_{(C,C)}$| represents the count of interacting atom pairs each comprising a Carbon atom from the protein and a Carbon atom from the ligand. Then, we used linear regression models, regression trees and RFs to map these descriptors to the experimental binding affinity (⁠|$\Delta G_{bind}=RT\ ln(K_d)$| with |$K_d$| supplied by PDBbind and T=300 K). These models were selected because they are simpler but more interpretable than other machine-learning models like neural networks, and we name the corresponding scoring functions as LM-Score, TREE-Score and RF-Score. The tree in TREE-Score was pruned using an optimal complexity parameter that was evaluated by cross validation [102]. The setting in [5] were adopted for constructing RF-Score (500 trees, optimal number of descriptors considered at each split: 8). All the models were run in a CPU setting (Intel® CoreTM i9-9900K @ 3.60GHz).

The distributions of the selected geometrical descriptors, in the training and test sets, are displayed in Figure 8A. We can see that occurrences of atom-type pairs such as |$(C,C)$|⁠, |$(N,C)$| and |$(O,C)$| are dominant, which makes sense since these are the commonest heavy atoms in proteins and ligands. The scoring functions were trained using these descriptors and the experimental affinity data of the training set and tested on the validation set (results in Table 2). From the computational costs (time for descriptor extraction, training and validation) listed in Table 2, we can see that these scoring functions are quite efficient. More results are elaborated as follows.

  • (i) LM-Score: Using the simplest linear regression model, we derived an RMSE of 2.34 kcal/mol, the correlations of 0.657 (Pearson’s) and 0.681 (Spearman’s). The detailed scoring function can be expression as |$y_{pred}=-0.0005\times c_{C,C}-0.0012\times c_{N,C}-0.0002\times c_{O,C}-0.0016\times c_{S,C}-0.0005\times c_{C,N}-0.0028\times c_{N,N}+0.0028\times c_{O,N}-0.0227\times c_{S,N}-0.0002\times c_{C,O}+ 0.0056\times c_{N,O}-0.0045\times c_{O,O}+0.0222\times c_{S,O}+0.0024\times c_{C,F}-0.0218\times c_{N,F}+0.0093\times c_{O,F}-0.0156\times c_{S,F}+0.0072\times c_{C,P}-0.0398\times c_{N,P}+0.0125\times c_{O,P}-0.0076\times c_{S,P}-0.0024\times c_{C,S}-0.0217\times c_{N,S}+0.0084\times c_{O,S}+0.1573\times c_{S,S}+0.0041\times c_{C,Cl}-0.0049\times c_{N,Cl}-0.0158\times c_{O,Cl}-0.0600\times c_{S,Cl}+0.0093\times c_{C,Br}+0.0298\times c_{N,Br}-0.0640\times c_{O,Br}-0.1271\times c_{S,Br}+0.0028\times c_{C,I}+0.0364\times c_{N,I}-0.0376\times c_{O,I}-0.1313\times c_{S,I}-5.5858$|⁠. The predicted binding affinities are presented in Figure 9A.

  • (ii) TREE-Score: Compared with LM-Score, employing regression trees improved the prediction performance (RMSE=2.18 kcal/mol, Person’s and Spearman’s correlations of 0.701 and 0.705). The predicted affinities are displayed in Figure 9B, where the predicted values are stratified mainly due to the structure of the regression tree. The final pruned regression tree, as shown in Figure 9D, classifies a protein–ligand complex according to the rules in the non-leaf nodes and scores it using the value of the leaf node that it belongs to. As an instance, if a complex has descriptors of |$c_{C,C}\lt 2182$|⁠, |$c_{N,S}\ge 68$| and |$c_{N,N}\lt 68$|⁠, then it belongs to leaf node 13 and has the predicted binding affinity of -6.8 kcal/mol.

  • (iii) RF-Score: An RF comprises a group of trees with similar structures as Figure 9D. Specifically, 500 trees with each split considering eight randomly selected descriptors were constructed in this work, and each complex was scored as the averaged outputs of these trees. The importance rankings of all the descriptors in the constructed RF are shown in Figure 8B, demonstrating that descriptors like |$c_{C,O}$|⁠, |$c_{O,O}$| and |$c_{N,O}$| primarily influenced the predictions. Using such complicated models resulted in a further improvement in the prediction performance (RMSE=1.98 kcal/mol, Person’s and Spearman’s correlations of 0.789 and 0.785).

Geometrical descriptors representing the occurrences of atom-type pairs in protein–ligand interaction sites. (A) Descriptor distributions in the training and test sets. (B) Importance of descriptors in RF-Score, according to the increase in mean square error of predictions (%IncMSE) caused by permuting each descriptor.
Figure 8

Geometrical descriptors representing the occurrences of atom-type pairs in protein–ligand interaction sites. (A) Descriptor distributions in the training and test sets. (B) Importance of descriptors in RF-Score, according to the increase in mean square error of predictions (%IncMSE) caused by permuting each descriptor.

Table 2

Performance comparisons among LM-Score, TREE-Score and RF-Score. Scoring functions were trained on PDBbind refined set (2019) and tested on the core set. The predicted affinities for the core set were evaluated according to the RMSE and correlations to the experimental data, as shown below

Scoring functionRMSE (kcal/mol)Pearson's correlationSpearman's correlationComputational costs|$^{[a]}$| (minutes)
LM-Score2.340.6570.6811.85
TREE-Score 2.18 0.7010.7051.85
RF-Score1.980.7890.78519.70
Scoring functionRMSE (kcal/mol)Pearson's correlationSpearman's correlationComputational costs|$^{[a]}$| (minutes)
LM-Score2.340.6570.6811.85
TREE-Score 2.18 0.7010.7051.85
RF-Score1.980.7890.78519.70

[a]Computational costs include the time for descriptor extraction, training and validation.

Table 2

Performance comparisons among LM-Score, TREE-Score and RF-Score. Scoring functions were trained on PDBbind refined set (2019) and tested on the core set. The predicted affinities for the core set were evaluated according to the RMSE and correlations to the experimental data, as shown below

Scoring functionRMSE (kcal/mol)Pearson's correlationSpearman's correlationComputational costs|$^{[a]}$| (minutes)
LM-Score2.340.6570.6811.85
TREE-Score 2.18 0.7010.7051.85
RF-Score1.980.7890.78519.70
Scoring functionRMSE (kcal/mol)Pearson's correlationSpearman's correlationComputational costs|$^{[a]}$| (minutes)
LM-Score2.340.6570.6811.85
TREE-Score 2.18 0.7010.7051.85
RF-Score1.980.7890.78519.70

[a]Computational costs include the time for descriptor extraction, training and validation.

Prediction results. Experimental versus predicted binding affinities by (A) LM-Score, (B) TREE-Score and (C) RF-Score. (D) The tree structure of TREE-Score.
Figure 9

Prediction results. Experimental versus predicted binding affinities by (A) LM-Score, (B) TREE-Score and (C) RF-Score. (D) The tree structure of TREE-Score.

Fusion and choice

Free energy-based simulations and machine learning-based scoring functions are two different groups of methods. Simulation methods mainly depend on thermodynamic-cycle designing, free-energy sampling and MD simulations (MM, QM or hybridization). MD simulations are often used to capture dynamical processes of biomolecules across different timescales, generating large amount of data (compressed as trajectory files). These data provide a valuable source of information for predicting free-energy differences in simulation methods, while are rarely used in machine learning-based scoring functions. The scoring functions mostly consider single, static complex structures and are constructed primarily based on characterizing the ligand-binding pockets. Modern GPU calculations have made the MD simulations much more efficient, offering an opportunity for fusion with machine-learning methods in constructing predictive models. For binding-affinity prediction in protein–ligand complexes, using the knowledge extracted from the rich simulation data to build scoring functions will be a promising direction in near future [85, 107].

Besides potential fusions between the two groups of methods, choice between them is another important topic. Compared with other free energy-based simulations, the MM-GB/SA method only depends on end-state simulations (⁠|$LP$| in water) and is therefore considered as an efficient simulation method. Accordingly, we attempted to compare MM-GB/SA with several scoring functions (in Section 4.3) in a case study. A small set of 100 protein–ligand complexes were used for validation (listed in Supplementary Table 1). It stemmed from the PDBbind core set (CASF 2016), and was further purified by removing complexes with crystal structures that have gaps (missing or modified residues) or metal interactions. Such selections aimed for faster and more accurate simulations. Specific settings for the methods are listed as follows.

  • (i) MM-GB/SA: For the simulations of each complex, a short energy minimization (500 steps), a heating step (50 ps), an equilibration step (500 ps) and a production simulation (3 ns) were conducted (same as in Section 3.5). The first 1-ns simulation data were abandoned and the following 2-ns trajectories (100 frames) were used for the calculation of binding free energy. Mostly, MM-GB/SA is employed for calculating the effective part of binding free energy (⁠|$\Delta \tilde{G}_{bind}$|⁠), namely ignoring the entropy contribution that is difficult to estimate. Although methods such as NMA can be used for entropy estimation, combining the effective part and such an estimated entropy may not lead to a better prediction of the absolute binding free energy. To provide a comparison with |$\Delta \tilde{G}_{bind}$|⁠, we estimated the binding free energy with entropy contribution for each complex. Only 20 trajectory frames were used for this estimation, due to the high computational costs of NMA calculations. All the simulations were GPU-accelerated, but the MM-PB/SA and NMA calculations were implemented using 20 CPU parallel processes (GPU-calculations not available).

  • (ii) Scoring functions: The machine learning-based scoring functions were also validated on the above-mentioned 100 complexes, while trained on the remaining complexes in the core set. Here, the information of validation set was not used when constructing the scoring functions, which differed from the simulation methods. Besides, the training set contained some complexes with gapped structures, which may not make a large difference as the scoring functions mostly concern the protein–ligand interaction sites (structural gaps rarely occur). TREE-Score was constructed as a pruned tree (with the optimal complexity parameter), and RF-Score was built using 20 trees (due to the small set) and considering 24 descriptors at each split. All the models were run in a CPU setting (same as in Section 4.3).

Since MM-GB/SA only calculates |$\Delta \tilde{G}_{bind}$|⁠, we evaluated the methods using correlations between the predicted and experimental affinities. The results are shown in Table 3. The |$\Delta \tilde{G}_{bind}$| calculated by MM-GB/SA achieved a correlation of |$>0.6$| with the experimental affinities, which outperformed or was comparable to the scoring functions. However, adding the entropy contributions (NMA-estimated) did not improve the correlations between the predicted and experimental affinities (⁠|$>0.5$|⁠). Using more data for entropy estimation or employing other estimation strategies may improve the results, but it remains a challenge. On the other hand, it indicates that MM-GB/SA is more applicable to calculating the relative binding affinities of similar systems. The simulations and MM-GB/SA predictions costed |$\sim $|297 hours (without entropy) and |$\sim $|616 hours (with entropy) for the selected complexes, which were much more expensive than the scoring functions (costs of several seconds). However, it is worth noting that, the scoring functions performed worse than those in Section 4.3 (with more training data), which demonstrated their strong dependence on training data. Conversely, it also shows that such scoring functions are improvable upon increased quantity/quality of training data. Using more data will not improve the performance of simulation methods primarily due to their latent physical models. The best performer in scoring functions (RF-Score) corresponded to correlations of 0.65 (Person’s) and 0.612 (Spearman’s) between the predicted and experimental data, which was comparable with MM-GB/SA with no entropy estimation (Person’s correlation: 0.627, Spearman’s correlation: 0.616). Methods like FEP/TI and umbrella sampling are promising to offer more accurate predictions, but with much higher computational costs. Overall, free energy-based simulations are commonly applicable to small-scale, accurate calculations of relative binding affinities, whereas machine learning-based scoring functions are improvable with more data and can be used efficiently for large-scale predictions of absolute binding affinities.

Discussion and conclusion

Free energy-based simulations and machine learning-based scoring functions are capable of providing accurate predictions of protein–ligand binding affinity. These two classes of computational methods have their specific strengths. Free energy-based simulations mostly consider the physical processes in molecular recognition, have a sound basis in statistical mechanisms and do not depend on a heavy parametrization. Based on the rapid growth of data available from the chemical and crystallographic communities, machine learning-based scoring functions are data-driven, can implicitly capture the binding effects that are hard to be modeled explicitly and can fast screen a vast number of compounds. Aside from these strengths, they also have some weaknesses as follows.

To achieve chemical accuracy in molecular simulations, the bottlenecks are the underlying physical models and the sampling convergence [49]. The sampling problem is the most important in this scenario, and the simulations hold great promise if the sampling convergence is solved. Technically, sampling difficulty in the simulations can be tackled with better algorithms and more powerful computers. A number of enhanced sampling methods (e.g. replica exchange [100], orthogonal space tempering [122], REST [68] and REST2 [112]) have been proposed accordingly. With the advances in such enhanced sampling techniques and the GPU-accelerated simulations, the sampling problem can be greatly mitigated. Compared with the sampling errors, the force-field errors are most likely small but fundamental. Progress has also been made in this field to support accurate simulations. Modern accurate force fields (e.g. OPLS3 force field [37] and polarized force fields [49, 117]) have been broadly applied in the simulations. Overall, these modern techniques have aided the accurate free energy-based simulations for affinity prediction. Nevertheless, there is still room for improvements in these techniques. Additionally, such simulations are computationally very expensive and remain impractical for the screening of a large number of protein–ligand complexes. Typically, they are limited to family-specific predictions [5].

Table 3

Performance comparison between free energy-based simulations (MM-GB/SA) and machine learning-based scoring functions (⁠|$\textbf{LM-Score}$|⁠, |$\textbf{TREE-Score}$| and |$\textbf{RF-Score}$|⁠). Performance was validated on a high-quality data set of 100 protein--ligand complexes, according to the correlations between predicted and experimental affinities

MethodPearson's correlationSpearman's correlationComputational costs[a]
Free energy-based simulationsMM-GB/SA (no entropy)0.6270.616297.43 hours (2.97 hours per complex)
MM-GB/SA(with entropy[b])0.5340.526616.20 hours (6.51 hours per complex)
LM-Score0.5290.5948 seconds
Machine learning-based scoring functionsTREE-Score0.5570.5688 seconds
RF-Score0.650.6128 seconds
MethodPearson's correlationSpearman's correlationComputational costs[a]
Free energy-based simulationsMM-GB/SA (no entropy)0.6270.616297.43 hours (2.97 hours per complex)
MM-GB/SA(with entropy[b])0.5340.526616.20 hours (6.51 hours per complex)
LM-Score0.5290.5948 seconds
Machine learning-based scoring functionsTREE-Score0.5570.5688 seconds
RF-Score0.650.6128 seconds
1

[a]Computational costs for MM-GB/SA include the time for simulations and free-energy calculations, costs for scoring functions include the time for descriptor extraction, training and validation.

1

[b]Entropy contributions were estimated using the NMA.

Table 3

Performance comparison between free energy-based simulations (MM-GB/SA) and machine learning-based scoring functions (⁠|$\textbf{LM-Score}$|⁠, |$\textbf{TREE-Score}$| and |$\textbf{RF-Score}$|⁠). Performance was validated on a high-quality data set of 100 protein--ligand complexes, according to the correlations between predicted and experimental affinities

MethodPearson's correlationSpearman's correlationComputational costs[a]
Free energy-based simulationsMM-GB/SA (no entropy)0.6270.616297.43 hours (2.97 hours per complex)
MM-GB/SA(with entropy[b])0.5340.526616.20 hours (6.51 hours per complex)
LM-Score0.5290.5948 seconds
Machine learning-based scoring functionsTREE-Score0.5570.5688 seconds
RF-Score0.650.6128 seconds
MethodPearson's correlationSpearman's correlationComputational costs[a]
Free energy-based simulationsMM-GB/SA (no entropy)0.6270.616297.43 hours (2.97 hours per complex)
MM-GB/SA(with entropy[b])0.5340.526616.20 hours (6.51 hours per complex)
LM-Score0.5290.5948 seconds
Machine learning-based scoring functionsTREE-Score0.5570.5688 seconds
RF-Score0.650.6128 seconds
1

[a]Computational costs for MM-GB/SA include the time for simulations and free-energy calculations, costs for scoring functions include the time for descriptor extraction, training and validation.

1

[b]Entropy contributions were estimated using the NMA.

Developing a machine learning-based scoring function, which is generally structure-guided, depends heavily on the quality of data for learning and the featurizer for representing the protein–ligand complexes. Considerable efforts have been made in the past two decades to construct comprehensive benchmark databases for the development and evaluation of scoring functions. Although it is hard to define a perfect data set, some common characteristics include high-resolution structures, a wide range of protein targets with each having several congeneric series of ligands (drug-like, both actives and inactives included), valid experimental affinity, consistency across the experiments for targets and the inclusion of experimental values of ligand physical properties [26]. Given a dataset, a generic scoring function is often learned from all the protein targets with different experimental techniques and affinity measurements (⁠|$K_d$|⁠, |$K_i$| and |$IC_{50}$|⁠). However, this generic treatment may increase the noise in the affinity prediction. Moreover, the experimental error is mostly underestimated in the prediction, as the measurements of one complex from an independent laboratory can differ by 3-fold and let alone the measurements from two laboratories or under different experimental conditions. As well, the consistency (e.g. protein sequence, pH, etc.) across the experiments of crystallography and affinity measurement can be another factor that deviates the prediction. All these requirements will lead to more stringent quality assessments for benchmark sets and more targeted treatments of the data. On the other hand, selecting an appropriate featurizer for the complexes is a key segment in machine learning-based scoring functions. In typical machine-learning models, heavy feature engineering (rules defined according to expert knowledge or statistical inference) is common, and the quality of features rather than the selection of regression models drives the prediction performance. A wide variety of features ranging from simple geometrical/physico-chemical descriptors to more complicated characterizations have been explored in earlier studies; however, new types of features can still be investigated in the future. Recently, the trend has been changed to providing the simplest feature representations (reduced feature engineering) for complexes and letting a deep-learning model to extract a hierarchy of feature representations that is relevant for affinity prediction (learning of a sophisticated relationship from data) [98]. With the fast development of computer power and deep-learning techniques, such trend will remain in future studies. No matter what machine-learning techniques are used, the overfitting problem in training and the generalization capability to new cases is always a noteworthy problem. Techniques specific to different machine-learning models have been developed, such as pruning in tree learning and adding dropout layers in neural networks, to mitigate the overfitting. For the generalization problem, it is recommended to test the model on data sets from different sources in order to remove the database-specific artifacts, and to use more comprehensive data splitting mechanisms for training/testing. In [119], four data splitting mechanisms, including random split, scaffold split, stratified split and time split, have been proposed to facilitate the evaluation of bioactivity prediction, which can be useful in the context of scoring function evaluation. Lastly, developing an interpretable model is always preferable in machine learning studies. An example is the hierarchical feature-representation learning that attempts to find a specific feature hierarchy/patch for pattern recognition. More advanced learning techniques will be needed in near future to achieve such goals.

Key Points
  • The paper reviews the free energy-based simulations and machine learning-based scoring functions for computationally predicting protein–ligand binding affinities.

  • A number of thermodynamic cycles are summarized in this review for the affinity prediction using free energy-based simulations.

  • Classified according to different feature representations, the machine learning-based scoring functions (including deep-learning models) are reviewed.

  • Case studies of the two groups of prediction methods and the comparison between them are provided for a better understanding of their prediction logics.

  • Strengths and weaknesses of the two classes of methods are discussed.

Acknowledgments

D.D.W. is conducting computational predictions of mutation-induced affinity change and drug resistance and developing bioinformatics and health-informatics tools. She is an assistant professor in the School of Medical Instrument and Food Engineering, University of Shanghai for Science and Technology.

M.Z. is working on molecular interpretations of mutation-induced drug resistance in EGFR-mutated non-small-cell lung cancer patients. He is a PhD candidate in department of electrical engineering, City University of Hong Kong.

H.Y.’s research interests are focused on biomolecular pattern recognition, biomedical image processing and high-performance computing. He is chair professor of Computer Engineering and Wong Chung Hong Professor of Data Engineering and is a former dean of the college of science and engineering, at City University of Hong Kong.

Funding

This work was support by the Hong Kong Research Grants Council (Project CityU 11200818) and the Hong Kong Institute for Data Science.

Debby D. Wang is conducting computational predictions of mutation-induced affinity change and drug resistance, and developing bioinformatics and healthinformatics tools. She is an Assistant Professor in the School of Medical Instrument and Food Engineering, University of Shanghai for Science and Technology.

Mengxu Zhu is working on molecular interpretations of mutation-induced drug resistance in EGFR-mutated non-small-cell lung cancer patients. He is a Ph.D. candidate in Department of Electrical Engineering, City University of Hong Kong.

Hong Yan’s research interests are focused on biomolecular pattern recognition, biomedical image processing and high-performance computing. He is Chair Professor of Computer Engineering and Wong Chung Hong Professor of Data Engineering, and is a former Dean of the College of Science and Engineering, at City University of Hong Kong.

REFERENCES

1.

Qvist
J
.
Comment on “transferability of ion models”
.
J Phys Chem
1994
;
98
(
33
):
8253
5
.

2.

Aldeghi
M
,
Heifetz
A
,
Bodkin
MJ
, et al.
Accurate calculation of the absolute free energy of binding for drug molecules
.
Chem Sci
2016
;
7
(
1
):
207
18
.

3.

Qvist
J
,
Medina
C
,
Samuelsson
J-E
.
A new method for predicting binding affinity in computer-aided drug design
.
Protein Eng Des Sel
1994
;
7
(
3
):
385
91
.

4.

Artemenko
N
.
Distance dependent scoring function for describing protein–ligand intermolecular interactions
.
J Chem Inf Model
2008
;
48
(
3
):
569
74
.

5.

Ballester
PJ
,
Mitchell
JBO
.
A machine learning approach to predicting protein–ligand binding affinity with applications to molecular docking
.
Bioinformatics
2010
;
26
(
9
):
1169
75
.

6.

Barducci
A
,
Bonomi
M
,
Parrinello
M
.
Metadynamics
.
Wiley Interdiscip Rev Comput Mol Sci
2011
;
1
(
5
):
826
43
.

7.

Bartels
C
,
Karplus
M
.
Multidimensional adaptive umbrella sampling: applications to main chain and side chain peptide conformations
.
J Comput Chem
1997
;
18
(
12
):
1450
62
.

8.

Bash
PA
,
Field
MJ
,
Karplus
M
.
Free energy perturbation method for chemical reactions in the condensed phase: a dynamic approach based on a combined quantum and molecular mechanics potential
.
J Am Chem Soc
1987
;
109
(
26
):
8092
4
.

9.

Bennett
CH
.
The thermodynamics of computation-a review
.
Internat J Theoret Phys
1982
;
21
(
12
):
905
40
.

10.

Beutler
TC
,
Mark
AE
, van Schaik, et al.
Avoiding singularities and numerical instabilities in free energy calculations based on molecular simulations
.
Chem Phys Lett
1994
;
222
(
6
):
529
39
.

11.

Bitencourt-Ferreira
G
, de, Azevedo WF.
Development of a machine-learning model to predict gibbs free energy of binding for protein–ligand complexes
.
Biophys Chem
2018
;
240
:
63
9
.

12.

Boresch
S
,
Tettinger
F
,
Leitgeb
M
, et al.
Absolute binding free energies: a quantitative approach for their calculation
.
J Phys Chem B
2003
;
107
(
35
):
9535
51
.

13.

Brandsdal
BO
,
Österberg
F
,
Almlöf
M
, et al.
Free energy calculations and ligand binding
.
Adv Protein Chem
2003
;
66
:
123
58
Elsevier
.

14.

Carlson
HA
,
Smith
RD
,
Damm-Ganamet
KL
, et al.
CSAR 2014: a benchmark exercise using unpublished data from pharma
.
J Chem Inf Model
2016
;
56
(
6
):
1063
77
.

15.

Case
DA
,
Betz
RM
,
Cerutti
DS
, et al.
Amber 2016
, vol.
810
.
San Francisco
:
University of California
,
2016
.

16.

Chen
X
,
Liu
M
,
Gilson
MK
.
BindingDB: a web-accessible molecular recognition database
.
Comb Chem High Throughput Screen
2001
;
4
(
8
):
719
25
.

17.

Cole
DJ
, de, Vaca IC,
Jorgensen
WL
.
Computation of protein–ligand binding free energies using quantum mechanical bespoke force fields
.
Medchemcomm
2019
;
10
(
7
):
1116
20
.

18.

Da
C
,
Kireev
D
.
Structural protein–ligand interaction fingerprints (SPLIF) for structure-based virtual screening: method and benchmark study
.
J Chem Inf Model
2014
;
54
(
9
):
2555
61
.

19.

Damian
L
.
Isothermal titration calorimetry for studying protein–ligand interactions
. In:
Protein–Ligand Interactions
.
Totowa, NJ: Humana Press
,
2013
,
103
18
.

20.

Das
S
,
Krein
MP
,
Breneman
CM
.
Binding affinity prediction with property-encoded shape distribution signatures
.
J Chem Inf Model
2010
;
50
(
2
):
298
308
.

21.

Davis
MI
,
Hunt
JP
,
Herrgard
S
, et al.
Comprehensive analysis of kinase inhibitor selectivity
.
Nat Biotechnol
2011
;
29
(
11
):
1046
.

22.

de Ávila
MB
,
Xavier
MM
,
Pintro
VO
, et al.
Supervised machine learning techniques to predict binding affinity. A study for cyclin-dependent kinase 2
.
Biochem Biophys Res Commun
2017
;
494
(
1–2
):
305
10
.

23.

Deng
W
,
Breneman
C
,
Embrechts
MJ
.
Predicting protein–ligand binding affinities using novel geometrical descriptors and machine-learning methods
.
J Chem Inf Comput Sci
2004
;
44
(
2
):
699
703
.

24.

Deng
Y
,
Roux
B
.
Calculation of standard binding free energies: aromatic molecules in the T4 lysozyme l99a mutant
.
J Chem Theory Comput
2006
;
2
(
5
):
1255
73
.

25.

Dunbar
JB, Jr
,
Smith
RD
,
Damm-Ganamet
KL
, et al.
CSAR data set release 2012: ligands, affinities, complexes, and docking decoys
.
J Chem Inf Model
2013
;
53
(
8
):
1842
52
.

26.

Dunbar
JB, Jr
,
Smith
RD
,
Yang
C-Y
, et al.
CSAR benchmark exercise of 2010: selection of the protein–ligand complexes
.
J Chem Inf Model
2011
;
51
(
9
):
2036
46
.

27.

Eldridge
MD
,
Murray
CW
,
Auton
TR
, et al.
Empirical scoring functions: I. the development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes
.
J Comput Aided Mol Des
1997
;
11
(
5
):
425
45
.

28.

Eriksson
AE
,
Baase
WA
,
Wozniak
JA
, et al.
A cavity-containing mutant of T4 lysozyme is stabilized by buried benzene
.
Nature
1992
;
355
(
6358
):
371
3
.

29.

Gerber
PR
,
Mark
AE
,
van Gunsteren
WF
.
An approximate but efficient method to calculate free energy trends by computer simulation: application to dihydrofolate reductase-inhibitor complexes
.
J Comput Aided Mol Des
1993
;
7
(
3
):
305
23
.

30.

Gilson
MK
,
Liu
T
,
Baitaluk
M
, et al.
BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology
.
Nucleic Acids Res
2016
;
44
(
D1
):
D1045
53
.

31.

Gohlke
H
,
Hendlich
M
,
Klebe
G
.
Knowledge-based scoring function to predict protein–ligand interactions
.
J Mol Biol
2000
;
295
(
2
):
337
56
.

32.

Gohlke
H
,
Klebe
G
.
Statistical potentials and scoring functions applied to protein–ligand binding
.
Curr Opin Struct Biol
2001
;
11
(
2
):
231
5
.

33.

Gomes
J
,
Ramsundar
B
,
Feinberg
EN
, et al.
Atomic convolutional networks for predicting protein-ligand binding affinity
.
arXiv preprint arXiv:1703.10603
,
2017
. https://arxiv.org/abs/1703.10603.

34.

Gramatica
P
.
Principles of QSAR models validation: internal and external
.
QSAR Comb Sci
2007
;
26
(
5
):
694
701
.

35.

Terán
H G-d
,
Qvist
J
.
Linear interaction energy: method and applications in drug design
. In:
Computational Drug Discovery and Design
.
New York, NY: Springer
,
2012
,
305
23
.

36.

Hansen
HS
,
Hünenberger
PH
.
Using the local elevation method to construct optimized umbrella sampling potentials: calculation of the relative free energies and interconversion barriers of glucopyranose ring conformers in water
.
J Comput Chem
2010
;
31
(
1
):
1
23
.

37.

Harder
E
,
Damm
W
,
Maple
J
, et al.
OPLS3: a force field providing broad coverage of drug-like small molecules and proteins
.
J Chem Theory Comput
2016
;
12
(
1
):
281
96
.

38.

Hartshorn
MJ
,
Verdonk
ML
,
Chessari
G
, et al.
Diverse, high-quality test set for the validation of protein–ligand docking performance
.
J Med Chem
2007
;
50
(
4
):
726
41
.

39.

Hermans
J
,
Wang
LU
.
Inclusion of loss of translational and rotational freedom in theoretical estimates of free energies of binding. Application to a complex of benzene and mutant T4 lysozyme
.
J Am Chem Soc
1997
;
119
(
11
):
2707
14
.

40.

Hu
L
,
Benson
ML
,
Smith
RD
, et al.
Binding moad (mother of all databases)
.
Proteins
2005
;
60
(
3
):
333
40
.

41.

Huang
N
,
Kalyanaraman
C
,
Bernacki
K
, et al.
Molecular mechanics methods for predicting protein–ligand binding
.
Phys Chem Chem Phys
2006
;
8
(
44
):
5166
77
.

42.

Huang
S-Y
,
Grinter
SZ
,
Zou
X
.
Scoring functions and their evaluation methods for protein–ligand docking: recent advances and future directions
.
Phys Chem Chem Phys
2010
;
12
(
40
):
12899
908
.

43.

Huo
S
,
Wang
J
,
Cieplak
P
, et al.
Molecular dynamics and free energy analyses of cathepsin D-inhibitor interactions: insight into structure-based ligand design
.
J Med Chem
2002
;
45
(
7
):
1412
9
.

44.

Iandola
FN
,
Han
S
,
Moskewicz
MW
, et al.
SqueezeNet: Alexnet-level accuracy with |$50\times $| fewer parameters and <0.5MB model size
.
arXiv preprint arXiv:1602.07360
,
2016
. https://arxiv.org/abs/1602.07360.

45.

Ishikita
H
,
Warshel
A
.
Predicting drug-resistant mutations of HIV protease
.
Angew Chem Int Ed
2008
;
47
(
4
):
697
700
.

46.

Izrailev
S
,
Stepaniants
S
,
Balsera
M
, et al.
Molecular dynamics study of unbinding of the avidin-biotin complex
.
Biophys J
1997
;
72
(
4
):
1568
81
.

47.

Jaquillard
L
,
Saab
F
,
Schoentgen
F
, et al.
Improved accuracy of low affinity protein–ligand equilibrium dissociation constants directly determined by electrospray ionization mass spectrometry
.
J Am Soc Mass Spectrom
2012
;
23
(
5
):
908
22
.

48.

Jiang
W
,
Roux
B
.
Free energy perturbation Hamiltonian replica-exchange molecular dynamics (FEP/H-REMD) for absolute ligand binding free energy calculations
.
J Chem Theory Comput
2010
;
6
(
9
):
2559
65
.

49.

Jiao
D
,
Golubkov
PA
,
Darden
TA
, et al.
Calculation of protein–ligand binding free energy by using a polarizable potential
.
Proc Natl Acad Sci
2008
;
105
(
17
):
6290
5
.

50.

Jiménez
J
,
Skalic
M
,
Martinez-Rosell
G
, et al.
KDEEP: protein–ligand absolute binding affinity prediction via 3D-convolutional neural networks
.
J Chem Inf Model
2018
;
58
(
2
):
287
96
.

51.

Jorgensen
WL
,
Ravimohan
C
.
Monte Carlo simulation of differences in free energies of hydration
.
J Chem Phys
1985
;
83
(
6
):
3050
4
.

52.

Jorgensen
WL
,
Thomas
LL
.
Perspective on free-energy perturbation calculations for chemical equilibria
.
J Chem Theory Comput
2008
;
4
(
6
):
869
76
.

53.

Jorgensen
WL
,
Tirado-Rives
J
.
The OPLS force field for proteins. Energy minimizations for crystals of cyclic peptides and crambin
.
J Am Chem Soc
1988
;
110
(
6
):
1657
66
.

54.

Kästner
J
.
Umbrella sampling
.
Wiley Interdiscip Rev Comput Mol Sci
2011
;
1
(
6
):
932
42
.

55.

Kästner
J
,
Thiel
W
.
Bridging the gap between thermodynamic integration and umbrella sampling provides a novel analysis method: umbrella integration
.
J Chem Phys
2005
;
123
(
14
):
144104
.

56.

Kästner
J
,
Thiel
W
.
Analysis of the statistical error in umbrella sampling simulations by umbrella integration
.
J Chem Phys
2006
;
124
(
23
):
234106
.

57.

Knight
JL
,
Brooks
CL, III
.
|$\lambda $|-dynamics free energy simulation methods
.
J Comput Chem
2009
;
30
(
11
):
1692
700
.

58.

Kuhn
B
,
Gerber
P
,
Schulz-Gasch
T
, et al.
Validation and use of the MM-PBSA approach for drug discovery
.
J Med Chem
2005
;
48
(
12
):
4040
8
.

59.

Kumar
S
,
Rosenberg
JM
,
Bouzida
D
, et al.
The weighted histogram analysis method for free-energy calculations on biomolecules. I. The method
.
J Comput Chem
1992
;
13
(
8
):
1011
21
.

60.

Kundu
I
,
Paul
G
,
Banerjee
R
.
A machine learning approach towards the prediction of protein–ligand binding affinity based on fundamental molecular properties
.
RSC Adv
2018
;
8
(
22
):
12127
37
.

61.

Lapinsh
M
,
Prusis
P
,
Gutcaits
A
, et al.
Development of proteo-chemometrics: a novel technology for the analysis of drug-receptor interactions
.
Biochim Biophys Acta
2001
;
1525
(
1–2
):
180
90
.

62.

Lee
FS
,
Chu
Z-T
,
Bolger
MB
, et al.
Calculations of antibody-antigen interactions: microscopic and semi-microscopic evaluation of the free energies of binding of phosphorylcholine analogs to McPC603
.
Protein Eng Des Sel
1992
;
5
(
3
):
215
28
.

63.

Lemkul
JA
,
Bevan
DR
.
Assessing the stability of alzheimer’s amyloid protofibrils using molecular dynamics
.
J Phys Chem B
2010
;
114
(
4
):
1652
60
.

64.

Eelke
B
,
Lenselink
JL
,
Forti
AF
, et al.
Predicting binding affinities for GPCR ligands using free-energy perturbation
.
ACS Omega
2016
;
1
(
2
):
293
304
.

65.

Li
G-B
,
Yang
L-L
,
Wang
W-J
, et al.
ID-score: a new empirical scoring function based on a comprehensive set of descriptors related to protein–ligand interactions
.
J Chem Inf Model
2013
;
53
(
3
):
592
600
.

66.

Li
L
,
Wang
B
,
Meroueh
SO
.
Support vector regression scoring of receptor–ligand complexes for rank-ordering and virtual screening of chemical libraries
.
J Chem Inf Model
2011
;
51
(
9
):
2132
8
.

67.

Li
S
,
Xi
L
,
Wang
C
, et al.
A novel method for protein–ligand binding affinity prediction and the related descriptors exploration
.
J Comput Chem
2009
;
30
(
6
):
900
9
.

68.

Liu
P
,
Kim
B
,
Friesner
RA
, et al.
Replica exchange with solute tempering: a method for sampling biological systems in explicit water
.
Proc Natl Acad Sci
2005
;
102
(
39
):
13749
.

69.

Liu
Q
,
Kwoh
CK
,
Li
J
.
Binding affinity prediction for protein–ligand complexes based on |$\beta $| contacts and b factor
.
J Chem Inf Model
2013
;
53
(
11
):
3076
85
.

70.

Liu
X
,
Peng
L
,
Zhou
Y
, et al.
Computational alanine scanning with interaction entropy for protein–ligand binding free energies
.
J Chem Theory Comput
2018
;
14
(
3
):
1772
80
.

71.

Lu
N
,
Kofke
DA
,
Woolf
TB
.
Improving the efficiency and reliability of free energy perturbation calculations using overlap sampling methods
.
J Comput Chem
2004
;
25
(
1
):
28
40
.

72.

Ma
L
,
Wang
DD
,
Huang
Y
, et al.
EGFR Mutant Structural Database: computationally predicted 3D structures and the corresponding binding free energies with gefitinib and erlotinib
.
BMC Bioinformatics
2015
;
16
(
1
):
85
.

73.

Maier
JA
,
Martinez
C
,
Kasavajhala
K
, et al.
ff14sb: improving the accuracy of protein side chain and backbone parameters from ff99sb
.
J Chem Theory Comput
2015
;
11
(
8
):
3696
713
.

74.

Mezei
M
.
Adaptive umbrella sampling: self-consistent determination of the non-boltzmann bias
.
J Comput Phys
1987
;
68
(
1
):
237
48
.

75.

Miao
Y
,
Huang
Y-m M
,
Walker
RC
, et al.
Ligand binding pathways and conformational transitions of the HIV protease
.
Biochemistry
2018
;
57
(
9
):
1533
41
.

76.

Miranda
WE
,
Noskov
SY
,
Valiente
PA
.
Improving the LIE method for binding free energy calculations of protein–ligand complexes
.
J Chem Inf Model
2015
;
55
(
9
):
1867
77
.

77.

Mitchell
MJ
,
McCammon
JA
.
Free energy difference calculations by thermodynamic integration: difficulties in obtaining a precise value
.
J Comput Chem
1991
;
12
(
2
):
271
5
.

78.

Miyamoto
S
,
Kollman
PA
.
Absolute and relative binding free energy calculations of the interaction of biotin and its analogs with streptavidin using molecular dynamics/free energy perturbation approaches
.
Proteins
1993
;
16
(
3
):
226
45
.

79.

Muegge
I
,
Tao
H
,
Warshel
A
.
A fast estimate of electrostatic group contributions to the free energy of protein-inhibitor binding
.
Protein Eng
1997
;
10
(
12
):
1363
72
.

80.

Murcko
MA
.
Computational methods to predict binding free energy in ligand-receptor complexes
.
J Med Chem
1995
;
38
(
26
):
4953
67
.

81.

Ngo
ST
,
Vu
KB
,
Bui
LM
, et al.
Effective estimation of ligand-binding affinity using biased sampling method
.
ACS Omega
2019
;
4
(
2
):
3887
93
.

82.

Ouyang
X
,
Handoko
SD
,
Kwoh
CK
.
CScore: a simple yet effective scoring function for protein–ligand binding affinity prediction using modified CMAC learning architecture
.
J Bioinform Comput Biol
2011
;
9
(
supp01
):
1
14
.

83.

Öztürk
H
,
Özgür
A
,
Ozkirimli
E
.
DeepDTA: deep drug–target binding affinity prediction
.
Bioinformatics
2018
;
34
(
17
):
i821
9
.

84.

Pearlman
DA
.
Evaluating the molecular mechanics Poisson–Boltzmann surface area free energy method using a congeneric series of ligands to p38 map kinase
.
J Med Chem
2005
;
48
(
24
):
7796
807
.

85.

Pérez
A
,
Martínez-Rosell
G
, De, Fabritiis G.
Simulations meet machine learning in structural biology
.
Curr Opin Struct Biol
2018
;
49
:
139
44
.

86.

Pires
DEV
,
Blundell
TL
,
Ascher
DB
.
Platinum: a database of experimentally measured effects of mutations on structurally defined protein–ligand complexes
.
Nucleic Acids Res
2015
;
43
(
D1
):
D387
91
.

87.

Rastelli
G
,
Del Rio
A
,
Degliesposti
G
, et al.
Fast and accurate predictions of binding free energies using MM-PBSA and MM-GBSA
.
J Comput Chem
2010
;
31
(
4
):
797
810
.

88.

Reinard
T
,
Jacobsen
H-J
.
An inexpensive small volume equilibrium dialysis system for protein–ligand binding assays
.
Anal Biochem
1989
;
176
(
1
):
157
60
.

89.

Rogers
D
,
Hahn
M
.
Extended-connectivity fingerprints
.
J Chem Inf Model
2010
;
50
(
5
):
742
54
.

90.

Sham
YY
,
Chu
ZT
,
Tao
H
, et al.
Examining methods for calculations of binding free energies: LRA, LIE, PDLD-LRA, and PDLD/S-LRA calculations of ligands binding to an HIV protease
.
Proteins
2000
;
39
(
4
):
393
407
.

91.

Singh
N
,
Warshel
A
.
Absolute binding free energy calculations: on the accuracy of computational scoring of protein–ligand interactions
.
Proteins
2010
;
78
(
7
):
1705
23
.

92.

Smith
PE
, van, Gunsteren WF.
Predictions of free energy differences from a single simulation of the initial state
.
J Chem Phys
1994
;
100
(
1
):
577
85
.

93.

Smith
RD
,
Damm-Ganamet
KL
,
Dunbar
JB, Jr
, et al.
CSAR benchmark exercise 2013: evaluation of results from a combined computational protein design, docking, and scoring/ranking challenge
.
J Chem Inf Model
2016
;
56
(
6
):
1022
31
.

94.

Smith
RH
,
Jorgensen
WL
,
Tirado-Rives
J
, et al.
Prediction of binding affinities for tibo inhibitors of HIV-1 reverse transcriptase using Monte Carlo simulations in a linear response method
.
J Med Chem
1998
;
41
(
26
):
5272
86
.

95.

Sotriffer
CA
,
Sanschagrin
P
,
Matter
H
, et al.
SFCscore: scoring functions for affinity prediction of protein–ligand complexes
.
Proteins
2008
;
73
(
2
):
395
419
.

96.

Stafford
WF
.
Protein–protein and ligand–protein interactions studied by analytical ultracentrifugation
. In:
Protein Structure, Stability, and Interactions
.
Humana Press
,
2009
,
83
113
.

97.

Steinbrecher
T
,
Case
DA
,
Labahn
A
.
A multistep approach to structure-based drug design: studying ligand binding at the human neutrophil elastase
.
J Med Chem
2006
;
49
(
6
):
1837
44
.

98.

Stepniewska-Dziubinska
MM
,
Zielenkiewicz
P
,
Siedlecki
P
.
Development and evaluation of a deep learning model for protein–ligand binding affinity prediction
.
Bioinformatics
2018
;
34
(
21
):
3666
74
.

99.

Suenaga
A
,
Okimoto
N
,
Hirano
Y
, et al.
An efficient computational method for calculating ligand binding affinities
.
PLoS One
2012
;
7
(
8
):e42846.

100.

Sugita
Y
,
Okamoto
Y
.
Replica-exchange molecular dynamics method for protein folding
.
Chem Phys Lett
1999
;
314
(
1–2
):
141
51
.

101.

Tang
J
,
Szwajda
A
,
Shakyawar
S
, et al.
Making sense of large-scale kinase inhibitor bioactivity data sets: a comparative and integrative analysis
.
J Chem Inf Model
2014
;
54
(
3
):
735
43
.

102.

Therneau
T
,
Atkinson
B
,
Ripley
B
, et al.
Package ‘rpart’
.
cran.ma.ic.ac.uk/web/packages/rpart/rpart.pdf (20 April 2016, date last accessed)
,
2015
.

103.

Torrie
GM
,
Valleau
JP
.
Nonphysical sampling distributions in Monte Carlo free-energy estimation: umbrella sampling
.
J Comput Phys
1977
;
23
(
2
):
187
99
.

104.

Trott
O
,
Olson
AJ
.
AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading
.
J Comput Chem
2010
;
31
(
2
):
455
61
.

105.

Viegas
A
,
Manso
J
,
Nobrega
FL
, et al.
Saturation-transfer difference (STD) NMR: a simple and fast method for ligand screening and characterization of protein binding
.
J Chem Educ
2011
;
88
(
7
):
990
4
.

106.

Wang
DD
,
Lee
VHF
,
Zhu
G
, et al.
Selectivity profile of afatinib for EGFR-mutated non-small-cell lung cancer
.
Mol Biosyst
2016
;
12
(
5
):
1552
63
.

107.

Wang
DD
,
Ou-Yang
L
,
Zhu
M
, et al.
Predicting the impacts of mutations on protein–ligand binding affinity based on molecular dynamics simulations and machine learning methods
.
Comput Struct Biotechnol J
2020
;
18
:
439
54
.

108.

Wang
DD
,
Zhou
W
,
Yan
H
, et al.
Personalized prediction of EGFR mutation-induced drug resistance in lung cancer
.
Sci Rep
2013
;
3
(
1
):
1
8
.

109.

Wang
J-C
,
Lin
J-H
.
Scoring functions for prediction of protein–ligand interactions
.
Curr Pharm Des
2013
;
19
(
12
):
2174
82
.

110.

Wang
J
,
Wolf
RM
,
Caldwell
JW
, et al.
Development and testing of a general amber force field
.
J Comput Chem
2004
;
25
(
9
):
1157
74
.

111.

Wang
L
,
Chambers
J
,
Abel
R
.
Protein–ligand binding free energy calculations with FEP+
. In:
Biomolecular Simulations
.
New York, NY: Humana
,
2019
,
201
32
.

112.

Wang
L
,
Friesner
RA
,
Berne
BJ
.
Replica exchange with solute scaling: a more efficient version of replica exchange with solute tempering (rest2)
.
J Phys Chem B
2011
;
115
(
30
):
9431
8
.

113.

Wang
L
,
Wu
Y
,
Deng
Y
, et al.
Accurate and reliable prediction of relative ligand binding potency in prospective drug discovery by way of a modern free-energy calculation protocol and force field
.
J Am Chem Soc
2015
;
137
(
7
):
2695
703
.

114.

Wang
R
,
Fang
X
,
Lu
Y
, et al.
The PDBbind database: collection of binding affinities for protein–ligand complexes with known three-dimensional structures
.
J Med Chem
2004
;
47
(
12
):
2977
80
.

115.

Wang
R
,
Lai
L
,
Wang
S
.
Further development and validation of empirical scoring functions for structure-based binding affinity prediction
.
J Comput Aided Mol Des
2002
;
16
(
1
):
11
26
.

116.

Wang
W
,
Wang
J
,
Kollman
PA
.
What determines the van der Waals coefficient |$\beta $| in the lie (linear interaction energy) method to estimate binding free energies using molecular dynamics simulations?
Proteins
1999
;
34
(
3
):
395
402
.

117.

Warshel
A
,
Kato
M
,
Pisliakov
AV
.
Polarizable force fields: history, test cases, and prospects
.
J Chem Theory Comput
2007
;
3
(
6
):
2034
45
.

118.

Wójcikowski
M
,
Kukiełka
M
,
Stepniewska-Dziubinska
MM
, et al.
Development of a protein–ligand extended connectivity (PLEC) fingerprint and its application for binding affinity predictions
.
Bioinformatics
2019
;
35
(
8
):
1334
41
.

119.

Wu
Z
,
Ramsundar
B
,
Feinberg
EN
, et al.
Moleculenet: a benchmark for molecular machine learning
.
Chem Sci
2018
;
9
(
2
):
513
30
.

120.

Zacharias
M
,
Straatsma
TP
,
McCammon
JA
.
Separation-shifted scaling, a new scaling method for Lennard–Jones interactions in thermodynamic integration
.
J Chem Phys
1994
;
100
(
12
):
9025
31
.

121.

Zhang
H
,
Liao
L
,
Saravanan
KM
, et al.
DeepBindRG: a deep learning based method for estimating effective protein–ligand affinity
.
PeerJ
2019
;
7
:
e7362
.

122.

Zheng
L
,
Yang
W
.
Practically efficient and robust free energy calculations: double-integration orthogonal space tempering
.
J Chem Theory Comput
2012
;
8
(
3
):
810
23
.

123.

Zhou
R
,
Friesner
RA
,
Ghosh
A
, et al.
New linear interaction method for binding affinity calculations using a continuum solvent model
.
J Phys Chem B
2001
;
105
(
42
):
10388
97
.

124.

Zilian
D
,
Sotriffer
CA
.
SFCscore|$^RF$|⁠: a random forest-based scoring function for improved affinity prediction of protein–ligand complexes
.
J Chem Inf Model
2013
;
53
(
8
):
1923
33
.

125.

Zwanzig
RW
.
High-temperature equation of state by a perturbation method. I. Nonpolar gases
.
J Chem Phys
1954
;
22
(
8
):
1420
6
.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)