Predicting drug–target binding affinity through molecule representation block based on multi-head attention and skip connection

Table 1

Summary of parameter settings for MRBDTA

Parameters	Davis	KIBA
Max length for drugs	85	100
Max length for proteins	1200	1000
Embedding size	128	128
The number of Trans block for drugs	1	1
The number of Trans block for proteins	1	1
The number of heads in multi-head attention	4	4
Feed-forward layer	512	512
Hidden size in FNNs	2048 512 1	2048 512 1
Batch size	256	1024
Epoch	300	600
Dropout	0.1	0.1
Optimizer	Adam	Adam
Learning rate	0.001	0.001
Activation function	ReLU	ReLU
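
For readers who want to reuse these settings, the values in Table 1 can be collected in a small configuration mapping. This is only a sketch: the key names and the `per_head_size` helper are hypothetical, and the even split of the embedding across attention heads is the usual multi-head attention convention, assumed here rather than taken from the source.

```python
# Hyperparameters transcribed from Table 1; all key names are hypothetical.
SHARED = {
    "embedding_size": 128,
    "trans_blocks_drug": 1,
    "trans_blocks_protein": 1,
    "attention_heads": 4,
    "feed_forward_size": 512,
    "fnn_hidden_sizes": (2048, 512, 1),
    "dropout": 0.1,
    "optimizer": "Adam",
    "learning_rate": 0.001,
    "activation": "ReLU",
}

# Only sequence lengths, batch size and epochs differ between the datasets.
CONFIGS = {
    "Davis": {**SHARED, "max_drug_len": 85, "max_protein_len": 1200,
              "batch_size": 256, "epochs": 300},
    "KIBA": {**SHARED, "max_drug_len": 100, "max_protein_len": 1000,
             "batch_size": 1024, "epochs": 600},
}


def per_head_size(config):
    """Width of one attention head, assuming an even split across heads."""
    size, heads = config["embedding_size"], config["attention_heads"]
    if size % heads != 0:
        raise ValueError("embedding size must be divisible by the head count")
    return size // heads
```

Under that assumption, both configurations give 32 dimensions per head.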

In our study, MSE, CI and $r_{m}^{2}$ are used to evaluate the performance of the computational models. In regression tasks, MSE is a common measure of the deviation between true and predicted values; a smaller MSE indicates higher prediction accuracy. CI was proposed in [44] and measures the probability that the predicted values of two randomly chosen drug–target pairs are in the same order as their true values; a larger CI indicates better prediction performance. The $r_{m}^{2}$ metric has been extensively applied in quantitative structure–activity relationship (QSAR) models and reflects the external predictive potential of a model [45]. An acceptable model should have an $r_{m}^{2}$ value greater than 0.5, and a larger $r_{m}^{2}$ reflects better generalization performance.
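
As a minimal sketch, the three metrics can be computed with the standard library alone. The formulas are not given in this section, so the code below assumes the definitions commonly used in the binding affinity literature: CI counts concordant pairs among all pairs with different true values (prediction ties count as 0.5), and $r_{m}^{2} = r^{2}\,(1 - \sqrt{|r^{2} - r_{0}^{2}|})$, where $r_{0}^{2}$ is the coefficient of determination of a regression through the origin. The function names are hypothetical.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between true and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predictions are in the true order."""
    concordant, comparable = 0.0, 0
    for i in range(len(y_true)):
        for j in range(len(y_true)):
            if y_true[i] > y_true[j]:          # pair with distinct true values
                comparable += 1
                if y_pred[i] > y_pred[j]:
                    concordant += 1.0
                elif y_pred[i] == y_pred[j]:   # tied predictions count half
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("all true values are identical")
    return concordant / comparable


def rm2(y_true, y_pred):
    """Assumed definition: r^2 * (1 - sqrt(|r^2 - r0^2|))."""
    n = len(y_true)
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    r2 = cov ** 2 / (var_t * var_p)            # squared Pearson correlation
    # Slope of the least-squares line through the origin (y_true ~ k * y_pred).
    k = sum(t * p for t, p in zip(y_true, y_pred)) / sum(p * p for p in y_pred)
    r02 = 1.0 - sum((t - k * p) ** 2 for t, p in zip(y_true, y_pred)) / var_t
    return r2 * (1.0 - math.sqrt(abs(r2 - r02)))
```

The pairwise loop in `concordance_index` is quadratic in the number of samples, which keeps the sketch readable; a vectorized version would be preferable for test sets of realistic size.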

As mentioned above, the training set was divided equally into five subsets for the 5-fold cross-validation experiment. Each subset was left out in turn, and MRBDTA was trained on the other four subsets with the optimal parameters obtained above. The trained MRBDTA was then used to predict binding affinities in the test set, and MSE, CI and $r_{m}^{2}$ were calculated from the predicted and true affinities of the drug–target pairs in the test set. Because each of the five subsets was left out in turn, this process was repeated five times. Table 2 shows the results of the five runs on the test sets of the two benchmark datasets, together with the average and standard deviation (SD) of MSE, CI and $r_{m}^{2}$. As displayed in Table 3, the average MSE, CI and $r_{m}^{2}$ of MRBDTA on the Davis test set are 0.216, 0.901 and 0.716, with SDs of 0.006, 0.004 and 0.008, respectively. Analogously, Table 4 gives the average MSE (0.146), CI (0.892) and $r_{m}^{2}$ (0.778) of MRBDTA on the KIBA test set, with SDs of 0.001, 0.002 and 0.005, respectively. Compared with 11 state-of-the-art computational models (KronRLS [22], SimBoost [23], DeepDTA [24], MT-DTI [29], WideDTA [25], DeepCPI [26], DeepCDA [30], GraphDTA [31], DeepGS [27], MATT_DTI [32] and DeepFusionDTA [28]), MRBDTA achieves the best or near-best performance on both the Davis and KIBA datasets (see Tables 3 and 4). Concretely, on the Davis dataset, MRBDTA obtains the best MSE, CI and $r_{m}^{2}$. On the KIBA dataset, MRBDTA obtains the best CI and $r_{m}^{2}$, and its MSE (0.146) is second only to that of GraphDTA (0.139) [31]. In addition, the low SDs of MSE, CI and $r_{m}^{2}$ on both datasets reflect the stability of MRBDTA (see Tables 3 and 4).
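
The averages and SDs quoted above can be recomputed from the five per-run values listed in Table 2. The sketch below does this with the standard library. The source does not say which SD estimator was used; the population SD (`statistics.pstdev`) reproduces the reported figures to three decimals, whereas the sample SD would give, for example, 0.007 instead of 0.006 for the Davis MSE. The dictionary layout and the `summarize` helper are hypothetical.

```python
from statistics import mean, pstdev

# Per-run test-set results transcribed from Table 2.
RUNS = {
    "Davis": {
        "CI": [0.894, 0.906, 0.899, 0.904, 0.900],
        "MSE": [0.219, 0.225, 0.218, 0.213, 0.207],
        "rm2": [0.724, 0.707, 0.705, 0.721, 0.723],
    },
    "KIBA": {
        "CI": [0.889, 0.891, 0.894, 0.895, 0.890],
        "MSE": [0.146, 0.147, 0.145, 0.144, 0.147],
        "rm2": [0.788, 0.776, 0.775, 0.778, 0.773],
    },
}


def summarize(values):
    """Return (average, population SD), both rounded to three decimals."""
    return round(mean(values), 3), round(pstdev(values), 3)


if __name__ == "__main__":
    for dataset, metrics in RUNS.items():
        for name, values in metrics.items():
            avg, sd = summarize(values)
            print(f"{dataset} {name}: {avg:.3f} ({sd:.3f})")
```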

Table 2

Results predicted by MRBDTA on test set of Davis and KIBA datasets for five times

Time	CI (SD)	MSE (SD)	r_m² (SD)
On Davis dataset
1	0.894	0.219	0.724
2	0.906	0.225	0.707
3	0.899	0.218	0.705
4	0.904	0.213	0.721
5	0.900	0.207	0.723
Average of five times	0.901 (0.004)	0.216 (0.006)	0.716 (0.008)
On KIBA dataset
1	0.889	0.146	0.788
2	0.891	0.147	0.776
3	0.894	0.145	0.775
4	0.895	0.144	0.778
5	0.890	0.147	0.773
Average of five times	0.892 (0.002)	0.146 (0.001)	0.778 (0.005)

Table 3

Results on test set of Davis dataset based on MRBDTA and existing baseline methods

Method	CI (SD)	MSE (SD)	r_m² (SD)
KronRLS	0.871 (0.0008)	0.379	0.407 (0.005)
SimBoost	0.872 (0.002)	0.282	0.644 (0.006)
DeepDTA	0.878 (0.004)	0.261	0.630 (0.017)
MT-DTI	0.887 (0.003)	0.245	0.665 (0.014)
WideDTA	0.886 (0.003)	0.262	–
DeepCPI	0.867 (−)	0.293	0.607 (−)
DeepCDA	0.891 (0.003)	0.248	0.649 (0.009)
GraphDTA	0.881 (−)	0.245	–
DeepGS	0.882 (−)	0.252	0.686 (−)
MATT_DTI	0.891 (0.002)	0.227	0.683 (0.017)
DeepFusionDTA	0.887 (−)	0.253	–
MRBDTA	0.901 (0.004)	0.216 (0.006)	0.716 (0.008)

Bold values indicate the best results among MRBDTA and the 11 previous state-of-the-art computational models.

Table 4

Results on test set of KIBA dataset based on MRBDTA and existing baseline methods

Method	CI (SD)	MSE (SD)	r_m² (SD)
KronRLS	0.782 (0.0009)	0.411	0.342 (0.001)
SimBoost	0.836 (0.001)	0.222	0.629 (0.007)
DeepDTA	0.863 (0.002)	0.194	0.673 (0.009)
MT-DTI	0.882 (0.001)	0.152	0.738 (0.006)
WideDTA	0.875 (0.001)	0.179	–
DeepCPI	0.852 (−)	0.211	0.657 (−)
DeepCDA	0.889 (0.002)	0.176	0.682 (0.008)
GraphDTA	0.891 (−)	0.139	–
DeepGS	0.860 (−)	0.193	0.684 (−)
MATT_DTI	0.889 (0.001)	0.150	0.756 (0.011)
DeepFusionDTA	0.876 (−)	0.176	–
MRBDTA	0.892 (0.002)	0.146 (0.001)	0.778 (0.005)

Bold values indicate the best results among MRBDTA and the 11 previous state-of-the-art computational models.

Analysis experiments

The main contributions of MRBDTA are the Trans block, which extracts molecule features by improving the transformer encoder, and the skip connection introduced at the encoder level in the Trans block. To verify the effectiveness of each, we carried out analysis experiments on the Davis and KIBA datasets under the evaluation metrics MSE, CI and $r_{m}^{2}$: in one variant the Trans block was replaced with a single Trans encoder (MRBDTA_STE), and in the other the skip connection in the Trans block was removed (MRBDTA_RSC). Comparing MRBDTA_STE and MRBDTA_RSC with MRBDTA, we observed that MRBDTA achieved the best performance among the three models (see Table 5 and Supplementary Tables 1 and 2, available online at http://bib.oxfordjournals.org/). These results show that the Trans block and the skip connection do improve the prediction accuracy, stability and reliability of MRBDTA.

The interpretability based on multi-head attention mechanism

Another important contribution of MRBDTA is that the multi-head attention mechanism enhances its ability to capture interaction sites between proteins and drugs, which benefits the biological interpretability of the model. Using the multi-head attention mechanism, we calculated an attention weight for each amino acid residue of a protein; this weight represents the contribution of the residue to the DTIs associated with the protein. We then ranked all amino acid residues in a protein by attention weight. To analyze the reliability of MRBDTA, we introduced potential and captured interaction sites. In the Protein Data Bank (PDB), potential interaction sites between a drug and a protein are usually defined as the amino acid residues of the protein that lie less than 5.0 Å from the drug [30]. For a drug–protein pair, we selected the amino acid residues with the highest attention weights as the interaction sites captured by MRBDTA, with the number of selected residues equal to the number of potential interaction sites in the pair. We also visualized the potential and captured interaction sites between proteins and drugs with the 3D View tool of PDB.

Figure 1 shows the potential and captured interaction sites for two complexes, 1N6R and 3AQV. According to PDB, 1N6R is the complex of Ras-related protein Rab-5A with GppNHp, and 3AQV is the complex of AMP-activated protein kinase with dorsomorphin. The potential and captured interaction sites in 1N6R are marked in red in Figure 1A and B, respectively. On the basis of PDB, 38 amino acid residues are regarded as potential interaction sites in 1N6R (see Supplementary File 1 available online at http://bib.oxfordjournals.org/). According to the attention weights, we took the 38 amino acid residues with the highest attention weights in 1N6R as the interaction sites captured by MRBDTA (see Supplementary File 1). We regarded a captured interaction site as correctly captured if it was a potential interaction site or lay near one, that is, if the absolute difference between its PDB residue number and that of some potential interaction site was less than or equal to 10. As illustrated in Figure 1B, the 21 circled amino acid residues are the correctly captured interaction sites in 1N6R (see Supplementary File 1). Similarly, the potential and captured interaction sites in 3AQV are marked in red in Figure 1C and D, respectively. According to PDB, 22 amino acid residues are regarded as potential interaction sites in 3AQV, and we took the 22 residues with the highest attention weights in 3AQV as the interaction sites captured by MRBDTA (see Supplementary File 1). The nine circled amino acid residues are the correctly captured interaction sites in 3AQV (see Figure 1D and Supplementary File 1).
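
The selection and matching rule described above can be sketched as follows: take as many top-weighted residues as there are potential interaction sites, then count a captured residue as correct when its residue number is within 10 of some potential site. The function names and the toy numbers are hypothetical; the actual attention weights and site lists are in Supplementary File 1.

```python
def captured_sites(attention, k):
    """Return the k residue numbers with the highest attention weights.

    `attention` maps a PDB residue number to its attention weight.
    """
    ranked = sorted(attention, key=lambda res: attention[res], reverse=True)
    return ranked[:k]


def correctly_captured(captured, potential, window=10):
    """Captured residues within `window` residue numbers of a potential site."""
    return [c for c in captured
            if any(abs(c - p) <= window for p in potential)]


# Toy example with made-up numbers (not taken from 1N6R or 3AQV).
potential = [20, 21, 50]
attention = {5: 0.9, 25: 0.8, 48: 0.7, 100: 0.6, 21: 0.1}

captured = captured_sites(attention, k=len(potential))
correct = correctly_captured(captured, potential)
```

In the toy example, residues 25 and 48 are counted as correct, while residue 5 is 15 positions away from the nearest potential site and is not. Note that the rule compares positions along the sequence, not distances in three-dimensional space.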

Table 5

Results on test set of Davis and KIBA datasets based on analysis experiments

Analysis experiments	CI (SD)	MSE (SD)	r_m² (SD)
On Davis dataset
MRBDTA_STE	0.869 (0.006)	0.235 (0.009)	0.683 (0.010)
MRBDTA_RSC	0.877 (0.009)	0.223 (0.007)	0.697 (0.009)
MRBDTA	0.901 (0.004)	0.216 (0.006)	0.716 (0.008)
On KIBA dataset
MRBDTA_STE	0.868 (0.005)	0.173 (0.002)	0.734 (0.007)
MRBDTA_RSC	0.876 (0.004)	0.162 (0.007)	0.762 (0.006)
MRBDTA	0.892 (0.002)	0.146 (0.001)	0.778 (0.005)

Bold values indicate the best results among the three models.

Figure 1

The visualization of interaction sites between proteins and drugs. The potential and captured interaction sites marked with red in 1N6R are shown in (A) and (B), respectively. As depicted in (A), there are 38 potential interaction sites in 1N6R. In (B), among the 38 captured interaction sites by MRBDTA, the circled 21 captured interaction sites were regarded as correctly captured interaction sites in 1N6R. The potential and captured interaction sites marked with red in 3AQV are exhibited in (C) and (D), respectively. As displayed in (C), there are 22 potential interaction sites in 3AQV. In (D), the circled 9 of the 22 captured interaction sites were considered as correctly captured interaction sites in 3AQV.

Taken together, these two examples indicate that our model can correctly capture some of the interaction sites between proteins and drugs, which partly explains the reliable performance of MRBDTA.

Case studies

In recent years, SARS-CoV-2 infection has spread rapidly around the world, posing a huge threat to human life. Because of the severe shortage of therapeutic drugs for patients infected with SARS-CoV-2 and the long cycle of developing a new drug, drug repurposing, a novel strategy for drug discovery, has been used to search FDA-approved drugs for effective treatments [46, 47]. In this case study, we first applied the trained MRBDTA to predict binding affinities between 3137 FDA-approved drugs and SARS-CoV-2 replication-related proteins. Second, we compared experimentally measured binding affinities between 3C-like proteinase and 185 drugs with those predicted by the trained MRBDTA. The purpose of this case study is to provide an example of applying MRBDTA in a real-life situation and to verify its prediction performance in drug design for SARS-CoV-2. We also expect that the predictions of MRBDTA can give scientists some ideas for developing novel drugs against SARS-CoV-2 and assist in treating patients.

The FASTA sequences of six SARS-CoV-2 replication-related proteins, namely 3C-like proteinase (accession YP_009725301.1), RNA-dependent RNA polymerase (accession YP_009725307.1), helicase (accession YP_009725308.1), 3′-to-5′ exonuclease (accession YP_009725309.1), endoRNAse (accession YP_009725310.1) and 2′-O-ribose methyltransferase (accession YP_009725311.1), were acquired from the National Center for Biotechnology Information (NCBI) database. The SMILES sequences of 3137 FDA-approved drugs were obtained from [32]. The binding affinities predicted by MRBDTA between the 3137 FDA-approved drugs and the six proteins, expressed as KIBA score and as K_d in nM, can be found in Supplementary File 2 and Supplementary File 3, respectively (available online at http://bib.oxfordjournals.org/). For each protein, we first ranked the 3137 FDA-approved drugs by predicted binding affinity. Comparing MRBDTA with MT-DTI [29] and MATT_DTI [32], we observed that, among the top 50 drugs with the best predicted affinities, MRBDTA predicts more antiviral drugs and ranks them higher (see Tables 6 and 7). Specifically, across the six proteins, the top 50 drugs predicted by MRBDTA contain 22 antiviral drugs based on K_d in nM and 13 based on KIBA score, whereas the top 50 drugs predicted by MT-DTI (based on K_d in nM) and by MATT_DTI (based on KIBA score) contain only 6 and 8 antiviral drugs, respectively. It should be pointed out that, since the code of MT-DTI and MATT_DTI is unavailable, we could only adopt the prediction results reported in their respective studies. Moreover, the top 30 drugs predicted by MRBDTA contain 14 (K_d in nM) and 12 (KIBA score) antiviral drugs, whereas the top 30 drugs predicted by MT-DTI contain none and those predicted by MATT_DTI contain only 3. The best rank of an antiviral drug predicted by MRBDTA is 2nd, whereas the best ranks for MT-DTI and MATT_DTI are 32nd and 21st, respectively. In summary, these results indicate that MRBDTA performs better than MT-DTI and MATT_DTI in this important application.
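
The counts quoted above follow directly from the MRBDTA ranks listed in Tables 6 and 7. As a small check, the sketch below transcribes those ranks and counts how many fall within a given cutoff; the data layout and helper names are hypothetical.

```python
# MRBDTA ranks (out of 3137) of antiviral drugs, transcribed from
# Table 6 (K_d in nM) and Table 7 (KIBA score).
MRBDTA_RANKS = {
    "K_d": {
        "3C-like proteinase": [8, 15, 30, 34],
        "RNA-dependent RNA polymerase": [2, 36, 44],
        "Helicase": [17, 21, 30],
        "3'-to-5' exonuclease": [19, 23, 30, 41],
        "EndoRNAse": [11, 29, 32, 36],
        "2'-O-ribose methyltransferase": [8, 27, 35, 41],
    },
    "KIBA": {
        "3C-like proteinase": [4, 21],
        "RNA-dependent RNA polymerase": [8, 29, 45],
        "Helicase": [3, 18],
        "3'-to-5' exonuclease": [5, 19],
        "EndoRNAse": [4, 19],
        "2'-O-ribose methyltransferase": [4, 24],
    },
}


def count_within(ranks_by_protein, cutoff):
    """Number of (protein, antiviral drug) entries ranked at or above cutoff."""
    return sum(rank <= cutoff
               for ranks in ranks_by_protein.values()
               for rank in ranks)


def best_rank(ranks_by_protein):
    """Best (smallest) rank reached by any antiviral drug."""
    return min(rank for ranks in ranks_by_protein.values() for rank in ranks)
```

Here `count_within(MRBDTA_RANKS["K_d"], 50)` gives 22 and `count_within(MRBDTA_RANKS["KIBA"], 50)` gives 13, matching the text. The counts are per protein–drug entry, so a drug that appears for several proteins is counted once for each.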

To further verify the prediction performance of MRBDTA in drug design for SARS-CoV-2, we first obtained the sequence of 3C-like proteinase, the structure files of 185 drugs and the experimentally measured binding affinities between 3C-like proteinase and the 185 drugs from [48, 49]. We then used RDKit [40] to derive SMILES sequences from the structure files of the 185 drugs. Because the true binding affinities in the Davis dataset are of the same type as the experimentally measured binding affinities of these 185 drug–target pairs, we trained MRBDTA on the Davis dataset. It is worth noting that 3C-like proteinase is not included in the training data. The binding affinities predicted by the trained MRBDTA between 3C-like proteinase and the 185 drugs can be found in Supplementary File 4 available online at http://bib.oxfordjournals.org/. From the predicted and experimentally measured binding affinities of the 185 drug–target pairs, we obtained an MSE of 1.451 for MRBDTA. This low MSE on the 185 drug–target pairs indicates that most binding affinities predicted by MRBDTA are satisfactory.
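
When comparing predictions with measured values, both must be on the same scale. Davis-type affinities are commonly modelled on a logarithmic scale as $pK_d = -\log_{10}(K_d / 10^{9})$ with $K_d$ in nM, while Table 6 reports $K_d$ in nM. Whether MRBDTA uses exactly this transform is not stated in this section, so the conversion below is an assumption, and the function names are hypothetical.

```python
import math


def kd_nm_to_pkd(kd_nm):
    """Convert K_d in nM to pK_d = -log10(K_d in M) = 9 - log10(K_d in nM)."""
    if kd_nm <= 0:
        raise ValueError("K_d must be positive")
    return 9.0 - math.log10(kd_nm)


def pkd_to_kd_nm(pkd):
    """Inverse transform: pK_d back to K_d in nM."""
    return 10.0 ** (9.0 - pkd)


# Example: the 46.20 nM predicted for saquinavir mesylate in Table 6.
pkd_example = kd_nm_to_pkd(46.20)
```

On this scale a smaller $K_d$ corresponds to a larger $pK_d$, which is why drugs are ranked by ascending $K_d$ in Table 6 but by descending KIBA score in Table 7.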

Table 6

For six SARS-CoV-2 replication-related proteins, the antiviral drugs in top 50 drugs with better affinities predicted by MRBDTA and MT-DTI based on K_d in nM

Key proteins in SARS-CoV-2	MRBDTA			MT-DTI
Key proteins in SARS-CoV-2	Antiviral drug	K_d in nM	Rank out of 3137	Antiviral drug	K_d in nM	Rank out of 3411
3C-like proteinase	Saquinavir mesylate	46.20	8
	Danoprevir	59.32	15
	Ritonavir	71.30	30
	Boceprevir	74.41	34
RNA-dependent RNA polymerase	Ritonavir	56.87	2	Grazoprevir	8.69	40
	Danoprevir	188.29	36
	Boceprevir	212.39	44
Helicase	Ritonavir	47.43	17	Remdesivir	6.48	32
	Saquinavir mesylate	50.26	21
	Boceprevir	58.50	30
3′-to-5′ exonuclease	Saquinavir mesylate	53.52	19	Simeprevir	13.40	32
	Danoprevir	58.97	23
	Ritonavir	65.27	30
	Boceprevir	76.09	41
EndoRNAse	Saquinavir mesylate	47.15	11	Efavirenz	34.19	50
	Ritonavir	64.54	29
	Boceprevir	69.04	32
	Danoprevir	71.21	36
2′-O-ribose methyltransferase	Saquinavir mesylate	44.61	8	Remdesivir	134.39	40
	Ritonavir	64.11	27	Dolutegravir	153.73	46
	Boceprevir	70.19	35
	Danoprevir	76.67	41

Discussion

The identification of drug–target binding affinities is a crucial step in drug discovery. Computational methods for predicting binding affinities have gradually emerged, and some pharmaceutical companies have benefited from them to some extent [50–52]. A good computational model for predicting drug–target binding affinity can both shorten the cycle and reduce the cost of drug discovery. In this research, we put forward a deep learning model named MRBDTA to predict drug–target binding affinity. Its implementation can be divided into three parts. In embedding and positional encoding, we convert the original protein FASTA and drug SMILES sequences into the embedding space and add position information to the converted data. Then, we adopt two Trans blocks to extract features from the converted proteins and drugs, respectively. In DTI learning, we integrate the protein and drug features extracted by the Trans blocks and predict the binding affinities between proteins and drugs. To evaluate the performance of our method, we tested MRBDTA on two benchmark datasets and carried out analysis experiments, an interpretability analysis and case studies. On the Davis and KIBA datasets, the MSE, CI and $r_{m}^{2}$ of MRBDTA on the test data are better than those of the 11 state-of-the-art computational models in all but one case (the MSE of GraphDTA on KIBA). The low SDs of MSE, CI and $r_{m}^{2}$ on both datasets reveal the stability of MRBDTA. We also conducted analysis experiments by replacing the Trans block with a single Trans encoder and by removing the skip connection in the Trans block, which confirmed that the Trans block and the skip connection improve the prediction accuracy, stability and reliability of MRBDTA. Then, for 1N6R and 3AQV, we performed an interpretability analysis based on the multi-head attention mechanism, which illustrates that the ability to correctly capture some of the interaction sites between a protein and a drug is one important reason for the performance of MRBDTA. In the case studies, we first used MRBDTA to predict binding affinities between FDA-approved drugs and SARS-CoV-2 replication-related proteins. Among the top 50 drugs with the best predicted affinities, our model predicts more antiviral drugs and ranks them higher than MT-DTI and MATT_DTI. Second, the trained MRBDTA was applied to predict binding affinities between 3C-like proteinase and 185 drugs. We found that most binding affinities predicted by MRBDTA were close to the experimentally measured binding affinities of the 185 drug–target pairs, which further supports the performance of MRBDTA.

MRBDTA outperforms the state-of-the-art computational models for three reasons. First, by optimizing the transformer encoder, we developed a novel module called the Trans block to extract molecule features. The Trans block takes full advantage of the multi-head attention mechanism to capture more detailed information about molecule sequences from a wider perspective. Second, in the Trans block, we introduced a skip connection at the encoder level to avoid losing global molecule features while acquiring local ones, which ensures that molecule features are captured comprehensively. Third, MRBDTA can correctly capture some of the interaction sites between proteins and drugs, which reflects its interpretability.

However, MRBDTA still has much room for improvement. First, because MRBDTA is a large model with many parameters, it may converge slowly when applied to larger datasets. In the future, we will further optimize MRBDTA and reduce its number of parameters as much as possible while preserving its performance. Second, all proteins used in this study are shorter than 1500 amino acids, so the performance of MRBDTA on proteins longer than 1500 amino acids is unknown. As more training data that include long-sequence proteins become available, the generalization performance of MRBDTA should improve further. Last but not least, we explain the biological interpretability of MRBDTA from the perspective of the protein sequence, but a protein and a drug interact in three-dimensional space. As more protein tertiary structures are reported and protein structure prediction models such as AlphaFold2 [53], RoseTTAFold [54], trRosetta [55] and MMpred [56] emerge, the available protein tertiary structure data will increase further. In future research, we will work on introducing protein tertiary structure data to further improve the interpretability of MRBDTA.

The future direction of drug–target binding affinity prediction is mainly reflected in three aspects. First, drug and protein tertiary structure data will be widely utilized in predicting drug–target binding affinity. Second, researchers will introduce more different deep learning models to predict drug–target binding affinity, and some deep learning models may achieve excellent performance in this field. Third, the biological interpretability of computational models for drug–target binding affinity prediction will be further enhanced.

Table 7

For six SARS-CoV-2 replication-related proteins, the antiviral drugs in top 50 drugs with better affinities predicted by MRBDTA and MATT_DTI based on KIBA score

Key proteins in SARS-CoV-2	MRBDTA: antiviral drug	MRBDTA: KIBA score	MRBDTA: rank out of 3137	MATT_DTI: antiviral drug	MATT_DTI: KIBA score	MATT_DTI: rank out of 3137
3C-like proteinase	Daclatasvir (BMS-790052)	13.9089	4	Peramivir	12.1797	25
3C-like proteinase	Ritonavir	13.4445	21	Lopinavir	12.0090	45
RNA-dependent RNA polymerase	Daclatasvir (BMS-790052)	13.4401	8			
RNA-dependent RNA polymerase	Ritonavir	12.6714	29			
RNA-dependent RNA polymerase	Entecavir	12.5049	45			
Helicase	Daclatasvir (BMS-790052)	13.9021	3			
Helicase	Ritonavir	13.3434	18			
3′-to-5′ exonuclease	Daclatasvir (BMS-790052)	13.7957	5	Zanamivir	12.2460	30
3′-to-5′ exonuclease	Ritonavir	13.3427	19	Peramivir	12.1969	33
3′-to-5′ exonuclease				Saquinavir	12.1460	37
EndoRNAse	Daclatasvir (BMS-790052)	13.8885	4	Peramivir	12.0356	37
EndoRNAse	Ritonavir	13.4665	19			
2′-O-ribose methyltransferase	Daclatasvir (BMS-790052)	13.9041	4	Peramivir	12.2849	21
2′-O-ribose methyltransferase	Ritonavir	13.4015	24	Zanamivir	12.0110	46

Materials and methods

Benchmark datasets

In our research, the Davis and KIBA datasets are used as benchmark datasets [57, 58]. Specifically, the Davis dataset involves interactions between 442 kinase proteins and 68 inhibitors (drugs), measured by the dissociation constant (K_d) value. In line with literature [32], the K_d values are logarithmically transformed into pK_d as binding affinities, which is described as follows:

$$pK_{d} = -\log_{10}\left(\frac{K_{d}}{10^{9}}\right) \qquad (1)$$
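The transformation in Equation (1) can be illustrated with a short sketch. It assumes that K_d is supplied in nM, which is what the division by 10^9 suggests; the function name `kd_nm_to_pkd` and the constant `NM_PER_M` are illustrative and not taken from the MRBDTA code.

```python
import math

NM_PER_M = 1e9  # nanomolar units per molar unit (assumed unit of the input K_d)


def kd_nm_to_pkd(kd_nm):
    """Convert a dissociation constant in nM to pK_d, following Equation (1)."""
    if kd_nm <= 0:
        raise ValueError("K_d must be positive")
    return -math.log10(kd_nm / NM_PER_M)


# A weak binder with K_d = 10 000 nM maps to a pK_d of about 5,
# while a strong binder with K_d = 1 nM maps to a pK_d of about 9.
weak = kd_nm_to_pkd(10000.0)
strong = kd_nm_to_pkd(1.0)
```

Under this transformation, a larger pK_d corresponds to a smaller K_d, that is, to tighter binding.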

The KIBA dataset originally consisted of 467 proteins, 52 498 drugs and the KIBA scores between these proteins and drugs. KIBA scores measure kinase inhibitor bioactivities and are treated as binding affinities. To balance samples, He et al. [23] filtered the original KIBA dataset to retain only proteins and drugs with at least 10 interactions, eventually yielding 229 proteins and 2111 drugs. Table 8 summarizes the details of the two benchmark datasets, including the number of proteins and drugs, the number of interactions between proteins and drugs, and the sizes of the training set and the test set.

Table 8

Details of two benchmark datasets

Details	Davis	KIBA
Proteins	442	229
Drugs	68	2111
Interactions	30 056	118 254
Training set	25 046	98 545
Test set	5010	19 709

Molecule Representation Block-based Drug-Target binding Affinity prediction

As demonstrated in Figure 2, based on the embedding layer, the positional encoding, the encoder module of transformer [59], the skip connection and the feed-forward layer, we proposed a novel deep learning model MRBDTA to predict drug–target binding affinities. MRBDTA is composed of three parts: embedding and positional encoding, Trans block and DTI learning. In detail, the original protein FASTA and drug SMILES sequences are encoded during embedding and positional encoding process. Next, two Trans blocks are exploited to extract features from encoded proteins and drugs, respectively. Eventually, the interaction learning module is employed to integrate features of proteins and drugs extracted by Trans block and predict binding affinities between proteins and drugs.

Figure 2

Illustration of MRBDTA proposed in this study. The architecture of MRBDTA is shown in (A). Especially, the architecture of MRBDTA consists of three parts: embedding and positional encoding, Trans block and DTI learning. (B) The implementation steps of scaled dot-product attention with mask operation. The principle of multi-head attention mechanism is given in (C). Here, multi-head attention is composed of four attention layers running in parallel. (D) and (E) describe the details of Trans encoder and L-Trans encoder in MRBDTA, respectively.

Embedding and positional encoding

Adopting the same approach used in most prediction models for drug–target binding affinities, we regard the original protein FASTA and drug SMILES sequences as the inputs of MRBDTA. A protein FASTA sequence is composed of different amino acids. In our research, a protein P is defined as follows:

$$P = \{p_{1}, p_{2}, \ldots, p_{i}, \ldots, p_{n_{P}}\}, \quad p_{i} \in N^{p} \qquad (2)$$

where $p_{i}$ represents the ith amino acid and $N^{p}$ is the amino acid set comprising 25 common amino acids. The sequence length $n_{P}$ varies from protein to protein. Here, we defined a hyperparameter l to denote the length of the largest protein in our study. Inspired by transformer [59], we first implemented the embedding for all amino acids in a protein P. The embedding layer has a trainable weight $W^{P} \in R^{v \times e}$ and produces the output $E^{P} \in R^{l \times e}$, where v is the size of the above-mentioned amino acid set and e is the embedding size of the amino acid. To add the relative or absolute position information for each amino acid in a protein P, we carried out the positional encoding.

$PE^{P} \in R^{l \times d}$ denotes the output of the positional encoding for all amino acids in a protein P and is defined by Equation (3)

$$\begin{aligned} PE^{P}(i, 2j) &= \sin\left(i / 10000^{2j/d}\right) \\ PE^{P}(i, 2j+1) &= \cos\left(i / 10000^{2j/d}\right) \end{aligned} \qquad i = 1, 2, 3, \ldots, n_{P}\ (n_{P} \leq l), \quad j = 0, 1, 2, \ldots, \frac{d}{2} - 1 \qquad (3)$$

where d is the positional encoding size of the amino acid. $PE^{P}(i, :)$ is the ith row of the matrix $PE^{P}$ and represents the positional encoding of the ith amino acid in a protein P. Note that if $n_{P} < l$, the elements from row $n_{P} + 1$ to row l in $PE^{P}$ are zero. For a protein P, we set the positional encoding size equal to the embedding size (d = e), so that $PE^{P}$ and $E^{P}$ can be added directly. $X^{P}$ denotes the output of a protein P processed by the embedding and positional encoding and is defined by Equation (4).

$$X^{P} = E^{P} + PE^{P}, \quad X^{P} \in R^{l \times e} \qquad (4)$$
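A minimal sketch of Equations (3) and (4) in plain Python may make the padding behaviour concrete. The function names, the toy sizes and the all-zero placeholder for $E^{P}$ are assumptions made for illustration; a trained model would use the learned embedding instead. The loop over j stops at d/2 - 1 so that column 2j + 1 stays inside the d columns.

```python
import math


def positional_encoding(n_p, l, d):
    """Build PE^P as an l x d nested list following Equation (3).

    Rows 1..n_p receive sinusoidal codes; rows n_p+1..l stay zero (padding).
    """
    if n_p > l or d % 2 != 0:
        raise ValueError("need n_p <= l and an even d")
    pe = [[0.0] * d for _ in range(l)]
    for i in range(1, n_p + 1):        # residue positions are 1-based in the text
        for j in range(d // 2):        # j = 0 .. d/2 - 1
            angle = i / (10000 ** (2 * j / d))
            pe[i - 1][2 * j] = math.sin(angle)
            pe[i - 1][2 * j + 1] = math.cos(angle)
    return pe


def add_position(embedding, pe):
    """Element-wise sum X^P = E^P + PE^P from Equation (4); requires d == e."""
    return [[a + b for a, b in zip(e_row, p_row)]
            for e_row, p_row in zip(embedding, pe)]


# Toy sizes (hypothetical): a protein of 4 residues padded to l = 6, with e = d = 8.
l, n_p, e = 6, 4, 8
embedding = [[0.0] * e for _ in range(l)]   # placeholder standing in for E^P
x_p = add_position(embedding, positional_encoding(n_p, l, e))
```

The same two functions would apply to a drug C by substituting z for l and $m_{C}$ for $n_{P}$.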

In the same way as for a protein P, a drug C is defined by Equation (5)

$$C = \{c_{1}, c_{2}, \ldots, c_{i}, \ldots, c_{m_{C}}\}, \quad c_{i} \in N^{c} \qquad (5)$$

where $c_{i}$ is the ith SMILES character and $N^{c}$ is the SMILES set comprising 62 SMILES characters. The SMILES length $m_{C}$ of a drug C varies. We also defined a hyperparameter z to denote the length of the largest drug in our study. $X^{C}$ represents the output of a drug C processed by the embedding and positional encoding and is defined by Equation (6)

$$X^{C} = E^{C} + PE^{C}, \quad X^{C} \in R^{z \times u} \qquad (6)$$

where u is the embedding size of the SMILES character. Here, the amino acid and the SMILES character have the same embedding size (u = e). $E^{C} \in R^{z \times u}$ is the output of the embedding for all SMILES characters in a drug C. $PE^{C} \in R^{z \times r}$ (r = u) is the output of the positional encoding for all SMILES characters in a drug C, where r is the positional encoding size of the SMILES character.

Trans block

The internal details of the Trans block are shown in Figure 2A. Concretely, the Trans block includes one L-Trans encoder, two parallel Trans encoders, the skip connection drawn as a red line and the concat operation. The input of the Trans block is the output $X^{P}$ ($X^{C}$) of a protein P (drug C) processed by the embedding and positional encoding. The concatenation of the outputs of the two parallel Trans encoders is treated as the output of the Trans block, which is denoted as $X^{PA} \in R^{l \times 2e}$ or $X^{CA} \in R^{z \times 2u}$. Here, the scaled dot-product attention layer is the fundamental layer in the Trans block. By integrating the linear layer, the scaled dot-product attention layer and the concat operation, we first obtained the multi-head attention layer. Then, based on the linear layer, the multi-head attention layer, the residual connection, the layer normalization and the feed-forward layer, we constructed the Trans encoder and the L-Trans encoder, respectively. Finally, using the L-Trans encoder, the Trans encoder, the skip connection and the concat operation, we developed the Trans block to extract features from encoded proteins and drugs. The detailed implementation steps of the scaled dot-product attention layer, the multi-head attention layer, Trans block and L-Trans encoder are as follows.

In Figure 2B, a scaled dot-product attention layer can be described as mapping a query (Q) and a set of key-value (K-V) pairs to an output. In detail, the input of the scaled dot-product attention layer in our model contains matrices $Q_{L}$ and $K_{L}$ of dimension $d_{k}$ and a matrix $V_{L}$ of dimension $d_{v}$ ($d_{k} = d_{v} = 0.25e$). Here, $Q_{L} = K_{L} = V_{L}$ is the matrix obtained from performing linear projection on the input matrix of Trans block with the linear layer. As shown in Figure 2B, we performed the MatMul operation to compute the dot product of $Q_{L}$ with $K_{L}$, carried out the Scale operation of dividing the dot product by $\sqrt{d_{k}}$ and applied the SoftMax operation to obtain weights on $V_{L}$. Before performing the SoftMax operation, we implemented the Mask operation of replacing zeros in the feature matrix of protein or compound with negative bias $-\infty$ to avoid invalid calculations resulting from the softmax function. Finally, we performed the MatMul operation of calculating the dot product of $V_{L}$ and the weights on $V_{L}$ to get the matrix $\mathrm{Attention}(Q_{L}, K_{L}, V_{L})$, which represents the output of a scaled dot-product attention layer and is defined by Equation (7)

$$\mathrm{Attention}(Q_{L}, K_{L}, V_{L}) = \mathrm{softmax}\left(\frac{Q_{L} K_{L}^{T}}{\sqrt{d_{k}}}\right) V_{L} \qquad (7)$$

where $Q_{L} \in R^{n \times d_{k}}$, $K_{L} \in R^{n \times d_{k}}$ and $V_{L} \in R^{n \times d_{v}}$. Here, n is the length of the largest protein or drug in our study (n = l or n = z).

As depicted in Figure 2C, the multi-head attention layer in our model is made up of 4 scaled dot-product attention layers running in parallel, 13 linear layers and the concat operation. Here, $Q = K = V$ is the input $X^{P}$ or $X^{C}$ of Trans block. Firstly, the e-dimensional matrices Q, K and V were linearly projected h (h = 4) times with linear layers to obtain 4 $Q_{L}$ matrices, 4 $K_{L}$ matrices and 4 $V_{L}$ matrices, respectively. Then, we utilized the scaled dot-product attention layer to process $Q_{L}$, $K_{L}$ and $V_{L}$, yielding the output $\mathrm{head}_{i}$ of the ith scaled dot-product attention layer (i = 1, 2, 3, 4). The $\mathrm{head}_{i}$ is defined by Equation (8)

$$\mathrm{head}_{i} = \mathrm{Attention}(Q W_{i}^{Q}, K W_{i}^{K}, V W_{i}^{V}) \qquad (8)$$

where $W_{i}^{Q} \in R^{e \times d_{k}}$, $W_{i}^{K} \in R^{e \times d_{k}}$ and $W_{i}^{V} \in R^{e \times d_{v}}$ are linear projection matrices. Finally, the outputs of the four scaled dot-product attention layers were concatenated and delivered into a linear layer, resulting in the output $\mathrm{MultiHead}(Q, K, V)$ of the multi-head attention layer. The $\mathrm{MultiHead}(Q, K, V)$ is defined by Equation (9)

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_{1}, \ldots, \mathrm{head}_{h}) W^{O} \qquad (9)$$

where $W^{O} \in R^{h d_{v} \times e}$ is a linear projection matrix.
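Equations (7) to (9) can be traced with a small, dependency-free sketch. It is an illustration under stated assumptions, not the MRBDTA implementation: the helper names (`matmul`, `attention`, `multi_head`), the random toy weights and the bias-free projections are hypothetical, and the Mask step is read here as assigning $-\infty$ to the scores of padded key positions, flagged by a list of booleans.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax; entries equal to -inf receive weight 0."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [v / total for v in exps]


def attention(q, k, v, key_is_padding):
    """Equation (7) with the Mask step applied before the softmax."""
    d_k = len(q[0])
    k_t = [list(col) for col in zip(*k)]          # K^T
    weights = []
    for row in matmul(q, k_t):                    # MatMul
        scaled = [(-math.inf if pad else s / math.sqrt(d_k))   # Scale + Mask
                  for s, pad in zip(row, key_is_padding)]
        weights.append(softmax(scaled))           # SoftMax
    return matmul(weights, v)                     # MatMul with V


def multi_head(x, key_is_padding, heads, w_o):
    """Equations (8)-(9); heads is a list of (W_q, W_k, W_v) triples, Q = K = V = x."""
    outputs = [attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v), key_is_padding)
               for w_q, w_k, w_v in heads]
    concat = [sum((head[i] for head in outputs), []) for i in range(len(x))]
    return matmul(concat, w_o)


def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy sizes (hypothetical): n = 5 positions, e = 8, h = 4 heads, d_k = d_v = e / 4.
random.seed(0)
n, e, h = 5, 8, 4
d_k = e // h
x = rand_matrix(n, e)
heads = [(rand_matrix(e, d_k), rand_matrix(e, d_k), rand_matrix(e, d_k)) for _ in range(h)]
w_o = rand_matrix(h * d_k, e)
padding = [False, False, False, True, True]       # last two positions are padding
out = multi_head(x, padding, heads, w_o)          # shape n x e
```

The four projection triples plus $W^{O}$ give 4 × 3 + 1 = 13 linear maps, which matches the count of linear layers stated for Figure 2C.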

Figures 2D and 2E describe the details of the Trans encoder and the L-Trans encoder, respectively. The Trans encoder in our model is the same as the encoder in transformer. Specifically, the Trans encoder contains two sub-layers (a multi-head attention layer and a position-wise feed-forward layer), the residual connection (the Add operation) and the layer normalization (the Norm operation). The multi-head attention layer has been described in the preceding paragraph. The position-wise feed-forward layer is made up of two linear transformations with a ReLU activation in between [60]. In addition, we applied a residual connection around each of the two sub-layers, followed by layer normalization [61–63]. Unlike the Trans encoder, the L-Trans encoder, which can considerably enhance the robustness of our deep learning model, was constructed by adding a linear layer at the beginning of the Trans encoder.
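The feed-forward sub-layer with its Add and Norm operations can be sketched as follows. This is a simplified illustration: the layer normalization omits the learned gain and bias, dropout is left out, and the function names, toy sizes and random weights are assumptions rather than details from the MRBDTA code.

```python
import math
import random


def layer_norm(vec, eps=1e-5):
    """Normalise one position's features to zero mean and unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def linear(vec, weight, bias):
    """Row vector (length a) times weight (a x b) plus bias (length b)."""
    return [sum(v * w for v, w in zip(vec, col)) + b
            for col, b in zip(zip(*weight), bias)]


def feed_forward(vec, w1, b1, w2, b2):
    """Position-wise feed-forward layer: linear, ReLU, linear."""
    hidden = [max(0.0, h) for h in linear(vec, w1, b1)]
    return linear(hidden, w2, b2)


def add_and_norm(x, sublayer_out):
    """Residual connection (Add) followed by layer normalization (Norm), row by row."""
    return [layer_norm([a + b for a, b in zip(x_row, s_row)])
            for x_row, s_row in zip(x, sublayer_out)]


def encoder_ffn_sublayer(x, w1, b1, w2, b2):
    """Apply the feed-forward sub-layer to every position, then Add and Norm."""
    return add_and_norm(x, [feed_forward(row, w1, b1, w2, b2) for row in x])


# Toy sizes (hypothetical): 2 positions, e = 4, feed-forward width 8.
random.seed(1)
e, d_ff = 4, 8
w1 = [[random.uniform(-1.0, 1.0) for _ in range(d_ff)] for _ in range(e)]
b1 = [0.0] * d_ff
w2 = [[random.uniform(-1.0, 1.0) for _ in range(e)] for _ in range(d_ff)]
b2 = [0.0] * e
x = [[0.5, -1.0, 2.0, 0.0], [1.0, 1.0, -1.0, 3.0]]
y = encoder_ffn_sublayer(x, w1, b1, w2, b2)
```

The multi-head attention sub-layer is wrapped by the same `add_and_norm` step, and an L-Trans encoder would, by the description above, apply one extra linear layer to its input first.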

In summary, relying on Trans encoder and L-Trans encoder, we introduced the skip connection and the concat operation to develop Trans block. It should be noted that the skip connection is created through adding the input of the L-Trans encoder to the output of the L-Trans encoder at encoder level. As the most important component of MRBDTA, Trans block can extract effective features from encoded proteins and drugs, respectively. Besides, in DTI learning, the extracted features will be utilized to make predictions for drug–target binding affinity.

DTI learning

As depicted in Figure 2A, the interaction learning module contains the concat operation, two feed-forward layers and a linear layer. Each feed-forward layer consists, in order, of a linear layer, a layer normalization (the Norm operation), a dropout layer and a ReLU activation [64]. Here, we first summed each column of $X^{PA} \in R^{l \times 2e}$ and $X^{CA} \in R^{z \times 2u}$, which are the outputs of the two Trans blocks (one for a protein P, the other for a drug C), over all sequence positions to get $X^{PSA} \in R^{1 \times 2e}$ and $X^{CSA} \in R^{1 \times 2u}$, respectively. Then, $X^{A} \in R^{1 \times 4e}$, obtained by concatenating $X^{PSA}$ and $X^{CSA}$, was fed in turn into the two feed-forward layers and the linear layer to produce the output $Y^{A}$ of the interaction learning module. $Y^{A}$ is regarded as the predicted binding affinity between a protein P and a drug C. In addition, the features of a protein P and a drug C extracted by the Trans blocks are fused in the interaction learning module. From the perspective of molecular biology, the interaction learning module can be regarded as a simulation of the chemical reaction between a protein P and a drug C.
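The pooling and concatenation step that precedes the feed-forward layers can be checked with a few lines of Python. The sketch only covers the shape bookkeeping; the function names and the constant toy matrices are illustrative assumptions, and the subsequent feed-forward layers are not reproduced.

```python
def column_sums(matrix):
    """Sum over sequence positions: an n x c matrix becomes a length-c vector."""
    return [sum(col) for col in zip(*matrix)]


def fuse(x_pa, x_ca):
    """Concatenate pooled protein and drug features into X^A (length 2e + 2u)."""
    return column_sums(x_pa) + column_sums(x_ca)


# Toy sizes (hypothetical): l = 6, z = 4 and e = u = 3, so each Trans block outputs 2e columns.
l, z, e = 6, 4, 3
x_pa = [[1.0] * (2 * e) for _ in range(l)]   # stands in for X^{PA}
x_ca = [[2.0] * (2 * e) for _ in range(z)]   # stands in for X^{CA}
x_a = fuse(x_pa, x_ca)                       # length 4e
```

With the embedding size of 128 listed in Table 1, $X^{A}$ would have 4 × 128 = 512 entries before entering the layers whose sizes Table 1 gives as 2048, 512 and 1.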

Key points

MRBDTA is composed of three parts: embedding and positional encoding, Trans block and drug–target interaction learning.
The Trans block is constructed through improving the encoder of transformer and introducing skip connection at encoder level.
Compared with 11 state-of-the-art computational models, MRBDTA achieves almost the best performance in both Davis and KIBA datasets.
Based on multi-head attention mechanism, we performed the interpretability experiment to elaborate that MRBDTA can correctly capture part of interaction sites between proteins and drugs, which further explains the excellent performance of MRBDTA.
We applied MRBDTA to predict binding affinities (KIBA score and K_d in nM) between 3137 FDA-approved drugs and SARS-CoV-2 replication-related proteins. Through comparing MRBDTA with MT-DTI and MATT_DTI, we observed that among top 50 drugs with better predicted affinities, MRBDTA can predict more antiviral drugs and their rankings are higher.

Funding

National Natural Science Foundation of China (under grant no. 61972399 to X.C.).

Data and code availability

The supporting data for this study and the implementation codes of MRBDTA are available online at https://github.com/LiZhang30/MRBDTA.

Author Biographies

Li Zhang is a PhD student at the School of Information and Control Engineering, China University of Mining and Technology. His research interests include bioinformatics, drug discovery, neural networks and deep learning.

Chun-Chun Wang is a PhD student at the School of Information and Control Engineering, China University of Mining and Technology. His research interests include bioinformatics, complex network algorithms and machine learning.

Xing Chen, PhD, is a professor at China University of Mining and Technology. He is the associate dean of the Artificial Intelligence Research Institute, China University of Mining and Technology. He is also the founding director of the Institute of Bioinformatics, China University of Mining and Technology and of the Big Data Research Center, China University of Mining and Technology. His research interests include complex disease-related non-coding RNA biomarker prediction, computational models for drug discovery and early detection of human complex disease based on big data and artificial intelligence algorithms.

References

1. Li XH, Babu MM. Human diseases from gain-of-function mutations in disordered protein regions. Cell 2018;175:40–2.

2. Mullard A. 2020 FDA drug approvals. Nat Rev Drug Discov 2021;20:85–90.

3. Paul SM, Mytelka DS, Dunwiddie CT, et al. How to improve R&D productivity: the pharmaceutical industry's grand challenge. Nat Rev Drug Discov 2010;9:203–14.

4. Kola I, Landis J. Can the pharmaceutical industry reduce attrition rates? Nat Rev Drug Discov 2004;3:711–5.

5. Stokes JM, Yang K, Swanson K, et al. A deep learning approach to antibiotic discovery. Cell 2020;180:688–702.

6. Chen X, Yan CC, Zhang X, et al. Drug-target interaction prediction: databases, web servers and computational models. Brief Bioinform 2016;17:696–712.

7. Sadybekov AA, Sadybekov AV, Liu Y, et al. Synthon-based ligand discovery in virtual libraries of over 11 billion compounds. Nature 2022;601:452–9.

8. Sun L, Li P, Ju X, et al. In vivo structural characterization of the SARS-CoV-2 RNA genome identifies host proteins vulnerable to repurposed drugs. Cell 2021;184:1865–83.

9. Lago SG, Tomasik J, van Rees GF, et al. Drug discovery for psychiatric disorders using high-content single-cell screening of signaling network responses ex vivo. Sci Adv 2019;5:eaau9093.

10. Reker D, Bernardes GJL, Rodrigues T. Computational advances in combating colloidal aggregation in drug discovery. Nat Chem 2019;11:402–18.

11. Chen X, Xie D, Zhao Q, et al. MicroRNAs and complex diseases: from experimental results to computational models. Brief Bioinform 2019;20:515–39.

12. D'Souza S, Prema KV, Balaji S. Machine learning models for drug-target interactions: current knowledge and future directions. Drug Discov Today 2020;25:748–56.

13. Yang Z, Zhong W, Zhao L, et al. MGraphDTA: deep multiscale graph neural network for explainable drug-target binding affinity prediction. Chem Sci 2022;13:816–33.

14. Chen X, Yan CC, Zhang X, et al. Long non-coding RNAs and complex diseases: from experimental results to computational models. Brief Bioinform 2017;18:558–76.

15. Srivastava PK, van Eyll J, Godard P, et al. A systems-level framework for drug discovery identifies Csf1R as an anti-epileptic drug target. Nat Commun 2018;9:3561.

16. Ye Q, Hsieh CY, Yang Z, et al. A unified drug-target interaction prediction framework based on knowledge graph and recommendation system. Nat Commun 2021;12:6775.

17. Luo Y, Zhao X, Zhou J, et al. A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nat Commun 2017;8:573.

18. Madhukar NS, Khade PK, Huang L, et al. A Bayesian machine learning approach for drug target identification using diverse data types. Nat Commun 2019;10:5221.

19. Clarelli F, Palmer A, Singh B, et al. Drug-target binding quantitatively predicts optimal antibiotic dose levels in quinolones. PLoS Comput Biol 2020;16:e1008106.

20. Piazza I, Beaton N, Bruderer R, et al. A machine learning-based chemoproteomic approach to identify drug targets and binding sites in complex proteomes. Nat Commun 2020;11:4200.

21. Li S, Wan F, Shu H, et al. MONN: a multi-objective neural network for predicting compound-protein interactions and affinities. Cell Syst 2020;10:308–22.

22. Pahikkala T, Airola A, Pietilä S, et al. Toward more realistic drug-target interaction predictions. Brief Bioinform 2015;16:325–37.

23. He T, Heidemeyer M, Ban F, et al. SimBoost: a read-across approach for predicting drug-target binding affinities using gradient boosting machines. J Chem 2017;9:24.

24. Öztürk H, Özgür A, Ozkirimli E. DeepDTA: deep drug-target binding affinity prediction. Bioinformatics 2018;34:i821–9.

25. Öztürk H, Ozkirimli E, Özgür A. WideDTA: prediction of drug-target binding affinity. arXiv preprint 2019;arXiv:1902.04166.

26. Wan F, Zhu Y, Hu H, et al. DeepCPI: a deep learning-based framework for large-scale in silico drug screening. Genomics Proteomics Bioinformatics 2019;17:478–95.

27. Lin X. DeepGS: deep representation learning of graphs and sequences for drug-target binding affinity prediction. Eur Conf Artif Intell (ECAI) 2020;325:1301–8.

28. Pu Y, Li J, Tang J, et al. DeepFusionDTA: drug-target binding affinity prediction with information fusion and hybrid deep-learning ensemble model. IEEE/ACM Trans Comput Biol Bioinform 2022;19:2760–9.

29. Shin B, Park S, Kang K, et al. Self-attention based molecule representation for predicting drug-target interaction. arXiv preprint 2019;arXiv:1908.06760.

30. Abbasi K, Razzaghi P, Poso A, et al. DeepCDA: deep cross-domain compound-protein affinity prediction through LSTM and convolutional neural networks. Bioinformatics 2020;36:4633–42.

31. Nguyen T, Le H, Quinn TP, et al. GraphDTA: predicting drug-target binding affinity with graph neural networks. Bioinformatics 2021;37:1140–7.

32. Zeng Y, Chen X, Luo Y, et al. Deep drug-target binding affinity prediction with multiple attention blocks. Brief Bioinform 2021;22:bbab117.

33. Ding SHH, Fung BCM, Iqbal F, et al. Learning stylometric representations for authorship analysis. IEEE Trans Cybern 2019;49:107–21.

34. Manica M, Mathis R, Cadow J, et al. Context-specific interaction networks from vector representation of words. Nat Mach Intell 2019;1:181–90.

35. Costa-jussà RM. An analysis of gender bias studies in natural language processing. Nat Mach Intell 2019;1:495–6.

36. Papadimitriou CH, Raghavan P, Tamaki H, et al. Latent semantic indexing: a probabilistic analysis. J Comput Syst Sci 1998;61:217–35.

37. Zhang S, Tian Q, Hua G, et al. Generating descriptive visual words and visual phrases for large-scale image applications. IEEE Trans Image Process 2011;20:2664–77.

38. Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions. In: 2016 International Conference on Learning Representation (ICLR). San Juan, Puerto Rico, 2016. OpenReview.net, Amherst, Massachusetts, USA.

39. Lin T, Wang Y, Liu X, et al. A Survey of Transformers. arXiv preprint 2021;arXiv:2106.04554.

40. Landrum G. RDKit: open-source cheminformatics. Release 2014.03.1. arXiv preprint 2010;arXiv:1908.06760.

41. Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. In: 2017 International Conference on Learning Representation (ICLR). Toulon, France, 2017. OpenReview.net, Amherst, Massachusetts, USA.

42. Veličković P, Cucurull G, Casanova A, et al. Graph attention networks. In: 2018 International Conference on Learning Representation (ICLR). Vancouver, Canada, 2018. OpenReview.net, Amherst, Massachusetts, USA.

43. Xu K, Hu W, Leskovec J, et al. How powerful are graph neural networks? In: 2019 International Conference on Learning Representation (ICLR). New Orleans, USA, 2019. OpenReview.net, Amherst, Massachusetts, USA.

44. Mithat G, Glenn HJB. Concordance probability and discriminatory power in proportional hazards regression. Biometrika 2005;92:965–70.

45. Roy K, Chakraborty P, Mitra I, et al. Some case studies on application of "r(m)2" metrics for judging quality of quantitative structure-activity relationship predictions: emphasis on scaling of response data. J Comput Chem 2013;34:1071–82.

46. Riva L, Yuan S, Yin X, et al. Discovery of SARS-CoV-2 antiviral drugs through large-scale compound repurposing. Nature 2020;586:113–9.

47. Dittmar M, Lee JS, Whig K, et al. Drug repurposing screens reveal cell-type-specific entry pathways and FDA-approved drugs active against SARS-Cov-2. Cell Rep 2021;35:108959.

48. Li XS, Liu X, Lu L, et al. Multiphysical graph neural network (MP-GNN) for COVID-19 drug design. Brief Bioinform 2022;23:bbac231.

49. Nguyen DD, Gao K, Chen J, et al. Unveiling the molecular mechanism of SARS-CoV-2 main protease inhibition from 137 crystal structures using algebraic topology and deep learning. Chem Sci 2020;11:12036–46.

50. Méndez-Lucio O, Baillif B, Clevert DA, et al. De novo generation of hit-like molecules from gene expression signatures using artificial intelligence. Nat Commun 2020;11:10.

51. Bagherian M, Sabeti E, Wang K, et al. Machine learning approaches and databases for prediction of drug-target interaction: a survey paper. Brief Bioinform 2021;22:247–69.

52. Rifaioglu AS, Atas H, Martin MJ, et al. Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases. Brief Bioinform 2019;20:1878–912.

53. Jumper J, Evans R, Pritzel A, et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021;596:583–9.

54. Baek M, DiMaio F, Anishchenko I, et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 2021;373:871–6.

55. Yang J, Anishchenko I, Park H, et al. Improved protein structure prediction using predicted interresidue orientations. Proc Natl Acad Sci U S A 2020;117:1496–503.

56. Zhao KL, Liu J, Zhou XG, et al. MMpred: a distance-assisted multimodal conformation sampling for de novo protein structure prediction. Bioinformatics 2021;37:4350–6.

57. Davis MI, Hunt JP, Herrgard S, et al. Comprehensive analysis of kinase inhibitor selectivity. Nat Biotechnol 2011;29:1046–51.

58. Tang J, Szwajda A, Shakyawar S, et al. Making sense of large-scale kinase inhibitor bioactivity data sets: a comparative and integrative analysis. J Chem Inf Model 2014;54:735–43.

59. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Adv Neural Inf Process Syst 2017;30:5998–6008.

60. Dittmer S, King EJ, Maass P. Singular values for ReLU layers. IEEE Trans Neural Netw Learn Syst 2020;31:3594–605.

61. He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016:770–8.

62. Lu N, Yu W, Qi X, et al. MASTER: multi-aspect non-local network for scene text recognition. Pattern Recogn 2019;117:107980.

63. Ba JL, Kiros JR, Hinton GE. Layer normalization. arXiv preprint 2016;arXiv:1607.06450.