Abstract

The discovery of drug–target interactions (DTIs) is a highly promising area of research. The accurate identification of reliable interactions between drugs and proteins via computational methods, which typically leverage heterogeneous information retrieved from diverse data sources, can boost the development of effective pharmaceuticals. Although random walk and matrix factorization techniques are widely used in DTI prediction, they have several limitations. Random walk-based embedding generation is usually conducted in an unsupervised manner, while the linear similarity combination in matrix factorization distorts the individual insights offered by different views. To tackle these issues, we take a multi-layered network approach to handle diverse drug and target similarities, and propose a novel optimization framework, called Multiple similarity DeepWalk-based Matrix Factorization (MDMF), for DTI prediction. The framework unifies embedding generation and interaction prediction, learning vector representations of drugs and targets that not only retain high-order proximity across all hyper-layers and layer-specific local invariance, but also approximate the interactions with their inner product. Furthermore, we develop an ensemble method (MDMF2A) that integrates two instantiations of the MDMF model, optimizing the area under the precision-recall curve (AUPR) and the area under the receiver operating characteristic curve (AUC), respectively. The empirical study on real-world DTI datasets shows that our method achieves statistically significant improvement over current state-of-the-art approaches in four different settings. Moreover, the validation of highly ranked non-interacting pairs demonstrates the potential of MDMF2A to discover novel DTIs.

Introduction

The main objective of the drug discovery process is to identify drug–target interactions (DTIs) among numerous candidates. Although in vitro experimental testing can verify DTIs, it suffers from extremely high time and monetary costs. Computational (in silico) methods employ machine learning techniques [1], such as matrix factorization (MF) [2], kernel-based models [3], graph/network embedding [4] and deep learning [5], to efficiently infer a small number of candidate drugs. This vastly shrinks the search scope and reduces the workload of experiment-based verification, thereby significantly accelerating the drug discovery process.

In the past, the chemical structure of drugs and the protein sequence of targets were the main source of information for inferring candidate DTIs [6–8]. Recently, with the advancements in clinical medical technology, abundant drug- and target-related biological data from multifaceted sources are exploited to boost the accuracy of DTI prediction. Some MF- and kernel-based methods utilize multiple types of drug and target similarities derived from heterogeneous information by integrating them into a single drug and target similarity [3, 9–11], but in doing so discard the distinctive information possessed by each similarity view.

In contrast, network-based approaches consider the diverse drug and target data as a (multiplex) heterogeneous DTI network that describes multiple aspects of drug and target relations, and learn topology-preserving representations of drugs and targets to facilitate DTI prediction. With deep neural networks showing consistently superior performance in recent years across a plethora of learning tasks, their adoption in the DTI prediction field, especially for inferring new DTIs by mining DTI networks, is understandably rising [5, 12–14]. Although deep learning models achieve improved performance, they require larger amounts of data and are computationally intensive [1]. In addition, most deep learning models are sensitive to noise [15]. This is particularly important in DTI prediction, since many interactions in the bipartite network of drugs and targets remain undiscovered, making the observed labels inherently noisy [2, 6, 16].

Apart from deep learning, another type of network-based model widely used in DTI prediction computes graph embeddings based on random walks [4, 17]. Although these methods can model high-order node proximity efficiently, they typically perform embedding generation and interaction prediction as two independent tasks. Hence, their embeddings are learned in an unsupervised manner, without being guided by the interaction information they are meant to predict.

Random walk embedding methods essentially factorize a matrix that captures node co-occurrences within random walk sequences generated from the graph [18], which makes it possible to unify embedding generation and interaction prediction under a common MF framework. Nevertheless, the MF method proposed in [18] that approximates DeepWalk [19] can only handle single-layer networks. Thus, it is unable to fully exploit the topology information of the multiple drug and target layers present in multiplex heterogeneous DTI networks. Furthermore, the area under the precision-recall curve (AUPR) and the area under the receiver operating characteristic curve (AUC) are two important evaluation metrics in DTI prediction, but no network-based approach directly optimizes them.

To address the issues mentioned above, we propose the formulation of a DeepWalk-based MF model, called Multiple similarity DeepWalk-based Matrix Factorization (MDMF), which incorporates multiplex heterogeneous DTI network embedding generation and DTI prediction within a unified optimization framework. It learns vector representations of drugs and targets that not only capture the multilayer network topology via factorizing the hyper-layer DeepWalk matrix with information from diverse data sources, but also preserve the layer-specific local invariance with the graph Laplacian for each drug and target similarity view. In addition, the DeepWalk matrix contains richer interaction information, exploiting high-order node proximity and implicitly recovering the possible missing interactions. Based on this formulation, we instantiate two models that leverage surrogate losses to optimize two essential evaluation measures in DTI prediction, namely AUPR and AUC. In addition, we integrate the two models to consider the maximization of both metrics. Experimental results on DTI datasets under various prediction settings show that the proposed method outperforms state-of-the-art approaches and can discover new reliable DTIs.

The rest of this article is organized as follows. Section 2 introduces some preliminaries of our work. Section 3 presents the proposed approach. Performance evaluation results and relevant discussions are offered in Section 4. Finally, Section 5 concludes this work.

Preliminaries

Problem formulation

Given a drug set |$D=\{d_i\}_{i=1}^{n_d}$| and a target set |$T=\{t_i\}_{i=1}^{n_t}$|⁠, the relation between drugs (targets) can be assessed in various aspects, which are represented by a set of similarity matrices |$\{\boldsymbol{S}^{d,h}\}_{h=1}^{m_d}$| (⁠|$\{\boldsymbol{S}^{t,h}\}_{h=1}^{m_t}$|⁠), where |$\boldsymbol{S}^{d,h} \in \mathbb{R}^{n_d \times n_d}$| (⁠|$\boldsymbol{S}^{t,h} \in \mathbb{R}^{n_t \times n_t}$|⁠) and |$m_d$| (⁠|$m_t$|⁠) is the number of relation types for drugs (targets). In addition, let the binary matrix |$\boldsymbol{Y} \in \{0,1\}^{n_d \times n_t}$| indicate the interactions between drugs in |$D$| and targets in |$T$|⁠, where |$Y_{ij}=1$| denotes that |$d_i$| and |$t_j$| interact with each other, and |$Y_{ij} = 0$| otherwise. A DTI dataset for |$D$| and |$T$| consists of |$\{\boldsymbol{S}^{d,h}\}_{h=1}^{m_d}$|⁠, |$\{\boldsymbol{S}^{t,h}\}_{h=1}^{m_t}$| and |$\boldsymbol{Y}$|⁠.

Let (⁠|$d_x$|⁠,|$t_z$|⁠) be a test drug–target pair, |$\{\boldsymbol{\bar{s}}^{d,h}_x\}^{m_d}_{h=1}$| be a set of |$n_d$|-dimensional vectors storing the similarities between |$d_x$| and |$D$|⁠, and |$\{\boldsymbol{\bar{s}}^{t,h}_z\}^{m_t}_{h=1}$| be a set of |$n_t$|-dimensional vectors storing the similarities between |$t_z$| and |$T$|⁠. A DTI prediction model outputs a real-valued score |$\hat{Y}_{xz}$| indicating the confidence of the affinity between |$d_x$| and |$t_z$|⁠. A drug |$d_x \notin D$| (target |$t_z \notin T$|) that does not belong to the training set is considered a new drug (target). There are four prediction settings according to whether the drug and target involved in the test pair are training entities [20]:

  • S1: predict the interaction between |$d_{x} \in D$| and |$t_z \in T$|⁠;

  • S2: predict the interaction between |$d_{x} \notin D$| and |$t_z \in T$|⁠;

  • S3: predict the interaction between |$d_x \in D$| and |$t_z \notin T$|⁠;

  • S4: predict the interaction between |$d_x \notin D$| and |$t_z \notin T$|⁠.

Matrix factorization for DTI prediction

In DTI prediction, MF methods typically learn two vectorized representations of drugs and targets that approximate the interaction matrix |$\boldsymbol{Y}$| by minimizing the following objective:
(1)
$$\min_{\boldsymbol{U},\boldsymbol{V}} \; \mathcal{L}(\hat{\boldsymbol{Y}},\boldsymbol{Y}) + \mathcal{R}(\boldsymbol{U},\boldsymbol{V})$$
where |$\hat{\boldsymbol{Y}} = f(\boldsymbol{U}\boldsymbol{V}^\top ) \in \mathbb{R}^{n_d \times n_t}$| is the predicted interaction matrix, |$f$| is either the identity function |$\omega $| for standard MF [2] or the element-wise logistic function |$\sigma $| for Logistic MF [7], and |$\boldsymbol{U} \in \mathbb{R}^{n_d \times r}$|⁠, |$\boldsymbol{V} \in \mathbb{R}^{n_t \times r}$| are |$r$|-dimensional drug and target latent features (embeddings), respectively. The objective in Eq. (1) includes two parts: |$\mathcal{L}(\hat{\boldsymbol{Y}},\boldsymbol{Y})$| is the loss function to evaluate the inconsistency between the predicted and ground truth interaction matrix, and |$\mathcal{R}(\boldsymbol{U},\boldsymbol{V})$| concerns the regularization of the learned embeddings.
Given a test drug–target pair (⁠|$d_x,t_z$|⁠), its prediction with a specific instantiation of |$f$| is computed based on the embeddings of |$d_x$| (⁠|$\boldsymbol{U}_x \in \mathbb{R}^{r}$|⁠) and |$t_z$| (⁠|$\boldsymbol{V}_z \in \mathbb{R}^{r}$|⁠):
(2)
$$\hat{Y}_{xz} = f\left(\boldsymbol{U}_x\boldsymbol{V}_z^\top\right)$$
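To make the notation concrete, the following is a minimal NumPy sketch of the prediction step of Eqs. (1)–(2); the toy sizes and random embeddings are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_d, n_t, r = 5, 4, 3                    # toy sizes: drugs, targets, latent dim
U = rng.normal(size=(n_d, r))            # drug embeddings
V = rng.normal(size=(n_t, r))            # target embeddings

sigma = lambda x: 1.0 / (1.0 + np.exp(-x))   # element-wise logistic function

Y_hat_std = U @ V.T                      # standard MF: f is the identity
Y_hat_log = sigma(U @ V.T)               # logistic MF: f is sigma

x, z = 2, 1                              # Eq. (2): score of the pair (d_x, t_z)
score = sigma(U[x] @ V[z])
```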

DeepWalk embeddings as matrix factorization

DeepWalk [19] is a network embedding approach that generates a number of random walks over a graph to capture the proximity among nodes. Qiu et al. [18] proved that the DeepWalk model can be interpreted as a matrix factorization task when the length of the random walks approaches infinity. In particular, they introduced NetMF, a model that approximates DeepWalk to learn embeddings of a network |$G$| containing |$n$| nodes by factorizing the DeepWalk matrix defined as:
(3)
$$\boldsymbol{M} = \max\left(\log\left(\frac{\psi(\boldsymbol{A})}{n_w n_s}\left(\sum_{\tau=1}^{n_w}\left(\boldsymbol{\Lambda}^{-1}\boldsymbol{A}\right)^{\tau}\right)\boldsymbol{\Lambda}^{-1}\right),\; 0\right)$$
where |$\boldsymbol{A} \in \mathbb{R}^{n \times n}$| is the adjacency matrix of |$G$|⁠, |$\psi (\boldsymbol{A}) = \sum _i\sum _j A_{ij}$|⁠, |$\boldsymbol{\Lambda }= \textrm{diag}(\boldsymbol{Ae})$| is a diagonal matrix holding the row sums of |$\boldsymbol{A}$|⁠, |$n_w$| is the window size of the random walk controlling the number of context nodes, |$n_s$| plays the same role as the number of negative samples in DeepWalk, and the |$\max $| function guarantees that all elements of |$\boldsymbol{M}$| are non-negative. Considering the symmetry of |$\boldsymbol{M}$| for undirected networks, the factorization of the DeepWalk matrix can be expressed as |$\boldsymbol{M}=\boldsymbol{Q}\boldsymbol{Q}^\top $|⁠, where |$\boldsymbol{Q} \in \mathbb{R}^{n \times r}$| represents the |$r$|-dimensional network embeddings.
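As an illustration, here is a small NumPy sketch of Eq. (3); it assumes an undirected network with no isolated nodes, and uses the log(max(·, 1)) form, which coincides element-wise with max(log(·), 0) for the positive entries.

```python
import numpy as np

def deepwalk_matrix(A, n_w=5, n_s=1):
    """Sketch of the DeepWalk matrix of Eq. (3) for an adjacency matrix A.
    Assumes every node has at least one edge (positive row sums)."""
    vol = A.sum()                                    # psi(A)
    lam_inv = np.diag(1.0 / A.sum(axis=1))           # Lambda^{-1}
    P = lam_inv @ A                                  # random-walk transition matrix
    S = sum(np.linalg.matrix_power(P, tau) for tau in range(1, n_w + 1))
    X = vol / (n_w * n_s) * S @ lam_inv
    # element-wise max(log(X), 0); clipping X at 1 also avoids log(0)
    return np.log(np.maximum(X, 1.0))
```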

Materials and methods

Datasets

Two types of DTI datasets, constructed based on online biological and pharmaceutical databases, are used in this study. Their characteristics are shown in Table 1.

Table 1

Characteristics of datasets.

Dataset | |$n_d$| | |$n_t$| | |$|P_1|$| | Sparsity | |$m_d$| | |$m_t$|
NR | 54 | 26 | 166 | 0.118 | 4 | 4
GPCR | 223 | 95 | 1096 | 0.052 | 4 | 4
IC | 210 | 204 | 2331 | 0.054 | 4 | 4
E | 445 | 664 | 4256 | 0.014 | 4 | 4
Luo | 708 | 1512 | 1923 | 0.002 | 4 | 3

The first is a collection of four golden standard datasets constructed by Yamanishi et al. [21], each corresponding to a target protein family: Nuclear Receptors (NR), Ion Channel (IC), G-protein coupled receptors (GPCR) and Enzyme (E). Because the interactions in these datasets were discovered 14 years ago, we updated them by adding newly discovered interactions between their drugs and targets recorded in the latest versions of the KEGG [22], DrugBank [23] and ChEMBL [24] databases. Details on the collection of new DTIs can be found in Supplementary Section A1. Four types of drug similarities are utilized: SIMCOMP [25], built upon chemical structures, and AERS-freq, AERS-bit [26] and SIDER [27], derived from drug side effects. Likewise, four types of target similarities, obtained from [28], are used: gene ontology (GO) term-based semantic similarity, normalized Smith-Waterman (SW) sequence similarity, and spectrum kernel similarities with 3-mer (SPEC-k3) and 4-mer (SPEC-k4) lengths computed on amino acid sequences. These similarities were selected because they possess higher local interaction consistency [10].

The second dataset, provided by Luo et al. [17] (denoted as Luo), was built in 2017. It includes DTIs and drug–drug interactions (DDIs) obtained from DrugBank 3.0 [23], as well as drug side effect (SE) associations, protein–protein interactions (PPIs) and disease-related associations extracted from SIDER [27], HPRD [29] and the Comparative Toxicogenomics Database [30], respectively. Based on these interaction/association profiles, three drug similarities (derived from DDIs, SEs and drug–disease associations) and two target similarities (derived from PPIs and target–disease associations) are computed, employing the Jaccard similarity coefficient on the sets of entities associated or interacting with each pair of drugs (targets). In addition, a drug similarity based on chemical structure and a target similarity based on genome sequence are also computed. In total, four drug similarities and three target similarities are used for this dataset.

Multiple similarity DeepWalk-based matrix factorization

A DTI dataset, associated with multiple drug and target similarities, can be viewed as a multiplex heterogeneous network |$G^{DTI}$|⁠. This is done by treating drugs and targets as two types of vertices, and by considering non-zero similarities and interactions as edges connecting two homogeneous and heterogeneous entities, respectively, where the weight of each edge equals the corresponding similarity or interaction value. In a DTI dataset, the interaction matrix is typically much sparser than the similarity matrices, so similarity-derived edges linking two drugs (targets) markedly outnumber the more crucial bipartite interaction edges. To balance the distribution of different types of edges and stress the relations of more similar entities in the DTI network, we replace each original dense similarity matrix with the sparse adjacency matrix of its corresponding |$k$|-nearest neighbors (⁠|$k$|-NN) graph. Specifically, given a similarity matrix |$\boldsymbol{S}^{d,h}$|⁠, its sparsified matrix |$\hat{\boldsymbol{S}}^{d,h} \in \mathbb{R}^{n_d \times n_d}$| is defined as:
(4)
$$\hat{S}^{d,h}_{ij} = \begin{cases} S^{d,h}_{ij}, & \text{if } d_j \in \mathcal{N}^{k,h}_{d_i} \text{ or } d_i \in \mathcal{N}^{k,h}_{d_j} \\ 0, & \text{otherwise} \end{cases}$$
where |$\mathcal{N}^{k,h}_{d_i}$| is the set of the |$k$|-NNs of |$d_i$| based on similarity |$\boldsymbol{S}^{d,h}$|⁠.
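A possible NumPy sketch of the sparsification in Eq. (4) is shown below; keeping an entry when either entity is among the other's |$k$|-NNs (which preserves symmetry) is our reading of the definition.

```python
import numpy as np

def knn_sparsify(S, k=5):
    """Sparsify a similarity matrix, keeping S_ij only if d_j is among the
    k-NNs of d_i or d_i is among the k-NNs of d_j (Eq. (4))."""
    n = S.shape[0]
    S = S - np.diag(np.diag(S))                 # ignore self-similarity
    nn = np.argsort(-S, axis=1)[:, :k]          # k most similar entities per row
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), nn.ravel()] = True
    mask |= mask.T                              # symmetric neighborhood condition
    return np.where(mask, S, 0.0)
```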

Formally, |$G^{DTI}$| consists of three parts: (i) |$G^{d}=\{\hat{\boldsymbol{S}}^{d,h}\}_{h=1}^{m_d}$|⁠, which is a multiplex drug subnetwork containing |$m_d$| layers, with |$\hat{\boldsymbol{S}}^{d,h}$| being the adjacency matrix of the |$h$|-th drug layer; (ii) |$G^{t}=\{\hat{\boldsymbol{S}}^{t,h}\}_{h=1}^{m_t}$|⁠, which is a multiplex target subnetwork including |$m_t$| layers with |$\hat{\boldsymbol{S}}^{t,h}$| denoting the adjacency matrix of the |$h$|-th target layer; (iii) |$G^Y=\boldsymbol{Y}$|⁠, which is a bipartite interaction subnetwork connecting drug and target nodes in each layer. Figure 1A depicts an example DTI network.

Figure 1

Representing a DTI dataset with three drug and two target similarities as a network. (A) A multiplex heterogeneous network including three drug layers, two target layers and six identical bipartite interaction subnetworks connecting drug and target nodes in each layer. (B) The six hyper-layers, each composed of a drug layer and a target layer along with the interaction subnetwork.

The DeepWalk matrix cannot be directly calculated for the complex DTI network that includes two multiplex subnetworks and a bipartite subnetwork. To facilitate its computation, we consider each combination of a drug and a target layer, along with the interaction subnetwork, as a hyper-layer, and reformulate the DTI network as a multiplex network containing |$m_d\cdot m_t$| hyper-layers. The hyper-layer incorporating the |$i$|-th drug layer and the |$j$|-th target layer is defined by the adjacency matrix

$$\boldsymbol{A}^{i,j} = \begin{bmatrix} \hat{\boldsymbol{S}}^{d,i} & \boldsymbol{Y} \\ \boldsymbol{Y}^\top & \hat{\boldsymbol{S}}^{t,j} \\ \end{bmatrix} $$
upon which |$G^{DTI}$| can be expressed as a set of hyper-layers |$\{\boldsymbol{A}^{i,j}\}^{m_d,m_t}_{i=1,j=1}$|⁠. Figure 1B illustrates the multiple hyper-layer network corresponding to the original DTI network in Figure 1A.
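Constructing a hyper-layer is a simple block assembly, sketched below with NumPy (the function name is ours):

```python
import numpy as np

def hyper_layer(S_d, S_t, Y):
    """Adjacency matrix A^{i,j} combining the i-th (sparsified) drug layer,
    the j-th target layer and the bipartite interaction matrix Y."""
    return np.block([[S_d, Y],
                     [Y.T, S_t]])
```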

Based on the above reformulation, we compute a DeepWalk matrix |$\boldsymbol{M}^{i,j} \in \mathbb{R}^{(n_d+n_t) \times (n_d+n_t)}$| for each |$\boldsymbol{A}^{i,j}$| using Eq. (3), which reflects node co-occurrences in truncated random walks and captures richer proximity among nodes than the original hyper-layer, especially the proximity between unlinked nodes. In particular, if an unlinked drug–target pair in |$\boldsymbol{A}^{i,j}$| has a certain level of proximity (a non-zero value) in |$\boldsymbol{M}^{i,j}$|⁠, the corresponding relation represented by the DeepWalk matrix can be interpreted as the recovery of their missing interaction, which supplements the incomplete interaction information and reduces the noise in the original dataset. See an example in Supplementary Section A2.1.

In order to mine multiple DeepWalk matrices effectively, we define a unified DeepWalk matrix for the whole DTI network by aggregating every |$\boldsymbol{M}^{i,j}$|⁠:
(5)
$$\bar{\boldsymbol{M}} = \sum_{i=1}^{m_d}\sum_{j=1}^{m_t} w^d_i\, w^t_j\, \boldsymbol{M}^{i,j}$$
where |$w^d_i$| and |$w^t_j$| are weights of |$i$|-th drug and |$j$|-th target layers, respectively, with |$\sum _{i=1}^{m_d}w^d_i=1$| and |$\sum _{j=1}^{m_t}w^t_j=1$|⁠. In Eq. (5), the importance of each hyper-layer is determined by multiplying the weights of its involved drug and target layers. This work employs the local interaction consistency (LIC)-based similarity weight, which assesses the proportion of proximate drugs (targets) having the same interactions, and has been found more effective than other similarity weights for DTI prediction [10]. More details on LIC weights can be found in Supplementary Section A2.2.
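A sketch of Eq. (5) follows, reusing the hypothetical deepwalk_matrix and hyper_layer helpers from the sketches above; the weights w_d and w_t are assumed given (LIC-based in the paper) and to sum to one each.

```python
import numpy as np

def unified_deepwalk_matrix(S_d_list, S_t_list, Y, w_d, w_t, n_w=5, n_s=1):
    """Eq. (5): weighted aggregation of all per-hyper-layer DeepWalk matrices."""
    n = Y.shape[0] + Y.shape[1]
    M_bar = np.zeros((n, n))
    for w_i, S_d in zip(w_d, S_d_list):
        for w_j, S_t in zip(w_t, S_t_list):
            A_ij = hyper_layer(S_d, S_t, Y)          # defined in an earlier sketch
            M_bar += w_i * w_j * deepwalk_matrix(A_ij, n_w, n_s)
    return M_bar
```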
Let
$$\boldsymbol{Q} = \begin{bmatrix} \boldsymbol{U} \\ \boldsymbol{V} \end{bmatrix}$$
be the vertical concatenation of drug and target embeddings. We encourage |$\boldsymbol{Q}\boldsymbol{Q}^\top $| to approximate |$\bar{\boldsymbol{M}}$|⁠, which enables the learned embeddings to capture the topology information characterized by the holistic DeepWalk matrix. Hence, we derive the DeepWalk regularization term that diminishes the discrepancy between |$\bar{\boldsymbol{M}}$| and |$\boldsymbol{Q}\boldsymbol{Q}^\top $|⁠:
(6)
$$\mathcal{R}_{dw}(\boldsymbol{U},\boldsymbol{V}) = \left\|\bar{\boldsymbol{M}} - \boldsymbol{Q}\boldsymbol{Q}^\top\right\|_F^2$$
Considering that the adjacency matrix |$\boldsymbol{A}^{i,j}$| includes four blocks, |$\bar{\boldsymbol{M}}$| and |$\boldsymbol{Q}\boldsymbol{Q}^\top $| could be divided into four blocks accordingly:
(7)
$$\bar{\boldsymbol{M}} = \begin{bmatrix} \bar{\boldsymbol{M}}^{d} & \bar{\boldsymbol{M}}^{y} \\ \left(\bar{\boldsymbol{M}}^{y}\right)^\top & \bar{\boldsymbol{M}}^{t} \end{bmatrix}, \qquad \boldsymbol{Q}\boldsymbol{Q}^\top = \begin{bmatrix} \boldsymbol{U}\boldsymbol{U}^\top & \boldsymbol{U}\boldsymbol{V}^\top \\ \boldsymbol{V}\boldsymbol{U}^\top & \boldsymbol{V}\boldsymbol{V}^\top \end{bmatrix}$$
Thus, |$\mathcal{R}_{dw}(\boldsymbol{U},\boldsymbol{V})$| can be expressed as the sum of norms of these blocks:
(8)
$$\mathcal{R}_{dw}(\boldsymbol{U},\boldsymbol{V}) = \left\|\bar{\boldsymbol{M}}^{d} - \boldsymbol{U}\boldsymbol{U}^\top\right\|_F^2 + 2\left\|\bar{\boldsymbol{M}}^{y} - \boldsymbol{U}\boldsymbol{V}^\top\right\|_F^2 + \left\|\bar{\boldsymbol{M}}^{t} - \boldsymbol{V}\boldsymbol{V}^\top\right\|_F^2$$
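The block decomposition makes the regularizer cheap to express; a NumPy sketch of Eq. (8) (with the off-diagonal block counted twice by symmetry) follows:

```python
import numpy as np

def dw_regularizer(M_bar, U, V):
    """Eq. (8): block-wise form of ||M_bar - Q Q^T||_F^2 with Q = [U; V]."""
    n_d = U.shape[0]
    M_d = M_bar[:n_d, :n_d]                 # drug-drug block
    M_y = M_bar[:n_d, n_d:]                 # drug-target block
    M_t = M_bar[n_d:, n_d:]                 # target-target block
    fro2 = lambda X: np.sum(X * X)          # squared Frobenius norm
    return (fro2(M_d - U @ U.T)
            + 2.0 * fro2(M_y - U @ V.T)     # off-diagonal blocks appear twice
            + fro2(M_t - V @ V.T))
```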

However, aggregating all per-layer DeepWalk matrices into the holistic one inevitably leads to substantial loss of layer-specific topology information. To address this limitation, we employ graph regularization for each sparsified drug (target) layer to preserve per-layer drug (target) proximity in the embedding space, i.e. similar drugs (targets) in each layer are likely to have similar latent features. To distinguish the utility of each layer, each graph regularization term is multiplied by the LIC-based weight of its corresponding layer, which emphasizes the proximity of more reliable similarities. Furthermore, Tikhonov regularization is added to prevent the latent features from overfitting the training set.

By replacing |$\mathcal{R}(\boldsymbol{U},\boldsymbol{V})$| in Eq. (1) with the above regularization terms, we arrive at the objective of MDMF:
(9)
$$\min_{\boldsymbol{U},\boldsymbol{V}} \; \mathcal{L}(\hat{\boldsymbol{Y}},\boldsymbol{Y}) + \lambda_M \mathcal{R}_{dw}(\boldsymbol{U},\boldsymbol{V}) + \lambda_d \sum_{i=1}^{m_d} w^d_i\, \mathrm{tr}\!\left(\boldsymbol{U}^\top \boldsymbol{L}^d_i \boldsymbol{U}\right) + \lambda_t \sum_{j=1}^{m_t} w^t_j\, \mathrm{tr}\!\left(\boldsymbol{V}^\top \boldsymbol{L}^t_j \boldsymbol{V}\right) + \lambda_r \left(\|\boldsymbol{U}\|_F^2 + \|\boldsymbol{V}\|_F^2\right)$$
where |$\boldsymbol{L}^d_i=\textrm{diag}(\hat{\boldsymbol{S}}^{d,i}\boldsymbol{e})-\hat{\boldsymbol{S}}^{d,i}$| and |$\boldsymbol{L}^t_j=\textrm{diag}(\hat{\boldsymbol{S}}^{t,j}\boldsymbol{e})-\hat{\boldsymbol{S}}^{t,j}$| are the graph Laplacian matrices of |$\hat{\boldsymbol{S}}^{d,i}$| and |$\hat{\boldsymbol{S}}^{t,j}$|⁠, respectively, and |$\lambda _M$|⁠, |$\lambda _d$|⁠, |$\lambda _t$| and |$\lambda _r$| are regularization coefficients. Eq. (9) can be solved by alternately updating |$\boldsymbol{U}$| and |$\boldsymbol{V}$| [2, 7], using an optimization algorithm, e.g. gradient descent (GD) or AdaGrad [31]. The details of the optimization procedure of MDMF are provided in Supplementary Section A2.3.

Optimizing the area under the curve with MDMF

Area under the curve loss functions

AUPR and AUC are two widely used area under the curve metrics in DTI prediction. Modeling differentiable surrogate losses that optimize these two metrics can lead to improvements in prediction performance [10]. Therefore, we instantiate the loss function in Eq. (9) with AUPR and AUC losses, and derive two DeepWalk-based MF models, namely MDMFAUPR and MDMFAUC, that optimize the AUPR and AUC metrics, respectively.

Given |$\boldsymbol{Y}$| and its predictions |$\boldsymbol{\hat{Y}}=\sigma (\boldsymbol{U}\boldsymbol{V}^\top )$|⁠, sorted in descending order of predicted score, AUPR computed without interpolation of the curve is:
(10)
$$\mathrm{AUPR}(\hat{\boldsymbol{Y}},\boldsymbol{Y}) = \sum_{h} \mathrm{Prec}@h \cdot \mathrm{InRe}@h$$
where Prec|$@h$| is the precision of the first |$h$| predictions, and InRe|$@h$| is the incremental recall from rank |$h-1$| to |$h$|⁠. In addition, histogram binning [32], which assigns predictions to |$n_b$| ordered bins, is employed to simulate the non-differentiable and non-smooth sorting operation used to rank predictions, deriving differentiable precision and incremental recall as:
(11)
$$\mathrm{Prec}@h = \frac{\sum_{h^{\prime}=1}^{h}\left\langle \delta(\hat{\boldsymbol{Y}},h^{\prime}),\, \boldsymbol{Y} \right\rangle}{\sum_{h^{\prime}=1}^{h}\left\langle \delta(\hat{\boldsymbol{Y}},h^{\prime}),\, \boldsymbol{1} \right\rangle}$$
(12)
$$\mathrm{InRe}@h = \frac{\left\langle \delta(\hat{\boldsymbol{Y}},h),\, \boldsymbol{Y} \right\rangle}{|P_1|}$$
where |$\delta (\hat{\boldsymbol{Y}},h)$| is the soft assignment function returning the membership degree of each prediction to the |$h$|-th bin, i.e. |$[\delta (\hat{\boldsymbol{Y}},h)]_{ij} = \max \left ( 1-|\hat{Y}_{ij}-b_h|/\Delta ,0 \right )$|⁠, where |$b_h$| is the center of the |$h$|-th bin, |$\Delta = 1/(n_b-1)$| is the bin width, and |$\langle \cdot ,\cdot \rangle$| denotes the sum of the element-wise product of two matrices. Considering that maximizing AUPR is equivalent to minimizing |$-\textrm{AUPR}(\boldsymbol{\hat{Y}}, \boldsymbol{Y})$|⁠, we obtain the differentiable AUPR loss according to Eqs. (10)–(12):
(13)
$$\mathcal{L}_{AP}(\hat{\boldsymbol{Y}},\boldsymbol{Y}) = -\sum_{h=1}^{n_b} \mathrm{Prec}@h \cdot \mathrm{InRe}@h$$
The objective of MDMFAUPR is given by using |$\mathcal{L}_{AP}$| to replace |$\mathcal{L}(\hat{\boldsymbol{Y}},\boldsymbol{Y})$| in Eq. (9).
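For illustration, here is a NumPy sketch of the binned AUPR approximation (forward computation only; in MDMFAUPR the gradients with respect to |$\boldsymbol{U}$| and |$\boldsymbol{V}$| are derived analytically, see Supplementary A2.4). Ordering the bins from high to low score, so that cumulative sums emulate walking down the ranked list, is our assumption.

```python
import numpy as np

def soft_aupr(Y_hat, Y, n_b=21):
    """Histogram-binning approximation of AUPR (Eqs. (10)-(12));
    the loss L_AP of Eq. (13) is simply -soft_aupr(Y_hat, Y)."""
    y_hat, y = Y_hat.ravel(), Y.ravel()
    b = np.linspace(1.0, 0.0, n_b)           # bin centers, highest scores first
    delta = 1.0 / (n_b - 1)                  # bin width
    # soft assignment delta(Y_hat, h): triangular membership to each bin
    d = np.maximum(1.0 - np.abs(y_hat[None, :] - b[:, None]) / delta, 0.0)
    tp = d @ y                               # soft positives per bin
    n = d.sum(axis=1)                        # soft prediction count per bin
    prec = np.cumsum(tp) / np.maximum(np.cumsum(n), 1e-12)   # Prec@h
    inre = tp / np.maximum(y.sum(), 1e-12)                   # InRe@h
    return float(np.sum(prec * inre))
```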
AUC, which assesses the proportion of correctly ordered tuples, i.e. those where the interacting drug–target pair receives a higher prediction than the non-interacting one, is defined as:
(14)
$$\mathrm{AUC}(\hat{\boldsymbol{Y}},\boldsymbol{Y}) = \frac{1}{|P|}\sum_{(ij,hl) \in P} \left[\!\left[\, \hat{Y}_{ij} > \hat{Y}_{hl} \,\right]\!\right]$$
where the predictions are |$\hat{\boldsymbol{Y}}=\boldsymbol{U}\boldsymbol{V}^\top $| with |$f=\omega $|⁠, and |$P=\{(ij,hl)|(i,j) \in P_1, (h,l) \in P_0\}$| is the Cartesian product of |$P_1=\{(i,j)|Y_{ij}=1\}$| and |$P_0=\{(i,j)|Y_{ij}=0\}$|⁠. To maximize AUC, we need to minimize the proportion of wrongly ordered tuples, where the prediction of the interacting pair is lower than that of the non-interacting one. To make the AUC loss easy to optimize, the discontinuous indicator function is substituted by a convex approximation, i.e. |$\phi (x)=\log (1+\exp (-x))$|⁠, and the derived convex AUC loss is:
(15)
$$\mathcal{L}_{AUC}(\hat{\boldsymbol{Y}},\boldsymbol{Y}) = \frac{1}{|P|}\sum_{(ij,hl) \in P} \phi\left(\zeta_{ijhl}\right)$$
where |$\zeta _{ijhl}=\hat{Y}_{ij}-\hat{Y}_{hl}$|⁠. With the substitution of |$\mathcal{L}(\hat{\boldsymbol{Y}},\boldsymbol{Y})$| with |$\mathcal{L}_{AUC}$| in Eq. (9), we obtain the objective of MDMFAUC.
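The AUC surrogate of Eq. (15) is straightforward to sketch in NumPy; note the O(|P1|·|P0|) memory of materializing all tuples, which a practical implementation would avoid (e.g. by sampling).

```python
import numpy as np

def auc_loss(Y_hat, Y):
    """Convex AUC surrogate of Eq. (15): mean logistic loss over P1 x P0."""
    pos = Y_hat[Y == 1]                          # scores of interacting pairs
    neg = Y_hat[Y == 0]                          # scores of non-interacting pairs
    zeta = pos[:, None] - neg[None, :]           # zeta_ijhl for every tuple in P
    # log(1 + exp(-zeta)) computed stably via logaddexp
    return float(np.logaddexp(0.0, -zeta).mean())
```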

More details for optimizing the AUPR and AUC losses can be found in Supplementary Section A2.4.

Inferring embeddings of new entities

When a new drug (target) arrives, MDMFAUPR and MDMFAUC infer its embeddings using its neighbors in the training set. Given a new drug, we first compute its similarities with all training drugs in |$D$| based on its chemical structure, side effects, etc. Then, we linearly integrate the different types of similarity with LIC-based weights to obtain the fused similarity vector:
(16)
$$\bar{\boldsymbol{s}}^{\alpha}_x = \sum_{h=1}^{m_\alpha} w^{\alpha}_h\, \bar{\boldsymbol{s}}^{\alpha,h}_x$$
where |$\alpha _x$| is an unseen entity with |$\alpha =d$| denoting a new drug and |$\alpha =t$| indicating a novel target, and |$\bar{\boldsymbol{s}}^{\alpha }_x \in \mathbb{R}^{n_\alpha }$| is the fused similarity vector of |$\alpha _x$|⁠. Based on |$\bar{\boldsymbol{s}}^{\alpha }_x$|⁠, we retrieve the |$k$|-NNs of |${\alpha }_x$| from the training set, denoted as |$\mathcal{N}^k_{\alpha _x}$|⁠, and estimate the embeddings of |$\alpha _x$| as follows:
(17)
$$\boldsymbol{U}_x = \frac{\sum_{d_i \in \mathcal{N}^k_{d_x}} \eta^{\,i^{\prime}-1}\, \bar{s}^d_{xi}\, \boldsymbol{U}_i}{\sum_{d_i \in \mathcal{N}^k_{d_x}} \eta^{\,i^{\prime}-1}\, \bar{s}^d_{xi}}, \qquad \boldsymbol{V}_x = \frac{\sum_{t_j \in \mathcal{N}^k_{t_x}} \eta^{\,j^{\prime}-1}\, \bar{s}^t_{xj}\, \boldsymbol{V}_j}{\sum_{t_j \in \mathcal{N}^k_{t_x}} \eta^{\,j^{\prime}-1}\, \bar{s}^t_{xj}}$$
where |$\bar{\boldsymbol s}^d_{xi}$| is the fused similarity between |$d_x$| and |$d_i$|⁠, |$i^{\prime}$| (⁠|$j^{\prime}$|⁠) is the rank of |$d_i$| (⁠|$t_j$|⁠) among |$\mathcal{N}^k_{d_x}$| (⁠|$\mathcal{N}^k_{t_x}$|⁠), e.g. |$i^{\prime}=2$| if |$d_i$| is the second nearest training drug of |$d_x$|⁠, and |$\eta \in (0,1]$| is the decay coefficient shrinking the weight of further neighbors, an important parameter controlling the embedding aggregation. In addition, we employ a pseudo-embedding based |$\eta $| selection strategy [10] to choose the optimal |$\eta $| value from a set of candidates |$\mathcal{C}$| for the prediction settings involving new drugs or/and new targets (S2, S3 and S4). In MDMFAUPR, the |$\eta $| value leading to the best AUPR result is used, while the optimal |$\eta $| in MDMFAUC is the one achieving the best AUC result.
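A compact sketch of the inference of Eqs. (16)–(17) for one new entity follows; normalizing by the sum of the decayed similarity weights is our assumption about the exact form.

```python
import numpy as np

def infer_new_embedding(sim_views, weights, E, k=5, eta=0.8):
    """Estimate the embedding of a new drug (target) from its k-NNs.
    sim_views: similarity vectors to all training entities (one per view);
    weights:   LIC-based view weights (Eq. (16));
    E:         training embedding matrix, U for drugs or V for targets."""
    s_bar = sum(w * s for w, s in zip(weights, sim_views))   # fused similarities
    nn = np.argsort(-s_bar)[:k]                              # k nearest neighbors
    w = np.array([eta ** rank * s_bar[i]                     # decay eta^{i'-1}
                  for rank, i in enumerate(nn)])
    return (w[:, None] * E[nn]).sum(axis=0) / np.maximum(w.sum(), 1e-12)
```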

MDMF2A: combining AUPR and AUC

It is known that both AUPR and AUC play a vital role in DTI prediction. However, MDMFAUPR and MDMFAUC each optimize one measure and ignore the other. To overcome this limitation, we propose an ensemble approach, called MDMF2A, which integrates the two MF models by aggregating their predicted scores. Given a test pair |$(d_x, t_z)$|⁠, along with its predicted scores |$\hat{Y}^{AP}_{xz}$| and |$\hat{Y}^{AC}_{xz}$| obtained from MDMFAUPR and MDMFAUC, respectively, the final prediction output by MDMF2A is defined as
(18)
$$\hat{Y}_{xz} = \beta\, \hat{Y}^{AP}_{xz} + (1-\beta)\, \sigma\!\left(\hat{Y}^{AC}_{xz}\right)$$
where |$\beta \in [0,1]$| is the trade-off coefficient for the two MF base models, and |$\sigma $| converts |$\hat{Y}^{AC}_{xz}$| to the same scale as |$\hat{Y}^{AP}_{xz}$|⁠, i.e. |$(0,1)$|⁠. The flowchart of MDMF2A is shown in Figure 2.
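The ensemble step itself is a one-line combination; a sketch of Eq. (18):

```python
import numpy as np

def mdmf2a_score(y_ap, y_ac, beta=0.5):
    """Eq. (18): beta trades off the two base models; the logistic function
    rescales the raw MDMFAUC score into (0, 1), matching MDMFAUPR's output."""
    return beta * y_ap + (1.0 - beta) / (1.0 + np.exp(-y_ac))
```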
Figure 2

The flowchart of MDMF2A. The DTI dataset, consisting of multiple drug and target similarities, is first reformulated as a multiplex DTI network containing multiple hyper-layers. The two base models of MDMF2A, namely MDMFAUPR and MDMFAUC, are trained on the derived network by instantiating the objective with the two area under the curve metric-based losses, respectively. Given a test pair, its associated similarity vectors are fused according to the LIC-based weights, and each base model leverages the fused similarities to generate its estimation of the test pair. Lastly, MDMF2A aggregates the outputs of the two models using Eq. (18) to obtain the final prediction.

The computational complexity analysis of the proposed methods can be found in Supplementary Section A2.5.

Experimental evaluation and discussion

Experimental setup

Following [20], four types of cross validation (CV) are conducted to examine the methods in the four prediction settings, respectively. In S1, 10-fold CV on pairs is used, where one fold of pairs is removed for testing. In S2 (S3), 10-fold CV on drugs (targets) is applied, where one drug (target) fold, along with its corresponding rows (columns) in |$\boldsymbol{Y}$|⁠, is held out for testing. In S4, 3|$\times $|3-fold block-wise CV is applied, which holds out a drug fold and a target fold along with the interactions between them for testing, using the interactions between the remaining drugs and targets for training. AUPR and AUC, defined in Eqs. (10) and (14), are used as evaluation measures.
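To clarify how the S4 test blocks are formed, the following sketch generates 3|$\times $|3 block-wise CV splits (the fold-assignment details are our illustration):

```python
import numpy as np

def blockwise_cv_splits(n_d, n_t, n_folds=3, seed=0):
    """3x3-fold block-wise CV for S4: each test block pairs a held-out drug fold
    with a held-out target fold; training uses interactions among the rest."""
    rng = np.random.default_rng(seed)
    d_folds = np.array_split(rng.permutation(n_d), n_folds)
    t_folds = np.array_split(rng.permutation(n_t), n_folds)
    for d_test in d_folds:
        for t_test in t_folds:
            d_train = np.setdiff1d(np.arange(n_d), d_test)
            t_train = np.setdiff1d(np.arange(n_t), t_test)
            yield (d_train, t_train), (d_test, t_test)
```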

To evaluate the performance of MDMF2A in all prediction settings, we compared it to eight DTI prediction models (WkNNIR [6], NRLMF [7], MSCMF [9], GRGMF [33], MF2A [10], DRLSM [3], DTINet [17], NEDTP [4]) and two network embedding approaches applicable to any domain (NetMF [18] and Multi2Vec [34]). WkNNIR cannot perform predictions in S1, as it is specifically designed to predict interactions involving new drugs or/and targets (S2, S3, S4) [6]. Furthermore, the proposed MDMF2A is also compared with four deep learning-based methods, namely NeoDTI [12], DTIP [5], DCFME [14] and SupDTI [13]. These deep learning competitors can only be applied to the Luo dataset in S1, because they formulate the DTI dataset as a heterogeneous network consisting of four types of nodes (drugs, targets, drug side effects and diseases) and learn embeddings for all types of nodes. Descriptions of all baseline methods can be found in Supplementary Section A3.

The parameters of all baseline methods are set based on the suggestions in the respective articles. For MDMF2A, the trade-off ensemble weight |$\beta $| is chosen from |$\{0,0.01,\ldots ,1.0\}$|⁠. For the two base models of MDMF2A, i.e. MDMFAUPR and MDMFAUC, the number of neighbors is set to |$k=5$|⁠, the window size of the random walk to |$n_w=5$|⁠, the number of negative samples to |$n_s=1$|⁠, the learning rate to |$\theta =0.1$|⁠, the candidate decay coefficient set to |$\mathcal{C}=\{0.1, 0.2, \ldots, 1.0\}$|⁠, and |$\lambda _M=0.005$|⁠; the embedding dimension |$r$| is chosen from {50, 100}, and |$\lambda _d$|⁠, |$\lambda _t$| and |$\lambda _r$| are chosen from {|$2^{-6}$|⁠, |$2^{-4}$|⁠, |$2^{-2}$|⁠, |$2^{0}$|⁠, |$2^{2}$|}. The number of bins |$n_b$| in MDMFAUPR is chosen from {11, 16, 21, 26, 31} for the small and medium datasets (NR, GPCR and IC), while it is set to 21 for the larger datasets (E and Luo). Similar to [3, 7, 9], we obtain the best hyperparameters of our model via grid search, and the detailed settings are listed in Supplementary Table A2.

Results and discussion

Tables 2 and 3 list the results of MDMF2A and its ten competitors on the five datasets under the four prediction settings. The numbers in the last row denote the average rank across all prediction settings, and ‘*’ indicates that the advantage of MDMF2A over the competitor is statistically significant according to a Wilcoxon signed-rank test with Bergmann-Hommel’s correction [35] at the 5% level on the results of all prediction settings.

Table 2

AUPR results in all prediction settings

Setting | Dataset | WkNNIR | MSCMF | NRLMF | GRGMF | MF2A | DRLSM | DTINet | NEDTP | NetMF | Multi2Vec | MDMF2A
S1 | NR | - | 0.628(6) | 0.64(5) | 0.658(3) | 0.673(2) | 0.642(4) | 0.508(8) | 0.546(7) | 0.455(9) | 0.43(10) | 0.675(1)
S1 | GPCR | - | 0.844(4.5) | 0.86(3) | 0.844(4.5) | 0.87(2) | 0.835(6) | 0.597(10) | 0.798(8) | 0.831(7) | 0.747(9) | 0.874(1)
S1 | IC | - | 0.936(3) | 0.934(4) | 0.913(7) | 0.943(2) | 0.914(6) | 0.693(10) | 0.906(8) | 0.929(5) | 0.867(9) | 0.946(1)
S1 | E | - | 0.818(6) | 0.843(3) | 0.832(4) | 0.858(2) | 0.811(7) | 0.305(10) | 0.78(8) | 0.831(5) | 0.736(9) | 0.859(1)
S1 | Luo | - | 0.599(6) | 0.603(5) | 0.636(3) | 0.653(2) | 0.598(7) | 0.216(9) | 0.08(10) | 0.615(4) | 0.451(8) | 0.679(1)
S1 | AveRank | - | 5.1 | 4 | 4.3 | 2 | 6 | 9.4 | 8.2 | 6 | 9 | 1
S2 | NR | 0.562(5) | 0.531(7) | 0.547(6) | 0.564(4) | 0.578(2) | 0.57(3) | 0.339(10) | 0.486(8) | 0.34(9) | 0.338(11) | 0.602(1)
S2 | GPCR | 0.54(4) | 0.472(7) | 0.508(6) | 0.542(3) | 0.551(2) | 0.532(5) | 0.449(9) | 0.451(8) | 0.356(10) | 0.254(11) | 0.561(1)
S2 | IC | 0.491(4) | 0.379(8.5) | 0.479(5) | 0.493(3) | 0.495(2) | 0.466(6) | 0.365(10) | 0.407(7) | 0.379(8.5) | 0.16(11) | 0.502(1)
S2 | E | 0.405(4) | 0.288(8) | 0.389(5) | 0.415(3) | 0.422(2) | 0.376(6) | 0.173(10) | 0.33(7) | 0.269(9) | 0.141(11) | 0.428(1)
S2 | Luo | 0.485(2) | 0.371(8) | 0.458(6) | 0.462(5) | 0.472(4) | 0.502(1) | 0.187(9) | 0.077(11) | 0.376(7) | 0.079(10) | 0.48(3)
S2 | AveRank | 3.8 | 7.7 | 5.6 | 3.6 | 2.4 | 4.2 | 9.6 | 8.2 | 8.7 | 10.8 | 1.4
S3 | NR | 0.56(3) | 0.505(7) | 0.519(6) | 0.545(5) | 0.588(1) | 0.546(4) | 0.431(8) | 0.375(9) | 0.359(10) | 0.335(11) | 0.582(2)
S3 | GPCR | 0.774(3) | 0.69(7) | 0.729(6) | 0.755(5) | 0.787(2) | 0.757(4) | 0.546(10) | 0.684(8) | 0.638(9) | 0.327(11) | 0.788(1)
S3 | IC | 0.861(3) | 0.827(7) | 0.838(6) | 0.851(5) | 0.863(2) | 0.855(4) | 0.599(10) | 0.8(8) | 0.791(9) | 0.4(11) | 0.865(1)
S3 | E | 0.728(3) | 0.623(8) | 0.711(6) | 0.715(4) | 0.731(2) | 0.712(5) | 0.313(10) | 0.615(9) | 0.656(7) | 0.257(11) | 0.738(1)
S3 | Luo | 0.243(5) | 0.08(8) | 0.204(7) | 0.234(6) | 0.292(3) | 0.248(4) | 0.061(9) | 0.046(10) | 0.294(2) | 0.027(11) | 0.299(1)
S3 | AveRank | 3.4 | 7.4 | 6.2 | 5 | 2 | 4.2 | 9.4 | 8.8 | 7.4 | 11 | 1.2
S4 | NR | 0.309(1) | 0.273(6) | 0.278(5) | 0.308(2) | 0.289(3) | 0.236(8) | 0.249(7) | 0.16(9) | 0.146(11) | 0.149(10) | 0.286(4)
S4 | GPCR | 0.393(3) | 0.323(6) | 0.331(5) | 0.368(4) | 0.407(1.5) | 0.077(10) | 0.306(7) | 0.29(8) | 0.159(9) | 0.074(11) | 0.407(1.5)
S4 | IC | 0.339(4) | 0.194(9) | 0.327(5) | 0.347(3) | 0.352(2) | 0.086(10) | 0.264(6) | 0.251(7) | 0.196(8) | 0.067(11) | 0.356(1)
S4 | E | 0.228(3) | 0.074(9) | 0.221(5) | 0.224(4) | 0.235(2) | 0.063(10) | 0.1(7) | 0.112(6) | 0.09(8) | 0.017(11) | 0.239(1)
S4 | Luo | 0.132(3) | 0.018(10) | 0.096(6) | 0.106(5) | 0.175(2) | 0.061(7) | 0.035(8) | 0.03(9) | 0.127(4) | 0.002(11) | 0.182(1)
S4 | AveRank | 2.8 | 8 | 5.2 | 3.6 | 2.1 | 9 | 7 | 7.8 | 8 | 10.8 | 1.7
Summary | | 3.33* | 7.05* | 5.25* | 4.13* | 2.13* | 5.85* | 8.85* | 8.25* | 7.53* | 10.4* | 1.33
Table 3

AUC results in all prediction settings

Setting | Dataset | WkNNIR | MSCMF | NRLMF | GRGMF | MF2A | DRLSM | DTINet | NEDTP | NetMF | Multi2Vec | MDMF2A
S1 | NR | - | 0.882(4.5) | 0.882(4.5) | 0.891(2) | 0.884(3) | 0.879(6) | 0.797(9) | 0.846(7) | 0.818(8) | 0.788(10) | 0.892(1)
S1 | GPCR | - | 0.962(6) | 0.972(4) | 0.978(2) | 0.978(2) | 0.971(5) | 0.916(10) | 0.953(8) | 0.96(7) | 0.93(9) | 0.978(2)
S1 | IC | - | 0.982(6) | 0.989(2) | 0.988(4) | 0.989(2) | 0.981(7.5) | 0.938(10) | 0.981(7.5) | 0.985(5) | 0.97(9) | 0.989(2)
S1 | E | - | 0.961(8) | 0.981(4) | 0.982(3) | 0.983(2) | 0.964(7) | 0.839(10) | 0.97(5) | 0.966(6) | 0.944(9) | 0.984(1)
S1 | Luo | - | 0.922(7) | 0.951(3) | 0.947(4) | 0.966(2) | 0.941(5) | 0.894(9) | 0.929(6) | 0.917(8) | 0.861(10) | 0.97(1)
S1 | AveRank | - | 6.3 | 3.5 | 3 | 2.2 | 6.1 | 9.6 | 6.7 | 6.8 | 9.4 | 1.4
S2 | NR | 0.825(5.5) | 0.802(7) | 0.826(4) | 0.825(5.5) | 0.833(2) | 0.831(3) | 0.666(11) | 0.786(8) | 0.739(9) | 0.727(10) | 0.837(1)
S2 | GPCR | 0.914(4) | 0.882(7) | 0.913(5) | 0.924(2.5) | 0.924(2.5) | 0.867(8) | 0.858(9) | 0.885(6) | 0.852(10) | 0.811(11) | 0.925(1)
S2 | IC | 0.826(4) | 0.783(9) | 0.825(5) | 0.833(1) | 0.828(2) | 0.796(7) | 0.766(10) | 0.794(8) | 0.803(6) | 0.715(11) | 0.827(3)
S2 | E | 0.877(4) | 0.835(7) | 0.858(5) | 0.891(3) | 0.892(2) | 0.799(9) | 0.78(10) | 0.837(6) | 0.811(8) | 0.732(11) | 0.895(1)
S2 | Luo | 0.904(5) | 0.897(7) | 0.917(3) | 0.899(6) | 0.927(2) | 0.864(9) | 0.873(8) | 0.907(4) | 0.861(10) | 0.776(11) | 0.937(1)
S2 | AveRank | 4.5 | 7.4 | 4.4 | 3.6 | 2.1 | 7.2 | 9.6 | 6.4 | 8.6 | 10.8 | 1.4
S3 | NR | 0.82(3) | 0.786(7) | 0.825(2) | 0.845(1) | 0.819(4) | 0.798(6) | 0.756(8) | 0.726(9) | 0.712(10) | 0.703(11) | 0.805(5)
S3 | GPCR | 0.952(4) | 0.902(9) | 0.946(5) | 0.965(1) | 0.96(3) | 0.917(7) | 0.879(10) | 0.918(6) | 0.909(8) | 0.798(11) | 0.961(2)
S3 | IC | 0.958(5) | 0.941(7) | 0.96(4) | 0.965(3) | 0.967(1) | 0.942(6) | 0.907(10) | 0.939(8) | 0.938(9) | 0.866(11) | 0.966(2)
S3 | E | 0.935(5) | 0.881(9) | 0.936(4) | 0.943(3) | 0.944(2) | 0.883(8) | 0.841(10) | 0.918(6) | 0.911(7) | 0.815(11) | 0.948(1)
S3 | Luo | 0.835(5) | 0.826(9) | 0.828(8) | 0.84(4) | 0.901(2) | 0.801(10) | 0.829(7) | 0.86(3) | 0.831(6) | 0.633(11) | 0.902(1)
S3 | AveRank | 4.4 | 8.2 | 4.6 | 2.4 | 2.4 | 7.4 | 9 | 6.4 | 8 | 11 | 2.2
S4 | NR | 0.637(5) | 0.597(6) | 0.656(3) | 0.677(1) | 0.649(4) | 0.592(7) | 0.562(8) | 0.548(9) | 0.531(10) | 0.524(11) | 0.661(2)
S4 | GPCR | 0.871(4) | 0.798(8) | 0.866(5) | 0.89(1) | 0.886(3) | 0.405(11) | 0.803(7) | 0.816(6) | 0.721(9) | 0.581(10) | 0.887(2)
S4 | IC | 0.774(5) | 0.658(9) | 0.775(4) | 0.782(2) | 0.776(3) | 0.498(11) | 0.706(6) | 0.702(7) | 0.683(8) | 0.555(10) | 0.783(1)
S4 | E | 0.819(3) | 0.695(8) | 0.799(5) | 0.815(4) | 0.821(2) | 0.453(11) | 0.757(6) | 0.744(7) | 0.69(9) | 0.541(10) | 0.827(1)
S4 | Luo | 0.819(4) | 0.752(7) | 0.804(5) | 0.745(8) | 0.848(2) | 0.438(11) | 0.787(6) | 0.822(3) | 0.732(9) | 0.513(10) | 0.85(1)
S4 | AveRank | 4.2 | 7.6 | 4.4 | 3.2 | 2.8 | 10.2 | 6.6 | 6.4 | 9 | 10.2 | 1.4
Summary | | 4.37* | 7.38* | 4.23* | 3.05 | 2.38* | 7.73* | 8.7* | 6.48* | 8.1* | 10.35* | 1.6

The proposed MDMF2A is the best-performing model in most cases for both metrics, achieving the highest average rank in all prediction settings and statistically significantly outperforming all competitors, except GRGMF in terms of AUC. This demonstrates the effectiveness of our model in sufficiently exploiting the topology information embedded in the multiplex heterogeneous DTI network and optimizing the two area under the curve metrics. MF2A is the runner-up. Its inferiority to MDMF2A is mainly attributed to its ignorance of the high-order proximity captured by random walks and to the view-specific information loss caused by aggregating multi-type similarities. GRGMF, WkNNIR, NRLMF, DRLSM and MSCMF come next. They are outperformed by the proposed MDMF2A because they fail to capture the unique information provided by each view. The two graph embedding based DTI prediction models are usually inferior to the other DTI prediction approaches, because they generate embeddings in an unsupervised manner without exploiting the interaction information. Specifically, NEDTP, which uses the class-imbalance-resilient GBDT as its predicting classifier, outperforms DTINet, which employs a simple linear projection to estimate DTIs. Regarding the two general multiplex network embedding methods, NetMF is better than Multi2Vec, because the latter requires dichotomizing the edge weights, which wipes out the different influence of the connected nodes in the similarity subnetwork. In addition, averaging all per-layer embeddings does not distinguish the importance of each hyper-layer, unlike the holistic DeepWalk matrix used in NetMF, which contributes to the inferiority of Multi2Vec to NetMF as well.

There are some cases where MDMF2A does not achieve the best performance. Some baseline models, such as WkNNIR, NRLMF, GRGMF and MF2A, are better than MDMF2A on the NR dataset, implying that random walk-based embedding generation may not yield enough benefit on small datasets. Besides, on the Luo dataset under S2, MDMF2A is outperformed in terms of AUPR by WkNNIR and DRLSM, which incorporate a neighborhood-based interaction recovery procedure. In the Luo dataset, neighbor drugs are more likely to share the same interactions, which makes neighborhood-based interaction estimation effective for new drugs. Nevertheless, the AUC results of these two baselines are 3.7% and 8.4% lower than those of MDMF2A, respectively. Also, MF2A is slightly better than MDMF2A on the IC dataset in terms of AUC under S2 and S3, but the gap between them is tiny, e.g. 0.001. Finally, GRGMF achieves better AUC results than MDMF2A on the GPCR dataset under S3 and S4, as well as on the IC dataset under S2, mainly because GRGMF learns neighborhood information adaptively. However, it is worse than MDMF2A in terms of AUPR, which is more informative when evaluating a model under an extremely imbalanced class distribution (sparse interactions).

The advantage of MDMF2A is also observed in the comparison with deep learning-based DTI prediction models on the Luo dataset. As shown in Table 4, MDMF2A outperforms all competitors in terms of AUPR, achieving a 14% improvement over the best-performing competitor (DCFME). In terms of AUC, MDMF2A is only 1.1% lower than DTIP, and 5.3%, 3.6% and 3.9% higher than NeoDTI, DCFME and SupDTI, respectively. Although DTIP emphasizes AUC performance and slightly outperforms MDMF2A, it suffers a significant decline in AUPR. Compared with the deep learning competitors, MDMF2A takes full advantage of the information shared by high-order neighbors and explicitly optimizes AUPR and AUC, which are more effective than conventional cross-entropy and focal losses at identifying the less frequent interacting pairs, resulting in better performance.

Table 4

Results of MDMF2A and Deep Learning models on Luo dataset in S1

Metric | NeoDTI | DTIP | DCFME | SupDTI | MDMF2A
AUPR | 0.573(4) | 0.399(5) | 0.596(2) | 0.585(3) | 0.679(1)
AUC | 0.921(5) | 0.981(1) | 0.936(3) | 0.933(4) | 0.97(2)

To comprehensively investigate the proposed MDMF2A, we conduct an ablation study to demonstrate the effectiveness of its ensemble framework and all regularization terms, and analyze the sensitivity of three important parameters, i.e. |$r$|⁠, |$n_w$| and |$n_s$|⁠. More details can be found in Supplementary Sections A5–A6.

Discovery of novel DTIs

We examine the capability of MDMF2A to find novel DTIs not recorded in the Luo dataset. We do not consider the updated golden standard datasets, since they already include all recently validated DTIs collected from up-to-date databases. We split all non-interacting pairs into 10 folds, and obtain the predictions for each fold by training an MDMF2A model on all interacting pairs and the other nine folds of non-interacting ones. All non-interacting pairs are then ranked based on their predicted scores, and the top 10 pairs are selected as newly discovered DTIs, shown in Table 5. To further verify the reliability of these new interaction candidates, we search for supportive evidence in DrugBank (DB) [23] and DrugCentral (DC) [36]. As the Database column shows, 8 of the 10 new interactions are confirmed, demonstrating the success of MDMF2A in trustworthy new DTI discovery.

Table 5

Top 10 new DTIs discovered by MDMF2A from Luo’s datasets

Drug ID | Drug name | Target ID | Target name | Rank | Database
DB00829 | Diazepam | P48169 | GABRA4 | 1 | DB
DB01215 | Estazolam | P48169 | GABRA4 | 2 | DB
DB00580 | Valdecoxib | P23219 | PTGS1 | 3 | -
DB01367 | Rasagiline | P21397 | MAOA | 4 | DC
DB00333 | Methadone | P41145 | OPRK1 | 5 | DC
DB00363 | Clozapine | P21918 | DRD5 | 6 | DC
DB06216 | Asenapine | P21918 | DRD5 | 7 | DB
DB06800 | Methylnaltrexone | P41143 | OPRD1 | 8 | DC
DB00802 | Alfentanil | P41145 | OPRK1 | 9 | -
DB00482 | Celecoxib | P23219 | PTGS1 | 10 | DC

Conclusion

This paper proposed MDMF2A, a random walk and matrix factorization based model that predicts DTIs by effectively mining the topology information of the multiplex heterogeneous network built from diverse drug and target similarities. It integrates two base predictors that leverage our designed objective function, which encourages the learned embeddings to preserve both the holistic network topology and layer-specific structures. The two base models utilize the convex AUPR and AUC losses in their objectives, enabling MDMF2A to simultaneously optimize two crucial metrics of the DTI prediction task. We conducted extensive experiments on five DTI datasets under various prediction settings. The results affirmed the superiority of the proposed MDMF2A over competing DTI prediction methods. Furthermore, the practical ability of MDMF2A to discover novel DTIs was supported by evidence from online biological databases.

In the future, we plan to extend our model to handle attributed DTI networks, which include both topological information and node features for drugs and targets.

Key Points
  • MDMF incorporates DeepWalk matrix decomposition over multiple hyper-layers and layer-specific graph Laplacian regularization to learn robust node representations that preserve both global and view-specific topology (a generic form of this regularizer is sketched below this list).

  • MDMF integrates multiplex heterogeneous network representation learning and DTI prediction into a unified optimization framework, learning latent features in a supervised manner and implicitly recovering possibly missing interactions.

  • MDMF2A, an ensemble instantiation of MDMF, optimizes both the AUPR and AUC metrics.

  • Our method achieves statistically significant improvements over state-of-the-art methods under various prediction settings and can discover reliable new DTIs.
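
For readers unfamiliar with graph Laplacian regularization, the generic form below (our illustrative notation, not necessarily the paper's exact weighting) shows why minimizing such a term enforces layer-specific local invariance:

```latex
% Generic layer-specific graph Laplacian regularizer (illustrative).
% For each similarity layer l with similarity matrix S^{(l)},
% L^{(l)} = D^{(l)} - S^{(l)} is its graph Laplacian and U stacks the
% node embeddings u_i; minimizing R(U) pulls together the embeddings
% of nodes that are highly similar in every layer.
\[
  \mathcal{R}(U) \;=\; \sum_{l} \operatorname{tr}\!\left( U^{\top} L^{(l)} U \right)
  \;=\; \frac{1}{2} \sum_{l} \sum_{i,j} S^{(l)}_{ij}\, \lVert u_i - u_j \rVert_2^2 .
\]
```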

Data and code availability

The source code and data are available at https://github.com/intelligence-csd-auth-gr/DTI_MDMF2A

Funding

This work was supported by the China Scholarship Council (CSC) [201708500095]; the French National Research Agency (ANR) under the JCJC project GraphIA [ANR-20-CE23-0009-01].

Author Biographies

Bin Liu is a lecturer at the Key Laboratory of Data Engineering and Visual Computing, Chongqing University of Posts and Telecommunications. He received his PhD in computer science from the Aristotle University of Thessaloniki. His research interests include multi-label learning and bioinformatics.

Dimitrios Papadopoulos is a PhD student at the School of Informatics, Aristotle University of Thessaloniki. His research interests include supervised machine learning, graph mining, and drug discovery.

Fragkiskos D. Malliaros is an Assistant Professor at Paris-Saclay University, CentraleSupélec and associate researcher at Inria Saclay. His research interests include graph mining, machine learning and graph-based information extraction.

Grigorios Tsoumakas is an Associate Professor at the Aristotle University of Thessaloniki. His research interests include machine learning (ensembles, multi-target prediction) and natural language processing (semantic indexing, keyphrase extraction, summarization).

Apostolos N. Papadopoulos is an Associate Professor at the School of Informatics, Aristotle University of Thessaloniki. His research interests include data management, data mining and big data analytics.

References

1. Bagherian M, Sabeti E, Wang K, et al. Machine learning approaches and databases for prediction of drug-target interaction: a survey paper. Brief Bioinform 2021;22(1):247–69.
2. Ezzat A, Zhao P, Min W, et al. Drug-target interaction prediction with graph regularized matrix factorization. IEEE/ACM Trans Comput Biol Bioinform 2017;14(3):646–56.
3. Ding Y, Tang J, Guo F. Identification of drug-target interactions via dual laplacian regularized least squares with multiple kernel fusion. Knowl Based Syst 2020;204:106254.
4. An Q, Liang Y. A heterogeneous network embedding framework for predicting similarity-based drug-target interactions. Brief Bioinform 2021;22(6):1–10.
5. Xuan P, Yu Z, Cui H, et al. Integrating multi-scale neighbouring topologies and cross-modal similarities for drug-protein interaction prediction. Brief Bioinform 2021;22(5):bbab119.
6. Liu B, Pliakos K, Vens C, et al. Drug-target interaction prediction via an ensemble of weighted nearest neighbors with interaction recovery. Appl Intell 2022;52(4):3705–27.
7. Liu Y, Wu M, Miao C, et al. Neighborhood regularized logistic matrix factorization for drug-target interaction prediction. PLoS Comput Biol 2016;12(2):e1004760.
8. Pliakos K, Vens C, Tsoumakas G. Predicting drug-target interactions with multi-label classification and label partitioning. IEEE/ACM Trans Comput Biol Bioinform 2021;18(4):1596–607.
9. Zheng X, Ding H, Mamitsuka H, et al. Collaborative matrix factorization with multiple similarities for predicting drug-target interactions. In: Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Min. Chicago, IL, USA: ACM, 2013, 1025–33.
10. Liu B, Tsoumakas G. Optimizing area under the curve measures via matrix factorization for predicting drug-target interaction with multiple similarities. arXiv 2021;abs/2105.01545:1–14.
11. Olayan RS, Ashoor H, Bajic VB. DDR: efficient computational method to predict drug-target interactions using graph mining and machine learning approaches. Bioinformatics 2018;34(7):1164–73.
12. Wan F, Hong L, Xiao A, et al. NeoDTI: neural integration of neighbor information from a heterogeneous network for discovering new drug-target interactions. Bioinformatics 2019;35(1):104–11.
13. Chen J, Liang Z, Cheng K, et al. Predicting drug-target interaction via self-supervised learning. IEEE/ACM Trans Comput Biol Bioinform 2022;PP:1.
14. Chen R, Xia F, Hu B, et al. Drug-target interactions prediction via deep collaborative filtering with multiembeddings. Brief Bioinform 2022;23(2):bbab520.
15. Dai H, Li H, Tian T, et al. Adversarial attack on graph structured data. In: Proc. Int. Conf. Mach. Learn. Stockholm, Sweden: PMLR, 2018, 1115–24.
16. Pliakos K, Vens C. Drug-target interaction prediction with tree-ensemble learning and output space reconstruction. BMC Bioinform 2020;21(1):1–11.
17. Luo Y, Zhao X, Zhou J, et al. A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nat Commun 2017;8(1):1–13.
18. Qiu J, Dong Y, Ma H, et al. Network embedding as matrix factorization: unifying DeepWalk, LINE, PTE, and node2vec. In: Proc. ACM Int. Conf. Web Search Data Min. Marina Del Rey, CA, USA: ACM, 2018, 459–67.
19. Perozzi B, Al-Rfou R, Skiena S. DeepWalk: online learning of social representations. In: Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Min. New York, NY, USA: ACM, 2014, 701–10.
20. Pahikkala T, Airola A, Pietilä S, et al. Toward more realistic drug-target interaction predictions. Brief Bioinform 2015;16(2):325–37.
21. Yamanishi Y, Araki M, Gutteridge A, et al. Prediction of drug-target interaction networks from the integration of chemical and genomic spaces. Bioinformatics 2008;24(13):i232–40.
22. Kanehisa M, Furumichi M, Tanabe M, et al. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res 2017;45(D1):D353–61.
23. Wishart DS, Feunang YD, Guo AC, et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res 2018;46(D1):D1074–82.
24. Mendez D, Gaulton A, Bento AP, et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res 2019;47(D1):D930–40.
25. Hattori M, Okuno Y, Goto S, et al. Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. J Am Chem Soc 2003;125(39):11853–65.
26. Takarabe M, Kotera M, Nishimura Y, et al. Drug target prediction using adverse event report systems: a pharmacogenomic approach. Bioinformatics 2012;28(18):i611–8.
27. Kuhn M, Letunic I, Jensen LJ, et al. The SIDER database of drugs and side effects. Nucleic Acids Res 2016;44(D1):D1075–9.
28. Nascimento ACA, Prudêncio RBC, Costa IG. A multiple kernel learning algorithm for drug-target interaction prediction. BMC Bioinform 2016;17(1):1–16.
29. Prasad TSK, Goel R, Kandasamy K, et al. Human protein reference database–2009 update. Nucleic Acids Res 2009;37(suppl_1):D767–72.
30. Davis AP, Murphy CG, Johnson R, et al. The comparative toxicogenomics database: update 2013. Nucleic Acids Res 2013;41(D1):D1104–14.
31. Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 2011;12:2121–59.
32. Revaud J, Almazan J, Rezende R, et al. Learning with average precision: training image retrieval with a listwise loss. In: Proc. IEEE Int. Conf. Comput. Vis. Seoul, South Korea: IEEE, 2019, 5106–15.
33. Zhang ZC, Zhang XF, Wu M, et al. A graph regularized generalized matrix factorization model for predicting links in biomedical bipartite networks. Bioinformatics 2020;36(11):3474–81.
34. Teng X, Liu J, Li L. A synchronous feature learning method for multiplex network embedding. Inform Sci 2021;574:176–91.
35. Benavoli A, Corani G, Mangili F. Should we really use post-hoc tests based on mean-ranks? J Mach Learn Res 2016;17(1):1–10.
36. Avram S, Bologa CG, Holmes J, et al. DrugCentral 2021 supports drug discovery and repositioning. Nucleic Acids Res 2021;49(D1):D1160–9.

