Abstract

Motivation

Histones are the chief protein components of chromatin, and the chemical modifications on histones crucially influence the transcriptional state of the associated genes. Histone modifying enzymes (HMEs), which add or remove these chemical labels, have emerged as an important class of drug targets, with a few HME inhibitors launched as anticancer drugs and dozens of molecules in clinical trials. To accelerate the discovery of HME inhibitors, machine learning-based predictive models have been developed to enrich active molecules from the vast chemical space. However, the number of compounds with known activity is highly unbalanced across different HMEs, and many targets have fewer than a hundred active samples. In this case, it is difficult to build effective virtual screening models directly with machine learning.

Results

To this end, we propose a new Meta-learning-based Histone Modifying Enzymes Inhibitor prediction method (MetaHMEI). MetaHMEI first uses a self-supervised pre-training approach to obtain high-quality molecular substructure embeddings from a large unlabeled chemical dataset. It then exploits a Transformer-based encoder and a meta-learning framework to build the prediction model. MetaHMEI allows effective transfer of the prior knowledge learned from HMEs with sufficient samples to HMEs with few samples, so the model can produce accurate predictions for HMEs with limited data. Extensive experiments on our collected and curated HME datasets show that MetaHMEI outperforms other methods in few-shot settings. Furthermore, we applied MetaHMEI in a virtual screening campaign for inhibitors of the histone demethylase JMJD3 and successfully obtained three small-molecule inhibitors, further supporting the validity of our model.

INTRODUCTION

Histone modification is an essential mechanism of epigenetic regulation, which controls gene expression without changing the gene sequence. A variety of histone modification enzymes (HMEs) are involved in adding (writers), removing (erasers) and reading (readers) the chemical groups on histones. Since aberrant histone modifications lead to the occurrence and development of human diseases, HMEs have attracted interest from both academia and industry as an important class of drug targets. Although more than 120 HMEs have been discovered, to date only inhibitors of a few HMEs have been successfully launched, including inhibitors of histone deacetylases (HDACs) and of the histone methyltransferase (HMT) EZH2, whereas inhibitors of the majority of HMEs are still in clinical or pre-clinical phases [1, 2]. As the pioneers of HME inhibitors, histone deacetylase (HDAC) inhibitors have been approved for the treatment of multiple myeloma and T-cell lymphoma [3]. The potential of HME inhibitors for other diseases, such as solid tumors [4], immune system diseases [5], neurological disorders [6] and diabetes [7], is being evaluated in clinical trials or pre-clinical studies. One of the major challenges in HME inhibitor development is poor isoform selectivity, which limits the application of HME inhibitors because of off-target toxicity. The substrates and catalytic centers of HMEs share similarities to varying extents, making it difficult to design inhibitors that are selective among similar targets [8, 9]. In this case, identifying active and selective scaffolds during preliminary screening would accelerate the subsequent drug R&D process.

In contrast to traditional wet-experiment-based screening, machine learning-based approaches for predicting HME inhibitors are highly efficient and low-cost, and therefore highly desirable. Yang et al. [10] applied a long short-term memory neural network model to generate novel small molecules targeting the p300/CBP protein–protein interaction, resulting in the discovery of a highly potent inhibitor. Recently, Norberto et al. [11, 12] reported a machine learning-based predictive tool for 55 epigenetic targets, including HMEs, kinases and chromatin remodelers, and systematically compared the performance of different algorithms and descriptors [11]. Evaluating activity against multiple targets is very useful for the discovery of selective inhibitors. However, only 29 HMEs were included in the previous model, owing to the difficulty of applying traditional machine learning methods to targets with insufficient compound–target activity data. Other machine learning-based virtual screening methods [13–15] assume that sufficient samples are available for training and do not consider few-shot learning.

Generally, using machine learning-based methods to solve the virtual screening problem for HME inhibitors still faces the following challenges. First, as far as we know, there is no public and well-curated small-molecule dataset dedicated to the prediction of histone-modifying enzyme inhibitors. This poses an obstacle to the virtual screening of HME inhibitors using data-driven machine learning. Second, by analyzing the data of several known histone-modifying enzyme inhibitor families, such as HDACs, we found that the number of compounds with known activity is highly imbalanced across different HMEs, and many targets have only a few, or at most dozens of, active samples. The imbalanced sample distribution and the few-shot problem challenge the effective training of machine learning prediction models. In particular, the few-shot issue seriously hinders ligand-based prediction of inhibitors for newly discovered HMEs, which often lack compound–target activity data.

To meet the above challenges, the main contributions of this study are as follows.

First, we constructed a small-molecule compound dataset (named HMEDB) for HME inhibitor prediction by collecting and curating compound–target activity data of HMEs from the open ChEMBL database [16]. HMEDB covers four groups of histone-modifying enzymes, a total of 56 different protein targets, and 97 363 known active and 130 114 inactive compound–protein data points. As far as we know, this is the first dataset for the activity prediction task of HMEs.

Second, meta-learning [17] is a learning paradigm for few-shot scenarios that derives common prior knowledge across related learning tasks so that a new learning task can be adapted to rapidly with this prior and a small amount of training data. Since HMEs in the same family have similar substrates and catalytic mechanisms, inhibitors of related HMEs may share the same chemical space. HME inhibitor prediction is therefore well suited to being formulated as a meta-learning problem, where each task is constructed for one HME target to learn a target-specific prediction model. From the tasks of HME targets with sufficient training samples, the meta-learner learns a prior with strong generalization capacity during meta-training, such that it can be easily and quickly adapted to new tasks for targets with scarce data during meta-testing. To address the few-shot issue of many HME targets, we established a novel meta-learning-based HME inhibitor predictor, MetaHMEI, which consists of the following three modules: (a) a self-supervised pre-training module, (b) a Transformer-based representation learning module for compound SMILES and (c) a meta-learning-based few-shot HME prediction module that tackles the aforementioned few-shot prediction challenge.

Finally, we conducted extensive experiments to verify the prediction performance of MetaHMEI. The experimental results show that MetaHMEI achieves superior prediction performance compared with the baselines in terms of AUC under few-shot settings. In particular, to further evaluate the performance of our proposed model for histone inhibitor screening, we conducted a case study on a specific HME, i.e. JMJD3. MetaHMEI was used to screen 500 000 compounds from the TopScience database and obtain 5000 candidates, followed by structure-based virtual screening. Finally, 15 compounds were selected for the enzymatic inhibition test, resulting in the discovery of three new JMJD3 inhibitors with IC50 < 10 μM. This demonstrates the effectiveness and feasibility of the proposed method in screening large compound libraries for HME inhibitors, especially for targets with the few-shot issue.

MATERIALS

The dataset was collected and curated from the ChEMBL database (https://www.ebi.ac.uk/chembl/). We focused on the compound–target activity data of histone-modifying enzymes, including HMTs, histone acetyltransferases (HATs), histone demethylases (HDMs) and HDACs. The raw data were processed in the following steps.

(i) Remove records with no SMILES or no ‘Standard Value’, as well as those whose ‘Standard Type’ is other than Inhibition, IC50, EC50, Ki or Kd.

(ii) Define active compounds as those with IC50/EC50/Ki/Kd ≤ 10 μM and inactive compounds as those with IC50/EC50/Ki/Kd > 10 μM. For records whose ‘Standard Type’ is Inhibition, we checked the ‘Assay Description’ for the compound concentration to classify them as active or inactive. Records that could not be classified as active or inactive (e.g. IC50 > 5 μM, or Inhibition > 70% at 30 μM) were removed from the dataset.

(iii) For compounds with multiple activity records, the active or inactive label was decided by majority rule. If the activity records of the same compound seriously contradicted each other, the compound was removed.

As a result, the final dataset consists of 56 HME targets, 97 363 active data points and 130 114 inactive data points. Detailed information on the dataset can be found in Table S1 (see Supplementary Data available online at https://github.com/ljatynu/MetaHMEI/).
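To make the labelling rules above concrete, the following minimal pandas sketch applies them to a raw ChEMBL-style activity table. The column names (`canonical_smiles`, `standard_type`, `standard_value` in nM) follow common ChEMBL exports, Inhibition-type records are simply skipped rather than parsed from the assay description, and the tie-breaking rule is simplified, so this is an illustrative approximation of the curation pipeline rather than the exact HMEDB script.

```python
import pandas as pd

ALLOWED_TYPES = {"Inhibition", "IC50", "EC50", "Ki", "Kd"}

def label_activity(raw: pd.DataFrame) -> pd.DataFrame:
    """Illustrative HMEDB-style curation: filter, threshold at 10 uM, majority-vote duplicates."""
    # (i) Drop records without SMILES or a standard value, and unsupported standard types.
    df = raw.dropna(subset=["canonical_smiles", "standard_value"])
    df = df[df["standard_type"].isin(ALLOWED_TYPES)]

    # (ii) Potency-type records: active if IC50/EC50/Ki/Kd <= 10 uM (10 000 nM), else inactive.
    #      Inhibition-type records would additionally need the assay description to be parsed.
    potency = df[df["standard_type"] != "Inhibition"].copy()
    potency["label"] = (potency["standard_value"].astype(float) <= 10_000).astype(int)

    # (iii) Majority vote over duplicate measurements of the same compound; drop exact ties
    #       as a crude stand-in for "seriously contradictory" records.
    votes = potency.groupby("canonical_smiles")["label"].mean()
    votes = votes[votes != 0.5].round().astype(int)
    return votes.rename("active").reset_index()
```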

METHODS

In this section, we mainly introduce the basic idea and architecture of MetaHMEI to solve the problem of few-shot HME inhibitor prediction.

The general framework of MetaHMEI

As shown in Figure 1, the overall architecture of MetaHMEI consists of the following three modules.

Figure 1

The general framework of MetaHMEI. MetaHMEI takes compound SMILES as inputs and consists of three modules. (A) Pre-training module: a tokenization method (e.g. ECFP) is used to obtain a vocabulary of SMILES substructures, and the self-supervised Mol2vec method is then used to learn substructure embeddings from a large chemical corpus (e.g. ChEMBL). (B) Meta-training module: a Transformer-based encoder produces high-quality compound embeddings, and a multi-task meta-learning framework learns a well-generalized model from HMEs with sufficient samples that can quickly adapt to a target HME with limited samples. (C) Task adaptation module: the target task adapts the prior learned during the meta-training stage via one or a few gradient steps w.r.t. its support set and finally produces the parameters specific to the target task.

(i) Self-supervised pre-training module. MetaHMEI uses the Simplified Molecular Input Line Entry System (SMILES) of compounds as input to train the prediction model. MetaHMEI first exploits a substructure tokenization method to obtain a vocabulary of SMILES substructures. Then, Mol2vec [18] is used to obtain molecular substructure embeddings from a large unlabeled chemical dataset (e.g. ChEMBL). The learned substructure embeddings convey chemical semantics, so that chemically related substructures point in similar directions.

(ii) Meta-learning-based prediction model. The substructure embeddings are then fed into a Transformer-based encoder to produce comprehensive, high-quality vector representations of the compounds (SMILES). Our prediction model adopts a multi-task meta-learning framework to learn a well-generalized model from HMEs with sufficient samples, which can quickly adapt to target HMEs with limited samples. Because MetaHMEI effectively transfers the prior knowledge learned from HMEs with sufficient samples to HMEs with few samples, the proposed model can produce accurate predictions for HMEs with limited data.

(iii) Target task adaptation module. The target task (i.e. an HME prediction task with insufficient samples) adapts the prior learned during the meta-training stage via one or a few gradient steps w.r.t. its support set and finally produces the parameters specific to the target task.

Self-supervised pretraining for substructures of compounds

Mol2vec is an NLP-inspired technique that treats compound substructures derived from the Extended Connectivity Fingerprint (ECFP) [19] as ‘words’ and compounds as ‘sentences’. By applying the Word2vec algorithm [20] to a corpus of compound sequences (e.g. ChEMBL), low-dimensional dense embeddings of substructures are obtained, in which the vectors of chemically related substructures point in similar directions of the vector space. Mol2vec is a self-supervised method that is first trained on unlabeled data to obtain substructure feature vectors, which can then be summed to obtain compound vectors.

Specifically, a total of 1 938 745 canonical SMILES sequences were collected from ChEMBL [16] as a corpus of compound sequences for self-supervised learning. A vocabulary including 65 different substructures was obtained by applying ECFP to this corpus. The SMILES of each compound is divided into substructures using ECFP, yielding a substructure list |$L=\left\{{s}_1,{s}_2,\dots, {s}_l\right\}$|. The model parameters in Mol2vec comprise two transformation matrices, denoted |${\mathrm{W}}_{V\times d}$| and |${\mathrm{W}}_{d\times V}^{\prime }$| (|$V$| is the number of distinct substructures and |$d$| is the embedding dimension). For each substructure |${s}_t$|, we aim to predict the preceding substructures |$\left\{{s}_{t-k},{s}_{t-k+1},\dots, {s}_{t-1}\right\}$| and the following substructures |$\left\{{s}_{t+1},{s}_{t+2},\dots, {s}_{t+k}\right\}$| (i.e. the Skip-gram model). The matrices |${\mathrm{W}}_{V\times d}$| and |${\mathrm{W}}_{d\times V}^{\prime }$| are learned through back-propagation of the loss function, and |${\mathrm{W}}_{V\times d}$| (or |${\mathrm{W}}_{d\times V}^{\prime }$|) is taken as the embedding matrix of the substructures.
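As a rough illustration of this pre-training step, the sketch below derives Morgan (ECFP-style) substructure identifiers with RDKit and trains a Skip-gram Word2vec model with gensim. It is a simplified stand-in for the Mol2vec pipeline: the sentence construction, radius, vector size and window here are illustrative choices, not the exact MetaHMEI settings.

```python
from rdkit import Chem
from rdkit.Chem import AllChem
from gensim.models import Word2Vec

def mol_to_sentence(smiles: str, radius: int = 1) -> list[str]:
    """Tokenize a molecule into Morgan substructure identifiers (the 'words' of a compound 'sentence')."""
    mol = Chem.MolFromSmiles(smiles)
    info = {}
    AllChem.GetMorganFingerprint(mol, radius, bitInfo=info)   # identifiers of circular substructures
    return [str(identifier) for identifier in sorted(info)]

corpus = ["CCO", "c1ccccc1O", "CC(=O)Nc1ccc(O)cc1"]           # stand-in for ~1.9M ChEMBL SMILES
sentences = [mol_to_sentence(s) for s in corpus]

# Skip-gram (sg=1): from each substructure, predict the surrounding substructures in the sentence.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1, epochs=10)
substructure_vector = model.wv[sentences[0][0]]               # 100-dim embedding of one substructure
```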

Meta-learning-based HMEs prediction model

Based on the embedded representation of the substructures, we construct a compound representation learning model based on the Transformer [21]. Specifically, let |${\mathrm{e}}_i$| be the embedding of substructure |${s}_i$|. If a substructure does not exist in the substructure vocabulary or is a placeholder, the zero vector |$0$| is used as its embedding. The self-attention mechanism [21] is applied to capture the interaction information between substructures, i.e.

|$\mathrm{Q}=\mathrm{E}{\mathrm{W}}_Q,\ \mathrm{K}=\mathrm{E}{\mathrm{W}}_K,\ \mathrm{V}=\mathrm{E}{\mathrm{W}}_V$| (1)

where |$\mathrm{E}=\left[{\mathrm{e}}_1,{\mathrm{e}}_2,\dots, {\mathrm{e}}_l\right]$| is the matrix of embedding representations of all substructures, and |${\mathrm{W}}_K$|, |${\mathrm{W}}_Q$| and |${\mathrm{W}}_V$| are three learnable feature transformation matrices. Then, the attention coefficients between compound substructures are computed as weights, and the transformed substructure feature vectors are summed with these weights

|${\mathrm{E}}^{\prime }=\mathrm{softmax}\left(\mathrm{Q}{\mathrm{K}}^{\top }/\sqrt{d}\right)\mathrm{V}$| (2)

where |${\mathrm{E}}^{\prime }$| is the embedding matrix that takes the interactions between substructures into account. The final embedding representation of the compound is obtained by summing the embeddings of all its substructures.
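The following PyTorch sketch shows one way to realize Equations (1) and (2): a single-head self-attention layer over the substructure embeddings followed by sum pooling into a compound vector. The single attention head and the dimensions are illustrative assumptions rather than the exact MetaHMEI architecture.

```python
import math
import torch
import torch.nn as nn

class SubstructureEncoder(nn.Module):
    """Single-head self-attention over substructure embeddings, pooled by summation (Eqs. 1-2)."""
    def __init__(self, dim: int = 100):
        super().__init__()
        self.w_q = nn.Linear(dim, dim, bias=False)   # W_Q
        self.w_k = nn.Linear(dim, dim, bias=False)   # W_K
        self.w_v = nn.Linear(dim, dim, bias=False)   # W_V
        self.dim = dim

    def forward(self, e: torch.Tensor) -> torch.Tensor:
        # e: (batch, n_substructures, dim); padded or out-of-vocabulary positions are zero vectors.
        q, k, v = self.w_q(e), self.w_k(e), self.w_v(e)
        attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(self.dim), dim=-1)
        e_prime = attn @ v            # interaction-aware substructure embeddings (E')
        return e_prime.sum(dim=1)     # compound embedding = sum over substructures

compound_vec = SubstructureEncoder()(torch.randn(2, 7, 100))   # -> shape (2, 100)
```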

We now introduce our HME activity prediction model. For each HME, a simple multi-layer perceptron (MLP) is adopted as the task prediction model, i.e.

|$x\leftarrow \sigma \left({W}_ix+{b}_i\right)$| (3)

where |$x$| is the final embedding representation of the compound, |$\sigma$| denotes the activation function (we used ReLU here) and |${W}_i$| and |${b}_i$| are learnable parameters of the MLP. Finally, we pass |$x$| into the prediction function to output the prediction result |${y}^{pre}$|

|${y}^{pre}=\sigma \left({\mathrm{\omega}}_{pre}x+{b}_{pre}\right)$| (4)

where |${\mathrm{\omega}}_{pre}$| and |${b}_{pre}$| are learnable parameters of the prediction function and |$\sigma$| denotes the activation function (we used Sigmoid here).
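A minimal prediction head matching Equations (3) and (4) could look as follows; the number of hidden layers and their width are illustrative (the experiments use two- or three-layer MLPs depending on the dataset).

```python
import torch.nn as nn

class HMEPredictor(nn.Module):
    """MLP over the compound embedding (Eq. 3) with a sigmoid output layer (Eq. 4)."""
    def __init__(self, dim: int = 100, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),      # x <- sigma(W_1 x + b_1)
            nn.Linear(hidden, hidden), nn.ReLU(),   # x <- sigma(W_2 x + b_2)
            nn.Linear(hidden, 1), nn.Sigmoid(),     # y_pre = sigma(w_pre x + b_pre)
        )

    def forward(self, x):
        # Returns the predicted probability that each compound inhibits the given HME.
        return self.net(x).squeeze(-1)
```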

Due to the lack of sufficient samples for model training, the prediction performance for HMEs with insufficient samples degrades. This data scarcity raises a challenge for the effective prediction of HME inhibitors. To alleviate the data scarcity problem, we propose in this paper a novel meta-learning approach, named MetaHMEI, for the activity prediction of HMEs. MetaHMEI consists of two phases: meta-training and meta-testing (few-shot adaptation for the target task). In the meta-training phase, multiple HMEs with sufficient samples are used as meta-training tasks to obtain a well-initialized model that can be quickly adapted to a new HME with limited data. In the target task adaptation phase, a few (e.g. <5) known active and inactive samples from a new target HME are used to fine-tune the model on this HME and capture its specific model. With the transferability and fast adaptability between meta-training tasks and new few-shot tasks, MetaHMEI can mitigate the data scarcity issue. Before formally describing MetaHMEI, we introduce some mathematical notation. Each task |${\mathcal{T}}_k$| is constructed for an HME |$k$|. Let |$\mathcal{T}={\mathcal{T}}_{\mathrm{tr}}\cup{\mathcal{T}}_{\mathrm{ts}}$| (|${\mathcal{T}}_{\mathrm{tr}}\cap{\mathcal{T}}_{\mathrm{ts}}=\varnothing$|) be the full set of tasks. |${\mathcal{T}}_{\mathrm{tr}}=\left\{{\mathcal{T}}_1,{\mathcal{T}}_2,\dots, {\mathcal{T}}_{\ell}\right\}$| denotes the set of tasks with sufficient samples and |${\mathcal{T}}_{\mathrm{ts}}=\left\{{\mathcal{T}}_{\ell +1},{\mathcal{T}}_{\ell +2},\dots, {\mathcal{T}}_m\right\}$| the set of tasks with few-shot samples. |${\Omega}^{+}$| (|${\Omega}^{-}$|) is the set of inhibitory active (inactive) compounds. Each task |${\mathcal{T}}_k=\left\{{S}_{{\mathcal{T}}_k},{Q}_{{\mathcal{T}}_k}\right\}$| for an HME |$k$| consists of a support compound set |${S}_{{\mathcal{T}}_k}$| and a query compound set |${Q}_{{\mathcal{T}}_k}$|, where |${S}_{{\mathcal{T}}_k}\subset{\Omega}_{{\mathcal{T}}_k}^{+}\cup{\Omega}_{{\mathcal{T}}_k}^{-}$| and |${Q}_{{\mathcal{T}}_k}\subset{\Omega}_{{\mathcal{T}}_k}^{+}\cup{\Omega}_{{\mathcal{T}}_k}^{-}$| are sampled from the active or inactive compounds of HME |$k$|, such that the support and query compounds are mutually exclusive, i.e. |${S}_{{\mathcal{T}}_k}\cap{Q}_{{\mathcal{T}}_k}=\varnothing$|.

For convenience of description, we use |$\mathrm{\theta}$| to denote all learnable parameters in MetaHMEI, which include both the compound representation learning parameters and the prediction model parameters. MetaHMEI consists of the following two phases. (i) Meta-training phase (|${\mathrm{\theta}}^{\prime}\leftarrow \mathrm{MT}\left({\mathcal{T}}_{\mathrm{tr}}|\mathrm{\theta} \right)$|). Starting from randomly initialized parameters |$\mathrm{\theta}$|, the meta parameters |${\mathrm{\theta}}^{\prime }$| are obtained by the meta-training algorithm |$\mathrm{MT}\left(\bullet \right)$| using the head tasks |${\mathcal{T}}_{\mathrm{tr}}$| as training tasks. The parameters |${\mathrm{\theta}}^{\prime }$| learned by |$\mathrm{MT}\left(\bullet \right)$| contain the prior knowledge of all head tasks, which is expected to generalize to all target tasks. For each head task |${\mathcal{T}}_k=\left\{{S}_{{\mathcal{T}}_k},{Q}_{{\mathcal{T}}_k}\right\}\in{\mathcal{T}}_{\mathrm{tr}}$|, the meta-learner adapts the global prior |$\mathrm{\theta}$| to task-specific parameters |${\mathrm{\theta}}_{{\mathcal{T}}_k}^{\prime }$| w.r.t. the loss on the support set |${S}_{{\mathcal{T}}_k}$|

|${\mathrm{\theta}}_{{\mathcal{T}}_k}^{\prime}\leftarrow \mathrm{\theta} -\alpha{\nabla}_{\mathrm{\theta}}\mathcal{L}\left({S}_{{\mathcal{T}}_k}|\mathrm{\theta} \right)$| (5)

The loss function |$\mathcal{L}$| is defined as the cross-entropy between the labels and the predicted outputs described by Equation (4). Equation (5) is called the inner-loop update of meta-training. The inner-loop learning rate |$\alpha$| is fixed as a hyperparameter and shared by all meta-training tasks. For simplicity of notation, one gradient update is shown in Equation (5), but multiple gradient updates may also be used.

For each query set |${Q}_{{\mathcal{T}}_k}$|, the loss under the task-specific parameters |${\mathrm{\theta}}_{{\mathcal{T}}_k}^{\prime }$| is calculated, and backpropagation is used to update the global |$\mathrm{\theta}$| with the summed loss of all meta-training tasks.

|$\mathrm{\theta}\leftarrow \mathrm{\theta} -\beta{\nabla}_{\mathrm{\theta}}{\sum}_{{\mathcal{T}}_k\in{\mathcal{T}}_{\mathrm{tr}}}\mathcal{L}\left({Q}_{{\mathcal{T}}_k}|{\mathrm{\theta}}_{{\mathcal{T}}_k}^{\prime}\right)$| (6)

Equation (6) is called the outer-loop update of meta-training, where the outer-loop learning rate |$\beta$| is fixed as a hyperparameter. Algorithm 1 describes the complete meta-training procedure.

Algorithm 1. Meta-training of MetaHMEI

Input: the HME tasks with sufficient samples, denoted by |${\mathbf{\mathcal{T}}}_{\mathrm{tr}}=\left\{{\mathcal{T}}_1,{\mathcal{T}}_2,\dots, {\mathcal{T}}_{\ell}\right\}$|⁠.

Output: the meta parameters |${\boldsymbol{\mathrm{\theta}}}^{\prime }$|⁠.

1: Initialize the meta parameters |$\boldsymbol{\mathrm{\theta}}$| and the learning rates |$\alpha, \beta$|.

2: while not converge do

3: Sample a batch of tasks |$\mathcal{B}$| from |${\mathcal{T}}_{\mathrm{tr}}$|;

4: for each |${\mathcal{T}}_k\in \mathcal{B}$| do

5: Obtain the parameters specific to the task |${\mathcal{T}}_k$| by gradient descent:

|${\boldsymbol{\mathrm{\theta}}}_{{\mathcal{T}}_k}^{\prime}\leftarrow \boldsymbol{\mathrm{\theta}} -\alpha{\nabla}_{\boldsymbol{\mathrm{\theta}}}\mathcal{L}\left({S}_{{\mathcal{T}}_k}|\boldsymbol{\mathrm{\theta}} \right)$|⁠;

6: end for

7: Update the meta parameters via gradient descent:

|${\boldsymbol{\mathrm{\theta}}}^{\prime}\boldsymbol{\leftarrow}\boldsymbol{\mathrm{\theta} } -\beta{\nabla}_{\boldsymbol{\mathrm{\theta}}}{\sum}_{{\mathcal{T}}_k\in{\mathcal{T}}_{\mathrm{tr}}}\mathcal{L}\left({Q}_{{\mathcal{T}}_k}|{\boldsymbol{\mathrm{\theta}}}_{{\mathcal{T}}_k}^{\prime}\right)$|⁠;

8: end while

9: output |${\boldsymbol{\mathrm{\theta}}}^{\prime }$|
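A condensed PyTorch sketch of Algorithm 1 is given below. It computes the inner-loop update of Equation (5) with `torch.autograd.grad` and evaluates the query loss of Equation (6) through `torch.func.functional_call`; the task batching, the loss (binary cross-entropy on sigmoid outputs) and the single inner step are simplifying assumptions, not the exact training script.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def meta_train_step(model, task_batch, alpha=0.004, beta=1e-4):
    """One outer-loop iteration of Algorithm 1; each task holds support/query tensors."""
    theta = dict(model.named_parameters())
    outer_loss = 0.0
    for task in task_batch:                                    # line 4: for each task in the batch
        # Inner loop (Eq. 5): adapt theta on the support set of task k.
        support_pred = functional_call(model, theta, (task["x_support"],))
        support_loss = F.binary_cross_entropy(support_pred, task["y_support"])
        grads = torch.autograd.grad(support_loss, theta.values(), create_graph=True)
        theta_k = {name: p - alpha * g for (name, p), g in zip(theta.items(), grads)}
        # Accumulate the query loss under the task-specific parameters (summand of Eq. 6).
        query_pred = functional_call(model, theta_k, (task["x_query"],))
        outer_loss = outer_loss + F.binary_cross_entropy(query_pred, task["y_query"])
    # Outer loop (Eq. 6): update the shared initialization theta.
    meta_grads = torch.autograd.grad(outer_loss, theta.values())
    with torch.no_grad():
        for p, g in zip(theta.values(), meta_grads):
            p -= beta * g
```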

(ii) Few-shot adaptation phase (|${\mathrm{\theta}}_j^{\prime \prime}\leftarrow \mathrm{apt}\big({\mathcal{T}}_j|{S}_{{\mathcal{T}}_j},{\mathrm{\theta}}^{\prime}\big)$|). For each tail task |${\mathcal{T}}_j\in{\mathcal{T}}_{\mathrm{ts}}$|, the support set |${S}_{{\mathcal{T}}_j}$| contains only a small number of active and inactive compounds for HME |$j$|. MetaHMEI adapts the prior |${\mathrm{\theta}}^{\prime }$| learned during the meta-training stage via one or a few gradient steps w.r.t. this support set |${S}_{{\mathcal{T}}_j}$| and finally obtains the parameters |${\mathrm{\theta}}_j^{\prime \prime }$| specific to task |${\mathcal{T}}_j$|, i.e.

|${\mathrm{\theta}}_j^{\prime \prime}\leftarrow{\mathrm{\theta}}^{\prime }-\alpha{\nabla}_{{\mathrm{\theta}}^{\prime}}\mathcal{L}\left({S}_{{\mathcal{T}}_j}|{\mathrm{\theta}}^{\prime}\right)$| (7)

Now, each few-shot HME prediction task |${\mathcal{T}}_j$| corresponds to specific model parameters |${\mathrm{\theta}}_j^{\prime \prime }$|, which include the parameters of the compound representation learning and prediction models. Active compounds for HME |$j$| can then be predicted with this task-specific model.
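Correspondingly, adaptation to a tail task (Equation 7) can be sketched as a few plain gradient steps on the small support set, starting from the meta-learned initialization; the optimizer, learning rate and step count below are illustrative.

```python
import copy
import torch
import torch.nn.functional as F

def adapt_to_target(meta_model, x_support, y_support, alpha=0.004, steps=5):
    """Fine-tune a copy of the meta-learned model on a few labelled compounds of a new HME (Eq. 7)."""
    task_model = copy.deepcopy(meta_model)                 # start from the meta parameters theta'
    optimizer = torch.optim.SGD(task_model.parameters(), lr=alpha)
    for _ in range(steps):                                 # one or a few gradient steps
        optimizer.zero_grad()
        loss = F.binary_cross_entropy(task_model(x_support), y_support)
        loss.backward()
        optimizer.step()
    return task_model                                      # theta''_j, specific to the target HME
```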

EXPERIMENTAL RESULTS

In this section, we conduct extensive experiments on the collected HME datasets to compare the performance of different models, and we analyze and discuss the experimental results.

Experimental setup

We carry out experiments on three groups of histone-modifying enzymes (HDM, HDAC and HMT; because the number of HAT tasks available for experiments is too small, they are excluded from the validation experiments) to evaluate the proposed method. In the experiments, we selected several HMEs with sufficient samples in an enzyme family as meta-training tasks, and several HMEs with insufficient samples as meta-testing tasks. In each task, the relationships between compounds and the HME are divided into active and inactive instances: an active instance means the compound is active on this HME, an inactive instance means it is inactive, and each compound is uniquely represented by its SMILES. The details of the experimental datasets are listed in Table 1.

Table 1

Statistics of the experimental datasets: details of the meta-training and meta-testing tasks in the HDM, HDAC and HMT groups

Group | Meta-training task | Active | Inactive | Meta-testing task | Active | Inactive
HDM | JMJD2 | 6619 | 44976 | JMJD3 | 51 | 149
HDM | KDM4E | 3976 | 35455 | KDM4D | 25 | 20
HDM | LSD1 | 830 | 620 | PHF8 | 69 | 40
HDM | KDM4C | 563 | 227 | KDM2A | 75 | 34
HDM | KDM5A | 406 | 177 | JMJD4 | 25 | 47
HDAC | HDAC1 | 2966 | 609 | HDAC11 | 312 | 124
HDAC | HDAC6 | 1906 | 474 | HDAC10 | 298 | 117
HDAC | HDAC3 | 1028 | 306 | HDAC4 | 233 | 303
HDAC | HDAC8 | 1016 | 439 | HDAC7 | 202 | 214
HDAC | HDAC2 | 983 | 247 | HDAC5 | 197 | 214
HDAC | SIRT2 | 613 | 1118 | HDAC9 | 162 | 144
HDAC | SIRT1 | 316 | 1083 | SIRT3 | 158 | 371
HMT | EZH2 | 453 | 156 | SMYD3 | 86 | 10
HMT | PRMT5 | 364 | 47 | PRMT6 | 95 | 42
HMT | WHSC1 | 250 | 338 | PRMT3 | 65 | 42
HMT | CARM1 | 159 | 57 | SMYD2 | 38 | 13
HMT | DOT1L | 107 | 112 | PRMT8 | 35 | 13
HMT | PRMT1 | 71 | 164 | SETD8 | 13 | 67

We conduct experiments ten times with ten random seeds based on 10-fold cross-validation (10-CV) for each task and report the average AUC to evaluate the performance of each task; the closer the AUC is to 1, the better the prediction performance. Based on the experimental results, we set the inner-loop (support set) learning rate |$\alpha$| to 0.004 and the outer-loop (query set) learning rate |$\beta$| to 0.0001. The support set size is set to 5 and the query set size to 256 for all meta-training tasks.
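For reference, the per-task evaluation can be expressed as the mean ROC-AUC over repeated cross-validated runs, e.g. with scikit-learn; the snippet below is a generic illustration rather than the exact evaluation script.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def mean_auc(runs):
    """runs: list of (y_true, y_score) pairs collected over repeated 10-fold CV runs for one task."""
    return float(np.mean([roc_auc_score(y_true, y_score) for y_true, y_score in runs]))
```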

A three-layer MLP is used on all tasks of the HDM dataset, and a two-layer MLP on all tasks of the other datasets. The transformation matrices of the self-attention mechanism are initialized with values of 1.

The experimental code is implemented with the open-source machine learning framework PyTorch (https://pytorch.org). All experiments are carried out under the Windows 10 operating system on a Dell Precision T5820 workstation with an Intel W-2245 8-core 3.91 GHz CPU, 128 GB of memory and an NVIDIA TITAN RTX 24 GB GPU.

Baselines

The following two recent methods for molecular property prediction are used as baselines against which to compare the performance of our proposed method.

(i) MolTrans [22]: it regards compounds and proteins as linear structures and makes full use of massive unlabeled biomedical data to learn substructure information through frequent consecutive subsequence (FCS) mining. It constructs an augmented Transformer encoder to obtain interaction matrices between compounds and proteins; finally, the prediction of the relationship between compounds and proteins is obtained by extracting information from the interaction matrices through a CNN [23].

(ii) MetaMGNN [24]: the first meta-learning-based method for molecular property prediction. It regards compounds as graph structures and learns latent compound representations through a pre-trained graph neural network (preGNN) [25]. The model parameters are learned through the gradient-based meta-learning algorithm MAML [17]. To enhance the meta-training process, it uses a self-supervised module to make full use of the unlabeled information in the molecular graph.

To comprehensively evaluate the performance of MetaHMEI, in addition to the two recent methods [22, 24] published by others, we also designed and implemented two variants of MetaHMEI, named MetaECFP and TransferHMEI, and compared their prediction performance with MetaHMEI.

(iii) MetaECFP: it is based on extended connectivity fingerprints (ECFP) computed with the Morgan algorithm, which encode heavy atoms (i.e. non-hydrogens) within multiple circular layers of a given diameter. In MetaECFP, the ECFPs of compounds are obtained with RDKit (www.rdkit.org) and multi-layer MLPs are used as the activity classifier (a minimal fingerprint sketch is shown after this list). HMEs with sufficient samples are used as meta-training tasks to train the MLPs and obtain the meta parameters, and the model of a target task with insufficient samples is obtained by adapting the meta parameters with a few samples. It is worth noting that the major difference between MetaECFP and MetaHMEI is the compound representation: MetaECFP simply uses ECFP fingerprints, whereas MetaHMEI uses pre-training and Transformer-based compound representations.

(iv) TransferHMEI: it uses the same pre-training and Transformer-based compound representation learning model as MetaHMEI. However, TransferHMEI is based on transfer learning [26]; that is, the model is first pre-trained on the tasks with a sufficient number of samples and then adapted to the small samples to obtain the target task model.
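As referenced in (iii), a minimal way to obtain such fingerprints with RDKit is shown below; the radius of 2 and 2048 bits (ECFP4-like settings) are illustrative defaults rather than the exact MetaECFP configuration.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem

def ecfp(smiles: str, radius: int = 2, n_bits: int = 2048) -> np.ndarray:
    """Morgan/ECFP bit vector for one compound, usable directly as MLP input features."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return np.array(fp, dtype=np.float32)

features = ecfp("CC(=O)Oc1ccccc1C(=O)O")   # aspirin -> 2048-dimensional binary vector
```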

Factors influencing the performance of MetaHMEI

MetaHMEI has three hyperparameters: (1) |$r$| controls the radius around each atom that is regarded as a substructure during substructure division; when |$r$| equals 0, each atom itself is considered a substructure. (2) |$s$| controls the size of the support set in each iteration of the meta-training process. (3) |$q$| controls the size of the query set in each iteration of the meta-training process. Experiments are carried out on HMEDB to investigate the effects of these factors on the performance of MetaHMEI.

We fix |$s=5$| and |$q=256$|, set |$r$| to 0, 1, 2 and 3, respectively, and conduct experiments on the HDM and HDAC datasets. The experimental results are shown in Figure 2, where the y-axis represents the average AUC of the model on the five test tasks of the HDM dataset and the seven test tasks of the HDAC dataset under different |$r$| values. From Figure 2, we can see that when the substructure division radius is set to 0, that is, when each atom itself is regarded as a substructure, the model performs worst on HDM and HDAC, and when the radius is set to 1, the best performance is achieved on both HDM and HDAC.

Figure 2

The influence of various r of ECFP on performance of MetaHMEI.

We fix |$r=1$| and |$q=256$|, set |$s$| to 5, 10, 30, 50 and 70, respectively, and conduct experiments on the HDM and HDAC datasets. The experimental results are shown in Figure 3, where the y-axis represents the average AUC of the model on the five test tasks of the HDM dataset and the seven test tasks of the HDAC dataset under different |$s$| values. From Figure 3, we can see that the performance of the model on HDM and HDAC decreases gradually as the size of the support set increases.

Figure 3

The influence of various size of support set s on performance of MetaHMEI.

We conduct experiments on the HDM and HDAC datasets with |$r=1$|, |$s=5$| and |$q$| set to 16, 32, 64, 128, 256 and 384, respectively. The experimental results are shown in Figure 4, where the y-axis represents the average AUC of the model on the five test tasks of the HDM dataset and the seven test tasks of the HDAC dataset under different |$q$| values. From Figure 4, we can see that the performance of the model on HDM and HDAC increases gradually with the size of the query set, reaches its maximum at |$q=256$|, and then decreases as the query set grows further.

Figure 4

The influence of various size of query set q on performance of MetaHMEI.

Predictive performance of MetaHMEI

In these experiments, we compare the predictive performance of MetaHMEI with that of the other baselines on three groups of HMEs, i.e. HDM, HDAC and HMT, where 5/5, 7/7 and 6/6 meta-training/few-shot test tasks (as described in Table 1), respectively, are used to train the models and evaluate the performance. We compare the prediction performance of all methods in two few-shot cases, i.e. 1-shot and 5-shot. In the 1-shot case, the task model uses only one active sample and one inactive sample to adapt the meta model (or pre-trained model); in the 5-shot case, it uses five active samples and five inactive samples.

As shown in Tables 2 and 3, our proposed MetaHMEI achieves clearly better predictive performance than the other baselines in terms of AUC under both few-shot adaptation cases. In particular, in the 1-shot case, our method is 1.82%, 5.75% and 4.89% better than the best comparison baseline, MetaMGNN [24], on the HDM, HDAC and HMT task groups, respectively. In the 5-shot case, our method is 4.85%, 6.85% and 3.06% better than MetaMGNN on the HDM, HDAC and HMT task groups, respectively.

Table 2

The performance comparison with baselines in terms of AUC (1-shot)

Group | Task | MolTrans [22] | MetaMGNN [24] | TransferHMEI | MetaECFP | MetaHMEI
HDM | JMJD3 | 69.97 | 79.12 | 63.02 | 63.57 | 74.12
HDM | KDM4D | 67.10 | 73.76 | 72.35 | 73.61 | 75.47
HDM | PHF8 | 59.99 | 84.95 | 83.14 | 91.36 | 92.43
HDM | KDM2A | 61.79 | 59.11 | 62.74 | 55.58 | 68.66
HDM | JMJD4 | 61.86 | 73.46 | 76.04 | 76.27 | 68.79
HDM | AVG | 64.14 | 74.07 | 71.46 | 72.07 | 75.89 (+1.82)
HDAC | HDAC11 | 65.27 | 75.23 | 79.62 | 79.54 | 81.04
HDAC | HDAC10 | 56.64 | 69.19 | 75.26 | 75.37 | 77.75
HDAC | HDAC4 | 53.58 | 68.91 | 63.84 | 69.47 | 73.40
HDAC | HDAC7 | 61.56 | 72.56 | 73.11 | 80.63 | 78.98
HDAC | HDAC5 | 53.20 | 73.29 | 71.17 | 77.07 | 78.89
HDAC | HDAC9 | 55.97 | 71.79 | 71.60 | 79.50 | 78.58
HDAC | SIRT3 | 59.42 | 85.02 | 68.37 | 82.72 | 87.63
HDAC | AVG | 57.94 | 73.71 | 71.85 | 77.75 | 79.46 (+5.75)
HMT | SMYD3 | 71.24 | 85.03 | 85.93 | 87.30 | 91.17
HMT | PRMT6 | 73.04 | 81.33 | 83.05 | 80.45 | 87.30
HMT | PRMT3 | 63.98 | 65.16 | 46.50 | 60.89 | 59.33
HMT | SMYD2 | 73.87 | 69.48 | 62.74 | 73.28 | 72.72
HMT | PRMT8 | 69.85 | 87.25 | 89.68 | 92.96 | 93.50
HMT | SETD8 | 71.59 | 51.13 | 52.63 | 49.54 | 64.71
HMT | AVG | 70.59 | 73.23 | 70.09 | 74.07 | 78.12 (+4.89)

The performance comparison of all methods in the 1-shot case. Our method is 1.82%, 5.75% and 4.89% better than the best baseline, MetaMGNN, on the HDM, HDAC and HMT task groups, respectively. The best score is shown in bold and the second-best score is underlined.


Table 3

The performance comparison with baselines in terms of AUC (5-shot)

Group | Task | MolTrans [22] | MetaMGNN [24] | TransferHMEI | MetaECFP | MetaHMEI
HDM | JMJD3 | 77.46 | 83.84 | 83.34 | 67.62 | 86.79
HDM | KDM4D | 78.00 | 75.00 | 70.38 | 70.06 | 74.96
HDM | PHF8 | 68.48 | 89.59 | 87.94 | 91.38 | 93.79
HDM | KDM2A | 67.24 | 59.65 | 66.76 | 53.64 | 69.47
HDM | JMJD4 | 72.14 | 78.30 | 81.35 | 80.57 | 85.60
HDM | AVG | 72.66 | 77.27 | 77.95 | 72.65 | 82.12 (+4.85)
HDAC | HDAC11 | 51.62 | 72.57 | 82.07 | 80.00 | 85.62
HDAC | HDAC10 | 55.52 | 70.37 | 76.79 | 75.38 | 81.44
HDAC | HDAC4 | 58.69 | 71.49 | 72.22 | 69.64 | 78.14
HDAC | HDAC7 | 61.28 | 76.77 | 74.12 | 79.60 | 78.03
HDAC | HDAC5 | 54.54 | 72.44 | 76.20 | 76.77 | 79.09
HDAC | HDAC9 | 61.17 | 73.20 | 73.71 | 78.62 | 81.10
HDAC | SIRT3 | 64.48 | 87.52 | 83.69 | 84.74 | 88.78
HDAC | AVG | 58.18 | 74.90 | 76.97 | 77.82 | 81.75 (+6.85)
HMT | SMYD3 | 78.27 | 89.50 | 91.90 | 87.03 | 91.97
HMT | PRMT6 | 85.10 | 86.89 | 86.03 | 81.00 | 90.05
HMT | PRMT3 | 75.31 | 72.52 | 59.51 | 61.28 | 65.90
HMT | SMYD2 | 70.45 | 68.93 | 73.07 | 76.81 | 81.93
HMT | PRMT8 | 67.08 | 86.37 | 92.35 | 92.29 | 92.79
HMT | SETD8 | 71.77 | 67.27 | 65.32 | 62.20 | 67.40
HMT | AVG | 74.66 | 78.58 | 78.02 | 76.76 | 81.64 (+3.06)

The performance comparison of all methods in the 5-shot case. Our method is 4.85%, 6.85% and 3.06% better than the best baseline, MetaMGNN, on the HDM, HDAC and HMT task groups, respectively. The best score is shown in bold and the second-best score is underlined.


DISCUSSION

The satisfactory performance of MetaHMEI can be explained by the following factors. First, the comparison results show that models based on effective knowledge transfer between tasks achieve better predictions than the model without knowledge transfer (i.e. MolTrans versus the other methods). Second, with limited samples, the meta-learning model that 'learns to learn' generalizes better than the transfer learning model that 'learns then fine-tunes', as seen from the comparison between MetaHMEI and TransferHMEI. Finally, as shown by the comparison of MetaHMEI with MetaECFP and MetaMGNN, both the high-quality molecular substructure embeddings obtained from a large number of unlabeled compounds through self-supervised pre-training and the compound feature representations learned by the Transformer encoder effectively improve the prediction performance of MetaHMEI.

Case study

JMJD3 (KDM6B), a member of the Jumonji family of HDMs, is responsible for removing the trimethylation on histone H3 lysine 27 (an H3K27me3-specific demethylase). The function of JMJD3 is correlated with human diseases such as immune system diseases, cancer and infectious diseases, and JMJD3 has emerged as a potential therapeutic target, especially for inflammation and autoimmune diseases [27, 28]. Although the highly efficient inhibitors GSK-J1/J4 were reported in 2012, achieving a half-maximal inhibitory concentration (IC50) against JMJD3 of 60 nM, fewer than 100 inhibitors (IC50 < 10 μM) are recorded in the ChEMBL database, and no JMJD3 inhibitor has yet been approved for human disease therapy [29]. Therefore, the discovery of diverse and effective JMJD3 inhibitors is essential for the development of drug leads targeting this enzyme, as well as for the further understanding of HDM function.

To further evaluate the validity of our proposed model for histone inhibitor screening, we conduct a case study on JMJD3, and the screening process is shown in Figure 5.

Figure 5

Virtual screening protocol for the case study on JMJD3.

We adapted MetaHMEI on the small number of known samples targeting JMJD3 and obtained a model for JMJD3 inhibitor prediction. The model was then used to predict the JMJD3 inhibitory activity of 500 000 compounds in the TopScience database, and the 5000 compounds with the highest prediction scores were selected for molecular docking (details are presented in the supporting information). To ensure the structural diversity of the compounds selected for the biochemical experiments, the top 5% ranked molecules among the 5000 candidates (250 molecules) were clustered into 40 classes according to ECFP6 with Discovery Studio. Then, 0–2 molecules were selected from each class, resulting in 15 compounds purchased from TopScience Co. Ltd. for the inhibition test against JMJD3.
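The original diversity clustering was performed with ECFP6 in Discovery Studio; as a rough open-source analogue, the sketch below groups candidates by Tanimoto distance over radius-3 Morgan fingerprints (ECFP6-like) using RDKit's Butina algorithm. The distance cutoff is an illustrative choice and does not reproduce the exact 40-class partition used in the study.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from rdkit.ML.Cluster import Butina

def cluster_by_ecfp6(smiles_list, cutoff=0.6):
    """Cluster molecules by Tanimoto distance on ECFP6-like fingerprints (Butina algorithm)."""
    fps = [AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s), 3, nBits=2048)
           for s in smiles_list]
    dists = []
    for i in range(1, len(fps)):                      # condensed lower-triangle distance matrix
        sims = DataStructs.BulkTanimotoSimilarity(fps[i], fps[:i])
        dists.extend(1.0 - s for s in sims)
    # Returns a tuple of clusters, each a tuple of molecule indices (first index is the centroid).
    return Butina.ClusterData(dists, len(fps), cutoff, isDistData=True)
```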

The enzymatic inhibitory activity of the selected compounds against JMJD3 was tested by ChemPartner Co. Ltd., Shanghai, People’s Republic of China. For preliminary screening, the inhibition rates at a single compound concentration (50 μM) were measured in duplicate. For IC50 estimation, five concentrations were measured for each compound using an AlphaLISA assay, starting at 50 μM with 5-fold serial dilution. GSK-J1 was used as the reference compound. The raw data of the enzymatic inhibition assay are provided as supporting information (Table S1, Figure S1, see Supplementary Data available online at https://github.com/ljatynu/MetaHMEI/).

As shown in Figure 6, compounds 2, 7 and 13 exhibited potential activity with IC50 values of 7.64, 2.99 and 6.27 μM, respectively. The interaction mode between these compounds and JMJD3 was investigated via molecular docking (Figure 7). Compounds 2 and 7 form three hydrogen bonds with K1381, N1400 and R1246 of the JMJD3 catalytic pocket, but only one chelating interaction with Co2+. Compound 13, although it loses the hydrogen bond with R1246, forms a bidentate interaction with Co2+, similar to that of the highly efficient inhibitors GSK-J1/J4 [29]. This may explain the better inhibitory activity of compound 13 compared with compounds 2 and 7. This is the first report of the JMJD3 inhibitory activity of these three compounds, which provide new scaffolds for the further optimization and development of JMJD3 inhibitors.

Figure 6

The structure of 15 compounds selected for enzymatic inhibitory test against JMJD3.

Figure 7

Docking pose of compound 2 (A), 7 (B) and 13 (C) in JMJD3. Dashed line represents the polar interaction.

CONCLUSION

In this work, we proposed a meta-learning-based prediction method, MetaHMEI, to solve the few-shot problem in HME inhibitor prediction. The experimental results demonstrated that MetaHMEI outperforms existing approaches in predicting HME inhibitors when training samples are limited.

MetaHMEI treats compounds as sequences to learn their vector representations. We make full use of a large amount of unlabeled information and take the interaction information between substructures into account. Through meta-learning, our model learns well-initialized model parameters that can quickly adapt to the inhibitor prediction problems of emerging HMEs. We conducted extensive experiments on the dataset we collected to evaluate our model and compared it with other baselines; the experiments showed that our model was superior to the other methods. We also applied our model in the virtual screening of histone JMJD3 inhibitors and successfully obtained three inhibitors, demonstrating the effectiveness of our model.

Key Points
  • The few-shot sample issue challenges the effective training of machine learning prediction models for the discovery of histone modifying enzyme inhibitors (HMEIs).

  • We constructed a small molecule compound data set for HMEI prediction by collecting and curating compound-target activity data from ChEMBL.

  • We established a novel meta-learning-based HME inhibitor predictor MetaHMEI which consists of a Self-supervised Pretraining Module, transformer-based representation learning module and meta-learning-based few-shot HME prediction module.

  • We applied MetaHMEI in the virtual screening process of histone JMJD3 inhibitors and successfully obtained 3 inhibitors, showing the effectiveness of MetaHMEI.

ACKNOWLEDGEMENTS

We thank anonymous reviewers for their valuable suggestions.

FUNDING

National Natural Science Foundation of China (62262072, 82260694, 81903541); Fundamental Research Project of Yunnan Province (202001BB050052, 202201AT070297, 202201AW070012); Open Foundation of Key Laboratory in Media Convergence of Yunnan Province (220225201); Yun Ling Scholar Project to Wei-Lie Xiao, Project of Yunnan Characteristic Plant Screening and R&D Service CXO Platform (2022YKZY001).

Qi Lu is a postgraduate student at the School of Software, Yunnan University. His research focuses on bioinformatics and machine learning.

Ruihan Zhang is currently an associate professor at the Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan University. Her research interests include computer-aided drug design, cheminformatics and bioinformatics.

Hongyuan Zhou is a postgraduate student at the School of Chemical Science and Technology. His research focuses on computer-aided drug design.

Dongyuan Ni is a postgraduate student at the School of Chemical Science and Technology. His research focuses on computer-aided drug design.

Weilie Xiao is currently a professor at the Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Yunnan University. His research focuses on medicinal chemistry and drug discovery.

Jin Li is currently a professor at the School of Software, Yunnan University. His research interests include trustworthy machine learning, meta-learning and bioinformatics.

References

1. Arrowsmith CH, Bountra C, Fish PV, et al. Epigenetic protein families: a new frontier for drug discovery. Nat Rev Drug Discov 2012;11:384–400.

2. Jones PA, Baylin SB. The epigenomics of cancer. Cell 2007;128:683–92.

3. Yao R, Han D, Sun X, et al. Scriptaid inhibits cell survival, cell cycle, and promotes apoptosis in multiple myeloma via epigenetic regulation of p21. Exp Hematol 2018;60:63–72.

4. Liu KY, Wang LT, Hsu SH. Modification of epigenetic histone acetylation in hepatocellular carcinoma. Cancer 2018;10:8.

5. Eom GH, Kook H. Role of histone deacetylase 2 and its post-translational modifications in cardiac hypertrophy. BMB Rep 2015;48:131–8.

6. Kazantsev AG, Thompson LM. Therapeutic application of histone deacetylase inhibitors for central nervous system disorders. Nat Rev Drug Discov 2008;7:854–68.

7. Williams SR, Aldred MA, Der Kaloustian VM, et al. Haploinsufficiency of HDAC4 causes brachydactyly mental retardation syndrome, with brachydactyly type E, developmental delays, and behavioral problems. Am J Hum Genet 2010;87:219–28.

8. Grabiec AM, Korchynskyi O, Tak PP, et al. Histone deacetylase inhibitors suppress rheumatoid arthritis fibroblast-like synoviocyte and macrophage IL-6 production by accelerating mRNA decay. Ann Rheum Dis 2012;71:424–31.

9. Yoshizaki T, Schenk S, Imamura T, et al. SIRT1 inhibits inflammatory pathways in macrophages and modulates insulin sensitivity. Am J Physiol Endocrinol Metab 2010;298:E419–28.

10. Yang Y, Zhang R, Li Z, et al. Discovery of highly potent, selective, and orally efficacious p300/CBP histone acetyltransferases inhibitors. J Med Chem 2020;63:1337–60.

11. Norberto SC, Medina-Franco JL. Epigenetic target fishing with accurate machine learning models. J Med Chem 2021;64:8208–20.

12. Norberto SC, Medina-Franco JL. Epigenetic target profiler: a web server to predict epigenetic targets of small molecules. J Chem Inf Model 2021;61:1550–4.

13. Wang HM, Guo F, Du M, et al. A novel method for drug-target interaction prediction based on graph transformers model. BMC Bioinformatics 2022;23:459.

14. Gao C, Xu S, Wang L. An algorithm for protein helix assignment using helix geometry. PLoS One 2015;10(7):e0129674.

15. Gao C, Xu S. Improving the performance of the PLB index for ligand-binding site prediction using dihedral angles and the solvent-accessible surface area. Sci Rep 2016;6:33232.

16. Gaulton A, Bellis LJ, Bento AP, et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res 2012;40:D1100–7.

17. Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. In: Proceedings of the International Conference on Machine Learning. PMLR 2017;70:1126–35.

18. Jaeger S, Fulle S, Turk S. Mol2vec: unsupervised machine learning approach with chemical intuition. J Chem Inf Model 2018;58:27–35.

19. Rogers D, Hahn M. Extended-connectivity fingerprints. J Chem Inf Model 2010;50:742–54.

20. Mikolov T, Chen K, Corrado G, et al. Efficient estimation of word representations in vector space. In: Proceedings of the International Conference on Learning Representations (ICLR'13). arXiv preprint arXiv:1301.3781, 2013.

21. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Adv Neural Inf Process Syst 2017;30:5998–6008.

22. Huang K, Xiao C, Glass LM, et al. MolTrans: molecular interaction transformer for drug–target interaction prediction. Bioinformatics 2021;37:830–6.

23. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 2012;25:1097–105.

24. Guo Z, Zhang C, Yu W, et al. Few-shot graph learning for molecular property prediction. In: Proceedings of the Web Conference 2021. Ljubljana, Slovenia: ACM/IW3C2, 2021, 2559–67.

25. Hu W, Liu B, Gomes J, et al. Strategies for pre-training graph neural networks. In: International Conference on Learning Representations. Addis Ababa, Ethiopia, 2020.

26. Li X, Fourches D. Inductive transfer learning for molecular activity prediction: next-gen QSAR models with MolPMoFiT. J Chem 2020;12:27.

27. Zhang X, Liu L, Yuan X, et al. JMJD3 in the regulation of human diseases. Protein Cell 2019;10:864–82.

28. Cribbs A, Hookway ES, Wells G, et al. Inhibition of histone H3K27 demethylases selectively modulates inflammatory phenotypes of natural killer cells. J Biol Chem 2018;293:2422–37.

29. Kruidenier L, Chung CW, Cheng Z, et al. A selective jumonji H3K27 demethylase inhibitor modulates the proinflammatory macrophage response. Nature 2012;488:404–8.

Author notes

Qi Lu, Ruihan Zhang, Hongyuan Zhou contributed equally to this work.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

Supplementary data