Yanqing Yang, Deshan Zhou, Xinben Zhang, Yulong Shi, Jiaxin Han, Liping Zhou, Leyun Wu, Minfei Ma, Jintian Li, Shaoliang Peng, Zhijian Xu, Weiliang Zhu, D3AI-CoV: a deep learning platform for predicting drug targets and for virtual screening against COVID-19, Briefings in Bioinformatics, Volume 23, Issue 3, May 2022, bbac147, https://doi.org/10.1093/bib/bbac147
Abstract
Target prediction and virtual screening are two powerful tools of computer-aided drug design. Target identification is of great significance for hit discovery, lead optimization, drug repurposing and elucidation of the mechanism of action. Virtual screening can improve the hit rate of drug screening and thus shorten the cycle of drug discovery and development. Therefore, target prediction and virtual screening are of great importance for developing highly effective drugs against COVID-19. Here we present D3AI-CoV, a platform for target prediction and virtual screening for the discovery of anti-COVID-19 drugs. The platform is composed of three newly developed deep learning-based models, i.e., MultiDTI, MPNNs-CNN and MPNNs-CNN-R. To compare the predictive performance of D3AI-CoV with other methods, we prepared an external test set named Test-78. It consists of 39 newly published independent active compounds and 39 inactive compounds from DrugBank. For target prediction, the areas under the receiver operating characteristic curves (AUCs) of the MultiDTI and MPNNs-CNN models are 0.93 and 0.91, respectively, whereas the AUCs of the other reported approaches range from 0.51 to 0.74. For virtual screening, the hit rate of D3AI-CoV is also higher than those of other methods. D3AI-CoV is freely available as a web application at http://www.d3pharma.com/D3Targets-2019-nCoV/D3AI-CoV/index.php. It can serve as a rapid online tool for predicting potential targets of active compounds and for identifying active molecules against a specific target protein for COVID-19 treatment.
Introduction
COVID-19, caused by SARS-CoV-2, has become a global pandemic [1–3]. As of 28 December 2021, there had been more than 270 million confirmed cases of infection with the virus [4]. Although vaccines against COVID-19 have shown great success, immune escape is becoming a real threat as new variants of the virus continue to emerge [5, 6]. Besides SARS-CoV-2, eight other coronaviruses are regarded as potential health threats. These are severe acute respiratory syndrome coronavirus (SARS, 2003), Middle East respiratory syndrome coronavirus (MERS, 2012), human betacoronavirus 2c EMC/2012, human coronavirus 229E, feline infectious peritonitis virus, human coronavirus OC43, human coronavirus NL63 and human coronavirus HKU1. Therefore, it is of great importance to identify effective drug targets and to develop effective drugs against COVID-19 and other coronavirus infections.
At the beginning of the COVID-19 outbreak, we developed a web server, D3Targets-2019-nCoV (http://www.d3pharma.com/D3Targets-2019-nCoV/index.php), for target prediction and virtual screening against COVID-19. The server is composed of two modules: a structure-based module named D3Docking [7, 8] and a ligand-based module named D3Similarity [9]. Other computational tools have also been developed to combat COVID-19, e.g., COVID-19 Docking Server [10], Shennong [11], DockThor-VS [12], Virus-CKB [13], MolAICal [14] and REDIAL-2020 [15]. However, structure-based approaches are generally limited by the availability of three-dimensional structures of the target proteins. Ligand-based approaches, in turn, usually cannot reveal ligand–protein interactions.
Artificial intelligence (AI), especially deep learning (DL), has been applied successfully to drug discovery and design, where it has improved prediction accuracy. For example, Atomwise, the first DL-based technology for drug discovery [16], has been successfully applied to discover hit compounds for more than 80 targets. With IBM Watson [17], Pfizer carried out its immuno-oncology drug discovery program with high efficiency [18]. Stokes et al. developed a DL-based model that identified eight antibacterial compounds from the ZINC15 database [19]. Zhavoronkov et al. developed a deep generative model (GENTRL) for discovering potent inhibitors of discoidin domain receptor 1 [20]. Likewise, DL has played a key role during the COVID-19 pandemic. For example, Deep Docking has been applied to discover hits against SARS-CoV-2 Mpro from 1.3 billion compounds [21]. The DL-based model COVIDVS-3 was used to screen 4.9 million drug-like molecules from the ZINC15 database, leading to the discovery of an inhibitor of the 3C-like protease of SARS-CoV-2 [22]. Through DL and AI, baricitinib, atazanavir and several antiviral agents against hepatitis C have been identified as effective anti-COVID-19 agents [23–25]. In addition to these compounds, Zhang et al. identified 26 herbal plants containing anti-COVID-19 ingredients using molecular docking and network pharmacology analysis [26].
Recently, we developed a multimodal drug–target interaction (DTI) prediction model, 'MultiDTI' [27]. It projects the drug, target, side-effect and disease nodes of a heterogeneous network into a common space. If a drug and a target are connected by an edge, the Euclidean distance between them in the common space is reduced during training. A prediction layer then predicts the DTI score from the distance between the drug and the target in the common space. In addition, graph neural networks perform well in analyzing graph- or tree-like structures and can extract the contextual information contained in graph neighborhoods. Molecules can be regarded as molecular graphs, with atoms as nodes and bonds as edges. Message Passing Neural Networks (MPNNs), a kind of graph neural network, outperform fingerprint-based methods in predicting the properties of small molecules [28, 29]. Therefore, MPNNs might be a complementary approach to MultiDTI.
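The distance-based scoring idea can be illustrated with a small sketch. The paper does not specify MultiDTI's loss function or its distance-to-score mapping. The margin-based contrastive loss and the `exp(-d)` score below are therefore assumptions. They only show how connected pairs are pulled together and how a shorter distance yields a higher score. The names `contrastive_loss` and `dti_score` are hypothetical.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings in the common space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(drug: Sequence[float], target: Sequence[float],
                     label: int, margin: float = 1.0) -> float:
    """Assumed margin-based loss (not necessarily the one MultiDTI uses).

    label == 1: the pair is connected by an edge, so any distance is penalized
    and minimizing the loss pulls the two embeddings together.
    label == 0: the pair is unconnected and is penalized only while the two
    embeddings are closer than `margin`.
    """
    d = euclidean(drug, target)
    if label == 1:
        return d ** 2
    return max(0.0, margin - d) ** 2


def dti_score(drug: Sequence[float], target: Sequence[float], scale: float = 1.0) -> float:
    """Map distance to a score in (0, 1]; a shorter distance gives a higher score."""
    return math.exp(-euclidean(drug, target) / scale)


if __name__ == "__main__":
    drug = [0.0, 0.0]
    near_target, far_target = [0.3, 0.4], [3.0, 4.0]
    print(dti_score(drug, near_target), dti_score(drug, far_target))
    print(contrastive_loss(drug, far_target, label=1))
```

Any monotonically decreasing function of distance would serve the same ranking purpose. The exponential is used here only because it bounds the score to (0, 1].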
Considering both the limitations of conventional structure-based and ligand-based approaches and the unique advantages of DL, we used these two DL approaches in this study. With them we constructed the MultiDTI and MPNNs-CNN [29, 30] models for target prediction and the MPNNs-CNN-R model for virtual screening against COVID-19. Validation on an external test set showed that the newly developed DL-based models substantially improve the accuracy and efficiency of target prediction and virtual screening compared with existing methods.
Materials and methods
Preparation of a database of all compounds active against pathogenic coronaviruses
Through a literature search, we have collected a total of 842 molecules with potential activity against nine pathogenic coronaviruses in our database. All molecules and their related information can be downloaded in SDF format from the http://www.d3pharma.com/D3Targets-2019-nCoV/CoViLigands/index.php webpage. The database will be continuously updated.
Preprocessing of small molecules and protein targets
First, all compound–target pairs collected in our database were used as the training set. Canonical simplified molecular-input line-entry system (SMILES) strings of all small molecules in the training set were prepared with Open Babel [31]. The sequences of all target proteins in the training set were downloaded from the UniProt website [32]. Second, all compounds and target proteins in the training set were indexed, and an interaction network was built between the compounds and their targets. In this network, 1 indicates an interaction between a compound and a target, and 0 indicates no interaction. This compound–target interaction network was used as the input data for model training.
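Below is a minimal sketch of the indexing step. It assumes the compound–target pairs are available as (compound ID, target name) tuples. The example pairs reuse dual inhibitors listed in Table 2, and `build_interaction_matrix` is a hypothetical helper. In the actual pipeline, compounds would be Open Babel canonical SMILES and targets would be UniProt sequences.

```python
from typing import Dict, List, Tuple


def build_interaction_matrix(
    pairs: List[Tuple[str, str]],
) -> Tuple[Dict[str, int], Dict[str, int], List[List[int]]]:
    """Index compounds and targets and build a 0/1 interaction matrix.

    matrix[i][j] == 1 if compound i has a recorded interaction with target j,
    and 0 otherwise (unrecorded pairs are treated as non-interacting).
    """
    compounds = sorted({c for c, _ in pairs})
    targets = sorted({t for _, t in pairs})
    c_idx = {c: i for i, c in enumerate(compounds)}
    t_idx = {t: j for j, t in enumerate(targets)}
    matrix = [[0] * len(targets) for _ in compounds]
    for c, t in pairs:
        matrix[c_idx[c]][t_idx[t]] = 1
    return c_idx, t_idx, matrix


if __name__ == "__main__":
    pairs = [
        ("ICV729", "3CLpro"), ("ICV729", "TMPRSS2"),
        ("ICV734", "3CLpro"), ("ICV734", "PLpro"),
    ]
    c_idx, t_idx, matrix = build_interaction_matrix(pairs)
    for compound, i in c_idx.items():
        print(compound, matrix[i])
```

Treating every unrecorded pair as 0 is a simplification, because the absence of a reported interaction does not prove that none exists.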
MultiDTI model for target prediction
We constructed a DTI network in which compounds were represented by SMILES strings and targets by protein sequences. After obtaining embedding representations of the SMILES and protein sequences, we projected the drugs and targets into a multimodal common space. In this space, compound–target pairs connected by an edge in the DTI network have a smaller Euclidean distance. In detail, n-gram embedding was used to obtain 'words' from SMILES and protein sequences. We constructed a compound dictionary and a protein dictionary from all SMILES and protein target sequences in DrugBank and the DTI networks. SMILES and protein sequences were then vectorized according to these dictionaries. A three-layer convolutional neural network (CNN) was applied to obtain the regional embedding of each 'word' in the SMILES and protein sequences. Next, multiple down-sampling residual layers were used to extract more global information. A multilayer perceptron then projected the representations of drugs and targets into the common space. Finally, the Euclidean distance between a drug and a target in the common space was converted into a prediction score. During training, the model continuously adjusts the Euclidean distances between drugs and targets in the common space based on the compound–target pairs in the training set. The final model is thus a projection of the DTI network into a multimodal common space. The framework of the MultiDTI model is illustrated in Figure 1A. To select the best architecture for MultiDTI, we compared CNN and RNN feature extractors for SMILES and FASTA sequences. As shown in Table S1, the accuracy of the CNN (0.86–0.90) is significantly higher than that of the RNN (0.47–0.52). We then trained and tested models with different numbers of CNN layers (Table S2). Finally, we selected a three-layer CNN followed by multiple residual layers for feature extraction because it outperformed the other architectures.
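The n-gram 'word' extraction and dictionary-based vectorization can be sketched as follows. The paper does not state the n-gram length, the padding scheme or the maximum sequence length. The defaults used here (`n=3`, index 0 reserved for unknown or padding words, a fixed `max_len`) are therefore assumptions, and the helper names are hypothetical.

```python
from typing import Dict, Iterable, List


def ngrams(seq: str, n: int = 3) -> List[str]:
    """Split a SMILES or protein sequence into overlapping n-character 'words'."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def build_dictionary(sequences: Iterable[str], n: int = 3) -> Dict[str, int]:
    """Assign each distinct word an integer index, starting from 1."""
    vocab: Dict[str, int] = {}
    for seq in sequences:
        for word in ngrams(seq, n):
            if word not in vocab:
                vocab[word] = len(vocab) + 1  # index 0 is reserved for unknown/padding
    return vocab


def vectorize(seq: str, vocab: Dict[str, int], n: int = 3, max_len: int = 10) -> List[int]:
    """Map words to indices, truncating or zero-padding to a fixed length."""
    ids = [vocab.get(word, 0) for word in ngrams(seq, n)][:max_len]
    return ids + [0] * (max_len - len(ids))


if __name__ == "__main__":
    vocab = build_dictionary(["CCO", "CCN"], n=2)
    print(vocab, vectorize("CCN", vocab, n=2, max_len=4))
```

The resulting index vectors are what an embedding layer, followed by the CNN, would consume. Plain character n-grams also split multi-character SMILES tokens such as `Cl` or `[nH]`, which a chemistry-aware tokenizer would keep intact.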

Figure 1. Frameworks of D3AI-CoV. (A) The framework of the MultiDTI model: a three-layer convolutional neural network and multiple down-sampling residual layers extract features of SMILES and protein amino acid sequences, and a multilayer perceptron projects the representations of drugs and targets into the common space. (B) The framework of the MPNNs-CNN model: MPNNs and a CNN extract features of compound SMILES and protein sequences, respectively. A multilayer perceptron and a logistic algorithm are used to predict potential connections between compounds and targets.
Classification model for target prediction
In contrast to the MultiDTI model, here we used MPNNs to extract the features of small molecules. We then used classification algorithms to explore the potential relationships between all active compounds against pathogenic coronaviruses and their targets. In detail, as shown in Table S3, we obtained atom features and bond features with the RDKit package [33] and mapped them to tensors. The atom features are atom type, number of bonds, formal charge, chirality and aromaticity; the bond features are bond type and cis–trans isomerism. Atoms are regarded as nodes and bonds as edges, so a molecule can be represented as a graph containing rich chemical information. In this MPNNs model, a message passing phase and a readout phase convert the molecular graph into a feature vector that represents the molecular structure. For target proteins, we used a CNN to extract feature vectors. Finally, a multilayer perceptron and a logistic algorithm were constructed to discover potential connections between the compounds and targets. The framework of the MPNNs-CNN model is illustrated in Figure 1B.
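The message passing and readout phases can be illustrated with a dependency-free sketch on a toy molecular graph. This is not the MPNN used in D3AI-CoV. The hand-written atom features, scalar bond weights, sum aggregation, `tanh` update and sum readout are simplifying assumptions. They stand in for the learned message, update and readout functions and for the RDKit-derived features of Table S3. The function name is hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Edge = Tuple[int, int, float]  # (atom index, atom index, bond weight)


def message_passing(node_feats: List[Vector], edges: List[Edge], steps: int = 2) -> Vector:
    """Run `steps` rounds of message passing, then sum-pool the node states."""
    h = [list(v) for v in node_feats]
    dim = len(h[0])
    for _ in range(steps):
        # Message phase: each atom sums its neighbours' current states, scaled by bond weight.
        messages = [[0.0] * dim for _ in h]
        for u, v, w in edges:  # bonds are undirected, so messages flow both ways
            for k in range(dim):
                messages[v][k] += w * h[u][k]
                messages[u][k] += w * h[v][k]
        # Update phase: combine each atom's own state with its aggregated message.
        h = [[math.tanh(a + b) for a, b in zip(hv, mv)] for hv, mv in zip(h, messages)]
    # Readout phase: summing over atoms makes the vector independent of atom order.
    return [sum(col) for col in zip(*h)]


if __name__ == "__main__":
    # Ethanol heavy atoms C0-C1-O2; features = [is_C, is_O, degree / 4].
    atoms = [[1.0, 0.0, 0.25], [1.0, 0.0, 0.5], [0.0, 1.0, 0.25]]
    bonds = [(0, 1, 1.0), (1, 2, 1.0)]
    print(message_passing(atoms, bonds))
```

Using a permutation-invariant readout (here a sum) means the same molecule yields the same vector regardless of how its atoms are numbered.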
Regression model based on activity for virtual screening
System training and test procedures
For each model, all drug–target pairs in the interaction network were used as the dataset for model training and validation. Because positive drug–target pairs were far fewer than negative pairs, we oversampled the positive samples 10-fold to improve the generalization ability of the models. Next, we carried out 10-fold cross-validation on the prepared drug–target pairs. In each fold, 90% of the sample pairs, selected by stratified sampling, were used as training data, and the remaining 10% were used for validation. Each model was optimized by mini-batch gradient descent, with parameters updated by backpropagation. Weight decay and dropout were used in all our models to prevent overfitting. To further test the generalization performance of each model, we collected 39 active compounds against COVID-19 and their target information from recently published literature. In addition, we randomly selected 39 Food and Drug Administration (FDA)-approved drugs from DrugBank [34] within the molecular weight range of the 39 active compounds. These drugs were randomly paired with protein targets in the database. From these 78 compounds we constructed a dataset named Test-78, used to test the predictive ability of the MultiDTI and MPNNs-CNN models and to compare D3AI-CoV with other methods. The structures of the 78 compounds are shown in Table S4, and Table 1 summarizes the training, validation and testing datasets.
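The following sketch shows stratified 10-fold splitting with 10× oversampling of positive pairs. It differs deliberately from the order described above, where oversampling precedes cross-validation. Here oversampling is applied only to the training portion of each fold. This common variant prevents duplicated positives from appearing in both the training and validation data. The `(compound, target, label)` triple layout and the function names are assumptions.

```python
import random
from typing import Iterator, List, Tuple

Pair = Tuple[str, str, int]  # (compound, target, label): 1 = interaction, 0 = none


def stratified_folds(samples: List[Pair], k: int = 10, seed: int = 0) -> List[List[Pair]]:
    """Deal positives and negatives round-robin so every fold keeps the class ratio."""
    rng = random.Random(seed)
    pos = [s for s in samples if s[2] == 1]
    neg = [s for s in samples if s[2] == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    folds: List[List[Pair]] = [[] for _ in range(k)]
    for group in (pos, neg):
        for i, s in enumerate(group):
            folds[i % k].append(s)
    return folds


def oversample_positives(train: List[Pair], factor: int = 10) -> List[Pair]:
    """Repeat each positive pair `factor` times to counter class imbalance."""
    pos = [s for s in train if s[2] == 1]
    neg = [s for s in train if s[2] == 0]
    return neg + pos * factor


def cross_validation_splits(samples: List[Pair], k: int = 10, factor: int = 10,
                            seed: int = 0) -> Iterator[Tuple[List[Pair], List[Pair]]]:
    """Yield (oversampled training set, untouched validation fold) k times."""
    folds = stratified_folds(samples, k, seed)
    for i in range(k):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield oversample_positives(train, factor), folds[i]


if __name__ == "__main__":
    data = [(f"c{i}", "t", 1) for i in range(20)] + [(f"n{i}", "t", 0) for i in range(180)]
    for train, val in cross_validation_splits(data):
        print(len(train), len(val))
```

Keeping the validation fold free of duplicates means the reported validation metrics reflect unseen pairs only.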
Performance test metrics
Table 1. Training, validation and testing datasets
Dataset | Number of samples | Percentage of the data |
---|---|---|
Training (10-fold cross-validation) | 24 336 | 90% |
Validation (10-fold cross-validation) | 1352 | 10% |
Test-78: positive test set (from literature) | 39 | 50% |
Test-78: negative test set (from DrugBank) | 39 | 50% |


Figure 3. Overview of the database CoViLigands. (A) Interactive diagram categorizing the database by virus type. (B) Enlarged view of (A). (C) Interactive fan diagram based on all targets in the database.
The workflow
Based on all active compounds against the nine pathogenic coronaviruses in the database, we trained three DL-based models for target prediction and virtual screening. For the MultiDTI model, n-grams were used to obtain 'words' from the submitted SMILES. A three-layer CNN and multiple down-sampling residual layers were used to extract its features, and a multilayer perceptron projected it into the trained multimodal common space. Targets were then predicted by calculating the Euclidean distances between the molecule and all targets in the common space. For the classification model, MPNNs were used to extract the features of the small molecule submitted by the user. These features served as the input of the classification model for target prediction.
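Target prediction by distance ranking can be sketched as follows. The sketch assumes the trained model has already produced embeddings for the query molecule and for every target in the common space. The two-dimensional embeddings and the helper `rank_targets` are made-up illustrations, not values or code from D3AI-CoV.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings in the common space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_targets(query: Sequence[float], targets: Dict[str, Sequence[float]],
                 top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k targets closest to the query molecule, nearest first."""
    dists = [(name, euclidean(query, emb)) for name, emb in targets.items()]
    dists.sort(key=lambda item: item[1])
    return dists[:top_k]


if __name__ == "__main__":
    # Hypothetical embeddings; real ones would come from the trained MultiDTI projection.
    targets = {"3CLpro": [0.0, 0.2], "PLpro": [1.0, 1.0], "TMPRSS2": [0.5, 0.5]}
    for name, d in rank_targets([0.1, 0.2], targets):
        print(f"{name}\t{d:.3f}")
```

Because all target embeddings are fixed after training, a web back end could precompute them once and score each submitted molecule with a single pass over the target list.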
For user convenience, we developed a web server named D3AI-CoV. When small molecules are submitted, canonical SMILES are generated with Open Babel, and the predicted target proteins of the submitted molecules are displayed on the web page. The workflow is illustrated in Figure 2.
Results and Discussion
Expanded database of the active compounds against nine pathogenic coronaviruses
Many compounds active against various coronaviruses have been reported since the SARS outbreak in 2003. For COVID-19, numerous compounds active at the cellular level or in vivo have been discovered, but their targets remain unknown. For example, clofazimine, an antileprosy drug approved by the US FDA, has been found active against COVID-19 when combined with remdesivir [35]. Rosenke et al. reported that MK-4482, an orally administered nucleoside analog, can inhibit SARS-CoV-2 in vivo [36]. These animal data indicate that MK-4482 is a promising drug for COVID-19. 25-Hydroxycholesterol has been reported as a potent SARS-CoV-2 inhibitor [37], and its EC50 and favorable safety profile support further clinical development for COVID-19 treatment. Jan et al. screened more than 3000 agents and identified 15 of them as inhibitors of SARS-CoV-2 at concentrations ranging from 0.1 nM to 50 μM [38]. However, no clear target information is available for these 15 inhibitors. Other drugs, such as AT-527 [39], X-206 [40] and ACA [41], are also promising for the treatment of COVID-19, again without clear target information. Apart from chemical drugs, extracts of Ganoderma lucidum (RF3), Perilla frutescens and Mentha haplocalyx have been found effective against SARS-CoV-2 infection [38]. Glycyrrhizin, a natural compound from licorice (a common Chinese herbal medicine), is an efficient and safe inhibitor of SARS-CoV-2 and SARS [42, 43], but its target has likewise not been identified.
After a careful review of the literature, we identified 842 bioactive molecules and 29 targets against the nine pathogenic coronaviruses. For the 842 compounds, we collected molecular structures, bioactivities, target proteins, coronavirus types and crystal structures. As shown in Figure 3A, we classified all active compounds in CoViLigands by virus type and built an interactive interface for easy browsing on the web server. As shown in Figure 3C, all compounds were also classified according to their targets. Details of the ligand structures and the associated information for all compounds are provided on the webpage.
Table 2. Dual inhibitors against pathogenic coronaviruses in the database
Mol ID | Target 1 and activity | Target 2 and activity | PMID |
---|---|---|---|
ICV361 | 3C-like protease (3CLpro/Mpro): SARS 3CL (IC50 = 24.8 μM) | Papain-like protease (PLpro): SARS PL (IC50 = 10.7 μM) | 22884354 |
ICV362 | 3C-like protease (3CLpro/Mpro): SARS 3CL (IC50 = 21.1 μM) | Papain-like protease (PLpro): SARS PL (IC50 = 9.2 μM) | 22884354 |
ICV363 | 3C-like protease (3CLpro/Mpro): SARS 3CL (IC50 = 38.7 μM) | Papain-like protease (PLpro): SARS PL (IC50 = 8.8 μM) | 22884354 |
ICV364 | 3C-like protease (3CLpro/Mpro): SARS 3CL (IC50 = 14.4 μM) | Papain-like protease (PLpro): SARS PL (IC50 = 4.9 μM) | 22884354 |
ICV365 | 3C-like protease (3CLpro/Mpro): SARS 3CL (IC50 = 21.1 μM) | Papain-like protease (PLpro): SARS PL (IC50 = 30 μM) | 22884354 |
ICV366 | 3C-like protease (3CLpro/Mpro): SARS 3CL (IC50 = 9.35 μM) | Papain-like protease (PLpro): SARS PL (IC50 = 24.1 μM); MERS PL (IC50 = 14.6 μM) | 29289665, 32272481 |
ICV368 | 3C-like protease (3CLpro/Mpro): MERS 3CL (IC50 = 36.2 μM) | Papain-like protease (PLpro): MERS PL (IC50 = 42.1 μM) | 28112000 |
ICV371 | 3C-like protease (3CLpro/Mpro): SARS 3CL (IC50 = 30.2 μM); MERS 3CL (IC50 = 34.7 μM) | Papain-like protease (PLpro): MERS PL (IC50 = 48.8 μM) | 28112000 |
ICV403 | Spike protein (S protein) | Membrane protein (M protein) | 17704516, 17560666 |
ICV646 | PI3K: SARS-CoV-2 (IC50 = 0.014 μM, Caco2) | mTORC1/2: SARS-CoV-2 (IC50 = 0.014 μM, Caco2) | 32877642 |
ICV648 | RAF: SARS-CoV-2 (IC50 = 0.6 μM, Caco2) | MEK: SARS-CoV-2 (IC50 = 0.6 μM, Caco2) | 32877642 |
ICV693 | P-glycoprotein 1 | TMEM16 | 33248195, 21573958, 33452205, 33827113 |
ICV729 | 3C-like protease (3CLpro/Mpro): SARS-CoV-2 3CL (Kd = 1.1 μM, IC50 = 13.4 μM) | TMPRSS2: SARS-CoV-2 TMPRSS2 (Kd = 1.77 μM, IC50 = 2.31 μM) | 33415017 |
ICV734 | 3C-like protease (3CLpro/Mpro): SARS-CoV-2 3CL (IC50 = 19.2 μM) | Papain-like protease (PLpro): SARS-CoV-2 PL (IC50 = 15.3 μM) | 33526482 |
ICV735 | 3C-like protease (3CLpro/Mpro): SARS-CoV-2 3CL (IC50 = 10.4 μM) | Papain-like protease (PLpro): SARS-CoV-2 PL (IC50 = 14.2 μM) | 33526482 |
Table 3. Active compounds against SARS-CoV-2 in vivo in the D3AI-CoV database
Mol ID | Activity | PMID |
---|---|---|
ICV487 | MERS (EC50 = 7.42 μM, Vero E6); SARS (EC50 = 15.55 μM, Vero E6); SARS-CoV-2 (IC50 = 3.2 μM, Vero E6) | 24841273, 33452205 |
ICV494 | SARS (EC50 = 0.048 μM, Vero); SARS-CoV-2 (IC50 = 3.3 μM, Vero E6) | 15144898, 33452205 |
ICV619 | SARS-CoV-2 (EC50 = 3.68 μM, Vero) | 32811977 |
ICV732 | SARS-CoV-2 (IC50 = 1.62 nM, pneumocyte-like cells); SARS-CoV-2 (IC50 = 0.7 nM, Vero E6) | 33495306 |
ICV745 | Clinical trials (NCT04405570/NCT04405739) | 33273742 |
ICV754 | SARS-CoV-2 3CL (IC50 = 15.2 nM); SARS-CoV-2 (EC50 = 35.3 nM, Huh7 cells) | 33602867 |
ICV775 | SARS-CoV-2 3CL (IC50 = 17.2 nM); SARS-CoV-2 (EC50 = 31 nM, Huh7 cells) | 33602867 |
ICV835 | SARS-CoV-2 TMPRSS2 (IC50 = 0.19 μM) | 33844653 |
ICV841 | SARS-CoV-2 (EC50 = 0.31 μM, Vero E6) | 33727703 |

Figure 4. Model training and evaluation. (A) ROC curve of the MultiDTI model in 10-fold cross-validation. (B) ROC curve of the MPNNs-CNN model in 10-fold cross-validation. (C) Performance of the three DL-based models in 10-fold cross-validation. (D) ROC curve of the MultiDTI model on Test-78. (E) ROC curve of the MPNNs-CNN model on Test-78. (F) Performance of the MultiDTI and MPNNs-CNN models on Test-78.
Dual inhibitors in the database
Pathogenic coronavirus invasion is a complex biochemical process involving interactions between multiple viral proteins and human proteins. For example, the interaction between the SARS-CoV-2 spike protein and angiotensin-converting enzyme 2 (ACE2) on human cells plays an important role in viral invasion, together with the cell-surface serine protease TMPRSS2. RNA synthesis of coronaviruses is performed by the RNA-dependent RNA polymerase. The 3C-like protease and papain-like protease process the viral polyproteins and are therefore necessary for coronavirus replication.
Dual inhibitors act on two targets in the viral life cycle and can therefore, in principle, inhibit coronaviruses more efficiently. Developing dual inhibitors is thus a novel strategy for the treatment of COVID-19. Our database contains several dual inhibitors against pathogenic coronaviruses, as shown in Table 2. For example, Wang et al. identified ICV729 as a potent dual inhibitor of SARS-CoV-2 3C-like protease and TMPRSS2 [44], with IC50 values of 13.4 μM and 2.31 μM, respectively. ICV693 is effective against SARS-CoV-2 by inhibiting TMEM16 proteins [45]. Earlier research reported that P-glycoprotein is also a target of ICV693 [46], so ICV693 may be a dual inhibitor against COVID-19. In addition, growth factor receptor (GFR) signaling is a central pathway required for SARS-CoV-2 replication. Two dual inhibitors can prevent SARS-CoV-2 replication by inhibiting GFR signaling [47]. These are ICV646, a dual phosphatidylinositol 3-kinase (PI3K)/mammalian target of rapamycin (mTOR) inhibitor, and ICV648, a dual rapidly accelerated fibrosarcoma (RAF)/mitogen-activated protein kinase kinase (MEK) inhibitor. ICV403 targets both the spike protein and the membrane protein, thereby preventing viral invasion. ICV361–ICV366, ICV368 and ICV371 inhibit both the 3C-like protease and the papain-like protease.
In vivo active compounds against SARS-CoV-2 in the D3AI-CoV database
Compounds that are active in vivo are more likely to be promising drug candidates for the treatment of COVID-19; these compounds are listed in Table 3. Jan et al. found that ICV484, ICV497 and extracts of some herbal medicines were effective in vivo in a hamster model [38]. In vivo antiviral tests in a mouse model showed that ICV619 is potent against SARS-CoV-2 [37]. ICV732 showed anti-SARS-CoV-2 activity in vitro (IC90 = 0.88 nM) [48]; its in vivo efficacy in two mouse models of SARS-CoV-2 infection and its limited toxicity in cell culture indicate that it is a potential drug for the treatment of COVID-19. Cox et al. also reported a series of studies on ICV745, which is currently in phase II trials (NCT04405739) [49]. Qiao et al. designed and synthesized a number of 3C-like protease inhibitors [50]; among them, ICV754 and ICV775 reduced lung viral loads and lung lesions in a transgenic mouse model of SARS-CoV-2 infection. In addition, ICV835 and ICV841 inhibited SARS-CoV-2 infection in animal models [35, 51].
Training and testing of models
We developed two models for target prediction, the MultiDTI model (Figure 1A) and the MPNNs-CNN model (Figure 1B), and one regression model based on the MPNNs-CNN approach, the MPNNs-CNN-R model, for virtual screening. All models were trained and evaluated with 10-fold cross-validation; the loss and accuracy curves during training are shown in Figures S1–S3. Finally, the models trained on all drug–target pairs were deployed as the backend of the D3AI-CoV web server. The ROC curves are shown in Figure 4A and B. Across the 10 folds, the AUCs for target prediction are 0.93–0.96 for the MultiDTI model and 0.97–0.98 for the MPNNs-CNN model. The other performance indicators for the two models, namely AUPR (0.88–0.95 and 0.95–0.98), accuracy (Acc; 0.86–0.90 and 0.92–0.94), precision (Pre; 0.79–0.91 and 0.89–0.93), recall (0.81–0.99 and 0.93–0.97) and F1 score (F1; 0.86–0.91 and 0.92–0.95), indicate that both models have strong target prediction ability (Figure 4C; also see Table S5). For the MPNNs-CNN-R model, the Pearson correlation coefficient and concordance index in 10-fold cross-validation are 0.80–0.84 and 0.87–0.88, respectively (Table S6).
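The cross-validation protocol and the two regression metrics reported above can be illustrated with a minimal sketch. It assumes a plain list of drug–target pairs with continuous activity labels (the activity unit is not specified in this section); the function names `kfold_indices`, `pearson` and `concordance_index` are hypothetical, and this is not the authors' training code. Here the concordance index counts pairs with different true values as comparable and gives half credit to tied predictions.

```python
import math
import random
from typing import List, Sequence, Tuple


def kfold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 and return k (train, validation) index splits."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Round-robin assignment: fold sizes differ by at most one.
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, val))
    return splits


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between true and predicted values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def concordance_index(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Share of comparable pairs whose predicted order matches the true order.

    Pairs with equal true values are skipped; tied predictions earn 0.5.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(y_true)):
        for j in range(i + 1, len(y_true)):
            if y_true[i] == y_true[j]:
                continue
            comparable += 1
            d_true = y_true[i] - y_true[j]
            d_pred = y_pred[i] - y_pred[j]
            if d_pred == 0:
                concordant += 0.5
            elif (d_true > 0) == (d_pred > 0):
                concordant += 1.0
    return concordant / comparable
```

In a 10-fold run, a model would be fit on each `train` index list and scored on the matching `val` list, giving ten values per metric whose minimum and maximum form a range.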

(A) The six performance indicators (AUC, AUPR, Acc, Pre, Recall and F1) of the five methods for target prediction on Test-78. (B) Hit rates of the five methods for virtual screening against 3C-like protease and papain-like protease on Test-78.
The external test set Test-78 was used to further assess how well the MultiDTI and MPNNs-CNN models generalize. The ROC curves are shown in Figure 4D and E. For the models obtained from 10-fold cross-validation, the AUCs on Test-78 are 0.82–0.89 (MultiDTI) and 0.82–0.87 (MPNNs-CNN). The other performance indicators of the two models, namely AUPR (0.75–0.89 and 0.72–0.79), Acc (0.72–0.82 and 0.74–0.85), Pre (0.72–0.83 and 0.81–0.90), Recall (0.64–1.00 and 0.64–0.85) and F1 (0.70–0.84 and 0.71–0.85), indicate strong target prediction ability (Figure 4F; also see Table S7). In addition, the models trained on all data perform better still (Table S7). Accordingly, the DL-based models have strong predictive ability for target prediction and virtual screening against COVID-19.
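As an illustration of how a ROC curve and its AUC can be obtained from binary labels (for Test-78, actives versus inactives) and model probabilities, the sketch below sweeps the decision threshold and integrates with the trapezoidal rule, and cross-checks the result with the equivalent pairwise (Mann–Whitney) formulation. The function names are hypothetical and this is not the authors' evaluation code.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs from sweeping the threshold over distinct scores, highest first."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def auc_pairwise(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Both functions give the same value because each distinct threshold contributes one ROC point, so tied scores produce a diagonal segment worth half credit, exactly as in the pairwise count.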
Comparison of D3AI-CoV with other methods
Three web servers are publicly available for target prediction against COVID-19 (D3Docking, D3Similarity and Virus-CKB), and four for virtual screening (D3Docking, D3Similarity, DockThor-VS and COVID-19 Docking Server). We used the external test set Test-78 to evaluate all of these web servers for target prediction and virtual screening against COVID-19. Figure 5A and B summarize the comparison between our newly constructed DL-based models and the other methods (Tables S8 and S9).

Graphical interface for input and output of D3AI-CoV. (A) Graphical interface for input of the target prediction module of D3AI-CoV. (B) Graphical interface for output of the target prediction module of D3AI-CoV. (C) Graphical interface for input of the virtual screening module of D3AI-CoV. (D) Graphical interface for output of the virtual screening module of D3AI-CoV.
For target prediction, a prediction was regarded as correct if the top 10 predicted targets contained the true target; for our DL-based models, a prediction was regarded as correct if the predicted probability was greater than 0.5. We then compared MultiDTI, MPNNs-CNN, D3Docking, D3Similarity and Virus-CKB for target prediction, submitting all ligands of Test-78 to each of the five methods. From the numbers of correct and incorrect predictions, we calculated AUC, AUPR, Acc, Pre, Recall and F1 to compare the methods. The AUCs (0.93 and 0.91), AUPRs (0.88 and 0.90), Acc (0.88 and 0.85), Pre (0.81 and 0.80), Recall (1.00 and 0.92) and F1 (0.90 and 0.86) of the two DL-based models exceed those of D3Docking (0.59, 0.56, 0.59, 0.65, 0.38 and 0.48), D3Similarity (0.74, 0.70, 0.74, 0.83, 0.62 and 0.71) and Virus-CKB (0.51, 0.51, 0.51, 0.56, 0.13 and 0.21) on every indicator except precision, where D3Similarity (0.83) is slightly higher (Figure 5A; also see Table S8). In addition, the DL-based models are much faster than D3Docking, D3Similarity and Virus-CKB. More importantly, the MultiDTI model correctly predicted two completely new protein targets for their corresponding molecules, indicating that the MultiDTI model can be readily extended to new targets.
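The two correctness criteria and the four count-based indicators can be written down as follows. This is an illustrative sketch assuming each Test-78 compound–target pair carries a binary label (1 for active, 0 for inactive) and a model probability; `threshold_metrics` and `top_k_correct` are hypothetical names, not part of D3AI-CoV or the compared servers. Returning 0.0 when a denominator is zero is a convention chosen here.

```python
from typing import Dict, Sequence


def threshold_metrics(labels: Sequence[int], probs: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Acc, Pre, Recall and F1, calling a pair positive when its probability exceeds the threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        predicted_active = p > threshold
        if predicted_active and y == 1:
            tp += 1
        elif predicted_active:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    acc = (tp + tn) / len(labels)
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return {"Acc": acc, "Pre": pre, "Recall": rec, "F1": f1}


def top_k_correct(ranked_targets: Sequence[str], true_target: str, k: int = 10) -> bool:
    """Top-k criterion: correct if the true target appears among the first k predictions."""
    return true_target in ranked_targets[:k]
```

Because F1 is the harmonic mean of precision and recall, a model with perfect recall but imperfect precision (as reported for MultiDTI) still has an F1 below 1.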
For virtual screening, inhibitors of the 3C-like protease and the papain-like protease make up the two largest groups of active compounds in Test-78, so we used Test-78 to perform virtual screening against these two targets for comparison, with the hit rate as the evaluation criterion. The hit rates of MPNNs-CNN-R are 0.96 for the 3C-like protease and 0.89 for the papain-like protease, whereas those of the other methods range from 0.22 to 0.92 and from 0.11 to 0.78, respectively, indicating that the new MPNNs-CNN-R model generally outperforms the other methods (Figure 5B; also see Table S9).
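This section does not spell out how the hit rate was computed. One common definition, used here purely as an assumption, is the fraction of known actives among the top-N compounds ranked by predicted score; by default the sketch sets N to the number of known actives for the target. `hit_rate` and its arguments are hypothetical.

```python
from typing import Dict, Optional, Set


def hit_rate(scores: Dict[str, float], actives: Set[str], top_n: Optional[int] = None) -> float:
    """Fraction of the top_n highest-scoring compounds that are known actives.

    Assumption: when top_n is None it defaults to the number of known actives.
    """
    n = len(actives) if top_n is None else top_n
    # Rank compound IDs by predicted score, highest first.
    ranked = sorted(scores, key=lambda cid: scores[cid], reverse=True)
    selected = ranked[:n]
    return sum(1 for cid in selected if cid in actives) / n
```

With N equal to the number of actives, a hit rate of 1.0 means every known active is ranked above every other compound.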
In summary, D3AI-CoV shows strong predictive performance both in cross-validation and on a completely independent external test set. More importantly, its prediction accuracy is much higher than that of other docking-based or similarity-based anti-COVID-19 web servers. D3AI-CoV is also more efficient: a job takes 5–10 s, compared with 5–15 min for D3Similarity and 1–2 h for D3Docking. Taken together, these results show that D3AI-CoV has clear advantages over other web servers in both prediction accuracy and prediction speed.
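Converting the quoted per-job times to seconds and dividing the endpoints of the ranges (slowest D3AI-CoV job against fastest competitor job, and fastest against slowest) gives a rough span for the speed-up. These are back-of-the-envelope bounds derived from the stated ranges, not separately measured values.

```latex
\begin{aligned}
\text{vs. D3Similarity:}\quad & \frac{5\ \text{min}}{10\ \text{s}} = \frac{300\ \text{s}}{10\ \text{s}} = 30,
  & \frac{15\ \text{min}}{5\ \text{s}} = \frac{900\ \text{s}}{5\ \text{s}} = 180 \\
\text{vs. D3Docking:}\quad & \frac{1\ \text{h}}{10\ \text{s}} = \frac{3600\ \text{s}}{10\ \text{s}} = 360,
  & \frac{2\ \text{h}}{5\ \text{s}} = \frac{7200\ \text{s}}{5\ \text{s}} = 1440
\end{aligned}
```

That is, per job, D3AI-CoV is roughly 30 to 180 times faster than D3Similarity and 360 to 1440 times faster than D3Docking.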
Input and output
D3AI-CoV is provided free of charge as a web server. For target prediction (Figure 6A), users set a task title, select a prediction model and submit a small molecule in SDF or MOL2 format. The molecule is converted to canonical SMILES, from which the selected DL-based model predicts its targets. Typically, the results are returned within a few minutes of the start of the calculation, which is faster than conventional structure- and ligand-based approaches. Finally, the top-ranked targets are displayed on the webpage (Figure 6B).
For virtual screening (Figure 6C), users upload a small molecule library in SDF or MOL2 format and select one or two targets. All molecules in the library are converted to canonical SMILES and scored by the regression model (MPNNs-CNN-R). When the task is finished, the top-ranked ligands and their scores are displayed on the webpage (Figure 6D).
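The ranking step behind the target prediction output can be sketched as below, assuming a hypothetical `PairModel` callable that returns an interaction probability for a (canonical SMILES, target) pair. The SDF/MOL2-to-SMILES conversion is omitted, and neither the function nor the interface is the server's actual code.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: (canonical SMILES, target name) -> probability in [0, 1].
PairModel = Callable[[str, str], float]


def predict_targets(smiles: str, targets: List[str], model: PairModel, k: int = 10) -> List[Tuple[str, float]]:
    """Score one molecule against every target and return the k most probable targets."""
    scored = [(target, model(smiles, target)) for target in targets]
    # Highest probability first; the sort is stable, so ties keep the input order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Usage with a toy stand-in model (made-up probabilities, for illustration only).
toy_probs = {"3C-like protease": 0.92, "papain-like protease": 0.41, "RdRp": 0.67}
print(predict_targets("CCO", list(toy_probs), lambda smi, t: toy_probs[t], k=2))
# [('3C-like protease', 0.92), ('RdRp', 0.67)]
```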
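For a library scored against one or two selected targets, keeping only the best `top_n` compounds per target avoids sorting the whole library. The sketch below assumes a hypothetical `ScoreModel` callable that returns a predicted activity score (higher meaning more active) and uses `heapq.nlargest` from the Python standard library; it is an illustration, not the D3AI-CoV implementation.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical interface: (canonical SMILES, target name) -> predicted activity score.
ScoreModel = Callable[[str, str], float]


def screen_library(
    library: Iterable[Tuple[str, str]],
    targets: List[str],
    model: ScoreModel,
    top_n: int = 100,
) -> Dict[str, List[Tuple[float, str]]]:
    """Return, for each selected target, the top_n (score, compound_id) pairs, best first.

    `library` yields (compound_id, canonical_smiles) pairs.
    """
    compounds = list(library)  # materialize once so it can be reused for every target
    results: Dict[str, List[Tuple[float, str]]] = {}
    for target in targets:
        scored = ((model(smiles, target), cid) for cid, smiles in compounds)
        results[target] = heapq.nlargest(top_n, scored)
    return results
```

`heapq.nlargest` returns the selected pairs already sorted from highest to lowest score, which matches a ranked results table; ties in score fall back to comparing compound IDs.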
Conclusion
Target prediction and virtual screening are two important tasks in drug discovery and lead optimization. Since the COVID-19 outbreak, we have developed and continuously updated D3Targets-2019-nCoV, a web server for target prediction and virtual screening based on protein structure and ligand information. In this work, using the latest versions of our databases of active compounds and target proteins, we developed two DL-based classification models for target prediction and a DL-based regression model for virtual screening to discover hits against COVID-19. On the external test set, the DL-based models showed substantially stronger predictive ability than D3Docking, D3Similarity and the other methods, and they were also much faster. We hope that D3AI-CoV will be helpful for the development of anti-COVID-19 drugs.
Identifying effective drug targets and developing corresponding drugs to treat COVID-19 are of great importance.
We developed D3AI-CoV, a DL-based platform for target prediction and virtual screening to discover anti-COVID-19 drugs.
The MultiDTI and MPNNs-CNN models can be used to predict targets for active compounds.
The MPNNs-CNN-R model can be used to perform virtual screening.
D3AI-CoV is available free of charge as a web application at http://www.d3pharma.com/D3Targets-2019-nCoV/D3AI-CoV/index.php
Data Availability
The data that support the findings of this study are available from the corresponding authors upon reasonable request.
Funding
This work was supported by National Key Research and Development Program of China (2016YFA0502301), Natural Science Foundation of Shanghai (21ZR1475600) and Natural Science Foundation of China (U19A2067).
Author Biographies
Yanqing Yang is a postgraduate at Shanghai Institute of Materia Medica. His research interests are deep learning, molecular docking and virtual screening. His affiliation is with CAS Key Laboratory of Receptor Research, State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China.
Deshan Zhou is a postgraduate at Hunan University. His research interest is deep learning. His affiliation is with Department of Computer Science, Hunan University, Changsha, 410082, China.
Xinben Zhang got his Master’s degree at East China University of Science and Technology. His research interest is software development. His affiliation is with CAS Key Laboratory of Receptor Research, State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China.
Yulong Shi is a PhD student at Shanghai Institute of Materia Medica. His research interest is molecular docking method development. His affiliation is with CAS Key Laboratory of Receptor Research, State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China.
Jiaxin Han is a postgraduate at Nanjing University of Chinese Medicine. His research interests are molecular docking and virtual screening. His affiliation is with School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing, 210046, China.
Liping Zhou is a PhD student at Shanghai Institute of Materia Medica. Her research interest is molecular dynamics. Her affiliation is with CAS Key Laboratory of Receptor Research, State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China.
Leyun Wu is a postgraduate at Shanghai Institute of Materia Medica. Her research interest is molecular dynamics. Her affiliation is with CAS Key Laboratory of Receptor Research, State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China.
Minfei Ma is a postgraduate at Shanghai Institute of Materia Medica. Her research interest is deep learning. Her affiliation is with CAS Key Laboratory of Receptor Research, State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China.
Jintian Li is a postgraduate at Shanghai Institute of Materia Medica. Her research interest is deep learning. Her affiliation is with CAS Key Laboratory of Receptor Research, State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China.
Professor Shaoliang Peng got his PhD degree at National University of Defense Technology in 2008. His research interest is artificial intelligence.
Professor Zhijian Xu got his PhD degree at Shanghai Institute of Materia Medica in 2012. His research interests include computer-aided drug design, computational chemistry, computational biology and artificial intelligence. More information could be found at the website: https://www.researchgate.net/profile/Zhijian_Xu
Professor Weiliang Zhu received his PhD degree from Shanghai Institute of Materia Medica in 1998. His main research fields are computer-aided drug design, computational biology, computational chemistry and pharmaceutical chemistry, with a special focus on the theoretical research and method development of drug design.
References
Author notes
The authors wish it to be known that, in their opinion, Yanqing Yang, Deshan Zhou and Xinben Zhang should be regarded as joint first authors.