Mining small-molecule screens to repurpose drugs

Swamidass, S. Joshua

doi:10.1093/bib/bbr028

Abstract

Repurposing and repositioning drugs (discovering new uses for existing and experimental medicines) is an attractive strategy for rescuing stalled pharmaceutical projects, finding treatments for neglected diseases, and reducing the time, cost and risk of drug development. As this strategy emerged, academic researchers began performing high-throughput screens (HTS) of small molecules, the type of experiments once exclusively conducted in industry, and making the data from these screens available to all. Several methods can mine this data to inform repurposing and repositioning efforts. Despite these methods' limitations, the hope is that they will accelerate the discovery of new uses for known drugs, but this hope has not yet been realized.

Keywords: data mining, drug repurposing, high-throughput screening, data analysis, chemical informatics

INTRODUCTION

Finding new uses for approved medicines and experimental medicines that fail approval in their initial indication—drug ‘repurposing’ and ‘repositioning’, respectively—has emerged as an attractive strategy for rescuing stalled pharmaceutical projects, finding therapies for neglected diseases and reducing the time, cost and risk of drug development [1–6]. Encouraging success stories from both pharmaceutical companies and academia include sildenafil for the treatment of impotence [1], thalidomide for the treatment of multiple myeloma and a painful complication of leprosy [1, 4], and aminobisphosphonates for the treatment of progeria [7]. These stories highlight the immense financial payoff and humanitarian impact of successful repurposing; sildenafil grosses several billion dollars each year, and thalidomide and aminobisphosphonates ameliorate rare or neglected diseases that were previously untreatable.

At the same time that repurposing and repositioning emerged as drug-development strategies, academic researchers began performing high-throughput screens (HTS) of small molecules, the type of experiments once exclusively conducted in industry, and making the data from these screens available to all [8]. The National Institutes of Health (NIH) administers the largest repository of screening data: PubChem. This repository holds the data from several hundred biochemical and phenotypic screens, with several more deposited each month [9, 10]. Several additional screens are available through repositories like ChemBank [11] and Collaborative Drug Discovery's database [12], but most NIH-funded work is deposited in PubChem after a brief embargo period.

Mining these public screens may prove an effective strategy for repositioning or repurposing drugs. The most promising strategy is to mine phenotypic screens—those that measure a disease-relevant response in a whole-cell or organism system—because they can identify molecules that work on any target involved in the disease process modeled by the assay. The screens' design does not fix ahead of time the targets that active molecules must act upon, so phenotypic screens can uncover unexpected relationships between drugs, targets and diseases. It is also possible, but less likely, that biochemical screens—which focus on a single protein target—could inform repurposing efforts in a similar way by uncovering unexpected interactions between drugs and important targets.

Mining HTS data is cheaper and quicker than directly testing molecules in assays [8]. Experimentally testing all known drugs in medically-relevant assays requires a large investment in both HTS infrastructure and expertise in the many assays of interest. Existing investments in HTS infrastructure at most screening centers can easily handle the relatively small number of drug molecules (Figure 1). The key limiting factor in this strategy, however, is the expertise, time, and resources required to develop each assay, a process that has not been multiplexed in any way. In contrast with the direct experimental approach, once the data from a screen is available, computational mining of this data can proceed without requiring investment in assay-specific expertise, often for a fraction of the cost of the original screen. Therefore, it is conceivable that a small team of computational scientists could mine thousands of screens in a small amount of time and for little more cost than their salaries. In order to verify predictions, some molecules must eventually be experimentally tested, but the real value of mining is that it would direct resources toward those assays most likely to yield positive results, dramatically reducing the resources required for experiments.

Figure 1:

HTS infrastructure. This robot shuffles hundreds of thousands of molecules in 384-well plates to execute some of the small-molecule screens at the Broad Institute of Harvard/MIT. HTS infrastructure like this has only recently become widely available to academics.

At a high level, each mining strategy works in a similar way: it uses data from a phenotypic HTS as input to predict which drugs might treat diseases relevant to the phenotype measured by the screen's assay. Before explaining these HTS mining techniques, we first explain how HTS projects are structured. These details are important because they elucidate why different strategies might be used in different cases. After explaining these details, we discuss several strategies for mining HTS data, their limitations, and the cases in which they might be most successfully applied.

THE ANATOMY OF A SCREEN

The structure of screening experiments exposes both the technical limitations of HTS and reasons to hope that mining HTS data can unearth valuable information. At a high level, screening workflows are often conceptualized as a multi-stage funnel, where, at each stage, an experiment filters out uninteresting molecules and forwards the rest on to the next stage (Figure 2). Molecules are filtered by predefined thresholds (based on potency, selectivity, structure, or ADME properties) in order to satisfy project-specific goals.

Figure 2:

A screening funnel. In this hypothetical example, a primary screen is followed by a dose–response confirmatory test, and finally the confirmed actives are visually inspected to identify active molecules with chemically favorable structures. The numbers in this example are similar to those observed in typical NIH-funded screens deposited in PubChem. More complex workflows include additional assays, which might ensure that molecules are selective or work by a particular mechanism.

Screens are designed to find molecules that satisfy narrowly defined and project-specific goals. Consequently, all funnels will, by design, exclude molecules useful for other purposes. For example, one screener might look for potent and selective activators of a protein, while another might look for allosteric inhibitors with structures sufficiently different from those of known drugs. In particular, screeners commonly aim to find molecules with novel structures [13]. Therefore, they sometimes filter out known drugs, some of which may be apparently active in the initial screen, without sending them for confirmatory testing. This example highlights the key issue: the most interesting molecules for repurposing efforts may be ignored by the original screener, and key experiments useful for repurposing efforts may not be performed.

Furthermore, each stage depends on predefined cutoffs, because important properties (such as solubility, potency, selectivity and drug-likeness) are conceptually clear but imprecisely delineated. For example, a noisy initial screen might identify a list of ‘hits’ with inhibitory activity greater than 50% at a single dose. These hits are commonly sent for confirmatory testing at several doses to ensure they yield a dose–response curve consistent with true actives. Confirming hits by measuring their potency in a dose–response experiment is important because the potency of a hit is not well correlated with activity in the initial screen measured at a single dose [14] (Figure 3). The successfully confirmed hits with potencies of 10 μM or better might be further filtered by additional experiments designed to ensure they are sufficiently specific to the targeted phenotype. In this example, the 50% activity and 10 μM potency thresholds are somewhat arbitrary (another screener might have chosen 25% activity and 1 μM), and additional actives can be discovered by adjusting these thresholds to advance more molecules through the workflow [15].
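The effect of these cutoffs can be illustrated with a minimal Python sketch of a two-stage funnel. The molecule names and values are hypothetical, as are the `Molecule` class and the `run_funnel` function; the sketch only assumes that a primary hit needs more than the activity cutoff and that a confirmed hit needs an EC50 at or below the potency cutoff.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Molecule:
    name: str
    primary_inhibition: float        # percent inhibition at the single screening dose
    ec50_um: Optional[float] = None  # potency a dose-response test would report, if known


def run_funnel(molecules: List[Molecule],
               activity_cutoff: float = 50.0,
               potency_cutoff_um: float = 10.0) -> Tuple[List[Molecule], List[Molecule]]:
    """Apply the primary-screen and dose-response cutoffs in sequence."""
    # Stage 1: only molecules above the activity cutoff are sent for follow-up.
    hits = [m for m in molecules if m.primary_inhibition > activity_cutoff]
    # Stage 2: only followed-up molecules with sufficient potency are confirmed.
    confirmed = [m for m in hits
                 if m.ec50_um is not None and m.ec50_um <= potency_cutoff_um]
    return hits, confirmed


# Hypothetical library: molecule C is potent but weak in the primary screen.
library = [
    Molecule("A", primary_inhibition=80.0, ec50_um=2.0),
    Molecule("B", primary_inhibition=55.0, ec50_um=25.0),
    Molecule("C", primary_inhibition=30.0, ec50_um=0.5),
    Molecule("D", primary_inhibition=10.0),
]

_, strict = run_funnel(library)                         # 50% activity, 10 uM potency
_, relaxed = run_funnel(library, activity_cutoff=25.0)  # 25% activity, 10 uM potency
strict_names = [m.name for m in strict]
relaxed_names = [m.name for m in relaxed]
print(strict_names, relaxed_names)
```

With the stricter activity cutoff, molecule C never reaches confirmatory testing, so its potency is never measured; relaxing the cutoff recovers it.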

Figure 3:

Potency does not always correlate with single-dose activity. Each line corresponds with the behavior of a different molecule in the assay at different doses. HTS screens are usually performed at a single dose, shown as the black, dashed line, and the response at this dose is used to rank molecules in the primary screen for follow up (y-axis). High potency molecules reach 50% maximal activity (EC₅₀) at lower doses (x-axis), but potency can only be measured by testing the molecule at several doses. As in this example, activity at a single dose does not always correlate with potency, and activity in the primary screen does not always correlate with the potency measured in dose–response follow-up experiments.
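A short Python sketch can make this point concrete. It assumes a standard Hill (sigmoidal) dose–response model, which is a common choice but is not stated in the caption; the function name `hill_response` and the two molecules are hypothetical.

```python
def hill_response(dose_um: float, ec50_um: float,
                  max_effect: float = 100.0, hill_slope: float = 1.0) -> float:
    """Percent response at one dose under a Hill dose-response model."""
    scaled = dose_um ** hill_slope
    return max_effect * scaled / (ec50_um ** hill_slope + scaled)


screening_dose_um = 10.0  # hypothetical single dose used in the primary screen

# Hypothetical molecule 1: potent (EC50 = 1 uM) but with a low maximal effect.
potent_partial = hill_response(screening_dose_um, ec50_um=1.0, max_effect=40.0)
# Hypothetical molecule 2: less potent (EC50 = 8 uM) but with a full maximal effect.
weak_full = hill_response(screening_dose_um, ec50_um=8.0, max_effect=100.0)

print(round(potent_partial, 1), round(weak_full, 1))
```

Under these assumptions the less potent molecule shows the larger response at the screening dose, so ranking by single-dose activity would place it ahead of the more potent one.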

Moreover, screens are noisy. In well-designed screens, the assay readout has high fidelity, but several technical details can cause false positives and false negatives. Some molecules degrade because they are chemically unstable under the conditions of the HTS experiment or library storage [16, 17]. Likewise, some molecules directly interfere with the readout or are thought to promiscuously bind to several proteins [18]. Consequently, several studies have demonstrated that many molecules that appear inactive in the initial screen are actually active when retested [15, 19–21].

As noisy artifacts of complex workflows with project-specific goals, HTS data does not comprehensively survey the chemical libraries on which the screens are run. Rather than accurately annotating the activity of all tested molecules, HTS instead aims to identify and characterize only a small number of molecules with narrowly and subjectively delineated properties. Interesting molecules are often missed [15, 22]. Although they are not widely used, several strategies (based on systematic error correction [23, 24], structural similarity [19, 20], machine learning [21] and even economics [15]) can recover some of these missed actives.

The complexity, subjectivity and noise in HTS projects create both a challenge and an opportunity. On one hand, mining an HTS can be difficult, requiring the application of sophisticated algorithms capable of integrating noisy and irregularly sampled data. On the other hand, there is hope that careful mining can unearth diamonds: molecules that do not satisfy the initial project's thresholds or goals but can still inform repositioning efforts. In the easiest cases, mining these molecules can be straightforward; however, the hardest cases require complex algorithms that integrate and analyze data from several sources.

MAKING CONNECTIONS

In the simplest case, an approved drug is confirmed active in a medically-relevant screening project. HTS projects often aim to find entirely new chemical structures, so these high value actives are often ignored because they are not chemically novel. In this case, we have only to make the connection, which can be directly confirmed in a simple experiment and then moved immediately to animal trials when sensible.

Public databases store most of the key information required to make these connections. For example, DrugBank, the most comprehensive publicly available database of known drugs [25–27], catalogues approved, experimental, withdrawn and illicit small-molecule drugs, annotating them by indication and intended targets and cross-referencing them to PubChem and PubMed. As useful as DrugBank and related resources are, they are far from complete. Furthermore, only about 460 of DrugBank's drugs are regularly screened by the NIH. In the near future, the release of CandiStore, a much larger and more comprehensive database of drugs, may improve the situation [28]. In the meantime, proprietary databases like Pharmaprojects may prove useful to some.

Several tools use DrugBank and other public resources to identify known drugs that are confirmed active in public screening data, making connections that can inform repurposing efforts [29, 30]. For example, Chem2Bio2RDF links several screening databases with several drug databases using a standards-driven interface, enabling complex queries to be succinctly described and submitted to their server. While its interface is not easy for the typical biologist to use, the developers provide many clear example queries, which can be easily edited so as to yield useful results. The real utility of integrative methods may be best seen in more user-friendly tools like BioEclipse [31] and the Web Engine for Non-obvious Drug Interactions (WENDI) [30]. The next generation of tools—like WENDI—may prove most useful when integrating data from several sources to easily answer high level questions like ‘what drugs have been confirmed active in a recent phenotypic screen?’
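At its core, the connection these tools make is a join between a drug database and a screen's confirmed actives. The following Python sketch shows that join under simplifying assumptions: both resources are reduced to shared compound identifiers, and every identifier, drug name and the function `known_drugs_among_actives` is hypothetical rather than part of any real tool's interface.

```python
from typing import Dict, Iterable, List


def known_drugs_among_actives(drug_names_by_id: Dict[str, str],
                              confirmed_active_ids: Iterable[str]) -> List[str]:
    """Return names of known drugs whose identifiers appear among confirmed actives."""
    matches = set(drug_names_by_id) & set(confirmed_active_ids)
    return sorted(drug_names_by_id[compound_id] for compound_id in matches)


# Hypothetical drug database keyed by a shared compound identifier.
drug_names_by_id = {
    "CID-0001": "drug A",
    "CID-0002": "drug B",
    "CID-0003": "drug C",
}
# Hypothetical confirmed actives from one phenotypic screen.
confirmed_active_ids = ["CID-0002", "CID-9999", "CID-0003"]

repurposing_leads = known_drugs_among_actives(drug_names_by_id, confirmed_active_ids)
print(repurposing_leads)
```

In practice the hard part is the identifier mapping itself, since the same drug may be registered under several salts, forms or identifiers across databases.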

As useful as these tools may become, their approach—integrating databases to identify known drugs among confirmed actives—is limited. First, only a fraction of known drugs are directly tested on a regular basis (Table 1), and this approach cannot identify untested drugs.

Table 1: Known drugs in the NIH screening collection

Class	Number	Included	Neighbors
Small molecule	4646	465	2111
Approved	1493	354	996
Investigational	365	82	202
Experimental	3243	101	1103
Illicit	188	6	114
Nutraceutical	71	8	43
Withdrawn	65	18	40

On a category-by-category basis, the ‘Number’ column records the number of drugs found in DrugBank, the ‘Included’ column records the number of molecules of each category that are included in the NIH's screening collection and regularly screened by academic labs, and the ‘Neighbors’ column records the number of drugs with at least one structural neighbor in the NIH's collection. In this table, structural neighbors are defined as molecules with at least 0.8 similarity as computed using the default settings of OpenBabel [34]. Most of the drugs listed in DrugBank are not regularly screened, but many have neighbors that are screened.
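The coverage implied by Table 1 can be computed directly. The short Python sketch below uses only the counts from the table; the variable and function names are illustrative.

```python
# Counts from Table 1: class -> (Number, Included, Neighbors).
table1 = {
    "Small molecule": (4646, 465, 2111),
    "Approved": (1493, 354, 996),
    "Investigational": (365, 82, 202),
    "Experimental": (3243, 101, 1103),
    "Illicit": (188, 6, 114),
    "Nutraceutical": (71, 8, 43),
    "Withdrawn": (65, 18, 40),
}


def coverage(number: int, included: int, neighbors: int):
    """Fraction of drugs screened directly, and fraction with a screened neighbor."""
    return included / number, neighbors / number


coverage_by_class = {name: coverage(*counts) for name, counts in table1.items()}
for name, (frac_included, frac_neighbors) in coverage_by_class.items():
    print(f"{name}: {frac_included:.1%} included, {frac_neighbors:.1%} with neighbors")
```

For all small molecules, roughly 10% are screened directly while roughly 45% have at least one screened structural neighbor, which is the gap that similarity-based methods aim to exploit.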

Second, there are often many false negatives in screening projects: truly active molecules that were not sent for confirmatory testing and were incorrectly assumed to be inactive [15]. Consequently, negative results should be treated with caution; a drug may not be confirmed active in a phenotypic screen and nonetheless be capable of treating the disease for which the screen is designed. This is not to say that all negative results are incorrect; rather, more sophisticated methods that take the screen's uncertainty into account (discussed in the following sections) may yield more accurate predictions.

Third, even when they are tested, some drugs degrade because they precipitate or are chemically unstable under the conditions of library storage and handling [16, 17]. This type of instability can make testing the drug technically challenging, causing both false positives and false negatives in HTS experiments. Depending on several factors, this instability can happen frequently in some libraries. For example, about 12% of each molecule is precipitated and lost after 10 freeze-thaw cycles [16] and—depending on the exact storage conditions—more than one third of molecules can degrade to <80% purity over a year's time in storage [17]. In practice, many screening centers reduce library degradation by limiting plate-life, capping the number of freeze-thaw cycles, and reducing temperature and humidity in storage. The quality control procedures in screening centers are rarely accessible so it is difficult to know how much these factors impact the data from any specific HTS project. Recent work may be helpful in computationally identifying molecules likely to be problematic [18], but usually this source of error is very difficult to account for in any general methodology.

Finally, about 5–7% of drugs are not active in the form they are administered to patients; they are ‘prodrugs’, which are transformed by the patient's body into their active form [32, 33]. DrugBank and other repositories store the inactive form of prodrugs: the form that is not expected to be active in the screen but may still be useful in the disease. Therefore, while the data-integration approach may uncover unexpected connections between known drug molecules and screens, determining the relevance of these connections still requires careful consideration by human experts.

THE SIMILARITY PRINCIPLE

In a more difficult case, the close structural neighbors of a drug are directly tested in the HTS, but the drug itself is not. Structurally similar molecules often have similar biological properties, so the activities of these structural neighbors predict the activity of the drug to which they are similar. In the simplest strategy, a drug would be predicted active if a structurally similar molecule is experimentally demonstrated active in the screen. More sophisticated strategies use the data from the screen to train predictors—based on, for example, machine learning techniques like support vector machines and neural networks—that then annotate untested drugs with the likelihood they would be active if tested [35–37, 43].
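The simplest strategy described above can be sketched in a few lines of Python. The sketch assumes that similarities between the untested drug and the tested molecules have already been computed; the molecule names, the similarity values, the 0.8 default cutoff and the function `predict_active` are all hypothetical choices for illustration.

```python
from typing import Dict, List, Set, Tuple


def predict_active(similarity_to_tested: Dict[str, float],
                   confirmed_active: Set[str],
                   cutoff: float = 0.8) -> Tuple[bool, List[str]]:
    """Predict an untested drug active if any sufficiently similar tested
    molecule is a confirmed active; also return the supporting neighbors."""
    support = sorted(
        molecule for molecule, similarity in similarity_to_tested.items()
        if similarity >= cutoff and molecule in confirmed_active
    )
    return bool(support), support


# Hypothetical similarities between one untested drug and three tested molecules.
similarity_to_tested = {"mol-1": 0.91, "mol-2": 0.85, "mol-3": 0.40}
# Hypothetical confirmed actives from the screen.
confirmed_active = {"mol-2", "mol-3"}

prediction, support = predict_active(similarity_to_tested, confirmed_active)
strict_prediction, strict_support = predict_active(
    similarity_to_tested, confirmed_active, cutoff=0.9)
print(prediction, support, strict_prediction, strict_support)
```

Here the drug is predicted active at a cutoff of 0.8 because of one active neighbor, but not at 0.9, where its only remaining neighbor is inactive. Trained predictors replace this hard rule with a likelihood estimate.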

Most of these methods rely on quantitatively computing similarity between molecular structures. A dizzying variety of similarity metrics can do this [35, 38–42]. The most commonly used methods are based on Tanimoto similarity, defined as the proportion of substructures in common between two molecules (Figure 4). There are several ways of enumerating substructures from molecules, and each way yields a different Tanimoto variant [39]. Tanimoto similarity, when computed on linear or circular fragments of the 2D structure of molecules up to a fixed size, correlates surprisingly well with biological activity [35, 43]. In most cases, the path-based molecule similarities computed by open-source tools are just as good as those of a myriad of proprietary tools [34, 44–46]. Many of the open-source tools are unified by a simple interface, Cinfony [44], which non-experts can easily use.

Figure 4:

Quantitative structural similarity. The most commonly used structural similarity metric is Tanimoto similarity: the proportion of substructures in common between two molecules. An example using Advil and Tylenol, two over-the-counter pain medications, is shown above. In this example, all linear fragments up to 3 atoms long are extracted from each molecule (with ≈ used to denote aromatic bonds). Dividing the 8 substructures seen in both molecules by the 18 seen in either yields the Tanimoto similarity 0.44. Longer fragments are used in practice, but the formula remains the same.
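The formula in this caption is a ratio of set sizes, which a minimal Python sketch makes explicit. The fragment labels below are placeholders rather than the real fragments of either molecule; only the counts of 8 shared and 18 total fragments come from the caption, and the split of the 10 unshared fragments between the two molecules is an arbitrary assumption.

```python
from typing import Iterable


def tanimoto(fragments_a: Iterable[str], fragments_b: Iterable[str]) -> float:
    """Tanimoto similarity: shared fragments divided by fragments seen in either molecule."""
    a, b = set(fragments_a), set(fragments_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fragment sets
    return len(a & b) / len(union)


# Placeholder fragment labels standing in for enumerated linear fragments.
shared = {f"shared-{i}" for i in range(8)}        # 8 fragments seen in both molecules
only_first = {f"first-{i}" for i in range(6)}     # assumed split of the remaining 10
only_second = {f"second-{i}" for i in range(4)}

similarity = tanimoto(shared | only_first, shared | only_second)
print(round(similarity, 2))
```

Because the value depends entirely on how fragments are enumerated, the same pair of molecules can score differently under different fingerprinting schemes.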

Several issues arise when using molecular similarity based on 2D descriptors to predict activity. First, although the Tanimoto similarity always ranges from zero to one, its exact value depends on the way substructures are extracted from molecules. Therefore, choosing a fixed cutoff like 0.9 to annotate a molecule's neighbors may not work for all similarity metrics. To make matters worse, because structural similarity is highly dependent on a molecule's size and complexity, it is difficult to choose an appropriate fixed cutoff at which to determine molecules' neighbors [47–49].

Second, while structural similarity is predictive of activity, it is not always accurate. Recent research has focused on ‘activity cliffs’, pairs of molecules that are structurally very similar but have very different activity [50–52]. Some activity cliffs can be traced to deficiencies in 2D-based similarities, which are usually insensitive to stereochemistry and do not work well on large repetitive molecules [47, 53]. However, cliffs are usually difficult to predict. Therefore, structural similarity alone cannot definitively establish the activity of the molecule, and is most useful when used in conjunction with experimental confirmation [43].

LEVERAGING THE TARGET

In the most difficult case, neither the known drug nor its analogs are directly tested in the screen, rendering both direct data-integration and similarity-based methods unusable. Nonetheless, if the active molecules in the screen include molecules known to hit the same target as a known drug, it is possible to infer that this drug would be active too. In this case, confirming the known drug in the screen not only validates the prediction, but also provides experimental evidence that its target—and not some unknown off-target—is the mechanism by which the phenotype is modulated.

This strategy requires knowledge of which molecules interact with known drugs' targets. If several molecules with the same target are confirmed active, it is likely that a drug that hits this target will also be active. Right now, the most comprehensive public database of molecular targets is ChEMBL, which contains the results of over 400 000 experiments gleaned from the literature [28]. This data is lower confidence than the information about known drugs, and much more difficult to sort through. Nonetheless, some report success annotating large databases of molecules with public data [54]. More to the point, one study correctly identified known disease targets from phenotypic screens with 88% accuracy using a similar strategy [55].
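The inference described here can be sketched in Python as a two-step count. This is an illustrative simplification, not the method of the cited studies: the target annotations, the molecule and drug names, the `min_support` threshold and both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def count_targets(confirmed_actives: Iterable[str],
                  molecule_targets: Dict[str, Set[str]]) -> Counter:
    """Count how many confirmed actives are annotated with each target."""
    counts: Counter = Counter()
    for molecule in confirmed_actives:
        counts.update(molecule_targets.get(molecule, set()))
    return counts


def candidate_drugs(target_counts: Counter,
                    drug_targets: Dict[str, Set[str]],
                    min_support: int = 2) -> List[str]:
    """Propose drugs that hit a target shared by at least min_support confirmed actives."""
    supported = {t for t, n in target_counts.items() if n >= min_support}
    return sorted(drug for drug, targets in drug_targets.items() if supported & targets)


# Hypothetical target annotations for screened molecules and for untested drugs.
molecule_targets = {
    "mol-1": {"target-X"},
    "mol-2": {"target-X", "target-Y"},
    "mol-3": {"target-Z"},
}
drug_targets = {"drug A": {"target-X"}, "drug B": {"target-Z"}, "drug C": {"target-W"}}

target_counts = count_targets(["mol-1", "mol-2", "mol-3"], molecule_targets)
candidates = candidate_drugs(target_counts, drug_targets)
print(target_counts["target-X"], candidates)
```

Requiring support from several actives is one way to guard against a single active whose effect comes from an unannotated off-target.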

Unexpected relationships between targets and diseases can motivate repurposing efforts. For example, genomic studies identified Lamin A (LMNA) as the gene responsible for the premature-aging disease, progeria. Biologists discovered that the diseased form of the protein was post-translationally modified and that inhibiting this modification with aminobisphosphonates—a class of FDA approved cancer drugs—ameliorated progeria in a mouse model [56]. A Phase II trial was initiated in May 2007, the first of its kind for progeria. Further research revealed an additional post-translational modification, modulated by statins, which also affects progression of progeria in mice [7]; this data motivated an additional Phase II trial which will soon begin. In this case study, targets were identified by genomic studies and molecular biology, not computational predictions. Nonetheless, this example illustrates how identifying targets and characterizing the pathways related to a disease can directly and quickly translate into new treatments for disease.

Still, this approach is limited by our incomplete understanding of small molecules' targets. While most bioactive molecules modulate one or a couple of known protein targets, there is no way of knowing whether they modulate additional clinically relevant proteins [57]. This is particularly a problem with certain classes of drugs, like the notoriously promiscuous kinase inhibitors [58], and no strategy handles this problem cleanly. The problem is worse for non-drug molecules because they are less studied. Nonetheless, in most cases the known target is the most likely mechanism of action of a drug for a new purpose, and successful examples of target identification using this approach are encouraging [55].

THE HOPE

The hope is that computational methods will discover unexpected connections between a known drug and a disease, and that this connection will guide focused experimental follow up, leading to successful repurposing. For example, a computational method would predict a drug would be active in a screen had it been tested. This prediction would be confirmed by testing the drug in the assay used in the screen. Next the drug would be tested in an animal model of the disease known to be relevant to the screen, where efficacy in modulating the disease would be demonstrated.

From this point, the next steps forward are complicated, but if the right details line up, the drug can rapidly move to human trials. The marketability, known side-effects, dose, intellectual property concerns and other factors directly affect whether efficacy could be demonstrated and inform whether human trials should be initiated. These issues are extremely important but beyond the scope of this discussion. The real value of HTS mining, however, is in proposing unexpected possibilities, rather than managing a repurposing project to completion.

Unfortunately, no published work clearly demonstrates a case where mining a small-molecule screen led directly to a successful repurposing effort, so this approach remains an unrealized hope. This is not surprising: academics only recently gained access to HTS infrastructure and data, the exact pathway used to repurpose a drug in pharmaceutical companies is rarely publicized, and the cases that are known can be idiosyncratic [1, 59]. We will probably only know how successful HTS mining can be after at least a decade of academic effort that is just now beginning.

Key Points

The results of hundreds of medically-relevant screens are publicly available, and the phenotypic screens are most likely to inspire new repurposing hypotheses.
Several web-based tools can identify drugs that are obviously active in medically-relevant screens, but less obvious connections can be made using similarity-based techniques in collaboration with experts.
Data mining efforts will be most effective when coupled with focused experimental follow up to verify predictions in the initial assay and to move confirmed predictions to animal and human studies.
The hope is that mining small-molecule screens will speed the discovery of new uses for medications, but this hope has not yet been realized.

Acknowledgements

SJS wrote the manuscript and created the figures. The photograph in Figure 1 was taken by SJS and used with his permission. Bradley T. Calhoun compiled results for the table. BTC and Michael R. Browning provided helpful edits. Pankaj Agarwal provided substantive edits and feedback on the manuscript. The author thanks the Pathology and Immunology Department of Washington University in St Louis for supporting SJS, BTC, and MRB. Marvin was used to generate the chemical structures in Figure 4; Marvin 5.3.5, 2010, ChemAxon (http://www.chemaxon.com). The author declares he has no competing financial interests.

References

1

Ashburn

T

Thor

K

,

Drug repositioning: identifying and developing new uses for existing drugs

,

Nat Rev Drug Discov

,

2004

, vol.

3

8

(pg.

673

-

83

)

2

Tartaglia

L

,

Complementary new approaches enable repositioning of failed drug candidates

,

Expert Opin Investig Drugs

,

2006

, vol.

15

11

(pg.

1295

-

8

)

3

Chong

C

Sullivan

D

,

New uses for old drugs

,

Nature

,

2007

, vol.

448

(pg.

645

-

646

)

4

Sleigh

S

Barton

C

,

Repurposing strategies for therapeutics

,

Pharm Med

,

2010

, vol.

24

3

(pg.

151

-

9

)

Google Scholar

Crossref

WorldCat

5

Tobinick

E

,

The value of drug repositioning in the current pharmaceutical market

,

Drug News Perspect

,

2009

, vol.

22

2

pg.

119

6

O'Connor

K

Roth

B

,

Finding new tricks for old drugs: an efficient route for public-sector drug discovery

,

Nat Rev Drug Discover

,

2005

, vol.

4

12

(pg.

1005

-

4

)

Google Scholar

Crossref

WorldCat

7

Varela

I

Pereira

S

Ugalde

A

et al. ,

Combined treatment with statins and aminobisphosphonates extends longevity in a mouse model of human premature aging

,

Nat Med

,

2008

, vol.

14

7

(pg.

767

-

72

)

8

Hergenrother

P

,

Obtaining and screening compound collections: a user's guide and a call to chemists

,

Curr Opin Chem Biol

,

2006

, vol.

10

3

(pg.

213

-

8

)

9

Wang

Y

Xiao

J

Suzek

T

et al. ,

PubChem: a public information system for analyzing bioactivities of small molecules

,

Nucleic Acids Res

,

2009

, vol.

37

pg.

W623

10

Bolton

E

Wang

Y

Thiessen

P

et al. ,

PubChem: integrated platform of small molecules and biological activities

,

Annu Rep Comput Chem

,

2008

, vol.

4

(pg.

217

-

41

)

Google Scholar

OpenURL Placeholder Text

WorldCat

11

Seiler

K

George

G

Happ

M

et al. ,

ChemBank: a small-molecule screening and cheminformatics resource database

,

Nucleic Acids Res

,

2008

, vol.

36

Suppl. 1

pg.

D351

12

Hohman

M

Gregory

K

Chibale

K

et al. ,

Novel web-based tools combining chemistry informatics, biology and social networks for drug discovery

,

Drug Discover Today

,

2009

, vol.

14

5–6

(pg.

261

-

70

)

Google Scholar

Crossref

WorldCat

13

Oprea

T

Bologa

C

Boyer

S

et al. ,

A crowdsourcing evaluation of the NIH chemical probes

,

Nat Chem Biol

,

2009

, vol.

5

7

(pg.

441

-

7

)

14

Inglese

J

Auld

D

Jadhav

A

et al. ,

Quantitative high-throughput screening: a titration-based approach that efficiently identifies biological activities in large chemical libraries

,

Proc Nat Acad Sci USA

,

2006

, vol.

103

31

pg.

11473

15

Swamidass

S

Bittker

J

Bodycombe

N

et al. ,

An economic framework to prioritize confirmatory tests after a high-throughput screen

,

J Biomol Screen

,

2010

, vol.

15

6

pg.

680

16

Kozikowski

B

Burt

T

Tirey

D

et al. ,

The effect of freeze/thaw cycles on the stability of compounds in DMSO

,

J Biomol Screen

,

2003

, vol.

8

2

pg.

210

17

Cheng

X

Hochlowski

J

Tang

H

et al. ,

Studies on repository compound stability in DMSO under various conditions

,

J Biomol Screen

,

2003

, vol.

8

3

pg.

292

18

Baell

J

Holloway

G

,

New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays

,

J Med Chem

,

2010

, vol.

53

7

(pg.

2719

-

40

)

19

Posner

B

Xi

H

Mills

J

,

Enhanced HTS hit selection via a local hit rate analysis

,

J Chem Inform Model

,

2009

, vol.

49

10

(pg.

2202

-

10

)

Google Scholar

Crossref

WorldCat

20

Glick

M

Klon

A

Acklin

P

et al. ,

Enrichment of extremely noisy high-throughput screening data using a naive Bayes classifier

,

J Biomol Screen

,

2004

, vol.

9

1

pg.

32

21. Glick M, Jenkins J, Nettles J, et al. Enrichment of high-throughput screening data with increasing levels of noise using support vector machines, recursive partitioning, and Laplacian-modified naive Bayesian classifiers, J Chem Inform Model, 2006, vol. 46(1) (pg. 193-200)

22. Davies J, Glick M, Jenkins J. Streamlining lead discovery by aligning in silico and high-throughput screening, Curr Opin Chem Biol, 2006, vol. 10(4) (pg. 343-51)

23. Makarenkov V, Kevorkov D, Zentilli P, et al. HTS-Corrector: software for the statistical analysis and correction of experimental high-throughput screening data, Bioinformatics, 2006, vol. 22(11) pg. 1408

24. Makarenkov V, Zentilli P, Kevorkov D, et al. An efficient method for the detection and elimination of systematic error in high-throughput screening, Bioinformatics, 2007, vol. 23(13) pg. 1648

25. Wishart D, Knox C, Guo A, et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration, Nucleic Acids Res, 2006, vol. 34 pg. D668

26. Wishart D, Knox C, Guo A, et al. DrugBank: a knowledgebase for drugs, drug actions and drug targets, Nucleic Acids Res, 2008, vol. 36 (pg. D901-6)

27. Wishart D. DrugBank and its relevance to pharmacogenomics, Pharmacogenomics, 2008, vol. 9(8) (pg. 1155-62)

28. Warr W. ChEMBL. An interview with John Overington, team leader, chemogenomics at the European Bioinformatics Institute outstation of the European Molecular Biology Laboratory (EMBL-EBI), J Comput Aid Mol Design, 2009, vol. 23 (pg. 195-8)

29. Chen B, Dong X, Jiao D, et al. Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data, BMC Bioinformatics, 2010, vol. 11(1) pg. 255

30. Zhu Q, Lajiness M, Ding Y, et al. Wendi: a tool for finding non-obvious relationships between compounds and biological properties, genes, diseases and scholarly publications, J Cheminform, 2010, vol. 2(1) pg. 6

31. Ruiz I, Gómez-Nieto M. A Java tool for the management of chemical databases and similarity analysis based on molecular graphs isomorphism. In: Proceedings of the 8th International Conference on Computational Science, Part II, Springer-Verlag Berlin, Heidelberg

32. Rautio J, Kumpulainen H, Heimbach T, et al. Prodrugs: design and clinical applications, Nat Rev Drug Discov, 2008, vol. 7(3) (pg. 255-70)

33. Han H, Amidon G. Targeted prodrug design to optimize drug delivery, AAPS J, 2002, vol. 2(1) (pg. 48-58)

34. O'Boyle N, Morley C, Hutchison G. Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit, Chem Cent J, 2008, vol. 2(1) pg. 5

35. Swamidass S, Chen J, Bruand J, et al. Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity, Bioinformatics, 2005, vol. 21(Suppl. 1) pg. i359

36. Swamidass SJ, Azencott CA, Lin TW, et al. Influence relevance voting: an accurate and interpretable virtual high throughput screening method, J Chem Inform Model, 2009, vol. 49 (pg. 756-66)

37. Jorissen R, Gilson M. Virtual screening of molecular databases using a support vector machine, J Chem Inform Model, 2005, vol. 45(3) (pg. 549-61)

38. Bajorath J. Selected concepts and investigations in compound classification, molecular descriptor analysis, and virtual screening, J Chem Inform Comput Sci, 2001, vol. 41(2) (pg. 233-45)

39. Hert J, Willett P, Wilton D, et al. Comparison of fingerprint-based methods for virtual screening using multiple bioactive reference structures, J Chem Inform Comput Sci, 2004, vol. 44(3) (pg. 1177-85)

40. Geppert H, Vogt M, Bajorath J. Current trends in ligand-based virtual screening: molecular representations, data mining methods, new application areas, and performance evaluation, J Chem Inform Model, 2010, vol. 50(2) (pg. 205-16)

41. Willett P, Barnard J, Downs G. Chemical similarity searching, J Chem Inform Comput Sci, 1998, vol. 38(6) (pg. 983-96)

42. Nasr R, Swamidass S, Baldi P. Large scale study of multiple-molecule queries, J Cheminform, 2009, vol. 1 pg. 7

43. Keiser M, Setola V, Irwin J, et al. Predicting new molecular targets for known drugs, Nature, 2009, vol. 462(7270) (pg. 175-81)

44. Gupta R, Gifford E, Liston T, et al. Using open-source computational tools for predicting human metabolic stability and additional ADME/TOX properties, Drug Metab Dispos, 2010, vol. 38(11) (pg. 2083-90)

45. O'Boyle N, Hutchison G. Cinfony – combining open source cheminformatics toolkits behind a common interface, Chem Cent J, 2008, vol. 2(1) pg. 24

46. Steinbeck C, Han Y, Kuhn S, et al. The Chemistry Development Kit (CDK): an open-source Java library for chemo- and bioinformatics, J Chem Inform Comput Sci, 2003, vol. 43(2) (pg. 493-500)

47. Flower D. On the properties of bit string-based measures of chemical similarity, J Chem Inform Comput Sci, 1998, vol. 38(3) (pg. 379-86)

48. Wang Y, Bajorath J. Advanced fingerprint methods for similarity searching: balancing molecular complexity effects, Comb Chem High T Scr, 2010, vol. 13(3) (pg. 220-8)

49. Wang Y, Geppert H, Bajorath J. Random reduction in fingerprint bit density improves compound recall in search calculations using complex reference molecules, Chem Biol Drug Design, 2008, vol. 71(6) (pg. 511-7)

50. Guha R, Van Drie J. Structure-activity landscape index: identifying and quantifying activity cliffs, J Chem Inform Model, 2008, vol. 48(3) (pg. 646-58)

51. Bajorath J, Peltason L, Wawer M, et al. Navigating structure-activity landscapes, Drug Discover Today, 2009, vol. 14(13–14) (pg. 698-705)

52. Wawer M, Peltason L, Weskamp N, et al. Structure-activity relationship anatomy by network-like similarity graphs and local structure-activity relationship indices, J Med Chem, 2008, vol. 51(19) (pg. 6075-84)

53. Tanikawa T, Fridman M, Zhu W, et al. Using biological performance similarity to inform disaccharide library design, J Am Chem Soc, 2009, vol. 131(14) (pg. 5075-83)

54. Zhou Y, Zhou B, Chen K, et al. Large-scale annotation of small-molecule libraries using public databases, J Chem Inform Model, 2007, vol. 47(4) (pg. 1386-94)

55. Glick M, Davies J, Jenkins J. Prediction of biological targets for compounds using multiple-category Bayesian models trained on chemogenomics databases, J Chem Inform Model, 2006, vol. 46(3) (pg. 1124-33)

56. Fong L, Frost D
