Lijuan Wang, Ying Lu, Doudou Li, Yajing Zhou, Lili Yu, Ines Mesa Eguiagaray, Harry Campbell, Xue Li, Evropi Theodoratou, The landscape of the methodology in drug repurposing using human genomic data: a systematic review, Briefings in Bioinformatics, Volume 25, Issue 2, March 2024, bbad527, https://doi.org/10.1093/bib/bbad527
Abstract
The process of drug development is expensive and time-consuming. In contrast, drug repurposing can be introduced to clinical practice more quickly and at a reduced cost. Over the last decade, there has been a significant expansion of large biobanks that link genomic data to electronic health record data, public availability of various databases containing biological and clinical information and rapid development of novel methodologies and algorithms in integrating different sources of data. This review aims to provide a thorough summary of different strategies that utilize genomic data to seek drug-repositioning opportunities. We searched MEDLINE and EMBASE databases to identify eligible studies up until 1 May 2023, with a total of 102 studies finally included after two-step parallel screening. We summarized commonly used strategies for drug repurposing, including Mendelian randomization, multi-omic-based and network-based studies and illustrated each strategy with examples, as well as the data sources implemented. By leveraging existing knowledge and infrastructure to expedite the drug discovery process and reduce costs, drug repurposing potentially identifies new therapeutic uses for approved drugs in a more efficient and targeted manner. However, technical challenges when integrating different types of data and biased or incomplete understanding of drug interactions are important hindrances that cannot be disregarded in the pursuit of identifying novel therapeutic applications. This review offers an overview of drug repurposing methodologies, providing valuable insights and guiding future directions for advancing drug repurposing studies.
INTRODUCTION
Traditionally, drug discovery has been guided by the development of single compound-based medicine. Despite the fact that the approach has yielded numerous successful therapeutics, it frequently comes with various challenges that can lead to unintended drug effects. For example, adverse drug events (ADEs) based on mechanisms may only be revealed during the later stages of clinical trials [1]. In addition, the high costs and slow procedures render these methods untenable, leading modern medicine to discard them as ineffective strategies [2, 3].
Drug repurposing has been proposed as a way to explore alternative indications and potential side effects for already-licensed medications [4]. This approach offers clear advantages because it uses existing medications for new therapeutic purposes, which may lead to the discovery of novel applications and treatment options. In addition, by capitalizing on the extensive knowledge and safety profiles of already-approved drugs, drug repurposing has the potential to address unmet medical needs with reduced time and costs compared to developing entirely new drugs [5].
The wide accessibility of multi-omics data, drug databases and clinical information linked to electronic health records (EHRs) enables the implementation of drug repurposing [6]. Several strategies have been proposed to perform drug repurposing. First, Mendelian randomization (MR) can be employed to examine the causal relationship between phenotypes and genetically predicted drug effects by using single-nucleotide polymorphisms (SNPs) within target genes as proxies [7]. Second, large-scale multi-omics data derived from high-throughput technologies such as genome-wide association study (GWAS) [8], transcriptome-wide association study (TWAS) [9], proteome-wide association study (PWAS) [10] and metabolome-wide association study (MWAS) [11] can enhance our understanding of disease etiology and identify novel drug targets from the associated variants and genes, reducing the time required for drug screening. Moreover, due to the wide availability of dense EHRs linked to large biorepositories that contain human DNA samples, it is possible to perform powerful phenome-wide association studies (PheWASs) to estimate the proxied drug effects on thousands of phenotypes, thus identifying novel indications and adverse drug events [12]. Third, network-based drug repurposing approaches aim to integrate existing knowledge, enabling the identification of previously undiscovered mechanisms [13]. One such example is machine learning, which offers a way to assimilate information from various sources and to identify novel disease subtypes and drug targets, and which has enabled significant advances in the healthcare and pharmaceutical sectors [14]. For example, a variety of computational approaches including deep neural networks, ligand-based cheminformatics methods and proteochemometrics models have been developed to identify new drug targets for cancer treatment [14].
As multi-class datasets and diverse advanced approaches/algorithms become available for drug repurposing analysis, there is a need to summarize commonly used strategies that integrate human genomic data with many other data sources and illustrate their strengths and limitations. Here, a systematic review was conducted to provide an overview of strategies or methodologies in drug repurposing, data sources implemented in each strategy and challenges and recommendations for future drug repurposing studies.
METHODS
Search strategy
We systematically searched MEDLINE and EMBASE databases from inception to 1 May 2023, using a comprehensive search strategy (see Supplementary Table 1 available online at http://bib.oxfordjournals.org/ for search terms) to identify all published drug repurposing studies using genetic variants as predictors of drug effects. All identified publications underwent a two-step screening of title, abstract and full text to determine whether each individual study met the inclusion criteria (L.W., Y.L.). For any discrepancies, the two authors conferred with a third author (L.Y.) to make final decisions.
Inclusion and exclusion criteria
Studies that performed drug repurposing by using human genomic data, with or without other forms of omics data, biological and clinical information, and studies that introduced novel computational approaches and strategies to perform drug repurposing were included in the review. We excluded (i) studies not aiming at exploring drug repurposing; (ii) studies not integrating human genotypic data; (iii) studies not in English; and (iv) correspondence, conference abstracts, comments, surveys and research experiments conducted in animal/human cell lines and animal models.
Data extraction
We then extracted the following variables from the included studies: publication date, the first author, study population, sample size, definition of phenotypes (based on EHR or derived from epidemiological surveys), predictors (genetic variants or drug–gene interaction), whether the researchers conducted replication analysis in an independent population or used other algorithms and software to perform drug repurposing, data sources implemented in drug repurposing analysis and key findings. Four investigators (L.W., Y.L., D.L. and Y.Z.) independently conducted and double-checked the data extraction.
RESULTS
A total of 4016 publications were identified from MEDLINE and EMBASE databases. After screening, 102 publications were finally included (Figure 1). A summary of the main characteristics of the included studies can be found in Table 1. There are three main categories of currently existing drug repurposing strategies: MR (30.4%), multi-omic-based (14.7%) and network-based studies (54.9%). About half of the studies (49.0%) utilized data sources not from a specific population but used summary statistics from a public database such as the GWAS catalog. In terms of the sample size, 49 studies (48.0%) included more than 10 000 participants. Only a small fraction (10.8%) used EHR codes to delineate the phenotypes, while the majority (89.2%) relied on phenome definitions derived from epidemiological surveys. A total of 38 studies (37.3%) used genetic variants as predictors for the effects of drug treatments, while the remaining studies (62.7%) predicted the effects of approved drugs on novel medical indications based on drug–gene interactions. After identifying significant drug–disease associations, 18 studies (17.6%) replicated their findings in another independent population or through biological experiments or by using additional statistical approaches.

Figure 1. Flow chart of the study selection process of the systematic literature review.
Characteristics | Number of studies (%) |
---|---|
Strategy | |
MR | 31 (30.4) |
Multi-omic-based | 15 (14.7) |
Network-based | 56 (54.9) |
Sample size | |
Very large (≥10,000 subjects) | 49 (48.0) |
Large (1000–9999 subjects) | 1 (1.0) |
Small (<1000 subjects) | 2 (2.0) |
NA^a | 50 (49.0) |
Phenotyping | |
EHR-based | 11 (10.8) |
Epidemiology based | 91 (89.2) |
Predictor | |
Use SNP as a proxy | 38 (37.3) |
Drug–gene interaction^b | 64 (62.7) |
Replication analysis | |
Yes | 18 (17.6) |
Another independent population | 11 (10.8) |
In vitro or in vivo experiments | 6 (5.9) |
Other algorithm/software | 1 (1.0) |
No | 84 (82.4) |
MR, Mendelian randomization; EHR, electronic health record.
^a These studies performed drug repurposing by using publicly available multi-class sources regarding human omic data, drug and disease information, not in a specific population.
^b Studies in this category first identified susceptibility genes and then referred to drug information from publicly available databases to evaluate the druggability of the identified markers or to explore potential repurposed indications for existing drugs based on the shared similarity.
Drug repurposing strategies and their strengths and limitations
We grouped commonly used drug repurposing approaches into three main categories: MR, multi-omic based and network based. In brief, MR employs genetic variants as instruments to evaluate the causal effects of genetically proxied drug treatments on disease outcomes. The multi-omic-based strategy harnesses either single omics or the integration of multi-omic data (i.e. genome, transcriptome, proteome and metabolome), to explore disease mechanisms, identify novel drug targets and inform effective repurposing opportunities. The network-based strategy leverages complex biological networks that integrate variant, gene, protein, disease outcome and drug, to reveal novel relationships between drugs, diseases and molecular targets, enabling the identification of potential repurposing candidates with different levels of evidence and guiding precision medicine approaches. The difference between multi-omic-based and network-based strategies lies in the data sources used, where the network-based strategy integrates more comprehensive sources related not only to molecular patterns but also to drug and clinical information. MR was classified into a separate category due to its objective of assessing the evidence of causality between drug targets and disease outcomes.
In summary, each strategy can markedly decrease the time and cost associated with drug discovery compared to traditional approaches, while each also has its own strengths and limitations. For example, the most important advantage of MR lies in the assessment of causality involved in drug treatment; however, it is less effective if the genetic instruments are difficult to identify or only weakly associated with the effect of drug treatment. The multi-omic-based strategy enables the identification of combined therapies by assessing how different drugs affect multiple molecular pathways simultaneously and promotes personalized medicine by considering individual molecular profiles. However, integrating data from multiple omic sources and interpreting the underlying pathogenic mechanisms can be challenging. The network-based strategy analyzes how drugs interact with specific nodes in biological networks, which can uncover the underlying mechanisms for repurposed drugs and provide insights into their efficacy and potential side effects. However, if a drug can target multiple nodes in a network, this may result in a lack of specificity, which potentially leads to off-target effects and adverse reactions. Researchers should select a suitable strategy according to the study’s purpose and the nature of the data involved. Once a drug candidate has been predicted by the above drug repurposing approaches, biological experiments and clinical trials for further validation are still required. More details regarding the description, strengths and limitations for each strategy are summarized in Table 2.
Strategy | Description | Strengths | Limitations | Reference |
---|---|---|---|---|
Mendelian randomization (MR) | Utilizes genetic variants as instrumental variables to assess causal relationships between potential therapeutic targets and outcomes | | | [21–51] |
Multi-omic based | Harnesses the integration of diverse omics data, such as genomics, transcriptomics, proteomics and metabolomics, to comprehensively explore disease mechanisms, identify novel drug targets and inform effective repurposing opportunities | | | [52–66] |
Network based | Leverages complex biological networks to uncover relationships between drugs, diseases and molecular targets, enabling the identification of potential repurposing candidates and guiding precision medicine approaches | | | [68–123] |
Data sources implemented in drug repurposing studies
The most commonly used data source for MR analysis was the IEU Open GWAS database (formerly known as the MR-Base platform), which provides an extensive collection of summary statistics from diverse GWASs on various traits and diseases [15]. For drug repurposing, researchers can upload genetic instrumental variables (IVs) associated with the exposure that can be modified by the drug and investigate their associations with an outcome of interest.
Data sources implemented in multi-omic-based drug repurposing studies were mainly large-scale cohorts or GWAS consortia that contained at least one omics dataset from human samples. Taking UK Biobank as an example, in addition to extensive genomic data, this large-scale biomedical database also contains gene expression data derived from various tissues, including blood, adipose tissue and brain samples; measurements of various proteins in biological samples; and metabolomic profiling that captures the small molecules present in biological samples, such as blood or urine [16]. Some databases focus on only one type of omics. Examples are the Genotype-Tissue Expression (GTEx) project, which contains genome-wide transcriptional expression profiles from 49 human tissues [17], and the Human Protein Atlas (HPA), a rich resource that creates a detailed map of the human proteome by systematically profiling the expression patterns of proteins across different tissues and organs [18]. We summarized these databases and the sources they contain in Table 3.
Data source | Genome | Transcriptome | Proteome | Metabolome | Phenome | Reference |
---|---|---|---|---|---|---|
23andMe | + | + | [26, 54] | |||
BioVU | + | + | [55, 58, 62, 80] | |||
China Kadoorie Biobank | + | + | [52] | |||
CMap | + | [61, 64, 65, 72, 85, 86, 91, 94, 96, 101, 102, 118, 120] | ||||
eQTLGen Consortium | + | + | [28, 44, 60, 97] | |||
FinnGen | + | + | [29, 39] | |||
GTEx | + | + | [28, 44, 48, 56, 87, 90, 97, 104, 108, 109, 117, 118, 121] | |||
Human Protein Atlas | + | + | [87] | |||
LifeGen | + | + | + | [28] | ||
Million Veteran Program | [36] | |||||
PheWAS catalog | + | [83, 86, 105, 110, 111, 112] | ||||
Taiwan Biobank | [113] | |||||
TCGA | + | + | + | [69, 120] | ||
UK Biobank | + | + | + | + | + | [24, 25, 28, 29, 30, 34, 40, 54, 61] |
Databases employed in network-based strategies are not limited to population data; they also encompass drug information (i.e. target gene, chemical structure and action mechanism), clinical information (i.e. the association between genomic variation and human health outcome) and biological mechanisms [i.e. protein–protein interaction (PPI) networks and biological pathways]. In particular, the Drug Gene Interaction Database (DGIdb) is a comprehensive and widely used resource that provides information on interactions between genes and drugs [19], based on which researchers can assess the druggability of a gene or protein. ClinicalTrials.gov serves as a registry and results database for a wide range of clinical trials, including interventional studies, observational studies and expanded access programs, so that researchers can search for clinical trials that involve the use of specific drugs and gather insights into the efficacy, safety and off-label use of drugs for different indications or patient populations. The STRING database incorporates data on PPIs and integrates data from various external databases, such as drug–target databases, pathway databases and disease databases, based on which researchers can uncover functional modules or pathways relevant to a disease and access additional information on drug–target associations and disease-related annotations, supporting the identification of potential drug repurposing candidates [20]. Detailed information about these databases is displayed in Table 4. We also summarized the main findings of the included studies in Supplementary Table 2 available online at http://bib.oxfordjournals.org/.
Data source | Drug target gene | Drug chemical structure | Drug action mechanism | Biological pathway | Clinical end point | Reference |
---|---|---|---|---|---|---|
ChEMBL | + | + | + | [38, 44, 48, 56, 66, 104, 112] | ||
ClinicalTrials.gov | + | [85, 94, 95, 102, 104, 107, 113, 118] | ||||
ClinVar | + | [86] | ||||
DGIdb | + | [56, 60, 64, 66, 83, 84, 88, 92, 99, 100, 103, 110, 111] | ||||
DrugBank | + | + | + | + | [24, 35, 38, 60, 61, 63, 66, 68, 69, 70, 73, 77, 80, 82, 83, 86, 87, 90, 92, 94, 95, 96, 102, 104, 105, 107, 111, 113, 115, 118] | |
Drug Repurposing Hub | + | + | + | [82] | ||
GO | + | [28, 75, 87, 88, 90, 92, 94, 99, 100, 103, 107] | ||||
KEGG | + | [57, 73, 75, 82, 87, 88, 92, 100, 103, 107, 111] | ||||
Open Targets Database | + | [28, 61, 87, 92, 109] | ||||
Pharmaprojects | + | + | + | [71, 81] | ||
PharmGKB | + | + | + | [70, 73, 74, 98, 104] | ||
PubChem | + | + | [56, 61, 102] | |||
Reactome | + | [28, 57, 87, 88, 92] | ||||
STRING | + | [57, 60, 61, 83, 92, 94, 95, 100, 103, 107, 110, 112, 113, 118, 121, 122] | ||||
Target Central Resource Database | + | + | [93] | |||
Therapeutic Target Database | + | + | + | [70, 73, 83, 92, 94, 95, 96, 98, 102, 104, 107] |
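One common way of combining the resources in Table 4 is to take a PPI network (for example, one exported from STRING), rank genes by how many interaction partners they have, and then look up the top-ranked "hub" genes in a drug–gene interaction resource such as DGIdb. The sketch below illustrates that idea only. The gene names, drug names, edge list and the choice of plain degree as the hub criterion are all hypothetical assumptions for illustration; none of them are taken from the included studies or from the databases' actual contents or APIs.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Count distinct interaction partners per gene in an undirected PPI edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {gene: len(partners) for gene, partners in neighbors.items()}


def hub_genes(edges, top_n):
    """Return the top_n genes by degree; ties are broken alphabetically."""
    degree = degree_centrality(edges)
    return sorted(degree, key=lambda g: (-degree[g], g))[:top_n]


def candidate_drugs(hubs, drug_gene_pairs):
    """Map each hub gene to the drugs reported to interact with it, if any."""
    by_gene = defaultdict(set)
    for drug, gene in drug_gene_pairs:
        by_gene[gene].add(drug)
    return {g: sorted(by_gene[g]) for g in hubs if g in by_gene}


# Hypothetical PPI edges (placeholders, not real STRING records).
ppi_edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_C"),
    ("GENE_D", "GENE_E"),
]

# Hypothetical drug-gene interactions (placeholders, not real DGIdb records).
drug_gene = [
    ("drug_x", "GENE_A"),
    ("drug_y", "GENE_A"),
    ("drug_z", "GENE_E"),
]

hubs = hub_genes(ppi_edges, top_n=2)
candidates = candidate_drugs(hubs, drug_gene)
print(hubs)
print(candidates)
```

Degree is only one of several possible hub criteria. Any drug surfaced this way is a hypothesis that still needs experimental and clinical validation.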
DISCUSSION
In this study, we systematically reviewed drug repurposing studies incorporating the use of human genomic data. Three main categories of methodologies were commonly applied in eligible studies, including MR, multi-omic based and network based. We summarized strategies commonly used and data sources implemented for each category. In addition, strengths, challenges and potential insights for drug repurposing investigations are discussed.
The application of MR in drug repurposing involves leveraging genetic variants associated with a specific drug target to assess the potential therapeutic effects of modulating that target [21–51]. For example, Zhao et al. selected genetic variants as IVs for antihypertensive drugs. These variants (i) were robustly associated with systolic blood pressure (SBP) at P < 5 × 10⁻⁸; (ii) were independent of other variants, with a pairwise linkage disequilibrium (LD) r² < 0.01 based on the European reference panel from the 1000 Genomes Project; and (iii) were located within 200 kb of the target gene. They concluded that genetically proxied ACE inhibition exerted a protective effect on diabetes [41]. An additional strategy for employing MR in drug repurposing is to generate genetically proxied drug effects through protein quantitative trait loci (pQTLs), which are associated with the protein abundance of the target genes. For instance, Fang et al. utilized PCSK9 cis-eQTL and cis-pQTL as IVs and demonstrated that genetically predicted PCSK9 inhibition was associated with a reduced prostate cancer risk [46]. The fundamental concept behind drug repurposing using the MR strategy is that if the target of an existing drug exerts a causal impact on an outcome in a manner consistent with the drug’s pharmacological effect, then this compound may hold therapeutic potential for the disease.
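The instrument-selection criteria described above (genome-wide significance, low pairwise LD and proximity to the target gene) can be sketched in a few lines, followed by a standard fixed-effect inverse-variance-weighted (IVW) combination of per-variant Wald ratios. This is a minimal illustration under stated assumptions, not the pipeline used in any cited study: the variant records, positions and LD values are invented, "within 200 kb" is read here as 200 kb on either side of the gene, LD pruning is done greedily by ascending P value, and the Wald-ratio standard error uses the usual first-order approximation.

```python
import math
from dataclasses import dataclass


@dataclass
class Variant:
    rsid: str
    pos: int              # base-pair position on the target gene's chromosome
    beta_exposure: float  # effect on the exposure (e.g. SBP)
    p_exposure: float     # P value for the exposure association
    beta_outcome: float   # effect on the outcome (e.g. log-odds of disease)
    se_outcome: float     # standard error of beta_outcome


def select_instruments(variants, gene_start, gene_end, ld_r2,
                       p_max=5e-8, window=200_000, r2_max=0.01):
    """Keep significant variants near the gene, then prune greedily by LD.

    ld_r2 maps frozenset({rsid_a, rsid_b}) to pairwise r^2; missing pairs
    are treated as r^2 = 0 (an assumption made for this sketch).
    """
    candidates = [
        v for v in variants
        if v.p_exposure < p_max
        and gene_start - window <= v.pos <= gene_end + window
    ]
    candidates.sort(key=lambda v: v.p_exposure)  # strongest signal first
    kept = []
    for v in candidates:
        if all(ld_r2.get(frozenset((v.rsid, k.rsid)), 0.0) < r2_max for k in kept):
            kept.append(v)
    return kept


def ivw_estimate(instruments):
    """Fixed-effect IVW estimate and standard error from Wald ratios."""
    num = den = 0.0
    for v in instruments:
        ratio = v.beta_outcome / v.beta_exposure
        se_ratio = v.se_outcome / abs(v.beta_exposure)  # first-order approximation
        weight = 1.0 / se_ratio ** 2
        num += weight * ratio
        den += weight
    return num / den, math.sqrt(1.0 / den)


# Invented example data; none of these values come from a real GWAS.
variants = [
    Variant("rsA", 1_050_000, -0.5, 1e-12, -0.05, 0.01),
    Variant("rsB", 1_060_000, -0.4, 1e-10, -0.04, 0.01),  # in LD with rsA
    Variant("rsC", 1_150_000, -0.2, 1e-9, -0.02, 0.01),
    Variant("rsD", 1_500_000, -0.6, 1e-15, -0.06, 0.01),  # outside the window
    Variant("rsE", 1_020_000, -0.1, 1e-4, -0.01, 0.01),   # not significant
]
ld = {frozenset(("rsA", "rsB")): 0.8}

kept = select_instruments(variants, gene_start=1_000_000, gene_end=1_100_000, ld_r2=ld)
estimate, se = ivw_estimate(kept)
print([v.rsid for v in kept], estimate, se)
```

In practice, LD pruning relies on a reference panel and dedicated tools rather than a hand-built dictionary, and IVW is usually reported alongside sensitivity analyses that relax its assumptions.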
Multi-omic-based strategies aim to integrate omics data from diverse sources, such as the genome, transcriptome, proteome, metabolome and phenome, to uncover disease mechanisms and identify potential drug repurposing candidates. This type of research commonly starts with large-scale omics association analyses to identify significant variants and genes for a specific disease, followed by drug–gene interaction, PPI and enrichment analyses and biological experiments, to decipher disease etiology and yield insights into drug repurposing [52–66]. Chen et al. carried out a large-scale trans-ancestry TWAS of tobacco use phenotypes followed by enrichment analysis that assessed the enrichment of drug target pathways within TWAS signals and identified potential drugs such as dextromethorphan (a drug used for cough), galantamine (a drug used for cognitive deficits) and muscle relaxants for treating smoking addiction [63]. Similarly, Khunsriraksakul et al. conducted both GWAS and TWAS analyses for systemic lupus erythematosus (SLE) and then performed drug repurposing analysis by integrating both TWAS-identified SLE-associated susceptibility genes and the expression profiles of drugs derived from the Connectivity Map (CMap) database. The underlying hypothesis of this approach is that if a drug induces an expression profile contrasting with that of a disease, it may qualify as a candidate for repurposing. As a result, they identified clinically informative drug classes including glucocorticoid receptor agonists, histone deacetylase (HDAC) inhibitors, mTOR inhibitors and topoisomerase inhibitors for SLE treatment [65]. PheWAS, which integrates genome and phenome data, serves as an alternative way to seek drug repositioning opportunities [67]. The rationale of this strategy is to evaluate the associations of a genetic variant or, more recently, a combination of variants affecting the function of a drug target gene, with a diverse array of phenotypes. Diogo et al. conducted a PheWAS analysis to examine the relationships between 19 candidate drug targets and 1683 binary endpoints and found that genetically predicted inhibition of PNPLA3 and MDA5 could be a viable consideration for the treatment of liver and autoimmune diseases, respectively [54].
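The signature-reversal hypothesis behind CMap-based analyses can be sketched in a few lines of Python. This is a simplified illustration and not the scoring used by CMap or by Khunsriraksakul et al.: a plain Pearson correlation over shared genes is swapped in for a connectivity score, and the gene names, drug names and values are all hypothetical. A strongly negative score means the drug-induced profile opposes the disease-associated profile.

```python
import math


def reversal_score(disease_sig, drug_sig):
    """Pearson correlation between disease and drug profiles over shared genes.

    Both arguments map gene name -> signed score (for example a TWAS z-score
    for the disease and a differential-expression score for the drug).
    """
    shared = sorted(set(disease_sig) & set(drug_sig))
    if len(shared) < 3:
        raise ValueError("need at least three shared genes")
    x = [disease_sig[g] for g in shared]
    y = [drug_sig[g] for g in shared]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a profile with no variation cannot be correlated")
    return cov / (sd_x * sd_y)


def rank_candidates(disease_sig, drug_sigs):
    """Return (drug, score) pairs, most strongly reversing drug first."""
    scores = [(name, reversal_score(disease_sig, sig)) for name, sig in drug_sigs.items()]
    return sorted(scores, key=lambda pair: pair[1])


# Hypothetical disease signature and two hypothetical drug signatures.
toy_disease = {"GENE_A": 3.0, "GENE_B": 2.0, "GENE_C": -1.0, "GENE_D": -2.5}
toy_drugs = {
    "drug_x": {"GENE_A": -1.5, "GENE_B": -1.0, "GENE_C": 0.5, "GENE_D": 1.2},
    "drug_y": {"GENE_A": 1.0, "GENE_B": 0.8, "GENE_C": -0.3, "GENE_D": -1.1},
}

ranked = rank_candidates(toy_disease, toy_drugs)
for name, score in ranked:
    print(name, round(score, 3))
```

Here `drug_x` opposes the disease profile and ranks first, whereas `drug_y` mimics it. A real analysis would also need a significance assessment, for instance by permutation, which is omitted from this sketch.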
Network-based drug repurposing usually involves modeling a wide range of information including variants, genes, proteins, diseases/traits and drugs, enabling the incorporation of diverse dimensions from various data sources. The types of biological networks used for drug repositioning include gene-based analysis, functional annotation, pathway enrichment analysis, protein–protein interaction and drug–gene interaction networks [68–123]. A major advantage of integrating multi-class biological networks lies in the reduction of noise, thus enhancing biological relevance. Recently, Thomas et al. performed GO and KEGG pathway analysis and PPI network analysis to detect significant hub genes related to persistent hyperplastic primary vitreous (PHPV), followed by drug–gene interaction analysis to evaluate potential PHPV drug candidates. As a result, 14 potential genes, four major pathways, seven drug gene targets and 26 candidate drugs were identified, providing insights into novel therapeutic targets for the clinical treatment of PHPV [100]. In addition, Adikusuma et al. prioritized risk genes for atopic dermatitis (AD) by employing in silico pipelines guiding bioinformatics analysis with six functional annotations (missense mutations, cis-eQTL, molecular pathway analysis, PPI, genetic overlap with a knockout mouse phenotype and primary immunodeficiencies) and then expanded them according to molecular interactions to identify potential drug targets. The results revealed 27 AD risk genes, which could be further mapped to 53 existing drugs [95]. It is worth noting that advanced algorithms such as machine learning show significant potential to expedite the process of drug discovery or repositioning, given their capacity to integrate wide-ranging sources of data, thereby achieving higher power in discovering and predicting complex drug–gene and drug–disease associations [124]. Mountjoy et al. developed a machine learning pipeline to prioritize likely causal signals within GWAS-identified loci. Moreover, they found that the gene–disease associations exhibited significant enrichment for established pairs of drug targets and medical indications with an OR of 8.1 (95% confidence interval = 5.7, 11.5) across clinical trial phases 4, indicating that incorporating novel genetic discoveries from GWAS and post-GWAS studies provides potential therapeutic targets and ultimately improves success in drug development [125].
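An enrichment statistic of the kind reported above, an odds ratio (OR) with a 95% confidence interval, can be computed from a 2 × 2 table of gene–disease pairs. The Python sketch below is a generic illustration and not the analysis of Mountjoy et al.: the cell counts are hypothetical, and a Wald interval on the log-odds scale is assumed.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2 x 2 table.

    a: prioritized pairs that are known target-indication pairs
    b: prioritized pairs that are not
    c: non-prioritized pairs that are known target-indication pairs
    d: non-prioritized pairs that are not
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for a Wald interval")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of the log odds ratio
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts, chosen only to show the calculation.
or_hat, ci_low, ci_high = odds_ratio_ci(30, 70, 10, 90)
print(round(or_hat, 2), round(ci_low, 2), round(ci_high, 2))
```

An interval that excludes 1 indicates that genetically prioritized pairs are over-represented among established target–indication pairs in the toy data. With sparse tables, Fisher's exact test is a common alternative to the Wald interval.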
Current drug repurposing methodologies have several strengths and limitations. Given the availability of large-scale human genomic data, especially GWASs, it is possible to generate genetically predicted drug effects and genetically predicted disease predisposition, facilitating the translation of preclinical discoveries into clinical practice. However, GWAS has been criticized for the small effect sizes of most risk variants, which limit the variance of drug effects explained by the selected genetic instruments. Another concern is that the frequency of genetic variants varies among different ethnic groups, which may lead to differing drug efficacy due to off-target effects. Since the majority of GWASs and functional annotation databases were based on white populations, whether the candidates could be repurposed for non-white populations requires further investigation. By integrating other omics data (e.g. expression profiles) or preparing a curated network of information, researchers can attain a more thorough comprehension of the molecular pathways that underlie diseases and drug actions. In addition, this may unveil new interactions between drugs and disease-related molecules or pathways, expanding the repertoire of potential drug repurposing candidates. However, integrating data from different platforms or technologies can be tedious and challenging and might require a higher degree of data mining and statistical analysis. Therefore, more advanced algorithms or computational tools should be developed to handle the data effectively. Finally, investigation of drug effects through further laboratory experiments along with clinical trials in diverse populations should be undertaken to confirm the effectiveness and safety of repurposed drugs.
CONCLUSIONS
Drug repurposing based on the wealth of genetic information serves as an effective approach to identify novel and promising medical applications for existing drugs. In this review, we discussed different strategies to prioritize drug candidates, the data sources used in each strategy and strengths and limitations for drug repurposing investigations. Future directions for drug repurposing studies encompass several key areas. The analysis of different types of data sources will lead to a better understanding of disease mechanisms. In addition, the development of advanced algorithms that incorporate artificial intelligence approaches could enhance drug repurposing pipelines. Finally, the exploration of drug–drug interactions and synergistic effects for combination therapies, as well as the establishment of collaborative networks and data sharing platforms could accelerate discoveries and enable the clinical translation of repurposing candidates.
Key Points
The growing availability of extensive data from various sources, such as genomic databases, electronic health records and drug databases, enables researchers to perform more comprehensive and systematic drug repurposing studies.
Drug repurposing using genomic data is gaining popularity as a promising strategy to identify specific molecular targets implicated in diseases and obtain a deeper understanding of the underlying biological mechanisms.
Mendelian randomization, multi-omic-based and network-based strategies are commonly applied in current drug repurposing studies with genotype data.
Challenges may occur when integrating multi-class data sources, and future biological experiments and randomized clinical trials are warranted to verify the predicted drug effects on medical conditions.
FUNDING
E.T. is supported by a Cancer Research UK Career Development Fellowship (C31250/A22804). L.W. is supported by the Darwin Trust of Edinburgh.
DATA AVAILABILITY
The data supporting the findings of this study are available within the article and its supplementary materials.
Author Biographies
Lijuan Wang is a PhD student in the Usher Institute, University of Edinburgh, UK. Her research focuses on the exploration of drug repurposing opportunities by using multi-omics and network-based strategies.
Ying Lu is a Master's student in the School of Public Health, Zhejiang University, China. Her research focuses on identifying nutritional factors associated with colorectal cancer risk.
Doudou Li is a PhD student in the School of Public Health, Zhengzhou University, China. His research focuses on exploring the association between air pollution and adverse pregnancy outcomes by using traditional epidemiological research methods.
Yajing Zhou is a PhD student in the School of Public Health, Fudan University, China. Her research interests include epidemiological studies on early-life exposures and noncommunicable diseases, and biostatistical model improvement.
Lili Yu is a PhD student in the Usher Institute, University of Edinburgh, UK. Her research interest encompasses long-term air pollution-induced epigenetic effects on health outcomes.
Ines Mesa Eguiagaray is a statistical geneticist in the Usher Institute, University of Edinburgh, UK. Her research focuses on identifying risk factors associated with breast and colorectal cancer.
Harry Campbell is a professor in the Usher Institute, University of Edinburgh, UK. His research focuses on global health and genetic epidemiology.
Xue Li is an assistant professor in the School of Public Health, Zhejiang University, China. Her research focuses on genetic epidemiology of colorectal cancer and inflammatory bowel disease, nutritional epidemiology, and mechanistic and translational research based on multi-omics technology.
Evropi Theodoratou is a professor in the Usher Institute, University of Edinburgh, UK. Her research focuses on genetic, molecular and cancer epidemiology and developing and applying new research methods in relation to empirical research and evidence-based medicine.