Christopher Hawthorne, David A Simpson, Barry Devereux, Guillermo López-Campos, Phexpo: a package for bidirectional enrichment analysis of phenotypes and chemicals, JAMIA Open, Volume 3, Issue 2, July 2020, Pages 173–177, https://doi.org/10.1093/jamiaopen/ooaa023
Abstract
Phenotypes are the result of the complex interplay between environmental and genetic factors. To better understand the interactions between chemical compounds and human phenotypes, and to further exposome research, we have developed “phexpo,” a tool to perform and explore bidirectional chemical and phenotype interactions using enrichment analyses. Phexpo utilizes gene annotations from 2 curated public repositories, the Comparative Toxicogenomics Database and the Human Phenotype Ontology. We have applied phexpo in 3 case studies linking: (1) individual chemicals (a drug, warfarin, and an industrial chemical, chloroform) with phenotypes, (2) individual phenotypes (left ventricular dysfunction) with chemicals, and (3) multiple phenotypes (covering polycystic ovary syndrome) with chemicals. The results of these analyses demonstrated successful identification of relevant chemicals or phenotypes supported by bibliographic references. The phexpo R package (https://github.com/GHLCLab/phexpo) provides a new bidirectional analysis approach covering relationships from chemicals to phenotypes and from phenotypes to chemicals.
LAY SUMMARY
Chemicals are major contributors to the “exposome,” the whole set of exposures experienced by an individual that shapes their phenotype. Although the effects of exposure to some chemicals can be tested directly, this is often not feasible. We conjectured that it might be possible to associate chemicals with phenotypes through the common genes to which they have been independently linked. In this manuscript we present “phexpo,” a novel tool that analyses the overlap between gene lists linked with chemicals or phenotypes. This approach enabled the detection of both known and novel associations between chemicals and phenotypes. Case studies using a chemical, a phenotype, or a combination of phenotypes as a query element demonstrate the application of phexpo. The relationships identified between these entities were supported by evidence from the literature. Phexpo facilitates the discovery of novel relationships between chemicals and phenotypes, in both directions, and therefore provides a valuable new tool for the study of the exposome.
BACKGROUND AND SIGNIFICANCE
Phenotypes, as described in the Human Phenotype Ontology (HPO),1 are the result of the complex interplay between environmental and genetic factors. Recognition of the importance of environmental factors, coupled with an increasing ability to determine individual exposures, led to the concept of the “exposome” as the whole set of exposures of an individual since conception.2 The exposome soon gained traction in research as a complement to the genome and has become an important element for the development of new precision medicine applications.3
During the last century chemistry revolutionized many industries and human activities and as a result, the environment now contains an unprecedented amount and variety of chemicals, such as plasticizers used in bottles, flame retardants in clothes, new drugs, and pesticides used to improve crop yields.4 This increase in anthropogenic chemicals has fostered advances in toxicology and monitoring, and the plethora of subsequent studies and data are making the “chemical component” one of the best understood and studied elements of the exposome. However, despite some advances, our understanding of the biological effects of most chemical compounds is incomplete and for many there remains significant controversy regarding safety levels and effects upon human health.5
Biomedical informatics and translational bioinformatics provide analytical tools and develop data repositories to support exposome and toxicological research. Some of these repositories, such as Toxin and Toxin Target Database,6 Exposome Explorer,7 or the Comparative Toxicogenomics Database (CTD),8 integrate different data sources and combine chemical and biological information such as biomarkers or target genes. This integration has enabled the development of analytical approaches using these contents and the relationships between chemicals and genes to uncover potential links between chemicals and biological pathways or diseases. Although these approaches provide valuable insights, Gene Ontology terms and pathways are mostly focused on the biological outcomes at a cellular or molecular level whereas analyses of diseases involve sets of phenotypes. There is therefore an unmet need to implement a method to relate chemical compounds to the different phenotypes described in the HPO. The HPO has formalized the phenotype space by providing descriptions of clinical abnormalities and annotations to both rare and common diseases and is increasingly being used by different actors for data exchange and identification of disease etiology. To facilitate a bidirectional analysis of the relationships between chemicals and phenotypes using gene annotations, we have developed phexpo (phenotype–exposome), a methodology to perform bidirectional enrichment analysis of chemicals and phenotypes. This methodology has been bundled inside an R package. Phexpo incorporates chemical and gene data from CTD and phenotype and gene data from HPO. We refer to phenotypes as HPO terms. Using a chemical- or phenotype-derived gene list built from genes in both data sources, phexpo will provide enriched chemicals or phenotypes, respectively. 
This demonstrates a novel methodology that combines gene information from CTD and HPO to generate potential associations between chemical exposures and phenotypes.
METHODS
Chemical–gene relationships, chemical vocabulary, and phenotype–gene datasets were downloaded from CTD (February 5, 2019 update) and HPO (ontology version: February 12, 2019), respectively. As our focus is on human phenotypes, CTD datasets were preprocessed to concentrate exclusively on human gene identifiers and generate the relevant gene lists linking chemicals and human genes. A more detailed preprocessing explanation is included in the Supplementary Material.
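The exact preprocessing is described in the Supplementary Material and is not reproduced here. As a rough illustration only, one plausible reading of "concentrating on human gene identifiers" is to keep records whose organism is human and collect the genes per chemical. In the sketch below the column names (`ChemicalName`, `GeneSymbol`, `OrganismID`), the toy rows, and the helper `human_gene_lists` are all assumptions made for the example, not phexpo's code.

```python
import csv
import io
from collections import defaultdict

HUMAN_TAXON_ID = "9606"  # NCBI Taxonomy ID for Homo sapiens

# Toy rows standing in for a chemical-gene export (made-up, not real data).
# Column names are assumed for illustration.
TOY_CSV = """ChemicalName,GeneSymbol,OrganismID
chemA,G1,9606
chemA,G2,10090
chemA,G3,9606
chemB,G3,9606
chemB,G4,10116
"""


def human_gene_lists(handle):
    """Map each chemical to the set of genes found in human records only."""
    gene_lists = defaultdict(set)
    for row in csv.DictReader(handle):
        # Skip annotations recorded for other organisms (e.g. mouse, rat).
        if row["OrganismID"] == HUMAN_TAXON_ID:
            gene_lists[row["ChemicalName"]].add(row["GeneSymbol"])
    return dict(gene_lists)


chem_genes = human_gene_lists(io.StringIO(TOY_CSV))
print(chem_genes)
```

The resulting chemical-to-gene-set mapping is the shape of input that the enrichment step needs; a phenotype-to-gene-set mapping would be built the same way from the phenotype annotations.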
To identify relationships between chemicals and phenotypes, phexpo takes the gene lists derived from the annotations for chemicals and phenotypes and compares them using Fisher’s exact test against a background universe of genes generated from the aforementioned annotations. To correct for multiple testing, phexpo reports Bonferroni-corrected P-values and false discovery rate (FDR) values computed with the Benjamini–Hochberg procedure.
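The statistical step can be illustrated with a minimal, standard-library sketch. It is not phexpo's implementation (the package calls R's built-in Fisher's exact test), and the text does not state which alternative hypothesis is used, so the one-sided "enrichment" tail below is an assumption. The function names and toy numbers are hypothetical.

```python
from math import comb


def fisher_enrichment_p(overlap, query_size, term_size, universe_size):
    """One-sided Fisher's exact test (assumed alternative: over-representation).

    Probability of seeing at least `overlap` shared genes when `query_size`
    genes are drawn from a universe containing `term_size` annotated genes.
    """
    total = comb(universe_size, query_size)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, query_size - k)
        for k in range(overlap, min(query_size, term_size) + 1)
    )
    return tail / total


def bonferroni(p_values):
    """Multiply each P-value by the number of tests, capped at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example: 3 of 3 query genes fall in a 3-gene term, universe of 10 genes.
p = fisher_enrichment_p(overlap=3, query_size=3, term_size=3, universe_size=10)
raw = [0.01, 0.04, 0.03, 0.005]
print(p, bonferroni(raw), benjamini_hochberg(raw))
```

The monotone step in the Benjamini–Hochberg function is why neighbouring rows in a results table can share the same FDR value even though their raw P-values differ.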
Phexpo is built around 4 analytical functions (Figure 1) and 1 visualization function:

Figure 1. Diagrammatic representation of phexpo’s processes. (A) Chemicals and phenotypes can be connected via genes. Phexpo’s analytical functions return a table of associated results. (B) Further breakdown of phexpo functions. If a user passes a chemical to the perfFishTestChem functions, enriched phenotypes are returned. Conversely, if a user passes a phenotype to the perfFishTestHPO functions, enriched chemicals are returned.
perfFishTestChemSingle(): This function uses a chemical name provided by the user as input and generates a gene list using the chemical–gene dataset. It uses only the genes that are in the intersection of the chemical–gene and phenotype–gene datasets. It calculates the various gene counts against the phenotype–gene dataset and uses them for the R built-in Fisher’s exact test. The function returns a table with all the associated phenotypes.
perfFishTestHPOSingle(): This function uses an HPO term as input and carries out a calculation analogous to perfFishTestChemSingle(), but utilizes the phenotype–gene dataset for gene list creation. It calculates gene counts against the chemical–gene dataset to run the Fisher’s exact test and returns a table with all the associated chemicals.
perfFishTestHPOMultiple(): This function uses a list of different phenotypes. It aggregates all the annotated genes for the given phenotypes into a single gene list for the comparison and then carries out the same role as perfFishTestHPOSingle() to return a table with all the associated chemicals.
perfFishTestChemMultiple(): This function uses a list of different chemicals. It aggregates all the annotated genes for the given chemicals into a single gene list for the comparison and then carries out the same role as perfFishTestChemSingle() to return a table with all the associated phenotypes.
visEnrich(): This function provides a Shiny9 interface for the visualization of the results generated using any of the other 4 analytical functions. This function generates a graphical user interface that presents the results in a tabular format and a graphical display enabling the user to manipulate the results using different filtering criteria.
The results of the 4 analytical functions are presented in tabular format and include the raw and corrected P-values, the different gene set sizes, and their overlaps.
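The workflow shared by the 4 analytical functions can be sketched as follows. This is an illustrative Python outline, not the R package's code or API. The toy annotations, the `enrich` helper, and its output field names are hypothetical. The background universe is assumed to be the intersection of genes present in both datasets, and the P-value is assumed to be one-sided.

```python
from math import comb

# Toy annotations with made-up gene symbols (not real data).
CHEM_GENES = {
    "chemA": {"G1", "G2", "G3", "G9"},
    "chemB": {"G3", "G4"},
    "chemC": {"G5", "G6", "G7", "G8"},
}
PHENO_GENES = {
    "phenoX": {"G1", "G2", "G3"},
    "phenoY": {"G4", "G5"},
    "phenoZ": {"G6", "G7", "G8", "G10"},
}


def all_genes(annotation):
    """Every gene that appears anywhere in one annotation dataset."""
    return set().union(*annotation.values())


def hypergeom_tail(overlap, query_size, term_size, universe_size):
    """One-sided Fisher's exact P-value (assumed: over-representation)."""
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, query_size - k)
        for k in range(overlap, min(query_size, term_size) + 1)
    )
    return tail / comb(universe_size, query_size)


def enrich(queries, query_map, target_map):
    """Test one or more query terms against every term in the other dataset."""
    # Only genes annotated in both datasets take part in the test.
    universe = all_genes(query_map) & all_genes(target_map)
    # Multiple queries are aggregated by taking the union of their genes.
    query_genes = set().union(*(query_map[q] for q in queries)) & universe
    rows = []
    for term, genes in target_map.items():
        term_genes = genes & universe
        overlap = len(query_genes & term_genes)
        rows.append({
            "term": term,
            "overlap": overlap,
            "query_size": len(query_genes),
            "term_size": len(term_genes),
            "universe_size": len(universe),
            "p_value": hypergeom_tail(
                overlap, len(query_genes), len(term_genes), len(universe)
            ),
        })
    return sorted(rows, key=lambda row: row["p_value"])


# Single chemical -> phenotypes, then multiple phenotypes -> chemicals.
chem_to_pheno = enrich(["chemA"], CHEM_GENES, PHENO_GENES)
pheno_to_chem = enrich(["phenoX", "phenoY"], PHENO_GENES, CHEM_GENES)
for row in chem_to_pheno + pheno_to_chem:
    print(row)
```

Swapping the two annotation maps is all that changes between the chemical-to-phenotype and phenotype-to-chemical directions, which is the sense in which the analysis is bidirectional. Multiple-testing correction would then be applied to the `p_value` column.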
RESULTS
To facilitate the bidirectional integration of chemicals and phenotypes we developed a new approach built into an R package, “phexpo,” that exploits gene annotations extracted from 2 curated high-quality resources, CTD (for chemical–gene annotations) and HPO (for phenotype–gene annotations). To demonstrate the capabilities of phexpo we present 3 different case studies using its different functions. In order to assess and evaluate the results we manually validated some of the results and provide bibliographic evidence supporting the associations (additional details are in the Supplementary Material).
Case study I—single chemical to phenotype enrichment
To validate the analysis of single chemicals in phexpo we chose 2 diverse, but well-studied compounds with predictable results, a drug (warfarin) and an industrial chemical (chloroform) (Figure 2). For this analysis, we used the function perfFishTestChemSingle(). As expected, the enriched phenotypes for warfarin, including “deep vein thrombosis” and “abnormality of prothrombin,” match its anticoagulant function. For chloroform, we identified liver phenotypes consistent with its known hepatotoxicity.10 Additional analysis can be found in Supplementary Material.

Figure 2. Case study I results using the Shiny interface. The bar chart interface enables filtering using different criteria. (A) HPO terms identified for warfarin, filtered by Bonferroni correction. (B) HPO terms identified for chloroform, filtered by FDR. Full results tables are available in the Supplementary Material. Abbreviations: FDR: false discovery rate; HPO: Human Phenotype Ontology.
Case study II—single phenotype to chemical enrichment
For the single phenotype case study, we used the HPO term “left ventricular dysfunction” which is described as “inability of the left ventricle to perform its normal physiologic function. Failure is either due to an inability to contract the left ventricle or the inability to relax completely and fill with blood during diastole.” For this analysis, we used the function perfFishTestHPOSingle() and the top 10 results are presented in Table 1.
Table 1. Top 10 chemicals identified for the HPO term “left ventricular dysfunction”
Chemical name | P-value | Bonferroni | FDR |
---|---|---|---|
Halofuginone | 1.11E−11 | 1.31E−08 | 1.31E−08 |
Nitrofen | 6.15E−11 | 7.22E−08 | 3.61E−08 |
1-Trifluoromethoxyphenyl-3-(1-propionylpiperidine-4-yl) urea | 1.04E−08 | 1.22E−05 | 3.54E−06 |
Streptozocin | 1.21E−08 | 1.42E−05 | 3.54E−06 |
Bleomycin | 2.03E−08 | 2.39E−05 | 4.07E−06 |
Fenofibrate | 2.08E−08 | 2.44E−05 | 4.07E−06 |
Phenylephrine | 3.33E−08 | 3.91E−05 | 5.59E−06 |
Palm Oil | 6.59E−08 | 7.74E−05 | 9.48E−06 |
Doxorubicin | 9.25E−08 | 0.000109 | 9.48E−06 |
Dietary fats | 9.65E−08 | 0.000113 | 9.48E−06 |
Arranged by ascending P-value.
Abbreviations: HPO: Human Phenotype Ontology; FDR: false discovery rate.
From these results we highlight the following potential relationships. Halofuginone has been found to elicit a shielding effect against stress on the heart.11 1-Trifluoromethoxyphenyl-3-(1-propionylpiperidine-4-yl) urea is a soluble epoxide hydrolase inhibitor, and soluble epoxide hydrolase inhibitors have been suggested as a potential strategy against heart diseases.12 In animal models streptozocin is used to cause diabetes mellitus, which has a link to diastolic heart dysfunction.13 The antibiotic bleomycin has known cardiotoxic effects when used in chemotherapy.14 Fenofibrate given short term has been shown to reduce some of the effects of chronic left ventricular volume overload in rat models.15 Phenylephrine causes hypertrophy when administered to neonatal rat cardiomyocytes.16 Tocotrienol-rich fractions extracted from palm oil had beneficial effects on heart function.17 Doxorubicin is both a drug used for cancer treatment and a cardiotoxic agent involved in causing heart failure.18
Case study III—multiple phenotypes to chemical enrichment
An important feature of our methodology is that it allows for the combination of multiple phenotypes in a single enrichment analysis. This is important for diseases, complex conditions, or syndromes that comprise a variety of phenotypes, which can be stacked together in our analysis. In this case study, we use the example of polycystic ovary syndrome (PCOS). Although the wide range of different phenotypes displayed by women with this endocrine disorder has hindered elucidation of the causes, exposomic and inherited genetic variables are likely to play a part.19
We compiled a list of phenotypes that characterize PCOS,20,21 including “oligomenorrhea,” “enlarged polycystic ovaries,” “amenorrhea,” “hirsutism,” “increased body weight,” and “acne,” and performed an enrichment analysis to test whether it returns chemicals with a known relationship to PCOS. The top 10 results are shown in Table 2.
Table 2. Top 10 chemicals identified for the PCOS phenotype list
Chemical name | P-value | Bonferroni | FDR |
---|---|---|---|
Tetrachlorodibenzodioxin | 2.38E−39 | 1.22E−35 | 1.22E−35 |
Bisphenol A | 2.70E−32 | 1.39E−28 | 6.94E−29 |
Ammonium chloride | 1.21E−25 | 6.23E−22 | 2.08E−22 |
Valproic acid | 1.50E−19 | 7.71E−16 | 1.79E−16 |
Ethylnitrosourea | 1.74E−19 | 8.93E−16 | 1.79E−16 |
Colforsin | 4.17E−19 | 2.15E−15 | 3.58E−16 |
Vehicle emissions | 6.76E−19 | 3.48E−15 | 4.96E−16 |
Diethylhexyl phthalate | 2.55E−18 | 1.31E−14 | 1.64E−15 |
Ethinyl estradiol | 3.52E−18 | 1.81E−14 | 2.01E−15 |
Dexamethasone | 3.20E−17 | 1.65E−13 | 1.65E−14 |
Arranged by ascending P-value.
Abbreviations: FDR: false discovery rate; PCOS: polycystic ovary syndrome.
Of the multiple highly enriched chemicals returned, many have documented links with PCOS. Tetrachlorodibenzodioxin is an endocrine-disrupting chemical and although its influence on PCOS has not been specifically assessed it has been highlighted as a suspect for consideration.22 Individuals with PCOS were found to have increased levels of bisphenol A.23 Valproic acid treatment has a known association with heightened PCOS occurrence in epileptic patients.24 Colforsin (forskolin) can function in a similar way to luteinizing hormone on PCOS theca cells.25 Diethylhexyl phthalate has been shown to have ovarian effects in rats.26 Ethinyl estradiol is used to combat the acne and hirsutism phenotypes of PCOS.27 Dexamethasone has been used to increase testosterone production in theca cells to mimic PCOS patients with hyperandrogenism.28
DISCUSSION
In this work, we have presented a new approach to establish bidirectional relationships between phenotypes and chemical exposures. This methodology has been successfully implemented in phexpo, a multiplatform R package that is freely available (https://github.com/GHLCLab/phexpo).
In contrast to other existing applications our approach allows searches with multiple chemicals or phenotypes simultaneously. This enables users to search (or find) different individual phenotypes rather than diseases that might be too broad or may share phenotypes (or symptoms) that could lead to overlaps.
We have successfully tested phexpo and its different functions using a variety of chemicals and phenotypes and have been able to validate the results using bibliographic references. The methodology can therefore be used to discover novel potential relationships that open new avenues of research and direct additional experimental validations and exploration.
Although we demonstrate the ability of the approach to generate interesting and validated results, we acknowledge that there are limitations. The results are confined to the annotations present and although we selected high-quality and well-known resources we are currently only using 1 chemical database (CTD) and 1 phenotype ontology (HPO). The associations established between chemicals and phenotypes lack “directionality,” in that a chemical may induce or protect from a certain phenotype, and indeed both were found in the case study lists. Finally, although aggregating phenotypes or chemicals is a powerful tool, the use of an additive approach requires the union of all the annotated genes and does not take account of whether in 2 different phenotypes or chemicals those genes might be affected in different directions (eg, induced by 1 chemical and repressed by another). Other potential limitations are that dosage and timing of exposures that might be relevant for the development of some of the phenotypes are not considered.
In conclusion, we have introduced a novel methodology bundled inside an R package called phexpo that links chemical compounds and phenotype terms through enrichment analyses based on their gene annotations. We have described 3 case studies validated through the literature which present phexpo’s functionalities and its capabilities to identify phenotypes related to a chemical and vice versa. Phexpo’s bidirectional approach to study the potential relationships between chemical compounds and human phenotypes provides insights for human health and exposome research. This tool will be a valuable asset to further exposome research by revealing potential novel phenotype–chemical associations.
FUNDING
CH has been supported by a Northern Ireland Department for the Economy (DfE) postgraduate studentship award.
AUTHOR CONTRIBUTIONS
CH constructed the package, contributed to the idea, contributed to the design of the experiments, ran the analyses, validated the results, and wrote the manuscript. GL-C created the idea, designed the experiments, validated the results, and wrote the manuscript. DAS contributed to the idea and critical manuscript revision. BD contributed to the idea and critical manuscript revision.
SUPPLEMENTARY MATERIAL
Supplementary material is available at Journal of the American Medical Informatics Association online.
ACKNOWLEDGEMENTS
We thank Dr Jaine Blayney for reading a draft version of the manuscript and the useful comments and feedback returned.
CONFLICT OF INTEREST STATEMENT
None declared.
REFERENCES
The Human Phenotype Ontology. Polycystic Ovary Syndrome 1 OMIM: 184700.
NHS. Polycystic Ovary Syndrome.