Abstract

Consortium-based research is crucial for producing reliable, high-quality findings, but existing tools for consortium studies have important drawbacks with respect to data protection, ease of deployment, and analytical rigor. To address these concerns, we developed COnsortium of METabolomics Studies (COMETS) Analytics to support and streamline consortium-based analyses of metabolomics and other -omics data. The application requires no specialized expertise and can be run locally to guarantee data protection or through a Web-based server for convenience and speed. Unlike other Web-based tools, COMETS Analytics enables standardized analyses to be run across all cohorts, using an algorithmic, reproducible approach to diagnose, document, and fix model issues. This eliminates the time-consuming and potentially error-prone step of manually customizing models by cohort, helping to accelerate consortium-based projects and enhancing analytical reproducibility. We demonstrated that the application scales well by performing 2 data analyses in 45 cohort studies that together comprised measurements of 4,647 metabolites in up to 134,742 participants. COMETS Analytics performed well in this test, as judged by the minimal errors that analysts had in preparing data inputs and the successful execution of all models attempted. As metabolomics gathers momentum among biomedical and epidemiologic researchers, COMETS Analytics may be a useful tool for facilitating large-scale consortium-based research.

Abbreviations: app, application; BMI, body mass index; COMETS, Consortium of Metabolomics Studies; HMDB, Human Metabolome Database.

Editor’s note: An invited commentary on this article appears on page 159.

In recent years, new high-throughput technologies and the advent of “-omics” methods have made it possible to measure thousands to millions of features in human biospecimens. Such technologies allow human biology to be interrogated in unprecedented detail, but also pose challenges to the conduct of research. In particular, because -omics studies involve thousands of statistical tests, they require stringent correction for multiple testing and large sample sizes to reliably detect true-positive associations (1–3). To obtain requisite sample sizes, genomics researchers have formed large-scale collaborative consortia, but in other -omics research areas, such collaborations are still evolving.

Increasingly, epidemiologic and population studies are using metabolomics to uncover new metabolic aspects of a variety of phenotypes, including but not limited to diabetes (4–7), cardiovascular disease (8–10), cancer (11–14), obesity (15–17), and diet and nutrition (18–28). Guided by recent experiences in genomics and other “-omics” fields (3, 29), metabolomics researchers have recognized that large sample sizes, diverse populations, and replication studies are needed to minimize false-positive findings (1, 2, 30) and confounding (3, 31). Accordingly, many have joined the COnsortium of METabolomics Studies (COMETS) (32), an international consortium of 67 epidemiologic studies (as of this writing) with blood-based metabolomics data and follow-up for disease outcomes.

Conducting data analyses in large consortia, like COMETS, is complex. Traditionally, consortia have pooled data centrally, an approach that results in straightforward data analyses but may not be viable for institutes that forbid data transfers or find them challenging under the European Union’s General Data Protection Regulation. An alternative is federated data analysis, in which institutes analyze their own data and send results centrally for meta-analysis. While appealing, existing methods for federated analyses have important drawbacks when applied to emerging and fast-growing consortia. One method, for example, involves sending software code for statistical models to each institute, which places a heavy burden on institute analysts, who must harmonize data and fit models. Another method involves using a software platform that enables analysts to directly send models to each institute’s data servers (33, 34). This facilitates harmonization, enhances quality control, and minimizes analyst burden, but also requires data-use agreements and specially prepared servers that take time, money, and expertise to put in place.

For emerging and growing consortia, there is a need for software that can be deployed rapidly and at low cost to facilitate the conduct of rigorous federated data analyses. Toward this end, we created COMETS Analytics, a new free online tool designed for federated analyses. Since each institute conducts its own analysis, this system works around issues of data-use permissions and consent that may otherwise hinder consortium projects. Although this application (app) is designed explicitly for the analysis of metabolomics data, it can also be used to conduct other consortium-based analyses, since its principles and workflows are relevant for non-omics and other -omics data. In this manuscript, we describe the design and development of COMETS Analytics and share our experiences in a test application of this tool to 2 ongoing research projects involving 45 prospective cohort studies participating in COMETS.

DESIGN AND DEVELOPMENT OF COMETS

Early in the development of COMETS, several cohort representatives explained that they were unable to participate in centralized data analyses due to the terms of their institutional review board approval and/or the European Union General Data Protection Regulation. To be as inclusive as possible, it was determined that data analyses should be conducted on a disseminated basis, in which each cohort analyzes its own data and sends results centrally for meta-analysis. Such disseminated analyses also can proceed more quickly than centralized analyses, permit analysis of the most recently ascertained disease outcomes, and allow cohort study investigators to maintain control over their data. Cohort representatives noted, however, that they would need tools to help their analysts conduct such analyses. Consequently, it was decided to create a data analysis platform to facilitate the conduct of statistical analyses. Input was sought from a diverse pool of stakeholders during presentations at national and international scientific meetings, as well as from approved project investigators, to identify user needs by region (Europe, North America, South America, and Asia), by discipline (including but not limited to heart disease, cancer, diabetes, and genetics), and by affiliation (academic, industry, and government).

COMETS Analytics comprises 2 main components: 1) a stand-alone R (R Foundation for Statistical Computing, Vienna, Austria) package, R-cometsAnalytics, that encapsulates our core algorithms and functionality and 2) a Web-based app (https://www.comets-analytics.org) developed as a user-friendly interface to the R package using HTML5 (Web Hypertext Application Technology Working Group (https://whatwg.org/)). The GitHub repository (GitHub, Inc., San Francisco, California) for COMETS Analytics (35) is publicly available under a GNU General Public License, version 3 (GPLv3; Free Software Foundation, Boston, Massachusetts (http://www.gnu.org/licenses/gpl-3.0.html)) and contains the harmonization database, the R package source, and a companion tutorial. COMETS Analytics can be run locally using vignette-guided instructions bundled with the stand-alone R package or through the Web-based app, which has superior speed, usability, and exploratory tools and operates on secure cloud-based servers that delete data after analyses. A sample input file is available in the repository and on the Web app. We encourage readers to read the tutorial and test-drive COMETS Analytics using the sample input file to enhance their understanding.

The overarching concept of the Web app for COMETS Analytics is simple: Users prepare data input, click a button to confirm data integrity, click a button to initiate secured data analyses, and receive an e-mail with aggregate results a few minutes later. Assuming results are for a meta-analysis, the user can then forward the e-mail directly to the central analyst. These steps are shown in Table 1 and can include 1 or more concurrent projects. Note that COMETS Analytics simplifies the preparation of data input by clarifying, through a template, exactly how variables should be coded and by outputting detailed error messages when variables are miscoded, as will be discussed further below. Regarding the input of relative metabolite abundances, the user can typically use the values as preprocessed and provided by cores or companies (e.g., normalized, scaled, and missing-value–imputed data).

Table 1

Workflow for Performing a Consortium Meta-Analysis Using COMETS Analytics (Web App or Local R^a Package)

Step: Actions in Step

Standardized data and models
  • Project coordinator^b prepares and sends an Excel sheet with “VarMap” and “Models” tabs to cohort analysts.
  • Cohort analysts add their data to the “Metabolites,” “Subject Metabolites,” and “Subjects” tabs to complete the input file.

Input integrity check and harmonization
  • Cohort analysts open COMETS Analytics, select their input file, and click “check integrity.”
  • If the data set passes the quality control check, cohort analysts continue to the analysis.
  • If the data set fails the quality control check, cohort analysts read the error messages and make appropriate fixes.

Cohort-specific analyses with model validity check
  • Cohort analysts add their e-mail address (online app only).
  • Cohort analysts click “run model.”
  • Cohort analysts forward results and documentation from COMETS Analytics to project coordinator.

Standardized results aggregation
  • Project coordinator oversees harmonization of metabolite names from COMETS Analytics files.
  • Project coordinator oversees meta-analysis.
  • Project coordinator prepares manuscript based on meta-analysis results.

Abbreviations: app, application; COMETS, Consortium of Metabolomics Studies.

^a R Foundation for Statistical Computing, Vienna, Austria.

^b “Project coordinator” refers to an investigator or project lead.
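
The final aggregation step of this workflow can be illustrated with a minimal sketch. It assumes, purely for illustration, that each cohort returns a correlation coefficient and a sample size for a given exposure-metabolite pair, and that the project coordinator pools them with a fixed-effect, inverse-variance meta-analysis on the Fisher z scale. The meta-analytic method is not specified in the text above; the function name, input format, and cohort values below are hypothetical.

```python
import math

def pool_correlations(results):
    """Fixed-effect pooling of correlations via Fisher's z-transform.

    `results` is a list of (r, n) pairs, one per cohort (hypothetical format).
    For partial correlations the variance also depends on the number of
    adjustment covariates; that refinement is omitted in this sketch.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for r, n in results:
        z = math.atanh(r)       # Fisher z-transform of the correlation
        weight = n - 3          # inverse of var(z) = 1 / (n - 3)
        weighted_sum += weight * z
        total_weight += weight
    z_pooled = weighted_sum / total_weight
    se = math.sqrt(1.0 / total_weight)
    lower = z_pooled - 1.96 * se
    upper = z_pooled + 1.96 * se
    # Back-transform the pooled estimate and its 95% CI to the correlation scale
    return math.tanh(z_pooled), (math.tanh(lower), math.tanh(upper))

# Illustrative values only, not results from any COMETS cohort
cohorts = [(0.20, 500), (0.25, 1200), (0.15, 300)]
pooled_r, ci = pool_correlations(cohorts)
```

Because every cohort runs the same model on identically coded variables, the per-cohort estimates are directly comparable, which is what makes a simple pooling step like this feasible.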

As compared with other online metabolomics data analysis programs (e.g., MetaboAnalyst (36) and XCMSOnline (37)), COMETS Analytics permits adjustment for confounders (i.e., multivariable modeling) and includes novel features that support consortium-based research. For example, COMETS Analytics checks variables to ensure analytical integrity, provides descriptive statistics, documents the models, and outputs results in a meta-analysis–ready format. COMETS Analytics does not require software installation, specially prepared servers, or user expertise beyond that of how to prepare the initial data inputs. An important advantage of COMETS Analytics over sending code for specific statistical models to each center is that its “robust analytics” enable the same variable coding and the same models to be used for all participating cohorts (see “Standardized models” section below). This eliminates hand-customized code, thereby reducing coding errors and improving analytical transparency and reproducibility.

Our design for COMETS Analytics focused on 7 key design requirements. To ensure that collaborating partners are able to use the system, we determined that COMETS Analytics should 1) be highly usable and 2) protect data. To ensure that code can be maintained over time, we determined that COMETS Analytics should 3) follow current best practices for research software. Additionally, to ensure that the data analysis process is as seamless as possible, we determined that COMETS Analytics needs clear standards for 4) data input, 5) models, and 6) results. To maximize the utility of COMETS Analytics to researchers at large, we also determined that it should potentially be 7) applicable to other -omics data. We describe these requirements in Table 2 and discuss how COMETS Analytics meets each in turn below.

Table 2

Requirements Underlying the Design of COMETS Analytics

Key Requirement: Specific Requirement

High usability and accessibility
  • Requires no specialized software or expertise to use
  • Includes interactive components so that users can explore data
  • Runs tens of thousands of statistical models efficiently
  • Reuses standardized model input for each analytical cycle
  • Includes a step-by-step tutorial (35)

Data protection
  • Prioritizes and ensures data protection
  • Includes a stand-alone package for institutes whose data policies prevent them from utilizing Web-based data analysis apps

Use of best practices for coding and algorithm development
  • Developed using the R^a statistical language, which is extensively used in high-throughput data analyses and has a very active user base and community
  • Documents the development process and makes software code, tutorial, and documentation publicly available

Standardized data input
  • Includes a clear scheme for data formatting and model coding
  • Performs basic checks to ensure data and model integrity
  • Contains metadata needed to harmonize metabolite names

Standardized models
  • Runs models and modifies them as needed to avoid showstopper errors
  • Documents any model issues and/or modifications

Standardized and well-documented results
  • Outputs reproducible results in standard format ready for meta-analysis
  • Documents the exact models executed
  • Displays distribution of covariates for each data set

Applicability to other -omics data
  • Uses a data structure that is generalizable to other -omics data
  • Data inputs can be modified by simply listing identifiers and names for the analytes of interest in the “Metabolites” table and their values in the “Subject Metabolites” table and running analyses

Abbreviations: app, application; COMETS, Consortium of Metabolomics Studies.

^a R Foundation for Statistical Computing, Vienna, Austria.

High usability and accessibility

Consortium analyses include many institutes and centers with varying levels of expertise in data analysis. To accommodate this broad audience of users, we emphasized high usability and accessibility in our system development. Toward this end, COMETS Analytics uses data inputs that resemble what analysts of prospective cohort studies use in their day-to-day work: data files with participants as rows, variables as columns, and plain-language variable names (e.g., BMI for body mass index). In addition, we use Microsoft Excel (Microsoft Corporation, Redmond, Washington) as the standard format for the common data input, since most analysts are familiar with it and it has nearly universal accessibility (an estimated 1.2 billion licenses worldwide (38)). For the Web app, the user interface has interactive components that do not require special expertise to use and can provide real-time results, thus allowing the integrity of data inputs to be tested and confirmed. The system executes models quickly to accommodate the hundreds to thousands of metabolites and the many models that make up a metabolomics analysis. Data analyses are hosted on Amazon Web Services (Amazon Web Services, Inc., Seattle, Washington) so that additional servers can run computationally intensive analyses in parallel, as needed. At present, using our sample input file, it takes fewer than 30 seconds to analyze the association of an exposure (e.g., BMI, age) with levels of 611 metabolites in 1,000 participants, adjusted for various factors.

To run COMETS Analytics locally, users can install the stand-alone R package using the devtools R package. The package includes extensive documentation for each function and a vignette that explains all analysis steps, from input to output. Regardless of which method is used, the Web-based app and the stand-alone R package are synchronized automatically through GitHub, ensuring that they have identical functionality and yield identical results.

Data protection

We designed COMETS Analytics to run as either a stand-alone R package or a Web-based app so that we could accommodate the different data protection concerns of participating cohorts. With the stand-alone R package, all aspects of the analysis are run locally, and analysts transfer only the summary results for central meta-analyses. With the Web app, the following measures are taken to secure data in transit, during analyses, and after analyses (Figure 1): 1) input data upload: when a user uploads an input file to the server, the data are encrypted using SSL (SSL Corporation, Houston, Texas) to protect the contents of data in transit; 2) input data retention: the input data are deleted immediately after completion of the data integrity check and analyses, which usually take only a few seconds; 3) summary results retention: the summary results are stored in a secured private S3 bucket where data are protected at rest by Amazon S3 server-side encryption with the 256-bit Advanced Encryption Standard (AES-256). All results are deleted automatically after 7 days by a configured S3 bucket life-cycle policy.
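
As a rough illustration of the retention measures described above, the sketch below shows how default AES-256 encryption and a 7-day expiration rule could be expressed for an S3 bucket. It is an assumption-laden example, not the project's actual configuration: the bucket name and rule ID are placeholders, and the helper expects a boto3 S3 client to be supplied by the caller.

```python
# Hypothetical configuration sketch; the bucket name and rule ID are
# placeholders, not the project's actual settings.
RESULTS_BUCKET = "example-comets-results"  # hypothetical name

# Default server-side encryption with AES-256 (SSE-S3)
encryption_config = {
    "Rules": [
        {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
    ]
}

# Life-cycle rule that expires every object 7 days after creation
lifecycle_config = {
    "Rules": [
        {
            "ID": "expire-results-after-7-days",
            "Filter": {"Prefix": ""},  # empty prefix applies to all objects
            "Status": "Enabled",
            "Expiration": {"Days": 7},
        }
    ]
}

def apply_bucket_policies(s3_client, bucket=RESULTS_BUCKET):
    """Apply both settings using a boto3 S3 client supplied by the caller."""
    s3_client.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration=encryption_config,
    )
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=lifecycle_config,
    )
```

Expressing deletion as a bucket-level rule rather than application code means results expire even if the analysis service itself is offline.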

Figure 1. Security of the data flow when running COMETS Analytics through the Web application (https://www.comets-analytics.org). AES, Advanced Encryption Standard; AWS, Amazon Web Services; COMETS, Consortium of Metabolomics Studies; EC2, Elastic Compute Cloud; SSL, secure sockets layer; VPC, virtual private cloud.

Use of best practices for data stewardship and research software development

We chose R as the base language for COMETS Analytics because it is widely used, free, and open-source and has an impressive collection of well-documented analytical packages and algorithms. We used GitHub, a freely available Git-based platform for version control, to allow multiple developers to code simultaneously and to support reporting and tracking of bugs and fixes throughout package development. We used Travis CI (https://travis-ci.org/) and AppVeyor (https://www.appveyor.com/) to test compatibility of the software across operating systems (e.g., Windows, Unix/Linux, macOS). We tested overall functionality at 4 different levels (unit testing, integration testing, system testing, and user acceptance testing) using the R testthat package. The unit tests evaluate analytical stability by comparing results of the 27 models in our sample input file with previously obtained benchmark results. A selection of benchmark results was also confirmed to be identical when running models in SAS (SAS Institute, Inc., Cary, North Carolina) and Stata (StataCorp LLC, College Station, Texas). The infrastructure supports parallel development of analytical modules that invoke common core functions for data and model integrity.
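
The benchmark comparison behind these unit tests can be sketched as follows. The package itself uses the R testthat package; the Python function below is only an analogue of the idea, and its name, input format, and tolerance are assumptions made for illustration.

```python
import math

def find_mismatches(results, benchmark, rel_tol=1e-8):
    """Return the keys whose estimates differ from the stored benchmark.

    Both arguments map (model, metabolite) keys to numeric estimates
    (hypothetical format). Keys present on only one side are reported too.
    """
    mismatches = []
    for key in sorted(set(results) | set(benchmark)):
        if key not in results or key not in benchmark:
            mismatches.append(key)
        elif not math.isclose(results[key], benchmark[key],
                              rel_tol=rel_tol, abs_tol=1e-12):
            mismatches.append(key)
    return mismatches

# Illustrative values only
benchmark = {("model_1", "met_a"): 0.120, ("model_1", "met_b"): -0.045}
current = {("model_1", "met_a"): 0.120, ("model_1", "met_b"): -0.050}
changed = find_mismatches(current, benchmark)
```

A test of this kind fails whenever a code change alters a previously validated estimate, which is how stability across releases can be monitored.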

COMETS Analytics is designed to be open and transparent. The software code for COMETS Analytics is publicly posted in our GitHub repository (35). The metabolites, their metadata, and information on how they were harmonized are available through a link on the COMETS Analytics website and through GitHub, and results are output in a standard format that can be reused in new analyses. In addition, specific features of COMETS Analytics follow best practices in software development, including 1) defined keywords, meta-information on software, and registry registration to enhance the findability of the software; 2) detailed documentation, versioning, and licensing, and the ability to download different versions of the software through GitHub or access the latest version of the software through a National Cancer Institute–supported server; 3) build and deployment testing through continuous integration and deployment (Travis CI and AppVeyor); and 4) implementation of unit testing to ensure the reproducibility of results using test data sets.

Standardized data input

Smooth execution of data analyses and meta-analysis requires standardized inputs, models, and results. The required input for COMETS Analytics is an Excel file that contains 5 sheets (“Metabolites,” “Subject Metabolites,” “Subjects,” “VarMap,” and “Models”). The use of separate files for metabolite metadata, subject-specific metabolite data, and subject-specific covariate data is, in our experience, the most efficient and flexible way to manage metabolomics data and should be considered a best practice. A sample input file is available at COMETS-Analytics.org and in the R package (“inst/extdata” folder). We present an overview of each sheet in Figure 2 and describe them in detail below.

Figure 2. High-level overview of the process for developing standardized data inputs, models, and results in COMETS Analytics. COMETS, Consortium of Metabolomics Studies.

The “Metabolites” sheet captures metabolite metadata. Each row is a metabolite, and the columns are the metadata. Only 2 columns are required: one for the metabolite identifier (typically an R- or SAS-compliant name) and one for the biochemical name. Other metabolite identifiers are optional but desirable, as they enhance metabolite harmonization. These identifiers, provided by the laboratories, may include Human Metabolome Database (HMDB), PubChem, Metabolon ID, or InChIKey. Metabolomics lacks universal identifiers comparable to the reference SNP (rs) accession numbers used in genomics (39), so this sheet provides a mechanism for mapping metabolites across multiple laboratory platforms that utilize different nomenclatures. We stress that these data are vital for comparing findings across studies, that it is imperative for laboratories generating metabolomics data to provide them routinely, and that researchers should preserve these identifiers.
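
To make the role of these optional identifiers concrete, the sketch below groups cohort-specific metabolite names by a shared public identifier. It is a simplified illustration rather than the COMETS harmonization procedure: the field names (`metabolite_id`, `hmdb_id`), the cohort names, and the identifier values are all hypothetical placeholders.

```python
from collections import defaultdict

def harmonize(cohort_metabolites, id_field="hmdb_id"):
    """Group cohort-specific metabolite names by a shared public identifier.

    `cohort_metabolites` maps a cohort name to a list of metadata rows
    (dicts). Rows without the identifier cannot be matched automatically
    and are returned separately for manual review.
    """
    matched = defaultdict(dict)
    unmatched = []
    for cohort, rows in cohort_metabolites.items():
        for row in rows:
            public_id = row.get(id_field)
            if public_id:
                matched[public_id][cohort] = row["metabolite_id"]
            else:
                unmatched.append((cohort, row["metabolite_id"]))
    return dict(matched), unmatched

# Placeholder identifiers, not real HMDB accessions
cohorts = {
    "cohort_a": [{"metabolite_id": "glucose", "hmdb_id": "ID_001"},
                 {"metabolite_id": "x_1234", "hmdb_id": None}],
    "cohort_b": [{"metabolite_id": "d_glucose", "hmdb_id": "ID_001"}],
}
matched, unmatched = harmonize(cohorts)
```

In this toy example the two cohorts name the same compound differently, yet they are linked through the shared identifier, while the metabolite lacking an identifier is set aside for manual curation.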

The “Subject Metabolites” sheet contains metabolite levels, as input by the participating institute. Each row is a study participant, and columns are the metabolite levels. Since data acquisition varies by platform, we rely on the expertise of each center to achieve optimal data preprocessing (normalization, imputation, and transformation), though COMETS is developing reference samples that may facilitate preprocessing in future studies. At present, no missing data are permitted for metabolite levels. In our test application, each group imputed missing data using their own standard procedures and reported procedures to the lead investigators.

The “Subjects” sheet includes covariate data. Each row is a study participant, and the columns are the covariates (age, sex, BMI, smoking status, etc.). All cells in this sheet must contain a value, but this could include a code for missing values as specified in the “VarMap” sheet.

The “VarMap” sheet is the data dictionary that cohort study investigators use to code their variables. Each row is a variable, and the columns contain details about how that variable should be coded. The principal investigator leading the meta-analysis establishes the desired coding (e.g., for sex, 0 = male and 1 = female) and then disseminates the sheet to participating institutes and centers so they can code their data (further details are provided in the “VarMap” section (currently section 2.1.4) of the tutorial (35)). The sheet is to be used without modification by participating centers.

The “Models” sheet specifies the models for the analysis. Each row is a model, and the columns provide detailed information on that model, including the exposures, adjustments, any stratifications, and model type (further details are provided in the “Models” section (currently section 2.1.5) of the tutorial (35)). Like the “VarMap” sheet, this sheet is prepared by the principal investigator leading the meta-analysis and is to be used without modification by participating centers. These templates for coding variables and writing models ensure that instructions for variable coding and statistical models are complete and unambiguous. Moreover, they are both human- and machine-readable, which should minimize errors in translation, and simple enough that principal investigators without formal training in programming can complete them. Since the templates are easy to visually scan (as opposed to code or written text), they may also make it easier to catch gaps in the coding.

The last step in preparing the standardized data input is to check the integrity of the data and model(s) (see “Correlate” tab, “Integrity Check” button)—that is, whether variables appropriately match between sheets. For example, COMETS Analytics will automatically return an error if metabolites in the “Subject Metabolites” sheet lack a match in “Metabolites” or participant identifiers in “Subject Metabolites” lack a match in “Subjects,” or if cells have missing data. The error message itself describes the needed fix.
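The cross-sheet matching described above can be illustrated with a minimal sketch. It is not the COMETS Analytics implementation: the sheets are represented as lists of dictionaries, the participant identifier column `SAMPLE_ID` and the function name `check_integrity` are hypothetical, and only the 3 conditions named in the text are checked.

```python
def check_integrity(metabolites, subject_metabolites, subjects, id_col="SAMPLE_ID"):
    """Return a list of problems; an empty list means the input passed.

    metabolites: one dict per metabolite, each with a "metabid" key.
    subject_metabolites, subjects: one dict per participant (id_col is hypothetical).
    """
    problems = []
    known_metabolites = {row["metabid"] for row in metabolites}
    known_subjects = {row[id_col] for row in subjects}

    # Every metabolite column must be described in the "Metabolites" sheet.
    measured = {col for row in subject_metabolites for col in row if col != id_col}
    for col in sorted(measured - known_metabolites):
        problems.append(f"metabolite '{col}' has no match in 'Metabolites'")

    # Every participant with metabolite data must have a covariate record.
    for row in subject_metabolites:
        if row[id_col] not in known_subjects:
            problems.append(f"participant '{row[id_col]}' has no match in 'Subjects'")

    # Empty cells are not allowed in either participant-level sheet.
    sheets = (("Subject Metabolites", subject_metabolites), ("Subjects", subjects))
    for sheet_name, sheet in sheets:
        for row in sheet:
            for col, value in row.items():
                if value is None or value == "":
                    problems.append(
                        f"empty cell in '{sheet_name}': "
                        f"participant '{row[id_col]}', column '{col}'"
                    )
    return problems


# Toy input with 3 deliberate problems (all values are made up).
metabolites = [
    {"metabid": "glucose", "metabolite_name": "glucose"},
    {"metabid": "urate", "metabolite_name": "urate"},
]
subject_metabolites = [
    {"SAMPLE_ID": "P1", "glucose": 5.1, "urate": 0.3},
    {"SAMPLE_ID": "P2", "glucose": 4.8, "lactate": None},
]
subjects = [{"SAMPLE_ID": "P1", "age": 54, "sex": 1}]

problems = check_integrity(metabolites, subject_metabolites, subjects)
for message in problems:
    print(message)
```

Returning the full list of problems, rather than stopping at the first one, mirrors the idea that the message itself should tell the analyst what to fix.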

Standardized models

In its current version, COMETS Analytics supports unadjusted and partial correlation analyses based on the ppcor R package (40), with generalized models that allow for logistic regression and proportional hazards regression in development. The layout of the correlation analysis is shown in Figure 3, with separate panels for input (A) and output (B). On the input side, users can choose from 3 different methods of analysis. The “custom” model allows users to select exposures, outcomes, covariates, and stratifications directly through the user interface. The “prespecified” model allows users to select from a prepopulated list of models from the input file’s models sheet. The “all models” option prompts users to enter an e-mail address, and then runs all analyses in the “Models” sheet and e-mails users a link to the results.
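For readers less familiar with partial correlation, the following sketch shows the calculation for the simplest case of a single adjustment covariate, using Spearman (rank-based) coefficients. It is an illustration under stated assumptions rather than a reimplementation of ppcor: the helper names are hypothetical, the data are made up, and models with several covariates require the matrix-based approach used by dedicated packages.

```python
import math


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_spearman(x, y, z):
    """Spearman correlation of x and y adjusted for one covariate z."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up example: age as exposure, one metabolite as outcome, BMI as covariate.
age = [41, 48, 55, 62, 70]
metabolite = [2.0, 1.0, 4.0, 3.0, 5.0]
bmi = [21.0, 27.5, 24.0, 33.0, 30.0]

unadjusted = spearman(age, metabolite)
adjusted = partial_spearman(age, metabolite, bmi)
print(round(unadjusted, 3), round(adjusted, 3))
```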

Figure 3

Running analyses with the COMETS Analytics Web application (https://www.comets-analytics.org) using the “custom” models option. Data analysis in COMETS Analytics starts in the data analysis panel (A). The user specifies his/her cohort from a list, selects the file, and presses the “check integrity” button. If the data set passes the integrity check, the user can select one of the 3 modes of analysis. When the user selects “custom model,” he/she can then select exposures (e.g., age) and outcomes (e.g., “all metabolites”) from the data input file, as well as model covariates to adjust for or stratify by. When the user clicks “run model,” results will appear in the results panel (B). This panel prints results for each outcome × exposure permutation, and users can sort results and download them as a comma-separated values (CSV) file. If the user selects “all models” in the data analysis panel, the analyses will instead run in the background, and the user will be sent an e-mail with a link to results once analyses are complete. COMETS, Consortium of Metabolomics Studies.

A key asset of COMETS Analytics is that it deploys a system of “robust analytics” for diagnosing and handling showstopper errors on the fly, particularly the issue of singular matrices. This common issue arises when 1 or more covariates are a perfect linear combination of the others, which causes models to fail to converge. Typically, researchers address this by recoding data inputs, such as by combining categorical variables. However, this process is time-consuming, error-prone, and often poorly documented. The “robust analytics” system, in contrast, diagnoses the singularity of each model in advance, identifies the covariate most implicated, drops it (a mathematically valid solution, since the most implicated covariates are implicitly incorporated when models are singular), and repeats the process until models converge. This eliminates hand-coded data inputs, thereby reducing coding errors and improving analytical transparency and reproducibility.

COMETS Analytics performs 4 specific checks to ensure that models are executable. The first check eliminates analyses with too few participants (the minimum is 25 participants in the present iteration of COMETS Analytics). The second check determines whether any covariates have 0 or near-0 variance (nearZeroVar function in the caret R package (41)) and, if so, removes them from the models. The third check identifies and removes covariates that are highly collinear in multivariable models (based on eigenvalues; trim.matrix function in the caret R package (42)). The fourth check detects linear dependencies among the remaining covariates and removes those that are linearly dependent (findLinearCombos function in the caret R package). Each check is done for the whole data set and for each stratum in stratified analyses, and any modifications are noted on-screen or in the results e-mail.
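The logic of these checks can be sketched in a few lines. The example below is a simplified stand-in, not the COMETS Analytics code: it tests for exactly 0 variance rather than near-0 variance, omits the eigenvalue-based collinearity step, and handles linear dependencies greedily (a covariate is dropped if it is a linear combination of an intercept and the covariates already kept), which differs from identifying the covariate most implicated. Function names and data are hypothetical.

```python
MIN_PARTICIPANTS = 25  # minimum sample size quoted in the text


def variance(values):
    """Sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def matrix_rank(columns, tol=1e-9):
    """Rank of the matrix with the given columns (Gaussian elimination)."""
    rows = [list(r) for r in zip(*columns)]
    rank = 0
    for c in range(len(columns)):
        pivot = max(range(rank, len(rows)), key=lambda r: abs(rows[r][c]), default=None)
        if pivot is None or abs(rows[pivot][c]) < tol:
            continue  # no usable pivot: this column adds no new information
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][c] / rows[rank][c]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def screen_covariates(covariates):
    """Return (kept, dropped); covariates maps name -> list of numeric values."""
    n = len(next(iter(covariates.values())))
    if n < MIN_PARTICIPANTS:
        raise ValueError(f"only {n} participants; at least {MIN_PARTICIPANTS} required")

    kept, dropped = [], []
    intercept = [1.0] * n
    for name, values in covariates.items():
        if variance(values) == 0:
            dropped.append((name, "zero variance"))
            continue
        candidate = [intercept] + [covariates[k] for k in kept] + [values]
        if matrix_rank(candidate) < len(candidate):
            dropped.append((name, "linear combination of other covariates"))
        else:
            kept.append(name)
    return kept, dropped


# Made-up cohort of 30 women: sex is constant, age_months duplicates age.
age = [40 + i for i in range(30)]
covariates = {
    "age": age,
    "sex": [1] * 30,
    "bmi": [20 + (i * 7 % 11) for i in range(30)],
    "age_months": [12 * a for a in age],
}
kept, dropped = screen_covariates(covariates)
print("kept:", kept)
print("dropped:", dropped)
```

Recording the reason for each dropped covariate alongside its name is what allows modifications to be reported back to the analyst instead of being made silently.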

Standardized and well-documented results

In the “custom” and “prespecified” model modes, results are output in the right-hand panel of the screen, and they include Spearman correlation coefficients, P values, and labels that identify the model (outcome, exposure, and adjustment variables). Users can sort results by clicking on column headers, visualize them in the “heat map” tab (further details are presented in the tutorial), or download them by clicking “download.”

Using the Web-based “all models” mode, users are sent an e-mail with a link to several files, including 1) results, 2) descriptive statistics, 3) metabolite metadata, and 4) input metadata. “Results” files are like those produced in the modes above, with standardized names that make it possible to automate meta-analyses across cohorts. The “descriptive statistics” file provides mean values and percentiles (continuous variables) and frequencies (categorical variables) for participant covariates and metabolite levels, and the “metabolite metadata” file includes metabolite details. Finally, the “input metadata” file includes the originally submitted “VarMap” and “Models” sheets, which document the exact coding and models used in the analyses. Locally running all models through the COMETS Analytics R package produces the same results in zipped format and can also be sent centrally for meta-analyses.
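To illustrate why standardized names matter, the sketch below collates results from several cohorts by a shared key and pools them. The article does not specify a pooling method; the fixed-effect Fisher z approach shown here is a common choice for correlation coefficients and is an assumption of this example, as are the field names and values. The weight n − 3 is a simplification that ignores the number of adjustment covariates and the rank-based nature of Spearman coefficients.

```python
import math
from collections import defaultdict


def pool_correlations(cohort_results):
    """Fixed-effect pooling of correlations that share a standardized key.

    cohort_results: dicts with keys "model", "exposure", "outcome", "corr", "n"
    (hypothetical field names).
    """
    grouped = defaultdict(list)
    for row in cohort_results:
        grouped[(row["model"], row["exposure"], row["outcome"])].append(row)

    pooled = {}
    for key, rows in grouped.items():
        # Fisher z-transform each correlation and weight it by n - 3.
        weights = [r["n"] - 3 for r in rows]
        z_bar = sum(w * math.atanh(r["corr"]) for w, r in zip(weights, rows)) / sum(weights)
        pooled[key] = {
            "corr": math.tanh(z_bar),
            "n_cohorts": len(rows),
            "n_total": sum(r["n"] for r in rows),
        }
    return pooled


# Made-up results from 2 hypothetical cohorts.
results = [
    {"cohort": "A", "model": "1 Age", "exposure": "age", "outcome": "glucose", "corr": 0.5, "n": 103},
    {"cohort": "B", "model": "1 Age", "exposure": "age", "outcome": "glucose", "corr": 0.5, "n": 203},
    {"cohort": "A", "model": "1 Age", "exposure": "age", "outcome": "urate", "corr": -0.2, "n": 103},
]
pooled = pool_correlations(results)
for key, summary in pooled.items():
    print(key, summary)
```

Because every cohort labels models, exposures, and outcomes identically, the grouping step needs no cohort-specific mapping.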

Applicability to other -omics data

Although COMETS Analytics was designed to analyze metabolomics data, the flexible data structure is applicable to other -omics data. For example, transcriptomic or proteomic data can be input as a gene or protein metadata table in the “Metabolites” sheet, a table of subject gene or normalized protein abundances in the “Subject Metabolites” sheet, and a table of subject covariates in the “Subjects” sheet. To analyze these other -omics data with COMETS Analytics, users would store gene or protein identifiers (e.g., IGF1BP_1) and names (e.g., insulin-like growth factor-binding protein 1) in the “metabid” and “metabolite_name” columns of the “Metabolites” table and gene or protein abundances in the “Subject Metabolites” table. No other data modifications are needed, and data analyses would be fully functional, even though the input data are proteomics or transcriptomics rather than metabolomics.
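As a small illustration of this mapping, the sketch below renames the identifier and name columns of a protein metadata table to the “metabid” and “metabolite_name” columns mentioned above. The input column names (`protein_id`, `protein_name`, `platform`) and the helper function are hypothetical; any other metadata columns are carried over unchanged.

```python
def to_metabolites_sheet(features, id_key, name_key):
    """Rename feature metadata columns to those used in the "Metabolites" sheet."""
    sheet = []
    for row in features:
        extra = {k: v for k, v in row.items() if k not in (id_key, name_key)}
        sheet.append({"metabid": row[id_key], "metabolite_name": row[name_key], **extra})
    return sheet


# Hypothetical protein metadata, using the example identifier from the text.
proteins = [
    {
        "protein_id": "IGF1BP_1",
        "protein_name": "insulin-like growth factor-binding protein 1",
        "platform": "hypothetical panel",
    }
]
sheet = to_metabolites_sheet(proteins, id_key="protein_id", name_key="protein_name")
print(sheet[0])
```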

TEST APPLICATION

We used COMETS Analytics to conduct 2 different consortium-based analyses, one focused on age–metabolite associations and the other focused on BMI–metabolite associations (manuscripts in preparation). Age and BMI are among the most important risk factors and potential confounders in epidemiologic analyses. Understanding how each factor relates to metabolism will be crucial for interpreting future metabolite–disease associations.

These analyses together included 68 data sets from 45 cohorts with 134,742 research participants, with metabolomics data generated on an aggregate 14 different metabolomics platforms. Over the course of these 2 analyses, we evaluated 27 different models (12 for the age analysis and 15 for the BMI analysis, as shown in the sample input file) adjusted for confounders and stratified by factors of interest (sex, prior heart disease, etc.). These rich, large-scale analyses provided a thorough test of the app’s robustness when applied to real-world data and under conditions of high processing loads.

We evaluated COMETS Analytics’ performance according to 2 main criteria: 1) whether analysts could prepare data inputs on their own and 2) whether input files were correctly processed. With respect to the first criterion, we found that analysts had only minor issues preparing data inputs. The most common problems were that analysts did not recode missing covariate data according to instructions or they did not impute missing metabolite data. We made trained staff available by phone to help analysts with these issues; these calls typically took 10–15 minutes. With assistance, all analysts were able to prepare data inputs correctly. Nearly all analysts used the Web app; the one analyst who used the R stand-alone application reported no difficulties. The stand-alone app will probably receive increased real-world use in the future, given the 2018 implementation of the European Union General Data Protection Regulation.

With respect to the second criterion, we found that all protocol-specified models executed on the first pass. In our experience, this does not often occur in consortium-based analyses. More typically, protocol-specified models will fail due to irrelevant covariate adjustments (e.g., adjusting for sex in a cohort of women only), and analysts will drop covariates in a trial-and-error fashion until models execute. COMETS Analytics, in contrast, algorithmically diagnoses, documents, and fixes model issues, without requiring code or data inputs to be modified. We observed no errors in the application of these algorithms over approximately 1,600 models (60 data sets × 27 models). Our “robust analytics” system thus presents a promising way to streamline analyses and alleviate analyst burden.

Our documentation also underscores why automated approaches are needed. In aggregate, we found that more than 3,000 model fixes were needed to ensure that all models executed—an average of 80 fixes per cohort and 3 fixes per model per cohort. Approximately 90% of fixes involved removing covariates with negligible variance or only 1 value—for example, removing sex from the models for studies of women. To our knowledge, no other published articles have examined the frequency of model failures in large research consortia.

One key product of our test application was the development of a metabolite dictionary that links metabolite names across participating cohorts (available at https://www.comets-analytics.org). Metabolites in this dictionary were matched across studies using unique identifiers, such as HMDB or InChIKey identifiers, that cohorts provided with their metadata, or by metabolite name when other metadata were lacking. We did not collect information on the level of confidence of metabolite identities as defined by the Metabolomics Standards Initiative (43), since metabolomics laboratories historically have not provided these data to epidemiologic researchers. That said, many of the platforms are targeted and confirm metabolite identities against known standards (e.g., Biocrates (Biocrates Life Sciences AG, Innsbruck, Austria)). The most frequently used laboratory, Metabolon, Inc. (Morrisville, North Carolina), has documented that its metabolites are usually identified at the tier 1 level of confidence (44). Since some represented laboratories have not confirmed metabolite identities, we treat matches as provisional rather than definite. Researchers conducting consortium-based analyses using our metabolite-matching scheme should carefully evaluate the heterogeneity of associations by metabolomics platform in case of mismatch.

Another key function of this metabolite dictionary is documenting the availability of metabolites across cohorts—crucial information for designing consortium studies. Investigators interested in evaluating metabolite biomarkers of coffee, citrus fruit, and fish intake, for example, can query the dictionary to determine the aggregate number of COMETS participants with these biomarkers and the number contributed by each cohort. This information should help guide investigators as to which cohorts are best suited for their project.
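The matching and availability queries described above can be sketched as follows. This is an illustrative simplification rather than the procedure used to build the COMETS dictionary: field names, cohort names, sample sizes, and identifier values are placeholders, and each record is keyed by the first identifier it provides (HMDB, then InChIKey, then name).

```python
from collections import defaultdict

ID_PRIORITY = ("hmdb_id", "inchikey", "metabolite_name")  # hypothetical field names


def harmonization_key(record):
    """First available identifier, in priority order, as (kind, normalized value)."""
    for field in ID_PRIORITY:
        value = (record.get(field) or "").strip()
        if value:
            return field, value.lower()
    raise ValueError(f"no usable identifier in {record!r}")


def build_dictionary(cohort_metadata):
    """Map each key to the cohorts that measured it and their sample sizes.

    cohort_metadata: cohort name -> (n_participants, list of metabolite records)
    """
    dictionary = defaultdict(dict)
    for cohort, (n_participants, records) in cohort_metadata.items():
        for record in records:
            dictionary[harmonization_key(record)][cohort] = n_participants
    return dictionary


def availability(dictionary, key):
    """Total participants and per-cohort counts for one metabolite key."""
    cohorts = dictionary.get(key, {})
    return sum(cohorts.values()), dict(cohorts)


# Placeholder cohorts and identifiers (not real HMDB accessions).
cohort_metadata = {
    "Cohort A": (1200, [{"hmdb_id": "HMDB-PLACEHOLDER-1", "metabolite_name": "Caffeine"}]),
    "Cohort B": (800, [
        {"hmdb_id": "hmdb-placeholder-1", "metabolite_name": "caffeine (plasma)"},
        {"metabolite_name": "Trigonelline"},
    ]),
}
dictionary = build_dictionary(cohort_metadata)
total, by_cohort = availability(dictionary, ("hmdb_id", "hmdb-placeholder-1"))
print(total, by_cohort)
```

One limitation of this simple keying is that a metabolite reported with an HMDB identifier in one cohort and only a name in another would not be linked; resolving such cases requires cross-referencing identifiers, which is one reason matches are best treated as provisional.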

At present, the dictionary includes 4,647 metabolites measured in at least 1 COMETS cohort. Because not all metabolites are measured in each study, the estimated sample size in COMETS can vary substantially from metabolite to metabolite. To describe this variation in sample sizes, we divided metabolites into 4 groups based on their frequency (Table 3). For the metabolites in the categories of “most frequent” (the 58 metabolites measured in 40 or more data sets), “frequent” (696 metabolites measured in 15–39 data sets), “less frequent” (1,180 metabolites measured in 6–14 data sets), and “least frequent” (2,713 metabolites measured in 5 or fewer data sets), the median numbers of participants were 58,884, 17,658, 9,023, and 973, respectively. These results highlight that metabolite coverage in COMETS is both deep (for some metabolites, data are available on more than 58,000 participants) and broad (for several thousand metabolites, there are thousands of participants with data).

Table 3

Numbers of Data Sets and Study Participants Used for the Current Metabolomics Analyses

| Metabolite Group^a | No. of Metabolites in Category | Median No. of Data Sets (IQR) | Median No. of Participants (IQR) |
| --- | --- | --- | --- |
| Most frequent | 58 | 47 (42–51) | 58,884 (48,069–96,115) |
| Frequent | 696 | 22 (18–28) | 17,658 (14,571–25,739) |
| Less frequent | 1,180 | 10 (8–12) | 9,023 (5,049–16,501) |
| Least frequent | 2,713 | 2 (1–2) | 973 (332–2,103) |

Abbreviation: IQR, interquartile range.

a Results are presented separately for 4 groups of metabolites defined by frequency of measurement. The “most frequent” group includes metabolites measured in 40 or more data sets; the “frequent” group includes metabolites measured in 15–39 data sets; the “less frequent” group includes metabolites measured in 6–14 data sets; and the “least frequent” group includes metabolites measured in 5 or fewer data sets.
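The grouping rule in the footnote can be written as a short function. This sketch simply encodes the thresholds stated above and confirms that the 4 category counts in Table 3 add up to the 4,647 metabolites in the dictionary; the function name is hypothetical.

```python
def frequency_group(n_data_sets):
    """Assign a metabolite to a Table 3 group from the number of data sets measuring it."""
    if n_data_sets >= 40:
        return "most frequent"
    if n_data_sets >= 15:
        return "frequent"
    if n_data_sets >= 6:
        return "less frequent"
    return "least frequent"


# Numbers of metabolites per category, as reported in Table 3.
group_sizes = {
    "most frequent": 58,
    "frequent": 696,
    "less frequent": 1180,
    "least frequent": 2713,
}
total_metabolites = sum(group_sizes.values())
print(total_metabolites)
print(frequency_group(47), "|", frequency_group(2))
```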

CONCLUSIONS

COMETS Analytics provides a new framework with which to analyze data and aggregate results for large research consortia. Key assets of COMETS Analytics include protection of data; its “robust analytics,” which make it possible to apply the same models to all cohorts; and use of real-time checks to help ensure high-quality results. As compared with other Web-based software for statistical analysis (36, 37, 45), COMETS Analytics requires no specialized software, servers, or data agreements to use and can therefore be readily deployed in cohorts as they join research consortia. Data inputs and models are easy to implement, and no specialized expertise is needed beyond basic knowledge of Excel. We also note that the software provides the ability to code covariables and define models, such that the use of common data models could be supported for consortium meta-analyses by modifying the “VarMap” and “Models” tabs of the input data file. Because there is no sharing of individual-level data, cohorts that do not allow data-sharing may still participate. In our test application, analysts easily completed their analyses, demonstrating that the software is usable and performs well at scale. This test application also allowed us to build a metabolite dictionary that links metabolite names across 60 different data sets.

COMETS Analytics has some limitations. It takes time to set up the input files, and metabolite identifiers for each study must be linked to those that already exist in our metabolite dictionary, a process that can be slow and has potential for error. In addition, COMETS Analytics requires local analysts to conduct their own data preprocessing. Because firsthand experience with the particularities of each cohort’s data is required to appropriately prepare data inputs, this may be the best recourse at present. However, COMETS will continue to evaluate preprocessing methods, in case methods can ultimately be standardized. After the initial preprocessing is performed, participation in future projects will take less time. A further limitation is that, because individual-level data are not pooled centrally, some types of pathway-based statistical analyses or other multimetabolite analyses that rely on such data may be precluded. At present, COMETS Analytics does not mandate use of a published common data model, as each cohort study maintains and is responsible for its own data. However, by using and reusing standardized data templates, we anticipate that inputs will become increasingly standardized, helping us move toward a common data model. COMETS Analytics is currently capable of supporting the use of such a model.

To date, several hundred national and international users have used COMETS Analytics, and we expect numbers to grow given the increasing use of metabolomics data in epidemiology. With a robust infrastructure in place, we anticipate continued development of COMETS Analytics modules, including pathway analyses. Since the software is agnostic to data type, we may also consider adapting it for use in other -omics fields. We continue to solicit comments from the user community, and we welcome feedback from readers.

ACKNOWLEDGMENTS

Author affiliations: Biostatistics Center and Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, DC, United States (Marinella Temprosa); Metabolic Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States (Steven C. Moore, Kaitlyn M. Mazzilli, Erikka Loftfield, Kathleen McClain); Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, Maryland, United States (Krista A. Zanetti); Information Management Services, Inc., Rockville, Maryland, United States (Nathan Appel, David Ruggieri); Infrastructure and Information Technology Operations Branch, Center for Biomedical Informatics and Information Technology, National Cancer Institute, Bethesda, Maryland, United States (Kai-ling Chen, Brian Park); Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, United States (Rachel S. Kelly, Jessica A. Lasky-Su, Oana A. Zeleznik); Division of Human Nutrition and Health, Wageningen University, Wageningen, the Netherlands (Laura Trijsburg); Biomedical Informatics Department, College of Medicine, Ohio State University, Columbus, Ohio, United States (Ewy A. Mathé); and Division of Preclinical Innovation, National Center for Advancing Translational Sciences, Bethesda, Maryland, United States (Ewy A. Mathé).

M.T. and S.C.M. contributed equally to this study.

This work was supported by the Division of Cancer Control and Population Sciences, National Cancer Institute, and by the Intramural Research Program of the National Institutes of Health (Division of Cancer Epidemiology and Genetics, National Cancer Institute, and National Center for Advancing Translational Sciences).

The main website for COMETS Analytics (https://www.comets-analytics.org) provides links for accessing the software code repository for the R package and provides direct downloads to a sample input file and the harmonization database.

We thank Dr. Mary C. Playdon (University of Utah, Salt Lake City, Utah) for suggestions on an early draft of this article.

Members of the Consortium of Metabolomics Studies who generously shared metabolite names from their cohort studies for development of a metabolite dictionary: Drs. Demetrius Albanes (National Cancer Institute, Bethesda, Maryland), Yoav Ben-Shlomo (University of Bristol, Bristol, United Kingdom), Eric Boerwinkle (University of Texas Health Science Center at Houston, Houston, Texas), Bo L. Chawes (University of Copenhagen, Copenhagen, Denmark), Caroline Dale (University College London, London, United Kingdom), A. Heather Eliasson (Harvard T.H. Chan School of Public Health, Boston, Massachusetts), Christian Geiger (Helmholtz Zentrum München, Munich, Germany), Neil J. Goulding (University of Bristol), Andrea Gsur (Medical University of Vienna, Vienna, Austria), Marc J. Gunter (International Agency for Research on Cancer, Lyon, France), Sei Harada (Keio University, Tokyo, Japan), David M. Herrington (Wake Forest School of Medicine, Winston-Salem, North Carolina), Joel N. Hirschhorn (Broad Institute of MIT and Harvard, Boston, Massachusetts), Mattias Johannson (International Agency for Research on Cancer), Rachel S. Kelly (Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts), Mika Kivimaki (University College London), Jessica Lasky-Su (Brigham and Women’s Hospital and Harvard Medical School), Charles E. Matthews (National Cancer Institute), Christina Menni (King’s College London, London, United Kingdom), Steven C. Moore (National Cancer Institute), Eric Orwoll (Oregon Health and Science University, Portland, Oregon), Alexandre C. Pereira (University of São Paulo, São Paulo, Brazil), Lucilla Poston (King’s College London), Qibin Qi (Albert Einstein College of Medicine, New York, New York), Vasan S. Ramachandran (Boston University School of Medicine, Boston, Massachusetts), Kathryn M. Rexrode (Brigham and Women’s Hospital and Harvard Medical School), Rui Wang-Sattler (Helmholtz Zentrum München), Wei Jie Seow (National University of Singapore, Singapore), Svati H. Shah (Duke University, Durham, North Carolina), Eric J. Shiroma (National Institute on Aging, Bethesda, Maryland), Xiao-Ou Shu (Vanderbilt University Medical Center, Nashville, Tennessee), Rachel Stolzenberg-Solomon (National Cancer Institute), Victoria L. Stevens (Rollins School of Public Health, Emory University, Atlanta, Georgia), Toru Takebayashi (Keio University), Marinella Temprosa (George Washington University, Washington, DC), Emmi Tikkanen (Nightingale Health, Helsinki, Finland), Therese Tillin (University College London), Ioanna Tzoulaki (Imperial College London, London, United Kingdom), Cornelia M. Ulrich (University of Utah, Salt Lake City, Utah), Andrew Wong (University College London), and Bing Yu (University of Texas Health Science Center at Houston).

Conflict of interest: none declared.

REFERENCES

1. Sampson JN, Boca SM, Shu XO, et al. Metabolomics in epidemiology: sources of variability in metabolite measurements and implications. Cancer Epidemiol Biomarkers Prev. 2013;22(4):631–640.

2. Ioannidis JPA. Why most published research findings are false. PLoS Med. 2005;2(8):e124.

3. Kraft P, Zeggini E, Ioannidis JP. Replication in genome-wide association studies. Stat Sci. 2009;24(4):561–573.

4. Floegel A, Stefan N, Yu Z, et al. Identification of serum metabolites associated with risk of type 2 diabetes using a targeted metabolomic approach. Diabetes. 2013;62(2):639–648.

5. Wang TJ, Larson MG, Vasan RS, et al. Metabolite profiles and the risk of developing diabetes. Nat Med. 2011;17(4):448–453.

6. Menni C, Fauman E, Erte I, et al. Biomarkers for type 2 diabetes and impaired fasting glucose using a nontargeted metabolomics approach. Diabetes. 2013;62(12):4270–4276.

7. Yu D, Moore SC, Matthews CE, et al. Plasma metabolomic profiles in association with type 2 diabetes risk and prevalence in Chinese adults. Metabolomics. 2016;12:3.

8. Tang WH, Wang Z, Levison BS, et al. Intestinal microbial metabolism of phosphatidylcholine and cardiovascular risk. N Engl J Med. 2013;368(17):1575–1584.

9. Shah SH, Bain JR, Muehlbauer MJ, et al. Association of a peripheral blood metabolic profile with coronary artery disease and risk of subsequent cardiovascular events. Circ Cardiovasc Genet. 2010;3(2):207–214.

10. Kraus WE, Muoio DM, Stevens R, et al. Metabolomic quantitative trait loci (mQTL) mapping implicates the ubiquitin proteasome system in cardiovascular disease pathogenesis. PLoS Genet. 2015;11(11):e1005553.

11. His M, Viallon V, Dossus L, et al. Prospective analysis of circulating metabolites and breast cancer in EPIC. BMC Med. 2019;17(1):178.

12. Moore SC, Playdon MC, Sampson JN, et al. A metabolomics analysis of body mass index and postmenopausal breast cancer risk. J Natl Cancer Inst. 2018;110(6):588–597.

13. Mayers JR, Wu C, Clish CB, et al. Elevation of circulating branched-chain amino acids is an early event in human pancreatic adenocarcinoma development. Nat Med. 2014;20(10):1193–1198.

14. Schmidt JA, Fensom GK, Rinaldi S, et al. Patterns in metabolite profile are associated with risk of more aggressive prostate cancer: a prospective study of 3,057 matched case-control sets from EPIC. Int J Cancer. 2020;146(3):720–730.

15. Schmidt JA, Rinaldi S, Scalbert A, et al. Plasma concentrations and intakes of amino acids in male meat-eaters, fish-eaters, vegetarians and vegans: a cross-sectional analysis in the EPIC-Oxford cohort. Eur J Clin Nutr. 2016;70(3):306–312.

16. Mondul AM, Sampson JN, Moore SC, et al. Metabolomic profile of response to supplementation with beta-carotene in the Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study. Am J Clin Nutr. 2013;98(2):488–493.

17. Menni C, Graham D, Kastenmuller G, et al. Metabolomic identification of a novel pathway of blood pressure regulation involving hexadecanedioate. Hypertension. 2015;66(2):422–429.

18. Newgard CB, An J, Bain JR, et al. A branched-chain amino acid-related metabolic signature that differentiates obese and lean humans and contributes to insulin resistance. Cell Metab. 2009;9(4):311–326.

19. Cheng S, Rhee EP, Larson MG, et al. Metabolite profiling identifies pathways associated with metabolic risk in humans. Circulation. 2012;125(18):2222–2231.

20. Moore SC, Matthews CE, Sampson JN, et al. Human metabolic correlates of body mass index. Metabolomics. 2014;10(2):259–269.

21. Wurtz P, Wang Q, Soininen P, et al. Metabolomic profiling of statin use and genetic inhibition of HMG-CoA reductase. J Am Coll Cardiol. 2016;67(10):1200–1210.

22. ATBC Cancer Prevention Study Group. The Alpha-Tocopherol, Beta-Carotene Lung Cancer Prevention Study: design, methods, participant characteristics, and compliance. Ann Epidemiol. 1994;4(1):1–10.

23. Childhood Asthma Management Program Research Group. The Childhood Asthma Management Program (CAMP): design, rationale, and methods. Control Clin Trials. 1999;20(1):91–120.

24. Diabetes Prevention Program Research Group. Long-term effects of lifestyle intervention or metformin on diabetes development and microvascular complications over 15-year follow-up: the Diabetes Prevention Program Outcomes Study. Lancet Diabetes Endocrinol. 2015;3(11):866–875.

25. Gaziano JM, Sesso HD, Christen WG, et al. Multivitamins in the prevention of cancer in men: the Physicians’ Health Study II randomized controlled trial. JAMA. 2012;308(18):1871–1880.

26. Prorok PC, Andriole GL, Bresalier RS, et al. Design of the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. Control Clin Trials. 2000;21(6 suppl):273S–309S.

27. Litonjua AA, Lange NE, Carey VJ, et al. The Vitamin D Antenatal Asthma Reduction Trial (VDAART): rationale, design, and methods of a randomized, controlled trial of vitamin D supplementation in pregnancy for the primary prevention of asthma and allergies in children. Contemp Clin Trials. 2014;38(1):37–50.

28. Cheng TY, Makar KW, Neuhouser ML, et al. Folate-mediated one-carbon metabolism genes and interactions with nutritional factors on colorectal cancer risk: Women’s Health Initiative Observational Study. Cancer. 2015;121(20):3684–3691.

29. NCI-NHGRI Working Group on Replication in Association Studies, Chanock SJ, Manolio T, et al. Replicating genotype-phenotype associations. Nature. 2007;447(7145):655–660.

30. Ioannidis JPA, Castaldi P, Evangelou E. A compendium of genome-wide associations for cancer: critical synopsis and reappraisal. J Natl Cancer Inst. 2010;102(12):846–858.

31.

Lawlor
DA
,
Tilling
K
,
Davey Smith
G
.
Triangulation in aetiological epidemiology
.
Int J Epidemiol
.
2016
;
45
(
6
):
1866
1886
.

32.

Yu
B
,
Zanetti
KA
,
Temprosa
M
, et al.
The Consortium of Metabolomics Studies (COMETS): metabolomics in 47 prospective cohort studies
.
Am J Epidemiol
.
2019
;
188
(
6
):
991
1012
.

33.

Gaye
A
,
Marcon
Y
,
Isaeva
J
, et al.
DataSHIELD: taking the analysis to the data, not the data to the analysis
.
Int J Epidemiol
.
2014
;
43
(
6
):
1929
1944
.

34. Doiron D, Burton P, Marcon Y, et al. Data harmonization and federated analysis of population-based studies: the BioSHaRE Project. Emerg Themes Epidemiol. 2013;10(1):12.

35. Temprosa M. CBIIT/R-cometsAnalytics. https://github.com/CBIIT/R-cometsAnalytics/. Published April 12, 2021. Accessed April 12, 2021.

36. Chong J, Yamamoto M, Xia J. MetaboAnalystR 2.0: from raw spectra to biological insights. Metabolites. 2019;9(3):57.

37. Tautenhahn R, Patti GJ, Rinehart D, et al. XCMS Online: a web-based platform to process untargeted metabolomic data. Anal Chem. 2012;84(11):5035–5039.

38. Callaham J. There are now 1.2 billion Office users and 60 million Office 365 commercial customers. https://www.windowscentral.com/there-are-now-12-billion-office-users-60-million-office-365-commercial-customers. Published March 31, 2016. Accessed April 12, 2021.

39. Sherry ST, Ward MH, Kholodov M, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001;29(1):308–311.

40. Kim S. ppcor: an R package for a fast calculation to semi-partial correlation coefficients. Commun Stat Appl Methods. 2015;22(6):665–674.

41. Kuhn M. Building predictive models in R using the caret package. J Stat Softw. 2008;28(5):26.

42. Kuhn M. caret: Classification and Regression Training. (R package, version 6.0-88). https://CRAN.R-project.org/package=caret. Published May 15, 2021. Accessed July 6, 2021.

43. Sumner LW, Amberg A, Barrett D, et al. Proposed minimum reporting standards for chemical analysis: Chemical Analysis Working Group (CAWG) Metabolomics Standards Initiative (MSI). Metabolomics. 2007;3(3):211–221.

44. Evans AM, Bridgewater B, Liu Q, et al. High resolution mass spectrometry improves data quantity and quality as compared to unit mass resolution mass spectrometry in high-throughput profiling metabolomics. Metabolomics. 2014;4(2):132.

45. Fortier I, Raina P, Van den Heuvel ER, et al. Maelstrom Research guidelines for rigorous retrospective data harmonization. Int J Epidemiol. 2017;46(1):103–105.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)