Abstract

Summary

Identification of balances of bacterial taxa in relation to continuous and dichotomous outcomes is an increasingly frequent analytic objective in microbiome profiling experiments. SurvBal enables the selection of balances in relation to censored survival or time-to-event outcomes, which are of considerable interest in many biomedical studies. The package includes the most commonly used survival models, namely the Cox proportional hazards model and parametric survival models, and combines them with step-wise selection procedures to identify the optimal associated microbiome balance, i.e. the ratio of the geometric means of the relative abundances of two groups of taxa.

Availability and implementation

The SurvBal R package and Shiny app can be accessed at https://github.com/yinglia/SurvBal and https://yinglistats.shinyapps.io/shinyapp-survbal/.

1 Introduction

Compositional balances have served as a powerful strategy for implicating bacterial taxa in a wide range of outcomes. The underlying principle of global balances is that outcomes depend on the ratio of two sets of bacterial relative abundances. Accordingly, selbal (Rivera-Pinto et al. 2018) identifies the (log) ratio of the geometric means of two sets of taxa. Essentially, the numerator of the ratio is the geometric mean of the taxa positively correlated with the outcome, while the denominator is the geometric mean of the taxa negatively correlated with the outcome. Philosophically, global balances better characterize the idea of dysbiosis and better accommodate compositionality, which creates significant challenges for interpretation. Operationally, the optimal balances can be identified through greedy search procedures. The approach has been used successfully to identify balances, and their constituent taxa, related to many different outcomes, including COVID severity (Trøseid et al. 2023) and neutrophil levels in HIV infection (Hensley-McBain et al. 2019), among others.

However, a limitation in the field is the lack of software for identifying balances in relation to censored survival or time-to-event outcomes. Yet such outcomes are of considerable interest, particularly as human microbiome studies interface with clinical studies, in which time-to-event outcomes (e.g. overall survival (OS), time to relapse, and time to disease onset) are the most commonly investigated outcomes. Such studies are often subject to significant censoring, so specialized survival models are needed.

We present SurvBal, a flexible R package that facilitates the analysis of balances with censored time-to-event outcomes. Using a greedy step-wise selection approach, it identifies the log-ratio of the geometric means of two sets of taxa that is most strongly associated with the survival outcome. The software supports the Cox proportional hazards model and parametric survival models, and by extension accelerated failure time (AFT) models, and reports a selected global balance of bacteria that increase vs. decrease the hazard or the survival time. A comprehensive Shiny app is provided so that a broader range of users can explore the tool interactively.

2 Software description

Suppose $X_i = (X_{i1}, X_{i2}, \ldots, X_{iK})$ is the microbial composition for subject $i$, where the $X_{ij}$ are relative abundances and $\sum_{j=1}^{K} X_{ij} = 1$. If we have a subset of bacteria that increase hazard, denoted by $X^{+}$, indexed by $I^{+}$ and composed of $k^{+}$ taxa, and another subset that decrease hazard, denoted by $X^{-}$, indexed by $I^{-}$ and composed of $k^{-}$ taxa, the balance is defined as the normalized log ratio of the geometric means of the two groups,

$$B(X_i^{+}, X_i^{-}) = \sqrt{\frac{k^{+} k^{-}}{k^{+} + k^{-}}}\, \log \frac{\left(\prod_{j \in I^{+}} X_{ij}\right)^{1/k^{+}}}{\left(\prod_{j \in I^{-}} X_{ij}\right)^{1/k^{-}}}.$$

Equivalently, we can have

$$B(X_i^{+}, X_i^{-}) = \sqrt{\frac{k^{+} k^{-}}{k^{+} + k^{-}}} \left( \frac{1}{k^{+}} \sum_{j \in I^{+}} \log X_{ij} - \frac{1}{k^{-}} \sum_{j \in I^{-}} \log X_{ij} \right). \tag{1}$$

The log-contrast in (1) handles compositionality (Aitchison 1982), accommodating the relative nature of microbiome abundances. Moreover, (1) is a special log-contrast, where the log differences are not included in a linear model additively, but are combined together as a single variable or feature of the microbiome (Rivera-Pinto et al. 2018). Via the Cox proportional hazards model, the balance can be associated with the hazard at time $t$ as

$$h(t \mid X_i, Z_i) = h_0(t) \exp\left\{ \gamma\, B(X_i^{+}, X_i^{-}) + \beta^{\top} Z_i \right\}, \tag{2}$$

where $h_0(t)$ is the baseline hazard, $\gamma$ is the effect size of the balance, $Z_i$ denotes the biomedical or demographic covariates to adjust for in the analysis and $\beta$ is the corresponding effect size. Similarly, if we have a subset of bacteria that increase survival time, $X^{+}$, and another subset that decrease survival time, $X^{-}$, then via a parametric survival model such as the AFT model, the balance can be associated with the survival time $T_i$ as

$$\log T_i = \gamma\, B(X_i^{+}, X_i^{-}) + \beta^{\top} Z_i + \epsilon_i, \tag{3}$$

where, e.g., $\epsilon_i$ could follow the extreme value distribution, in which case $T_i$ follows the Weibull distribution.
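To make the definition of the balance concrete, the following minimal Python sketch computes it for one subject in both forms: as a log ratio of geometric means and as a difference of mean logs. It is an illustration only, not SurvBal code. The function names and the toy composition are hypothetical, and the normalizing constant is assumed to be the square-root factor conventionally used for balances in selbal.

```python
import math

def balance(rel_abund, idx_plus, idx_minus):
    """Balance as a scaled difference of mean log abundances (one subject)."""
    k_plus, k_minus = len(idx_plus), len(idx_minus)
    # Assumed normalizing constant: sqrt(k+ * k- / (k+ + k-)).
    norm = math.sqrt(k_plus * k_minus / (k_plus + k_minus))
    mean_log_plus = sum(math.log(rel_abund[j]) for j in idx_plus) / k_plus
    mean_log_minus = sum(math.log(rel_abund[j]) for j in idx_minus) / k_minus
    return norm * (mean_log_plus - mean_log_minus)

def balance_via_geometric_means(rel_abund, idx_plus, idx_minus):
    """Same quantity, written as a scaled log ratio of geometric means."""
    k_plus, k_minus = len(idx_plus), len(idx_minus)
    norm = math.sqrt(k_plus * k_minus / (k_plus + k_minus))
    gm_plus = math.prod(rel_abund[j] for j in idx_plus) ** (1 / k_plus)
    gm_minus = math.prod(rel_abund[j] for j in idx_minus) ** (1 / k_minus)
    return norm * math.log(gm_plus / gm_minus)

# Toy composition over K = 5 taxa (hypothetical values that sum to 1).
x = [0.40, 0.10, 0.25, 0.05, 0.20]
b_logs = balance(x, idx_plus=[0, 2], idx_minus=[1, 3])
b_geo = balance_via_geometric_means(x, idx_plus=[0, 2], idx_minus=[1, 3])

# Rescaling every abundance by a common factor leaves the balance unchanged.
b_scaled = balance([1000 * v for v in x], idx_plus=[0, 2], idx_minus=[1, 3])
print(b_logs, b_geo, b_scaled)
```

The three printed values agree up to floating-point error. The last one shows why a balance suits compositional data: it depends only on ratios between taxa, not on the total.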

SurvBal is variable selection software designed to identify the two sets of taxa, $X^{+}$ and $X^{-}$ (defined with respect to the hazard under the Cox model, or to the survival time under the parametric model), that compose the optimal microbiome balance associated with the survival outcome of interest. It takes a matrix of raw taxon counts for each subject in the study, a survival object from the R package “survival” (Therneau and Lumley 2015) that contains survival times and censoring indicators, and, if applicable, the covariates to adjust for. The software is flexible, as additional options are available throughout the model building and variable selection processes, from pre-processing to the final selection of the microbial balance. Figure 1A provides an overview of SurvBal, with the details deferred to Section 1 of the Supplementary Information. We carried out extensive simulation studies (Section 2 of the Supplementary Information) to demonstrate the reliability of SurvBal (measured by precision and recall) in selecting the balance of taxa for survival outcomes.
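The selection step can be pictured with a minimal Python sketch of greedy forward selection. It is not SurvBal's implementation: the actual criterion and stopping rule are described in the Supplementary Information and are not reproduced here. As a stand-in, each candidate balance is scored by Harrell's concordance index between the balance and the censored event times, rather than by a fitted Cox or parametric model. The function names, the `min_gain` threshold, the toy data and the square-root normalizing constant are all assumptions made for illustration.

```python
import math
from itertools import permutations

def balance(x, plus, minus):
    """Normalized log ratio of geometric means (assumed sqrt normalization)."""
    kp, km = len(plus), len(minus)
    norm = math.sqrt(kp * km / (kp + km))
    return norm * (sum(math.log(x[j]) for j in plus) / kp
                   - sum(math.log(x[j]) for j in minus) / km)

def concordance(scores, times, events):
    """Harrell's C: share of comparable pairs where the higher score fails first."""
    agree, comparable = 0.0, 0
    for a in range(len(times)):
        if not events[a]:
            continue  # a censored subject cannot be the earlier member of a pair
        for b in range(len(times)):
            if times[a] < times[b]:
                comparable += 1
                if scores[a] > scores[b]:
                    agree += 1.0
                elif scores[a] == scores[b]:
                    agree += 0.5
    return agree / comparable if comparable else 0.5

def score(X, plus, minus, times, events):
    return concordance([balance(x, plus, minus) for x in X], times, events)

def greedy_balance(X, times, events, min_gain=0.01):
    n_taxa = len(X[0])
    # Step 1: best ordered pair (one taxon in the numerator, one in the denominator).
    plus, minus = max((([i], [j]) for i, j in permutations(range(n_taxa), 2)),
                      key=lambda pm: score(X, pm[0], pm[1], times, events))
    best = score(X, plus, minus, times, events)
    # Step 2: keep adding the single most helpful taxon to either group.
    while True:
        rest = [j for j in range(n_taxa) if j not in plus + minus]
        cands = ([(plus + [j], minus) for j in rest]
                 + [(plus, minus + [j]) for j in rest])
        if not cands:
            break
        cand = max(cands, key=lambda pm: score(X, pm[0], pm[1], times, events))
        cand_score = score(X, cand[0], cand[1], times, events)
        if cand_score - best < min_gain:
            break  # no candidate improves the criterion enough
        plus, minus, best = cand[0], cand[1], cand_score
    return plus, minus, best

# Toy data: six subjects, four taxa; taxon 0 high and taxon 1 low means earlier events.
X = [[0.70, 0.05, 0.15, 0.10], [0.50, 0.10, 0.10, 0.30], [0.30, 0.15, 0.35, 0.20],
     [0.20, 0.20, 0.25, 0.35], [0.10, 0.30, 0.40, 0.20], [0.05, 0.50, 0.15, 0.30]]
times = [2, 4, 5, 7, 9, 12]
events = [1, 1, 0, 1, 1, 0]  # 1 = event observed, 0 = censored
plus, minus, c_index = greedy_balance(X, times, events)
print(plus, minus, c_index)
```

In this toy example the first pair already orders every comparable pair of subjects correctly, so the search stops without adding further taxa. In practice a model-based criterion allows covariate adjustment, which this rank-based stand-in does not.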

Figure 1. (A) Overview of SurvBal. Stratified time-to-event plots by lower and higher values of the selected balances of the gut microbiome: HCT recipients, (B) OS under the Cox model, (C) time-to-GvHD under the parametric survival model; kidney transplant recipients, (D) time-to-E. coli bacteriuria under the Cox model, and (E) time-to-Enterococcus bacteriuria under the parametric survival model.

The greedy algorithm of SurvBal may still select a balance even when no associations exist, while real-world microbiome studies frequently show no link between the microbiome and survival outcomes. To address this, SurvBal performs a global community-level association test, MiRKAT-S (Plantinga et al. 2017), before selecting the microbial balance. Specifically, we encode microbiome data in ecologically informative distance metrics (Bray-Curtis and Jaccard distances), then use MiRKAT-S to compare the similarity in microbiota to the similarity in survival times between subjects. The test reports an omnibus P-value, indicating whether there is a significant association between the survival outcome and the presence-absence status or the abundance of the microbial profile. It is worth noting that the community-level testing identifies global shifts in the microbial profile and is best used when there are concerted differences among a large number of taxa. On the other hand, analysis of balances focuses on variable selection wherein the outcome is related to imbalances in a few taxa. Given the difference in analytic philosophies, a significant community-level association supports the selected balance, while an insignificant community-level association does not truly invalidate the balance but prompts a warning, advising caution when interpreting the selected balance.
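The two distance metrics named above can be sketched in a few lines of Python. This shows only how pairwise dissimilarities between subjects are formed: Bray-Curtis from abundances, Jaccard from presence-absence. The MiRKAT-S test itself is not reproduced. The function names and the toy profiles are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return (sum(abs(a - b) for a, b in zip(u, v))
            / sum(a + b for a, b in zip(u, v)))

def jaccard(u, v):
    """Jaccard distance based on which taxa are present (abundance > 0)."""
    present_u = {j for j, a in enumerate(u) if a > 0}
    present_v = {j for j, b in enumerate(v) if b > 0}
    union = present_u | present_v
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1 - len(present_u & present_v) / len(union)

# Two toy relative-abundance profiles over four taxa.
s1 = [0.5, 0.3, 0.2, 0.0]
s2 = [0.2, 0.3, 0.0, 0.5]
d_bc = bray_curtis(s1, s2)
d_jac = jaccard(s1, s2)
print(d_bc, d_jac)
```

Bray-Curtis responds to how much of each taxon is present, whereas Jaccard only registers which taxa appear at all, which is why the two are used together to cover abundance and presence-absence signals.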

3 Illustrating examples

The first example is a study on graft-versus-host disease (GvHD) (Golob et al. 2017), a potentially fatal complication of hematopoietic cell transplantation (HCT) that is associated with the gut microbiome (Staffas et al. 2017, Andermann et al. 2018, Shono and van den Brink 2018, Fredricks 2019). Here, we aim to find gut microbial signatures, measured right after HCT, that describe OS and time to GvHD. The processed data contained 63 recipients, and the 16S rRNA microbiome data were aggregated to the genus level with rare taxa (relative abundance < 0.01%) removed. Details of the original and processed GvHD data are deferred to Section 3 of the Supplementary Information. The Cox proportional hazards model was used for analyzing OS, and the parametric survival model assuming a Weibull distribution was employed for time-to-GvHD. No covariates were adjusted for, and all other arguments were kept at their default options. The community-level association test showed that the gut microbiota is significantly associated with OS (omnibus P-value = .03) but not with time-to-GvHD (omnibus P-value = .82).
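The rare-taxon filter described above can be illustrated with a short Python sketch. Several details are assumptions rather than the authors' procedure, which is deferred to the Supplementary Information: the threshold is applied here to the mean relative abundance across subjects, and zeros are handled by adding a pseudocount of 1 before re-closing the data so that logarithms are defined. The function name `preprocess` and the toy counts are hypothetical.

```python
def preprocess(counts, min_mean_rel_abund=1e-4, pseudocount=1.0):
    """Drop rare taxa, then return strictly positive relative abundances.

    counts: one list of raw taxon counts per subject.
    min_mean_rel_abund: 1e-4 corresponds to 0.01% (assumed to be a mean over subjects).
    """
    n_taxa = len(counts[0])
    rel = [[c / sum(row) for c in row] for row in counts]
    mean_rel = [sum(r[j] for r in rel) / len(rel) for j in range(n_taxa)]
    kept = [j for j in range(n_taxa) if mean_rel[j] >= min_mean_rel_abund]
    # Assumed zero handling: add a pseudocount to the retained taxa, then renormalize.
    adjusted = [[row[j] + pseudocount for j in kept] for row in counts]
    closed = [[c / sum(row) for c in row] for row in adjusted]
    return kept, closed

# Toy counts: two subjects, four taxa; the last taxon is never observed.
counts = [[900, 100, 0, 0],
          [800, 199, 1, 0]]
kept, rel_abund = preprocess(counts)
print(kept)
print(rel_abund)
```

Raising `min_mean_rel_abund` to `0.01` would mimic a filter that retains only common taxa (relative abundance above 1%).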

The global balance selected for OS by SurvBal includes six taxa increasing the risk of death, $X^{+}_{\mathrm{OS}}$ = {Schaalia, Streptococcus, Eubacterium, Evtepia, Terrisporobacter, Ruminococcaceae}, and nine taxa decreasing the risk, $X^{-}_{\mathrm{OS}}$ = {Gemmiger, Phascolarctobacterium, Eggerthella, Agathobaculum, Christensenellaceae, Enterococcus, Pseudoflavonifractor/Clostridium, Collinsella, Romboutsia}. Figure 1B presents the predicted OS curves. Recipients with a lower balance score (-0.37, the first quartile of the 63 recipients’ balances), i.e. with lower relative abundances of taxa in $X^{+}_{\mathrm{OS}}$ than in $X^{-}_{\mathrm{OS}}$, have significantly longer predicted survival than those with a higher balance score (0.90, the third quartile of the 63 recipients’ balances). The balance identified for time-to-GvHD comprises four taxa prolonging the time to the disease, $X^{+}_{\mathrm{GvHD}}$ = {Coprococcus, Monoglobus, Veillonella, Clostridiales Family XIII. Incertae Sedis}, and six taxa shortening the time to onset, $X^{-}_{\mathrm{GvHD}}$ = {Flintibacter, Streptococcus, Negativibacillus, Flavonifractor, Neglecta, Intestinibacter}; it is described by the predicted time-to-GvHD curves in Fig. 1C. Caution is recommended when interpreting the balance for time-to-GvHD, because the community-level association between the gut microbiota and time-to-GvHD is not significant. Nevertheless, the selected balance still provides useful insights. In particular, the two balances for GvHD and mortality share Streptococcus, which shortens the time-to-GvHD ($X^{-}_{\mathrm{GvHD}}$) and increases the risk of death ($X^{+}_{\mathrm{OS}}$). The positive association between Streptococcus and GvHD has been confirmed in previous HCT studies (Khan et al. 2021, Lin et al. 2021). We also note that, with only 63 patients, some of the results run contrary to the existing literature. For example, Enterococcus is included in $X^{-}_{\mathrm{OS}}$ but has been linked with increased risk of GvHD (Stein-Thoeringer et al. 2019). The results are expected to improve with the recruitment of more patients.
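As a worked reading of the two quartiles quoted above, the Cox model implies a simple comparison between the stratified OS curves. Writing $B$ for the balance score and $\gamma$ for its coefficient (the fitted value is not reported in this section, so it is left symbolic), the hazard ratio between the two strata, with any covariates held fixed, is:

```latex
\[
\frac{h(t \mid B = 0.90)}{h(t \mid B = -0.37)}
  = \exp\{\gamma\,[0.90 - (-0.37)]\}
  = \exp(1.27\,\gamma),
\qquad
S(t \mid B) = S_0(t)^{\exp(\gamma B)} .
\]
```

Because the taxa in the numerator of the balance are those that increase the hazard, $\gamma$ is expected to be positive, so the ratio exceeds 1 and the survival curve for the higher balance score lies below the curve for the lower one at every time $t$.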

The second example is a study on bacteriuria after kidney transplant (Magruder et al. 2019, 2020). Bacteriuria is a common complication leading to significant morbidity, and modulating the gut microbiota could be a promising preventive intervention. We therefore seek the gut microbiome balance right after surgery that predicts the time to onset of Escherichia coli and Enterococcus bacteriuria. The processed data contained 163 recipients, and the 16S rRNA microbiome data were aggregated to the genus level with common taxa (relative abundance > 1%) retained. Details of the original and processed KTx data are deferred to Section 3 of the Supplementary Information. The Cox and parametric models were used for E. coli and Enterococcus bacteriuria, respectively. Gender is a crucial risk factor for bacteriuria and was therefore adjusted for in the analysis. After adjusting for gender, MiRKAT-S found no significant community-level association between the gut microbiota and time-to-E. coli bacteriuria (omnibus P-value = .77) or time-to-Enterococcus bacteriuria (omnibus P-value = .51).

The balance determined for E. coli bacteriuria consists of three genera increasing the risk, $X^{+}_{\mathrm{E.\,coli}}$ = {Bacteroides, Subdoligranulum, Holdemanella}, and three genera decreasing the risk, $X^{-}_{\mathrm{E.\,coli}}$ = {Oscillibacter, Lachnoclostridium, Blautia}. The predicted time-to-E. coli bacteriuria is longer for recipients with lower relative abundances of taxa in $X^{+}_{\mathrm{E.\,coli}}$ than in $X^{-}_{\mathrm{E.\,coli}}$, and the risk of E. coli bacteriuria is higher among female recipients (Fig. 1D). The balance selected for Enterococcus bacteriuria includes one genus extending the time to the disease, $X^{+}_{\mathrm{Enterococcus}}$ = {Blautia}, and five genera expediting the onset, $X^{-}_{\mathrm{Enterococcus}}$ = {Erysipelatoclostridium, Lactobacillus, Anaerostipes, Enterococcus, Eubacterium}. The predicted time-to-Enterococcus bacteriuria is systematically shortened with a lower balance score (Fig. 1E). Although caution is recommended when interpreting the selected balances, they still offer valuable insights. For example, SurvBal identifies Enterococcus as a risk factor promoting Enterococcus bacteriuria, which is consistent with prior work (Magruder et al. 2019). Blautia is selected both for delaying the onset of Enterococcus bacteriuria ($X^{+}_{\mathrm{Enterococcus}}$) and for decreasing the risk of E. coli bacteriuria ($X^{-}_{\mathrm{E.\,coli}}$). It is a common commensal bacterium in the gut and is associated with the production of butyrate, a short-chain fatty acid (SCFA) (Maturana and Cárdenas 2021). Sorbara et al. (2019) have shown that SCFAs can inhibit the growth of E. coli via intracellular acidification, suggesting a competitive role of this bacterium against E. coli in the gut.

4 Conclusion

We have developed SurvBal, a software package that enables the selection of compositional balances in microbiome profiling studies with censored time-to-event outcomes. It allows users to flexibly incorporate covariates and to choose their preferred model type and other options. A community-level test, MiRKAT-S, is integrated into SurvBal to evaluate the overall association between the microbial profile and the survival outcome. It acts as a pre-selection inspection, endorsing the selected balance when a significant community-level association exists and recommending caution in interpreting the selected balance when it does not. We believe that the selected global balance offers valuable insights, guiding further clinical investigations into microbial markers that can be modulated to improve time-to-event outcomes.

There are various directions in which to extend SurvBal. One is to improve the greedy selection algorithm so that, beyond the accompanying community-level test, it better accommodates the null case in which no associations exist. Another is to address a limitation of the current modeling, which cannot handle the competing nature of multiple causes for the same event. Because SurvBal is designed with a flexible architecture, it opens avenues for further development to address these gaps, such as adding regularization and incorporating competing risk analysis.

Supplementary data

Supplementary data are available at Bioinformatics online.

Conflict of interest

J.R.L. holds patent US-2020-0048713-A1 titled “Methods of Detecting Cell-Free DNA in Biological Samples” licensed to Eurofins Viracor, received research support under an investigator-initiated research grant from BioFire Diagnostics, LLC, received an honorarium for a talk from Astellas, and is on the board of directors for the Chinese American Medical Society.

Funding

This work was supported, in part, by three grants from the National Institute of General Medical Sciences (R01 GM129512, R01 GM151301, and R01 GM155734 to Y.L., T.L., K.M., X.H., and W.L.), a grant from the National Heart, Lung, and Blood Institute (R01 HL155417 to Y.L., T.L., K.M., X.H., and W.L.), and two grants from the National Institute of Allergy and Infectious Diseases (R01 AI134808 to S.S. and D.N.F., K23 AI124464 to J.R.L.).

Data availability

GvHD and KTx data (sequencing and de-identified clinical data) are available in BioProject with accession number PRJNA1052666 and in dbGaP with accession number phs001879.v1.p1, respectively.

References

Aitchison J. The statistical analysis of compositional data. J R Stat Soc Series B (Methodol) 1982;44:139–60.

Andermann TM, Peled JU, Ho C et al.; Blood and Marrow Transplant Clinical Trials Network. The microbiome and hematopoietic cell transplantation: past, present, and future. Biol Blood Marrow Transplant 2018;24:1322–40.

Fredricks DN. The gut microbiota and graft-versus-host disease. J Clin Invest 2019;129:1808–17.

Golob JL, Pergam SA, Srinivasan S et al. Stool microbiota at neutrophil recovery is predictive for severe acute graft vs host disease after hematopoietic cell transplantation. Clin Infect Dis 2017;65:1984–91.

Hensley-McBain T, Wu MC, Manuzak JA et al. Increased mucosal neutrophil survival is associated with altered microbiota in HIV infection. PLoS Pathog 2019;15:e1007672.

Khan N, Lindner S, Gomes AL et al. Fecal microbiota diversity disruption and clinical outcomes after auto-HCT: a multicenter observational study. Blood 2021;137:1527–37.

Lin D, Hu B, Li P et al. Roles of the intestinal microbiota and microbial metabolites in acute GVHD. Exp Hematol Oncol 2021;10:49.

Magruder M, Sholi AN, Gong C et al. Gut uropathogen abundance is a risk factor for development of bacteriuria and urinary tract infection. Nat Commun 2019;10:5521.

Magruder M, Edusei E, Zhang L et al. Gut commensal microbiota and decreased risk for Enterobacteriaceae bacteriuria and urinary tract infection. Gut Microbes 2020;12:1805281.

Maturana JL, Cárdenas JP. Insights on the evolutionary genomics of the Blautia genus: potential new species and genetic content among lineages. Front Microbiol 2021;12:660920.

Plantinga A, Zhan X, Zhao N et al. MiRKAT-S: a community-level test of association between the microbiota and survival times. Microbiome 2017;5:17.

Rivera-Pinto J, Egozcue JJ, Pawlowsky-Glahn V et al. Balances: a new perspective for microbiome analysis. mSystems 2018;3:e00053-18.

Shono Y, van den Brink MR. Gut microbiota injury in allogeneic haematopoietic stem cell transplantation. Nat Rev Cancer 2018;18:283–95.

Sorbara MT, Dubin K, Littmann ER et al. Inhibiting antibiotic-resistant Enterobacteriaceae by microbiota-mediated intracellular acidification. J Exp Med 2019;216:84–98.

Staffas A, Burgos da Silva M, van den Brink MR. The intestinal microbiota in allogeneic hematopoietic cell transplant and graft-versus-host disease. Blood 2017;129:927–33.

Stein-Thoeringer CK, Nichols KB, Lazrak A et al. Lactose drives enterococcus expansion to promote graft-versus-host disease. Science 2019;366:1143–9.

Therneau TM, Lumley T. Package ‘survival’. R Top Doc 2015;128:28–33.

Trøseid M, Holter JC, Holm K et al.; Norwegian SARS-CoV-2 Study Group. Gut microbiota composition during hospitalization is associated with 60-day mortality after severe Covid-19. Crit Care 2023;27:69.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Associate Editor: Jonathan Wren