Summary

The rich longitudinal individual level data available from electronic health records (EHRs) can be used to examine treatment effect heterogeneity. However, estimating treatment effects using EHR data poses several challenges, including time-varying confounding, repeated and temporally non-aligned measurements of covariates, treatment assignments and outcomes, and loss-to-follow-up due to dropout. Here, we develop the subgroup discovery for longitudinal data algorithm, a tree-based algorithm for discovering subgroups with heterogeneous treatment effects using longitudinal data by combining the generalized interaction tree algorithm, a general data-driven method for subgroup discovery, with longitudinal targeted maximum likelihood estimation. We apply the algorithm to EHR data to discover subgroups of people living with human immunodeficiency virus who are at higher risk of weight gain when receiving dolutegravir (DTG)-containing antiretroviral therapies (ARTs) versus when receiving non-DTG-containing ARTs.

1 Introduction

Dolutegravir (DTG) is an orally administered antiretroviral medication that in combination with other antiretroviral medications has been approved for the treatment of human immunodeficiency virus (HIV) infection (World Health Organization, 2018). DTG-containing antiretroviral therapies (ARTs) have many advantages such as promising treatment efficacy, high medication resistance barrier, and low cost compared with ARTs previously recommended by the World Health Organization (WHO) (NAMSAL ANRS 12313 Study Group, 2019; Phillips and others, 2019; Venter and others, 2019). However, several randomized trials (Calmy and others, 2020; NAMSAL ANRS 12313 Study Group, 2019; Sax and others, 2020; Venter and others, 2019, 2020) and observational studies (Bourgi, Jenkins, and others, 2020; Bourgi, Rebeiro, and others, 2020; Esber and others, 2022; Surial and others, 2021) have found larger average weight gain among people receiving DTG-containing ARTs. Weight gains can have negative consequences among people living with HIV including increased risk of cardiovascular and metabolic diseases as well as other comorbidities (Achhra and others, 2016; Herrin and others, 2016; Kim and others, 2012). A few studies have explored how weight gain differs between subgroups defined by some individual characteristics (Bourgi, Jenkins, and others, 2020; Calmy and others, 2020; Sax and others, 2020; Venter and others, 2020), but all of these studies focused on subgroups that were pre-specified by the investigators and did not attempt data-driven subgroup discovery.

Data-driven discovery of subgroups with heterogeneous treatment effects requires larger sample sizes than estimation of population average treatment effects (Dahabreh and others, 2016). Unlike randomized controlled trials, which are usually powered to detect average treatment effects in the overall population, electronic health records (EHRs) often contain rich longitudinal data collected on a large number of diverse individuals in a “real world” setting. This makes EHRs well suited for more fine-grained treatment effect analyses such as discovering subgroups with heterogeneous treatment effects and estimating long-term effects of sustained treatments. However, treatment effect estimation using EHRs involves several challenges including: (i) EHRs have repeated measures over time, and these measures are not temporally aligned with the number of repeated measures and timing of the measurements often being informative about outcomes; (ii) potential time-varying confounding; and (iii) potentially informative dropout (i.e., individuals that dropout of the EHRs systematically differ from those who do not dropout).

Tree-based methods stratify the covariate space into interpretable subgroups making them ideal for subgroup discovery (Su and others, 2008). In contrast, most other blackbox machine learning algorithms produce completely personalized predictions (rather than subgroup-specific predictions), and are therefore not applicable for subgroup discovery unless the personalized predictions are post-processed to create subgroups. Previous work on tree-based methods for discovering subgroups with differential treatment effects has mostly focused on cross-sectional randomized (Foster and others, 2011; Seibold and others, 2016; Steingrimsson and Yang, 2019; Su and others, 2009) or non-randomized data (Athey and Imbens, 2016; Yang and others, 2022). Additionally, previously proposed tree-based methods for treatment effect estimation with longitudinal data (Su and others, 2011; Wei and others, 2020) cannot handle both non-randomized time-varying treatment assignment and dropout.

In this article, we develop the first tree-based algorithm for discovering subgroups with heterogeneous treatment effects when comparing the effects of time-varying treatments using non-randomized longitudinal data, in the presence of potentially informative dropout (censoring). The algorithm we develop, referred to as the subgroup discovery for longitudinal data (SDLD) algorithm, extends the generalized interaction tree algorithm (Yang and others, 2022) by combining it with longitudinal targeted maximum likelihood estimators (TMLEs) (Petersen and others, 2014; Stitelman and others, 2012; van der Laan and Gruber, 2012; Van der Laan and Rose, 2018; Van der Laan and Rose, 2011; Van Der Laan and Rubin, 2006). The SDLD algorithm recursively splits the covariate space into disjoint subgroups using splitting decisions that are based on maximizing treatment effect heterogeneity as quantified by subgroup-specific TMLEs. TMLEs are doubly robust (Bang and Robins, 2005; Robins and others, 1994), semi-parametric efficient (Stitelman and others, 2012; Van der Laan and Rose, 2018), and, under mild conditions, guarenteed to take values in the support of the outcome. We apply the algorithm to perform the first data-driven discovery of subgroups with differential effect of DTG-containing ARTs on weight using EHRs of people living with HIV in western Kenya.

2 Data structure and target parameter

For k{0,,K}, let Lk be a vector of time-dependent covariates (that includes measures of the outcome Yk) measured at time k, Ak be the binary treatment indicator at time k, Ck be an indicator whether a person drops out at time k, and YK+1 be the outcome at time K + 1. The baseline covariates L0 can include covariates that do not vary over time and baseline values of time-varying covariates. We assume the following time ordering of the data

By the time ordering assumption, each variable has no causal effect on those measured before. For a vector X=(X0,,XK), define X¯k=(X0,,Xk),X_k=(Xk+1,,XK), and X without subscript indicates a vector from time 0 to K (i.e., X=X¯K). We assume that if an individual drops out at time k, all quantities measured after time k are not observed. That is, Ck = 1 implies that C_k,L_k,A_k, and YK+1 are not observed.

Let Y¯K+1,ia,c=0 be the vector of potential outcomes (Rubin, 1974; Robins and Greenland, 2000) under intervention on individual i to set the treatment vector Ai to a and to abolish censoring (i.e., intervention to set Ci=0). Similarly, let L¯K,ia,c=0 be the potential covariate trajectory of the ith individual under intervention to set the treatment vector Ai to a and to abolish censoring.

Let L0 take values in L0. Any set wL0 defines a subgroup consisting of individuals with L0w. Define the treatment effect contrasting treatment regimes a1 and a0 within subgroup w as δ(w)=E[YK+1a1,c=0|L0w]E[YK+1a0,c=0|L0w]. That is, δ(w) is the subgroup-specific average treatment effect contrasting the treatment regimes a1 and a0. Our objective is to find mutually exclusive and exhaustive subgroups of L0 where splitting decisions are made to maximize the between subgroup treatment effect difference.

3 Subgroup discovery for longitudinal data (SDLD) algorithm

3.1 Generalized interaction tree algorithm

In this section, we briefly review the generalized interaction tree algorithm that has been used to discover subgroups with heterogeneous treatment effects for cross-sectional randomized trial data (Steingrimsson and Yang, 2019; Su and others, 2009) and cross-sectional observational data (Yang and others, 2022).

The generalized interaction tree algorithm consists of three steps: initial tree building, pruning, and final tree selection. The algorithm starts by splitting the data into an initial tree building dataset and a validation dataset. Algorithm 1 describes the initial tree building process that runs on the initial tree building dataset where the generalized interaction tree algorithm recursively partitions the covariate space into subgroups until a pre-determined stopping criterion is met (e.g., there are too few observations in each subgroup to justify further splitting). At each attempt to partition the covariate space, the algorithm focuses on discovering subgroups that maximize differences between subgroup-specific treatment effect estimators (using some estimator, δ^(w), of the treatment effect in subgroup w). In Section 3.2, we describe three subgroup-specific treatment effect estimators appropriate for the longitudinal data structures presented in Section 2. For a given subgroup-specific treatment effect estimator δ^(w), define the splitting criterion for splitting a subgroup (node) w into two exclusive and exhaustive subgroups l and r as

(3.1)

The splitting criterion (3.1) estimates a standardized difference between the treatment effects in the two subgroups l and r. Selecting the subgroups l and r by maximizing the splitting criterion (3.1) results in a split that tries to find the pair of subgroups that have the largest difference in estimated treatment effects (i.e., maximizing estimated between subgroup treatment effect heterogeneity).

Algorithm 1

Initial tree building process for the SDLD algorithm

1: Define the root node as consisting of all observations. Set the root node as the node under consideration.

2: Define L0(j) as the j-th component of the baseline covariate vector. In the node under consideration, find all permissible (L0(j),c) pairs, j=1,,J, that split the covariate space into two groups defined by l={L0(j)<c} and r={L0(j)c}, where J is the number of components in the baseline covariate vector.

3: Run through all possible splits in Step 2 and split the node under consideration into two new subgroups based on the split that maximizes the splitting criterion (3.1).

4: If a pre-determined stopping criterion is met, stop the tree building process; otherwise, on every node that is not already split and has not met the stopping criterion, repeat Steps 2-4.

We refer to the result of Algorithm 1 as the initial tree ψ^max. The initial tree ψ^max usually substantially overfits to the data and to reduce overfitting a pruning algorithm uses the initial tree building dataset to create a set of subtrees of ψ^max that are candidates for being the final tree (i.e., the final model). The pruning algorithm is described in detail in Supplementary Appendix A. In short, the pruning algorithm is based on the split complexity, which for a tree ψ is defined as

(3.2)

where Wψ is the set of internal nodes, Gw(ψ) is the value of the splitting criterion for internal node w, |Wψ| is the number of internal nodes, and λ is a positive penalization parameter. The first term in (3.2) is a measure of the treatment effect heterogeneity of the tree and each additional split in the tree is guaranteed to increase (or at least not decrease) the first term. To offset that, the second term penalizes the size of the tree (a measure of the complexity of the model). For a fixed λ, a subtree of the initial tree is said to be an optimal subtree if it maximizes Gλ(ψ). Varying λ from 0 to results in a sequence of subtrees of different sizes, ψ^0=ψ^max,ψ^1,ψ^D^, that all are optimal for different intervals of λ.

In the original classification and regression tree algorithm for outcome prediction (Breiman and others, 1984), the final tree from the sequence of trees created by the pruning algorithm is selected by optimizing a cross-validated objective function. However, as the true treatment effect is unknown for all individuals (we only observe at most one of the potential outcomes YK+1,ia1,c=0 or YK+1,ia0,c=0), there is no observed treatment effect for each individual to compare the treatment effect predictions to in the cross-validation procedure. Hence, cross-validation cannot be applied directly to select the final tree in the generalized interaction tree algorithm. To overcome that, the generalized interaction tree algorithm selects the final tree by estimating the split complexity on the validation dataset. For a given tree ψ^d (d{0,1,,D^}) and a fixed λ, we calculate the validation split complexity by recalculating the split complexity in (3.2) using the validation dataset. The final tree is selected as the tree that maximizes the validation split complexity. For a fixed λ, a split improves the split complexity if the value of the splitting criterion is larger than λ. If there is no treatment effect heterogeneity, the splitting criterion (3.1) follows a χ12 distribution. Hence, selecting λ as the 1α quantile of the χ12 distribution (χ1,1α2) implies that split complexity is improved if and only if the null hypothesis of no difference in treatment effect between the nodes l and r is rejected when using the splitting criterion (3.1) as the test statistic and conducting the test at a level α. Based on this, we propose to select λ as a 1α quantile of the χ12 distribution.

By implementing the initial tree building, pruning, and final tree selection, the generalized interaction tree algorithm stratifies the covariate space L0 into mutually exclusive and exhaustive subgroups w^1,,w^M^ (terminal nodes).

3.2 Subgroup-specific treatment effect estimation

Implementation of the generalized interaction tree algorithm relies on estimating the subgroup-specific treatment effect δ(w)=E[YK+1a1,c=0|L0w]E[YK+1a0,c=0|L0w]. In this section, we describe how TMLE can be used to estimate E[YK+1a,c=0|L0w] for a treatment regime a.

3.2.1 Identifiability

The following identifiability assumptions are needed for the potential outcome mean E[YK+1a,c=0|L0w] to be identifiable (Bang and Robins, 2005; Pearl and Robins, 1995; Robins, 1986):

  • Consistency: For time k, k{0,,K}, and for every individual receiving treatment A¯k=a¯k that has not dropped out of the study, their observed outcome at time k + 1 (Yk+1) is equal to their potential outcome Yk+1a¯k,c¯k=0 and their observed covariate trajectory is equal to their potential covariate trajectory.

  • Sequential exchangeability: At time k, k{0,,K}, the potential outcomes (Yk+1a¯k,c¯k,,YK+1a,c) are independent of treatment Ak and censoring status Ck conditional on past treatment and covariate trajectories.

  • Positivity: For time k, k{0,,K}, and all l¯k such that the density f(L¯k=l¯k,A¯k1=a¯k1,C¯k1=0)>0,
    where A¯1 is defined as the empty set and C¯1=0. The positivity assumption says that conditional on past treatment and covariate history there is a positive probability of continuing to follow the treatment regime of interest.

Using these assumptions, the potential outcome mean can be written as

(3.3)

or as a series of iterated conditional expectations (Robins, 1986)

(3.4)

3.2.2 Estimation

Before describing the longitudinal TMLE we start by describing inverse probability weighting (IPW) (Robins and Rotnitzky, 1992) and g-computation estimators (Robins, 1986) that are sample analogs of expressions (3.3) and (3.4), respectively. Denote the probability of following the treatment regime of interest at time k, k{0,K}, conditional on observed treatment and covariate history and not having dropped out from the study as

Similarly, denote the probability of not dropping out from the study at time k, k{k=0,,K}, conditional on observed treatment and covariate history as

For simplicity of notation, denote gA,k(L¯k,A¯k1;C¯k1=0) as gA,k and gC,k(L¯k,A¯k;C¯k1=0) as gC,k. Define gA,C=k=0KgA,kgC,k and let g^A,C=k=0Kg^A,kg^C,k be an estimator of gA,C using a model for treatment assignment and a model for being censored at each time. Last, let g^A,C,i be a prediction from the model g^A,C for the ith individual (i.e., the predicted probability that the ith individual follows the treatment regime a and is not censored). Using the sample analog of identification result in (3.3), the IPW estimator is given by

If g^A,C consistently estimates gA,C, the IPW estimator is consistent.

The g-computation estimator, sometimes referred to as the iterative conditional expectation estimator (Wen and others, 2021), is calculated by iteratively estimating the conditional expectations in identification result (3.4), where all the estimators for the conditional expectation are restricted to data from subgroup w. That is, the g-computation estimator is implemented using the following steps:

  1. Estimate E[YK+1|A,L] using data from individuals that did not dropout (i.e., C=0). Use the estimator to create predictions for each individual with C¯K1=0 setting their treatment regime to A=a. Denote the predictions by Q^K+1.

  2. Use Q^K+1 as a pseudo-outcome and estimate the conditional expectation of Q^K+1 conditional on A¯K1 and L¯K1 among those with C¯K1=0. Use the estimator to create predictions Q^K. Repeat this iterative process until the conditional expectation that is estimated is only conditional on treatment at the baseline visit A0 and the baseline covariates L0. Denote the resulting predictions by Q^1.

  3. The g-computation estimator is calculated as the subgroup average of the predictions Q^1.

The g-computation estimator is consistent if all the conditional expectation estimators required for its implementation are consistent.

The longitudinal TMLE is similar to the g-computation estimator described above, but it updates the conditional expectation of the outcome in each iterative conditional expectation model using the treatment regime and dropout information, targeted to reduce bias (Kreif and others, 2017; van der Laan and Gruber, 2012). Specifically, in the first step of obtaining the g-computation estimator, we use an additional “targeting” step where we fit an intercept only model using YK+1 as the outcome with Q^K+1 as an offset and with weights I(A=a,C=0)g^A,C, where the treatment assignment and the probability of being censored at each time g^A,C are estimated. We denote the resulting predictions by Q^K+1* and this targeting step is performed for each iterative conditional expectation. The TMLE estimator is then calculated as the subgroup average of the predictions Q^1*. In contrast to the IPW and g-computation estimators, TMLE is doubly robust (van der Laan and Gruber, 2012). That is, it is consistent when either (i) all models for the treatment assignment and the dropout probability are correctly specified or (ii) all models for the conditional expectation of the outcome are correctly specified. It is semi-parametric efficient when all the models are correctly specified (van der Laan and Gruber, 2012). TMLE is a substitution estimator, and is therefore, under mild assumptions, guaranteed to fall within the support of the outcome YK+1. While we focus on TMLE in the remainder of the manuscript, the IPW, g-computation, or other estimators could be used including the estimator developed in Bang and Robins (2005) that is also doubly robust.

3.3 Subgroup discovery for longitudinal data (SDLD) algorithm

The SDLD algorithm is defined by using the longitudinal TMLE as the treatment effect estimator in the generalized interaction tree algorithm. We implemented the SDLD algorithm using the R package rpart to build the initial tree and we constructed the longitudinal TMLE used in the user-specified splitting rules and node-specific estimators in rpart using the R package ltmle (Lendle and others, 2017). In Supplementary Appendix C, we present simulations that show good finite sample performance of the SDLD algorithm for identifying the correct subgroups. The SDLD algorithm splits the covariate space into disjoint and exhaustive subgroups based on maximizing between subgroup treatment effect heterogeneity and within each subgroup a treatment effect estimator is calculated (terminal node estimator). The following theorem is proved in Supplementary Appendix B.

 
Theorem 3.1.

If Assumptions B.1–B.5 in Supplementary Appendix B hold, the SDLD algorithm is L2 consistent.

At each splitting decision, the SDLD algorithm discovers subgroups by searching over all possible covariate and split-point combinations and selecting the combination that maximizes between subgroup heterogeneity estimated by the splitting criterion (3.1). As a result, the observed treatment effect heterogeneity is likely overestimated and that results in bias in the subgroup-specific treatment effect estimators. Data splitting can be used to get unbiased subgroup-specific treatment effect estimators (Fithian and others, 2014; Yang and others, 2022). Data splitting separates the data into two disjoint and exhaustive parts. One part is used for fitting the SDLD algorithm and producing a list of subgroups and the other part is used for subgroup-specific treatment effect estimation and confidence interval construction. As the data used for subgroup discovery and subgroup-specific treatment effect estimation are independent, the TMLE estimators for the subgroup-specific treatment effect inherit the theoretical properties of the TMLE estimator. The downside of data splitting is the reduced sample size used for both fitting the SDLD algorithm and for estimating subgroup-specific estimators of treatment effect, likely resulting in lower power for detecting subgroups and wider confidence intervals for subgroup-specific treatment effect estimators.

4 Discovering subgroups with differential weight gain when on DTG-containing ARTs versus non-DTG-containing ARTs

DTG is an antiretroviral medication that belongs to the class of integrase strand transfer inhibitors. In combination with other medications, DTG has shown promising results for improving viral suppression in clinical trials and is globally used today (Kandel and Walmsley, 2015). A concern with DTG-containing ARTs has been the emergence of substantial weight gain as a potential side effect. A recent phase III trial conducted in South Africa (Venter and others, 2019, 2020) compared two DTG-containing ARTs with standard of care over a 96-week follow-up period. Substantially higher weight gains were seen in the two groups receiving DTG-containing ARTs compared with the standard of care group. Another phase III trial conducted in Cameroon also showed larger weight gains when on DTG-containing ART compared with a low-dose efavirenz-based ART (Calmy and others, 2020; NAMSAL ANRS 12313 Study Group, 2019). Of particular concern is that weight gain associated with DTG-containing ARTs has been shown to be associated with increased truncal fat, a known risk factor for adverse cardiovascular outcomes among people living with HIV (Lake, 2017).

Not all people are expected to be at the same risk for increased weight gain associated with DTG-containing ARTs. In the South African trial previously described, weight gain was greater for females and people with lower CD4 counts or higher viral loads. Similarly, in the Cameroon trial weight gain was more severe among women. Discovering subgroups of people for whom being on DTG-containing ARTs is more likely to cause substantial weight gain can help the development of treatment recommendations or monitoring plans tailored to a person’s risk profile.

The Academic Model Providing Access to Healthcare (AMPATH) partnership is a consortium of research institutions focusing on prevention and treatment of HIV in western Kenya (Tierney and others, 2007). AMPATH administers the AMPATH’s open-source electronic medical record system (AMRS), which is a large-scale EHR database that has information on clinical encounters, lab measurements, and demographic characteristics. We implemented the SDLD algorithm on data from AMRS to discover subgroups with differential effect of DTG-containing ARTs on weight.

We included people living with HIV who were at least 18 years old, were initiating or on ART, and had data on or after July 1, 2016 in AMRS. For each individual, we defined time 0 as July 1, 2016 if the person was enrolled in AMPATH before that time and as the enrollment date at AMPATH if the person was enrolled after July 1, 2016. To structure the data, we used time periods with a length of 200 days and captured data at time k within days [200k,200(k+1)), for k{0,,K+1} (e.g., we used data from days 0 to 199 to create L0,A0,C0 and days 200 to 399 to create L1,A1,C1). We focused on the first five time periods (1000 days) as the data on people that were always on DTG-containing ARTs became sparse over time, resulting in unstable estimates over longer follow-up. We censored individuals at time k if there was no clinical visit in the time periods after time k. As pregnancy leads to natural weight gain, we excluded women that were pregnant at baseline and censored individuals at time k – 1 if they were first recorded as pregnant at time k, k=1,,K. We excluded one individual who had a data entry error whose ART start date was recorded before date of birth and individuals that had no outcome or treatment information available in any time period.

Since there are differences in the expected weight gain between individuals who are newly initiating ART and those who are switching ARTs while virally suppressed (Lakey and others, 2013; Guaraldi and others, 2021), we restricted the analysis to patients who were treatment free at time 0. We defined treatment free as patients who were either newly initiating ART at time 0 or had a viral load measurement that is missing or greater than or equal to 200 (i.e., not virally suppressed) at time 0 and did not have a clinical visit in the 200-day period immediately before time 0. This resulted in an analysis set consisting of 62,073 individuals who had a total of 843,287 unique clinical visits.

The outcome of interest was weight in kilograms at time 4 (i.e., Y4 collected within days 800–999). Treatment at each time was defined as 1 if an individual was receiving a DTG-containing ART and 0 if not. The two treatment regimes of interest were always on and never on DTG-containing ART. We included HIV viral load, systolic blood pressure, diastolic blood pressure, weight, height, whether the individual had active tuberculosis, was being treated for tuberculosis, was married or living with partner, and whether the individual was covered by the National Health Insurance Fund as time-dependent covariates. The vector of baseline covariates L0 included the values of time-varying covariates at time 0, gender, age when starting ART, age at time 0, time on ART at time 0, whether the individual was enrolled on or after July 1, 2016, and whether the individual was newly initiating ART or had been on ART 200 days before time 0. Further details on how we pre-processed the data and how we handled missing data and multiple clinic visits within each time period are provided in Supplementary Appendix D.1.

When implementing the SDLD algorithm, we applied data splitting by randomly selecting 60% of the dataset (37,243 individuals) for tree building using the SDLD algorithm and 40% (24,830 individuals) for subgroup-specific treatment effect estimation. In the tree building dataset, we randomly sampled 80% (29,794 individuals) for initial tree building and used the remaining 20% (7,449 individuals) as the validation set used for final tree selection. As the influence curve-based variance estimate for continuous outcomes implemented by the ltmle package may be anti-conservative, we constructed confidence intervals for the subgroup-specific treatment effects using the non-parametric bootstrap with 1,000 bootstrap samples.

Implementing the SDLD algorithm requires estimating models for the probability of treatment assignment, the probability of dropping out of the study, and the estimation of the pseudo-outcome at each time. For all these models, we used generalized linear regression models with linear and additive main effects for all past covariate and treatment variables fitted using only data in the subgroup under consideration.

Figure 1 shows the estimated average weight had all individuals been always or never on DTG-containing ARTs (left) and the average weight gain when comparing always being on DTG-containing ARTs to never being on DTG-containing ARTs (right) at each time. The confidence intervals are estimated using the non-parametric bootstrap with 10,000 bootstrap samples. The numerical values of the estimates and the corresponding confidence intervals used for making Figure 1 and the number of individuals always and never on DTG-containing ARTs at each time are included in Table S4 in Supplementary Appendix D.2. The results show that sustained treatment with a DTG-containing ART leads to increased weight gain compared with never receiving DTG; the estimated weight gain increases from 1.09 kg at time one (i.e., days 200–399) to 1.30 kg at time four (i.e., days 800–999).

Estimated average weight (left) using TMLE if always and never being on a DTG-containing ART and estimated effect on average weight (right) comparing always versus never being on a DTG-containing ART at each time. Error bars show 95% confidence intervals estimated using the non-parametric bootstrap.
Fig. 1

Estimated average weight (left) using TMLE if always and never being on a DTG-containing ART and estimated effect on average weight (right) comparing always versus never being on a DTG-containing ART at each time. Error bars show 95% confidence intervals estimated using the non-parametric bootstrap.

Figure 2 shows the final tree structure produced by the SDLD algorithm with subgroup-specific treatment effect estimates and associated 95% confidence intervals. The final tree splits on gender. The results suggest that the effects of DTG-containing ARTs are larger for females than males (1.98 vs –0.14 kg) although the 95% bootstrap confidence intervals are slightly overlapping ([-1.08, 0.74] for males and [0.69, 3.17] for females). Results from randomized trials and other data sources support the finding that additional weight gains due to being on DTG-containing ARTs are larger among females than males (Bourgi, Jenkins, and others, 2020; Calmy and others, 2020; Sax and others, 2020; Venter and others, 2020).

Final tree structure when the SDLD algorithm is applied to data from EHRs on people living with HIV in western Kenya to discover subgroups with heterogeneous effects of DTG-containing ARTs on weight gain. Within each subgroup, the first row shows the percentage of the dataset that belongs to the subgroup and the second row shows the estimated weight gain comparing being always versus never on DTG-containing ARTs and associated 95% confidence interval.
Fig. 2

Final tree structure when the SDLD algorithm is applied to data from EHRs on people living with HIV in western Kenya to discover subgroups with heterogeneous effects of DTG-containing ARTs on weight gain. Within each subgroup, the first row shows the percentage of the dataset that belongs to the subgroup and the second row shows the estimated weight gain comparing being always versus never on DTG-containing ARTs and associated 95% confidence interval.

We caution that the subgroup-specific treatment effect estimators only have a causal interpretation if the identifiability and the statistical modeling assumptions are met. A key untestable identifiability assumption is the sequential exchangeability assumption. A recent manuscript (Bourgi and others, 2022) analyzing data on HIV patients in Western Kenya found that gender, CD4 count, weight at baseline, and whether the patient was receiving active tuberculosis therapy were associated with a greater weight gain. In our analysis we account for gender, weight at baseline, viral load, and whether the patient had active tuberculosis. Viral load and whether the patient had active tuberculosis are expected to be highly correlated with CD4 count and whether receiving active tuberculosis therapy, respectively.

Supplementary Appendix D presents additional results including the proportion of underweight and overweight participants across time, distribution of tenofovir disoproxil fumarate containing ARTs when always and never on DTG-containing ARTs, and stability of the SDLD algorithm across data splits. Figure S2 in Supplementary Appendix D compares influence curve based and bootstrap-based confidence intervals and shows that the influence curve-based confidence intervals are much narrower.

In Supplementary Appendix E, we present the analog analysis when the analysis set is not restricted to treatment-free patients. The results are similar to the analysis presented in this section. The final tree from the SDLD algorithm first splits on gender and then makes another split in the female group on whether age at time 0 is less than 42.8 or not. However, the data-split bootstrap-based confidence intervals for younger and older females are highly overlapping. In Supplementary Appendix D, we present the number of times the final tree implemented on the treatment-free analysis set splits on each variable and following gender the variables that are most commonly used for splitting are age at ART initiation and age at time 0.

5 Discussion

We developed the SDLD algorithm for discovering subgroups with heterogeneous treatment effects using longitudinal observational data. The algorithm combines the generalized interaction tree algorithm with longitudinal targeted maximum likelihood estimation and can handle time-varying non-randomized treatment assignment and dropout. We used the algorithm to discover subgroups of individuals with differential weight gain when receiving DTG-containing ARTs using data from AMPATH. Our results suggest that gender is the primary modifier of weight gain from receiving DTG-containing ARTs versus non-DTG-containing ARTs, with females gaining more weight than males. This result is consistent with prior literature where subgroup analyses using pre-specified subgroups (not data-driven subgroup discovery) showed larger weight gains among females.

Our analysis focused on comparing DTG-containing ARTs to other non-DTG-containing ARTs and did not distinguish among different antiretroviral medications included in DTG-containing ARTs. Previous studies have evaluated different DTG-containing ARTs (Venter and others, 2019) and the results show that the weight gains might differ depending on what medications DTG is combined with. Further investigation into specific DTG-containing ARTs might provide a more fine-grained understanding of the effect of DTG on weight gain. When implementing the SDLD algorithm there is a trade-off between identifying spurious subgroups (type I like error) and failing to identify true subgroups (type II like error). That trade-off is largely controlled through the selection of the tuning parameter λ where larger values of λ result in smaller trees (less type I like errors). In the analysis presented in Section 4, we have taken a rather conservative approach by selecting λ as the 95th quantile of the χ12 distribution, but if the goal of the analysis is more exploratory a smaller λ value can be used (e.g., the 90th quantile of the χ12 distribution).

The SDLD algorithm defines subgroups based on baseline covariates. Extending the algorithm to dynamically update the subgroups when new information becomes available is of interest (Sun and others, 2020). Although our application focuses on always and never receiving DTG-containing ARTs, the SDLD algorithm can accommodate other treatment regimes as well. Furthermore, the SDLD algorithm results in a single tree structure that creates an interpretable treatment effect stratification rule by partitioning the covariate space into identifiable subgroups with differential treatment effects. For computational simplicity, we used generalized linear regression to estimate the nuisance functions required for implementation of the TMLE. But estimating the nuisance functions using the super learner (van der Laan and others, 2007) provides an appealing alternative that allows for more flexible estimation that is less susceptible to model misspecification.

Funding

This work was supported in part by Patient-Centered Outcomes Research Institute (PCORI) awards [ME-2019C3-17875, ME-2021C2-22365]; National Library of Medicine (NLM) grant [R01LM013616]; and National Institute of Allergy and Infectious Diseases (NIAID) awards [P30AI042853, R01AI167694, K24AI134359]. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NLM, NIAID, PCORI, PCORI’s Board of Governors, or PCORI’s Methodology Committee.

Supplementary material

Supplementary material is available at http://biostatistics.oxfordjournals.org.

Because the dataset in Section 4 cannot be made publicly available, we simulated a dataset with similar structure. Details of the simulated dataset and the results we obtained when applying the SDLD algorithm to this dataset are provided in Supplementary Appendix C.3. The simulated dataset, and the R code used to implement the SDLD algorithm on the dataset are available at github.com/jiabei-yang/SDLD.

Conflict of Interest

None declared.

References

Achhra
A. C.
,
Mocroft
A.
,
Reiss
P.
,
Sabin
C.
,
Ryom
L.
,
De Wit
S.
,
Smith
C. J.
,
d’Arminio Monforte
A.
,
Phillips
A.
,
Weber
R.
and others. (
2016
).
Short-term weight gain after antiretroviral therapy initiation and subsequent risk of cardiovascular disease and diabetes: the d: A: D study
.
HIV Medicine
17
,
255
268
.

Athey
S.
,
Imbens
G.
(
2016
).
Recursive partitioning for heterogeneous causal effects
.
Proceedings of the National Academy of Sciences of the United States of America
113
,
7353
7360
.

Bang
H.
,
Robins
J. M.
(
2005
).
Doubly robust estimation in missing data and causal inference models
.
Biometrics
61
,
962
973
.

Bourgi
K.
,
Jenkins
C. A.
,
Rebeiro
P. F.
,
Shepherd
B. E.
,
Palella
F.
,
Moore
R. D.
,
Althoff
K. N.
,
Gill
J.
,
Rabkin
C. S.
,
Gange
S. J.
and others. (
2020
).
Weight gain among treatment-naïve persons with HIV starting integrase inhibitors compared to non-nucleoside reverse transcriptase inhibitors or protease inhibitors in a large observational cohort in the United States and Canada
.
Journal of the International AIDS Society
23
,
e25484
.

Bourgi
K.
,
Rebeiro
P. F.
,
Turner
M.
,
Castilho
J. L.
,
Hulgan
T.
,
Raffanti
S. P.
,
Koethe
J. R.
,
Sterling
T. R.
(
2020
).
Greater weight gain in treatment-naive persons starting dolutegravir-based antiretroviral therapy
.
Clinical Infectious Diseases
70
,
1267
1274
.

Bourgi
K.
,
Ofner
S.
,
Musick
B.
,
Griffith
B.
,
Diero
L.
,
Wools-Kaloustian
K.
,
Yiannoutsos
C. T.
,
Gupta
S. K
. (
2022
).
Weight gain among treatment-naïve persons with HIV receiving dolutegravir in Kenya
.
JAIDS Journal of Acquired Immune Deficiency Syndromes
91
,
490
496
.

Breiman
L.
,
Friedman
J.
,
Stone
C. J.
,
Olshen
R. A.
(
1984
).
Classification and regression trees
.
CRC press
.

Calmy
A.
,
Sanchez
T. T.
,
Kouanfack
C.
,
Mpoudi-Etame
M.
,
Leroy
S.
,
Perrineau
S.
,
Wandji
M. L.
,
Tata
D. T.
,
Bassega
P. O.
,
Bwenda
T. A.
and others. (
2020
).
Dolutegravir-based and low-dose efavirenz-based regimen for the initial treatment of HIV-1 infection (NAMSAL): Week 96 results from a two-group, multicentre, randomised, open label, phase 3 non-inferiority trial in cameroon
.
The Lancet HIV
7
,
e677
e687
.

Dahabreh
I. J.
,
Hayward
R.
,
Kent
D. M.
(
2016
).
Using group data to treat individuals: Understanding heterogeneous treatment effects in the age of precision medicine and patient-centred evidence
.
International Journal of Epidemiology
45
,
2184
2193
.

Esber
A. L.
,
Chang
D.
,
Iroezindu
M.
,
Bahemana
E.
,
Kibuuka
H.
,
Owuoth
J.
,
Singoei
V.
,
Maswai
J.
,
Dear
N. F.
,
Crowell
T. A.
and others. (
2022
).
Weight gain during the dolutegravir transition in the African cohort study
.
Journal of the International AIDS Society
25
,
e25899
.

Fithian
W.
,
Sun
D.
,
Taylor
J.
(
2014
). Optimal inference after model selection. arXiv preprint arXiv:1410.2597.

Foster
J. C.
,
Taylor
J. M. G.
,
Ruberg
S. J.
(
2011
).
Subgroup identification from randomized clinical trial data
.
Statistics in Medicine
30
,
2867
2880
.

Guaraldi
G.
,
Calza
S.
,
Milic
J.
,
Calcagno
A.
,
Focà
E.
,
Rota
M.
,
Renzetti
S.
,
Celotti
A.
,
Siano
M.
,
Celesia
B. M.
,
Piconi
S
. (
2021
).
Dolutegravir is not associated with weight gain in antiretroviral therapy experienced geriatric patients living with HIV
.
AIDS
35
,
939
.

Herrin
M.
,
Tate
J. P.
,
Akgün
K. M.
,
Butt
A. A.
,
Crothers
K.
,
Freiberg
M. S.
,
Gibert
C. L.
,
Leaf
D. A.
,
Rimland
D.
,
Rodriguez-Barradas
M. C.
and others. (
2016
).
Weight gain and incident diabetes among HIV infected-veterans initiating antiretroviral therapy compared to uninfected individuals
.
Journal of Acquired Immune Deficiency Syndromes (1999)
73
,
228
.

Kandel
C. E.
,
Walmsley
S. L.
(
2015
).
Dolutegravir—a review of the pharmacology, efficacy, and safety in the treatment of HIV
.
Drug Design, Development and Therapy
9
,
3547
.

Kim
D. J.
,
Westfall
A. O.
,
Chamot
E.
,
Willig
A. L.
,
Mugavero
M. J.
,
Ritchie
C.
,
Burkholder
G. A.
,
Crane
H. M.
,
Raper
J. L.
,
Saag
M. S.
and others. (
2012
).
Multimorbidity patterns in HIV-infected patients: The role of obesity in chronic disease clustering
.
Journal of Acquired Immune Deficiency Syndromes (1999)
61
,
600
.

Kreif
N.
,
Tran
L.
,
Grieve
R.
,
De Stavola
B.
,
Tasker
R. C.
,
Petersen
M.
(
2017
).
Estimating the comparative effectiveness of feeding interventions in the pediatric intensive care unit: A demonstration of longitudinal targeted maximum likelihood estimation
.
American Journal of Epidemiology
186
,
1370
1379
.

Lake
J. E.
(
2017
).
The fat of the matter: Obesity and visceral adiposity in treated HIV infection
.
Current HIV/AIDS Reports
14
,
211
219
.

Lakey
W.
,
Yang
L. Y.
,
Yancy
W.
,
Chow
S. C.
,
Hicks
C
. (
2013
).
From wasting to obesity: initial antiretroviral therapy and weight gain in HIV-infected persons
.
AIDS research and human retroviruses
29
,
435
440
.

Lendle
S. D.
,
Schwab
J.
,
Petersen
M. L.
,
van der Laan
M. J
. (
2017
).
ltmle: an R package implementing targeted minimum loss-based estimation for longitudinal data
.
Journal of Statistical Software
81
,
1
21
.

NAMSAL ANRS 12313 Study Group
. (
2019
).
Dolutegravir-based or low-dose efavirenz-based regimen for the treatment of HIV-1
.
New England Journal of Medicine
381
,
816
826
.

Pearl
J.
,
Robins
J. M.
(
1995
). Probabilistic evaluation of sequential plans from causal models with hidden variables. In:
UAI
, Vol.
95
.
Citeseer
. pp.
444
453
.

Petersen
M.
,
Schwab
J.
,
Gruber
S.
,
Blaser
N.
,
Schomaker
M.
,
van der Laan
M.
(
2014
).
Targeted maximum likelihood estimation for dynamic and static longitudinal marginal structural working models
.
Journal of Causal Inference
2
,
147
185
.

Phillips
A. N.
,
Venter
F.
,
Havlir
D.
,
Pozniak
A.
,
Kuritzkes
D.
,
Wensing
A.
,
Lundgren
J. D.
,
De Luca
A.
,
Pillay
D.
,
Mellors
J.
and others. (
2019
).
Risks and benefits of dolutegravir-based antiretroviral drug regimens in sub-saharan Africa: A modelling study
.
The Lancet HIV
6
,
e116
e127
.

Robins
J.
(
1986
).
A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect
.
Mathematical Modelling
7
,
1393
1512
.

Robins
J. M.
,
Greenland
S.
(
2000
).
Causal inference without counterfactuals: Comment
.
Journal of the American Statistical Association
95
,
431
435
.

Robins
J. M.
,
Rotnitzky
A.
(
1992
). Recovery of information and adjustment for dependent censoring using surrogate markers. In:
AIDS Epidemiology: methodological issues
.
Boston, MA
:
Birkhäuser Boston
, pp.
297
331
.

Robins
J. M.
,
Rotnitzky
A.
,
Zhao
L. P.
(
1994
).
Estimation of regression coefficients when some regressors are not always observed
.
Journal of the American Statistical Association
89
,
846
866
.

Rubin
D. B.
(
1974
).
Estimating causal effects of treatments in randomized and nonrandomized studies
.
Journal of Educational Psychology
66
,
688
.

Sax
P. E.
,
Erlandson
K. M.
,
Lake
J. E.
,
Mccomsey
G. A.
,
Orkin
C.
,
Esser
S.
,
Brown
T. T.
,
Rockstroh
J. K.
,
Wei
X.
,
Carter
C. C.
and others. (
2020
).
Weight gain following initiation of antiretroviral therapy: Risk factors in randomized comparative clinical trials
.
Clinical Infectious Diseases
71
,
1379
1389
.

Seibold
H.
,
Zeileis
A.
,
Hothorn
T.
(
2016
).
Model-based recursive partitioning for subgroup analyses
.
The International Journal of Biostatistics
12
,
45
63
.

Steingrimsson
J. A.
,
Yang
J.
(
2019
).
Subgroup identification using covariate-adjusted interaction trees
.
Statistics in Medicine
38
,
3974
3984
.

Stitelman
O. M.
,
De Gruttola
V.
,
van der Laan
M. J.
(
2012
).
A general implementation of TMLE for longitudinal data applied to causal inference in survival analysis
.
The International Journal of Biostatistics
8
, Article
26
.

Su
X.
,
Meneses
K.
,
McNees
P.
,
Johnson
W. O.
(
2011
).
Interaction trees: Exploring the differential effects of an intervention programme for breast cancer survivors
.
Journal of the Royal Statistical Society: Series C (Applied Statistics
)
60
,
457
474
.

Su
X.
,
Tsai
C.-L.
,
Wang
H.
,
Nickerson
D. M.
,
Li
B.
(
2009
).
Subgroup analysis via recursive partitioning
.
Journal of Machine Learning Research
10
,
141
158
.

Su
X.
,
Zhou
T.
,
Yan
X.
,
Fan
J.
,
Yang
S.
(
2008
).
Interaction trees with censored survival data
.
The International Journal of Biostatistics
4
,
Article 2
.

Sun
Y.
,
Chiou
S. H.
,
Wu
C. O.
,
McGarry
M.
,
Huang
C.-Y.
(
2020
). Dynamic risk prediction using survival tree ensembles with application to cystic fibrosis. arXiv preprint arXiv:2011.07175.

Surial
B.
,
Mugglin
C.
,
Calmy
A.
,
Cavassini
M.
,
Günthard
H. F.
,
Stöckle
M.
,
Bernasconi
E.
,
Schmid
P.
,
Tarr
P. E.
,
Furrer
H.
and others. (
2021
).
Weight and metabolic changes after switching from tenofovir disoproxil fumarate to tenofovir alafenamide in people living with HIV: A cohort study
.
Annals of Internal Medicine
174
,
758
767
.

Tierney
W. M.
,
Rotich
J. K.
,
Hannan
T. J.
,
Siika
A. M.
,
Biondich
P. G.
,
Mamlin
B. W.
,
Nyandiko
W. M.
,
Kimaiyo
S.
,
Wools-Kaloustian
K.
,
Sidle
J. E.
and others. (
2007
).
The AMPATH medical record system: Creating, implementing, and sustaining an electronic medical record system to support HIV/AIDS care in Western Kenya
.
Studies in Health Technology and Informatics
129
,
372
.

Van der Laan
M. J.
,
Polley
E. C.
,
Hubbard
A. E
. (
2007
).
Super learner
.
Statistical applications in genetics and molecular biology
6
.

van der Laan
M. J.
,
Gruber
S.
(
2012
).
Targeted minimum loss based estimation of causal effects of multiple time point interventions
.
The International Journal of Biostatistics
8
, Article
9
.

Van der Laan
M. J.
,
Rose
S.
(
2018
).
Targeted Learning in Data Science
.
Cham
:
Springer International Publishing
.

Van der Laan
M. J.
,
Rose
S.
(
2011
).
Targeted Learning: Causal Inference for Observational and Experimental Data
, Vol.
4
.
New York
:
Springer
.

Van Der Laan
M. J.
,
Rubin
D.
(
2006
).
Targeted maximum likelihood learning
.
The International Journal of Biostatistics
2
, Article
11
.

Venter
W. D. F.
,
Moorhouse
M.
,
Sokhela
S.
,
Fairlie
L.
,
Mashabane
N.
,
Masenya
M.
,
Serenata
C.
,
Akpomiemie
G.
,
Qavi
A.
,
Chandiwana
N.
and others. (
2019
).
Dolutegravir plus two different prodrugs of tenofovir to treat HIV
.
New England Journal of Medicine
381
,
803
815
.

Venter
W. D. F.
,
Sokhela
S.
,
Simmons
B.
,
Moorhouse
M.
,
Fairlie
L.
,
Mashabane
N.
,
Serenata
C.
,
Akpomiemie
G.
,
Masenya
M.
,
Qavi
A.
and others. (
2020
).
Dolutegravir with emtricitabine and tenofovir alafenamide or tenofovir disoproxil fumarate versus efavirenz, emtricitabine, and tenofovir disoproxil fumarate for initial treatment of HIV-1 infection (advance): Week 96 results from a randomised, phase 3, non-inferiority trial
.
The Lancet HIV
7
,
e666
e676
.

Wei
Y.
,
Liu
L.
,
Su
X.
,
Zhao
L.
,
Jiang
H.
(
2020
).
Precision medicine: Subgroup identification in longitudinal trajectories
.
Statistical Methods in Medical Research
29
,
2603
2616
.

Wen
L.
,
Young
J. G.
,
Robins
J. M.
,
Hernán
M. A.
(
2021
).
Parametric g-formula implementations for causal survival analyses
.
Biometrics
77
,
740
753
.

World Health
Organization. (
2018
). Updated recommendations on first-line and second-line antiretroviral regimens and post-exposure prophylaxis and recommendations on early infant diagnosis of HIV: Interim guidelines: Supplement to the 2016 consolidated guidelines on the use of antiretroviral drugs for treating and preventing HIV infection. Technical Report, World Health Organization.

Yang
J.
,
Dahabreh
I. J.
,
Steingrimsson
J. A.
(
2022
).
Causal interaction trees: Finding subgroups with heterogeneous treatment effects in observational data
.
Biometrics
78
,
624
635
.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/pages/standard-publication-reuse-rights)

Supplementary data