-
PDF
- Split View
-
Views
-
Cite
Cite
Juhee Lee, Peter F. Thall, Bora Lim, Pavlos Msaouel, Utility-Based Bayesian Personalized Treatment Selection for Advanced Breast Cancer, Journal of the Royal Statistical Society Series C: Applied Statistics, Volume 71, Issue 5, November 2022, Pages 1605–1622, https://doi.org/10.1111/rssc.12582
- Share Icon Share
Abstract
A Bayesian method is proposed for personalized treatment selection in settings where data are available from a randomized clinical trial with two or more outcomes. The motivating application is a randomized trial that compared letrozole plus bevacizumab to letrozole alone as first-line therapy for hormone receptor-positive advanced breast cancer. The combination treatment arm had larger median progression-free survival time, but also a higher rate of severe toxicities. This suggests that the risk-benefit trade-off between these two outcomes should play a central role in selecting each patient's treatment, particularly since older patients are less likely to tolerate severe toxicities. To quantify the desirability of each possible outcome combination for an individual patient, we elicited from breast cancer oncologists a utility function that varied with age. The utility was used as an explicit criterion for quantifying risk-benefit trade-offs when making personalized treatment selections. A Bayesian nonparametric multivariate regression model with a dependent Dirichlet process prior was fit to the trial data. Under the fitted model, a new patient's treatment can be selected based on the posterior predictive utility distribution. For the breast cancer trial dataset, the optimal treatment depends on the patient's age, with the combination preferable for patients 70 years or younger and the single agent preferable for patients older than 70.
1 INTRODUCTION
Nearly all published clinical trial results focus on statistical inferences about effects of treatments and patient prognostic variables on clinical outcomes. This may fall short of what is needed by practicing physicians to make informed treatment decisions for individual patients. In many settings, estimated effects on efficacy and toxicity lead to conflicting treatment choices, and the relative desirability of two treatments also may vary with patient prognostic variables. Our motivating dataset, which illustrates this class of problems, arose from a phase III study of targeted agents for treating hormone receptor-positive advanced breast cancer, reported by Dickler et al. (2016). Patients were randomized between letrozole plus bevacizumab (L + B) and letrozole plus placebo (L). The primary efficacy endpoint was progression-free survival (PFS) time, defined as the time from the treatment to disease progression or death from any cause. Due to safety concerns, 21 different types of toxicity were monitored, including the type and grade (0 = none to 5 = fatal) of each occurrence. A statistically significant PFS improvement was seen with L + B compared to L (one-sided p-value = 0.016), with estimated median PFS 20.2 months (95% confidence interval, CI, 17.0–24.1) with L + B compared to 15.6 months (95% CI 12.9 – 19.7) with L. Consideration of toxicities led to the opposite conclusion, with 46.8% of patients treated with L + B experiencing severe (grade ≥ 3) toxicities compared to 14.2% with L.
Considering each outcome alone, selecting an optimal treatment is straightforward, since longer PFS and less toxicity each is more desirable. This leads to the problematic conclusions that L + B is preferable in terms of PFS but L is preferable in terms of toxicity. Thus, when considering these two outcomes together, as must be done in practice by a physician when choosing between the treatments for an individual patient, decision making is not straightforward. In our analyses, rather than dichotomizing toxicity severity, we will use total toxicity burden (TTB) (Bekele & Thall, 2004; Le-Rademacher et al., 2020) to summarize each patient's adverse events. In general, for K toxicities , where each is a grade in {0, …, J − 1}, we define the scaled TTB to be which takes on values between 0 and 1. Figure 1 illustrates the TTB distributions and Kaplan–Meier estimates of the PFS survival function for each treatment arm in the breast cancer dataset, with L + B represented by blue and L by red. The questions that we will address in this paper are how one may use the available data to choose between L + B and L for a new breast cancer patient, and how this may be done in other, similar settings.
![For the breast cancer data, histograms of scaled total toxicity burden (TTB) are given in panels (a) and (b). Panel (c) illustrates Kaplan–Meier estimates of survival functions, S(t). Blue and red represent treatments, letrozole plus bevacizumab (L + B) and letrozole plus placebo (L) respectively. [Colour figure can be viewed at https://dbpia.nl.go.kr]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssc/71/5/10.1111_rssc.12582/1/m_rssc_71_5_1605_fig-1.jpeg?Expires=1748064762&Signature=z2b3US3L1GfS67Mo4lRXwX0OgQDsf08oYzyLuTVaLoXyFi9EcKnVEyH0Wh-OeOLiAeVbRi-U2HJplAvIBkiIf5XJX9EAuwR2rT99h0rMT53MW0kFlA8sBdoEHGz1gVvyW3lvBvJMtGCymY1Wvh3WiAjkLJvWm5Fh7FkBpJdvvSWMdGt8rm~gn8BL7SsgsLRAL3DPUOTE269Z6H7WwEC9LrXOkTpzwyByVnJDV96v1ewH7qpBYTM8TtL1Vxjr2g6jGNSgJuMmN97EqfELzZA3o8~GoyEVBOkFe9zHzTGbwbekzHRTgWNDYEbBJ1oGozm0X8HE-Ma~9uGz3fAjBb~XPw__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
For the breast cancer data, histograms of scaled total toxicity burden (TTB) are given in panels (a) and (b). Panel (c) illustrates Kaplan–Meier estimates of survival functions, S(t). Blue and red represent treatments, letrozole plus bevacizumab (L + B) and letrozole plus placebo (L) respectively. [Colour figure can be viewed at https://dbpia.nl.go.kr]
We assume that each patient's data can be summarized as a vector, y, of clinical outcomes, a vector, x, of prognostic covariates, and a treatment indicator variable, τ. A Bayesian regression model, f( y | τ, x, θ) is assumed and fit to the data, where θ denotes the model's parameter vector. For the breast cancer trial data, y = with = PFS time and = TTB. Using the breast cancer trial data for illustration, we extend the usual statistical process of data analysis by connecting it with medical decision making by practicing physicians. To do this, we first construct a family of utility functions, with each utility assigning numerical desirability scores U(y, x) to all y = (PFS time, TTB) pairs for a patient with prognostic variables x. In a given setting, a physician and patient may choose a particular utility function from the family that best represents the patient's subjective trade-offs. Because the trade-off between PFS and TTB may vary with x, the desirability of a particular pair of treatment options may not be the same for all patients, and this may lead to different treatment preferences for two patients having different x.
Decision analysis based on utility functions certainly is not new. This has been studied and applied in many areas, including business (e.g. Loewenstein et al., 1989; Pennings & Smidts, 2003), engineering (e.g. Bagočius et al., 2014; Chen et al., 1998) and operations research (e.g. Roy et al., 2017; Walsh et al., 2004). Two papers of a five-part primer on medical decision analysis are given by Detsky et al. (1997) and Naglie et al. (1997). However, formal utility-based decision procedures are seldom included in statistical data analysis reports. Our application of the methodology to the breast cancer dataset illustrates how a utility function and Bayesian statistical model can be used to choose between two treatments for a patient with given prognostic variables. To establish the idea that outcome utilities can be used as practical tools for decision making, and illustrate the range of potential applications, Supplementary Section 2 provides examples of utility functions for different types of outcome vectors. These include a one-dimensional ordinal outcome, binary (response, toxicity) indicators, and the two ordinal categorical outcomes (disease status, toxicity severity).
For the breast cancer data analysis, we develop a robust Bayesian regression model, f( y | τ, x, θ), that assumes latent patient frailties to account for association among the elements of y and z, and describes how each outcome varies as a function of treatment, τ, and baseline covariates, x. We formulate a joint Bayesian nonparametric (BNP) multivariate regression model that includes a vector of continuous latent variables defined to represent ordinal toxicity outcomes, using the dependent Dirichlet process (DDP) (MacEachern, 1999). We use a linear DDP developed by De Iorio et al. (2004, 2009). DDP models are highly flexible and provide a robust basis for inferences about regression relationships. BNP models have been applied to a broad range of statistical problems, including density estimation, regression, clustering and survival analysis. See, for example, Müller et al. (2015) for general applications, Mitra and Müller (2015) for applications in biostatistics, or Müller and Mitra (2013); Thall et al. (2017) for overviews and illustrations. Given a Bayesian model f( y | τ, x, θ) and utility function U( y, x), we use posterior predictive utility distributions as a basis for deciding between treatments for a new patient with prognostic variables x. We choose the treatment that yields the greatest posterior mean utility.
The remainder of the paper is organized as follows. Section 2.1 formulates a bivariate regression model for clinical outcomes PFS time and TTB, and provides a predictive distribution of (PFS, TTB) for a future patient as a function of the patient's covariates and each potential treatment that may be given to the patient. Section 2.2 provides computational details for implementation. In Section 3, we describe a utility function that varies with PFS, TTB and covariates in order to represent a personalized risk-benefit trade-off between PFS and TTB. The utility function is constructed using separate contributions from PFS time and TTB, with each contribution constrained so that it is logically consistent and reflects elicited expert opinion. The predictive distribution allows one to estimate the utility of each treatment for a future patient with given covariates, and provides a way to compute the probability that each treatment is preferred for the future patient. In Section 4, we illustrate the methodology by applying it to make personalized treatment selections based on the breast cancer trial data. In Section 5, a simulation study is presented to illustrate general properties of the proposed decision-making approach. We close with a brief discussion in Section 6.
2 A BNR MODEL
2.1 Sampling distribution and prior specification
Let denote PFS time during the follow-up period , for patient i = 1, …, n. The observed time of failure (progression or death) or independent administrative censoring at is with if PFS time was observed, , and if censored, . Denote the ordinal variable for the maximum grade that patient i experienced of toxicity type k = 1, …, K. Censoring is assumed to be independent of , toxicity occurrences, and covariates. Denote the ith patient's vector of baseline covariates by and the observed data by , where . In the breast cancer dataset, there are n = 340 patients after removing three patients having , K = 21 toxicity categories and J = 6 severity grades, where grade 0 = no occurrence of that toxicity type and 5 = death. If a subject died due to a toxicity type k occurrence, the corresponding was recorded with observed survival time and . We include P = 3 prognostic covariates, = age, an indicator of measurable disease at baseline, and an indicator of whether the patient's disease free interval prior to trial entry was greater than 24 months, in addition to the indicator τ of treatment L + B.
To construct a model that accounts for heterogeneity between patients not explained by the covariates, we introduce real-valued (K + 1)−dimensional multivariate normal latent frailty vectors, for i = 1, …, n. We assume and . Following Chib and Greenberg (1998), we construct a multinomial probit model for the ordinal toxicity outcomes by introducing the unobserved real-valued latent variables , where , and define if and only if , where denote toxicity type-specific cutoffs for each k. This is a common modelling strategy that uses real-valued latent variables to induce a tractable multivariate distribution on a vector of observed ordinal categorical variables, and greatly facilitates computation. For logarithm transformed PFS time, latent variables treatment and covariates , we assume
We take a BNP approach for modelling h in (1) that allows flexible regression structures by assuming the DDP (MacEachern, 1999), which is a family of random probability distributions indexed by (τ, x). Specifically, we use a linear-DDP that induces covariate dependence through linear regression structures (De Iorio et al., 2004, 2009). Denoting , and , we assume the simple parametric linear combinations and , for each k = 1, …, K. We denote and construct a model for h through a convolution with a normal kernel
where is the density function of the d−variate normal distribution with mean vector a and d × d covariance matrix B > 0. We use the Dirichlet process (DP) as a prior for the random mixing distribution G in (2). This gives a DP mixture of multivariate normal linear models,
where with and . The weights are constructed via Sethuraman's (1994) so-called ‘stick-breaking’ process by assuming , with fixed ξ > 0. For the covariate and treatment effect parameters in (3), we assume
with and fixed, where represents a (P + 2)-dimensional multivariate normal distribution. In (4), and are (P + 2)-dimensional mean vectors and is a (P + 2) × (P + 2) matrix. We assume with fixed and , and with fixed and for p = 0, …, P. The hierarchical structure for enables the model to borrow information across the toxicity categories. The model in (3) incorporates τ and x linearly in the mean of each normal summand. Due to the fact that the distribution of is a weighted average of multivariate normal distributions, each with its own linear term, the model accounts for possible effects of τ and x on that can be nonlinear and quite complex, including interactions between two or more variables in (τ, x). This construction thus provides a flexible modelling framework for inference and prediction, avoiding restrictive assumptions of linearity or additivity in the covariate effects. This facilitates accurate decision making by avoiding restrictions imposed by conventional parametric models, such as the proportional hazards model. We let , which implies conditional independence between and given . Due to the conditional independence, the marginal distributions from (3) can be expressed as the weighted averages
That is, f and each also is a linear DDP mixture distribution. The marginal distribution of each is obtained by integrating over the latent variables,
Marginalizing by averaging over s in (3), the resulting DP mixture model, , has covariance matrix ∑ + Ω. Thus, Ω induces dependence between PFS time and the toxicities within each patient, in addition to explaining additional variability between patients not explained by τ and x.
To ensure identifiability in the multivariate ordinal regression model, we fix and set for all k. We also set , and , and and . We let the cut-offs , j = 2, …, J−1 be random for flexibility, by defining k = 1, …, K and with error terms , for j = 2, …, J − 1. Lastly, we assume .
2.2 Posterior inference
Collecting terms, is the vector of all model parameters, and is the vector of all fixed hyper-parameters. Given and data , the joint posterior of θ and the patient-specific random effects is
where the joint likelihood of the observed data for the ith patient is the product
We use Markov chain Monte Carlo (MCMC) simulation to generate posterior samples of the parameter and latent variable vectors, (θ, s). For computational convenience, we approximate the DDP in (5) by truncating the infinite number of mixture components of F and to the finite value M. The final weight is set to to ensure that F and are proper distributions. For sufficiently large M, the truncated sum produces inferences virtually identical to those with the infinite sum (Ishwaran & James, 2001; Rodriguez & Dunson, 2011). As discussed in Rodriguez and Dunson (2011), if there is a discrepancy between the posterior distributions under the truncated and infinite sums, then the model is sensitive to the choice of M. Any value of M that has a small value for is sufficiently large to produce a negligible discrepancy. We examined the posterior distribution of , and assessed sensitivity of the model to several different M values for the breast cancer dataset. We found that the truncated process is robust to the choice of M, if M is sufficiently large. This led us to use M = 15 for the data analysis and simulation studies. Computational details are given in Supplementary Section 1.1. A computer program ‘utility-analysis’ for fitting the proposed model is available from https://users.soe.ucsc.edu/juheelee/.
3 UTILITY FUNCTIONS FOR PFS AND TTB
In this section, we describe how a utility function was constructed for the breast cancer data analysis. While we focus on the case where y consists of PFS time and TTB, the methodology may be applied generally in settings where y is a single variable, a bivariate binary or ordinal variable, or some combination of two or more discrete and continuous outcomes. Examples are given in Supplementary Section 2.
Given the reduction of the K-dimensional toxicity vector z to the scaled TTB q, 0 ≤ q ≤ 1, we will construct a utility function for the pair (t, q). A departure of our utility formulation from previous published outcome utilities is that we construct U so that it varies with covariates x as well as the outcomes (t, q). We let τ = 1 for L + B and τ = 0 for L, so choosing τ for a future patient with a covariates is the target of our decision analysis. To ensure a consistent utility function that quantifies trade-offs between t and q for each x, we require
That is, considered individually with the other outcome variable fixed, smaller TTB and longer PFS each must be more desirable.
The form of the utility function given here, and the numerical values that it takes on, were obtained based on the consensus of two medical oncologists who are co-authors of this paper, PM and BL, one of whom is a breast cancer subspecialist. The first step of our construction was to specify a parametric total utility function, which we defined generally as the product
subject to the constraints and . For a given x, the utility component acts multiplicatively to decrease the utility component , and one may regard multiplying by as penalizing the PFS utility, where the magnitude of the penalty is determined by q.
To apply this to the breast cancer dataset, we constructed a functional form for (8) to reflect this particular treatment setting. We denote the prognostic covariates by , an indicator of measurable disease, and an indicator of whether the patient's disease free interval prior to trial entry was > 24 months, with all three included in the regression models for outcomes t and z. While the randomization for the breast cancer trial was stratified by and to improve precision, based on clinical experience PM and BL decided that neither nor should have any effect on the utility function, whereas is included in This is because, in clinical practice, the utility gained by greater PFS is similar regardless of age group, while older patients tend to care more about maintaining a good quality of life, quantified by a lower TTB. Thus, older patients are less likely than younger patients to accept a higher level of toxicity for the same PFS benefit. Accordingly, we only used the prognostic covariate in the utility function, and constructed so that, for any PFS time t, a given value q > 0 for TTB decreases the utility for an older patient more than for a young patient. We also assumed that . We set since the domain (0, 100) is easy to interpret, and set for any Age. Thus, if a patient has no toxicity (q = 0) then . For example, if and , then the utility of 60-month PFS time and TTB = 0.5 is for a 60-year-old patient, whereas for a 40-year-old patient gives the much higher total utility .
For the PFS component, the clinicians PM and BL specified the particular values , = 95, and required for patients with any Age. This was based on the clinical experience that, in the hormone receptor-positive metastatic breast cancer setting, the increase in utility with increasing PFS is linear for up to 4 years, after which there is generally not much increase in utility with increasing PFS. This is due, in part, to the availability of newer regimens that can be given as salvage therapy to patients whose disease progresses after 4 years. To reflect this, we constructed the following parametric function for :
and we set = 48 months, = 95, and = 100. This function increases in t up to 48 months, with a small additional increase for t > 48 months. Since , the equation gives a = 0.926. Similarly, since this gives . The resulting function is plotted in Figure 2(a).
![Illustration of utility functions. Panel (a) has UPFS(t), the utility function of progression-free survival (PFS) time with t = PFS time. Panels (b)–(d) have the total utility functions Utot(t,q,Age), where t = PFS time and q = total toxicity burden (TTB). [Colour figure can be viewed at https://dbpia.nl.go.kr]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssc/71/5/10.1111_rssc.12582/1/m_rssc_71_5_1605_fig-2.jpeg?Expires=1748064762&Signature=a3Ei0UfdBnDGZGUw4a1OPTO4qkSCyEFZYFHkNEwVP0KxXnYdBcBBHgal4fzBZbkAiH99BEYgo2HJ-MeJlnJMfjtn7DSMU-ckrYp1RkkMhK5ODAqz8K2nITttMMYzVPOKeVnKnzjRGTMuyUtP8M-obTdj1EWoPR4xv6lNyA92bOKtFvqqTkMRXVdesT7J3RGJ5BI0iRk9nB4y1WxukndHSk0z7dCK6R6P~T5NdIIJxV2CV-J8-HBwdp4j1OwlLRwEbiUKLMYPhezp5hBgu415fQwHfwd2z57WtXmT9JZcGXlRO2xGtkIcKqY3um~I4-KY2d-Nn3X3F2aEZ1iOgYHyvQ__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Illustration of utility functions. Panel (a) has , the utility function of progression-free survival (PFS) time with t = PFS time. Panels (b)–(d) have the total utility functions , where t = PFS time and q = total toxicity burden (TTB). [Colour figure can be viewed at https://dbpia.nl.go.kr]
We defined to vary with Age so that it decreases at a faster rate for older Age. PM and BL established numerical values of in [0, 1] for each pair of (q, Age) values specified on a grid. These are tabulated in Supplementary Table 2. We constructed a parametric function that closely approximates these elicited TTB utilities by exploring various functional forms, and chose
where . As q increases, the total utility given in (8) decreases by a factor of the above exponential function of and is penalized in using this exponential value. If no toxicity occurs, that is, q = 0, then and , as required. The function has an inflection point at q = g(Age), and for any q > g(Age), decreases to 0 very quickly. We defined g(Age) to be a decreasing function of Age so that decreases in q faster for larger values of Age. To obtain this, we restricted and calibrated the values of and using the elicited numerical utilities. The numerical values and yield a good approximation. Details are given in Supplementary Section 2. Figure 2b,c compares (solid line) to the elicited values (dots connected by dotted lines) for Age = 50, 65 and 85. In each plot, different colours represent different values of q. The figure illustrates how decreases with q, and how the magnitude of decrease changes with Age. The specific utility function described here may be questioned due to its subjectivity. However, ordering the consequences of decisions is inherently subjective and is necessary for decision making, and using an explicit utility function that is constructed based on expert knowledge produces meaningful decisions.
To use this structure for individualized treatment selection, we exploit the Bayesian model to compute the posterior predictive (PP) distribution of for a new patient with prognostic covariates assuming a particular τ is given to the patient, similar to the examples in Supplementary Section 2,
By averaging the likelihood of the new patient's future outcomes over the joint posterior distribution of , the PP distribution in (11) provides a fully model-based criterion for making inferences to compare treatments, with appropriate quantification of uncertainty. The PP distribution of the utility, for a new patient with prognostic covariates can be derived directly from (11).
We use the predictive mean total utility for each τ as a basis for treatment selection. For τ = 0 corresponding to L and τ = 1 for L + B, the predictive mean total utility is
One may choose the treatment τ having larger for the new patient. In general, is a function of the entire vector and τ through the utility function and/or the probability distribution, and this still would be the case if did not depend on Age.
In general, another criterion for comparing treatments τ and studied in a trial is to compare the PP distributions and of the total utilities. To do this, we define the posterior probability that treatment has a larger total utility than treatment τ for a new patient with prognostic variables ,
One may select to treat a new patient with if and otherwise select τ. This can be done by first computing the joint PP distribution of , that is, of two sets of outcomes with one for each treatment,
which can be used in turn to compute . Although depends on only through Age, the joint distribution in (14) depends on the entire and , and Δ does as well. The decision criteria and may be computed numerically using MCMC samples of θ simulated from . Computational details are given in Supplementary Section 1.2.
4 DECISION MAKING FOR THE BREAST CANCER DATA
In this section, we illustrate the proposed decision-making procedures by application to the breast cancer dataset. We fit the statistical model in Section 2 to the data and used the elicited utility in Section 3. To fit the Bayesian model, we specified values of the fixed hyperparameters as follows. We let the DP concentration parameter be ξ = 1, for the prior of , and for the priors of for the priors of and and for the prior of . We fixed , p ≥ 1 and used the empirical average of the observed to specify . Similarly, we let , p = 1, …, P and used the empirical probabilities of being 0 to determine . We fixed . For Ω, we let , and . We discarded the first 20,000 iterates for burn-in, and kept the next 5000 iterates for posterior inference. We examined mixing and convergence of the Markov chains using trace plots, and did not find evidence of poor mixing or bad convergence.
Estimates of the posterior predictive distributions of t, q and are shown in Figure 3 for a future patient with . The top, middle and bottom rows of the figure correspond to 55, 65 and 75 years, with treatment L (τ = 0) represented by red and L + B (τ = 1) by blue. PP estimates of the survival functions with 95% pointwise credible intervals are given in the left column, and the middle column gives estimates of the PP distributions, of TTB. Estimated for L + B are slightly above those for L at all ages, indicating a small overall improvement in PFS with L + B. In contrast, estimated for L + B have much longer and thicker right tails, indicating an increased risk of toxicity events with L + B compared to L. From the figures, effects of Age on PFS and TTB are small. Additional comparisons of the PP distributions of PFS (t) and TTB (q) for treatments L and L + B for more values of are given in panels (b) and (c) of Supplementary Figures 1 and 2. These give PP point estimates of t and q as functions of . Panel (a) of the figures shows estimated PP probabilities that L + B yields greater PFS than L, and that L + B yields greater TTB than L. The figures also indicate that L + B tends to yield better PFS than L, while L + B is more likely to be associated with higher TTB. Swanson and Lin (1994) also noted that older patients are more likely to respond to hormone therapies such as L + B or L. Our inferences for PFS and TTB considered individually agree with the findings reported by Dickler et al. (2016).
![[Breast Cancer Trial Data] Estimated posterior predictive (PP) survival functions S(t|𝒟) where 𝒟 denotes data, are given in panels (a), (d) and (g); estimates of PP cumulative distributions of total toxicity burden (TTB, Q) in panels (b), (e) and (h); and PP distribution estimates of total utility Utot in (c), (f) and (i). The top, middle and bottom rows are correspond to ages ( x1new) 55, 65 and 75 years. Covariates (x2new,x3new)=(0,0), which indicate the absence of measurable disease at baseline and the patient's disease free interval prior to trial entry ≤ 24 months, are fixed. In each panel, red and blue represent treatments L (τ = 0) and L + B (τ = 1), respectively. [Colour figure can be viewed at https://dbpia.nl.go.kr]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssc/71/5/10.1111_rssc.12582/1/m_rssc_71_5_1605_fig-3.jpeg?Expires=1748064762&Signature=NiFF2bUMXsnBuq6WxXtUB85QkYGoF55OapW5Fy8jp-Go23LpowI6pkIh2mUuWaREzoXkAzKVfotEzQY0Lzu2x6-rl86tBvUu5~hfdhSFLrGiGszPiL7z2~3aQ2nBNQDdODxB6LXI1Iwk~7EEIm1oymr9SjpqwGKDivFjlzVhKgi2GsRokN8Do4EXm9ldqJMpjXn8KfmSOza~5GEz~ON-u8RL2qxaFdy~i3USZ6SNzA7q2cHwUyUCEcR1Kl4--Cu2umJ74jYmNSE30YsFG9yAWNrCKKcL7J~Tk4LezAuUTLsRGopcAFMhMGPOClwQW6q3BZJkMDbk8t9xi31sgj-QvQ__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
[Breast Cancer Trial Data] Estimated posterior predictive (PP) survival functions where denotes data, are given in panels (a), (d) and (g); estimates of PP cumulative distributions of total toxicity burden (TTB, Q) in panels (b), (e) and (h); and PP distribution estimates of total utility in (c), (f) and (i). The top, middle and bottom rows are correspond to ages ( ) 55, 65 and 75 years. Covariates , which indicate the absence of measurable disease at baseline and the patient's disease free interval prior to trial entry ≤ 24 months, are fixed. In each panel, red and blue represent treatments L (τ = 0) and L + B (τ = 1), respectively. [Colour figure can be viewed at https://dbpia.nl.go.kr]
As noted earlier, because treatment comparisons based on TTB and PFS considered separately lead in opposite directions, these results do not provide a clear basis for choosing one treatment over the other, either overall or for individual patients. Considering the outcomes together by using the utility function of treatment and Age provides a useful tool for resolving this problem. The plots in the rightmost column of Figure 3 compare PP distributions of for the two treatments, as functions of Age, indicating that the utility benefit of L + B over L diminishes with increasing Age. In the plot, we assume that and Posterior predictive estimates of the probabilities that treatment L + B has greater utility than treatment L are 0.56, 0.54 and 0.46 for 55-, 65- and 75-year-old patients respectively. Thus, in terms of overall utility accounting for both PFS and TTB, decisions based on would be to give L + B to patients with but give L to patients with 70.
Figure 4(a) illustrates on a grid of values for the combinations (0, 0), (0, 1), (1, 0) and (1, 1). Overall, while L + B tends to yield a greater utility for younger patients, L is expected to have a greater utility for older patients. Although PFS is improved by L + B for patients of all ages, increases in TTB with both L + B and age make it less desirable for older patients. For example, for a 70-year-old patient, with (dashed line), implying that L is a better treatment option than L + B for this patient. The differences in Δ for varying values are small, and the values of and do not change any decisions significantly. Posterior expected utility estimates are computed on the age grid for different treatment options, shown in panels (b) and (c) of the figure. The figure in panel (c) shows that the expected utility decreases rapidly with age for L + B, while it increases slightly for L. Thus, Age-specific recommendations in terms of and are the same for these values of .
![[Breast Cancer Trial Data] (a) Estimated posterior predictive (PP) probability Δ^(xnew,L,L+B) that treatment letrozole plus bevacizumab (L + B) has greater utility than treatment letrozole plus placebo (L) for different cases of xnew=(x1new,x2new,x3new), where x1new = age, x2new = the presence of measurable disease at baseline, and x3new = disease free interval prior to trial entry > 24 months for an unobserved patient. Panels (b) and (c) give posterior predictive mean utility estimates u‾^tot(τ,xnew) for treatments τ = L + B and L, respectively, for different values of xnew. [Colour figure can be viewed at https://dbpia.nl.go.kr]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssc/71/5/10.1111_rssc.12582/1/m_rssc_71_5_1605_fig-4.jpeg?Expires=1748064762&Signature=aWgsxyTxi8VckSqo~cTzovBuB~Th3JFhr3yAdAs7gFNjGDfhOnhYI5YrZqK8J-ldpNVsdUMC2xvDYjJlFcqBs9v3bMmchQDV6RLNSai3D~D6bPzI~90ezbEETB~Wi-txYxTsMAR17uBijr568UAhnJrb-zc6NiCaUpXsKx~BVLIMKg0Gpqq1TFJDIhmSVULtWC4Zla8UGuKxNq8JiPzn5~wXGEN2G0l~4oVcSFIs37ahtV0iYucrV7OnqPFW4f9qjssGcrz6mIMLjMBVrZp67VCkfkGs81m42ja7IXjkC9SuszrrQgDt4zv25dFfWpL7A9~qMAAmHMKFumtt97u7sQ__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
[Breast Cancer Trial Data] (a) Estimated posterior predictive (PP) probability that treatment letrozole plus bevacizumab (L + B) has greater utility than treatment letrozole plus placebo (L) for different cases of , where = age, = the presence of measurable disease at baseline, and = disease free interval prior to trial entry > 24 months for an unobserved patient. Panels (b) and (c) give posterior predictive mean utility estimates for treatments τ = L + B and L, respectively, for different values of . [Colour figure can be viewed at https://dbpia.nl.go.kr]
5 SIMULATION STUDY
In this section, we summarize a simulation study to illustrate the performance of the proposed utility-based decision-making procedure. To generate data similar to the breast cancer dataset, we set the number of patients to be n = 350, with three covariates and K = 20 toxicity types each having J = 6 grades. To mimic the covariate distribution in the breast cancer dataset, we randomly drew a sample of size 350 from with replacement, where τ, and are a binary treatment indicator, age, and a binary indicator of disease measurability, respectively, so our P = 2.
We simulated patient-specific frailty vectors for correlation between within a patient. To illustrate how the BNP model flexibly accommodates complicated relationships between , and , we generated from a mixture of two regression functions, each having main effects for τ and , and their interaction effects. Specifically, we generated , where assumes that variances of are 0.05 and correlations between and are 0.5 if and , and −0.5 if k or . Given , we then generated from the following distribution with probability 0.4,
where , , , and were simulated for all k. With the remaining probability 0.6, we generated
where , , and were simulated for all k. We simulated censoring times , and let and . In the simulated data, 22.86% of 's was censored. For ordinal outcomes , we simulated toxicity type specific cutoff points , j = 2, …, J − 1 for each k, and let if . Details of the simulation setup are given in Supplementary Section 5. The utility function elicited in Section 3 is assumed.
Supplementary Figure 3(a), (b) and (e) illustrate the true total utility , TTB distribution and PFS survival function (solid lines) in the simulation. The covariate vector x = (65, 0) of a 65-year-old patient with no measurable disease is used for illustration. In each plot, red represents τ = 0 and blue represents τ = 1. From panels (b) and (e), treatment τ = 1 has a greater TTB and greater expected PFS time than treatment τ = 0, which complicates treatment choice. This is resolved by panel (a), which shows that with τ = 1 (blue) is stochastically greater than with τ = 0 (red), implying that τ = 1 is better for a patient with x = (65, 0). Supplementary Figure 4 gives
with the true expected utilities shown in dark green symbols for varying Age. For both treatments, the expected utility decreases with Age, shown in panels (a) and (b). The difference in between the treatments and decrease with Age, indicating that the superiority of treatment τ = 1 compared to treatment diminishes with Age.
We specified values of the fixed hyperparameters similar to those in Section 4, and ran the MCMC simulation as described in Section 2.2. Posterior inferences are summarized in Supplementary Figure 3. The posterior predictive distributions, and with , are shown in Supplementary Figure 3c–e, respectively, where τ = 0 and 1 are in red and blue respectively. In panel (e), the dashed lines represent the posterior mean estimates with 95% pointwise credible intervals in the shaded areas. Comparing the estimates in panels (d) and (e) to the truth under in panels (b), and (e) shows that the flexible BNP regression model captures the simulation truth reasonably well, which provides a good basis for accurate statistical decision making. For example, the posterior predictive distribution of in panel (c) that provides a comprehensive criterion for treatment comparison, and is close to the truth in panel (a).
Supplementary Figure 4 illustrates posterior estimates of the probabilities that τ = 1 has greater utility than τ = 0, and compares posterior estimates of expected utilities for each of τ = 0 and 1, the varying . For = Age, a grid from 40 to 80 years in 5-year increments was used, and values of are indicated by the symbols, + and ×, respectively, with true values given in dark green. The model recovers the simulation truth reasonably well, and the decision-making procedure based on Δ(x, 0, 1) in Section 3, selects the truly superior treatment for all cases of . From panels (b) and (c), we observe discrepancies between and for small (Age). However, the ranks of between the treatments are well estimated and the procedure of choosing τ to maximize reliably selects the treatments with truly greater utility.
We simulated 100 datasets to further examine the performance of the proposed decision-making procedures. The results are summarized in Supplementary Figure 5. Panels (a) and (d) of the figure show the distributions of over the 100 datasets, for and 1, respectively, where the symbols + and × represent with and 1 respectively. In most cases, the proposed decision-making procedure based on Δ(x, 0, 1) produces the correct decisions. Panels (b), (c), (e) and (f) illustrate the distributions of predictive mean utility estimates, , using the 100 datasets for , (0, 1) and (1, 1) respectively. The proposed model produces reasonably good estimates of the expected utilities, and the ordering of are also well estimated overall.
6 DISCUSSION
We have presented a formal decision-making framework based on utility functions to address the goal of statistical decision making based on data from a randomized clinical trial. The key elements of our methodology are a multivariate Bayesian regression model and a utility function of outcomes and covariates. We assumed a DDP, which is a general, flexible family of Bayesian regression models.
The methodology was illustrated with a breast cancer dataset from a randomized clinical trial. This required close collaboration with oncologists to elicit a utility function that reflected this clinical setting, and the resulting utility function varied with Age to reflect different risk-benefit trade-offs between PFS and TTB for older versus younger patients. Our application illustrates that, by establishing a utility function that quantifies the risk-benefit trade-off between two competing outcomes, one can derive a rational basis for making treatment choices in settings where simpler comparisons in terms of individual outcomes give different, contradictory choices.
The particular form of our utility function was tailored to the breast cancer setting, and may not be appropriate in other clinical settings. In general, the requirements are that U(y,x) must be consistent in the arguments of y so that it makes sense, and that it be tractable enough to facilitate application. Beyond that, a utility should be constructed so that it provides a sensible basis for quantifying trade-offs between different outcomes. It also should be kept in mind that, if U is constructed to be a function of y but not x, both the posterior mean utility and still will vary with x due to the regression structure of f( y | τ, x, θ), and thus both criteria still will serve as a basis for making personalized treatment decisions.
Our framework may be generalized to accommodate more complex clinical settings, such as meta-analysis of multiple studies, optimization of a multi-stage treatment strategies or non-medical applications. Although the key elements will remain the same, such applications may be complex and will require tailoring to particular settings.
ACKNOWLEDGEMENTS
Juhee Lee's research was supported by NSF grant DMS-1662427. Pavlos Msaouel is supported by a Career Development Award by the American Society of Clinical Oncology, a Research Award by KCCure, the MD Anderson Khalifa Scholar Award and the MD Anderson Physician-Scientist Award. Peter Thall's research was supported by NIH/NCI grants P01 2P30CA016672 and R01 CA261978.
This manuscript was prepared using data from Datasets NCT00601900-D1 and NCT00601900-D2 from the NCTN Data Archive of the National Cancer Institute's (NCI's) National Clinical Trials Network (NCTN). Data were originally collected from clinical trial NCT number NCT00601900, Endocrine Therapy With or Without Anti-VEGF Therapy: A Randomized, Phase III Trial of Endocrine Therapy Alone or Endocrine Therapy Plus Bevacizumab (NSC 704865) for Women With Hormone Receptor-Positive Advanced Breast Cancer. All analyses and conclusions in this manuscript are the sole responsibility of the authors and do not necessarily reflect the opinions or views of the clinical trial investigators, the NCTN, or the NCI. The CALGB-40503 dataset was made available through an NCI-Genentech agreement under which Genentech supplied the drug to support the study.