Abstract

A Bayesian method is proposed for personalized treatment selection in settings where data are available from a randomized clinical trial with two or more outcomes. The motivating application is a randomized trial that compared letrozole plus bevacizumab to letrozole alone as first-line therapy for hormone receptor-positive advanced breast cancer. The combination treatment arm had larger median progression-free survival time, but also a higher rate of severe toxicities. This suggests that the risk-benefit trade-off between these two outcomes should play a central role in selecting each patient's treatment, particularly since older patients are less likely to tolerate severe toxicities. To quantify the desirability of each possible outcome combination for an individual patient, we elicited from breast cancer oncologists a utility function that varied with age. The utility was used as an explicit criterion for quantifying risk-benefit trade-offs when making personalized treatment selections. A Bayesian nonparametric multivariate regression model with a dependent Dirichlet process prior was fit to the trial data. Under the fitted model, a new patient's treatment can be selected based on the posterior predictive utility distribution. For the breast cancer trial dataset, the optimal treatment depends on the patient's age, with the combination preferable for patients 70 years or younger and the single agent preferable for patients older than 70.

1 INTRODUCTION

Nearly all published clinical trial results focus on statistical inferences about effects of treatments and patient prognostic variables on clinical outcomes. This may fall short of what is needed by practicing physicians to make informed treatment decisions for individual patients. In many settings, estimated effects on efficacy and toxicity lead to conflicting treatment choices, and the relative desirability of two treatments also may vary with patient prognostic variables. Our motivating dataset, which illustrates this class of problems, arose from a phase III study of targeted agents for treating hormone receptor-positive advanced breast cancer, reported by Dickler et al. (2016). Patients were randomized between letrozole plus bevacizumab (L + B) and letrozole plus placebo (L). The primary efficacy endpoint was progression-free survival (PFS) time, defined as the time from the treatment to disease progression or death from any cause. Due to safety concerns, 21 different types of toxicity were monitored, including the type and grade (0 = none to 5 = fatal) of each occurrence. A statistically significant PFS improvement was seen with L + B compared to L (one-sided p-value = 0.016), with estimated median PFS 20.2 months (95% confidence interval, CI, 17.0–24.1) with L + B compared to 15.6 months (95% CI 12.9 – 19.7) with L. Consideration of toxicities led to the opposite conclusion, with 46.8% of patients treated with L + B experiencing severe (grade ≥ 3) toxicities compared to 14.2% with L.

Considering each outcome alone, selecting an optimal treatment is straightforward, since longer PFS and less toxicity each is more desirable. This leads to the problematic conclusions that L + B is preferable in terms of PFS but L is preferable in terms of toxicity. Thus, when considering these two outcomes together, as must be done in practice by a physician when choosing between the treatments for an individual patient, decision making is not straightforward. In our analyses, rather than dichotomizing toxicity severity, we will use total toxicity burden (TTB) (Bekele & Thall, 2004; Le-Rademacher et al., 2020) to summarize each patient's adverse events. In general, for K toxicities z=(z1,,zK), where each zk is a grade in {0, …, J − 1}, we define the scaled TTB to be q=k=1Kzk/{K×(J-1)}, which takes on values between 0 and 1. Figure 1 illustrates the TTB distributions and Kaplan–Meier estimates of the PFS survival function for each treatment arm in the breast cancer dataset, with L + B represented by blue and L by red. The questions that we will address in this paper are how one may use the available data to choose between L + B and L for a new breast cancer patient, and how this may be done in other, similar settings.

For the breast cancer data, histograms of scaled total toxicity burden (TTB) are given in panels (a) and (b). Panel (c) illustrates Kaplan–Meier estimates of survival functions, S(t). Blue and red represent treatments, letrozole plus bevacizumab (L + B) and letrozole plus placebo (L) respectively. [Colour figure can be viewed at https://dbpia.nl.go.kr]
FIGURE 1

For the breast cancer data, histograms of scaled total toxicity burden (TTB) are given in panels (a) and (b). Panel (c) illustrates Kaplan–Meier estimates of survival functions, S(t). Blue and red represent treatments, letrozole plus bevacizumab (L + B) and letrozole plus placebo (L) respectively. [Colour figure can be viewed at https://dbpia.nl.go.kr]

We assume that each patient's data can be summarized as a vector, y, of clinical outcomes, a vector, x, of prognostic covariates, and a treatment indicator variable, τ. A Bayesian regression model, f( y | τ, x, θ) is assumed and fit to the data, where θ denotes the model's parameter vector. For the breast cancer trial data, y = (y1,y2) with y1 = PFS time and y2 = TTB. Using the breast cancer trial data for illustration, we extend the usual statistical process of data analysis by connecting it with medical decision making by practicing physicians. To do this, we first construct a family of utility functions, with each utility assigning numerical desirability scores U(y, x) to all y = (PFS time, TTB) pairs for a patient with prognostic variables x. In a given setting, a physician and patient may choose a particular utility function from the family that best represents the patient's subjective trade-offs. Because the trade-off between PFS and TTB may vary with x, the desirability of a particular pair of treatment options may not be the same for all patients, and this may lead to different treatment preferences for two patients having different x.

Decision analysis based on utility functions certainly is not new. This has been studied and applied in many areas, including business (e.g. Loewenstein et al., 1989; Pennings & Smidts, 2003), engineering (e.g. Bagočius et al., 2014; Chen et al., 1998) and operations research (e.g. Roy et al., 2017; Walsh et al., 2004). Two papers of a five-part primer on medical decision analysis are given by Detsky et al. (1997) and Naglie et al. (1997). However, formal utility-based decision procedures are seldom included in statistical data analysis reports. Our application of the methodology to the breast cancer dataset illustrates how a utility function and Bayesian statistical model can be used to choose between two treatments for a patient with given prognostic variables. To establish the idea that outcome utilities can be used as practical tools for decision making, and illustrate the range of potential applications, Supplementary Section 2 provides examples of utility functions for different types of outcome vectors. These include a one-dimensional ordinal outcome, binary (response, toxicity) indicators, and the two ordinal categorical outcomes (disease status, toxicity severity).

For the breast cancer data analysis, we develop a robust Bayesian regression model, f( y | τ, x, θ), that assumes latent patient frailties to account for association among the elements of y and z, and describes how each outcome varies as a function of treatment, τ, and baseline covariates, x. We formulate a joint Bayesian nonparametric (BNP) multivariate regression model that includes a vector of continuous latent variables defined to represent ordinal toxicity outcomes, using the dependent Dirichlet process (DDP) (MacEachern, 1999). We use a linear DDP developed by De Iorio et al. (2004, 2009). DDP models are highly flexible and provide a robust basis for inferences about regression relationships. BNP models have been applied to a broad range of statistical problems, including density estimation, regression, clustering and survival analysis. See, for example, Müller et al. (2015) for general applications, Mitra and Müller (2015) for applications in biostatistics, or Müller and Mitra (2013); Thall et al. (2017) for overviews and illustrations. Given a Bayesian model f( y | τ, x, θ) and utility function U( y, x), we use posterior predictive utility distributions as a basis for deciding between treatments for a new patient with prognostic variables x. We choose the treatment that yields the greatest posterior mean utility.

The remainder of the paper is organized as follows. Section 2.1 formulates a bivariate regression model for clinical outcomes PFS time and TTB, and provides a predictive distribution of (PFS, TTB) for a future patient as a function of the patient's covariates and each potential treatment that may be given to the patient. Section 2.2 provides computational details for implementation. In Section 3, we describe a utility function that varies with PFS, TTB and covariates in order to represent a personalized risk-benefit trade-off between PFS and TTB. The utility function is constructed using separate contributions from PFS time and TTB, with each contribution constrained so that it is logically consistent and reflects elicited expert opinion. The predictive distribution allows one to estimate the utility of each treatment for a future patient with given covariates, and provides a way to compute the probability that each treatment is preferred for the future patient. In Section 4, we illustrate the methodology by applying it to make personalized treatment selections based on the breast cancer trial data. In Section 5, a simulation study is presented to illustrate general properties of the proposed decision-making approach. We close with a brief discussion in Section 6.

2 A BNR MODEL

2.1 Sampling distribution and prior specification

Let tiR+ denote PFS time during the follow-up period (0,ci], for patient i = 1, …, n. The observed time of failure (progression or death) or independent administrative censoring at ci is tio=min(ti,ci), with δi=1 if PFS time was observed, tici, and δi=0 if censored, ti>ci. Denote the ordinal variable zi,k{0,1,,J-1} for the maximum grade that patient i experienced of toxicity type k = 1, …, K. Censoring is assumed to be independent of ti, toxicity occurrences, and covariates. Denote the ith patient's vector of baseline covariates by xi=(xi,1,,xi,P), and the observed data by 𝒟={(tio,δi,zi,τi,xi),i=1,,n}, where zi=(zi,1,,zi,K). In the breast cancer dataset, there are n = 340 patients after removing three patients having to=0, K = 21 toxicity categories and J = 6 severity grades, where grade 0 = no occurrence of that toxicity type and 5 = death. If a subject died due to a toxicity type k occurrence, the corresponding zi,k=5 was recorded with observed survival time tio=ti and δi=1. We include P = 3 prognostic covariates, x1 = age, an indicator x2 of measurable disease at baseline, and an indicator x3 of whether the patient's disease free interval prior to trial entry was greater than 24 months, in addition to the indicator τ of treatment L + B.

To construct a model that accounts for heterogeneity between patients not explained by the covariates, we introduce real-valued (K + 1)−dimensional multivariate normal latent frailty vectors, si=(si,0,si,1,,si,K) for i = 1, …, n. We assume si|ΩiidNK+1(0,Ω) and ΩInv-Wishart(aΩ,Ω0). Following Chib and Greenberg (1998), we construct a multinomial probit model for the ordinal toxicity outcomes zi by introducing the unobserved real-valued latent variables z˜i=(z˜i,1,,z˜i,K), where z˜i,kR, and define zi,k=j if and only if uk,j<z˜i,kuk,j+1, where uk,0<uk,1<<uk,J denote toxicity type-specific cutoffs for each k. This is a common modelling strategy that uses real-valued latent variables to induce a tractable multivariate distribution on a vector of observed ordinal categorical variables, and greatly facilitates computation. For logarithm transformed PFS time, t˜=log(t), latent variables z˜i, treatment τi, and covariates xi, we assume

(1)

We take a BNP approach for modelling h in (1) that allows flexible regression structures by assuming the DDP (MacEachern, 1999), which is a family of random probability distributions indexed by (τ, x). Specifically, we use a linear-DDP that induces covariate dependence through linear regression structures (De Iorio et al., 2004, 2009). Denoting x˜i=(1,τi,xi), β=(β0,βτ,β1,,βP) and αk=(αk,0,αk,τ,αk,1,,αk,P), we assume the simple parametric linear combinations η0(x˜i)=βx˜i and ηk(x˜i)=αkx˜i, for each k = 1, …, K. We denote η(x˜i)=(η0(x˜i),,ηK(x˜i)) and construct a model for h through a convolution with a normal kernel

(2)

where ϕd(·|a,B) is the density function of the d−variate normal distribution with mean vector a and d × d covariance matrix B > 0. We use the Dirichlet process (DP) as a prior for the random mixing distribution G in (2). This gives a DP mixture of multivariate normal linear models,

(3)

where ηm(x˜i)=(ηm,0(x˜i),,ηm,K(x˜i)) with ηm,0(x˜i)=βmx˜i and ηm,k(x˜i)=αm,kx˜i. The weights {wm} are constructed via Sethuraman's (1994) so-called ‘stick-breaking’ process by assuming wm/m=1m-1(1-wm)iidBe(1,ξ), with fixed ξ > 0. For the covariate and treatment effect parameters in (3), we assume

(4)

with β and κ2 fixed, where MVNP+2 represents a (P + 2)-dimensional multivariate normal distribution. In (4), β=(β0,βτ,β1,,βP) and αk=(αk,0,αk,τ,αk,1,,αk,P) are (P + 2)-dimensional mean vectors and V=diag[vp2] is a (P + 2) × (P + 2) matrix. We assume αk,pindepN(α=p,vα2) with fixed α=p and vα2, and vp2iidIG(av,bv) with fixed av and bv for p = 0, …, P. The hierarchical structure for αm,k enables the model to borrow information across the toxicity categories. The model in (3) incorporates τ and x linearly in the mean of each normal summand. Due to the fact that the distribution of (t˜,z˜) is a weighted average of multivariate normal distributions, each with its own linear term, the model accounts for possible effects of τ and x on (t˜,z˜) that can be nonlinear and quite complex, including interactions between two or more variables in (τ, x). This construction thus provides a flexible modelling framework for inference and prediction, avoiding restrictive assumptions of linearity or additivity in the covariate effects. This facilitates accurate decision making by avoiding restrictions imposed by conventional parametric models, such as the proportional hazards model. We let =diag(σt2,σz2,,σz2), which implies conditional independence between t˜i and z˜i,k given si. Due to the conditional independence, the marginal distributions from (3) can be expressed as the weighted averages

(5)

That is, f and each gk also is a linear DDP mixture distribution. The marginal distribution of each zi,k is obtained by integrating over the latent variables,

(6)

Marginalizing by averaging over s in (3), the resulting DP mixture model, h(t˜,z˜|τ,x), has covariance matrix ∑ + Ω. Thus, Ω induces dependence between PFS time and the toxicities within each patient, in addition to explaining additional variability between patients not explained by τ and x.

To ensure identifiability in the multivariate ordinal regression model, we fix σz2 and set uk,1=0 for all k. We also set uk,0=-, and uk,J=, and P(zi,k<0)=0 and P(zi,kJ-1)=1. We let the cut-offs uk,j, j = 2, …, J−1 be random for flexibility, by defining uk,j=uk,j-1+ek,j-1k = 1, …, K and with error terms ek,jiidGa(ae,be), for j = 2, …, J − 1. Lastly, we assume σt2IG(at,bt).

2.2 Posterior inference

Collecting terms, θ=(wm,βm,σt2,αm,k,e,αk,p,vp2,Ω) is the vector of all model parameters, and θ˜=(M,β,at,bt,κ2,α=p,vα2,av,bv,aΩ,Ω0) is the vector of all fixed hyper-parameters. Given θ˜ and data 𝒟, the joint posterior of θ and the patient-specific random effects s={si,i=1,,n} is

(7)

where the joint likelihood of the observed data for the ith patient is the product

We use Markov chain Monte Carlo (MCMC) simulation to generate posterior samples of the parameter and latent variable vectors, (θ, s). For computational convenience, we approximate the DDP in (5) by truncating the infinite number of mixture components of F and Gk to the finite value M. The final weight is set to wM=1-m=1M-1wm to ensure that F and Gk are proper distributions. For sufficiently large M, the truncated sum produces inferences virtually identical to those with the infinite sum (Ishwaran & James, 2001; Rodriguez & Dunson, 2011). As discussed in Rodriguez and Dunson (2011), if there is a discrepancy between the posterior distributions under the truncated and infinite sums, then the model is sensitive to the choice of M. Any value of M that has a small value for wM is sufficiently large to produce a negligible discrepancy. We examined the posterior distribution of wM, and assessed sensitivity of the model to several different M values for the breast cancer dataset. We found that the truncated process is robust to the choice of M, if M is sufficiently large. This led us to use M = 15 for the data analysis and simulation studies. Computational details are given in Supplementary Section 1.1. A computer program ‘utility-analysis’ for fitting the proposed model is available from https://users.soe.ucsc.edu/juheelee/.

3 UTILITY FUNCTIONS FOR PFS AND TTB

In this section, we describe how a utility function was constructed for the breast cancer data analysis. While we focus on the case where y consists of PFS time and TTB, the methodology may be applied generally in settings where y is a single variable, a bivariate binary or ordinal variable, or some combination of two or more discrete and continuous outcomes. Examples are given in Supplementary Section 2.

Given the reduction of the K-dimensional toxicity vector z to the scaled TTB q, 0 ≤ q ≤ 1, we will construct a utility function for the pair (t, q). A departure of our utility formulation from previous published outcome utilities is that we construct U so that it varies with covariates x as well as the outcomes (t, q). We let τ = 1 for L + B and τ = 0 for L, so choosing τ for a future patient with a covariates xnew is the target of our decision analysis. To ensure a consistent utility function that quantifies trade-offs between t and q for each x, we require

That is, considered individually with the other outcome variable fixed, smaller TTB and longer PFS each must be more desirable.

The form of the utility function given here, and the numerical values that it takes on, were obtained based on the consensus of two medical oncologists who are co-authors of this paper, PM and BL, one of whom is a breast cancer subspecialist. The first step of our construction was to specify a parametric total utility function, which we defined generally as the product

(8)

subject to the constraints 0UPFS(t,x)Umax and 0<UTTB(q,x)1. For a given x, the utility component UTTB(q,x) acts multiplicatively to decrease the utility component UPFS(t,x), and one may regard multiplying by UTTB(q,x) as penalizing the PFS utility, where the magnitude of the penalty is determined by q.

To apply this to the breast cancer dataset, we constructed a functional form for (8) to reflect this particular treatment setting. We denote the prognostic covariates by x1=Age, an indicator x2 of measurable disease, and an indicator x3 of whether the patient's disease free interval prior to trial entry was > 24 months, with all three included in the regression models for outcomes t and z. While the randomization for the breast cancer trial was stratified by x2 and x3 to improve precision, based on clinical experience PM and BL decided that neither x2 nor x3 should have any effect on the utility function, whereas x1=Age is included in UTTB. This is because, in clinical practice, the utility gained by greater PFS is similar regardless of age group, while older patients tend to care more about maintaining a good quality of life, quantified by a lower TTB. Thus, older patients are less likely than younger patients to accept a higher level of toxicity for the same PFS benefit. Accordingly, we only used the prognostic covariate x1=Age in the utility function, and constructed Utot(t,q,x)=Utot(t,q,Age) so that, for any PFS time t, a given value q > 0 for TTB decreases the utility for an older patient more than for a young patient. We also assumed that UPFS(t,Age)=UPFS(t). We set Umax=100 since the domain (0, 100) is easy to interpret, and set UTTB(0,Age)=1 for any Age. Thus, if a patient has no toxicity (q = 0) then Utot(t,0,Age)=UPFS(t). For example, if q=0.50,UPFS(36)=80, and UTTB(0.50,60)=0.50, then the utility of 60-month PFS time and TTB = 0.5 is Utot(36,0.50,60)=80×0.5=40 for a 60-year-old patient, whereas UTTB(0.50,40)=0.70 for a 40-year-old patient gives the much higher total utility Utot(36,0.50,40)=80×0.7=56.

For the PFS component, the clinicians PM and BL specified the particular values UPFS(24)=50, UPFS(48) = 95, and required limtUPFS(t)=100 for patients with any Age. This was based on the clinical experience that, in the hormone receptor-positive metastatic breast cancer setting, the increase in utility with increasing PFS is linear for up to 4 years, after which there is generally not much increase in utility with increasing PFS. This is due, in part, to the availability of newer regimens that can be given as salvage therapy to patients whose disease progresses after 4 years. To reflect this, we constructed the following parametric function for UPFS(t):

(9)

and we set t0 = 48 months, U0 = 95, and Umax = 100. This function increases in t up to 48 months, with a small additional increase for t > 48 months. Since UPFS(24)=50, the equation 95×(24/48)a=50 gives a = 0.926. Similarly, since UPFS(48)=100/{1+exp(-b148)}=95, this gives b1=0.061. The resulting function UPFS(t) is plotted in Figure 2(a).

Illustration of utility functions. Panel (a) has UPFS(t), the utility function of progression-free survival (PFS) time with t = PFS time. Panels (b)–(d) have the total utility functions Utot(t,q,Age), where t = PFS time and q = total toxicity burden (TTB). [Colour figure can be viewed at https://dbpia.nl.go.kr]
FIGURE 2

Illustration of utility functions. Panel (a) has UPFS(t), the utility function of progression-free survival (PFS) time with t = PFS time. Panels (b)–(d) have the total utility functions Utot(t,q,Age), where t = PFS time and q = total toxicity burden (TTB). [Colour figure can be viewed at https://dbpia.nl.go.kr]

We defined UTTB(q,Age) to vary with Age so that it decreases at a faster rate for older Age. PM and BL established numerical values of UTTB in [0, 1] for each pair of (q, Age) values specified on a grid. These are tabulated in Supplementary Table 2. We constructed a parametric function that closely approximates these elicited TTB utilities by exploring various functional forms, and chose

(10)

where g(Age)=exp(c0+c1Age). As q increases, the total utility given in (8) decreases by a factor of the above exponential function of q2, and UPFS is penalized in Utot using this exponential value. If no toxicity occurs, that is, q = 0, then UTTB(0,Age)=1 and Utot(t,q,Age)=UPFS(t), as required. The function UTTB(q,Age) has an inflection point at q = g(Age), and for any q > g(Age), Utot decreases to 0 very quickly. We defined g(Age) to be a decreasing function of Age so that UTTB(q,Age) decreases in q faster for larger values of Age. To obtain this, we restricted c1<0 and calibrated the values of c0 and c1 using the elicited numerical utilities. The numerical values c0=0.823 and c1=-0.05 yield a good approximation. Details are given in Supplementary Section 2. Figure 2b,c compares Utot(t,q,Age) (solid line) to the elicited values (dots connected by dotted lines) for Age = 50, 65 and 85. In each plot, different colours represent different values of q. The figure illustrates how Utot(t,q,Age) decreases with q, and how the magnitude of decrease changes with Age. The specific utility function described here may be questioned due to its subjectivity. However, ordering the consequences of decisions is inherently subjective and is necessary for decision making, and using an explicit utility function that is constructed based on expert knowledge produces meaningful decisions.

To use this structure for individualized treatment selection, we exploit the Bayesian model to compute the posterior predictive (PP) distribution of (t˜,z) for a new patient with prognostic covariates xnew=(x1new,x2new,x3new) assuming a particular τ is given to the patient, similar to the examples in Supplementary Section 2,

(11)

By averaging the likelihood of the new patient's future outcomes (t˜,z) over the joint posterior distribution of (θ,snew), the PP distribution in (11) provides a fully model-based criterion for making inferences to compare treatments, with appropriate quantification of uncertainty. The PP distribution of the utility, p{Utot(t,q,Agenew)|τ,xnew,𝒟}, for a new patient with prognostic covariates xnew can be derived directly from (11).

We use the predictive mean total utility for each τ as a basis for treatment selection. For τ = 0 corresponding to L and τ = 1 for L + B, the predictive mean total utility is

(12)

One may choose the treatment τ having larger utot(τ,xnew) for the new patient. In general, utot(τ,xnew) is a function of the entire xnew vector and τ through the utility function and/or the probability distribution, and this still would be the case if Utot did not depend on Age.

In general, another criterion for comparing treatments τ and τ studied in a trial is to compare the PP distributions p{Utot(t,q,Agenew)|τ,xnew,𝒟} and p{Utot(t,q,Agenew)|τ,xnew,𝒟} of the total utilities. To do this, we define the posterior probability that treatment τ has a larger total utility than treatment τ for a new patient with prognostic variables xnew,

(13)

One may select τ to treat a new patient with xnew if Δ(xnew,τ,τ)0.5, and otherwise select τ. This can be done by first computing the joint PP distribution of (t˜(τ),z(τ),t˜(τ),z(τ)), that is, of two sets of outcomes with one for each treatment,

(14)

which can be used in turn to compute Δ(xnew,τ,τ). Although Utot depends on xnew only through Age, the joint distribution in (14) depends on the entire xnew and (τ,τ), and Δ does as well. The decision criteria utot(τ,xnew) and Δ(xnew,τ,τ) may be computed numerically using MCMC samples of θ simulated from p(θ|𝒟,θ˜). Computational details are given in Supplementary Section 1.2.

4 DECISION MAKING FOR THE BREAST CANCER DATA

In this section, we illustrate the proposed decision-making procedures by application to the breast cancer dataset. We fit the statistical model in Section 2 to the data and used the elicited utility in Section 3. To fit the Bayesian model, we specified values of the fixed hyperparameters θ˜ as follows. We let the DP concentration parameter be ξ = 1, κ2=5 for the prior of βm,p, av=5 and bv=1 for the priors of vp2,ae=be=3 for the priors of ek,j, and at=3 and bt=1 for the prior of σt2. We fixed βp=0, p ≥ 1 and used the empirical average of the observed t˜ to specify β0. Similarly, we let α=p=0, p = 1, …, P and used the empirical probabilities of zi,k being 0 to determine α=0. We fixed σz2=9. For Ω, we let aΩ=50, and Ω0=0.01(aΩ-K-1)IK+1. We discarded the first 20,000 iterates for burn-in, and kept the next 5000 iterates for posterior inference. We examined mixing and convergence of the Markov chains using trace plots, and did not find evidence of poor mixing or bad convergence.

Estimates of the posterior predictive distributions of t, q and Utot are shown in Figure 3 for a future patient with xnew=(Agenew,0,0). The top, middle and bottom rows of the figure correspond to Agenew= 55, 65 and 75 years, with treatment L (τ = 0) represented by red and L + B (τ = 1) by blue. PP estimates of the survival functions S(t|τ,xnew,𝒟) with 95% pointwise credible intervals are given in the left column, and the middle column gives estimates of the PP distributions, p(q|τ,xnew,𝒟), of TTB. Estimated S(t|τ,xnew,𝒟) for L + B are slightly above those for L at all ages, indicating a small overall improvement in PFS with L + B. In contrast, estimated p(q|τ,xnew,𝒟) for L + B have much longer and thicker right tails, indicating an increased risk of toxicity events with L + B compared to L. From the figures, effects of Age on PFS and TTB are small. Additional comparisons of the PP distributions of PFS (t) and TTB (q) for treatments L and L + B for more values of xnew are given in panels (b) and (c) of Supplementary Figures 1 and 2. These give PP point estimates of t and q as functions of xnew. Panel (a) of the figures shows estimated PP probabilities that L + B yields greater PFS than L, and that L + B yields greater TTB than L. The figures also indicate that L + B tends to yield better PFS than L, while L + B is more likely to be associated with higher TTB. Swanson and Lin (1994) also noted that older patients are more likely to respond to hormone therapies such as L + B or L. Our inferences for PFS and TTB considered individually agree with the findings reported by Dickler et al. (2016).

[Breast Cancer Trial Data] Estimated posterior predictive (PP) survival functions S(t|𝒟) where 𝒟 denotes data, are given in panels (a), (d) and (g); estimates of PP cumulative distributions of total toxicity burden (TTB, Q) in panels (b), (e) and (h); and PP distribution estimates of total utility Utot in (c), (f) and (i). The top, middle and bottom rows are correspond to ages ( x1new) 55, 65 and 75 years. Covariates (x2new,x3new)=(0,0), which indicate the absence of measurable disease at baseline and the patient's disease free interval prior to trial entry ≤ 24 months, are fixed. In each panel, red and blue represent treatments L (τ = 0) and L + B (τ = 1), respectively. [Colour figure can be viewed at https://dbpia.nl.go.kr]
FIGURE 3

[Breast Cancer Trial Data] Estimated posterior predictive (PP) survival functions S(t|𝒟) where 𝒟 denotes data, are given in panels (a), (d) and (g); estimates of PP cumulative distributions of total toxicity burden (TTB, Q) in panels (b), (e) and (h); and PP distribution estimates of total utility Utot in (c), (f) and (i). The top, middle and bottom rows are correspond to ages ( x1new) 55, 65 and 75 years. Covariates (x2new,x3new)=(0,0), which indicate the absence of measurable disease at baseline and the patient's disease free interval prior to trial entry ≤ 24 months, are fixed. In each panel, red and blue represent treatments L (τ = 0) and L + B (τ = 1), respectively. [Colour figure can be viewed at https://dbpia.nl.go.kr]

As noted earlier, because treatment comparisons based on TTB and PFS considered separately lead in opposite directions, these results do not provide a clear basis for choosing one treatment over the other, either overall or for individual patients. Considering the outcomes together by using the utility function of treatment and Age provides a useful tool for resolving this problem. The plots in the rightmost column of Figure 3 compare PP distributions of Utot for the two treatments, as functions of Age, indicating that the utility benefit of L + B over L diminishes with increasing Age. In the plot, we assume that x2new=0 and x3new=0. Posterior predictive estimates Δ^(xnew,L,L+B) of the probabilities that treatment L + B has greater utility than treatment L are 0.56, 0.54 and 0.46 for 55-, 65- and 75-year-old patients respectively. Thus, in terms of overall utility accounting for both PFS and TTB, decisions based on Δ^(xnew,L,L+B) would be to give L + B to patients with Agenew<70, but give L to patients with Agenew 70.

Figure 4(a) illustrates Δ^(xnew,L,L+B) on a grid of Agenew=x1new values for the combinations (x2new,x3new)= (0, 0), (0, 1), (1, 0) and (1, 1). Overall, while L + B tends to yield a greater utility for younger patients, L is expected to have a greater utility for older patients. Although PFS is improved by L + B for patients of all ages, increases in TTB with both L + B and age make it less desirable for older patients. For example, Δ^(xnew,L,L+B)<0.5 for a 70-year-old patient, with xnew=(70,0,1) (dashed line), implying that L is a better treatment option than L + B for this patient. The differences in Δ for varying (x2new,x3new) values are small, and the values of x2 and x3 do not change any decisions significantly. Posterior expected utility estimates u^tot(xnew,τ) are computed on the age grid for different treatment options, shown in panels (b) and (c) of the figure. The figure in panel (c) shows that the expected utility decreases rapidly with age for L + B, while it increases slightly for L. Thus, Age-specific recommendations in terms of u^tot(xnew,τ) and Δ^(xnew,L,L+B) are the same for these values of (x2new,x3new).

[Breast Cancer Trial Data] (a) Estimated posterior predictive (PP) probability Δ^(xnew,L,L+B) that treatment letrozole plus bevacizumab (L + B) has greater utility than treatment letrozole plus placebo (L) for different cases of xnew=(x1new,x2new,x3new), where x1new = age, x2new = the presence of measurable disease at baseline, and x3new = disease free interval prior to trial entry > 24 months for an unobserved patient. Panels (b) and (c) give posterior predictive mean utility estimates u‾^tot(τ,xnew) for treatments τ = L + B and L, respectively, for different values of xnew. [Colour figure can be viewed at https://dbpia.nl.go.kr]
FIGURE 4

[Breast Cancer Trial Data] (a) Estimated posterior predictive (PP) probability Δ^(xnew,L,L+B) that treatment letrozole plus bevacizumab (L + B) has greater utility than treatment letrozole plus placebo (L) for different cases of xnew=(x1new,x2new,x3new), where x1new = age, x2new = the presence of measurable disease at baseline, and x3new = disease free interval prior to trial entry > 24 months for an unobserved patient. Panels (b) and (c) give posterior predictive mean utility estimates u^tot(τ,xnew) for treatments τ = L + B and L, respectively, for different values of xnew. [Colour figure can be viewed at https://dbpia.nl.go.kr]

5 SIMULATION STUDY

In this section, we summarize a simulation study to illustrate the performance of the proposed utility-based decision-making procedure. To generate data similar to the breast cancer dataset, we set the number of patients to be n = 350, with three covariates and K = 20 toxicity types each having J = 6 grades. To mimic the covariate distribution in the breast cancer dataset, we randomly drew a sample of size 350 from (τi,xi,1,xi,2) with replacement, where τ, x1 and x2 are a binary treatment indicator, age, and a binary indicator of disease measurability, respectively, so our P = 2.

We simulated patient-specific frailty vectors siTR for correlation between yiTR=(t˜iTR,z˜iTR) within a patient. To illustrate how the BNP model flexibly accommodates complicated relationships between τi, xi and yi, we generated yiTR from a mixture of two regression functions, each having main effects for τ and xp, and their interaction effects. Specifically, we generated siTRiidNK+1(0,ΩTR), where ΩTR assumes that variances of si,kTR are 0.05 and correlations between si,kTR and si,kTR are 0.5 if k,k>0 and kk, and −0.5 if k or k=0. Given siTR, we then generated yiTR from the following distribution with probability 0.4,

where α1,k,0TRiidU(-4.5,-4.0), α1,k,1TRiidU(0.5,1.0), α1,k,2TRiidU(0.3,0.8), and α1,k,3TRiidU(0,0.5) were simulated for all k. With the remaining probability 0.6, we generated

where α2,k,0TRiidU(-4.5,-4.0), α2,k,1TRiidU(1.0,1.5), and α2,k,2TRiidU(0.5,1.0) were simulated for all k. We simulated censoring times ciiidN(4.63,2), and let tio=min(ti,ci) and δi=1(ci<ti). In the simulated data, 22.86% of ti's was censored. For ordinal outcomes zi,k, we simulated toxicity type specific cutoff points uk,jTR, j = 2, …, J − 1 for each k, and let zi,k=j if uk,j-1TR<z˜i,kTRuk,jTR. Details of the simulation setup are given in Supplementary Section 5. The utility function Utot(t,q,Age) elicited in Section 3 is assumed.

Supplementary Figure 3(a), (b) and (e) illustrate the true total utility p(Utot|τ,x,θTR), TTB distribution p(q|τ,x,θTR), and PFS survival function S(t|τ,x,θTR) (solid lines) in the simulation. The covariate vector x = (65, 0) of a 65-year-old patient with no measurable disease is used for illustration. In each plot, red represents τ = 0 and blue represents τ = 1. From panels (b) and (e), treatment τ = 1 has a greater TTB and greater expected PFS time than treatment τ = 0, which complicates treatment choice. This is resolved by panel (a), which shows that Utot with τ = 1 (blue) is stochastically greater than Utot with τ = 0 (red), implying that τ = 1 is better for a patient with x = (65, 0). Supplementary Figure 4 gives

with the true expected utilities UTR(τ,x)=E(Utot|τ,x,θTR) shown in dark green symbols for varying Age. For both treatments, the expected utility decreases with Age, shown in panels (a) and (b). The difference in UTR(τ,x) between the treatments and ΔTR(x,0,1) decrease with Age, indicating that the superiority of treatment τ = 1 compared to treatment τ=0 diminishes with Age.

We specified values of the fixed hyperparameters similar to those in Section 4, and ran the MCMC simulation as described in Section 2.2. Posterior inferences are summarized in Supplementary Figure 3. The posterior predictive distributions, p(q|τ,xnew,𝒟) and S(t|τ,xnew,𝒟) with xnew=(65,0), are shown in Supplementary Figure 3c–e, respectively, where τ = 0 and 1 are in red and blue respectively. In panel (e), the dashed lines represent the posterior mean estimates with 95% pointwise credible intervals in the shaded areas. Comparing the estimates in panels (d) and (e) to the truth under in panels (b), and (e) shows that the flexible BNP regression model captures the simulation truth reasonably well, which provides a good basis for accurate statistical decision making. For example, the posterior predictive distribution of Utot in panel (c) that provides a comprehensive criterion for treatment comparison, and is close to the truth in panel (a).

Supplementary Figure 4 illustrates posterior estimates Δ^(x,0,1) of the probabilities that τ = 1 has greater utility than τ = 0, and compares posterior estimates of expected utilities u^(τ,xnew) for each of τ = 0 and 1, the varying xnew=(x1new,x2new). For x1new = Age, a grid from 40 to 80 years in 5-year increments was used, and values of x2new{0,1} are indicated by the symbols, + and ×, respectively, with true values given in dark green. The model recovers the simulation truth reasonably well, and the decision-making procedure based on Δ(x, 0, 1) in Section 3, selects the truly superior treatment for all cases of xnew. From panels (b) and (c), we observe discrepancies between u^tot(τ,xnew) and UtotTR(τ,xnew) for small x1 (Age). However, the ranks of u^tot(τ,xnew) between the treatments are well estimated and the procedure of choosing τ to maximize u^tot(τ,xnew) reliably selects the treatments with truly greater utility.

We simulated 100 datasets to further examine the performance of the proposed decision-making procedures. The results are summarized in Supplementary Figure 5. Panels (a) and (d) of the figure show the distributions of Δ^(xnew,0,1) over the 100 datasets, for x2new=0 and 1, respectively, where the symbols + and × represent ΔTR(xnew,0,1) with x2new=0 and 1 respectively. In most cases, the proposed decision-making procedure based on Δ(x, 0, 1) produces the correct decisions. Panels (b), (c), (e) and (f) illustrate the distributions of predictive mean utility estimates, u^tot(xnew,τ), using the 100 datasets for (x1,x3)=(0,0),(1,0), (0, 1) and (1, 1) respectively. The proposed model produces reasonably good estimates of the expected utilities, and the ordering of utot(xnew,τ) are also well estimated overall.

6 DISCUSSION

We have presented a formal decision-making framework based on utility functions to address the goal of statistical decision making based on data from a randomized clinical trial. The key elements of our methodology are a multivariate Bayesian regression model and a utility function of outcomes and covariates. We assumed a DDP, which is a general, flexible family of Bayesian regression models.

The methodology was illustrated with a breast cancer dataset from a randomized clinical trial. This required close collaboration with oncologists to elicit a utility function that reflected this clinical setting, and the resulting utility function varied with Age to reflect different risk-benefit trade-offs between PFS and TTB for older versus younger patients. Our application illustrates that, by establishing a utility function that quantifies the risk-benefit trade-off between two competing outcomes, one can derive a rational basis for making treatment choices in settings where simpler comparisons in terms of individual outcomes give different, contradictory choices.

The particular form of our utility function was tailored to the breast cancer setting, and may not be appropriate in other clinical settings. In general, the requirements are that U(y,x) must be consistent in the arguments of y so that it makes sense, and that it be tractable enough to facilitate application. Beyond that, a utility should be constructed so that it provides a sensible basis for quantifying trade-offs between different outcomes. It also should be kept in mind that, if U is constructed to be a function of y but not x, both the posterior mean utility u(τ,x) and Δ(x,τ,τ) still will vary with x due to the regression structure of f( y | τ, x, θ), and thus both criteria still will serve as a basis for making personalized treatment decisions.

Our framework may be generalized to accommodate more complex clinical settings, such as meta-analysis of multiple studies, optimization of a multi-stage treatment strategies or non-medical applications. Although the key elements will remain the same, such applications may be complex and will require tailoring to particular settings.

ACKNOWLEDGEMENTS

Juhee Lee's research was supported by NSF grant DMS-1662427. Pavlos Msaouel is supported by a Career Development Award by the American Society of Clinical Oncology, a Research Award by KCCure, the MD Anderson Khalifa Scholar Award and the MD Anderson Physician-Scientist Award. Peter Thall's research was supported by NIH/NCI grants P01 2P30CA016672 and R01 CA261978.

This manuscript was prepared using data from Datasets NCT00601900-D1 and NCT00601900-D2 from the NCTN Data Archive of the National Cancer Institute's (NCI's) National Clinical Trials Network (NCTN). Data were originally collected from clinical trial NCT number NCT00601900, Endocrine Therapy With or Without Anti-VEGF Therapy: A Randomized, Phase III Trial of Endocrine Therapy Alone or Endocrine Therapy Plus Bevacizumab (NSC 704865) for Women With Hormone Receptor-Positive Advanced Breast Cancer. All analyses and conclusions in this manuscript are the sole responsibility of the authors and do not necessarily reflect the opinions or views of the clinical trial investigators, the NCTN, or the NCI. The CALGB-40503 dataset was made available through an NCI-Genentech agreement under which Genentech supplied the drug to support the study.

REFERENCES

Bagočius
,
V.
,
Zavadskas
,
E.K.
&
Turskis
,
Z.
(
2014
)
Multi-person selection of the best wind turbine based on the multi-criteria integrated additive-multiplicative utility function
.
Journal of Civil Engineering and Management
,
20
(
4
),
590
599
.

Bekele
,
B.
&
Thall
,
P.
(
2004
)
Dose-finding based on multiple toxicities in a soft tissue sarcoma trial
.
Journal of the American Statistical Association
,
99
,
71
84
.

Chen
,
W.
,
Wiecek
,
M.M.
&
Zhang
,
J.
(
1998
)
Quality utility: a compromise programming approach to robust design
. In International design engineering technical conferences and computers and information in engineering conference, volume 80326, American Society of Mechanical Engineers. pp. V002T02A032.

Chib
,
S.
&
Greenberg
,
E.
(
1998
)
Analysis of multivariate probit models
.
Biometrika
,
85
,
347
361
.

De Iorio
,
M.
,
Müller
,
P.
,
Rosner
,
G.L.
&
MacEachern
,
S.N.
(
2004
)
An ANOVA model for dependent random measures
.
Journal of the American Statistical Association
,
99
(
465
),
205
215
.

De Iorio
,
M.
,
Johnson
,
W.O.
,
Müller
,
P.
&
Rosner
,
G.L.
(
2009
)
Bayesian nonparametric nonproportional hazards survival modeling
.
Biometrics
,
65
(
3
),
762
771
.

Detsky
,
A.S.
,
Naglie
,
G.
,
Krahn
,
M.D.
,
Naimark
,
D.
&
Redelmeier
,
D.A.
(
1997
)
Primer on medical decision analysis: part 1—Getting started
.
Medical Decision Making
,
17
(
2
),
123
125
.

Dickler
,
M.N.
,
Barry
,
W.T.
,
Cirrincione
,
C.T.
,
Ellis
,
M.J.
,
Moynahan
,
M.E.
,
Innocenti
,
F.
et al. (
2016
)
Phase III trial evaluating letrozole as first-line endocrine therapy with or without bevacizumab for the treatment of postmenopausal women with hormone receptor-positive advanced-stage breast cancer: CALGB 40503 (Alliance)
.
Journal of Clinical Oncology
,
34
(
22
),
2602
.

Ishwaran
,
H.
&
James
,
L.F.
(
2001
)
Gibbs sampling methods for stick-breaking priors
.
Journal of the American Statistical Association
,
96
(
453
),
161
173
.

Le-Rademacher
,
J.
,
Hillman
,
S.
,
Storrick
,
E.
,
Mahoney
,
M.
,
Thall
,
P.
,
Jatoi
,
A.
et al. (
2020
)
Adverse event burden score a versatile summary measure in cancer clinical trials
.
Cancers
,
12
,
3256
.

Loewenstein
,
G.F.
,
Thompson
,
L.
&
Bazerman
,
M.H.
(
1989
)
Social utility and decision making in interpersonal contexts
.
Journal of Personality and Social psychology
,
57
(
3
),
426
.

MacEachern
,
S.N.
(
1999
) Dependent nonparametric processes. In
ASA proceedings of the section on Bayesian statistical science
, volume 1.
Alexandria, VA
:
American Statistical Association
, pp.
50
55
.

Mitra
,
R.
&
Müller
,
P.
(
2015
)
Nonparametric Bayesian inference in biostatistics
.
Berlin
:
Springer-Verlag
.

Müller
,
P.
&
Mitra
,
R.
(
2013
)
Bayesian nonparametric inference why and how
.
Bayesian Analysis
,
8
(
2
),
269
302
.

Müller
,
P.
,
Quintana
,
A.
,
Jara
,
A.
&
Hanson
,
T.
(
2015
)
Bayesian nonparametric data analysis
.
Berlin
:
Springer-Verlag
.

Naglie
,
G.
,
Krahn
,
M.D.
,
Naimark
,
D.
,
Redelmeier
,
D.A.
&
Detsky
,
A.S.
(
1997
)
Primer on medical decision analysis: part 3 estimating probabilities and utilities
.
Medical Decision Making
,
17
(
2
),
136
141
.

Pennings
,
J.M.
&
Smidts
,
A.
(
2003
)
The shape of utility functions and organizational behavior
.
Management Science
,
49
(
9
),
1251
1263
.

Rodriguez
,
A.
&
Dunson
,
D.B.
(
2011
)
Nonparametric Bayesian models through probit stick-breaking processes
.
Bayesian Analysis
,
6
(
1
),
145
177
.

Roy
,
S.K.
,
Maity
,
G.
&
Weber
,
G.-W.
(
2017
)
Multi-objective two-stage grey transportation problem using utility function with goals
.
Central European Journal of Operations Research
,
25
(
2
),
417
439
.

Sethuraman
,
J.
(
1994
)
A constructive definition of Dirichlet priors
.
Statistica Sinica
,
4
(
2
),
639
650
.

Swanson
,
G.M.
&
Lin
,
C.-S.
(
1994
)
Survival patterns among younger women with breast cancer: the effects of age, race, stage, and treatment
.
Journal of the National Cancer Institute. Monographs
, (
16
),
69
77
.

Thall
,
P.
,
Müller
,
P.
,
Xu
,
Y.
&
Guidani
,
M.
(
2017
)
Bayesian nonparametric statistics: a new toolkit for discovery in cancer research
.
Pharmaceutical Statistics
,
16
,
414
423
.

Walsh
,
W.E.
,
Tesauro
,
G.
,
Kephart
,
J.O.
&
Das
,
R.
(
2004
) Utility functions in autonomic systems. In
International conference on autonomic computing, 2004. Proceedings
.
IEEE
, pp.
70
77
. Available from: https://www.computer.org/csdl/proceedings-article/icac/2004/21140070/12OmNzahc1L [Accessed 5th September 2022].

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

Supplementary data