Abstract

Identification of optimal dose combinations in early-phase dose-finding trials is challenging, due to the tension between the many parameters required to flexibly model a possibly nonmonotonic dose-response surface and the small sample sizes typical of early-phase trials. This difficulty is even more pertinent in the context of personalized dose-finding, where patient characteristics are used to identify tailored optimal dose combinations. To overcome these challenges, we propose the use of Bayesian optimization for finding optimal dose combinations in standard (one size fits all) and personalized multi-agent dose-finding trials. Bayesian optimization is a method for estimating the global optima of expensive-to-evaluate objective functions. The objective function is approximated by a surrogate model, commonly a Gaussian process, paired with a sequential design strategy that selects the next point via an acquisition function. This work is motivated by an industry-sponsored problem, where the focus is on optimizing a dual-agent therapy in a setting featuring minimal toxicity. To compare the performance of the standard and personalized methods under this setting, simulation studies are performed for a variety of scenarios. Our study concludes that taking a personalized approach is highly beneficial in the presence of heterogeneity.

1 Introduction

Early-phase clinical trials are designed to assess the safety and efficacy profiles of first-in-human doses of an experimental drug. Traditional adaptive dose-finding designs fall into three major families: algorithmic designs (e.g. Storer, 1989), model-assisted designs (e.g. Yuan et al., 2016), and model-based designs (e.g. O’Quigley et al., 1990). Here, our interest lies in model-based designs, which have been extended to the dual-agent dose combination setting (e.g. Wages & Conaway, 2014; K. Wang & Ivanova, 2005). Some of these designs (e.g. Houede et al., 2010; Z. Wang et al., 2023) propose using flexible models to handle possible nonmonotonicities in the dose-efficacy/dose-toxicity surfaces, recognizing that monotonicity depends on the type of drug being administered and may not hold in general (Li et al., 2017). Additionally, existing methods often restrict dose-finding to a small set of preselected dose combinations, which can fail to identify the optimal dose combination (Hirakawa et al., 2015). Adaptive dose insertion designs have been proposed to allow for the evaluation of additional dose combinations if warranted by the data (Cai et al., 2014; Lyu et al., 2019).

Most existing dose-finding designs, including those mentioned above, seek to find optimal dose combinations without consideration of additional covariate information (we refer to this as standard dose-finding). These designs ignore the possibility that patient responses may be variable across different subgroups within the population. With recent advances in molecular biology, specifically the identification of novel biomarkers, interest has grown in personalized (or precision) medicine. Personalized dose-finding aims to find optimal dose combinations based on individual patient characteristics. When utilizing parametric models, as in many of the standard dose-finding methods described above, extension to the personalized setting is challenging due to the limited sample sizes and potentially large number of dose-covariate interaction terms that are required to be estimated. Personalized dose-finding for monotherapies has been investigated using a variety of methods, including dimension-reduction techniques (e.g. Guo & Zang, 2022), Bayesian hierarchical models (e.g. Morita et al., 2017), and Bayesian model averaging (e.g. Psioda et al., 2021). The increased dimensionality in the combination therapy setting makes estimation even more challenging. Mozgunov et al. (2022) have considered personalized dose-finding for a dual-agent combination therapy where the patient-specific dose of one of the agents is selected externally by clinicians, but multi-agent personalized dose-finding designs where all dose dimensions are explored during optimization are still needed.

Our work is motivated by an industry-sponsored problem in which the sponsor is interested in a design for the development of an intraocular implant that combines two topical agents. Each agent has been in use and is well tolerated. No drug-related adverse events are expected, and we anticipate minimal toxicity, if any. Interest lies in obtaining the optimal dose combination of these two agents using a continuous efficacy measure, which need not be monotonic with the doses of each agent. Additionally, response heterogeneity is expected to exist with respect to a key binary covariate, a particular characteristic of the lens of the eye, and we seek a design that accommodates this feature. We allow for exploration of the dose combination region, which has been defined using data from previous observational and randomized studies, without consideration of formal dose escalation/de-escalation rules. As each new dose combination investigated carries engineering costs, early stopping is desirable if warranted by the data.

Considering this motivating example, we propose novel methodology for multi-agent dose-finding which meets several criteria. First, the model estimating the response surface is parsimonious, which is suitable for small sample sizes, yet remains flexible enough to model both monotonic and nonmonotonic dose-response surfaces. Next, the design sequentially explores the entire dose combination space in a principled way, and allows for early stopping if warranted. Finally, our methodology facilitates extension to personalized dose-finding. Specifically, we propose modelling the response surface with a Gaussian process (GP) and using Bayesian optimization for efficient standard and personalized multi-agent dose-finding, assuming minimal toxicity. Bayesian optimization has been previously proposed for monotherapy in a standard dose-finding setting (Takahashi & Suzuki, 2021a, 2021b). Our contributions here differ from this previous work by utilizing a different sequential search policy, by extending the approach to combination therapies and to the personalized dose-finding setting, and by permitting early stopping.

The remainder of this manuscript is organized in the following manner. We first introduce Bayesian optimization, propose methods for standard and personalized multi-agent dose-finding, and describe an early stopping rule. A simulation study is then performed to compare the standard and personalized dose-finding methods for a dual-agent therapy under a variety of scenarios without early stopping. We then consider the development of the previously described intraocular implant and compare several dose-finding designs which include early stopping rules. We conclude with a discussion and possible directions for future work.

2 Bayesian optimization for dose-finding

Standard dose-finding. Let f(d) ∈ ℝ be a continuous dose-response surface of interest, defined as a function of dose combinations d = (d1, …, dJ), which are comprised of J ∈ ℕ dosing agents and which lie in a continuous dose combination region D ⊂ ℝ^J. In this setting, f(d) may be the dose-efficacy surface or the dose-utility surface, where utility is defined using both efficacy and toxicity outcomes. For the remainder of the manuscript, we assume f(d) is the dose-efficacy surface, which has been transformed such that smaller values are more desirable. We assume minimal toxicity over D. Thus, dose escalation/de-escalation rules are not considered, and it is ethically permissible to select future dose combinations throughout the design space. Our interest is in obtaining the optimal dose combination dopt ∈ D, defined as that which minimizes f(d):

d_{opt} = \arg\min_{d \in \mathcal{D}} f(d).

To do so, we use Bayesian optimization, which is a derivative-free optimization method that estimates the global optima of expensive-to-evaluate objective functions (Garnett, 2023). It is a sequential design strategy that approximates the objective function with a stochastic surrogate model (commonly a GP), and selects the next point by optimizing an acquisition function. Bayesian optimization is commonly used in engineering, machine learning, and computer experiments (Gramacy, 2020; Murphy, 2023). In this work, we model the efficacy function with a GP, and select the next dose combination as that which maximizes an acquisition function. This process repeats until optimality criteria are satisfied or sample size limits are reached.

To begin, a GP prior is placed on f(d). This prior is defined by its mean function, E[f(d)] = m(d), and its correlation function (kernel) Corr(f(d), f(d′)) = K(d, d′), which determines the correlation between two responses as a function of their corresponding dose combinations, d and d′ (Williams & Rasmussen, 2006). This correlation function is multiplied by a scale parameter ν, which determines the variability of the efficacy function throughout the dose combination space and yields Cov(f(d), f(d′)) = νK(d, d′). Since the method’s flexibility is largely determined by the choice of kernel function, constant mean GPs can be used, and in this work, we set m(d) = β0. However, information about the response surface (e.g. pharmacokinetic/pharmacodynamic models) can be incorporated through the mean function if desired. Furthermore, we utilize a separable anisotropic squared exponential kernel,

K(d, d') = \prod_{j=1}^{J} \exp\left\{ -\frac{(d_j - d_j')^2}{2 l_{d_j}^2} \right\}. \qquad (1)

This kernel is parameterized by characteristic length-scales ldj, which control the rate of decay in the correlation between two dose combinations with respect to each dosing dimension and which are estimated using available data. Additionally, it is an example of a stationary kernel, which assumes that the degree of correlation between two dose combinations depends only on their distance and not on their locations in the dose combination space. We assume stationary dose-response surfaces throughout this work, but comment on relaxing this assumption in the discussion.
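To make (1) concrete, the following R sketch evaluates the separable anisotropic squared exponential kernel for matrices of dose combinations stored row-wise; the function and argument names (se_kernel, ls) are ours for illustration and do not come from any particular package.

    se_kernel <- function(D1, D2, ls) {
      # Product over dosing dimensions of squared exponential terms, as in (1)
      K <- matrix(1, nrow(D1), nrow(D2))
      for (j in seq_along(ls)) {
        dist2 <- outer(D1[, j], D2[, j], function(a, b) (a - b)^2)
        K <- K * exp(-dist2 / (2 * ls[j]^2))
      }
      K
    }

    # Example: correlation matrix for three dual-agent dose combinations
    D <- matrix(c(0, 0, 0.5, 0.25, 1, 1), ncol = 2, byrow = TRUE)
    se_kernel(D, D, ls = c(0.3, 0.5))

Larger length-scales produce correlations that decay more slowly along the corresponding dosing dimension.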

To optimize f(d), r patients are assigned to each of c initial dose combinations, yielding n = r × c patient responses, which are treated as noisy observations yi = f(di) + ϵi, where ϵi ∼ N(0, σy²) for i = 1, …, n. This yields the observed data D = {(di, yi)}_{i=1}^{n}, where we denote the vector of observations by y = (y1, …, yn)ᵀ. Specifying the GP prior above induces a multivariate normal distribution on the observations, y ∼ N(β0 1n, νK), where K(i, j) = K(di, dj) + τ²·1{i=j} and τ² is a noise parameter. After observing D, we adopt an empirical Bayes approach towards the kernel hyperparameters θ = {ν, τ², ld1, ld2}, replacing them with their maximum-likelihood estimates (Gramacy, 2020). The posterior distribution of the efficacy function at a new dose combination, denoted by d̃, is then p(f ∣ D, d̃) = N(μ(d̃), σ²(d̃)) (Binois & Gramacy, 2021), such that

\mu(\tilde{d}) = \hat{\beta}_0 + k(\tilde{d})^{\mathrm{T}} K^{-1} (y - \hat{\beta}_0 \mathbf{1}_n), \qquad \sigma^2(\tilde{d}) = \hat{\nu}\left[ K(\tilde{d}, \tilde{d}) - k(\tilde{d})^{\mathrm{T}} K^{-1} k(\tilde{d}) \right], \qquad (2)

where K is n × n, k(d̃) = (K(d1, d̃), …, K(dn, d̃))ᵀ is n × 1, and β̂0 is a plug-in estimate for the mean, each of which is calculated using the maximum-likelihood estimate of θ. A derivation of (2) is provided in the online supplementary material.
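As a minimal illustration of (2) (not the authors' implementation), the base-R sketch below computes the posterior mean and variance at a new dose combination from plug-in hyperparameter values; the argument names are hypothetical, and the exact variance expression (including the treatment of the nugget) follows the derivation in the online supplementary material, so the form below should be read as the standard textbook GP predictive equations.

    gp_posterior <- function(y, K, k_new, nu, tau2, beta0) {
      # K: n x n kernel matrix at the observed doses; k_new: n x 1 vector K(d_i, d~)
      Kn  <- K + diag(tau2, nrow(K))                      # K(i, j) + tau2 * 1{i = j}
      sol <- solve(Kn, cbind(y - beta0, k_new))           # K^{-1}(y - beta0 1_n) and K^{-1} k(d~)
      mu  <- beta0 + drop(crossprod(k_new, sol[, 1]))     # posterior mean at d~
      s2  <- nu * (1 - drop(crossprod(k_new, sol[, 2])))  # posterior variance at d~ (SE kernel: K(d~, d~) = 1)
      list(mean = mu, var = max(s2, 0))
    }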

The next dose combination for evaluation, denoted by d^(c+1), is then selected as that candidate dose combination d̃ ∈ D which maximizes an acquisition function, denoted by α(d̃ ∣ D):

d^{(c+1)} = \arg\max_{\tilde{d} \in \mathcal{D}} \alpha(\tilde{d} \mid \mathcal{D}).

One acquisition function commonly paired with a GP is the expected improvement (EI), defined as

\alpha_{EI}(\tilde{d} \mid \mathcal{D}) = E\left[ \max\{ f^* - f(\tilde{d}), 0 \} \mid \mathcal{D} \right],

and available in closed form (Jones et al., 1998),

\alpha_{EI}(\tilde{d} \mid \mathcal{D}) = \left( f^* - \mu(\tilde{d}) \right) \Phi\!\left( \frac{f^* - \mu(\tilde{d})}{\sigma(\tilde{d})} \right) + \sigma(\tilde{d})\, \phi\!\left( \frac{f^* - \mu(\tilde{d})}{\sigma(\tilde{d})} \right),

where f* = min_{d_i} f(d_i) denotes the value of the current observed optimum, Φ(·) and ϕ(·) denote the cdf and pdf of a standard normal random variable, respectively, and where μ(d̃) and σ(d̃) are the posterior mean and standard deviation of the efficacy function evaluated at d̃, respectively. The EI balances between exploiting regions that have desirable values of f(d) (first term) and exploring regions of the dose combination space that are imprecisely estimated (second term). Under a noisy setting, f* is not observed, and so plug-in estimates have been proposed, for example, the minimum of the posterior mean, f̂* = min_{d̃ ∈ D} μ(d̃) (Picheny et al., 2013). A different acquisition function, called the augmented expected improvement (AEI; Huang et al., 2006), has been shown to offer better performance than EI in higher-noise settings (Picheny et al., 2013). The AEI defines f̂* = μ(d*), the posterior mean of f at the current ‘effective best solution’, d*, which can be defined as the point that minimizes a posterior β-quantile. We follow the recommendation in Huang et al. (2006) and define d* as the point that minimizes the β = 0.84 posterior quantile, which is equivalent to setting d* = argmin_{d̃} {μ(d̃) + γσ(d̃)} when γ = 1. The AEI is then defined as:

\alpha_{AEI}(\tilde{d} \mid \mathcal{D}) = \left[ \left( \hat{f}^* - \mu(\tilde{d}) \right) \Phi\!\left( \frac{\hat{f}^* - \mu(\tilde{d})}{\sigma(\tilde{d})} \right) + \sigma(\tilde{d})\, \phi\!\left( \frac{\hat{f}^* - \mu(\tilde{d})}{\sigma(\tilde{d})} \right) \right] \times \left( 1 - \frac{\hat{\sigma}_y}{\sqrt{\sigma^2(\tilde{d}) + \hat{\sigma}_y^2}} \right). \qquad (3)

The multiplicative term serves to promote exploration by penalizing dose combinations that have small posterior variance (Picheny et al., 2013). In this work, we use AEI since we find it offers moderate improvement over EI.
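For illustration, the sketch below computes the EI and AEI from posterior summaries at a set of candidate doses; mu, sigma, f_best, and tau_n are hypothetical argument names, with tau_n playing the role of the noise standard deviation in the multiplicative penalty of Huang et al. (2006).

    expected_improvement <- function(mu, sigma, f_best) {
      # Closed-form EI for minimization (Jones et al., 1998)
      z <- (f_best - mu) / sigma
      (f_best - mu) * pnorm(z) + sigma * dnorm(z)
    }

    augmented_ei <- function(mu, sigma, f_best, tau_n) {
      # EI at the effective best value, down-weighted where the posterior variance is small
      expected_improvement(mu, sigma, f_best) * (1 - tau_n / sqrt(sigma^2 + tau_n^2))
    }

    # Effective best solution: minimizer of the 0.84 posterior quantile, mu + sigma
    # f_best <- mu[which.min(mu + sigma)]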

After evaluation of d^(c+1) in r new patients, the data are updated, D = D ∪ {(di^(c+1), yi)}_{i=1}^{r}. The GP model is refit to obtain a new posterior distribution p(f ∣ D, d̃). Then s = 1, …, S samples f^(s)(d̃) from this posterior are obtained to yield S samples from the posterior of the optimal dose combination, p(dopt ∣ D), as

d_{opt}^{(s)} = \arg\min_{\tilde{d} \in \mathcal{D}} f^{(s)}(\tilde{d}), \qquad s = 1, \ldots, S.

This procedure continues until the sample size limit is reached or an early stopping rule, which we denote by 1STOP, is satisfied.
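A minimal sketch of how S posterior samples of dopt can be obtained over a finite candidate grid is given below; post_mean and post_cov denote the joint posterior mean vector and covariance matrix of f at the grid points and are assumed to be available from the fitted GP (all names hypothetical).

    sample_dopt <- function(grid, post_mean, post_cov, S = 1000) {
      R <- chol(post_cov + diag(1e-8, nrow(post_cov)))     # small jitter for numerical stability
      Z <- matrix(rnorm(S * length(post_mean)), nrow = S)
      f_draws <- sweep(Z %*% R, 2, post_mean, "+")         # S joint draws of f over the grid
      grid[apply(f_draws, 1, which.min), , drop = FALSE]   # argmin of each draw: samples of d_opt
    }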

Algorithm 1

Standard Optimization Algorithm

Require:  r patient responses at each of c initial dose combinations, D = {(di, yi)}_{i=1}^{n0}
1: n ← n0 = r × c
2: 1STOP ← FALSE
3: Obtain p(f ∣ D, d̃) and p(dopt ∣ D) using fitted GP             ⊳ Obtain posteriors
4: d* ← argmin_{d̃} {μ(d̃) + σ(d̃)}                       ⊳ Obtain effective best point
5: f* ← μ(d*)                             ⊳ Obtain best value of f
6: Calculate αAEI(d̃ ∣ D) for d̃ ∈ D                    ⊳ Compute AEI
7: while  n < N and 1STOP = FALSE  do
8:  d^(c+1) ← argmax_{d̃ ∈ D} αAEI(d̃ ∣ D)                   ⊳ Obtain next dose
9:  for  i = 1, …, r  do
10:    Evaluate yi at d^(c+1)                      ⊳ Observe outcomes
11:  end for
12:  n ← n + r; c ← c + 1                       ⊳ Update n, c
13:  D ← D ∪ {(di^(c+1), yi)}_{i=1}^{r}                      ⊳ Update data
14:  Obtain p(f ∣ D, d̃) and p(dopt ∣ D) using fitted GP           ⊳ Obtain posteriors
15:  d* ← argmin_{d̃} {μ(d̃) + σ(d̃)}                     ⊳ Obtain effective best point
16:  f* ← μ(d*)                           ⊳ Obtain best value of f
17:  Calculate αAEI(d̃ ∣ D) for d̃ ∈ D                   ⊳ Compute AEI
18:  Update 1STOP using (4)                      ⊳ Update stopping rule
19: end while

One possible early stopping rule proposed in Huang et al. (2006) is to allow stopping only after max_{d̃ ∈ D} αAEI(d̃ ∣ D) < δ. In this case, the algorithm terminates only if there is little improvement to be gained over f* across the dose combination space. Under a noisy setting, the authors suggest this be satisfied for (J+1) consecutive algorithm iterations before termination. This yields the following stopping rule:

\mathbb{1}_{\text{STOP}} = \mathbb{1}\left\{ \max_{\tilde{d} \in \mathcal{D}} \alpha_{AEI}(\tilde{d} \mid \mathcal{D}) < \delta \ \text{for } (J+1) \ \text{consecutive iterations} \right\}. \qquad (4)

We note that δ is a tuning parameter that controls the performance of the algorithms. Its value can be determined through sensitivity analysis using several values obtained through Monte Carlo simulation. For example, the Monte Carlo distributions of αAEI can be obtained at each iteration and different values of δ can be selected as summary statistics of these distributions (e.g. the median). The performance of these values can then be compared, with smaller values of δ implying more stringent stopping criteria, and larger values of δ permitting earlier stopping.
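The bookkeeping behind (4) amounts to a simple counter, as in the sketch below (names hypothetical): the counter increments whenever the maximum AEI over the candidate grid falls below δ, resets otherwise, and the algorithm stops once it reaches J + 1.

    update_stop <- function(max_aei, delta, counter, J = 2) {
      counter <- if (max_aei < delta) counter + 1 else 0   # consecutive iterations below delta
      list(stop = counter >= J + 1, counter = counter)
    }
    # state <- update_stop(max(aei), delta, state$counter)   # called once per algorithm iteration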

The sequential procedure and stopping rule described above suggest Algorithm 1, referred to as the standard optimization algorithm. In many applications of Bayesian optimization, it is standard to collect the initial data and then continue with r=1 at each iteration of the algorithm. This permits maximal exploration of the domain, as a novel input point from the design space can be evaluated each time. In certain dose-finding applications, such as the motivating example of this work, engineering costs may prohibit the creation of a novel dose combination for each new patient, and some patients may necessarily be assigned to the same dose.

Personalized dose-finding. Personalized medicine recognizes that response heterogeneity may exist within the population of interest; personalized dose-finding incorporates covariate information to account for this. Consider a set of P discrete covariates Z = {Zp}_{p=1}^{P}. The Cartesian product of the levels of these P covariates defines K strata. Personalized dose-finding seeks to find the optimal dose combinations, denoted by dopt,k, across the continuous dose combination space for each of the K strata:

d_{opt,k} = \arg\min_{d \in \mathcal{D}} f(d, z_k), \qquad k = 1, \ldots, K.

The efficacy function f(d, z) is modelled using a single GP fit to the data D = {(di, zi, yi)}_{i=1}^{n}, where yi = f(di, zi) + ϵi with ϵi ∼ N(0, σy²), which allows information to be borrowed across strata. One possible method of incorporating the additional covariates into the GP model is through the kernel function. We use a (stationary) separable anisotropic squared exponential kernel function that includes the additional covariates of interest,

K\big((d, z), (d', z')\big) = \prod_{j=1}^{J} \exp\left\{ -\frac{(d_j - d_j')^2}{2 l_{d_j}^2} \right\} \prod_{p=1}^{P} \exp\left\{ -\frac{(z_p - z_p')^2}{2 l_{p}^2} \right\}. \qquad (5)

As before, we use an empirical Bayes approach for estimating the hyperparameters. In the case where Zp is a binary variable representing the levels of covariate p, the correlation between two patient responses is reduced by a factor of exp(−1/(2lp²)) if they belong to different strata with respect to covariate p.
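A sketch of the covariate-augmented kernel in (5), reusing se_kernel() from the earlier sketch, is given below; dose and covariate matrices are stored row-wise, and ld and lz hold the dose and covariate length-scales (all names hypothetical).

    se_kernel_cov <- function(D1, Z1, D2, Z2, ld, lz) {
      K <- se_kernel(D1, D2, ld)                     # dose part, as in (1)
      for (p in seq_along(lz)) {
        dist2 <- outer(Z1[, p], Z2[, p], function(a, b) (a - b)^2)
        K <- K * exp(-dist2 / (2 * lz[p]^2))         # factor exp(-1/(2 lz[p]^2)) across strata of a binary covariate
      }
      K
    }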

Since the efficacy function may exhibit different behaviour within each stratum, each stratum may have a unique best efficacy function value fk*. Similar to the standard case, fk* is estimated as the posterior mean at the effective best point within stratum k, fk* = μ(d*, zk). To account for this heterogeneity, the sequential selection is performed within each stratum, but with the GP fit utilizing data from all strata. This proceeds by modifying the AEI acquisition function to use fk* when conditioned on being in stratum k, denoted by αAEI(d̃ ∣ D, z̃k). The sequential procedure continues until sample size limits are reached or until an early stopping rule is satisfied for each stratum. Since optimal doses in some strata may be easier to identify than in others, stratum-specific early stopping should be employed. One possible stratum-specific stopping rule, which we denote by 1STOP,k, replaces αAEI(d̃ ∣ D) in (4) with αAEI(d̃ ∣ D, z̃k). Upon termination of the algorithm, the posterior distribution of the optimal dose combination for each stratum is returned, denoted by p(dopt ∣ D, zk). These modifications suggest Algorithm 2, the personalized optimization algorithm.

Algorithm 2

Personalized Optimization Algorithm

Require:  r patient responses at each of ck initial dose combinations per stratum, D = {(di, zi, yi)}_{i=1}^{n0}
1: n ← n0 = Σ_{k=1}^{K} n0,k
2: Obtain p(f ∣ D, d̃, z̃) using fitted GP                ⊳ Obtain posterior of f
3: for  k = 1, …, K  do
4:  Obtain p(dopt,k ∣ D, zk)                     ⊳ Obtain posterior of dopt,k
5:  d* ← argmin_{d̃} {μ(d̃, zk) + σ(d̃, zk)}                  ⊳ Obtain effective best point
6:  fk* ← μ(d*, zk)                         ⊳ Obtain best value fk*
7:  Calculate αAEI(d̃ ∣ D, z̃k) for d̃ ∈ D                ⊳ Compute AEI
8:  1STOP,k ← FALSE
9: end for
10: while  n < N and 1STOP,k = FALSE for at least one stratum k  do
11:  for  k = 1, …, K  do
12:   nk ← 0
13:   Dk ← ∅
14:   if  1STOP,k = FALSE  then
15:    d^(ck+1) ← argmax_{d̃ ∈ D} αAEI(d̃ ∣ D, z̃k)             ⊳ Obtain next dose
16:    for  i = 1, …, r  do
17:     Evaluate yi at (d^(ck+1), zk)                ⊳ Observe outcomes
18:    end for
19:    nk ← r; ck ← ck + 1                    ⊳ Update nk, ck
20:    Dk ← {(di^(ck+1), zk, yi)}_{i=1}^{r}
21:   end if
22:  end for
23:  n ← n + Σ_{k=1}^{K} nk                        ⊳ Update n
24:  D ← D ∪ (∪_{k=1}^{K} Dk)                         ⊳ Update data
25:  Obtain p(f ∣ D, d̃, z̃) using fitted GP                ⊳ Obtain posterior of f
26:  for  k = 1, …, K  do
27:   Obtain p(dopt,k ∣ D, zk)                    ⊳ Obtain posterior of dopt,k
28:   if  1STOP,k = FALSE  then
29:    d* ← argmin_{d̃} {μ(d̃, zk) + σ(d̃, zk)}               ⊳ Obtain effective best point
30:    fk* ← μ(d*, zk)                      ⊳ Obtain best value fk*
31:    Calculate αAEI(d̃ ∣ D, z̃k) for d̃ ∈ D             ⊳ Compute AEI
32:    Update 1STOP,k using (4)                  ⊳ Update stopping rule
33:   end if
34:  end for
35: end while

3 Simulation study

Below, we perform a simulation study to compare the performance of the standard and personalized algorithms (Algorithms 1 and 2, respectively) under three scenarios with no early stopping (i.e. δ = 0 in (4)). Early stopping will be investigated in the next section. Scenarios 1 and 2 consider a single binary covariate Z1. We index the true optimal dose combinations, dopt,k, and the true optimal values of the efficacy function, fopt,k, using the values of Z1. That is, when Z1=0 we use dopt,0 and fopt,0. Scenario 3 considers two binary covariates, Z1 and Z2, and the dopt,k and fopt,k are indexed similarly. For example, when Z1=0 and Z2=1, we use dopt,01 and fopt,01. To make the simulations in this manuscript more computationally feasible, we modify Algorithms 1 and 2 to return a point estimate of dopt,k rather than the entire posterior distribution. The point estimate is defined as the minimizer of the posterior mean surface, d̂opt,k = argmin_{d̃} μ(d̃, z̃k).

We utilize dose combinations d = (d1, d2) ∈ [0,1]², assumed to be standardized, where d = (0,0) corresponds to the combination using the lowest doses of interest for each agent and where d = (1,1) corresponds to the combination using the highest doses of interest for each agent. The point estimates, d̂opt,k, and the next dose combinations for evaluation, d^(ck+1), are set, respectively, as the minimizers of μ(d̃, z̃k) and maximizers of αAEI(d̃ ∣ D, z̃k). The αAEI(d̃ ∣ D, z̃k) is evaluated across an evenly spaced grid on [0,1]². The grid is incremented by 0.25 in each dimension, reflecting the degree of precision to which the drug maker can manufacture a particular dose combination. As a result, some dose combinations may be suggested more than once in the algorithm. While the proposed method is capable of optimizing over the continuous dose combination space, it is important to incorporate any manufacturing constraints into the optimization procedures to avoid suggesting doses that are not feasible to engineer.
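Concretely, the candidate set used in the simulations is the 0.25-spaced grid below, and both the point estimate and the next dose are simple look-ups over it; the mu and aei objects are purely illustrative placeholders standing in for the posterior mean and AEI values at the grid points.

    grid <- as.matrix(expand.grid(d1 = seq(0, 1, by = 0.25),
                                  d2 = seq(0, 1, by = 0.25)))   # 25 candidate dose combinations
    mu  <- rowSums((grid - 0.5)^2)      # illustrative placeholder for the posterior mean surface
    aei <- -mu                          # illustrative placeholder for the AEI values
    d_hat_opt <- grid[which.min(mu), ]  # point estimate: minimizer of the posterior mean
    d_next    <- grid[which.max(aei), ] # next dose: maximizer of the AEI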

As before, we define f(d,z) to be a continuous efficacy surface and assume we are in a minimal toxicity setting where it is ethically permissible to select future doses anywhere in the dose combination space. Let gi be a bivariate normal density function, N(μi,Σi), which is parameterized by mean vector μi and covariance matrix Σi, for i=1,2,3. The efficacy functions under the considered scenarios use the following densities evaluated at d, denoted by gi(d) in Table 1:

The data-generating mechanism for each scenario is y = f(d, z) + ϵ, where ϵ ∼ N(0, σy²), with the specification of f(d, z) included in the first panel of Table 1 (rows labelled ‘Simulation Study’) and plotted in Figures 1a–3a. The values of σy are chosen to ensure specific standardized effect sizes, defined as ses = |fopt|/σy. We consider several standardized effect sizes drawn from a meta-analysis of dose-responses for a large drug development portfolio at a pharmaceutical company (Thomas et al., 2014). We focus on standardized effect sizes for drugs that had laboratory-confirmed endpoints, which is the type of endpoint used in our motivating example. The 25th/50th/75th percentiles of these standardized effect sizes are 0.79/1/3.77, which we refer to as small/medium/large effect sizes.
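For intuition, a hedged sketch of this data-generating mechanism for a scenario of the form −1{z1 = 0} × g2(d) is given below; the bivariate normal mean vector and covariance matrix are purely illustrative placeholders, not the values used in the manuscript's simulations.

    dmvnorm2 <- function(d, m, S) {                 # bivariate normal density, written out by hand
      v <- d - m
      exp(-0.5 * drop(t(v) %*% solve(S) %*% v)) / (2 * pi * sqrt(det(S)))
    }
    g2    <- function(d) dmvnorm2(d, m = c(0.25, 0.75), S = diag(0.02, 2))     # illustrative parameters only
    f     <- function(d, z1) -as.numeric(z1 == 0) * g2(d)                      # efficacy surface for this stratum
    sim_y <- function(d, z1, sigma_y = 0.319) f(d, z1) + rnorm(1, 0, sigma_y)  # noisy patient response
    sim_y(c(0.25, 0.75), z1 = 0)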

Figure 1. Scenario 1. (a) Efficacy function, white stars denote dopt,k, (b) expected dose units from the optimal dose combination as defined in (6) by iteration, (c) average RPSEL as defined in (7) by iteration, and (d) average absolute deviation of the posterior mean estimate by iteration.

Figure 2. Scenario 2. (a) Efficacy function, white stars denote dopt,k, (b) expected dose units from the optimal dose combination as defined in (6) by iteration, (c) average RPSEL as defined in (7) by iteration, and (d) average absolute deviation of the posterior mean estimate by iteration.

Figure 3. Scenario 3. (a) Efficacy function, white stars denote dopt,k, (b) expected dose units from the optimal dose combination as defined in (6) by iteration, (c) average RPSEL as defined in (7) by iteration, and (d) average absolute deviation of the posterior mean estimate by iteration. In (b), the plot for (z1=0, z2=0) is empty as no optimal dose combination exists.

Table 1.

Simulation scenarios considered

                    f(d, z)                              σy      z1   z2   dopt           fopt     ses    monotone
Simulation Study
 1)                 −1{z1=0} × g1(d)                     2.015   0         (1, 1)         −1.592   0.79   Yes
                    −1{z1=1} × g1(d)                             1         (1, 1)         −1.592   0.79   Yes
 2)                 −1{z1=0} × g2(d)                     0.319   0         (0.25, 0.75)   −1.203   3.77   No
                    −1{z1=1} × g3(d)                             1         (0.75, 0.25)   −1.203   3.77   No
 3)                 1{(z1=0, z2=0)} × 0                  1       0    0    None           None     0      No
                    −1{(z1=0, z2=1)} × 0.831 g2(d)               0    1    (0.25, 0.75)   −1       1      No
                    −1{(z1=1, z2=0)} × 3.134 g3(d)               1    0    (0.75, 0.25)   −3.77    3.77   No
                    −1{(z1=1, z2=1)} × 0.496 g1(d)               1    1    (1, 1)         −0.79    0.79   Yes
Implant
 1)                 −1(z1=0) × 2.49 g2(d)                5       0         (0.25, 0.75)   −5       1      No
                    −1(z1=1) × 6.65 g3(d) − 2                    1         (0.75, 0.25)   −10      2      No

Note. The data-generating mechanism for each scenario is y = f(d, z) + ϵ where ϵ ∼ N(0, σy²). The table columns contain the location of the optimal dose combination (dopt), the optimal value of the efficacy function (fopt), the standardized effect size (ses), and whether or not the dose-efficacy surface is monotonically increasing with respect to each dosing dimension (monotone). Note that gi(d) for i = 1, 2, 3 represents the value of a bivariate normal density function with a specific mean vector and covariance matrix evaluated at d, as defined in the text. The subtraction of 2 in f(d, z) under the Implant scenario corresponds to a base level of drug response outside the regions of optimality.


Scenario 1 considers the case of no response heterogeneity across a binary covariate Z1 and includes a small standardized effect size with a dose-efficacy surface, which is monotonically increasing with respect to each dosing dimension. Both the locations of the optimal dose combinations, and the optimal values of the efficacy function, are the same across the strata. That is, dopt,0 = dopt,1 and fopt,0 = fopt,1. Scenario 2 considers response heterogeneity across Z1, where the locations of the optimal dose combinations differ. Under this scenario, dopt,0 ≠ dopt,1 but fopt,0 = fopt,1. This scenario considers large standardized effect sizes with efficacy surfaces that are nonmonotone with respect to each dosing dimension. An example of a single trial performed using the personalized algorithm for this scenario is provided in Figure E.1 in the online supplementary material. Scenario 3 considers heterogeneity across two binary covariates, Z1 and Z2, where both the locations of the optimal dose combinations and the optimal values of the efficacy function differ across the strata. Thus, dopt,ij ≠ dopt,lm for ij ≠ lm and fopt,ij ≠ fopt,lm for ij ≠ lm. This scenario includes a zero stratum, (z1=0, z2=0), which represents the covariate pattern of those who do not respond to the drug. We note that in this stratum, dopt,00 and fopt,00 do not exist, but we consider the standardized effect size to be 0. This scenario includes small, medium, and large standardized effect sizes as well as both monotone and nonmonotone dose-efficacy surfaces.

For each scenario, the standard and personalized algorithms are run using a maximum sample size of 80 participants. While this number is larger than the sample sizes of many early-phase trials in practice, our goal is to investigate the algorithms’ performance characteristics as the sample size increases. We defer the use of early stopping rules to the following section, but note that these will permit a reduction in the expected sample size. We assign participants to dose combinations such that the total sample size of each algorithm is equal at each iteration, which allows a comparison of their performance. For the standard algorithm under scenarios 1 and 2, r=4 participants are assigned to each dose combination on an initial dose matrix comprised of c=5 dose combinations, which are selected via a two-dimensional quasi-random Sobol sequence (Morgan-Wall, 2022). This sequence serves as a space-filling design and seeks to spread out the initial dose combinations in a more uniform manner than is typically accomplished via random sampling. A comparison to other space-filling designs is provided in the online supplementary material. This yields n0=20, which represents 25% of the total sample size, following a recommendation of using 20%–50% of the data for the initial design (Picheny & Ginsbourger, 2014). More than one patient is assigned per dose combination to control the cost associated with producing novel dose combinations, a financial constraint from our motivating problem. At each iteration of the algorithm, r=4 additional participants are assigned to each proposed dose combination. This yields a total sample size of 80 after 15 iterations. We note that r is a tuning parameter, and that reducing it will lead to more unique doses being explored for the same fixed sample size. We explore this in the next section.

For the personalized algorithm, c=5 initial dose combinations are selected in the same manner as above. To achieve the same total sample size by iteration as the standard algorithm, only r=2 participants are assigned to each dose combination within each stratum. This yields n0,0 = n0,1 = 10, and so n0=20, as in the standard algorithm. At each iteration within each stratum, r=2 additional participants are assigned to each proposed dose combination. This yields a total sample size of 80 after 15 iterations. We note that this equal allocation of patients across strata represents an idealized trial, which assumes that both the prevalence and enrolment of the specific subgroups are the same throughout the course of the trial. We consider departures from these assumptions in the discussion section. The same setup is used for scenario 3. However, since the number of strata is doubled, the number of participants evaluated at each dose combination is halved for the personalized algorithm. Thus, the standard algorithm still evaluates r=4 participants per dose combination, but the personalized algorithm evaluates r=1 participant per dose combination, yielding a total sample size of 80 after 15 iterations. All computing is performed in the statistical programming language R (R Core Team, 2022). The efficacy functions are modelled using constant mean GP models, which utilize the anisotropic separable squared exponential kernels previously described, with the hyperparameters being jointly optimized by maximizing the marginal log-likelihood of the data (i.e. empirical Bayes GP; Binois & Gramacy, 2021). A set of parametric dose-finding designs was also considered, but these yielded performance that was, in general, worse than that of the standard and personalized algorithms, and so they are not included here. The simulation details and results are provided in the online supplementary material.
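As an illustration of the empirical Bayes step (a minimal sketch, not the hetGP implementation cited above), the negative log marginal likelihood of the constant-mean GP can be written down directly and minimized with optim(); the sketch reuses se_kernel() from the earlier sketch, handles hyperparameters on the log scale for positivity, and the starting values are purely illustrative.

    neg_loglik <- function(par, y, X) {
      # par = (log l_d1, log l_d2, log nu, log tau2, beta0); X holds the observed doses row-wise
      ls <- exp(par[1:2]); nu <- exp(par[3]); tau2 <- exp(par[4]); beta0 <- par[5]
      S  <- nu * (se_kernel(X, X, ls) + diag(tau2, length(y)))   # Cov(y) = nu * (K + tau2 * I)
      cS <- chol(S)
      a  <- forwardsolve(t(cS), y - beta0)                       # whitened residuals
      0.5 * sum(a^2) + sum(log(diag(cS))) + 0.5 * length(y) * log(2 * pi)
    }
    # fit <- optim(c(log(0.5), log(0.5), 0, log(0.1), 0), neg_loglik, y = y, X = X)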

Algorithm performance is compared using several criteria, which are estimated via m = 1, …, 1,000 Monte Carlo simulations. The expected number of dosing units from the optimal dose combination is used to assess how close the recommended dose combination d̂k is to dopt,k at each iteration. This measure is defined as the expected value of the Euclidean distance between d̂k and dopt,k divided by the precision to which the sponsor can manufacture doses, which is 0.25 in our simulations:

E\left[ \frac{\lVert \hat{d}_k - d_{opt,k} \rVert_2}{0.25} \right]. \qquad (6)

We utilize this single number summary to facilitate visual comparison between the algorithms. We also report different selection metrics, including the probability of correctly selecting the optimal dose, the probability of selecting within one dose of the optimal dose, and the probability of selecting within one dose along the diagonal of the optimal dose, in the online supplementary material. We utilize the average root posterior squared error loss (RPSEL) to assess how well the true efficacy function value f(d̂k, zk) at the recommended dose is estimated by the pointwise posterior distribution of the efficacy function at the recommended dose, p(fk ∣ D, d̂k, zk), where f^(s)(d̂k, zk) denotes a single posterior sample out of s = 1, …, 10,000 posterior samples:

\text{RPSEL}_k = \sqrt{ \frac{1}{S} \sum_{s=1}^{S} \left( f^{(s)}(\hat{d}_k, z_k) - f(\hat{d}_k, z_k) \right)^2 }. \qquad (7)

We employ f(d̂k, zk) here rather than fopt,k to understand how well the algorithms capture f at the recommended dose combination, even if the recommended dose combination is not optimal. This is important for later phase studies, which may utilize estimates of f obtained at the recommended dose combination for sample size and power calculations. In a similar manner, we present the average absolute deviation of the posterior mean point estimates E[fk ∣ D, d̂k, zk] from the true value of f(d̂k, zk) at each iteration. Note that the standard algorithm ignores the strata and so yields only a single recommended dose d̂k = d̂, posterior distribution p(fk ∣ D, d̂k, zk) = p(f ∣ D, d̂), and posterior mean E[fk ∣ D, d̂k, zk] = E[f ∣ D, d̂] per iteration for k = 1, …, K. See panels b–d of Figures 1–3 for these criteria by iteration.
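The two operating characteristics reduce to short helper functions, sketched below with hypothetical names: dose_units() computes the distance in (6) for a single replicate, and rpsel() computes (7) from posterior samples of f at the recommended dose; both are then averaged over the Monte Carlo replicates.

    dose_units <- function(d_hat, d_opt, precision = 0.25) {
      sqrt(sum((d_hat - d_opt)^2)) / precision    # Euclidean distance in manufacturable dosing units
    }

    rpsel <- function(f_samp, f_true) {
      sqrt(mean((f_samp - f_true)^2))             # root posterior squared error loss at the recommended dose
    }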

Scenario 1 considers the case of no response heterogeneity across a single binary covariate Z1 for small standardized effect sizes with monotonically increasing dose-efficacy surfaces. Under this scenario, both algorithms converge to the locations of dopt,k, coming within one dosing unit of the optimal (Figure 1b). Estimation of f is challenging for the small standardized effect size (i.e. higher level of noise), but by termination, the algorithms come within 0.4 units of the true f at the recommended dose combination (Figure 1c–d). The personalized algorithm is slightly less efficient than the standard algorithm, however, and takes longer to converge. This is likely the result of the GP model used in the personalized algorithm needing to estimate the additional length-scale parameter for Z1, l1.

Scenario 2 considers response heterogeneity across Z1 for large standardized treatment effect sizes with nonmonotonic dose-efficacy surfaces. Under this scenario, the personalized algorithm converges to the locations of dopt,k and estimates the true value of f at the recommended dose combination well, whereas the standard algorithm does not (Figure 2b–d). The standard algorithm typically explores the area near only a single dopt,k or attempts to explore both optima (not shown). This results from the marginal efficacy function surface being bimodal, since it is a mixture distribution comprised of the equally weighted strata of Z1, which are displayed in Figure 2a. Note that even if the bimodality of the marginal surface is properly identified and explored, patients cannot be optimally treated without consideration of Z1. That is, without this additional covariate information, the standard algorithm cannot determine which mode should be used to treat patients with Z1=0 versus Z1=1. This is possible using the personalized approach, however.

Scenario 3 considers heterogeneity across two binary covariates, Z1 and Z2, and includes zero, small, medium, and large standardized effect sizes with both monotone and nonmonotone dose-efficacy surfaces. Recall that the stratum (z1=0, z2=0) corresponds to those patients who do not respond to the drug. Thus, there are no optima in this stratum, and the corresponding plot in Figure 3b is empty. Figure 3c,d is not empty for this stratum, however, since f = 0 everywhere and it is of interest to see how well the algorithms estimate this value. Under this scenario, the personalized algorithm converges to the locations of dopt,k, coming within 1–1.5 dosing units of the optimum depending on the standardized effect size, whereas the standard algorithm does not (Figure 3b). Since the standard algorithm targets the global optimum, it performs best in stratum (z1=1, z2=1), where the standardized effect is the largest. Accurate estimation of f is challenging under this scenario (Figure 3c,d). The personalized algorithm shows evidence of convergence towards the true values of f (i.e. RPSEL and absolute deviation of the posterior mean estimates decreasing to 0), whereas the standard algorithm does not. The standard algorithm yields only a single estimate of f and so must split the difference among the different efficacy function values across the strata. This scenario suggests that as the number of strata grows, and thus also the likelihood that some degree of response heterogeneity is present, the performance of the standard algorithm will be further degraded.

In summary, when heterogeneity exists across the strata, the personalized algorithm is superior in both identifying the locations of the dopt,k and estimating f. When no heterogeneity exists, the standard algorithm is slightly more efficient. Additionally, the proposed methods have performed well for both monotonic and nonmonotonic dose-efficacy surfaces, and have done so without utilizing strong prior information.

4 Dose-finding design for an intraocular implant

In this section, we focus on the intraocular implant example. The goal is to develop an intraocular implant with an optimal dose combination of two agents, which reduce intraocular pressure (IOP), a laboratory-confirmed measurement. The normal range of IOP is 12–21 mmHg, with 21–30 mmHg considered elevated IOP. Elevated IOP is a risk factor for ocular hypertension and glaucoma, and is strongly associated with increased vision loss (Leske et al., 2003). The implant seeks to reduce elevated IOP by combining two agents, with doses d1 and d2, each of which has been in use individually in topical form for many years. The agents are well tolerated and we do not expect any drug-related adverse events. We are interested in obtaining the optimal dose combination of these two agents using a reduction in IOP from baseline as a continuous efficacy measure. It is hypothesized that higher doses do not necessarily imply greater efficacy, which is expected to plateau or even decrease at higher levels of agent concentration. Additionally, we expect response heterogeneity to exist with respect to a particular characteristic of the lens of the eye, which we treat as a binary covariate Z1, and are interested in a design that permits the identification of potentially different optimal dose combinations according to this patient characteristic.

We allow the dose-finding algorithm to explore the dose-combination region, assumed to be standardized, subject to the manufacturing precision constraint of 0.25 standardized dosing units and deem it ethically permissible to proceed without safety-related dose escalation/de-escalation rules. It is hypothesized that the implant can reduce elevated IOP by 5 mmHg in individuals with Z1=0, but may be even more effective in individuals with Z1=1, leading to reductions as high as 10 mmHg. To assess the cost and size of a hypothetical trial, we are interested in comparing the standard and personalized dosing approaches under different stopping rule specifications for the scenario described above. The final design is then selected as the one that balances good performance while controlling expected cost. Costs are measured in terms of enrolled participants and also the number of unique dose combinations, since there are engineering costs associated with the production of novel doses.

The goal is to minimize the efficacy function. The data-generating mechanism is y = f(d, z) + ϵ where ϵ ∼ N(0, σy²), with the specification of f(d, z) included in the second panel of Table 1 (rows labelled ‘Implant’) and plotted in Figure 4a. We use the same indexing for dopt,k and fopt,k as described previously. The value of σy=5 ensures medium standardized effect sizes of 1 and 2 for Z1=0 and Z1=1, respectively, across nonmonotonic dose-efficacy surfaces.

Figure 4. Intraocular Implant Scenario: (a) Efficacy function, white stars denote dopt,k, (b) expected sample size and expected number of unique doses evaluated, (c) expected dose units from the optimal dose combination as defined in (6) by iteration, (d) average RPSEL as defined in (7) by iteration, and (e) average absolute deviation of the posterior mean estimate by iteration. On the x-axis in (b), P stands for personalized and S for standard, followed by the number referring to high replication for a smaller number of doses (1) or low replication for a larger number of doses (2).

Two settings of the algorithms are compared for a maximum sample size of 80: one setting includes a higher number of replications at a smaller number of doses, and the other includes a smaller number of replications at a larger number of doses. We denote the standard/personalized algorithms under the first setting as S1/P1 and under the second setting as S2/P2. Under the first setting, S1/P1 are run under the same specifications described in the previous section, where r=4 and r=2 for the standard versus personalized algorithms, respectively. Under the second setting, S2/P2 are run with r=2 and r=1 for the standard versus personalized algorithms, respectively, for c=10 initial dose combinations selected via Sobol sequences.

As the sponsor is concerned about cost and size of the trial, early stopping is permitted using the rule defined in (4). Early stopping is investigated by choosing values of δ as previously described such that there is a moderately high chance of stopping after roughly 40 or 60 total participants are enrolled in the trial (denoted by the values of nstop in Figure 4). These values are {0.00179,0.000971} for S1, {0.00670,0.00345} for P1, {0.00140,0.000820} for S2, and {0.00565,0.00298} for P2. Since we are considering a dual-agent dose combination, J=2 and we thus require the stopping criteria in (4) to be satisfied J+1=3 times before stopping early.

For the personalized algorithm, we permit stratum-specific early stopping. Importantly, should the exploration of one stratum stop early, we allocate the remaining budget to the recruitment of participants in the other stratum. This assumes the sponsor can target recruitment specifically for this group. This reallocation enables resources to be utilized in strata that are harder to optimize, and so may increase performance. We compare this to no early stopping (i.e. nstop=80), where the numbers of participants enrolled in each stratum are equal. When combined with the two settings for each algorithm, 12 unique designs are defined: P1/P2/S1/S2, each with three stopping rules, denoted by nstop ∈ {40, 60, 80}. All computing and inference is performed as previously described. The performance of the designs is compared using the previously defined criteria, which are estimated via 1,000 Monte Carlo replicates.

The expected sample size and expected number of unique doses evaluated by each design for the scenario described above are included in Figure 4b. The standard and personalized designs have approximately equal expected sample sizes within each stopping rule, but the personalized designs are expected to evaluate more unique doses on average. The performance of the personalized designs with respect to expected dosing units from the dopt,k (Figure 4c) is comparable within each stopping rule, having differences of around 0.1 standardized dosing units, which are too small to be practically meaningful.

There is some suggestion of a possible increase in the expected number of dosing units from dopt,k under no early stopping (i.e. nstop=80) when compared with early stopping. To investigate this, an additional simulation was performed that compared designs with maximum sample sizes of 40 and 60 under no early stopping to those that permit early stopping at roughly the same numbers of participants. The simulation suggests that the increase mentioned above results from the equal allocation of participants across strata under no early stopping. When early stopping is permitted, a larger proportion of evaluations is allocated to the stratum that is more difficult to optimize, and so provides an improved model fit at each iteration of the algorithm, which improves the dose-finding overall. This difference in proportions can be observed in Figure 4b by noting that higher proportions of the expected sample sizes under early stopping rules come from stratum Z1=0, which has a smaller standardized effect size and is thus harder to optimize. Regardless, under the current scenario, the observed difference in expected dosing units from dopt,k between the stopping rules nstop=60 and nstop=80 for the personalized designs is too small to be meaningful. However, future work should more fully investigate how equal versus unequal allocation of participants across the strata at each algorithm iteration impacts design performance.

Design P1 estimates f the best, supporting findings in the literature that suggest higher degrees of replication can be beneficial for estimation under noisy settings (Binois et al., 2018). This difference is most apparent between designs P1 and P2 under stopping rule nstop=40 (Figure 4d,e). The standard algorithms perform poorly across all performance metrics, recommending doses that are on average farther away from dopt,k, and poorly estimating the efficacy function f at the recommended dose combinations (Figure 4c–e). This poor performance is expected since response heterogeneity is present in the true data-generating mechanism. If response heterogeneity were not present, we would expect the standard algorithms to be slightly more efficient than their personalized counterparts, as was observed in the simulation study from the last section.

To suggest a final design to the sponsor, we use Figure 4 as a visual aid. Since response heterogeneity is expected a priori, the poor performance of the standard designs under this scenario renders them inappropriate. Instead, we select the personalized design P1 with nstop = 60. For roughly the same expected sample size but fewer unique dose evaluations, this design yields final dose suggestions that are as close to dopt,k as those of design P2. It also offers a mild improvement in the estimation of f compared with P1 with nstop = 40. Choosing P1 with nstop = 60 over P1 with nstop = 40 does come at additional cost, however: the design with nstop = 40 expects to evaluate 13 unique doses and enrol approximately 44 participants, whereas the design with nstop = 60 expects to evaluate about 15 unique dose combinations (a 15% increase) and enrol approximately 58 participants (a 32% increase). The sponsor would need to weigh the increased engineering and enrolment costs against the gain in performance.

5 Discussion

In this manuscript, we proposed the use of Bayesian optimization for early-phase multi-agent dose-finding trials in a tolerated toxicity setting. We showed the benefit of taking a personalized approach for dual-agent trials when heterogeneity exists across a set of prespecified subgroups. As expected, under no response heterogeneity, the personalized approach is slightly less efficient. As noted in the introduction, parametric models may suffer from the curse of dimensionality when transitioning from standard to personalized dose-finding, as they require terms for all higher-order dose-covariate interactions. By using the anisotropic squared exponential GP kernel in the Bayesian optimization methods proposed here, however, only a single additional parameter per covariate is required (the additional length-scale parameter corresponding to that covariate). Thus, the proposed methods highlight the benefit and feasibility of adopting a personalized approach towards early-phase multi-agent dose-finding trials for both monotonic and nonmonotonic dose-response surfaces.
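To make this dimensionality argument concrete, the following minimal R sketch (illustrative only; the function and variable names are ours, and the length-scale parameterization shown is one common convention rather than necessarily the one used in our implementation) evaluates a separable anisotropic squared exponential kernel over two standardized dose coordinates and one binary covariate, so that adding the covariate contributes exactly one extra length-scale parameter.

# Minimal sketch: separable anisotropic squared exponential kernel over
# inputs x = (dose_A, dose_B, z), where z is a binary covariate.
# Adding the covariate z adds exactly one length-scale, theta[3].
sqexp_aniso <- function(x1, x2, tau2, theta) {
  # x1, x2: numeric vectors of equal length (standardized doses + covariate)
  # tau2:   signal variance; theta: one length-scale per input dimension
  tau2 * exp(-sum((x1 - x2)^2 / theta))
}

# Covariance matrix over a small illustrative design
X <- rbind(c(0.2, 0.5, 0), c(0.8, 0.5, 0), c(0.2, 0.5, 1))
theta <- c(0.3, 0.3, 1.0)  # two dose length-scales plus one covariate length-scale
K <- outer(seq_len(nrow(X)), seq_len(nrow(X)),
           Vectorize(function(i, j) sqexp_aniso(X[i, ], X[j, ], tau2 = 1, theta = theta)))

In contrast, a parametric model with the same flexibility would require a coefficient for every dose-covariate interaction term, a count that grows quickly with the number of covariates.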

The proposed approach is not without limitations. First, the methods proposed in this work assumed no meaningful toxicity across the dose combination space. Extension to higher-grade toxicity settings through incorporation of dose escalation/de-escalation remains future work. Second, the personalized approach was demonstrated by considering dual-agent dose combinations in predefined subgroups only, and the subgroups were defined using binary covariates, reflecting the focus of the design that motivated this work. Extension to dose combinations with more than two agents and to categorical covariates is straightforward, though other kernel functions may be used if appropriate. Extension to continuous covariates without categorization is not trivial and could be the subject of future investigations: in that setting, the models may extrapolate into regions of the covariate space unobserved in the sample data, which any future approach should take into account. Another direction for future development is to extend the proposed approach to binary and ordinal outcomes, where the proposed response models may be defined over a latent continuous surface.

In this manuscript, Bayesian optimization is utilized as a global optimizer. While other global optimization methods exist (e.g. genetic algorithms and simulated annealing), they require many function evaluations and are thus not appropriate for early-phase dose-finding trials, where evaluations are expensive. We employed the AEI acquisition function under a GP surrogate model. The performance of the algorithms under additional acquisition functions, surrogate models, and/or kernel functions should be investigated. Different kernel functions may be required to incorporate informative prior information, e.g. specific relationships or orderings among patient subgroups. The (stationary) separable anisotropic kernel used in this work assumes that all strata share the same correlation structure and that the covariance between points in different strata is changed by a multiplicative factor only, which may not hold in general. Indeed, under simulation scenario 3, which included two binary covariates, the efficacy function is zero everywhere for stratum (z1=0, z2=0), so this assumption fails: the efficacy function values in this stratum are perfectly correlated, whereas those corresponding to dose combinations in other strata are not. Future work should investigate relaxing the assumption of stationarity by using kernels that are nonseparable (e.g. including dose-covariate interaction terms in the kernel function (5)) or nonstationary, or by using deep GPs (Sauer et al., 2023). Finally, if the number of included covariates is large and there is reason to believe that only low-order interactions between the drug combinations and covariates exist, different surrogate models could be employed, such as additive GPs or Bayesian additive regression trees (Chipman et al., 2010).
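For concreteness, the sketch below illustrates one common form of the AEI criterion of Huang et al. (2006) for maximization under noisy observations, written in R from a surrogate's posterior summaries. It assumes a constant observation noise variance and is a generic illustration of the acquisition step, not the exact implementation used in our simulations.

# Illustrative AEI for maximization (after Huang et al., 2006), computed from
# GP posterior means/standard deviations; assumes constant noise variance.
aei <- function(mu_cand, sd_cand, mu_obs, sd_obs, sigma2_eps) {
  # mu_cand, sd_cand: posterior mean/sd at candidate dose combinations
  # mu_obs,  sd_obs : posterior mean/sd at previously evaluated dose combinations
  # sigma2_eps      : assumed observation noise variance
  y_star <- max(mu_obs - sd_obs)          # "effective best" penalizes uncertain incumbents
  z  <- (mu_cand - y_star) / sd_cand
  ei <- (mu_cand - y_star) * pnorm(z) + sd_cand * dnorm(z)   # expected improvement
  ei * (1 - sqrt(sigma2_eps) / sqrt(sd_cand^2 + sigma2_eps)) # noise penalty
}

In each iteration, the next dose combination to evaluate would be the candidate maximizing this criterion within the relevant stratum.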

The simulations performed in this manuscript assumed that the subgroups represented by the strata were equally prevalent and had the same enrolment rates throughout the trial. This represents an idealized situation that may not be reasonable to assume in practice. In the extreme scenario of one stratum having zero patients, the algorithm would simply base recommendations for that stratum on prior information only. Furthermore, due to the borrowing of information across strata, estimation for sparse strata may be dominated by strata with many patients. For these reasons, Zhang et al. (2024) suggest that subgroup-specific dose optimization be performed only for predefined subgroups with large enough sample sizes in the trial, a recommendation that supports the simulation scenarios evaluated in the present work.

Finally, we adopted an empirical Bayes approach to the GP hyperparameter estimation to decrease the computational burden of the simulations. Likelihood methods can yield poor results when sample sizes are small, as in early-phase dose-finding trials, and so full Bayesian inference may be preferred. Unfortunately, the Markov chain Monte Carlo methods typically used to perform full Bayesian inference are prohibitively expensive for the algorithms proposed here; a sequential Monte Carlo approach may offer a less computationally demanding alternative (Gramacy & Polson, 2011).
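As a rough illustration of the empirical Bayes step, GP hyperparameters can be set by maximizing the marginal likelihood, for example with the hetGP package (Binois & Gramacy, 2021). The snippet below uses simulated inputs and is a sketch of this general workflow under an assumed anisotropic Gaussian kernel, not the code used for our simulations.

# Illustrative empirical Bayes GP fit via marginal-likelihood maximization,
# using the hetGP package (Binois & Gramacy, 2021); simulated data only.
library(hetGP)
set.seed(1)
X <- matrix(runif(40), ncol = 2)                       # 20 standardized dose pairs
y <- sin(4 * X[, 1]) + cos(3 * X[, 2]) + rnorm(20, sd = 0.2)
fit <- mleHomGP(X = X, Z = y, covtype = "Gaussian")    # anisotropic squared exponential
pred <- predict(fit, x = matrix(c(0.5, 0.5), ncol = 2))
c(mean = pred$mean, sd = sqrt(pred$sd2 + pred$nugs))   # predictive mean and sd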

Acknowledgments

J.W. acknowledges the support of a doctoral training scholarship from the Fonds de recherche du Québec—Nature et technologies (FRQNT) and a research exchange internship with PharmaLex Belgium.

Funding

S.G. acknowledges the support by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (NSERC; RGPIN-2020-04115) and a Chercheurs-boursiers (Junior 1) award from the Fonds de recherche du Québec - Santé (FRQS; #313252). E.E.M.M. acknowledges support from a Discovery Grant from NSERC (RGPIN-2019-04230). E.E.M.M. is a Canada Research Chair (Tier 1) in Statistical Methods for Precision Medicine and acknowledges the support of a chercheur de mérite career award from the FRQS (#309780). This research was enabled in part by support provided by Calcul Québec and the Digital Research Alliance of Canada.

Data availability

No new data were created or analysed in this study. The R scripts used for the simulations and graphics can be found on a public GitHub repository at https://github.com/jjwillard/bayesopt_pers_df.

Supplementary material

Supplementary material is available online at Journal of the Royal Statistical Society: Series C.

References

Binois, M., & Gramacy, R. B. (2021). hetgp: Heteroskedastic Gaussian process modeling and sequential design in R. Journal of Statistical Software, 98(13), 1–44.

Binois, M., Gramacy, R. B., & Ludkovski, M. (2018). Practical heteroscedastic Gaussian process modeling for large simulation experiments. Journal of Computational and Graphical Statistics, 27(4), 808–821.

Cai, C., Yuan, Y., & Ji, Y. (2014). A Bayesian dose-finding design for oncology clinical trials of combinational biological agents. Journal of the Royal Statistical Society. Series C, 63(1), 159–173.

Chipman, H. A., George, E. I., & McCulloch, R. E. (2010). BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4(1), 266–298.

Garnett, R. (2023). Bayesian optimization. Cambridge University Press.

Gramacy, R. B. (2020). Surrogates: Gaussian process modeling, design and optimization for the applied sciences. Chapman Hall/CRC.

Gramacy, R. B., & Polson, N. G. (2011). Particle learning of Gaussian process models for sequential design and optimization. Journal of Computational and Graphical Statistics, 20(1), 102–118.

Guo, B., & Zang, Y. (2022). A Bayesian phase I/II biomarker-based design for identifying subgroup-specific optimal dose for immunotherapy. Statistical Methods in Medical Research, 31(6), 1104–1119.

Hirakawa, A., Wages, N. A., Sato, H., & Matsui, S. (2015). A comparative study of adaptive dose-finding designs for phase I oncology trials of combination therapies. Statistics in Medicine, 34(24), 3194–3213.

Houede, N., Thall, P. F., Nguyen, H., Paoletti, X., & Kramar, A. (2010). Utility-based optimization of combination therapy using ordinal toxicity and efficacy in phase I/II trials. Biometrics, 66(2), 532–540.

Huang, D., Allen, T. T., Notz, W. I., & Zeng, N. (2006). Global optimization of stochastic black-box systems via sequential kriging meta-models. Journal of Global Optimization, 34(3), 441–466.

Jones, D. R., Schonlau, M., & Welch, W. J. (1998). Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4), 455–492.

Leske, M. C., Heijl, A., Hussein, M., Bengtsson, B., Hyman, L., Komaroff, E., & Early Manifest Glaucoma Trial Group (2003). Factors for glaucoma progression and the effect of treatment: The early manifest glaucoma trial. Archives of Ophthalmology, 121(1), 48–56.

Li, D. H., Whitmore, J. B., Guo, W., & Ji, Y. (2017). Toxicity and efficacy probability interval design for phase I adoptive cell therapy dose-finding clinical trials. Clinical Cancer Research, 23(1), 13–20.

Lyu, J., Ji, Y., Zhao, N., & Catenacci, D. V. (2019). AAA: Triple adaptive Bayesian designs for the identification of optimal dose combinations in dual-agent dose-finding trials. Journal of the Royal Statistical Society. Series C, 68(2), 385–410.

Morgan-Wall, T. (2022). spacefillr: Space-Filling Random and Quasi-Random Sequences. R package version 0.3.2. https://CRAN.R-project.org/package=spacefillr.

Morita, S., Thall, P. F., & Takeda, K. (2017). A simulation study of methods for selecting subgroup-specific doses in phase 1 trials. Pharmaceutical Statistics, 16(2), 143–156.

Mozgunov, P., Cro, S., Lingford-Hughes, A., Paterson, L. M., & Jaki, T. (2022). A dose-finding design for dual-agent trials with patient-specific doses for one agent with application to an opiate detoxification trial. Pharmaceutical Statistics, 21(2), 476–495.

Murphy, K. P. (2023). Probabilistic machine learning: Advanced topics. MIT Press. http://probml.github.io/book2.

O’Quigley, J., Pepe, M., & Fisher, L. (1990). Continual reassessment method: A practical design for phase 1 clinical trials in cancer. Biometrics, 46(1), 33–48.

Picheny, V., & Ginsbourger, D. (2014). Noisy kriging-based optimization methods: A unified implementation within the DiceOptim package. Computational Statistics & Data Analysis, 71, 1035–1053.

Picheny, V., Ginsbourger, D., Richet, Y., & Caplin, G. (2013). Quantile-based optimization of noisy computer experiments with tunable precision. Technometrics, 55(1), 2–13.

Picheny, V., Wagner, T., & Ginsbourger, D. (2013). A benchmark of kriging-based infill criteria for noisy optimization. Structural and Multidisciplinary Optimization, 48(3), 607–626.

Psioda, M. A., Xu, J., Jiang, Q., Ke, C., Yang, Z., & Ibrahim, J. G. (2021). Bayesian adaptive basket trial design using model averaging. Biostatistics, 22(1), 19–34.

R Core Team (2022). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.

Sauer, A., Gramacy, R. B., & Higdon, D. (2023). Active learning for deep Gaussian process surrogates. Technometrics, 65(1), 4–18.

Storer, B. E. (1989). Design and analysis of phase I clinical trials. Biometrics, 45(3), 925–937.

Takahashi, A., & Suzuki, T. (2021a). Bayesian optimization for estimating the maximum tolerated dose in phase I clinical trials. Contemporary Clinical Trials Communications, 21, 100753.

Takahashi, A., & Suzuki, T. (2021b). Bayesian optimization design for dose-finding based on toxicity and efficacy outcomes in phase I/II clinical trials. Pharmaceutical Statistics, 20(3), 422–439.

Thomas, N., Sweeney, K., & Somayaji, V. (2014). Meta-analysis of clinical dose-response in a large drug development portfolio. Statistics in Biopharmaceutical Research, 6(4), 302–317.

Wages, N. A., & Conaway, M. R. (2014). Phase I/II adaptive design for drug combination oncology trials. Statistics in Medicine, 33(12), 1990–2003.

Wang, K., & Ivanova, A. (2005). Two-dimensional dose finding in discrete dose space. Biometrics, 61(1), 217–222.

Wang, Z., Zhang, J., Xia, T., He, R., & Yan, F. (2023). A Bayesian phase I-II clinical trial design to find the biological optimal dose on drug combination. Journal of Biopharmaceutical Statistics, 34(4), 582–595.

Williams, C. K., & Rasmussen, C. E. (2006). Gaussian processes for machine learning. MIT Press.

Yuan, Y., Hess, K. R., Hilsenbeck, S. G., & Gilbert, M. R. (2016). Bayesian optimal interval design: A simple and well-performing design for phase I oncology trials. Clinical Cancer Research, 22(17), 4291–4301.

Zhang, J., Lin, R., Chen, X., & Yan, F. (2024). Adaptive Bayesian information borrowing methods for finding and optimizing subgroup-specific doses. Clinical Trials, 21(3), 308–321.

Author notes

Conflicts of interest: None declared.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
