James Willard, Shirin Golchi, Erica E M Moodie, Bruno Boulanger, Bradley P Carlin, Bayesian optimization for personalized dose-finding trials with combination therapies, Journal of the Royal Statistical Society Series C: Applied Statistics, Volume 74, Issue 2, March 2025, Pages 373–390, https://doi.org/10.1093/jrsssc/qlae058
Abstract
Identification of optimal dose combinations in early-phase dose-finding trials is challenging, due to the trade-off between precisely estimating the many parameters required to flexibly model the possibly nonmonotonic dose-response surface, and the small sample sizes in early-phase trials. This difficulty is even more pertinent in the context of personalized dose-finding, where patient characteristics are used to identify tailored optimal dose combinations. To overcome these challenges, we propose the use of Bayesian optimization for finding optimal dose combinations in standard (one size fits all) and personalized multi-agent dose-finding trials. Bayesian optimization is a method for estimating the global optima of expensive-to-evaluate objective functions. The objective function is approximated by a surrogate model, commonly a Gaussian process, paired with a sequential design strategy to select the next point via an acquisition function. This work is motivated by an industry-sponsored problem, where the focus is on optimizing a dual-agent therapy in a setting featuring minimal toxicity. To compare the performance of the standard and personalized methods under this setting, simulation studies are performed for a variety of scenarios. Our study concludes that taking a personalized approach is highly beneficial in the presence of heterogeneity.
1 Introduction
Early-phase clinical trials are designed to assess the safety and efficacy profiles of first-in-human doses of an experimental drug. Traditional adaptive dose-finding designs fall into three major families: algorithmic designs (e.g. Storer, 1989), model-assisted designs (e.g. Yuan et al., 2016), and model-based designs (e.g. O’Quigley et al., 1990). Here, our interest lies in model-based designs, which have been extended to the dual-agent dose combination setting (e.g. Wages & Conaway, 2014; K. Wang & Ivanova, 2005). Some of these designs (e.g. Houede et al., 2010; Z. Wang et al., 2023) propose using flexible models to handle possible nonmonotonicities in the dose-efficacy/dose-toxicity surfaces, recognizing that monotonicity depends on the type of drug being administered and may not hold in general (Li et al., 2017). Additionally, existing methods often restrict dose-finding to a small set of preselected dose combinations, which can fail to identify the optimal dose combination (Hirakawa et al., 2015). Adaptive dose insertion designs have been proposed to allow for the evaluation of additional dose combinations if warranted by the data (Cai et al., 2014; Lyu et al., 2019).
Most existing dose-finding designs, including those mentioned above, seek to find optimal dose combinations without consideration of additional covariate information (we refer to this as standard dose-finding). These designs ignore the possibility that patient responses may be variable across different subgroups within the population. With recent advances in molecular biology, specifically the identification of novel biomarkers, interest has grown in personalized (or precision) medicine. Personalized dose-finding aims to find optimal dose combinations based on individual patient characteristics. When utilizing parametric models, as in many of the standard dose-finding methods described above, extension to the personalized setting is challenging due to the limited sample sizes and potentially large number of dose-covariate interaction terms that are required to be estimated. Personalized dose-finding for monotherapies has been investigated using a variety of methods, including dimension-reduction techniques (e.g. Guo & Zang, 2022), Bayesian hierarchical models (e.g. Morita et al., 2017), and Bayesian model averaging (e.g. Psioda et al., 2021). The increased dimensionality in the combination therapy setting makes estimation even more challenging. Mozgunov et al. (2022) have considered personalized dose-finding for a dual-agent combination therapy where the patient-specific dose of one of the agents is selected externally by clinicians, but multi-agent personalized dose-finding designs where all dose dimensions are explored during optimization are still needed.
Our work is motivated by an industry-sponsored problem, where a sponsor is interested in a design for the development of an intraocular implant that combines two topical agents. Each agent has been in use and is well tolerated. No drug-related adverse events are expected, and we anticipate minimal toxicity, if any. Interest lies in obtaining the optimal dose combination of these two agents using a continuous efficacy measure, which need not be monotonic with the doses of each agent. Additionally, response heterogeneity is expected to exist with respect to a key binary covariate, a particular characteristic of the lens of the eye, and we seek a design that accommodates this feature. We allow for exploration of the dose combination region, which has been defined using data from previous observational and randomized studies, without consideration of formal dose escalation/de-escalation rules. As each new dose combination investigated carries engineering costs, early stopping is desirable if warranted by the data.
Considering this motivating example, we propose novel methodology for multi-agent dose-finding which meets several criteria. First, the model estimating the response surface is parsimonious, which is suitable for small sample sizes, yet remains flexible enough to model both monotonic and nonmonotonic dose-response surfaces. Next, the design sequentially explores the entire dose combination space in a principled way, and allows for early stopping if warranted. Finally, our methodology facilitates extension to personalized dose-finding. Specifically, we propose modelling the response surface with a Gaussian process (GP) and using Bayesian optimization for efficient standard and personalized multi-agent dose-finding, assuming minimal toxicity. Bayesian optimization has been previously proposed for monotherapy in a standard dose-finding setting (Takahashi & Suzuki, 2021a, 2021b). Our contributions here differ from this previous work by utilizing a different sequential search policy, by extending the approach to combination therapies and to the personalized dose-finding setting, and by permitting early stopping.
The remainder of this manuscript is organized in the following manner. We first introduce Bayesian optimization, propose methods for standard and personalized multi-agent dose-finding, and describe an early stopping rule. A simulation study is then performed to compare the standard and personalized dose-finding methods for a dual-agent therapy under a variety of scenarios without early stopping. We then consider the development of the previously described intraocular implant and compare several dose-finding designs which include early stopping rules. We conclude with a discussion and possible directions for future work.
2 Bayesian optimization for dose-finding
Standard dose-finding. Let $f(\mathbf{x})$ be a continuous dose-response surface of interest, defined as a function of dose combinations $\mathbf{x} = (x_1, \ldots, x_D)$, which are comprised of $D$ dosing agents and which lie in a continuous dose combination region $\mathcal{X} \subset \mathbb{R}^D$. In this setting, $f$ may be the dose-efficacy surface or the dose-utility surface, where utility is defined using both efficacy and toxicity outcomes. For the remainder of the manuscript, we assume $f$ is the dose-efficacy surface, which has been transformed such that smaller values are more desirable. We assume minimal toxicity over $\mathcal{X}$. Thus, dose escalation/de-escalation rules are not considered, and it is ethically permissible to select future dose combinations throughout the design space. Our interest is in obtaining the optimal dose combination $\mathbf{x}_{\mathrm{opt}}$, defined as that which minimizes $f$:
$$\mathbf{x}_{\mathrm{opt}} = \arg\min_{\mathbf{x} \in \mathcal{X}} f(\mathbf{x}).$$
To do so, we use Bayesian optimization, which is a derivative-free optimization method that estimates the global optima of expensive-to-evaluate objective functions (Garnett, 2023). It is a sequential design strategy that approximates the objective function with a stochastic surrogate model (commonly a GP), and selects the next point by optimizing an acquisition function. Bayesian optimization is commonly used in engineering, machine learning, and computer experiments (Gramacy, 2020; Murphy, 2023). In this work, we model the efficacy function with a GP, and select the next dose combination as that which maximizes an acquisition function. This process repeats until optimality criteria are satisfied or sample size limits are reached.
To begin, a GP prior is placed on $f$. This prior is defined by its mean function, $\mu(\mathbf{x})$, and its correlation function (kernel) $k(\mathbf{x}, \mathbf{x}')$, which determines the correlation between two responses as a function of their corresponding dose combinations, $\mathbf{x}$ and $\mathbf{x}'$ (Williams & Rasmussen, 2006). This correlation function is multiplied by a scale parameter $\nu$, which determines the variability of the efficacy function throughout the dose combination space and yields the covariance function $\nu\, k(\mathbf{x}, \mathbf{x}')$. Since the method’s flexibility is largely determined by the choice of kernel function, constant mean GPs can be used, and in this work, we set $\mu(\mathbf{x})$ equal to an unknown constant. However, information about the response surface (e.g. pharmacokinetic/pharmacodynamic models) can be incorporated through the mean function if desired. Furthermore, we utilize a separable anisotropic squared exponential kernel,
$$k(\mathbf{x}, \mathbf{x}') = \exp\left\{-\sum_{d=1}^{D} \frac{(x_d - x_d')^2}{2\ell_d^2}\right\}.$$
This kernel is parameterized by characteristic length-scales $\ell_1, \ldots, \ell_D$, which control the rate of decay in the correlation between two dose combinations with respect to each dosing dimension and which are estimated using available data. Additionally, it is an example of a stationary kernel, which assumes that the degree of correlation between two dose combinations depends only on their distance and not on their locations in the dose combination space. We assume stationary dose-response surfaces throughout this work, but comment on relaxing this assumption in the discussion.
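For illustration, a minimal R sketch of this kernel is given below; the length-scale and scale values in the example calls are hypothetical and serve only to show how correlation decays at different rates across dosing dimensions (in practice these hyperparameters are estimated from the data).

```r
# Separable anisotropic squared exponential kernel (illustrative sketch).
# x1, x2: dose combinations (numeric vectors, one entry per dosing agent)
# ell:    characteristic length-scale per dosing dimension
# nu:     scale parameter controlling the variability of the efficacy function
sqexp_kernel <- function(x1, x2, ell, nu = 1) {
  nu * exp(-sum((x1 - x2)^2 / (2 * ell^2)))
}

# Correlation decays faster along the dimension with the smaller length-scale:
sqexp_kernel(c(0.2, 0.2), c(0.4, 0.2), ell = c(0.1, 0.5))  # move along agent 1
sqexp_kernel(c(0.2, 0.2), c(0.2, 0.4), ell = c(0.1, 0.5))  # move along agent 2
```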
To optimize $f$, $r$ patients are assigned to each of $c$ initial dose combinations, yielding $n = rc$ patient responses which are treated as noisy observations $y_i = f(\mathbf{x}_i) + \epsilon_i$, where $\epsilon_i \sim N(0, \sigma_\epsilon^2)$ for $i = 1, \ldots, n$. This yields the observed data $\mathcal{D}_n = \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, where we denote the vector of observations by $\mathbf{y}_n$. Specifying the GP prior above induces a multivariate normal distribution on the observations, $\mathbf{y}_n \sim N_n\big(\boldsymbol{\mu},\ \nu(\mathbf{K}_n + g\mathbf{I}_n)\big)$, where $\mathbf{K}_n$ has entries $k(\mathbf{x}_i, \mathbf{x}_j)$ and $g$ is a noise parameter. After observing $\mathcal{D}_n$, we adopt an empirical Bayes approach towards the kernel hyperparameters $(\nu, g, \ell_1, \ldots, \ell_D)$, replacing them with their maximum-likelihood estimates (Gramacy, 2020). The posterior distribution of the efficacy function at a new dose combination, denoted by $\tilde{\mathbf{x}}$, is then normal (Binois & Gramacy, 2021), such that
$$f(\tilde{\mathbf{x}}) \mid \mathcal{D}_n \sim N\big(\mu_n(\tilde{\mathbf{x}}),\ \sigma_n^2(\tilde{\mathbf{x}})\big), \quad (2)$$
where $\mu_n(\tilde{\mathbf{x}})$ is $\hat{\mu} + \mathbf{k}_n(\tilde{\mathbf{x}})^\top(\mathbf{K}_n + \hat{g}\mathbf{I}_n)^{-1}(\mathbf{y}_n - \hat{\mu}\mathbf{1}_n)$, $\sigma_n^2(\tilde{\mathbf{x}})$ is $\hat{\nu}\big[k(\tilde{\mathbf{x}}, \tilde{\mathbf{x}}) - \mathbf{k}_n(\tilde{\mathbf{x}})^\top(\mathbf{K}_n + \hat{g}\mathbf{I}_n)^{-1}\mathbf{k}_n(\tilde{\mathbf{x}})\big]$ with $\mathbf{k}_n(\tilde{\mathbf{x}})$ the vector of kernel evaluations between $\tilde{\mathbf{x}}$ and the evaluated dose combinations, and $\hat{\mu}$ is a plug-in estimate for the mean, each of which is calculated using the maximum-likelihood estimates of the hyperparameters. A derivation of (2) is provided in the online supplementary material.
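A minimal R sketch of this plug-in posterior computation follows, using the kernel function sketched above and treating the hyperparameters (`ell`, `nu`, and the noise parameter `g`) as already replaced by their maximum-likelihood estimates; the constant mean is estimated by the simple sample mean here, which is a simplification of the plug-in estimator described above.

```r
# Plug-in GP posterior mean and standard deviation at new dose combinations
# (illustrative sketch; assumes sqexp_kernel() from the previous snippet).
gp_posterior <- function(Xnew, X, y, ell, nu, g) {
  # X: n x D matrix of evaluated dose combinations; y: length-n response vector
  # Xnew: m x D matrix of candidate dose combinations; g: noise (nugget) parameter
  kfun <- function(A, B) {
    K <- matrix(0, nrow(A), nrow(B))
    for (i in seq_len(nrow(A)))
      for (j in seq_len(nrow(B)))
        K[i, j] <- sqexp_kernel(A[i, ], B[j, ], ell, nu)
    K
  }
  K  <- kfun(X, X) + diag(nu * g, nrow(X))  # covariance of the observed responses
  Ks <- kfun(Xnew, X)                       # cross-covariance with the candidates
  mu <- mean(y)                             # simple plug-in estimate of the constant mean
  post_mean <- mu + Ks %*% solve(K, y - mu)
  post_var  <- pmax(nu - rowSums(Ks * t(solve(K, t(Ks)))), 0)
  list(mean = drop(post_mean), sd = sqrt(post_var))
}

# Toy usage: three observed dose combinations, two candidate dose combinations
X <- rbind(c(0.1, 0.1), c(0.5, 0.5), c(0.9, 0.2))
y <- c(1.2, 0.4, 0.9)
gp_posterior(rbind(c(0.3, 0.3), c(0.7, 0.7)), X, y, ell = c(0.3, 0.3), nu = 1, g = 0.1)
```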
The next dose combination for evaluation, denoted by $\mathbf{x}_{n+1}$, is then selected as the candidate dose combination which maximizes an acquisition function, denoted by $a_n(\mathbf{x})$:
$$\mathbf{x}_{n+1} = \arg\max_{\mathbf{x} \in \mathcal{X}} a_n(\mathbf{x}).$$
One acquisition function commonly paired with a GP is the expected improvement (EI), defined as
$$\mathrm{EI}_n(\mathbf{x}) = \mathbb{E}\big[\max\{f_{\min} - f(\mathbf{x}),\, 0\}\big],$$
and available in closed form (Jones et al., 1998),
$$\mathrm{EI}_n(\mathbf{x}) = \big(f_{\min} - \mu_n(\mathbf{x})\big)\,\Phi\!\left(\frac{f_{\min} - \mu_n(\mathbf{x})}{\sigma_n(\mathbf{x})}\right) + \sigma_n(\mathbf{x})\,\phi\!\left(\frac{f_{\min} - \mu_n(\mathbf{x})}{\sigma_n(\mathbf{x})}\right),$$
where $f_{\min}$ denotes the value of the current observed optimum, $\Phi$ and $\phi$ denote the cdf and pdf of a standard normal random variable, respectively, and where $\mu_n(\mathbf{x})$ and $\sigma_n(\mathbf{x})$ are the posterior mean and standard deviation of the efficacy function evaluated at $\mathbf{x}$, respectively. The EI balances between exploiting regions that have desirable values of $\mu_n(\mathbf{x})$ (first term) and exploring regions in the dose combination space that are imprecisely estimated (second term). Under a noisy setting, $f_{\min}$ is not observed, and so plug-in estimates have been proposed, for example, the minimum of the posterior mean, $\min_{\mathbf{x}} \mu_n(\mathbf{x})$ (Picheny et al., 2013). A different acquisition function, called the augmented expected improvement (AEI; Huang et al., 2006), has been shown to offer better performance than EI under higher noise settings (Picheny et al., 2013). The AEI uses $\mu_n(\mathbf{x}^{**})$, the posterior mean of $f$ at the current ‘effective best solution’, $\mathbf{x}^{**}$, which can be defined as the point which minimizes a posterior $\beta$-quantile. We follow the recommendation in Huang et al. (2006) and define $\mathbf{x}^{**}$ as the point which minimizes the posterior quantile $\mu_n(\mathbf{x}) + \sigma_n(\mathbf{x})$, which corresponds to setting $\beta = \Phi(1) \approx 0.84$. The AEI is then defined as:
$$\mathrm{AEI}_n(\mathbf{x}) = \mathbb{E}\big[\max\{\mu_n(\mathbf{x}^{**}) - f(\mathbf{x}),\, 0\}\big]\left(1 - \frac{\sigma_\epsilon}{\sqrt{\sigma_n^2(\mathbf{x}) + \sigma_\epsilon^2}}\right).$$
The multiplicative term $1 - \sigma_\epsilon/\sqrt{\sigma_n^2(\mathbf{x}) + \sigma_\epsilon^2}$ serves to promote exploration by penalizing dose combinations that have small posterior variance (Picheny et al., 2013). In this work, we use the AEI since we find it offers a moderate improvement over the EI.
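The two acquisition functions can be written compactly in R; the sketch below assumes a Gaussian posterior with mean `mu` and standard deviation `sigma` at each candidate dose, with `f_best` the plug-in best value (for EI) or the posterior mean at the effective best solution (for AEI), and `sigma_eps` the noise standard deviation.

```r
# Expected improvement for minimization (closed form).
expected_improvement <- function(mu, sigma, f_best) {
  z <- (f_best - mu) / sigma
  ifelse(sigma > 0, (f_best - mu) * pnorm(z) + sigma * dnorm(z), 0)
}

# Augmented expected improvement: EI evaluated at the effective best solution,
# multiplied by a factor that down-weights points with small posterior variance.
augmented_ei <- function(mu, sigma, f_best, sigma_eps) {
  expected_improvement(mu, sigma, f_best) *
    (1 - sigma_eps / sqrt(sigma^2 + sigma_eps^2))
}

# Effective best solution: minimizer of the posterior quantile mu + sigma, e.g.
# idx_eff <- which.min(post$mean + post$sd); f_best <- post$mean[idx_eff]
```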
After evaluation of $\mathbf{x}_{n+1}$ in $r$ new patients, the data are updated, $\mathcal{D}_{n+r} = \mathcal{D}_n \cup \{(\mathbf{x}_{n+1}, y_i)\}_{i=n+1}^{n+r}$. The GP model is refit to obtain a new posterior distribution of the efficacy function. Then samples from this posterior are obtained to yield $S$ samples from the posterior of the optimal dose combination as
$$\mathbf{x}^{(s)}_{\mathrm{opt}} = \arg\min_{\mathbf{x} \in \mathcal{X}} f^{(s)}(\mathbf{x}), \quad s = 1, \ldots, S,$$
where $f^{(s)}$ denotes the $s$th joint posterior sample of the efficacy function. This procedure continues until the sample size limit is reached or an early stopping rule, which we denote by $\tau_n$, is satisfied.
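A minimal sketch of this posterior summary in R is shown below: given the joint posterior mean vector and covariance matrix of $f$ over a candidate grid (the object names here are hypothetical), each joint posterior draw of the surface contributes one draw from the posterior of the optimal dose combination.

```r
library(mvtnorm)

# Draw S samples from the posterior of the optimal dose combination by taking
# the grid point minimizing each joint posterior draw of the efficacy surface.
sample_opt_dose <- function(grid, post_mean, post_cov, S = 1000) {
  draws <- rmvnorm(S, mean = post_mean, sigma = post_cov)  # S x nrow(grid)
  grid[apply(draws, 1, which.min), , drop = FALSE]         # one minimizer per draw
}
```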
Algorithm 1 The standard optimization algorithm

Require: $r$ patient responses at each of $c$ initial dose combinations; maximum sample size $N$
1: $n \leftarrow rc$
2: $\mathcal{D}_n \leftarrow \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$; fit GP to $\mathcal{D}_n$
3: Obtain $\mu_n(\mathbf{x})$ and $\sigma_n(\mathbf{x})$ using fitted GP ⊳ Obtain posteriors
4: $\mathbf{x}^{**} \leftarrow \arg\min_{\mathbf{x} \in \mathcal{X}} \{\mu_n(\mathbf{x}) + \sigma_n(\mathbf{x})\}$ ⊳ Obtain effective best point
5: $f^{**} \leftarrow \mu_n(\mathbf{x}^{**})$ ⊳ Obtain best value of f
6: Calculate $\mathrm{AEI}_n(\mathbf{x})$ for $\mathbf{x} \in \mathcal{X}$ ⊳ Compute AEI
7: while $n < N$ and $\tau_n = 0$ do
8:  $\mathbf{x}_{\mathrm{new}} \leftarrow \arg\max_{\mathbf{x} \in \mathcal{X}} \mathrm{AEI}_n(\mathbf{x})$ ⊳ Obtain next dose
9:  for $j = 1, \ldots, r$ do
10:   Evaluate $y_{n+j}$ at $\mathbf{x}_{\mathrm{new}}$ ⊳ Observe outcomes
11:  end for
12:  $n \leftarrow n + r$ ⊳ Update n
13:  $\mathcal{D}_n \leftarrow \mathcal{D}_{n-r} \cup \{(\mathbf{x}_{\mathrm{new}}, y_{n-r+j})\}_{j=1}^{r}$ ⊳ Update data
14:  Refit GP; obtain $\mu_n(\mathbf{x})$ and $\sigma_n(\mathbf{x})$ ⊳ Obtain posteriors
15:  $\mathbf{x}^{**} \leftarrow \arg\min_{\mathbf{x} \in \mathcal{X}} \{\mu_n(\mathbf{x}) + \sigma_n(\mathbf{x})\}$ ⊳ Obtain effective best point
16:  $f^{**} \leftarrow \mu_n(\mathbf{x}^{**})$ ⊳ Obtain best value of f
17:  Calculate $\mathrm{AEI}_n(\mathbf{x})$ for $\mathbf{x} \in \mathcal{X}$ ⊳ Compute AEI
18:  Update $\tau_n$ using (4) ⊳ Update stopping rule
19: end while
Return: posterior distribution of the optimal dose combination, $p(\mathbf{x}_{\mathrm{opt}} \mid \mathcal{D}_n)$
One possible early stopping rule proposed in Huang et al. (2006) is to allow stopping only after the maximum of the AEI over the dose combination space, $\max_{\mathbf{x} \in \mathcal{X}} \mathrm{AEI}_n(\mathbf{x})$, falls below a threshold $\delta$. In this case, the algorithm terminates only if there is little improvement to be gained over $\mu_n(\mathbf{x}^{**})$ across the dose combination space. Under a noisy setting, the authors suggest this be satisfied for $d_1$ consecutive algorithm iterations before termination. This yields the following stopping rule:
$$\tau_n = \mathbb{1}\left\{\max_{\mathbf{x} \in \mathcal{X}} \mathrm{AEI}_m(\mathbf{x}) < \delta \ \text{for}\ m = n - d_1 + 1, \ldots, n\right\}. \quad (4)$$
We note that $\delta$ is a tuning parameter that controls the performance of the algorithms. Its value can be determined through sensitivity analysis using several candidate values obtained through Monte Carlo simulation. For example, the Monte Carlo distributions of $\max_{\mathbf{x} \in \mathcal{X}} \mathrm{AEI}_n(\mathbf{x})$ can be obtained at each iteration, and different values of $\delta$ can be selected as summary statistics of these distributions (e.g. the median). The performance of these values can then be compared, with smaller values of $\delta$ implying more stringent stopping criteria and larger values of $\delta$ permitting earlier stopping.
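Read this way, the stopping check reduces to a few lines of R; the sketch below assumes the maximum AEI over the candidate grid has been recorded at each iteration, and the number of consecutive iterations required, `d1`, is a design choice set here to an illustrative value.

```r
# Stop once the maximum AEI has stayed below delta for d1 consecutive iterations.
stop_early <- function(aei_max_history, delta, d1 = 2) {
  n_iter <- length(aei_max_history)
  n_iter >= d1 && all(tail(aei_max_history, d1) < delta)
}

stop_early(c(0.9, 0.4, 0.08, 0.05), delta = 0.1, d1 = 2)  # TRUE in this toy example
```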
The sequential procedure and stopping rule described above suggest Algorithm 1, referred to as the standard optimization algorithm. In many applications of Bayesian optimization, it is standard to collect the initial data and then continue with $r = 1$ at each iteration of the algorithm. This permits maximal exploration of the domain, as a novel input point from the design space can be evaluated each time. In certain dose-finding applications, such as the motivating example of this work, engineering costs may prohibit the creation of a novel dose combination for each new patient, and some patients may necessarily be assigned to the same dose.
Personalized dose-finding. Personalized medicine recognizes that response heterogeneity may exist within the population of interest; personalized dose-finding incorporates covariate information to account for this. Consider a set of $P$ discrete covariates $\mathbf{z} = (z_1, \ldots, z_P)$. The Cartesian product of the levels of these $P$ covariates defines $K$ strata, with covariate patterns $\mathbf{z}_1, \ldots, \mathbf{z}_K$. Personalized dose-finding seeks to find the optimal dose combinations, denoted by $\mathbf{x}_{\mathrm{opt},k}$, across the continuous dose combination space for each of the $K$ strata:
$$\mathbf{x}_{\mathrm{opt},k} = \arg\min_{\mathbf{x} \in \mathcal{X}} f(\mathbf{x}, \mathbf{z}_k), \quad k = 1, \ldots, K.$$
The efficacy function is modelled using a single GP fit to the data $\mathcal{D}_n = \{(\mathbf{x}_i, \mathbf{z}_i, y_i)\}_{i=1}^{n}$, where $y_i = f(\mathbf{x}_i, \mathbf{z}_i) + \epsilon_i$ with $\epsilon_i \sim N(0, \sigma_\epsilon^2)$, which allows information to be borrowed across strata. One possible method of incorporating the additional covariates into the GP model is through the kernel function. We use a (stationary) separable anisotropic squared exponential kernel function that includes the additional covariates of interest,
$$k\big((\mathbf{x}, \mathbf{z}), (\mathbf{x}', \mathbf{z}')\big) = \exp\left\{-\sum_{d=1}^{D} \frac{(x_d - x_d')^2}{2\ell_d^2} - \sum_{p=1}^{P} \frac{(z_p - z_p')^2}{2\lambda_p^2}\right\}. \quad (5)$$
As before, we use an empirical Bayes approach for estimating the hyperparameters. In the case where $z_p$ is a binary variable representing the levels of covariate $p$, the correlation between two patient responses is reduced by a factor of $\exp\{-1/(2\lambda_p^2)\}$ if they belong to different strata with respect to covariate $p$.
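A minimal R sketch of this extended kernel is given below; the dose length-scales `ell` and covariate length-scales `lambda` are illustrative values (estimated from the data in practice), and the example verifies the constant reduction factor for a binary covariate.

```r
# Separable anisotropic squared exponential kernel with additional covariates.
sqexp_kernel_cov <- function(x1, x2, z1, z2, ell, lambda, nu = 1) {
  nu * exp(-sum((x1 - x2)^2 / (2 * ell^2)) - sum((z1 - z2)^2 / (2 * lambda^2)))
}

# For a binary covariate, responses from different strata at the same dose have
# their correlation reduced by the constant factor exp(-1 / (2 * lambda^2)):
sqexp_kernel_cov(c(0.2, 0.2), c(0.2, 0.2), z1 = 0, z2 = 1,
                 ell = c(0.1, 0.1), lambda = 0.8)
exp(-1 / (2 * 0.8^2))  # identical value
```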
Since the efficacy function may exhibit different behaviour within each stratum, each stratum may have a unique best efficacy function value $f^{**}_k$. Similar to the standard case, $f^{**}_k$ is estimated as the posterior mean at the effective best point within stratum $k$, $\mathbf{x}^{**}_k$. To account for this heterogeneity, the sequential selection is performed within each stratum, but with the GP fit utilizing data from all strata. This proceeds by modifying the AEI acquisition function to use $f^{**}_k$ when conditioned on being in stratum $k$, denoted by $\mathrm{AEI}_{n,k}(\mathbf{x})$. The sequential procedure continues until sample size limits are reached or until an early stopping rule is satisfied for each stratum. Since optimal doses in some strata may be easier to identify than in others, stratum-specific early stopping should be employed. One possible stratum-specific stopping rule, which we denote by $\tau_{n,k}$, replaces $\mathrm{AEI}_n(\mathbf{x})$ in (4) with $\mathrm{AEI}_{n,k}(\mathbf{x})$. Upon termination of the algorithm, the posterior distribution of the optimal dose combination for each stratum is returned, denoted by $p(\mathbf{x}_{\mathrm{opt},k} \mid \mathcal{D}_n)$. These modifications suggest Algorithm 2, the personalized optimization algorithm.
Algorithm 2 The personalized optimization algorithm

Require: $r$ patient responses at each of $c$ initial dose combinations per stratum; maximum sample size $N$
1: $n \leftarrow rcK$; $\mathcal{D}_n \leftarrow \{(\mathbf{x}_i, \mathbf{z}_i, y_i)\}_{i=1}^{n}$; fit GP to $\mathcal{D}_n$
2: Obtain $\mu_n(\mathbf{x}, \mathbf{z})$ and $\sigma_n(\mathbf{x}, \mathbf{z})$ using fitted GP ⊳ Obtain posterior of f
3: for $k = 1, \ldots, K$ do
4:  Obtain $\mu_{n,k}(\mathbf{x})$ and $\sigma_{n,k}(\mathbf{x})$ ⊳ Obtain posterior of f within stratum k
5:  $\mathbf{x}^{**}_k \leftarrow \arg\min_{\mathbf{x} \in \mathcal{X}} \{\mu_{n,k}(\mathbf{x}) + \sigma_{n,k}(\mathbf{x})\}$ ⊳ Obtain effective best point
6:  $f^{**}_k \leftarrow \mu_{n,k}(\mathbf{x}^{**}_k)$ ⊳ Obtain best value
7:  Calculate $\mathrm{AEI}_{n,k}(\mathbf{x})$ for $\mathbf{x} \in \mathcal{X}$ ⊳ Compute AEI
8:  $\tau_{n,k} \leftarrow 0$
9: end for
10: while $n < N$ and $\tau_{n,k} = 0$ for at least one stratum $k$ do
11:  for $k = 1, \ldots, K$ do
12:   if $\tau_{n,k} = 0$ then
13:    $\mathbf{x}_{\mathrm{new},k} \leftarrow \arg\max_{\mathbf{x} \in \mathcal{X}} \mathrm{AEI}_{n,k}(\mathbf{x})$ ⊳ Obtain next dose
14:    for $j = 1, \ldots, r$ do
15:     Evaluate $y$ at $(\mathbf{x}_{\mathrm{new},k}, \mathbf{z}_k)$ ⊳ Observe outcomes
16:    end for
17:    $n_k \leftarrow n_k + r$ ⊳ Update stratum sample size
18:   end if
19:  end for
20:  $n \leftarrow \sum_k n_k$ ⊳ Update n
21:  $\mathcal{D}_n \leftarrow \mathcal{D}_n \cup \{\text{new observations}\}$ ⊳ Update data
22:  Refit GP; obtain $\mu_n(\mathbf{x}, \mathbf{z})$ and $\sigma_n(\mathbf{x}, \mathbf{z})$ ⊳ Obtain posterior of f
23:  for $k = 1, \ldots, K$ do
24:   Obtain $\mu_{n,k}(\mathbf{x})$ and $\sigma_{n,k}(\mathbf{x})$ ⊳ Obtain posterior of f within stratum k
25:   if $\tau_{n,k} = 0$ then
26:    $\mathbf{x}^{**}_k \leftarrow \arg\min_{\mathbf{x} \in \mathcal{X}} \{\mu_{n,k}(\mathbf{x}) + \sigma_{n,k}(\mathbf{x})\}$ ⊳ Obtain effective best point
27:    $f^{**}_k \leftarrow \mu_{n,k}(\mathbf{x}^{**}_k)$ ⊳ Obtain best value
28:    Calculate $\mathrm{AEI}_{n,k}(\mathbf{x})$ for $\mathbf{x} \in \mathcal{X}$ ⊳ Compute AEI
29:    Update $\tau_{n,k}$ using (4) ⊳ Update stopping rule
30:   end if
31:  end for
32: end while
Return: posterior distributions of the stratum-specific optimal dose combinations, $p(\mathbf{x}_{\mathrm{opt},k} \mid \mathcal{D}_n)$, $k = 1, \ldots, K$
3 Simulation study
Below, we perform a simulation study to compare the performance of the standard and personalized algorithms (Algorithms 1 and 2, respectively) under three scenarios with no early stopping (i.e. the stopping rule $\tau_n$ in (4) is never invoked). Early stopping will be investigated in the next section. Scenarios 1 and 2 consider a single binary covariate $z \in \{0, 1\}$. We index the true optimal dose combinations, $\mathbf{x}_{\mathrm{opt},z}$, and the true optimal values of the efficacy function, $f^*_z$, using the values of $z$. That is, when $z = 0$ we use $\mathbf{x}_{\mathrm{opt},0}$ and $f^*_0$. Scenario 3 considers two binary covariates, $z_1$ and $z_2$, and the $\mathbf{x}_{\mathrm{opt}}$ and $f^*$ are indexed similarly. For example, when $z_1 = 0$ and $z_2 = 1$, we use $\mathbf{x}_{\mathrm{opt},01}$ and $f^*_{01}$. To make the simulations in this manuscript more computationally feasible, we modify Algorithms 1 and 2 to return a point estimate of $\mathbf{x}_{\mathrm{opt}}$ rather than the entire posterior distribution. The point estimate is defined as the minimizer of the posterior mean surface, $\hat{\mathbf{x}}_{\mathrm{opt}} = \arg\min_{\mathbf{x} \in \mathcal{X}} \mu_n(\mathbf{x})$.
We utilize dose combinations $\mathbf{x} = (x_1, x_2)$, assumed to be standardized so that $(0, 0)$ corresponds to the combination using the lowest doses of interest for each agent and $(1, 1)$ corresponds to the combination using the highest doses of interest for each agent. The point estimates, $\hat{\mathbf{x}}_{\mathrm{opt}}$, and the next dose combinations for evaluation, $\mathbf{x}_{n+1}$, are set, respectively, as the minimizers of the posterior mean and the maximizers of the AEI. The AEI is evaluated across an evenly spaced grid on the standardized dose combination region. The grid is incremented by a fixed step in each dimension, reflecting the degree of precision to which the drug maker can manufacture a particular dose combination. As a result, some dose combinations may be suggested more than once in the algorithm. While the proposed method is capable of optimizing over the continuous dose combination space, it is important to incorporate any manufacturing constraints into the optimization procedure to avoid suggesting doses that are not feasible to engineer.
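The candidate grid and the two grid searches can be sketched in a few lines of R; the 0.05 increment used below is illustrative only and stands in for the sponsor's actual manufacturing precision, and `post$mean` and `aei_values` are hypothetical objects holding the posterior mean and AEI evaluated over the grid.

```r
# Candidate dose grid on the standardized dose combination region.
dose_grid <- expand.grid(agent1 = seq(0, 1, by = 0.05),
                         agent2 = seq(0, 1, by = 0.05))
nrow(dose_grid)  # 21 x 21 = 441 candidate dose combinations

# Point estimate and next dose over the grid:
# x_hat  <- dose_grid[which.min(post$mean), ]   # minimizer of the posterior mean
# x_next <- dose_grid[which.max(aei_values), ]  # maximizer of the AEI
```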
As before, we define $f$ to be a continuous efficacy surface and assume we are in a minimal toxicity setting where it is ethically permissible to select future doses anywhere in the dose combination space. The efficacy functions under the considered scenarios are built from bivariate normal densities, each parameterized by a mean vector and covariance matrix and evaluated at $\mathbf{x}$, as summarized in Table 1.
The data-generating mechanism for each scenario is $y = f(\mathbf{x}, \mathbf{z}) + \epsilon$, where $\epsilon \sim N(0, \sigma_E^2)$, with the specification of $f$ and $\sigma_E$ included in the first panel of Table 1 (rows labelled ‘Simulation Study’) and plotted in Figures 1a–3a. The values of $\sigma_E$ are chosen to ensure specific standardized effect sizes, defined as the absolute value of the optimal efficacy function value divided by the noise standard deviation, $|f^*|/\sigma_E$. We consider several standardized effect sizes drawn from a meta-analysis of dose-responses for a large drug development portfolio at a pharmaceutical company (Thomas et al., 2014). We focus on standardized effect sizes for drugs that had laboratory-confirmed endpoints, which is the type of endpoint used in our motivating example. Selected percentiles of these standardized effect sizes define what we refer to as small, medium, and large effect sizes.
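A minimal R sketch of a data-generating mechanism of this form is shown below; the mean vector, covariance matrix, and noise level are illustrative and are not the values used in the paper's scenarios.

```r
library(mvtnorm)

# Efficacy surface built from a bivariate normal density (smaller values better),
# observed with Gaussian noise.
efficacy <- function(x) {
  -dmvnorm(x, mean = c(0.5, 0.5), sigma = diag(0.05, 2))
}
simulate_response <- function(x, sigma_E = 1) {
  efficacy(x) + rnorm(1, sd = sigma_E)
}

simulate_response(c(0.4, 0.6))
abs(efficacy(c(0.5, 0.5))) / 1  # standardized effect size |f*| / sigma_E
```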

Figure 3. Scenario 3: (a) efficacy function, with white stars denoting $\mathbf{x}_{\mathrm{opt},k}$; (b) expected dose units from the optimal dose combination, as defined in (6), by iteration; (c) average RPSEL, as defined in (7), by iteration; and (d) average absolute deviation of the posterior mean estimate by iteration. In (b), the plot for the $(z_1, z_2) = (0, 0)$ stratum is empty as no optimal dose combination exists.
Table 1. True optimal dose combinations, optimal efficacy function values, standardized effect sizes, and monotonicity of the dose-efficacy surface for the simulation study and implant scenarios

| Setting | Scenario | σ_E | z₁ | z₂ | x_opt | f* | ses | Monotone |
|---|---|---|---|---|---|---|---|---|
| Simulation Study | 1 | 2.015 | 0 | — | | 1.592 | 0.79 | Yes |
| | | | 1 | — | | 1.592 | 0.79 | Yes |
| | 2 | 0.319 | 0 | — | | 1.203 | 3.77 | No |
| | | | 1 | — | | 1.203 | 3.77 | No |
| | 3 | 1 | 0 | 0 | None | None | 0 | No |
| | | | 0 | 1 | | 1 | 1 | No |
| | | | 1 | 0 | | 3.77 | 3.77 | No |
| | | | 1 | 1 | | 0.79 | 0.79 | Yes |
| Implant | 1 | 5 | 0 | — | | 5 | 1 | No |
| | | | 1 | — | | 10 | 2 | No |

Note. The data-generating mechanism for each scenario is $y = f(\mathbf{x}, \mathbf{z}) + \epsilon$, where $\epsilon \sim N(0, \sigma_E^2)$. The table columns contain the noise standard deviation (σ_E), the covariate values (a single binary covariate, listed under z₁, for the simulation study scenarios 1 and 2 and the implant scenario; z₁ and z₂ for scenario 3), the location of the optimal dose combination (x_opt, marked by white stars in panel (a) of the corresponding figures), the optimal value of the efficacy function (f*), the standardized effect size (ses), and whether or not the dose-efficacy surface is monotonically increasing with respect to each dosing dimension (Monotone). The efficacy functions are built from bivariate normal densities with specific mean vectors and covariance matrices evaluated at $\mathbf{x}$, as defined in the text. The subtraction of 2 in the implant efficacy function corresponds to a base level of drug response outside the regions of optimality.
Scenario 1 considers the case of no response heterogeneity across a binary covariate and includes a small standardized effect size with a dose-efficacy surface which is monotonically increasing with respect to each dosing dimension. Both the locations of the optimal dose combinations and the optimal values of the efficacy function are the same across the strata. That is, $\mathbf{x}_{\mathrm{opt},0} = \mathbf{x}_{\mathrm{opt},1}$ and $f^*_0 = f^*_1$. Scenario 2 considers response heterogeneity across $z$, where the locations of the optimal dose combinations differ. Under this scenario, $\mathbf{x}_{\mathrm{opt},0} \neq \mathbf{x}_{\mathrm{opt},1}$ but $f^*_0 = f^*_1$. This scenario considers large standardized effect sizes with efficacy surfaces that are nonmonotone with respect to each dosing dimension. An example of a single trial performed using the personalized algorithm for this scenario is provided in Figure E.1 in the online supplementary material. Scenario 3 considers heterogeneity across two binary covariates, $z_1$ and $z_2$, where both the locations of the optimal dose combinations and the optimal values of the efficacy function differ across the strata. This scenario includes a zero stratum, $(z_1, z_2) = (0, 0)$, which represents the covariate pattern of those who do not respond to the drug. We note that in this stratum, $\mathbf{x}_{\mathrm{opt},00}$ and $f^*_{00}$ do not exist, but we consider the standardized effect size to be 0. This scenario includes small, medium, and large standardized effect sizes as well as both monotone and nonmonotone dose-efficacy surfaces.
For each scenario, the standard and personalized algorithms are run using a maximum sample size of 80 participants. While this number is larger than many early-phase trials might be in practice, our goal is to investigate the algorithms’ performance characteristics as the sample size increases. We defer the use of early stopping rules to the following section, but note that these will permit a reduction in the expected sample size. We assign participants to dose combinations such that the total sample size of each algorithm is equal at each iteration, which allows a comparison of their performance. For the standard algorithm under scenarios 1 and 2, $r = 4$ participants are assigned to each dose combination on an initial dose matrix comprised of $c = 5$ dose combinations, which are selected via a two-dimensional quasi-random Sobol sequence (Morgan-Wall, 2022). This sequence serves as a space-filling design and seeks to spread out the initial dose combinations in a more uniform manner than is typically accomplished via random sampling. A comparison to other space-filling designs is provided in the online supplementary material. This yields 20 initial participants, which represents 25% of the total sample size, following a recommendation of using 20%–50% of the data for the initial design (Picheny & Ginsbourger, 2014). More than one patient is assigned per dose combination to control the cost associated with producing novel dose combinations, a financial constraint from our motivating problem. At each iteration of the algorithm, $r = 4$ additional participants are assigned to each proposed dose combination. This yields a total sample size of 80 after 15 iterations. We note that $r$ is a tuning parameter, and that reducing it will lead to more unique doses being explored for the same fixed sample size. We explore this in the next section.
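The initial space-filling design can be generated as sketched below; here the Sobol points come from the randtoolbox package (the paper uses a different package for this step), and the snapping to a 0.05 grid is an illustrative stand-in for the manufacturing precision.

```r
library(randtoolbox)

c_init <- 5  # number of initial dose combinations
r      <- 4  # patients per dose combination

X0 <- sobol(n = c_init, dim = 2)                      # quasi-random points in [0, 1]^2
X0 <- round(X0 / 0.05) * 0.05                         # snap to an illustrative dose grid
init_design <- X0[rep(seq_len(c_init), each = r), ]  # replicate each dose for r patients
nrow(init_design)                                     # 20 initial patient assignments
```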
For the personalized algorithm, $c = 5$ initial dose combinations per stratum are selected in the same manner as above. To achieve the same total sample size by iteration as the standard algorithm, only $r = 2$ participants are assigned to each dose combination within each stratum. This yields 10 participants per stratum and so 20 initial participants, as in the standard algorithm. At each iteration within each stratum, $r = 2$ additional participants are assigned to each proposed dose combination. This yields a total sample size of 80 after 15 iterations. We note that this equal allocation of patients across strata represents an idealized trial, which assumes that both the prevalence and enrolment of the specific subgroups are the same throughout the course of the trial. We consider departures from these assumptions in the discussion section. The same setup is used for scenario 3. However, since the number of strata is doubled, the number of participants evaluated at each dose combination is halved for the personalized algorithm. Thus, the standard algorithm still evaluates $r = 4$ participants per dose combination, but the personalized algorithm evaluates $r = 1$ participant per dose combination within each stratum, yielding a total sample size of 80 after 15 iterations. All computing is performed in the statistical programming language R (R Core Team, 2022). The efficacy functions are modelled using constant mean GP models, which utilize the anisotropic separable squared exponential kernels previously described, with the hyperparameters being jointly optimized by maximizing the marginal log-likelihood of the data (i.e. empirical Bayes GP; Binois & Gramacy, 2021). A set of parametric dose-finding designs was also considered, but yielded performance that was, in general, worse than the standard and personalized algorithms, and so these are not included here. The simulation details and results are provided in the online supplementary material.
Algorithm performance is compared using several criteria, which are estimated via Monte Carlo simulation. The expected number of dosing units from the optimal dose combination is used to assess how close the recommended dose combination is to $\mathbf{x}_{\mathrm{opt}}$ at each iteration. This measure is defined as the expected value of the Euclidean distance between $\hat{\mathbf{x}}_{\mathrm{opt}}$ and $\mathbf{x}_{\mathrm{opt}}$, divided by the precision $\epsilon_{\mathrm{dose}}$ to which the sponsor can manufacture doses (the grid increment used in our simulations):
$$\mathbb{E}\left[\frac{\lVert \hat{\mathbf{x}}_{\mathrm{opt}} - \mathbf{x}_{\mathrm{opt}} \rVert_2}{\epsilon_{\mathrm{dose}}}\right]. \quad (6)$$
We utilize this single-number summary to facilitate visual comparison between the algorithms. We also report different selection metrics, including the probability of correctly selecting the optimal dose, the probability of selecting within one dose of the optimal dose, and the probability of selecting within one dose along the diagonal of the optimal dose, in the online supplementary material. We utilize the average root posterior squared error loss (RPSEL) to assess how well the true efficacy function value at the recommended dose is estimated by the pointwise posterior distribution of the efficacy function at the recommended dose, where $f^{(s)}(\hat{\mathbf{x}}_{\mathrm{opt}})$ denotes a single posterior sample out of $S$ posterior samples:
$$\mathrm{RPSEL} = \mathbb{E}\left[\sqrt{\frac{1}{S}\sum_{s=1}^{S}\Big(f^{(s)}(\hat{\mathbf{x}}_{\mathrm{opt}}) - f(\hat{\mathbf{x}}_{\mathrm{opt}})\Big)^2}\,\right]. \quad (7)$$
We employ $f(\hat{\mathbf{x}}_{\mathrm{opt}})$ here rather than $f(\mathbf{x}_{\mathrm{opt}})$ to understand how well the algorithms capture $f$ at the recommended dose combination even if the recommended dose combination is not optimal. This is important for later phase studies, which may utilize estimates of $f$ obtained at the recommended dose combination for sample size and power calculations. In a similar manner, we present the average absolute deviation of the posterior mean point estimates from the true value of $f(\hat{\mathbf{x}}_{\mathrm{opt}})$ at each iteration. Note that the standard algorithm ignores the strata and so yields only a single recommended dose, posterior distribution, and posterior mean per iteration, which are used for every stratum. See panels b–d of Figures 1–3 for these criteria by iteration.
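For a single simulated trial, the two criteria can be computed as in the sketch below; the manufacturing precision passed to the first function is illustrative, and the Monte Carlo averages in (6) and (7) are then taken over simulated trials.

```r
# Number of dosing units between the recommended and the true optimal dose (6).
dose_units_from_opt <- function(x_hat, x_opt, precision = 0.05) {
  sqrt(sum((x_hat - x_opt)^2)) / precision
}

# Root posterior squared error loss at the recommended dose (7):
# post_samples is a vector of S posterior draws of f evaluated at x_hat.
rpsel <- function(post_samples, f_true_at_xhat) {
  sqrt(mean((post_samples - f_true_at_xhat)^2))
}
```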
Scenario 1 considers the case of no response heterogeneity across a single binary covariate for small standardized effect sizes with monotonically increasing dose-efficacy surfaces. Under this scenario, both algorithms converge to the locations of $\mathbf{x}_{\mathrm{opt},z}$, coming within one dosing unit of the optimum (Figure 1b). Estimation of $f$ is challenging for the small standardized effect size (i.e. higher level of noise), but by termination, the algorithms come within 0.4 units of the true $f$ at the recommended dose combination (Figure 1c–d). The personalized algorithm is slightly less efficient than the standard algorithm, however, and takes longer to converge. This is likely the result of the GP model used in the personalized algorithm needing to estimate the additional length-scale parameter for $z$, $\lambda_1$.
Scenario 2 considers response heterogeneity across $z$ for large standardized treatment effect sizes with nonmonotonic dose-efficacy surfaces. Under this scenario, the personalized algorithm converges to the locations of $\mathbf{x}_{\mathrm{opt},z}$ and estimates the true value of $f$ at the recommended dose combination well, whereas the standard algorithm does not (Figure 2b–d). The standard algorithm typically explores the area near only a single optimum or attempts to explore both optima (not shown). This results from the marginal efficacy function surface being bimodal, since it is a mixture comprised of the equally weighted strata of $z$, which are displayed in Figure 2a. Note that even if the bimodality of the marginal surface is properly identified and explored, patients cannot be optimally treated without consideration of $z$. That is, without this additional covariate information, the standard algorithm cannot determine which mode should be used to treat patients with $z = 0$ versus $z = 1$. This is possible using the personalized approach, however.
Scenario 3 considers heterogeneity across two binary covariates, $z_1$ and $z_2$, and includes zero, small, medium, and large standardized effect sizes with both monotone and nonmonotone dose-efficacy surfaces. Recall that the $(z_1, z_2) = (0, 0)$ stratum corresponds to those patients who do not respond to the drug. Thus, there are no optima in this stratum, and the corresponding plot in Figure 3b is empty. Figure 3c,d is not empty for this stratum, however, since $f = 0$ everywhere, and so it is of interest to see how well the algorithms estimate this value. Under this scenario, the personalized algorithm converges to the locations of $\mathbf{x}_{\mathrm{opt},k}$, coming within 1–1.5 dosing units of the optimum depending on the standardized effect size, whereas the standard algorithm does not (Figure 3b). Since the standard algorithm targets the global optimum, it performs best in the stratum where the standardized effect size is the largest. Accurate estimation of $f$ is challenging under this scenario (Figure 3c,d). The personalized algorithm shows evidence of convergence towards the true values of $f$ (i.e. RPSEL and absolute deviation of the posterior mean estimates decreasing to 0), whereas the standard algorithm does not. The standard algorithm yields only a single estimate of $f$ and so must split the difference among the different efficacy function values across the strata. This scenario suggests that as the number of strata grows, and thus also the likelihood for some degree of response heterogeneity to be present, the performance of the standard algorithm will be further degraded.
In summary, when heterogeneity exists across the strata, the personalized algorithm is superior both in identifying the locations of the optimal dose combinations and in estimating $f$. When no heterogeneity exists, the standard algorithm is slightly more efficient. Additionally, the proposed methods have performed well for both monotonic and nonmonotonic dose-efficacy surfaces, and have done so without utilizing strong prior information.
4 Dose-finding design for an intraocular implant
In this section, we focus on the intraocular implant example. The goal is to develop an intraocular implant with an optimal dose combination of two agents which reduce intraocular pressure (IOP), a laboratory-confirmed measurement. The normal range of IOP is 12–21 mmHg, with 21–30 mmHg considered elevated. Elevated IOP is a risk factor for ocular hypertension and glaucoma, and is strongly associated with increased vision loss (Leske et al., 2003). The implant seeks to reduce elevated IOP by combining two agents, with doses $x_1$ and $x_2$, each of which has been in use individually in topical form for many years. The agents are well tolerated and we do not expect any drug-related adverse events. We are interested in obtaining the optimal dose combination of these two agents using the reduction in IOP from baseline as a continuous efficacy measure. It is hypothesized that higher doses do not necessarily imply greater efficacy, which is expected to plateau or even decrease at higher levels of agent concentration. Additionally, we expect response heterogeneity to exist with respect to a particular characteristic of the lens of the eye, which we treat as a binary covariate $z$, and are interested in a design that permits the identification of potentially different optimal dose combinations according to this patient characteristic.
We allow the dose-finding algorithm to explore the dose combination region, assumed to be standardized, subject to the manufacturing precision constraint on standardized dosing units, and deem it ethically permissible to proceed without safety-related dose escalation/de-escalation rules. It is hypothesized that the implant can reduce elevated IOP by 5 mmHg in individuals with $z = 0$, but may be even more effective in individuals with $z = 1$, leading to reductions as high as 10 mmHg. To assess the cost and size of a hypothetical trial, we are interested in comparing the standard and personalized dosing approaches under different stopping rule specifications for the scenario described above. The final design is then selected as the one that balances good performance against expected cost. Costs are measured in terms of enrolled participants and also the number of unique dose combinations, since there are engineering costs associated with the production of novel doses.
The goal is to minimize the efficacy function. The data-generating mechanism is $y = f(\mathbf{x}, z) + \epsilon$, where $\epsilon \sim N(0, \sigma_E^2)$, with the specification of $f$ and $\sigma_E$ included in the second panel of Table 1 (rows labelled ‘Implant’) and plotted in Figure 4a. We use the same indexing for $\mathbf{x}_{\mathrm{opt},z}$ and $f^*_z$ as described previously. The value of $\sigma_E$ ensures medium standardized effect sizes of 1 and 2 for $z = 0$ and $z = 1$, respectively, across nonmonotonic dose-efficacy surfaces.

Figure 4. Intraocular implant scenario: (a) efficacy function, with white stars denoting $\mathbf{x}_{\mathrm{opt},z}$; (b) expected sample size and expected number of unique doses evaluated; (c) expected dose units from the optimal dose combination, as defined in (6), by iteration; (d) average RPSEL, as defined in (7), by iteration; and (e) average absolute deviation of the posterior mean estimate by iteration. On the x-axis in (b), P stands for personalized and S for standard, followed by a number referring to high replication at a smaller number of doses (1) or low replication at a larger number of doses (2).
Two settings of the algorithms are compared for a maximum sample size of 80: one setting includes a higher number of replications at a smaller number of doses, and the other includes a smaller number of replications at a larger number of doses. We denote the standard and personalized algorithms under the first setting as S1 and P1, and under the second setting as S2 and P2. Under the first setting, S1 and P1 are run under the same specifications described in the previous section, with $r = 4$ and $r = 2$ participants per dose combination for the standard and personalized algorithms, respectively. Under the second setting, S2 and P2 are run with smaller values of $r$ and correspondingly more initial dose combinations, again selected via Sobol sequences, so that more unique dose combinations are evaluated for the same total sample size.
As the sponsor is concerned about the cost and size of the trial, early stopping is permitted using the rule defined in (4). Early stopping is investigated by choosing values of $\delta$, as previously described, such that there is a moderately high chance of stopping after roughly 40 or 60 total participants are enrolled in the trial (indicated in Figure 4); a separate value of $\delta$ is obtained for each of the four algorithm settings. Since we are considering a dual-agent dose combination, we require the stopping criterion in (4) to be satisfied on consecutive iterations before stopping early.
For the personalized algorithm, we permit stratum-specific early stopping. Importantly, should the exploration of one stratum stop early, we allocate the remaining budget to the recruitment of participants in the other stratum. This assumes the sponsor can target recruitment specifically for this group. This reallocation enables resources to be utilized in strata that are harder to optimize, and so may increase performance. We compare this to no early stopping, where the numbers of participants enrolled in each stratum are equal. When combined with the two settings for each algorithm, 12 unique designs are defined: each of the four algorithm settings is paired with three stopping rules (targeting roughly 40 participants, targeting roughly 60 participants, or no early stopping). All computing and inference is performed as previously described. The performance of the designs is compared using the previously defined criteria, which are estimated via 1,000 Monte Carlo replicates.
The expected sample size and the expected number of unique doses evaluated by each design for the scenario described above are included in Figure 4b. The standard and personalized designs have approximately equal expected sample sizes within each stopping rule, but the personalized designs are expected to evaluate more unique doses on average. The performance of the personalized designs with respect to expected dosing units from $\mathbf{x}_{\mathrm{opt},z}$ (Figure 4c) is comparable within each stopping rule, with differences of around 0.1 standardized dosing units, which are too small to be practically meaningful.
There is some suggestion of a possible increase in the expected number of dosing units from $\mathbf{x}_{\mathrm{opt},z}$ under no early stopping when compared with early stopping. To investigate this, an additional simulation was performed that compared designs with maximum sample sizes of 40 and 60 under no early stopping to those that permit early stopping at roughly these same numbers of participants. The simulation suggests that the increase mentioned above results from the equal allocation of participants across strata under no early stopping. When early stopping is permitted, a larger proportion of evaluations is allocated to the stratum that is more difficult to optimize, providing an improved model fit at each iteration of the algorithm and thereby improving the dose-finding overall. This difference in proportions can be observed in Figure 4b by noting that higher proportions of the expected sample sizes under the early stopping rules come from the $z = 0$ stratum, which has a smaller standardized effect size and is thus harder to optimize. Regardless, under the current scenario, the observed difference in expected dosing units from $\mathbf{x}_{\mathrm{opt},z}$ between the stopping rules for the personalized designs is too small to be meaningful. However, future work should more fully investigate how equal versus unequal allocation of participants across the strata at each algorithm iteration impacts design performance.
The personalized design with the higher degree of replication (P1) estimates $f$ best, supporting findings in the literature that suggest higher degrees of replication can be beneficial for estimation under noisy settings (Binois et al., 2018). This difference is most apparent when comparing the personalized designs P1 and P2 in Figure 4d,e. The standard algorithms perform poorly across all performance metrics, recommending doses that are on average farther away from $\mathbf{x}_{\mathrm{opt},z}$ and poorly estimating the efficacy function $f$ at the recommended dose combinations (Figure 4c–e). This poor performance is expected since response heterogeneity is present in the true data-generating mechanism. If response heterogeneity were not present, we would expect the standard algorithms to be slightly more efficient than their personalized counterparts, as was observed in the simulation study of the previous section.
To suggest a final design to the sponsor, we use Figure 4 as a visual aid. Since response heterogeneity is expected a priori, the poor performance of the standard designs under this scenario renders them inappropriate. Instead, we select the personalized design with the higher degree of replication under the less aggressive early stopping rule. For roughly the same expected sample size but fewer unique dose evaluations, this design yields final dose suggestions that are as close to $\mathbf{x}_{\mathrm{opt},z}$ as its lower-replication counterpart, and it also offers a mild improvement in the estimation of $f$. Choosing the less aggressive stopping rule over the more aggressive one does come with additional cost, however: the design under the more aggressive rule expects to evaluate 13 unique doses and enrol approximately 44 participants, whereas the selected design expects to evaluate about 15 unique dose combinations (a 15% increase) and enrol approximately 58 participants (a 32% increase). The sponsor would need to weigh the increased engineering and enrolment costs against the increase in performance.
5 Discussion
In this manuscript, we proposed the use of Bayesian optimization for early-phase multi-agent dose-finding trials in a minimal-toxicity setting. We showed the benefit of taking a personalized approach for dual-agent trials when heterogeneity exists across a set of prespecified subgroups. As expected, under no response heterogeneity, the personalized approach is slightly less efficient. As noted in the introduction, parametric models may suffer from the curse of dimensionality when transitioning from standard to personalized dose-finding, as they require terms for all higher-order dose-covariate interactions. By using the anisotropic squared exponential GP kernel in the Bayesian optimization methods proposed here, however, only a single additional parameter per covariate is required (the additional length-scale parameter corresponding to that covariate). Thus, the proposed methods highlight the benefit and feasibility of adopting a personalized approach towards early-phase multi-agent dose-finding trials for both monotonic and nonmonotonic dose-response surfaces.
The proposed approach is not without limitations. First, the methods proposed in this work assumed no meaningful toxicity across the dose combination space. Extension to higher-grade toxicity settings through incorporation of dose escalation/de-escalation remains as future work. Second, the personalized approach was demonstrated by considering dual-agent dose combinations in predefined subgroups only. Furthermore, the subgroups were defined using binary covariates only, reflecting the focus of the design that motivated this work. Extension to dose combinations with more than two agents and to categorical covariates is straightforward, though other kernel functions may be used if appropriate. Extension to continuous covariates without categorization is not trivial and could be the subject of future investigations. Under this setting, the models may extrapolate into regions of the covariate space that are unobserved in the sample data, and so this should be considered for any future approach. Another direction for future development is to extend the proposed approach to binary and ordinal outcomes where the proposed response models may be defined over a latent continuous surface.
In this manuscript, Bayesian optimization is utilized as a global optimizer. While other global optimization methods exist (e.g. genetic algorithms and simulated annealing), they require many function evaluations and are thus not appropriate for early-phase dose-finding trials where evaluations are expensive. We employed the AEI acquisition function under a GP surrogate model. The performance of the algorithms under additional acquisition functions, surrogate models, and/or kernel functions should be investigated. Different kernel functions may be required to incorporate informative prior information, e.g. specific relationships or orderings among patient subgroups. The (stationary) separable anisotropic kernel used in this work assumes that all strata have the same correlation structure and that the covariance between points in different strata is changed by a multiplicative factor only, which may not be true in general. Indeed, under simulation scenario 3, which included two binary covariates, the efficacy function is zero everywhere for the $(z_1, z_2) = (0, 0)$ stratum, so this assumption does not hold in that case. The efficacy function values in this stratum are perfectly correlated, whereas those corresponding to dose combinations in other strata are not. Future work should investigate relaxing the assumption of stationarity by using kernels that are nonseparable (e.g. including dose-covariate interaction terms in the kernel function (5)), or even nonstationary kernels or deep GPs (Sauer et al., 2023). Finally, if the number of included covariates is large and there is reason to believe that only low-level interactions between the drug combinations and covariates exist, different surrogate models could be employed, such as additive GPs or Bayesian additive regression trees (Chipman et al., 2010).
The simulations performed in this manuscript assumed that the subgroups represented by each stratum were equally prevalent and had the same enrolment rates throughout the trial. This represents an idealized situation and may not be reasonable to assume in practice. In the extreme scenario of one stratum having zero patients, the algorithm would simply base recommendations for this stratum on prior information only. Furthermore, due to the borrowing of information across the strata, the estimation for sparse strata may be dominated by strata with many patients. For these reasons, Zhang et al. (2024) suggest subgroup-specific dose optimization only be performed for predefined subgroups that have large enough sample sizes in the trial, a recommendation that supports the simulation scenarios evaluated in the present work.
Finally, we adopted an empirical Bayes approach towards the GP hyperparameter estimation to decrease the computational burden of the simulations. Likelihood methods can yield poor results when the sample sizes are small, as in early-phase dose-finding trials, and so full Bayesian inference may be preferred. Unfortunately, the Markov chain Monte Carlo methods typically used to perform full Bayesian inference are prohibitively expensive for the algorithms proposed here, and so a sequential Monte Carlo approach may be a less computationally demanding alternative (Gramacy & Polson, 2011).
Acknowledgments
J.W. acknowledges the support of a doctoral training scholarship from the Fonds de recherche du Québec—Nature et technologies (FRQNT) and a research exchange internship with PharmaLex Belgium.
Funding
S.G. acknowledges the support by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (NSERC; RGPIN-2020-04115) and a Chercheurs-boursiers (Junior 1) award from the Fonds de recherche du Québec - Santé (FRQS; #313252). E.E.M.M. acknowledges support from a Discovery Grant from NSERC (RGPIN-2019-04230). E.E.M.M. is a Canada Research Chair (Tier 1) in Statistical Methods for Precision Medicine and acknowledges the support of a chercheur de mérite career award from the FRQS (#309780). This research was enabled in part by support provided by Calcul Québec and the Digital Research Alliance of Canada.
Data availability
No new data were created or analysed in this study. The R scripts used for the simulations and graphics can be found on a public GitHub repository at https://github.com/jjwillard/bayesopt_pers_df.
Supplementary material
Supplementary material is available online at Journal of the Royal Statistical Society: Series C.
References
Author notes
Conflicts of interest: None declared.