Haiyan Zheng, James M S Wason, Borrowing of information across patient subgroups in a basket trial based on distributional discrepancy, Biostatistics, Volume 23, Issue 1, January 2022, Pages 120–135, https://doi.org/10.1093/biostatistics/kxaa019
Summary
Basket trials have emerged as a new class of efficient approaches in oncology to evaluate a new treatment in several patient subgroups simultaneously. In this article, we extend the key ideas to disease areas outside of oncology, developing a robust Bayesian methodology for randomized, placebo-controlled basket trials with a continuous endpoint to enable borrowing of information across subtrials with similar treatment effects. After adjusting for covariates, information from a complementary subtrial can be represented as a commensurate prior for the parameter that underpins the subtrial under consideration. We propose using distributional discrepancy to characterize the commensurability between subtrials for appropriate borrowing of information through a spike-and-slab prior placed on the prior precision factor. When the basket trial has at least three subtrials, commensurate priors for point-to-point borrowing are combined into a marginal predictive prior according to weights transformed from the pairwise discrepancy measures. In this way, only information from the subtrial(s) with the most commensurate treatment effect is leveraged. The marginal predictive prior is updated to a robust posterior by the contemporary subtrial data to inform decision making. Operating characteristics of the proposed methodology are evaluated through simulations motivated by a real basket trial in chronic diseases. Compared with other selected Bayesian analysis models, the proposed methodology has advantages in (i) identifying the most commensurate source of information and (ii) gauging the degree of borrowing from specific subtrials. Numerical results also suggest that our methodology can improve the precision of estimates and, potentially, the statistical power for hypothesis testing.
1. Introduction
There has been increasing interest in precision medicine (Mirnezami and others, 2012; Schork, 2015) over the past few decades. Rapid advances in genomics and biomarkers allow stratification of patients into subgroups that may benefit differently from new treatments. Unlike the one-size-fits-all concept in conventional paradigms of clinical drug development, the aim of precision medicine is to target the right treatments to the right patients at the right time. In the era of precision medicine, new trial designs have been developed, several of which are examples of master protocols (Woodcock and LaVange, 2017; Renfro and Mandrekar, 2018) to study multiple diseases or multiple agents, or sometimes both. One well-known class of master protocols is basket trials (Renfro and Sargent, 2017). In the simplest formulation, basket trials evaluate a single targeted agent in patients who share a common feature, such as a particular genetic mutation, but may present various disease subtypes. It is administratively more efficient to plan one basket trial than a number of separate trials, one for each small subgroup. With one subtrial performed in each patient subgroup, basket trials are also advantageous for addressing multiple research questions simultaneously, for example, which subgroup(s) of patients may benefit and to what extent. To date, sophisticated approaches to the design and analysis of basket trials have predominantly been proposed for early phase oncology drug development, where the “standard” approach is a single-arm design with a binary RECIST endpoint (Eisenhauer and others, 2009; Schwartz and others, 2016). This manuscript extends the key ideas of basket trials to disease areas outside of oncology, for example, to cases where patients have distinct clinical conditions but share similar symptoms. For this, we develop efficient approaches for analyzing randomized, placebo-controlled basket trials which collect data on a continuous endpoint.
When analyzing early phase basket trials, one major concern is the potential heterogeneity of the treatment effect across patient subgroups. Investigators are faced with the dilemma of discarding or incorporating data from other subgroups to reach a decisive conclusion about the treatment effect for a specific subgroup. The option of using complementary data from subtrials that run concurrently is intriguing, as it may lead to a considerable increase in the statistical power of the study to detect drug activity in one or more subgroups. This should be balanced against the risk that a treatment effect in an important patient subgroup may be overlooked or missed. Conventional analysis strategies such as stand-alone analyses (also known as the approach of no borrowing) and complete pooling irrespective of subgroup labels have been criticized. Some authors have proposed using hierarchical random-effects models, as a compromise between these two extremes, to enable borrowing of information across subgroups (Thall and others, 2003; Thall and Wathen, 2008; Berry and others, 2013). Such well-established approaches for information borrowing are justified under the assumption of exchangeability (Bernardo, 1996) of subgroup-specific treatment effects. More specifically, exchangeability means that the magnitude of clinical benefit may differ, but nothing is known a priori to suggest that patients in some subgroups benefit more than others. Neuenschwander and others (2016) discuss a robust extension to the standard hierarchical models by including the possibility of non-exchangeability for each parameter (vector) that underpins a subgroup. Their approach permits an extreme subgroup not to be overly influenced by other subgroups in situations of data inconsistency.
Additional concerns about subgroup effects are essential in precision medicine. Often, the targeted therapy is effective only in some subgroups, and certain subgroups show clinical benefit more similar to one another than to the rest. Several variations of standard hierarchical modeling have been considered suitable for basket trials to implement borrowing of information (Liu and others, 2017; Chu and Yuan, 2018). Modifications are motivated mainly by (i) justifying a plausible clustering of similar subgroups and (ii) quantifying the magnitude to which a subgroup-specific parameter should be shrunk towards the mean effect across subgroups. Most recently, more sophisticated methods in the framework of Bayesian model averaging (Madigan and Raftery, 1994; Draper, 1995) have been applied to the analysis of basket trials. Psioda and others (2019) average over the complete model space, which comprises all models for possible configurations of the subgroups that may demonstrate the same or disparate efficacy. In a model that assumes an identical treatment effect among specific subgroups, information is pooled across the corresponding subgroups under the assumption of inter-patient exchangeability. The number of models to be included in the complete model space for averaging increases exponentially with the number of subgroups involved in the basket trial. Hobbs and Landin (2018) enumerate all possible subgroup pairs, wherein the parameters are considered to be either exchangeable or non-exchangeable. Using the product of, rather than the individual, prior probabilities for any two subgroups being exchangeable or not, their method offers considerable computational efficiency relative to conventional Bayesian model averaging.
In this article, we propose methodology motivated by a randomized, placebo-controlled phase II basket trial being undertaken in patients with chronic diseases. Patients who share a common disease symptom that the new treatment can potentially improve will be stratified into subgroups according to their clinical conditions. Efficacy will be recorded on a continuous endpoint. Adjustment for baseline covariate(s) is desirable to allow for a more precise estimate of the treatment effect. We develop a Bayesian methodology for borrowing of information across consistent subgroups based on commensurate priors (Hobbs and others, 2011, 2012), which lead to a type of hierarchical model for robust estimation when only a small number of complementary studies are available. The methodology facilitates inferences with respect to all possible pairwise borrowing of information between the |$K$| subgroups in a basket trial, accounting for the level of data commensurability across subtrials. More explicitly, given any complementary subtrial data, a commensurate prior can be specified for the treatment effect in the subtrial of contemporary analysis interest. It is essentially a normal predictive prior centered at the complementary subtrial data parameter, with a precision factor to capture the commensurability of the parameters that underpin the complementary and contemporary subtrials.
We explore placing an empirical spike-and-slab prior (Mitchell and Beauchamp, 1988) on the precision factor, which determines the degree of point-to-point borrowing. To overcome prior-data conflict, we propose using a distributional discrepancy measure to characterize the commensurability of information between any two subtrials. The measure quantifies the probability mass to be placed on the “spike” prior for strong borrowing and on the “slab” prior for discounting inconsistent information from a complementary subtrial. The discrepancy measure also discriminates among the complementary subtrials (when at least three subgroups are involved) according to their relative commensurability, and can therefore encourage differential borrowing of information when estimating the model parameter specific to a subtrial. The proposed methodology for basket trials is fundamentally different from existing approaches to information sharing. It avoids the limiting assumption of exchangeability for the parameters of some or all subtrials, and instead uses a distributional discrepancy measure to inform borrowing only from the most commensurate subtrial(s).
The remainder of this article is structured as follows. We describe the motivating example and decision criteria in Section 2. In Section 3, we present our analysis methodology and discuss how a discrepancy measure may help make appropriate use of complementary data in a basket trial. In Section 4, we perform a simulation study to evaluate the operating characteristics of phase II basket trials analyzed using the proposed methodology, and compare our Bayesian model with some alternative analysis models. We close in Section 5 with a discussion of our findings and of future research.
2. Motivating example and notation
As a motivating example, we use a randomized, placebo-controlled phase II basket trial that evaluates a new treatment for cognitive dysfunction in patients with primary biliary cholangitis (PBC) or Parkinson’s disease (PD). This clinical trial is led by Newcastle University; at the time of writing, it has been funded but not yet opened to participants. Patients are to be recruited and stratified into three disjoint subgroups according to the stage and type of their chronic disease, that is, early-stage PBC, late-stage PBC, and PD. The PBC and PD basket trial thus comprises three subtrials. A continuous outcome measuring cognitive performance will be used as the clinical endpoint in each subtrial. Once the trial begins, patients within each subtrial will be randomized to receive either the new treatment or a placebo.
3. Methods
Derivable from the Cauchy–Schwarz inequality, the computed Hellinger distance |$d_{H}(\pi_{\theta_k}, \pi_{\theta_{k^\star}}\!)$| falls strictly within the interval [0, 1], which is convenient for characterizing the probability that the treatment effects of any two subtrials are regarded as dissimilar. We may then relate the “slab” prior probability |$w_{kk^\star}$| to the computed Hellinger distance, simply by stipulating |$w_{kk^\star} = d_{H}(\pi_{\theta_k}, \pi_{\theta_{k^\star}}\!)$|. In the extreme case where two subtrials are perfectly consistent, that is, |$d_{H}(\pi_{\theta_k}, \pi_{\theta_{k^\star}}\!) \to 0$|, the whole probability mass will be concentrated at the “spike,” |$\mathcal{S}$|. In turn, this results in a notably small normal variance |$1/\nu_{kk^\star}^2$| of the CPP in (3.3), such that the complementary subtrial data |$\boldsymbol{x}_k$| can be fully incorporated. In addition, knowing the lower and upper bounds of the Hellinger distance, which are 0 and 1, makes it easier to standardize a collection of pairwise discrepancy measurements. This can help quantify the relative importance of all other subtrials |$k \neq k^\star$| in forming a prior for |$\theta_{k^\star}$| when |$K\geq 3$|. More importantly, the Hellinger distance is preferred over other distributional discrepancy measures because of its desirable properties of symmetry and invariance to any transformation, for example, logarithmic, exponential, or inverse of square root, of both densities (Jeffreys, 1961). As a symmetric measure of discrepancy, |$d_{H}(\pi_{\theta_k}, \pi_{\theta_{k^\star}}\!) = d_{H}(\pi_{\theta_{k^\star}}, \pi_{\theta_{k}})$|. In a basket trial with |$K=2$| and no a priori assumption about which subtrial has the stronger treatment effect, using the Hellinger distance to define the spike-and-slab prior will therefore result in the same magnitude of down-weighting or leveraging when borrowing subtrial data |$\boldsymbol{x}_1$| for subtrial 2 and |$\boldsymbol{x}_2$| for subtrial 1. The invariance property, in turn, ensures that the computed Hellinger distance |$d_{H}(\pi_{\theta_k}, \pi_{\theta_{k^\star}}\!)$| truly reflects the discrepancy between the treatment effect distributions in different patient subtrials, even when the linear regression model (2.1) is parameterized in a different way, for example, with the treatment effect represented by the exponential of |$\theta_k$|. Given the invariance, we know that |$d_{H}(\pi_{\theta_k}, \pi_{\theta_{k^\star}}\!) = d_{H}(\pi_{\exp(\theta_k)}, \pi_{\exp(\theta_{k^\star})})$|.
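To make the mapping from discrepancy to prior weight concrete, the Hellinger distance between two univariate normal densities has a simple closed form. The following R fragment is a minimal sketch, assuming the subtrial-specific distributions |$\pi_{\theta_k}$| and |$\pi_{\theta_{k^\star}}$| can be summarized as normal densities; the function name is illustrative and not taken from the paper's software.

```r
## Hellinger distance d_H in [0, 1] between N(mu1, sd1^2) and N(mu2, sd2^2),
## via the closed-form Bhattacharyya coefficient for univariate normals.
hellinger_normal <- function(mu1, sd1, mu2, sd2) {
  bc <- sqrt(2 * sd1 * sd2 / (sd1^2 + sd2^2)) *
    exp(-(mu1 - mu2)^2 / (4 * (sd1^2 + sd2^2)))   # Bhattacharyya coefficient
  sqrt(1 - bc)
}

## Two fairly consistent subtrials versus two divergent ones
hellinger_normal(0.45, 0.30, 0.50, 0.30)   # small distance: most mass on the "spike"
hellinger_normal(0.10, 0.30, 1.30, 0.30)   # large distance: most mass on the "slab"

## Following the stipulation in the text, the "slab" probability would be
## w_kkstar <- hellinger_normal(mu_k, sd_k, mu_kstar, sd_kstar)
```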
One further note is warranted. The stipulation that the weights |$p_{kk^\star}$| sum to 1 does not restrict the potential for full borrowing of information in situations where all |$(K-1)$| complementary subtrials are perfectly consistent with subtrial |$k^\star$|. In such a scenario, the Hellinger distance |$d_{kk^\star} = 0$|, suggesting that the CPPs marginally on |$\theta_{k^\star}$|, represented by |$N(\lambda_k, \xi_k^2)$|, have identical mean and variance to those of |$\pi_{k^\star} (\theta_{k^\star} | \boldsymbol{x}_{k^\star})$|. Moreover, equal weights, i.e., |$p_{kk^\star} = \frac{1}{K-1}$|, will be allocated to the complementary subtrials |$k \neq k^\star$|. Following (3.8), a predictive prior |$\pi^\text{MPP}(\theta_{k^\star} | \boldsymbol{x}_{(-k^\star)})$| would be obtained with mean |$\lambda_k$| and variance |$\frac{1}{K-1} \xi_k^2$|. With the inclusion of |$\boldsymbol{x}_{k^\star}$|, the posterior mean and variance become |$\lambda_k$| and |$\frac{1}{K} \xi_k^2$|, respectively. This indicates that all the complementary subtrial data |$\boldsymbol{x}_{(-k^\star)}$| have been fully incorporated, and our methodology converges to the approach of complete pooling in the case of perfect information consistency.
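The complete-pooling limit described above can be checked with elementary normal-normal precision weighting. A minimal sketch, under the simplifying assumption (made here for illustration only) that the contemporary subtrial's own data contribute a normal summary with the same mean |$\lambda_k$| and variance |$\xi_k^2$| as each CPP:

```r
## Arithmetic check of the perfect-consistency limit.
K      <- 6
lambda <- 0.45        # common mean of the CPPs, N(lambda, xi2)
xi2    <- 0.30^2      # common variance of the CPPs

mpp_mean <- lambda                # marginal predictive prior mean
mpp_var  <- xi2 / (K - 1)         # marginal predictive prior variance

## Normal-normal update with the contemporary subtrial summary N(lambda, xi2)
post_var  <- 1 / (1 / mpp_var + 1 / xi2)                     # = xi2 / K
post_mean <- post_var * (mpp_mean / mpp_var + lambda / xi2)  # = lambda

c(post_mean = post_mean, post_var = post_var, check = xi2 / K)
```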
4. Simulation study
In this section, we illustrate applications of the proposed analysis methodology and compare it with alternative Bayesian models that may be used for analyzing basket trials, through a simulation study. Our trial examples are hypothetical, but represent the situation of a phase II basket trial for which the analyses are performed to enable sharing of information. The main characteristics of the basket trials we simulate are based on the motivating PBC and PD trial described in Section 2. For illustrative purposes, we assume six subtrials instead of three, as typically a fairly large number of patient subgroups would be examined; for example, Hyman and others (2015, 2018) report results from basket trials with six and nine subtrials, respectively.
4.1. Basic settings
Table 1. Simulation scenarios with specification of the “true” treatment effect |$\theta_k$| used to compare the Bayesian analysis models. Figures in bold indicate a zero or low treatment effect.
| Scenario | Subtrial 1 (|$n_1 = 10$|) | Subtrial 2 (|$n_2 = 10$|) | Subtrial 3 (|$n_3 = 14$|) | Subtrial 4 (|$n_4 = 16$|) | Subtrial 5 (|$n_5 = 20$|) | Subtrial 6 (|$n_6 = 20$|) |
|---|---|---|---|---|---|---|
| 1 | 0.49 | 0.67 | 0.54 | 0.43 | 0.79 | 0.35 |
| 2 | 0.35 | 0.37 | 0.80 | 1.30 | 1.38 | 0.40 |
| 3 | 0.29 | 0.77 | 0.68 | 0.75 | 0.33 | 0.30 |
| 4 | 0.59 | 1.17 | 1.02 | 0.95 | 0.13 | 0.75 |
| 5 | 0.45 | 0.45 | 0.45 | 0.45 | 0.45 | 0.45 |
| 6 | 0.30 | 0.30 | 0.30 | 0.30 | 0.30 | 0.30 |
| 7 | **0** | **0** | **0** | **0** | 0.37 | 0.37 |
| 8 | 0.33 | **0** | 0.82 | 0.90 | **0** | 0.83 |
| 9 | **0** | **0** | **0** | **0** | **0** | **0** |
To implement our methodology for estimating |$\theta_k$|, we set |$\mathcal{B}_1 = 0.01, \, \mathcal{B}_2 = 1,$| and |$\mathcal{S} = 100$| for the spike-and-slab prior on each |$\nu_{kk^\star}$|. The “slab” prior is very uninformative and is sufficient to discard entirely the information from an external subtrial |$k$|; the “spike” prior is specified so that the proposed methodology reduces to complete pooling in situations of perfect information commensurability. Justification for this choice of spike-and-slab prior is provided in Section B of the supplementary material available at Biostatistics online. An initial vague prior |$\pi_{0k}(\theta_k)$| is used for |$\theta_k, \, k = 1, \dots, 6$|; we use |$N(0, 10^2)$|, such that the 95% prior credible interval is (-19.560, 19.560), covering a wide range of possible |$\theta_k$|. To yield a large (small) weight |$p_{kk^\star}$| corresponding to a small (large) Hellinger distance, we let |$s_0 = 0.15$| for the transformation. Nevertheless, we study how different stipulations of |$s_0$| may impact the identification of the most consistent subtrial(s) in Section C of the supplementary material available at Biostatistics online, additionally exploring |$s_0 = 0.25, 0.35, 0.45$|. We are interested in comparing the proposed methodology with:
(1) Standard hierarchical model (HM) that assumes fully exchangeable parameters: |$\theta_k \mid \mu, \tau \sim N(\mu, \tau^2)$| with |$\mu \sim N(0, 10^2)$| and |$\tau\sim HN(0.125)$|. The median and 95% credible interval of |$HN(0.125)$| are 0.084 and (0.004, 0.280), respectively (these summaries can be verified numerically; see the short check after this list).
(2) Bayesian model with no borrowing of information. Trial data are stratified by subtrials for stand-alone analyses, setting each |$\theta_k \sim N(0, 10^2)$|. Random effects for |$\gamma_{0k}$|, |$\gamma_{1k}$|, and |$\gamma_{2k}$| therefore cannot be estimated; we then place a |$N(0, 5^2)$| prior on each.
(3) EXNEX model by Neuenschwander and others (2016), with equal prior probabilities of exchangeability (EX) and non-exchangeability (NEX). The EX distribution has the same parameter configuration as stipulated for the standard HM above, and the six NEX distributions are all set to |$N(0, 10^2)$|.
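The half-normal prior summaries quoted in (1) can be reproduced with a short numerical check. This is a minimal sketch, assuming |$HN(0.125)$| denotes the half-normal distribution with scale 0.125, that is, the distribution of the absolute value of a |$N(0, 0.125^2)$| variable.

```r
## Quantiles of a half-normal distribution with scale parameter 'scale'
hn_quantile <- function(p, scale) scale * qnorm((1 + p) / 2)

hn_quantile(0.50, 0.125)               # median, approximately 0.084
hn_quantile(c(0.025, 0.975), 0.125)    # 95% interval, approximately (0.004, 0.280)
```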
For each analysis model, we record the proportion of simulated basket trials that yield:

(i) an erroneous Go decision in a subtrial where the “true” |$\theta_k =0$|, and

(ii) a correct Go decision in a subtrial where the “true” |$\theta_k > 0$|,

referred to as the analogs of the type I error rate and of statistical power, respectively.
Results are summarized by averaging across 10 000 replicates of the basket trial. The Bayesian analysis models are fitted in R version 3.4.4 using the R2OpenBUGS package, with two parallel chains each running the Gibbs sampler for 10 000 iterations after a burn-in of 3000 iterations. OpenBUGS code, together with R functions, to implement each of the Bayesian analysis models is available at https://github.com/BasketTrials/Bayesian-analysis-models.
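Once posterior draws of each |$\theta_k$| are available from the MCMC output, the Go/No-Go decision used throughout this section reduces to a tail-probability check. A minimal sketch in R, where theta_draws is a placeholder for the vector of posterior samples (not an object name taken from the paper's software):

```r
## Declare a Go for subtrial k if P(theta_k > delta_U | data) exceeds zeta,
## estimated by the proportion of posterior draws above delta_U.
go_decision <- function(theta_draws, delta_U = 0.25, zeta = 0.975) {
  mean(theta_draws > delta_U) > zeta
}

## Example with simulated posterior draws for one subtrial
set.seed(2)
theta_draws <- rnorm(20000, mean = 0.45, sd = 0.08)
go_decision(theta_draws)   # TRUE when the posterior tail probability exceeds 0.975
```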
4.2. Results
Figure 1 compares the performance of the posterior estimators yielded by the Bayesian models. It shows that the proposed methodology produces smaller bias and mean squared error (MSE) than the standard HM and EXNEX across nearly all scenarios. Point estimators based on the standard HM and EXNEX work well in scenarios 5, 6, and 9, as the small-to-moderate variability between the |$\theta_k$|s can be accommodated by setting |$\tau \sim HN(0.125)$|. The proposed analysis methodology, in contrast, distinguishes the heterogeneity more sensitively. Much smaller bias and MSE are obtained when estimating |$\theta_k$| for basket trials with divergent treatment effects across subtrials; see, for example, scenario 2. In situations where information from other subtrials should be largely discounted, referring to scenarios 7 and 8, our methodology generates bias comparable to the no borrowing approach but with a smaller MSE. This is because information from subtrials with a non-zero treatment effect can be largely discounted when formulating the marginal predictive prior for, for example, |$\theta_2$| in scenario 8.

Fig. 1. Bias and mean squared error of the posterior estimators for |$\theta_k$| based on the Bayesian models.
We have also compared the Bayesian analysis models in terms of the average width of the posterior credible intervals for |$\theta_k$|. In contrast to the alternative Bayesian models, the proposed methodology yields posterior estimates with narrower credible intervals when there is at least one consistent complementary subtrial; see Figure S1 of the supplementary material available at Biostatistics online. When using the proposed analysis methodology for borrowing of information, investigators may be interested in the weight eventually allocated to each external subtrial for obtaining the marginal predictive prior. In Figures S3 and S4 of the supplementary material available at Biostatistics online, we comment, with regard to scenarios 4 (divergent |$\theta_k$|) and 5 (consistent |$\theta_k$|), on the weight allocation based on the assessed pairwise commensurability, and illustrate how the pre-specified value of |$s_0$| impacts the sensitivity of the proposed methodology in identifying the most commensurate subtrial(s).
Table 2 quantifies the impact of using different Bayesian models on error rate control under the null hypothesis. Here, we report the (analog of the) type I error rate for scenarios involving at least one subtrial with |$\theta_k =0$|, setting |$\delta_U=0.25$|. Comparisons where we set |$\delta_U=0.30$| are given in Table S1 of the supplementary material available at Biostatistics online. For scenario 9 (global null), all four Bayesian analysis models control the error rate well under the decision criterion. Nevertheless, the approaches that enable borrowing of information, i.e., standard HM, EXNEX, and the proposed methodology, result in smaller type I error rates than the approach of no borrowing, since incorporating consistent information from other subtrials reinforces the conclusion that a Go decision is not justified. Our approach produces slightly higher error rates than standard HM and EXNEX, as in some simulated trials information from subtrials with a similar low treatment effect may be shared (but not from those with a null |$\theta_k$|), leading to a higher chance of rejecting the null hypothesis. In scenario 8, where some subtrials have large treatment effects, we observe a higher error rate when using the standard HM and EXNEX approaches than with the proposed approach. We note that the difference in the sample sizes of subtrial 2 and subtrials 4 or 5 (across all scenarios) leads to disparate magnitudes of the error rate using the same approach in the same null scenario: those for subtrial 2 are consistently larger than those for subtrial 4 or 5. More explicitly, when reacting to a data conflict, the larger sample size of subtrial 4 or 5 provides more evidence to assess the plausibility of down-weighting; estimation of |$\theta_4$| or |$\theta_5$| thus has an increased chance of avoiding being overwhelmed by the complementary information.
Table 2. Comparison of the Bayesian analysis models with respect to the analog of the type I error rate: the null hypothesis is erroneously rejected in subtrials with |$\theta_k =0$|, setting |$\delta_U=0.25$| and |$\zeta = 0.975$|.
| Scenario | Model | Subtrial 1 | Subtrial 2 | Subtrial 3 | Subtrial 4 | Subtrial 5 | Subtrial 6 | Overall |
|---|---|---|---|---|---|---|---|---|
| 7 | Standard HM | 0.0000 | 0.0000 | 0.0000 | 0.0000 | — | — | 0.0000 |
| 7 | No borrowing | 0.0036 | 0.0077 | 0.0056 | 0.0100 | — | — | 0.0269 |
| 7 | EXNEX | 0.0001 | 0.0003 | 0.0002 | 0.0000 | — | — | 0.0006 |
| 7 | Proposed approach | 0.0073 | 0.0089 | 0.0002 | 0.0002 | — | — | 0.0166 |
| 8 | Standard HM | — | 0.0155 | — | — | 0.0080 | — | 0.0207 |
| 8 | No borrowing | — | 0.0077 | — | — | 0.0008 | — | 0.0085 |
| 8 | EXNEX | — | 0.0195 | — | — | 0.0056 | — | 0.0251 |
| 8 | Proposed approach | — | 0.0155 | — | — | 0.0017 | — | 0.0172 |
| 9 | Standard HM | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| 9 | No borrowing | 0.0036 | 0.0077 | 0.0056 | 0.0100 | 0.0008 | 0.0006 | 0.0283 |
| 9 | EXNEX | 0.0000 | 0.0001 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0001 |
| 9 | Proposed approach | 0.0014 | 0.0028 | 0.0000 | 0.0000 | 0.0002 | 0.0020 | 0.0064 |
Overall: the proportion of trials with an erroneous Go decision in at least one subtrial.
What may also interest investigators is the potential increase in statistical power to demonstrate the treatment effect by incorporating information from complementary subtrials. Figure 2 compares the Bayesian analysis models in terms of correctly declaring a clinical benefit in subtrial |$k$|, setting |$\delta_U=0.25$|; the corresponding comparison setting |$\delta_U=0.30$| is shown in Figure S5 of the supplementary material available at Biostatistics online. Across nearly all subtrials of the simulated basket trials in scenarios 1–5, the Bayesian approaches of borrowing show substantial advantages over the approach of no borrowing. When comparing the Bayesian approaches of borrowing with one another, we examine the chance that a subtrial with a comparatively low treatment effect is concluded with a correct Go decision in the presence of consistent subtrials. Looking at scenario 3, for example, our approach leads to higher statistical power for subtrials 2 and 6 than the other Bayesian models.

Fig. 2. Comparison of the Bayesian analysis models with respect to the analog of statistical power: the null hypothesis is correctly rejected in the presence of a treatment effect per subtrial, setting |$\delta_U=0.25$| and |$\zeta = 0.975$|.
Scenarios 5 and 6 represent situations where all subtrials are commensurate, but the former has a larger treatment effect size. Given our criterion that |$\mathbb{P}(\theta_k > 0.25) > 0.975$| for a Go decision, scenario 6, with all |$\theta_k = 0.30$|, is a particularly hard scenario in which to reach a Go decision. Compared with the approaches that enable borrowing, using the approach of no borrowing results in subtrials 3 and 4 having a slightly higher probability of a correct Go decision. However, this does not mean the no borrowing approach is superior, since the standard HM and EXNEX produce estimates of |$\theta_k$| with a similar level of bias but much smaller posterior variances than the approach of no borrowing. These results can be seen in Figure 1 and Figure S2 of the supplementary material available at Biostatistics online. More informative posterior distributions for |$\theta_k$| nevertheless do not necessarily mean a higher interval probability of |$\theta_k>\delta_U$|: it is possible that the diffuse posteriors for |$\theta_k$| obtained from the approach of no borrowing have comparable or even higher chances of exceeding the level |$\zeta = 0.975$|. When the consistent “true” |$\theta_k$|s increase from 0.30 (scenario 6) to 0.45 (scenario 5), we begin to observe efficiency gains from using Bayesian models that permit borrowing of information over no borrowing. The proposed methodology appears to present a larger absolute gain in power than the alternative models, although we note that the absolute gain in power can be a misleading metric because of the non-linear shape of the power curve.
In scenario 7, the prior specification |$\tau \sim HN(0.125)$| cannot account for the variability across subtrials, so both the standard HM and EXNEX shrink |$\theta_5$| and |$\theta_6$| excessively towards the mean effect across subgroups. This in turn dilutes the treatment effect in the corresponding subtrials. Consequently, it appears better to implement the approach of no borrowing for the possibility of declaring a positive treatment effect. Our approach presents slightly higher power than the no borrowing approach, as there is some consistent information to be incorporated from a complementary subtrial. In scenario 8, our approach performs similarly to the alternatives, but slightly better for subtrial 6 owing to the leveraging of consistent information.
We note this simulation study does not consider cases of basket trials involving rare disease subgroups, where certain subtrials can have a much smaller sample size than others. We present several hypothetical data examples in Section D of the supplementary material available at Biostatistics online to comment on the sensitivity to differences in subtrial sample sizes.
5. Discussion
The paradigm shift towards precision medicine opens new avenues for novel trial designs and analysis methodologies to deliver more tailored healthcare to patients. Basket trials have emerged as a new class of efficient approaches to oncology drug development in the era of precision medicine, offering a framework to evaluate the treatment effect, together with its heterogeneity, in various patient subgroups. In this article, we have extended the key ideas of the basket trial approach to disease areas outside of oncology and proposed a new Bayesian model to enable borrowing of information from the most commensurate subtrial(s) without requiring a priori clustering of similar subgroups. By including an information discrepancy measure, the model can determine the degree of borrowing from complementary subtrials. In particular, the Hellinger distance plays a dual role in our methodology: (i) it gauges the maximum amount of information that could be leveraged from a specific subtrial |$k \neq k^\star$| to estimate |$\theta_{k^\star}$|; (ii) when there are |$K \geq 3$| subtrials, it determines the weight allocation to reflect the relative importance of the subtrials for appropriate borrowing of information.
The Bayesian analysis methodology in Section 3 has been developed assuming the basket trial generates continuous response data. However, it could easily be generalized to analyze other types of data that can be fitted using a generalized linear model with a non-Gaussian error distribution. For example, it would be readily applicable to phase II basket trials that use binary endpoints: after fitting the patient-level data per subtrial with a logistic regression model, our approach can be used to stipulate commensurate predictive priors, informed by the pairwise Hellinger distances, for the subtrial-specific treatment effect parameters, thereby permitting borrowing of information from the most consistent subtrial(s). For down-weighting in cases of a data conflict suggested by the Hellinger distance, we did not delve into calibration of the “slab” prior but simply used a very uninformative uniform distribution, which ensures data from an inconsistent subtrial can be discarded. When using the proposed methodology in practice, we recommend specifying the spike-and-slab prior based on preliminary knowledge about the magnitude of the variances of |$\theta_k$|. Specification of the “slab” prior particularly deserves future research to fully exploit the advantages of the proposed methodology. We refer to Mutsvari and others (2016) as a relevant investigation, which focuses on choosing the diffuse component of a mixture prior for robust inferences. We also note that this exploration may be closely linked with the users’ stipulation of the prior probability weight, based on the Hellinger distance, to be attributed to the “slab” prior.
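As an illustration of the binary-endpoint extension sketched above, the following R fragment fits a logistic regression to one subtrial's patient-level data and summarizes the treatment effect (a log odds ratio) by a normal approximation, which could then feed into the pairwise Hellinger distance computation. The column names response, treatment (coded 0/1), and baseline, and the model formula itself, are placeholders rather than the trial's actual specification.

```r
## Hedged sketch: per-subtrial logistic regression for a binary endpoint.
fit_subtrial_binary <- function(dat) {
  fit <- glm(response ~ treatment + baseline, family = binomial, data = dat)
  s   <- coef(summary(fit))["treatment", ]
  c(mean = unname(s["Estimate"]), sd = unname(s["Std. Error"]))   # N(mean, sd^2)
}

## The resulting normal summaries for subtrials k and k* could be passed to a
## pairwise Hellinger distance (see the earlier sketch) to set the
## spike-and-slab weight before any borrowing of information.
```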
In our simulation study, we have considered imbalanced subtrial sample sizes. Simulation results show that our methodology can down-weight inconsistent information from a subtrial that has a larger sample size. For illustrative purposes, we have supposed an equal randomization ratio between treatment groups within a subtrial. Investigators can pragmatically determine the randomization ratio, as well as the subtrial-wise sample size, for a basket trial that may base decision making on our analysis methodology. Potentially, several dose groups of the same treatment could be considered in each subtrial. Also, there has been great interest in sequential basket trials (Simon and others, 2016; Cunanan and others, 2017; Hobbs and Landin, 2018) with interim look(s) incorporated to allow, say, terminating enrollment of patients in ineffective subgroups. We note that the proposed Bayesian approach can be implemented with any number of analyses following a flexible timescale for interim decision making. There is no requirement of a minimum sample size per subtrial to carry out an interim look, owing to the use of an initial operational prior |$\pi_{0k}(\theta_k)$| for computing the pairwise Hellinger distances. However, repeated significance testing of this kind would inflate the type I error rate.
Throughout, we have restricted our focus to basket trials where the subtrials use the same endpoint across patient subgroups. In many disease areas, multiple endpoints (FDA, 2017) may arise, as concluding on the clinical benefit can involve several dimensions. One common situation is to continue monitoring toxicity in addition to the assessment of efficacy (Bryant and Day, 1995; Tournoux and others, 2007). In this regard, our approach could be extended in several ways. For instance, in cases where the set of multiple endpoints remains the same across subgroups, it would be straightforward to establish a joint probability model and derive the pairwise Hellinger distance between multivariate probability densities (Pardo, 2005). A suitable alternative is to separate the discussion of borrowing of information by endpoint. A unified utility function may then be adopted for trial decision making based on evidence across multiple endpoints. In another, more complex setting where the efficacy endpoint, for example, could be distinct but correlated across subgroups, one might need to translate the subtrial data onto a common scale in order to adapt the present approach. Ideas could be drawn from Zheng and others (2020), where the incorporation of supplementary data recorded on a different measurement scale is discussed in the context of phase I clinical trials.
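For the multivariate extension mentioned above, the Hellinger distance between two multivariate normal densities is also available in closed form. The sketch below assumes the joint treatment effects on multiple endpoints can be summarized as multivariate normal per subtrial; the function and argument names are illustrative only.

```r
## Hellinger distance between N(mu1, Sigma1) and N(mu2, Sigma2).
hellinger_mvnormal <- function(mu1, Sigma1, mu2, Sigma2) {
  Sbar <- (Sigma1 + Sigma2) / 2
  bc   <- det(Sigma1)^0.25 * det(Sigma2)^0.25 / sqrt(det(Sbar)) *
    exp(-0.125 * t(mu1 - mu2) %*% solve(Sbar) %*% (mu1 - mu2))
  sqrt(1 - as.numeric(bc))
}

## Example: bivariate treatment-effect summaries for two subtrials
mu_k     <- c(0.45, 0.30); Sigma_k     <- diag(c(0.09, 0.04))
mu_kstar <- c(0.50, 0.35); Sigma_kstar <- diag(c(0.09, 0.04))
hellinger_mvnormal(mu_k, Sigma_k, mu_kstar, Sigma_kstar)
```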
6. Software
Software in the form of OpenBUGS code together with R functions is available on GitHub (https://github.com/BasketTrials/Bayesian-analysis-models).
Acknowledgments
The authors thank the Associate Editor and two anonymous reviewers for carefully reading the manuscript and giving very helpful comments. Conflict of Interest: None declared.
Funding
JW is funded by the UK Medical Research Council (MC_UU_00002/6).
References
FDA. (