Summary

Basket trials have emerged as a new class of efficient approaches in oncology to evaluate a new treatment in several patient subgroups simultaneously. In this article, we extend the key ideas to disease areas outside of oncology, developing a robust Bayesian methodology for randomized, placebo-controlled basket trials with a continuous endpoint to enable borrowing of information across subtrials with similar treatment effects. After adjusting for covariates, information from a complementary subtrial can be represented into a commensurate prior for the parameter that underpins the subtrial under consideration. We propose using distributional discrepancy to characterize the commensurability between subtrials for appropriate borrowing of information through a spike-and-slab prior, which is placed on the prior precision factor. When the basket trial has at least three subtrials, commensurate priors for point-to-point borrowing are combined into a marginal predictive prior, according to the weights transformed from the pairwise discrepancy measures. In this way, only information from subtrial(s) with the most commensurate treatment effect is leveraged. The marginal predictive prior is updated to a robust posterior by the contemporary subtrial data to inform decision making. Operating characteristics of the proposed methodology are evaluated through simulations motivated by a real basket trial in chronic diseases. The proposed methodology has advantages compared to other selected Bayesian analysis models, for (i) identifying the most commensurate source of information and (ii) gauging the degree of borrowing from specific subtrials. Numerical results also suggest that our methodology can improve the precision of estimates and, potentially, the statistical power for hypothesis testing.

1. Introduction

There has been an increasing interest in precision medicine (Mirnezami and others, 2012; Schork, 2015) over the past few decades. Rapid advances in genomics and biomarkers allow stratification of patients into subgroups that may have different benefit from new treatments. Unlike the one-size-fits-all concept in conventional paradigms of clinical drug development, the aim of precision medicine is to target the right treatments to the right patients at the right time. In the era of precision medicine, new trial designs have been developed, several of which are examples of master protocols (Woodcock and LaVange, 2017; Renfro and Mandrekar, 2018) to study multiple diseases or multiple agents, or sometimes both. One well-known class of master protocols is basket trials (Renfro and Sargent, 2017). In the simplest formulation, basket trials evaluate a single-targeted agent to patients that share a common feature, such as similar genetic mutation, but may present various disease subtypes. It is administratively more efficient to plan a basket trial than a number of separate trials for the small subgroups, respectively. With one subtrial performed in each patient subgroup, basket trials are advantageous also for addressing multiple research questions simultaneously, for example, which subgroup(s) of patients may benefit and to what extent. To date, sophisticated approaches to the design and analysis of basket trials have predominantly been proposed for early phase oncology drug development, where the “standard” approach is a single-arm design with binary RECIST endpoint (Eisenhauer and others, 2009; Schwartz and others, 2016). This manuscript extends the key ideas of basket trials to disease areas outside of oncology, for example, in cases where patients have distinct clinical conditions but share similar symptoms. For this, we develop efficient approaches for analyzing randomized, placebo-controlled basket trials which collect data on a continuous endpoint.

When analyzing early phase basket trials, one major concern is the potential heterogeneity of the treatment effect in various patient subgroups. Investigators are faced with the dilemma of discarding or incorporating data from other subgroups to reach a decisive conclusion about the treatment effect for a specific subgroup. The option of using complementary data from subtrials that run concurrently is intriguing, as it may lead to considerable increase in the statistical power of the study to detect drug activity in one or more subgroups. This should be balanced with the risk that treatment effect in an important patient subgroup may be overlooked or missed. Conventional analysis strategies such as stand-alone analyses (also known as the approach of no borrowing) and complete pooling irrespective of subgroup labels have been criticized. Some authors have proposed using hierarchical random-effects models, as a compromise between the two limiting opinions, to enable borrowing of information across subgroups (Thall and others, 2003; Thall and Wathen, 2008; Berry and others, 2013). Such well-established approaches for information borrowing are justified, under the assumption of the exchangeability (Bernardo, 1996) of subgroup-specific treatment effects. More specifically, exchangeability means that the magnitude of clinical benefit may differ, but nothing is known a priori to suppose patients of some subgroups benefit better than others. Neuenschwander and others (2016) discuss a robust extension to the standard hierarchical models by including the possibility of non-exchangeability for each parameter (vector) that underpins a subgroup. Their approach permits an extreme subgroup not to be overly influenced by other subgroups in situations of data inconsistency.

Additional concerns about the subgroup effect are essential in precision medicine. Often, the targeted therapy is effective only in some subgroups, and certain subgroups illustrate more similar clinical benefit between themselves than with others. Several variations of standard hierarchical modeling have been considered suitably for the context of basket trials to implement borrowing of information (Liu and others, 2017; Chu and Yuan, 2018). Modifications are motivated mainly by (i) justification about plausible clustering of similar subgroups and (ii) quantification about the magnitude, to which a subgroup-specific parameter should be shrunk towards the mean effect across subgroups. Most recently, more sophisticated methods in the framework of Bayesian model averaging (Madigan and Raftery, 1994; Draper, 1995) have been applied to analyzing basket trials. Psioda and others (2019) average over the complete model space, which is constituted by all models for possible configurations of the subgroups that may demonstrate the same or disparate efficacy. In a model that assumes identical treatment effect among specific subgroups, information is pooled across the corresponding subgroups under the assumption of inter-patient exchangeability. The number of models to be included in the complete model space for averaging increases exponentially with the number of subgroups involved in the basket trial. Hobbs and Landin (2018) enumerate all possible subgroup pairs, wherein the parameters are considered to be either exchangeable or non-exchangeable. Using the product of, rather than the individual, prior probabilities for any two subgroups being exchangeable or not, their method offers considerable computational efficiency with respect to conventional Bayesian model averaging.

In this article, we propose methodology motivated by a randomized, placebo-controlled phase II basket trial, which is being undertaken in patients with chronic diseases. Patients who share a common disease symptom that the new treatment can potentially improve, will be stratified into subgroups according to their clinical conditions. Efficacy will be recorded on a continuous endpoint. Adjustment for baseline covariate(s) is desirable to allow for a more precise estimate of treatment effect. We develop a Bayesian methodology for borrowing of information across consistent subgroups based on commensurate priors (Hobbs and others, 2011, 2012), which lead to a type of hierarchical model for robust estimation in circumstances of just a small number of complementary studies. Using it can facilitate inferences with respect to all possible pairwise borrowing of information between |$K$| subgroups in a basket trial, which accounts for the level of data commensurability across subtrials. More explicitly, given any complementary subtrial data, a commensurate prior can be specified for the treatment effect in the subtrial of contemporary analysis interest. It is basically a normal predictive prior centered at the complementary subtrial data parameter, with a precision factor to capture the commensurability of the parameters that underpin the complementary and contemparory subtrials.

We explore placing an empirical spike-and-slab prior (Mitchell and Beauchamp, 1988) on the precision factor, which determines the degree of point-to-point borrowing. For overcoming a prior-data conflict, we propose using a distributional discrepancy measure to characterize the commensurability of information between any two subtrials. It could quantify the probability mass to be placed on the “spike” prior for strong borrowing, and that on the “slab” prior for discounting inconsistent information from a complementary subtrial, respectively. This discrepancy measure meanwhile discerns the complementary subtrials (when at least three subgroups are involved) according to their relative commensurability, and can therefore encourage differential borrowing of information to estimate the model parameter specific to a subtrial. The proposed methodology for basket trials is fundamentally different from the existing approaches to information sharing. It avoids the limiting assumption about exchangeability for parameters of certain or all subtrials but features the use of a distributional discrepancy measure to inform the borrowing only from the most commensurate subtrial(s).

The remainder of this article is structured as follows. We describe the motivating example and decision criteria in Section 2. In Section 3, we present our analysis methodology and discuss how a discrepancy measure may help make appropriate use of complementary data in a basket trial. In Section 4, we perform a simulation study to evaluate the operating characteristics of phase II basket trials that would have been analyzed using the proposed methodology, and compare our Bayesian model with some alternative analysis models. We close with a discussion of our findings and future research that arises in Section 5.

2. Motivating example and notation

We use a randomized, placebo-controlled phase II basket trial, as a motivating example, which evaluates a new treatment for cognitive dysfunction in patients of primary biliary cholangitis (PBC) and Parkinson’s disease (PD). This clinical trial is led by Newcastle University; at the time of writing, it has been funded but not yet opened to participants. By stages and types of the chronic diseases, patients are to be recruited and stratified into three disjoint subgroups, that is, early-stage PBC, late-stage PBC, and PD. The PBC and PD basket trial thus comprises three subtrials. A continuous outcome measuring cognitive performance will be used as the clinical endpoint in each subtrial. Once the trial begins, patients within each subtrial will be randomized to receive either the new treatment or a placebo.

For each subtrial |$k=1, \dots, K$|⁠, we suppose |$n_k$| patients are to be recruited. The binary indicator with respect to the treatment assignment is denoted by |$T_{ik}$| for patient |$i = 1, \dots, n_k$|⁠. Specifically, |$T_{ik}=1$| if patient |$i$| in subtrial |$k$| is allocated to the treatment and 0 if a placebo. From each patient, data are measured on covariates, denoted by |$\boldsymbol{z}_{ik} = \{z_{ik1}, \dots, z_{ikq} \}$|⁠, and a post-treatment outcome of clinical interest, denoted by |$y_{ik}$|⁠. We fit a linear regression model:
(2.1)
where |$\gamma_k$| is a |$(1\times q)$| coefficient vector representing the main effects of the covariates, and |$\theta_k$| for the treatment effect in a subtrial |$k$| is of our primary estimation interest. Within each subtrial
leads to an estimator of the treatment effect over a placebo, denoted by |$\Delta (T_{ik}) = \theta_k$|⁠.
A more accurate estimate for |$\theta_k$| helps to support the decision as to whether a phase III trial of the treatment should go ahead, and in which patient subgroup(s). Moreover, inferences based on evidence of the basket trial can inform the design of a future trial, such as computing the sample size to sufficiently power the trial. With respect to either continuing or halting the clinical evaluation, trial decisions per subtrial can be framed as a decision between Go and No-go. Given a hypothesis for the treatment effect per subtrial |$k$|⁠:
investigators may base the trial decision on probabilistic inferences about |$\theta_k$| to a threshold |$\delta_U$|⁠, which represents the magnitude of improvement required to declare a clinical benefit of the new treatment, in the Bayesian framework (Spiegelhalter and others, 1994). One example is to compute the posterior probability of |$\theta_k$| exceeding the threshold |$\delta_U$|⁠, in which it is not uncommon to pre-specify |$\delta_U$| as a value greater than 0. Here, we define a formal decision criterion: (i) a Go decision will be taken for subtrial |$k$| if |$\mathbb{P}(\theta_k > \delta_U) > \zeta$|⁠, otherwise (ii) a No-go. This criterion refers to a quantity |$\zeta \in [0, 1]$| as the level of evidence that would be required for the new treatment to compellingly provide an improvement over the control. Choices such as |$\zeta = 0.90$| may be appropriate. In what follows, we develop a novel Bayesian model that leverages complementary data from the most consistent subtrials for estimating |$\theta_k$|⁠.

3. Methods

Suppose the patient-level trial data can be modeled using a linear regression model in the form of (2.1), and |$\theta_k$| specific to a subtrial is a continuous location parameter. Letting |$\boldsymbol{x}_k$| denote the data from subtrial |$k$| and |$\pi_{0k}(\theta_k)$| the initial vague (operational) prior, the information that subtrial |$k$| carries can be represented by an operational posterior,
(3.1)
where |$\mathcal{L}(\cdot)$| denotes the likelihood function. We label the subtrial of our contemporary analysis interest with |$k^\star$|⁠. For estimating each |$\theta_{k^\star}$|⁠, information from a complementary subtrial |$k\neq k^\star$| can be leveraged through a commensurate predictive prior (CPP). Inspired by Hobbs and others (2011, 2012, we introduce a precision parameter, denoted by |$\nu_{kk^\star}$|⁠, which parameterizes the consistency between |$\theta_k$| and |$\theta_{k^\star}$|⁠. Conditional on the information commensurability, each predictive distribution for |$\theta_{k^\star}$| can be stipulated as
(3.2)
where |$\theta_k$| follows the operational posterior in (3.1) and the unknown |$\nu_{kk^\star}$| determines the degree of borrowing. This then leads to a CPP as
(3.3)
where |$\phi(\cdot)$| is the standard normal probability density function, and |$g_k(\nu_{kk^\star})$| is the prior for |$\nu_{kk^\star}$|⁠. If the complementary subtrial |$k$| is consistent, with |$\nu_{kk^\star}\gg 0$|⁠, the marginal CPP for |$\theta_{k^\star}$| converges to the operational posterior |$\pi_k(\theta_k | \boldsymbol{x}_k)$| that is updated from |$\pi_{0k}(\theta_k)$|⁠, so that the complementary data |$\boldsymbol{x}_k$| will be largely incorporated into subtrial |$k^\star$|⁠. Otherwise, with |$\nu_{kk^\star}\approx 0$|⁠, |$\boldsymbol{x}_k$| are discarded and the marginal CPP for |$\theta_{k^\star}$| tends towards the operational prior for |$\theta_{k^\star}$|⁠.
A spike-and-slab distribution (Mitchell and Beauchamp, 1988) has been found suitable as a prior for the normal precision parameter (Hobbs and others, 2012, 2013). This is a discrete mixture prior with two components, which can provide us a means for robust borrowing. Specifically, we define |$g_k(\nu_{kk^\star})$| as locally uniform between two limits, |$0\leq \mathcal{B}_1 < \mathcal{B}_2$|⁠, except some portion of probability mass concentrated at a point, |$\mathcal{S} > \mathcal{B}_2$|⁠, such that
(3.4)
where |$w_{kk^\star}$| is the probability that |$\nu_{kk^\star} \sim \text{Unif}(\mathcal{B}_1, \mathcal{B}_2)$|⁠. Given a subtrial |$k$| with sufficiently consistent treatment effect, we expect strong borrowing from |$\boldsymbol{x}_k$|⁠. This requires that the normal precision parameter |$\nu_{kk^\star}$| in (3.3) takes a large value, which is possible when the “spike” prior has a large probability mass, that is, when |$w_{kk^\star}$| is sufficiently small. Otherwise, |$\boldsymbol{x}_k$| is down-weighted by allocating more probability mass to the “slab” prior. We interpret |$w_{kk^\star}$| as our prior opinion (on the probability scale) about the pairwise subtrial incommensurability.
To determine |$w_{kk^\star}$|⁠, we propose using a discrepancy measure that quantifies the distributional divergence between the posterior density distributions, |$\pi_k(\theta_k | \boldsymbol{x}_k)$| and |$\pi_{k^\star}(\theta_{k^\star} | \boldsymbol{x}_{k^\star})$|⁠, arising from the same operational prior. One viable option is the Hellinger distance (Dey and Birmiwal, 1994):
(3.5)

Derivable from the Cauchy–Schwarz inequality, the computed Hellinger distance |$d_{\phi_H}(\pi_{\theta_k}, \pi_{\theta_{k^\star}}\!)$| will strictly fall into the interval of [0, 1], which is convenient for characterizing the probability that treatment effects of any two subtrials are regarded as dissimilar. We may then relate the “slab” prior probability |$w_{kk^\star}$| with the computed Hellinger distance, simply by stipulating |$w_{kk^\star} = d_{H}(\pi_{\theta_k}, \pi_{\theta_{k^\star}}\!)$|⁠. In an extreme case that two subtrials are perfectly consistent, i.e., |$d_{H}(\pi_{\theta_k}, \pi_{\theta_{k^\star}}\!) \to 0$|⁠, the whole probability mass will be concentrated at the “spike,” |$\mathcal{S}$|⁠. In turn, this will result in a notably small normal variance |$1/\nu_{kk^\star}^2$| of the CPP in (3.3) such that the complementary subtrial data |$\boldsymbol{x}_k$| can be fully incorporated. In addition, knowing the upper and lower bounds, which are 0 and 1 for the Hellinger distance, makes it easier to standardize a collection of pairwise discrepancy measurements. This can help quantify the relative importance of all other subtrials |$k \neq k^\star$| to form a prior for |$\theta_{k^\star}$| in circumstances of |$K\geq 3$|⁠. More importantly, the Hellinger distance is preferred over other distributional discrepancy measures, because of its desirable properties of symmetry and invariance to any transformation, for example, logarithmic, exponential, or inverse of square root, of both densities (Jeffreys, 1961). As a symmetric measure of discrepancy, |$d_{H}(\pi_{\theta_k}, \pi_{\theta_{k^\star}}\!) = d_{H}(\pi_{\theta_k^\star}, \pi_{\theta_{k}})$|⁠. In a basket trial with |$K=2$| and no a priori assumption about which subtrial has stronger treatment effect, using the Hellinger distance to define the spike-and-slab prior will result in the same magnitude of down-weighting or leveraging subtrial data |$\boldsymbol{x}_1$| to subtrial 2 and |$\boldsymbol{x}_2$| to subtrial 1. Whilst the invariance property ensures that the computed Hellinger distance |$d_{H}(\pi_{\theta_k}, \pi_{\theta_{k^\star}}\!)$| truly reflects the discrepancy between the treatment effect distributions in different patient subtrials, when the linear regression model (2.1) may be parameterized in a different way, for example, with the treatment effect represented by the exponential of |$\theta_k$|⁠. Given the invariance, we know that |$d_{H}(\pi_{\theta_k}, \pi_{\theta_{k^\star}}\!) = d_{H}(\pi_{\exp(\theta_k)}, \pi_{\exp(\theta_{k^\star})})$|⁠.

For any |$\theta_{k^\star}$|⁠, there exist |$(K-1)$| complementary subtrials as the sources where the possible values can be drawn upon. We now turn our attention to combining the |$(K-1)$| CPPs for synthesizing {information from these complementary subtrials} for basket trials with |$K\geq 3$|⁠. The CPPs for point-to-point borrowing are robust in that inconsistent information from any subtrial |$k { \neq k^\star}$| can be down-weighted through the “slab” prior placed on each |$\nu_{kk^\star}$|⁠, incorporating the use of pairwise Hellinger distance. If values of |$\nu_{kk^\star}$| are appropriately specified, we may obtain |$(K-1)$| predictive priors marginally on |$\theta_{k^\star}$|⁠:
(3.6)
where each one separately may be represented as approximately a |$N(\lambda_k, \xi_k^2)$| distribution for ease of notation. We further see |$\theta_{k^\star}$| as a weighted sum of |$(K-1)$| hypothetical random variables:
(3.7)
where we suppose each |$\tilde{\theta}_k \sim N(\lambda_k, \xi_k^2)$| and the weight vector |$\boldsymbol{p}_{k^\star} = (p_{1k^\star}, p_{2k^\star}, \dots)$|⁠, containing |$(K-1)$| elements, satisfies |$\sum \boldsymbol{p}_{k^\star} = 1$|⁠. This further gives
(3.8)
in which |$\boldsymbol{x}_{(-k^\star)}$| denotes the entire trial data excluding those from subtrial |$k^\star$|⁠. This allows for leveraging information from multiple sources. This marginal predictive prior, denoted by |$\pi^\text{MPP}(\theta_{k^\star} | \boldsymbol{x}_{(-k^\star)})$|⁠, is updated to the posterior with the subtrial data |$\boldsymbol{x}_{k^\star}$| using Bayes’ Theorem:
(3.9)
so as to inform decision making for subtrial |$k^\star$|⁠.
To allocate a sensible weight to each complementary subtrial, we expect |$p_{kk^\star}$| to take large values (close to 1) if subtrials |$k$| and |$k^\star$| are consistent. Specification of these weights |$p_{kk^\star}$| may be guided by the Hellinger distance, |$d_{H}(\pi_{\theta_k}, \pi_{\theta_{k^\star}}\!)$| (labeled as |$d_{kk^\star}$| for notation simplicity), which we use to measure the commensurability between subtrials |$k$| and |$k^\star$|⁠. For a basket trial with |$K$| subtrials, the Hellinger distances can be organized as a |$K\times K$| symmetric matrix:
where each column |$k^\star=1,\dots,K$| describes the pairwise distributional discrepancy between our target parameter |$\theta_{k^\star}$| and a source parameter |$\theta_k$|⁠, for |$k \neq k^\star$|⁠. Hellinger distances contained in each column |$k^\star$| can be normalized into a series of weights |$p_{kk^\star} \in [0, 1]$|⁠. For this, we will simply stipulate a decreasing function of |$d_{kk^\star}$| as
(3.10)
where a pre-defined |$s_0$| governs how much influence the Hellinger distance has on the weight to be computed. With a value of |$s_0\gg d_{kk^\star}$|⁠, nearly the same weight will be allocated irrespective of the pairwise Hellinger distances that could be very different from each other. Whereas, with |$s_0 \to 0^+$|⁠, the weight corresponding to a Hellinger distance close to 0 tends to be 1. In Section A of the supplementary material available at Biostatistics online, we illustrate properties of this transformation analytically in more details. Weights converted from the pairwise Hellinger distances following (3.10) can then be assigned to each |$\tilde{\theta}_k$|⁠, as was stipulated in (3.7).

We would like to add one more note here. The stipulation of weights |$p_{kk^\star}$| summing to 1 does not restrict the potential of full borrowing of information in situations, where all the |$(K-1)$| complementary subtrials are perfectly consistent with subtrial |$k^\star$|⁠. In such a scenario, the Hellinger distance |$d_{kk^\star} = 0$|⁠, suggesting that the CPPs marginally on |$\theta_{k^\star}$|⁠, represented by |$N(\lambda_k, \xi_k^2)$|⁠, have identical mean and variance to those of |$\pi_{k^\star} (\theta_{k^\star} | \boldsymbol{x}_{k^\star})$|⁠. Moreover, equal weights, i.e., |$p_{kk^\star} = \frac{1}{K-1}$|⁠, will be allocated to the complementary subtrials, |$k \neq k^\star$|⁠. Following (3.8), a predictive prior |$\pi^\text{MPP}(\theta_{k^\star} | \boldsymbol{x}_{(-k^\star)})$| would be obtained with its mean as |$\lambda_k$| and variance as |$\frac{1}{K-1} \xi_k^2$|⁠. With the inclusion of |$\boldsymbol{x}_{k^\star}$|⁠, the posterior mean and variance become |$\lambda_k$| and |$\frac{1}{K} \xi_k^2$|⁠, respectively. This indicates all the complementary subtrial data |$\boldsymbol{x}_{(-k^\star)}$| have been fully incorporated, and our methodology converges to the approach of complete pooling in the case of perfect information consistency.

4. Simulation study

In this section, we illustrate applications of the proposed analysis methodology, and compare it with alternative Bayesian models that may be used for analyzing basket trials through a simulation study. Our trial examples are hypothetical, but can represent the situation of a phase II basket trial, for which the analyses are performed to enable sharing of information. The main characteristics of the basket trials we simulate are based on the motivating PBC and PD trial described in Section 2. For illustrative purposes, we assume six subtrials instead of three, as typically a fairly large number of patient subgroups would be examined; for example, Hyman and others (2015, 2018 report the results from basket trials with six and nine subtrials, respectively.

4.1. Basic settings

We simulate basket trials with |$K=6$| subtrials of unequal sample sizes: |$n_k \in \{10, 10, 14, 16, 20, 20 \}$|⁠, respectively. Treatment allocation to individual patients follows a block approach by subtrial, with exactly half of the subtrial sample size to receive the treatment and the other half to placebo. We simulate two covariates for each patient as |$z_{ik1} \sim N(6, 0.2^2)$| and |$z_{ik2} \sim N(4, 0.2^2)$|⁠, for |$i = 1, \dots, 90$|⁠. In particular, |$z_{ik1}$| is assumed to be the baseline measurement of the clinical endpoint at the time of randomization. We generate the trial data from a linear regression: for |$i = 1, \dots, 90, \, k = 1, \dots, 6$|⁠,
(4.1)
where we set the “true” parameter values for the intercept and main effects of baseline covariates to |$\gamma_{0k} = 5$|⁠, |$\gamma_{1k} = 3,$| and |$\gamma_{2k} = 1.3$|⁠, and the inter-patient standard deviation |$\sigma = 0.4$| to generate the data. Basket trials are simulated under nine scenarios (listed in Table 1), which feature varying treatment effect sizes |$\theta_k$| and different degrees of heterogeneity across subtrials. All sets of the “true” values of |$\theta_k$| are realizations from distinct multivariate normal distributions: we stipulated a high pairwise correlation coefficient (0.8) for |$\theta_k$| of the consistent subtrials and a low pairwise correlation coefficient (0.1) between |$\theta_k$| of an extreme subtrial and of one else from the rest. In particular, scenarios 7 and 8 can be seen as “mixed null” scenarios, and scenario 9 is a “global null”.
Table 1.

Simulation scenarios with specification of the “true” treatment effect |$\theta_k$| to compare the Bayesian analysis models. The figure in bold indicates a 0 or low treatment effect.

 Subtrial |$k$| (Sample size, |$n_k$|⁠)}
Scenario1 (⁠|$n_1 = 10$|⁠)2 (⁠|$n_3 = 10$|⁠)3 (⁠|$n_2 = 14$|⁠)4 (⁠|$n_5 = 16$|⁠)5 (⁠|$n_4 = 20$|⁠)6 (⁠|$n_6 = 20$|⁠)
10.490.670.540.430.790.35
20.350.370.801.301.380.40
30.290.770.680.750.330.30
40.591.171.020.950.130.75
50.450.450.450.450.450.45
60.300.300.300.300.300.30
700000.370.37
80.3300.820.9000.83
9000000
 Subtrial |$k$| (Sample size, |$n_k$|⁠)}
Scenario1 (⁠|$n_1 = 10$|⁠)2 (⁠|$n_3 = 10$|⁠)3 (⁠|$n_2 = 14$|⁠)4 (⁠|$n_5 = 16$|⁠)5 (⁠|$n_4 = 20$|⁠)6 (⁠|$n_6 = 20$|⁠)
10.490.670.540.430.790.35
20.350.370.801.301.380.40
30.290.770.680.750.330.30
40.591.171.020.950.130.75
50.450.450.450.450.450.45
60.300.300.300.300.300.30
700000.370.37
80.3300.820.9000.83
9000000
Table 1.

Simulation scenarios with specification of the “true” treatment effect |$\theta_k$| to compare the Bayesian analysis models. The figure in bold indicates a 0 or low treatment effect.

 Subtrial |$k$| (Sample size, |$n_k$|⁠)}
Scenario1 (⁠|$n_1 = 10$|⁠)2 (⁠|$n_3 = 10$|⁠)3 (⁠|$n_2 = 14$|⁠)4 (⁠|$n_5 = 16$|⁠)5 (⁠|$n_4 = 20$|⁠)6 (⁠|$n_6 = 20$|⁠)
10.490.670.540.430.790.35
20.350.370.801.301.380.40
30.290.770.680.750.330.30
40.591.171.020.950.130.75
50.450.450.450.450.450.45
60.300.300.300.300.300.30
700000.370.37
80.3300.820.9000.83
9000000
 Subtrial |$k$| (Sample size, |$n_k$|⁠)}
Scenario1 (⁠|$n_1 = 10$|⁠)2 (⁠|$n_3 = 10$|⁠)3 (⁠|$n_2 = 14$|⁠)4 (⁠|$n_5 = 16$|⁠)5 (⁠|$n_4 = 20$|⁠)6 (⁠|$n_6 = 20$|⁠)
10.490.670.540.430.790.35
20.350.370.801.301.380.40
30.290.770.680.750.330.30
40.591.171.020.950.130.75
50.450.450.450.450.450.45
60.300.300.300.300.300.30
700000.370.37
80.3300.820.9000.83
9000000
When fitting the Bayesian analysis models, we consider random effects for |$\gamma_{0k}$|⁠, |$\gamma_{1k,}$| and |$\gamma_{2k}$|⁠:
setting an uninformative normal prior |$N(0, 5^2)$| on each |$\chi_j$| and a half-normal prior |$HN(z)$| on each |$\epsilon_{j}$|⁠, for |$j = 0, 1, 2$|⁠. Here, |$HN(z)$| is defined as a |$N(0, z^2)$|⁠, truncated to cover the interval |$(0, \infty)$|⁠. The use of a half-normal prior is consistent with the recommendation by Cunanan and others (2019). We stipulate |$\epsilon_j \in HN(1)$|⁠, of which the prior and 95% credible interval are 0.674 and (0.031, 2.241), to permit very limited information borrowing across subtrials for estimating |$\gamma_{jk}, \, j=0, 1, 2,\, k = 1, \dots, 6$|⁠. In the following, we describe additional specifications to implement the Bayesian models that estimate |$\theta_k$| with or without information leveraged from other subtrials.

To implement our methodology for estimating |$\theta_k$|⁠, we choose setting |$\mathcal{B}_1 = 0.01, \, \mathcal{B}_2 = 1,$| and |$\mathcal{S} = 100$| for the spike-and-slab prior on each |$\nu_{kk^\star}$|⁠. The “slab” prior is very uninformative and is sufficient to fully discard the entire information from an external subtrial |$k$|⁠; the “spike” prior is specified so that the proposed methodology can be reduced to complete pooling in situations of perfect information commensurability. Justification of choosing this spike-and-slab prior is provided in Section B of the supplementary material available at Biostatistics online. An initial vague prior |$\pi_{0k}(\theta_k)$| is used for |$\theta_k, \, k = 1, \dots, 6$|⁠; we use |$N(0, 10^2)$| such that the 95% prior credible interval is (⁠|$-$|19.560, 19.560), covering a wide range of possible |$\theta_k$|⁠. To yield a large (small) weight |$p_{kk^\star}$| corresponding to a small (large) Hellinger distance, we let |$s_0 = 0.15$| for the transformation. Nevertheless, we study how different stipulations of |$s_0$| may impact on the identification of the most consistent subtrial(s) in Section C of the supplementary material available at Biostatistics online, exploring |$s_0 = 0.25, 0.35, 0.45$| in addition. We are interested in comparing the proposed methodology with

  • (1) Standard hierarchical model (HM) that assumes fully exchangeable parameters: |$\theta_k | \mu_, \tau \sim N(\mu, \tau^2)$| with |$\mu \sim N(0, 10^2)$| and |$\tau\sim HN(0.125)$|⁠. The median and 95% credible interval of |$HN$|(0.125) are 0.084 and (0.004, 0.280), respectively.

  • (2) Bayesian model with no borrowing of information. Trial data are stratified by subtrials for stand-alone analyses, setting each |$\theta_k \sim N(0, 10^2)$|⁠. Random effects for |$\gamma_{0k}$|⁠, |$\gamma_{1k}$|⁠, and |$\gamma_{2k}$| therefore cannot be estimated; we then place a |$N(0, 5^2)$| prior on each.

  • (3) EXNEX model by Neuenschwander and others (2016), with equal prior probabilities of exchangeability (EX) and non-exchangeability (NEX). The EX distribution has the same parameter configuration as what was stipulated for the standard HM above, and the six NEX distributions are all set to be |$N(0, 10^2)$|⁠.

Comparison is in terms of the precision of their posterior point estimates, more specifically, the posterior means, for |$\theta_k$| that could be measured by an analog of bias and mean squared error (MSE):
where |$M$| is the total number of replicates in the simulation study, and |$\bar{\theta}_k^m$| denote the posterior means of |$\theta_k$| for the |$m$|th simulated basket trial. These metrics will be reported by subtrial. We also compare these Bayesian analysis models with respect to the trial operating characteristics, such as the subtrial-wise error rates. Corresponding to the frequentist type I error rate and statistical power, we will report proportions of the simulated trials with
  • an erroneous Go decision in a subtrial where the “true” |$\theta_k =0$|⁠, and

  • a correct Go decision in a subtrial where the “true” |$\theta_k > 0$|⁠,

respectively. An overall (analog of) type I error rate, computed as proportion of trials with an erroneous Go decision made for at least one subtrial, will also be reported for scenarios with null |$\theta_k$|’s. In particular, a Go will be allocated if |$\mathbb{P}(\theta_k > \delta_U) > 0.975$|⁠. Trial operating characteristics will be evaluated setting |$\delta_U=0.25, 0.30$|⁠. We are especially interested in Scenarios 7–9, the mixed or global null scenarios, to report the (analog of) type I error rate by both subtrial and overall.

Results are summarized by averaging across 10 000 replicates of the basket trial. The Bayesian analysis models are fitted in R version 3.4.4 using the R2OpenBUGS package based on two parallel chains, with each running the Gibbs sampler for 10 000 iterations that follow a burn-in of 3000 iterations. OpenBUGS code, together with R functions, to implement each of the Bayesian analysis models is available at https://github.com/BasketTrials/Bayesian-analysis-models.

4.2. Results

Figure 1 compares the performance of the posterior estimators yielded by the Bayesian models. It shows that the proposed methodology produces smaller bias and MSE compared with the standard HM and EXNEX, across nearly all scenarios. Point estimators based on the standard HM and EXNEX work well in scenarios 5, 6, and 9 as the small-to-moderate variability between |$\theta_k$|s can be addressed by setting |$\tau \sim HN(0.125)$|⁠. The proposed analysis methodology, in contrast, distinguishes the heterogeneity more sensitively. Much smaller bias and MSE are yielded when estimating |$\theta_k$| for basket trials with divergent treatment effects across subtrials; see, for example, scenario 2. In situations where information from other subtrials should be largely discounted, referring to scenarios 7 and 8, our methodology generates comparatively similar bias to the no borrowing approach but with a smaller MSE. This is because information from subtrials with a non-zero treatment effect, can be largely discounted to formulate the marginal predictive prior for e.g., |$\theta_2$| in scenario 8.

Bias and mean squared error of the posterior estimators for $\theta_k$ based on the Bayesian models.
Fig. 1.

Bias and mean squared error of the posterior estimators for |$\theta_k$| based on the Bayesian models.

We have also compared the Bayesian analysis models in terms of the average width of the posterior credible intervals for |$\theta_k$|⁠. In contrast to the alternative Bayesian models, the proposed methodology yields posterior estimates with narrower credible intervals when there is at least one consistent complementary subtrial; see Figure S1 of the supplementary material available at Biostatistics online. When using the proposed analysis methodology for borrowing of information, investigators may be interested in the weight eventually allocated to each external subtrial for obtaining the marginal predictive prior. In Figures S3 and S4 of the supplementary material available at Biostatistics online, we comment with regards to scenarios 4 (divergent |$\theta_k$|⁠) and 5 (consistent |$\theta_k$|⁠) on the weight allocation based on the assessed pairwise commensurability, and illustrate how the pre-specified value of |$s_0$| would impact the sensitivity of the proposed methodology to identify the most commensurate subtrial(s).

Table 2 quantifies the impact of using different Bayesian models on the error rate control under the null hypothesis. Here, we report the (analog of) type I error rate for scenarios involving at least one subtrial with |$\theta_k =0$|⁠, setting |$\delta_U=0.25$|⁠. Comparisons where we set |$\delta_U=0.30$| are given in Table S1 of the supplementary material available at Biostatistics online. For scenario 9 (global null), all the four Bayesian analysis models control the error rate well following the decision criterion. Nevertheless, the approaches that enable borrowing of information, i.e., standard HM, EXNEX, and the proposed methodology, have resulted in smaller type I error rates, compared with the approach of no borrowing, since incorporating consistent information from other subtrials reassures that making a Go decision is not justified. Our approach produces slightly higher error rates than standard HM and EXNEX, as for some simulated trials information from subtrials with a similar low treatment effect may be shared (but not with those of a null |$\theta_k$|’s), leading to a higher chance to reject the null hypothesis. In scenario 8 where some subtrials have large treatment effects, we observe a higher error rate when using standard HM and EXNEX approaches, compared with the proposed approach. We note that a difference in the sample sizes of subtrials 2 and 4 or 5 (for all scenarios) leads to disparate {magnitudes of the error rate} using the same approach in the same null scenario: those for subtrial 2 are regularly larger than subtrial 4 or 5. More explicitly, when reacting to a data conflict, a larger sample size of subtrial 4 or 5 provides more evidence to evaluate the plausibility of down-weighting; estimation of |$\theta_4$| or |$\theta_5$| thus has increased chances to avoid being overwhelmed by the complementary information.

Table 2.

Comparison of the Bayesian analysis models with respect to the analogue of type I error rate: null hypothesis is erroneously rejected under scenarios of any |$\theta_k =0$|⁠, setting |$\delta_U=0.25$| and |$\zeta = 0.975$|⁠.

Subtrial
 123456Overall
Scenario 7Standard HM0.00000.00000.00000.00000.0000
 No borrowing0.00360.00770.00560.01000.0269
 EXNEX0.00010.00030.00020.00000.0006
 Proposed approach0.00730.00890.00020.00020.0166
Scenario 8Standard HM0.01550.00800.0207
 No borrowing0.00770.00080.0085
 EXNEX0.01950.00560.0251
 Proposed approach0.01550.00170.0172
Scenario 9Standard HM0.00000.00000.00000.00000.00000.00000.0000
 No borrowing0.00360.00770.00560.01000.00080.00060.0283
 EXNEX0.00000.00010.00000.00000.00000.00000.0001
 Proposed approach0.00140.00280.00000.00000.00020.00200.0064
Subtrial
 123456Overall
Scenario 7Standard HM0.00000.00000.00000.00000.0000
 No borrowing0.00360.00770.00560.01000.0269
 EXNEX0.00010.00030.00020.00000.0006
 Proposed approach0.00730.00890.00020.00020.0166
Scenario 8Standard HM0.01550.00800.0207
 No borrowing0.00770.00080.0085
 EXNEX0.01950.00560.0251
 Proposed approach0.01550.00170.0172
Scenario 9Standard HM0.00000.00000.00000.00000.00000.00000.0000
 No borrowing0.00360.00770.00560.01000.00080.00060.0283
 EXNEX0.00000.00010.00000.00000.00000.00000.0001
 Proposed approach0.00140.00280.00000.00000.00020.00200.0064

Overall: the proportion of trials with erroneous Go decision for at least one subtrial.

Table 2.

Comparison of the Bayesian analysis models with respect to the analogue of type I error rate: null hypothesis is erroneously rejected under scenarios of any |$\theta_k =0$|⁠, setting |$\delta_U=0.25$| and |$\zeta = 0.975$|⁠.

Subtrial
 123456Overall
Scenario 7Standard HM0.00000.00000.00000.00000.0000
 No borrowing0.00360.00770.00560.01000.0269
 EXNEX0.00010.00030.00020.00000.0006
 Proposed approach0.00730.00890.00020.00020.0166
Scenario 8Standard HM0.01550.00800.0207
 No borrowing0.00770.00080.0085
 EXNEX0.01950.00560.0251
 Proposed approach0.01550.00170.0172
Scenario 9Standard HM0.00000.00000.00000.00000.00000.00000.0000
 No borrowing0.00360.00770.00560.01000.00080.00060.0283
 EXNEX0.00000.00010.00000.00000.00000.00000.0001
 Proposed approach0.00140.00280.00000.00000.00020.00200.0064
Subtrial
 123456Overall
Scenario 7Standard HM0.00000.00000.00000.00000.0000
 No borrowing0.00360.00770.00560.01000.0269
 EXNEX0.00010.00030.00020.00000.0006
 Proposed approach0.00730.00890.00020.00020.0166
Scenario 8Standard HM0.01550.00800.0207
 No borrowing0.00770.00080.0085
 EXNEX0.01950.00560.0251
 Proposed approach0.01550.00170.0172
Scenario 9Standard HM0.00000.00000.00000.00000.00000.00000.0000
 No borrowing0.00360.00770.00560.01000.00080.00060.0283
 EXNEX0.00000.00010.00000.00000.00000.00000.0001
 Proposed approach0.00140.00280.00000.00000.00020.00200.0064

Overall: the proportion of trials with erroneous Go decision for at least one subtrial.

What may also be interesting to investigators is the potential increase in statistical power to demonstrate the treatment effect, by incorporating information from complementary subtrials. Figure 2 visualizes the comparison of the Bayesian analysis models in terms of correctly declaring a clinical benefit in subtrial |$k$|⁠, setting |$\delta_U=0.25$|⁠; see Figure S5 of supplementary material available at Biostatistics online that visualizes the comparison setting |$\delta_U=0.30$|⁠. Across nearly all subtrials of the simulated basket trials in scenarios 1–5, the Bayesian approaches of borrowing show substantial advantages over the approach of no borrowing. When comparing between the Bayesian approaches of borrowing, we check how the chance would be for a subtrial with comparatively low treatment effect to be concluded with a correct Go decision, in the presence of consistent subtrials. Looking at scenarios 3, for example, our approach leads to higher statistical power for subtrials 2 and 6 compared with other Bayesian models.

Comparison of the Bayesian analysis models with respect to the analog of statistical power: null hypothesis is correctly rejected in the presence of a treatment effect per subtrial, setting $\delta_U=0.25$ and $\zeta = 0.975$.
Fig. 2.

Comparison of the Bayesian analysis models with respect to the analog of statistical power: null hypothesis is correctly rejected in the presence of a treatment effect per subtrial, setting |$\delta_U=0.25$| and |$\zeta = 0.975$|⁠.

Scenarios 5 and 6 represent situations where all subtrials are commensurate, but the former has a larger treatment effect size. Given our criterion that |$\mathbb{P}(\theta_k > 0.25) > 0.975$| for a Go decision, scenario 6 with all |$\theta_k = 0.30$| is particularly a hard scenario to allocate a Go. Compared with the approaches that enable borrowing, using the approach of no borrowing results in subtrials 3 and 4 having a slightly higher probability of correct Go decision. However, this does not mean the no borrowing approach is superior, since standard HM and EXNEX produce estimates of |$\theta_k$| with a similar level of bias, but much smaller posterior variances than the approach of no borrowing. These results are observed from Figure 1 and Figure S2 of the supplementary material available at Biostatistics online. More informative posterior distributions for |$\theta_k$| nevertheless do not necessarily mean a higher interval probability of |$\theta_k>\delta_U$|⁠: it is possible that diffuse posteriors for |$\theta_k$| obtained from the approach of no borrowing has comparable or even higher chances to exceed the level |$\gamma = 0.975$|⁠. When the consistent “true” |$\theta_k$|s increase from 0.30 (scenario 6) to 0.45 (scenario 5), we begin to observe the efficiency gains by using Bayesian models that permit borrowing of information than no borrowing. The proposed methodology appears to present a larger absolute gain in power compared with the alternative models, although we note that the absolute gain in power can be a misleading metric due to the non-linear shape of the power curve.

In scenario 7, due to the prior specification of |$\tau \sim HN(0.125)$| being incapable of accounting for the variability across subtrials, both standard HM and EXNEX shrink |$\theta_5$| and |$\theta_6$| excessively towards the mean effect across subgroups. This in turn dilutes the treatment effect in corresponding subtrials. Consequently, it seems better to implement the approach of no borrowing for possibility of declaring a positive treatment effect. Our approach presents slightly higher power than the no borrowing approach as there is some consistent information to be incorporated from a complementary subtrial. In scenario 8, our approach performs similarly to the alternatives, but slightly better for subtrial 6 due to leveraging consistent information.

We note this simulation study does not consider cases of basket trials involving rare disease subgroups, where certain subtrials can have a much smaller sample size than others. We present several hypothetical data examples in Section D of the supplementary material available at Biostatistics online to comment on the sensitivity to the difference in subtrial sample sizes.}

5. Discussion

The paradigm shift towards precision medicine opens new avenues for novel trial designs and analysis methodologies to deliver more tailored healthcare to patients. Basket trials emerge as a new class of efficient approaches to oncology drug development in the era of precision medicine, offering a framework to evaluate the treatment effect together with its heterogeneity in various patient subgroups. In this article, we have extended the key ideas of a basket trial approach to disease areas outside of oncology, and proposed a new Bayesian model to enable borrowing of information from the most commensurate subtrial(s) without requiring a priori clustering of similar subgroups. By including an information discrepancy measure, it can discern the degree of borrowing from complementary subtrials. In particular, the Hellinger distance plays a dual role in our methodology: (i) it gauges the maximum amount of information that could be leveraged from a specific subtrial |$k \neq k^\star$| to estimate |$\theta_{k^\star}$|⁠; (ii) when there are |$K \geq 3$| subtrials, it determines the weight allocation to reflect the relative importance for appropriate borrowing of information.

The Bayesian analysis methodology in Section 3 has been developed assuming the basket trial generates continuous response data. However, it could be easily generalized to analyze other types of data that can be fitted using a generalized linear model for non-Gaussian error distributions. For example, it would be readily applicable to analyzing phase II basket trials that use binary endpoints: after fitting the patient-level data per subtrial with a logistic regression model, our approach may be considered to stipulate commensurate predictive priors, informed by the pairwise Hellinger distance, for the subtrial-specific treatment effect parameters to permit borrowing of information from the most consistent subtrial(s). For down-weighting in cases of a data conflict suggested by the Hellinger distance, we did not delve into calibration of the “slab” prior but simply used a very uninformative uniform distribution, which ensures data from an inconsistent subtrial can be discarded. When using the proposed methodology in practice, we recommend specifying the spike-and-slab prior based on some preliminary knowledge about the magnitude of variances of |$\theta_k$|⁠. Specification of the “slab” prior may particularly deserve future research to exploit the advantage of the proposed methodology. We refer to Mutsvari and others (2016) as a relevant investigation, which focuses on choosing the diffuse component of a mixture prior for robust inferences. We also note that the exploration may be closely linked with the users’ stipulation of the prior probability weight, which is based upon the Hellinger distance, to be attributed to the “slab” prior.

In our simulation study, we have considered imbalance subtrial sizes. Simulation results show that our methodology can down-weight inconsistent information from a subtrial that has larger sample sizes. For illustrative purposes, we have supposed equal randomization ratio between treatment groups within a subtrial. Investigators can pragmatically determine the randomization ratio as well as the subtrial-wise sample size for a basket trial that may base decision making on our analysis methodology. Potentially, more dosage groups of the same treatment in each subtrial can be considered. Also, many have shown a great interest in sequential basket trials (Simon and others, 2016; Cunanan and others, 2017; Hobbs and Landin, 2018) with interim look(s) incorporated for the possibility of, say, terminating enrollment of patients in ineffective subgroups. We note that the proposed Bayesian approach can be implemented with any number of analyses following a flexible timescale for interim decision making. There is no requirement of a minimum sample size per subtrial to carry out an interim look, due to the use of an initial operational prior |$\pi_{0k}(\theta_k)$| for computing the pairwise Hellinger distance. However, an inflation of type I error rate arising from such repeated significance tests would occur.

Throughout, we have restricted our focus onto basket trials, where the subtrials use the same endpoint across patient subgroups. In many disease areas, multiple endpoints (FDA, 2017) may often arise, as it could involve various dimensions to conclude on the clinical benefit. One common situation is to continue monitoring toxicity in addition to the assessment of efficacy (Bryant and Day, 1995; Tournoux and others, 2007). With regards to this, our approach could be extended in several ways. For instance, in cases where the set of multiple endpoints remain the same across subgroups, it would be straightforward to establish a joint probability model and derive the pairwise Hellinger distance between multivariate probability densities (Pardo, 2005). Suitable alternatives include separating the discussion about borrowing of information by endpoint. A unified utility function may then be adopted for trial decision making based on evidence on multiple endpoints. In another more complex setting where the efficacy endpoint, for example, could be distinct but correlated across subgroups, one might need to translate the subtrial data onto a common scale in order to adapt the present approach. Ideas could be drawn from Zheng and others (2020), where incorporation of supplementary data recorded on a different measurement scale has been discussed in the context of phase I clinical trials.

6. Software

Software in the form of OpenBUGS code together with R functions is available on GitHub (https://github.com/BasketTrials/Bayesian-analysis-models).

Acknowledgments

The authors are thankful to the Associate Editor and two anonymous reviewers for carefully reading the manuscript and giving very helpful comments. Conflict of Interest: None declared.

Funding

JW is funded by the UK Medical Research Council (MC_UU_00002/6).

References

Bernardo,
J. M.
(
1996
).
The concept of exchangeability and its applications
.
Far East Journal of Mathematical Sciences
4
,
111
122
.

Berry,
S. M.
,
Broglio,
K. R.
,
Groshen,
S.
and
Berry,
D. A.
(
2013
).
Bayesian hierarchical modeling of patient subpopulations: efficient designs of phase II oncology clinical trials
.
Clinical Trials
10
,
720
734
.

Bryant,
J.
and
Day,
R}.
(
1995
).
Incorporating toxicity considerations into the design of two-stage phase II clinical trials
.
Biometrics
51
,
1372
1383
.

Chu,
Y.
and
Yuan,
Y.
(
2018
).
A Bayesian basket trial design using a calibrated Bayesian hierarchical model
.
Clinical Trials
15
,
149
158
.

Cunanan,
K. M.
,
Iasonos,
A.
,
Shen,
R.
,
Begg,
C. B.
and
Gönen,
M.
(
2017
).
An efficient basket trial design
.
Statistics in Medicine
36
,
1568
1579
.

Cunanan,
K. M.
,
Iasonos,
A.
,
Shen,
R.
and
Gönen,
M.
(
2019
).
Variance prior specification for a basket trial design using Bayesian hierarchical modeling
.
Clinical Trials
16
,
142
153
.

Dey,
D. K.
and
Birmiwal,
L. R.
(
1994
).
Robust Bayesian analysis using divergence measures
.
Statistics & Probability Letters
20
,
287
294
.

Draper,
D.
(
1995
).
Assessment and propagation of model uncertainty
.
Journal of the Royal Statistical Society. Series B (Methodological)
57
,
45
97
.

Eisenhauer,
E. A.
,
Therasse,
P.
and
Bogaerts,
J.
and others. (
2009
).
New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1)
.
European Journal of Cancer
45
,
228
247
.

Fda. (

2017
).
Multiple Endpoints in Clinical Trials
.
Rockville, MD
:
US Food and Drug Administration
.

Hobbs,
B. P.
,
Carlin,
B. P.
,
Mandrekar,
S. J.
and
Sargent,
D. J.
(
2011
).
Hierarchical commensurate and power prior models for adaptive incorporation of historical information in clinical trials
.
Biometrics
67
,
1047
1056
.

Hobbs,
B. P.
,
Carlin,
B. P.
and
Sargent,
D. J.
(
2013
).
Adaptive adjustment of the randomization ratio using historical control data
.
Clinical Trials
10
,
430
440
.

Hobbs,
B. P.
and
Landin,
R.
(
2018
).
Bayesian basket trial design with exchangeability monitoring
.
Statistics in Medicine
37
,
3557
3572
.

Hobbs,
B. P.
,
Sargent,
D. J.
and
Carlin,
B. P.
(
2012
).
Commensurate priors for incorporating historical information in clinical trials using general and generalized linear models
.
Bayesian Analysis
7
,
639
674
.

Hyman,
D. M.
,
Piha-Paul,
S. A.
and
Won,
H.
et al. (
2018
).
HER kinase inhibition in patients with HER2- and HER3-mutant cancers
.
Nature
554
,
189
194
.

Hyman,
D. M.
,
Puzanov,
I.
and
Subbiah,
V.
et al. (
2015
).
Vemurafenib in multiple nonmelanoma cancers with BRAF V600 mutations
.
New England Journal of Medicine
373
,
726
736
.

Jeffreys,
H.
(
1961
).
Theory of Probability
.
Oxford University Press
:
Amen House, London
.

Liu,
R.
,
Liu,
Z.
,
Ghadessi,
M.
and
Vonk,
R.
(
2017
).
Increasing the efficiency of oncology basket trials using a Bayesian approach
.
Contemporary Clinical Trials
63
,
67
72
.

Madigan,
D.
and
Raftery,
A. E.
(
1994
).
Model selection and accounting for model uncertainty in graphical models using Occam’s window
.
Journal of the American Statistical Association
89
,
1535
1546
.

Mirnezami,
R.
,
Nicholson,
J.
and
Darzi,
A.
(
2012
).
Preparing for precision medicine
.
New England Journal of Medicine
366
,
489
491
.

Mitchell,
T. J.
and
Beauchamp,
J. J.
(
1988
).
Bayesian variable selection in linear regression
.
Journal of the American Statistical Association
83
,
1023
1032
.

Mutsvari,
T
,
Tytgat,
D
and
Walley,
R}.
(
2016
).
Addressing potential prior-data conflict when using informative priors in proof-of-concept studies
.
Pharmaceutical Statistics
15
,
28
36
.

Neuenschwander,
B
,
Wandel,
S
,
Roychoudhury,
S
and
Bailey,
S}.
(
2016
).
Robust exchangeability designs for early phase clinical trials with multiple strata
.
Pharmaceutical Statistics
15
,
123
134
.

Pardo,
L.
(
2005
).
Statistical Inference Based on Divergence Measures
.
Statistics: A Series of Textbooks and Monographs
.
New York
:
CRC Press
.

Psioda,
M. A.
,
Xu,
J.
,
Jiang,
Q.
,
Ke,
C.
,
Yang,
Z.
and
Ibrahim,
J. G.
(
2021
).
Bayesian adaptive basket trial design using model averaging
.
Biostatistics
.
22
, 19–34.

Renfro,
L. A.
and
Mandrekar,
S.J.
(
2018
).
Definitions and statistical properties of master protocols for personalized medicine in oncology
.
Journal of Biopharmaceutical Statistics
28
,
217
228
.

Renfro,
L. A.
and
Sargent,
D. J.
(
2017
).
Statistical controversies in clinical research: basket trials, umbrella trials, and other master protocols: a review and examples
.
Annals of Oncology
28
,
34
43
.

Schork,
N. J.
(
2015
).
Personalized medicine: time for one-person trials
.
Nature
520
,
609
611
.

Schwartz,
L. H.
,
Litière,
S.
and
de Vries,
E.
and others. (
2016
). RECIST 1.1—update and clarification: from the RECIST committee.
European Journal of Cancer
62
,
132
137
.

Simon,
Richard
,
Geyer,
Susan
,
Subramanian,
Jyothi
and
Roychowdhury,
Sameek}.
(
2016
). The Bayesian basket design for genomic variant-driven phase II trials.
Seminars in Oncology
43
,
13
18
.

Spiegelhalter,
D. J.
,
Freedman,
L. S.
and
Parmar,
M. K. B.
(
1994
).
Bayesian approaches to randomized trials
.
Journal of the Royal Statistical Society. Series A (Statistics in Society)
157
,
357
416
.

Thall,
P. F.
and
Wathen,
J. K.
(
2008
).
Bayesian designs to account for patient heterogeneity in phase II clinical trials
.
Current Opinion in Oncology
20
,
407
411
.

Thall,
P. F.
,
Wathen,
J. K.
,
Bekele,
B. N.
,
Champlin,
R. E.
,
Baker,
L. H.
and
Benjamin,
R. S.
(
2003
).
Hierarchical Bayesian approaches to phase II trials in diseases with multiple subtypes
.
Statistics in Medicine
22
,
763
780
.

Tournoux,
C.
,
Rycke,
Y. D.
,
Médioni,
J.
and
Asselain,
B.
(
2007
). Methods of joint evaluation of efficacy and toxicity in phase II clinical trials.
Contemporary Clinical Trials
28
,
514
524
.

Woodcock,
J.
and
LaVange,
L. M.
(
2017
).
Master protocols to study multiple therapies, multiple diseases, or both
.
New England Journal of Medicine
377
,
62
70
.

Zheng,
H.
,
Hampson,
L. V.
and
Wandel,
S.
(
2020
).
A robust Bayesian meta-analytic approach to incorporate animal data into phase I oncology trials
.
Statistical Methods in Medical Research
29
,
94
110
.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Supplementary data