Testing for similarity of binary efficacy–toxicity responses

Table 1.

Different scenarios corresponding to the null hypothesis (3.7) and the alternative (3.8)

	\|$\theta_1$\|	\|$\theta_2$\|	\|$\Delta=(\Delta^E,\Delta^T)$\|
Alternative	\|$(-1,2,-3,3,\nu_1)$\|	\|$(-1,2,-3,3,\nu_2)$\|	\|$(0,0)$\|
	\|$(-1,2,-3,3,\nu_1)$\|	\|$(-1.2,2,-3.3,3.1,\nu_2)$\|	\|$(0.05,0.05)$\|
	\|$(-1,2,-3,3,\nu_1)$\|	\|$(-1.5,2.2,-3.6,3.2,\nu_2)$\|	\|$(0.1,0.1)$\|
Null hypothesis	\|$(-1,2,-3,3,\nu_1)$\|	\|$(-2,3.4,-2,2.5,\nu_2)$\|	\|$(0.15,0.15)$\|
	\|$(-1,2,-3,3,\nu_1)$\|	\|$(-1,2,-2,2.5,\nu_2)$\|	\|$(0,0.15)$\|
	\|$(-1,2,-3,3,\nu_1)$\|	\|$(-2.4,3.4,-1.8,2.5,\nu_2)$\|	\|$(0.2,0.2)$\|
	\|$(-1,2,-3,3,\nu_1)$\|	\|$(-1,2,-1.8,2.5,\nu_2)$\|	\|$(0,0.2)$\|

For the Type I error rate simulations, we counted the number of individual rejections of the null hypotheses in (3.9) and (3.10) and the number of simultaneous rejections of both, the latter allowing us to reject the global null hypothesis in (3.7). All simulation results are displayed in Tables 2 and 3, where the numbers in brackets correspond to the proportions of rejections for the individual tests on efficacy and toxicity. For the sake of brevity, we consider only two margins, |$\epsilon=(\epsilon^E,\epsilon^T)=(0.15,0.15)$| and |$(0.2,0.2)$|. We observe that the global bootstrap test according to Algorithm 3.1 is rather conservative, as its Type I error rates are very small. For example, for |$n_{\ell,i}=14$|, |$\nu_1=\nu_2=1$| and |$\Delta=(\Delta^E,\Delta^T)=\epsilon=(0.2,0.2)$|, the individual proportions of rejection are |$0.046$| for efficacy and |$0.058$| for toxicity, whereas the Type I error rate of the global test is |$0.001$|, which is well below the nominal level. This is a common feature of the intersection-union principle when testing equivalence for multivariate responses (Berger and Hsu, 1996). A similar conclusion holds for the high level of dependence, that is, |$\nu_1=\nu_2=2.4$|. For the same configuration as above, that is, |$n_{\ell,i}=14$| and |$\Delta=\epsilon=(0.2,0.2)$|, the individual proportions of rejection are |$0.088$| for efficacy and |$0.089$| for toxicity, whereas the Type I error rate of the global test is |$0.002$|.
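The conservativeness of the intersection-union rule can be illustrated with a small simulation. The following Python sketch is not the simulation code behind Table 2: it assumes, purely for illustration, that the two individual tests reject independently of each other with the individual rates quoted above (|$0.046$| and |$0.058$|), and the number of runs is a hypothetical choice.

```python
import numpy as np


def global_rejection(reject_eff, reject_tox):
    """Intersection-union rule: the global null is rejected only if both
    individual tests (efficacy and toxicity) reject."""
    return np.logical_and(reject_eff, reject_tox)


rng = np.random.default_rng(seed=1)
n_runs = 100_000  # hypothetical number of simulation runs

# Hypothetical individual decisions, drawn independently (an assumption;
# the individual tests in the article are computed from the same data).
reject_eff = rng.random(n_runs) < 0.046
reject_tox = rng.random(n_runs) < 0.058
reject_global = global_rejection(reject_eff, reject_tox)

print(f"efficacy rejection rate: {reject_eff.mean():.4f}")
print(f"toxicity rejection rate: {reject_tox.mean():.4f}")
print(f"global rejection rate:   {reject_global.mean():.4f}")
```

Under this independence assumption the global rate is roughly the product of the two individual rates, so it can never exceed the smaller of them. This is in line with the very small global Type I error rates observed on the boundary |$\Delta=\epsilon$|.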

Table 2.

Simulated Type I error rates of the global bootstrap test (3.11) for two different choices of |$\nu_\ell,\ \ell=1,2$|

\|$\epsilon=(\epsilon^E,\epsilon^T)$\|	\|$n_{\ell,i}$\|	\|$\theta_2$\|	\|$\Delta=(\Delta^E,\Delta^T)$\|	\|$\nu_\ell=1$\|	\|$\nu_\ell=2.4$\|
\|$(0.15,0.15)$\|	\|$7$\|	\|$(-2,3.4,-2,2.5,\nu_2)$\|	\|$(0.15,0.15)$\|	0.001 (0.063/0.074)	0.006 (0.078/0.064)
		\|$(-1,2,-2,2.5,\nu_2)$\|	\|$(0,0.15)$\|	0.008 (0.122/0.060)	0.012 (0.112/0.075)
	\|$14$\|	\|$(-2,3.4,-2,2.5,\nu_2)$\|	\|$(0.15,0.15)$\|	0.003 (0.040/0.047)	0.003 (0.082/0.065)
		\|$(-1,2,-2,2.5,\nu_2)$\|	\|$(0,0.15)$\|	0.001 (0.207/0.052)	0.020 (0.230/0.068)
	\|$21$\|	\|$(-2,3.4,-2,2.5,\nu_2)$\|	\|$(0.15,0.15)$\|	0.000 (0.026/0.046)	0.002 (0.051/0.057)
		\|$(-1,2,-2,2.5,\nu_2)$\|	\|$(0,0.15)$\|	0.016 (0.325/0.041)	0.029 (0.326/0.084)
	\|$28$\|	\|$(-2,3.4,-2,2.5,\nu_2)$\|	\|$(0.15,0.15)$\|	0.000 (0.049/0.053)	0.004 (0.125/0.090)
		\|$(-1,2,-2,2.5,\nu_2)$\|	\|$(0,0.15)$\|	0.032 (0.476/0.058)	0.034 (0.455/0.076)
	\|$50$\|	\|$(-2,3.4,-2,2.5,\nu_2)$\|	\|$(0.15,0.15)$\|	0.000 (0.035/0.078)	0.012 (0.210/0.084)
		\|$(-1,2,-2,2.5,\nu_2)$\|	\|$(0,0.15)$\|	0.074 (0.827/0.085)	0.089 (0.815/0.111)
\|$(0.2,0.2)$\|	\|$7$\|	\|$(-2.4,3.4,-1.8,2.5,\nu_2)$\|	\|$(0.2,0.2)$\|	0.004 (0.061/0.063)	0.006 (0.091/0.101)
		\|$(-1,2,-1.8,2.5,\nu_2)$\|	\|$(0,0.2)$\|	0.012 (0.218/0.055)	0.019 (0.233/0.084)
	\|$14$\|	\|$(-2.4,3.4,-1.8,2.5,\nu_2)$\|	\|$(0.2,0.2)$\|	0.001 (0.046/0.058)	0.002 (0.088/0.089)
		\|$(-1,2,-1.8,2.5,\nu_2)$\|	\|$(0,0.2)$\|	0.024 (0.396/0.067)	0.027 (0.442/0.065)
	\|$21$\|	\|$(-2.4,3.4,-1.8,2.5,\nu_2)$\|	\|$(0.2,0.2)$\|	0.003 (0.048/0.070)	0.003 (0.090/0.087)
		\|$(-1,2,-1.8,2.5,\nu_2)$\|	\|$(0,0.2)$\|	0.033 (0.672/0.051)	0.040 (0.648/0.070)
	\|$28$\|	\|$(-2.4,3.4,-1.8,2.5,\nu_2)$\|	\|$(0.2,0.2)$\|	0.003 (0.069/0.072)	0.004 (0.124/0.077)
		\|$(-1,2,-1.8,2.5,\nu_2)$\|	\|$(0,0.2)$\|	0.050 (0.813/0.065)	0.068 (0.870/0.078)
	\|$50$\|	\|$(-2.4,3.4,-1.8,2.5,\nu_2)$\|	\|$(0.2,0.2)$\|	0.004 (0.054/0.076)	0.003 (0.145/0.103)
		\|$(-1,2,-1.8,2.5,\nu_2)$\|	\|$(0,0.2)$\|	0.060 (0.982/0.061)	0.127 (0.986/0.132)

The numbers in brackets show the proportion of rejections for the individual tests according to the hypotheses (3.9) and (3.10).

Table 3.

Simulated power of the global bootstrap test (3.11) for two different choices of |$\nu_\ell,\ \ell=1,2$|

\|$\epsilon=(\epsilon^E,\epsilon^T)$\|	\|$n_{\ell,i}$\|	\|$\theta_2$\|	\|$\Delta=(\Delta^E,\Delta^T)$\|	\|$\nu_\ell=1$\|	\|$\nu_\ell=2.4$\|
\|$(0.15,0.15)$\|	\|$7$\|	\|$(-1.5,2.2,-3.6,3.2,\nu_2)$\|	\|$(0.1,0.1)$\|	0.009 (0.092/0.125)	0.007 (0.089/0.125)
		\|$(-1.2,2,-3.3,3.1,\nu_2)$\|	\|$(0.05,0.05)$\|	0.009 (0.129/0.108)	0.010 (0.114/0.116)
		\|$(-1,2,-3,3,\nu_2)$\|	\|$(0,0)$\|	0.002 (0.128/0.133)	0.018 (0.153/0.121)
	\|$14$\|	\|$(-1.5,2.2,-3.6,3.2,\nu_2)$\|	\|$(0.1,0.1)$\|	0.008 (0.105/0.102)	0.014 (0.119/0.104)
		\|$(-1.2,2,-3.3,3.1,\nu_2)$\|	\|$(0.05,0.05)$\|	0.031 (0.176/0.146)	0.042 (0.183/0.172)
		\|$(-1,2,-3,3,\nu_2)$\|	\|$(0,0)$\|	0.035 (0.196/0.162)	0.045 (0.209/0.214)
	\|$21$\|	\|$(-1.5,2.2,-3.6,3.2,\nu_2)$\|	\|$(0.1,0.1)$\|	0.020 (0.145/0.150)	0.025 (0.145/0.155)
		\|$(-1.2,2,-3.3,3.1,\nu_2)$\|	\|$(0.05,0.05)$\|	0.051 (0.288/0.201)	0.075 (0.242/0.254)
		\|$(-1,2,-3,3,\nu_2)$\|	\|$(0,0)$\|	0.085 (0.345/0.265)	0.077 (0.309/0.269)
	\|$28$\|	\|$(-1.5,2.2,-3.6,3.2,\nu_2)$\|	\|$(0.1,0.1)$\|	0.038 (0.185/0.166)	0.057 (0.137/0.189)
		\|$(-1.2,2,-3.3,3.1,\nu_2)$\|	\|$(0.05,0.05)$\|	0.098 (0.387/0.266)	0.121 (0.356/0.313)
		\|$(-1,2,-3,3,\nu_2)$\|	\|$(0,0)$\|	0.201 (0.484/0.385)	0.202 (0.453/0.403)
	\|$50$\|	\|$(-1.5,2.2,-3.6,3.2,\nu_2)$\|	\|$(0.1,0.1)$\|	0.066 (0.295/0.263)	0.106 (0.239/0.234)
		\|$(-1.2,2,-3.3,3.1,\nu_2)$\|	\|$(0.05,0.05)$\|	0.318 (0.624/0.484)	0.326 (0.565/0.527)
		\|$(-1,2,-3,3,\nu_2)$\|	\|$(0,0)$\|	0.566 (0.842/0.656)	0.581 (0.827/0.686)
\|$(0.2,0.2)$\|	\|$7$\|	\|$(-1.5,2.2,-3.6,3.2,\nu_2)$\|	\|$(0.1,0.1)$\|	0.018 (0.133/0.140)	0.029 (0.159/0.129)
		\|$(-1.2,2,-3.3,3.1,\nu_2)$\|	\|$(0.05,0.05)$\|	0.027 (0.159/0.151)	0.032 (0.213/0.155)
		\|$(-1,2,-3,3,\nu_2)$\|	\|$(0,0)$\|	0.026 (0.183/0.189)	0.049 (0.221/0.191)
	\|$14$\|	\|$(-1.5,2.2,-3.6,3.2,\nu_2)$\|	\|$(0.1,0.1)$\|	0.063 (0.277/0.210)	0.076 (0.278/0.230)
		\|$(-1.2,2,-3.3,3.1,\nu_2)$\|	\|$(0.05,0.05)$\|	0.112 (0.352/0.299)	0.099 (0.335/0.282)
		\|$(-1,2,-3,3,\nu_2)$\|	\|$(0,0)$\|	0.124 (0.409/0.300)	0.171 (0.451/0.356)
	\|$21$\|	\|$(-1.5,2.2,-3.6,3.2,\nu_2)$\|	\|$(0.1,0.1)$\|	0.119 (0.369/0.310)	0.142 (0.343/0.321)
		\|$(-1.2,2,-3.3,3.1,\nu_2)$\|	\|$(0.05,0.05)$\|	0.243 (0.585/0.388)	0.254 (0.527/0.416)
		\|$(-1,2,-3,3,\nu_2)$\|	\|$(0,0)$\|	0.328 (0.658/0.505)	0.322 (0.593/0.536)
	\|$28$\|	\|$(-1.5,2.2,-3.6,3.2,\nu_2)$\|	\|$(0.1,0.1)$\|	0.177 (0.468/0.348)	0.212 (0.429/0.418)
		\|$(-1.2,2,-3.3,3.1,\nu_2)$\|	\|$(0.05,0.05)$\|	0.445 (0.716/0.608)	0.472 (0.688/0.622)
		\|$(-1,2,-3,3,\nu_2)$\|	\|$(0,0)$\|	0.541 (0.816/0.660)	0.581 (0.822/0.705)
	\|$50$\|	\|$(-1.5,2.2,-3.6,3.2,\nu_2)$\|	\|$(0.1,0.1)$\|	0.404 (0.717/0.543)	0.437 (0.653/0.602)
		\|$(-1.2,2,-3.3,3.1,\nu_2)$\|	\|$(0.05,0.05)$\|	0.740 (0.933/0.783)	0.765 (0.897/0.836)
		\|$(-1,2,-3,3,\nu_2)$\|	\|$(0,0)$\|	0.900 (0.987/0.914)	0.933 (0.985/0.945)

The numbers in brackets show the proportion of rejections for the individual tests according to the hypotheses (3.9) and (3.10).

In general, we conclude that for a low level of dependence the individual tests on efficacy and toxicity yield rejection probabilities close to |$0.05$| when simulating on the boundary of the global null hypothesis (that is, |$\Delta=\epsilon$|), and hence the global Type I error rates are well below |$\alpha$| in these cases. However, for a high level of correlation, that is, |$\nu_1=\nu_2=2.4$|, there are a few scenarios where the Type I error rate is too large. For instance, the largest proportion of rejections of the global null hypothesis, |$0.127$|, is observed for |$n_{\ell,i}=50$|, |$\epsilon=(0.2,0.2)$| and |$\Delta=(0,0.2)$|. The same configuration with |$\epsilon=(0.15,0.15)$| and |$\Delta=(0,0.15)$| yields a proportion of |$0.089$|, which is lower but still above the desired value of |$0.05$|. In most other scenarios, the Type I error rate of the global test is below the nominal level; the remaining exceedances in Table 2 (|$0.060$|, |$0.068$|, and |$0.074$|) all occur for |$\Delta^E=0$| and |$n_{\ell,i}\geq 28$|. The size of the parameter |$\nu_\ell$| affects the precision of the estimates of the parameter |$\theta_\ell$| of the Gumbel model, which explains the different results for the rather low correlations corresponding to |$\nu_\ell=1$| and the high correlations obtained for |$\nu_\ell=2.4$|, |$\ell=1,2$|. In other words, a high correlation makes the estimation of the curves more difficult, even for large sample sizes. A more detailed discussion, including a table presenting the bias of the estimates for some configurations, can be found in Supplementary Section S3 available at Biostatistics online.

The simulated power is shown in Table 3. The global test achieves reasonable power for sufficiently large sample sizes. Within each configuration, the power is largest at |$\Delta=(0,0)$|, except for |$n_{\ell,i}=7$| with |$\nu_\ell=1$|. For example, a maximum power of |$0.933$| is achieved for |$n_{\ell,i}=50$|, |$\nu_1=\nu_2=2.4$|, and |$\epsilon=(0.2,0.2)$|. For a smaller margin, that is, |$\epsilon=(0.15,0.15)$|, the maximum power is lower but still increases with the sample size, reaching for instance |$0.581$| for |$n_{\ell,i}=50$| and |$\nu_1=\nu_2=2.4$|. The same statement holds for a lower correlation of |$\nu_\ell=1,\ \ell=1,2$|. For example, for |$n_{\ell,i}=28$|, |$\nu_1=\nu_2=1$| and |$\epsilon=(0.2,0.2)$|, we observe a maximum power of |$0.541$|.

5. Case study

To illustrate the proposed methodology, we consider an example inspired by a recent consulting project of one of the authors. A nonsteroidal anti-inflammatory drug is to be investigated for its ability to attenuate dental pain after the removal of two or more impacted third molar teeth. Dental pain is a common and inexpensive setting for analgesic proof of concept, as recruitment is fast and the outcome is measured within a few hours. It is common to measure the pain intensity on an ordinal scale at baseline and several times after the administration of a single dose. The pain intensity difference from baseline, averaged over several hours after drug administration, may then be compared with a clinical relevance threshold to create a binary success variable for efficacy. In this particular setting, side effects such as nausea and sedation after dosing were anticipated, resulting in a binary toxicity variable indicating whether the patient experienced any such adverse event. As approved analgesics with an identified dosing range and a known dose-response relationship for tolerability are available, the objective of the study at hand was to demonstrate similarity with a marketed product for the bivariate efficacy–toxicity outcome in a proof-of-concept setting.

This was a randomized, double-blind, multi-regional, parallel-group clinical trial with a total of 300 patients allocated to either placebo or one of four active doses, coded as |$0.05$|, |$0.20$|, |$0.50$|, and |$1$| (for the investigational drug) and |$0.10$|, |$0.30$|, |$0.60$|, and |$1$| (for the marketed product), resulting in |$n = 30$| per group (assuming equal allocation). To maintain confidentiality, the actual doses have been scaled to lie within the |$[0, 1]$| interval. Since the study has not been completed yet, we use a hypothetical data set to illustrate the proposed methodology.

Half of the patients in this trial were recruited from Western Europe and half from Eastern Europe. Prior investigations suggested that the differences between the two geographic regions are negligible. We thus compare the efficacy and toxicity data of the |$150$| patients randomized to the marketed drug across both regions. For this purpose, we fit two Gumbel models as defined in Section 3.1 to the data, one for the |$75$| patients from Western Europe (|$\ell=1$|) and one for the |$75$| patients from Eastern Europe (|$\ell=2$|). The estimated model parameters are

$$\hat\theta_1=(-0.938,2.145,-2.284,1.689,0.498),\qquad \hat\theta_2=(-1.012,2.388,-2.728,1.910,-0.475). \tag{5.1}$$

Fig. 2.

(a) Efficacy and toxicity curves corresponding to the fitted Gumbel models (5.1) for the hypothetical data described in Section 5. The solid (efficacy) and the dashed line (toxicity) correspond to the patients from Western Europe, the dotted (efficacy) and the dotted-dashed (toxicity) to those from Eastern Europe. (b) Efficacy and toxicity curves under the assumption of shared placebo parameters. Here the solid (efficacy) and the dashed line (toxicity) correspond to the marketed product, the dotted (efficacy) and the dotted-dashed (toxicity) to the investigational drug, respectively. The arrows indicate the maximum absolute distances.
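To make the role of the five components of |$\hat\theta_\ell$| concrete, the following Python sketch evaluates the four joint cell probabilities of a Gumbel-type bivariate logistic model at a single dose. Section 3.1 is not reproduced here, so both the model form (logistic marginals plus an association term governed by |$\nu$|) and the parameter ordering (efficacy intercept, efficacy slope, toxicity intercept, toxicity slope, |$\nu$|) are assumptions of this sketch rather than a restatement of the article's definition; the function names are hypothetical.

```python
import numpy as np


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def gumbel_cell_probs(dose, theta):
    """Joint probabilities P(Y_E = y_e, Y_T = y_t) at one dose under an
    assumed Gumbel-type bivariate logistic model."""
    b_e0, b_e1, b_t0, b_t1, nu = theta
    pi_e = logistic(b_e0 + b_e1 * dose)  # marginal efficacy probability
    pi_t = logistic(b_t0 + b_t1 * dose)  # marginal toxicity probability
    # Association term; it vanishes for nu = 0 (independent outcomes).
    assoc = (pi_e * (1 - pi_e) * pi_t * (1 - pi_t)
             * (np.exp(nu) - 1) / (np.exp(nu) + 1))
    probs = {}
    for y_e in (0, 1):
        for y_t in (0, 1):
            indep = (pi_e if y_e else 1 - pi_e) * (pi_t if y_t else 1 - pi_t)
            probs[(y_e, y_t)] = indep + (-1) ** (y_e + y_t) * assoc
    return probs


# Estimate for Western Europe from (5.1); the dose 0.5 is an arbitrary choice.
theta_west = (-0.938, 2.145, -2.284, 1.689, 0.498)
probs = gumbel_cell_probs(0.5, theta_west)
for cell, p in probs.items():
    print(cell, round(float(p), 4))
```

In this form the association term cancels when summing over either outcome, so the marginal efficacy and toxicity curves depend only on the first four components of |$\theta_\ell$|, while |$\nu_\ell$| only shifts probability mass between concordant and discordant cells.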

Using |$n_{boot}=1000$| bootstrap replications, we obtain the critical values |$\hat q_{0.05}^E=0.061$| and |$\hat q_{0.05}^T=0.056$| and test the global null hypothesis (3.7) against the alternative (3.8). Since |$\hat \Delta^E=0.029< 0.061=\hat q_{0.05}^E$| and |$\hat \Delta^T=0.051< 0.056=\hat q_{0.05}^T$|, we can reject (3.7) at level |$\alpha = 0.05$|. This conclusion can also be drawn by directly considering the |$p$|-values obtained from the empirical distribution functions of the bootstrap sample according to Step (iii) of Algorithm 2.1. In general, we reject the null hypothesis (3.7) at level |$\alpha$| if the maximum of the two individual |$p$|-values for (3.9) and (3.10) is smaller than or equal to |$\alpha$|. Since the individual |$p$|-values are given by |$\hat F_{n_{boot}}^E(\hat \Delta^E)=0.015$| and |$\hat F_{n_{boot}}^T(\hat \Delta^T)=0.042$|, we have |$\max{(0.015,0.042)}=0.042 < 0.05 = \alpha$| and can reject the null hypothesis (3.7), thus concluding similarity of efficacy and toxicity across the two geographic regions.
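The decision rule described above can be sketched in a few lines of Python. The sketch assumes that the bootstrap distances for one endpoint are already available as an array; generating them under the constrained fit is not shown, and the uniform draws used to exercise the function are hypothetical placeholders, not the bootstrap sample of the case study. The function name `endpoint_test` is likewise an illustrative choice.

```python
import numpy as np


def endpoint_test(delta_hat, delta_boot, alpha=0.05):
    """Critical value, p-value and decision for one endpoint."""
    delta_boot = np.asarray(delta_boot, dtype=float)
    # Critical value: alpha-quantile of the bootstrap distances.
    q_alpha = np.quantile(delta_boot, alpha)
    # p-value: empirical distribution function evaluated at delta_hat.
    p_value = float(np.mean(delta_boot <= delta_hat))
    return q_alpha, p_value, p_value <= alpha


# Hypothetical bootstrap distances, used only to exercise the function.
rng = np.random.default_rng(seed=2)
boot_eff = rng.uniform(0.02, 0.40, size=1000)
q_eff, p_eff, reject_eff = endpoint_test(0.029, boot_eff)
print(f"critical value {q_eff:.3f}, p-value {p_eff:.3f}, reject {reject_eff}")

# Global decision based on the two p-values reported in the text.
alpha = 0.05
p_values = {"efficacy": 0.015, "toxicity": 0.042}
reject_global = max(p_values.values()) <= alpha
print("reject global null:", reject_global)
```

Working with the maximum of the two |$p$|-values is simply a restatement of the intersection-union rule: the global null hypothesis is rejected only if each individual test rejects at level |$\alpha$|.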

Based on this result, it is reasonable to proceed with a further analysis of this trial using the data pooled from both regions. We now compare the investigational drug with the marketed product across all dose levels for the bivariate efficacy–toxicity outcomes of all |$300$| patients randomized into the study. For this analysis, it is reasonable to assume that the placebo effect is the same for both products with regard to efficacy and toxicity and to investigate the question of similarity under the assumption of shared placebo parameters as described in Section 3.4. More precisely, we assume that |$\beta_{1,1}=\beta_{2,1}$| and |$\beta_{1,2}=\beta_{2,2}$|, which reduces the number of parameters to be estimated from |$10$| to |$8$|. The parameter estimates are |$\hat\theta_1=(-1.250, 2.661, -2.299, 1.564, -0.066)$| and |$\hat\theta_2=(-1.250, 2.481, -2.299, 1.453, 0.941)$|; see Figure 2(b) for the corresponding efficacy and toxicity curves. The maximum distances are now given by |$\hat \Delta^E=0.031$| and |$\hat \Delta^T=0.024$|, attained at the dose levels |$0.86$| and |$1$|, respectively. We perform the test at a significance level of |$\alpha=0.05$| and generate bootstrap data under the assumption of common placebo parameters. The critical values are now given by |$\hat q_{0.05}^E=0.060$| and |$\hat q_{0.05}^T=0.035$|, and hence we conclude that the null hypothesis (3.7) can be rejected, as |$\hat \Delta^E=0.031< \hat q_{0.05}^E=0.060$| and |$\hat \Delta^T=0.024< \hat q_{0.05}^T=0.035$|. The |$p$|-values are given by |$\hat F_{n_{boot}}^E(\hat \Delta^E)=0.021$| and |$\hat F_{n_{boot}}^T(\hat \Delta^T)=0.031$|, respectively.
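The reported maximum distances can be checked approximately from the parameter estimates alone. The Python sketch below assumes logistic marginal curves, the parameter ordering (efficacy intercept, efficacy slope, toxicity intercept, toxicity slope, |$\nu$|), and a maximum taken over an equidistant grid on the scaled dose range |$[0,1]$|; these are assumptions of the sketch, since the model definition in Section 3.1 is not reproduced here.

```python
import numpy as np


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def max_abs_distance(beta_1, beta_2, doses):
    """Maximum absolute distance between two logistic dose-response curves
    on a dose grid, together with the dose at which it is attained."""
    curve_1 = logistic(beta_1[0] + beta_1[1] * doses)
    curve_2 = logistic(beta_2[0] + beta_2[1] * doses)
    gap = np.abs(curve_1 - curve_2)
    idx = int(np.argmax(gap))
    return float(gap[idx]), float(doses[idx])


# Estimates under shared placebo parameters, as reported in the text.
theta_1 = (-1.250, 2.661, -2.299, 1.564, -0.066)
theta_2 = (-1.250, 2.481, -2.299, 1.453, 0.941)

doses = np.linspace(0.0, 1.0, 101)  # assumed grid on the scaled dose range
delta_e, dose_e = max_abs_distance(theta_1[0:2], theta_2[0:2], doses)
delta_t, dose_t = max_abs_distance(theta_1[2:4], theta_2[2:4], doses)

print(f"efficacy: max distance {delta_e:.3f} at dose {dose_e:.2f}")
print(f"toxicity: max distance {delta_t:.3f} at dose {dose_t:.2f}")
```

Under these assumptions the grid search should land close to the reported values of |$\hat \Delta^E$| and |$\hat \Delta^T$| and their dose locations; small deviations are to be expected because the estimates are rounded to three decimals. Because the intercepts are shared, both curves coincide at dose |$0$|, so the distance there is zero by construction.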

Finally, we note that fitting separate models as shown above also implies that the dependence parameter is allowed to differ between the two drugs. Such an approach seems sensible in practice, as it would be hard to justify clinically that the dependence parameter is the same unless the two products are from the same chemical class or have a common mode of action. If a shared dependence parameter can be justified for the problem at hand, the methods in this article can be extended following Möllenhoff and others (2020).

6. Conclusions and discussion

In the first part of this article, we investigated a single efficacy response given by a binary outcome and derived a test procedure for the similarity of the corresponding dose-response curves, which can be modeled, for instance, by a parametric logistic regression or a probit model. We developed a parametric bootstrap test that decides for similarity if the maximum deviation between the estimated dose-response profiles is sufficiently small. We also considered the situation of an additional second toxicity endpoint to model the joint efficacy–toxicity responses. For this purpose, we assumed efficacy and toxicity to be observed simultaneously, resulting in bivariate (correlated) binary outcomes, and used a Gumbel model to fit the data. The bootstrap test was extended to this situation by combining two individual tests through the intersection-union principle. In the second part of this article, we investigated the operating characteristics by means of an extensive simulation study. The choice of the margin |$\epsilon$| measuring the degree of similarity has a major impact on the performance of the test. The explicit choice has to be made on an individual basis and in consultation with clinical experts. We demonstrated that the resulting procedures control their level in most of the configurations and achieve reasonable power. However, for a high level of dependence between the efficacy and the toxicity outcome, we observed a slight inflation of the Type I error in a few scenarios. This can be explained by the difficulty in estimating the model parameters with sufficient precision for large correlations: increasing the correlation severely increases the bias of the estimates and hence affects the performance of the test. We provide a more detailed discussion in the Supplementary Material available at Biostatistics online.

In this article, we used a Gumbel-type copula to model the dependency of bivariate binary outcomes. In the Supplementary Material available at Biostatistics online, we demonstrate that the methodology is easily applicable to other copula models. Moreover, we also investigate the sensitivity of our approach with respect to the choice of the copula by means of a simulation study and demonstrate that the approach is remarkably robust. A heuristic explanation for this observation is that parametric copula models provide some flexibility for modeling different dependencies by choosing different parameters. Therefore, a given dependency structure can often be modeled reasonably well by different copula models with appropriately chosen parameters. A similar observation was made in Dette and others (2014) in the context of copula-based regression models.

The methods proposed in this article are broadly applicable whenever binary efficacy and toxicity responses are compared between two groups. These groups can be, for example, different populations or treatments. The methodology can also be extended to models with shared parameters, such as a common placebo effect. A standard application for the latter is the comparison of a new formulation with an old one in the development of a generic product, because doses are immediately comparable. Our approach is different from the standard bioequivalence assessment based on pharmacokinetic (PK) parameters, such as the area under the curve or the maximum concentration. One reviewer argued that the PK is often linear in dose, meaning that a factor on the “vertical” outcome scale can be translated to a factor on the “horizontal” dose scale, and this implies that two dilutions of the same drug can only be bioequivalent if the concentrations are very close to each other. With the approach suggested in this article, equivalence, and therefore similarity, is based on small absolute differences on the “vertical” scale (recall (3.7)). This means that drugs are similar if the dose range covers only low doses or, put differently, that a low dose of a drug is similar to placebo in this metric. In clinical applications, however, the dose range should be chosen sufficiently large (including high doses) such that a relevant difference to placebo can be detected.
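
The remark on low doses can be illustrated numerically. The Python sketch below takes the first set of efficacy estimates reported in the case study (intercept |$-1.250$|, slope |$2.661$|), assumes a logistic curve, and computes the largest absolute difference to the placebo response (dose |$0$|) over dose ranges |$[0,d_{\max}]$|. The margin `epsilon = 0.1` is purely hypothetical and chosen for illustration; it is not a value from the article, and the function name `max_diff_to_placebo` is likewise an assumption.

```python
import numpy as np


def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))


def max_diff_to_placebo(intercept, slope, d_max, n_grid=501):
    """Largest absolute difference between the response at doses in
    [0, d_max] and the placebo response at dose 0."""
    doses = np.linspace(0.0, d_max, n_grid)
    return float(np.max(np.abs(logistic(intercept + slope * doses)
                               - logistic(intercept))))


intercept, slope = -1.250, 2.661  # efficacy estimates from the case study
epsilon = 0.1                     # HYPOTHETICAL similarity margin

results = {d_max: max_diff_to_placebo(intercept, slope, d_max)
           for d_max in (0.1, 0.2, 0.5, 1.0)}

for d_max, dist in results.items():
    label = "below margin" if dist < epsilon else "above margin"
    print(f"dose range [0, {d_max}]: max difference {dist:.3f} ({label})")
```

With these inputs, a dose range restricted to very low doses stays within the hypothetical margin of placebo, whereas the full range does not, which is why the dose range should include sufficiently high doses.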

In some settings, the efficacy or toxicity responses are not modeled by binary outcomes, but rather by a continuous response. In the case of two continuous outcomes, Fedorov and Wu (2007) considered normally distributed correlated responses which are dichotomized due to binary utility, and the methodology proposed in this article can be adapted to the situation considered by these authors. A further interesting situation occurs in the case of mixed outcomes, where one of the response variables is continuous and the other is binary. Modeling these types of responses is still a challenging problem. Tao and others (2013) investigated this situation by modeling the multiple endpoints through a joint model constructed with an Archimedean copula. A test approach for these types of outcomes is an interesting topic which we leave for future research.

7. Software

Software in the form of R code together with a sample input data set and complete documentation is available online at https://github.com/kathrinmoellenhoff/Efficacy_Toxicity.

Supplementary material

Supplementary material is available at http://biostatistics.oxfordjournals.org.

Acknowledgments

The authors would like to thank two referees and the Associate Editor for their useful suggestions which greatly improved the manuscript.

Conflict of Interest: None declared.

Funding

The authors gratefully acknowledge financial support by the Collaborative Research Center “Statistical modeling of nonlinear dynamic processes” (SFB 823, Teilprojekt T1) of the German Research Foundation (DFG).

References

Amatya, A. and Demirtas, H. (2015). Multiord: An R package for generating correlated ordinal data. Communications in Statistics-Simulation and Computation 44, 1683–1691.

Berger, R. L. and Hsu, J. C. (1996). Bioequivalence trials, intersection-union tests and equivalence confidence sets. Statistical Science 11, 283–319.

Berger, R. L. (1982). Multiparameter hypothesis testing and acceptance sampling. Technometrics 24, 295–300.

Bradley, R. and Gart, J. (1962). The asymptotic properties of ML estimators when sampling from associated populations. Biometrika 49, 205–214.

Bretz, F., Möllenhoff, K., Dette, H., Liu, W. and Trampisch, M. (2018). Assessing the similarity of dose response and target doses in two non-overlapping subgroups. Statistics in Medicine 37, 722–738.

Bretz, F., Pinheiro, J. C. and Branson, M. (2005). Combining multiple comparisons and modeling techniques in dose-response studies. Biometrics 61, 738–748.

Deldossi, L., Osmetti, S. A. and Tommasi, C. (2019). Optimal design to discriminate between rival copula models for a bivariate binary response. TEST 28, 147–165.

Dette, H., Möllenhoff, K., Volgushev, S. and Bretz, F. (2018). Equivalence of regression curves. Journal of the American Statistical Association 113, 711–729.

Dette, H., Van Hecke, R. and Volgushev, S. (2014). Some comments on copula-based regression. Journal of the American Statistical Association 109, 1319–1324.

Devroye, L. (1986). Sample-based non-uniform random variate generation. In: Proceedings of the 18th Conference on Winter simulation. Washington, DC, USA, pp. 260–265.

Dragalin, V. and Fedorov, V. (2006). Adaptive designs for dose-finding based on efficacy–toxicity response. Journal of Statistical Planning and Inference 136, 1800–1823.

Emrich, L. and Piedmonte, M. (1991). A method for generating high-dimensional multivariate binary variates. The American Statistician 45, 302–304.

Fedorov, V. and Wu, Y. (2007). Dose finding designs for continuous responses and binary utility. Journal of Biopharmaceutical Statistics 17, 1085–1096.

Gaydos, B., Krams, M., Perevozskaya, I., Bretz, F., Liu, Q., Gallo, P., Berry, D., Chuang-Stein, C., Pinheiro, J. and Bedding, A. (2006). Adaptive dose-response studies. Drug Information Journal 40, 451–461.

Glonek, G. and McCullagh, P. (1995). Multivariate logistic models. Journal of the Royal Statistical Society: Series B (Methodological) 57, 533–546.

Gsteiger, S., Bretz, F. and Liu, W. (2011). Simultaneous confidence bands for nonlinear regression models with application to population pharmacokinetic analyses. Journal of Biopharmaceutical Statistics 21, 708–725.

Gumbel, E. J. (1961). Bivariate logistic distributions. Journal of the American Statistical Association 56, 335–349.

Heise, M. and Myers, R. (1996). Optimal designs for bivariate logistic regression. Biometrics, 613–624.

Jhee, S. S., Lyness, W. H., Rojas, P. B., Leibowitz, M. T., Zarotsky, V. and Jacobsen, L. V. (2004). Similarity of insulin detemir pharmacokinetics, safety, and tolerability profiles in healthy Caucasian and Japanese American subjects. The Journal of Clinical Pharmacology 44, 258–264.

Leisch, F., Weingessel, A. and Hornik, K. (1998). On the generation of correlated artificial binary data, Working Papers SFB “Adaptive Information Systems and Modelling in Economics and Management Science”, 13.

Liu, W., Bretz, F., Hayter, A. J. and Wynn, H. P. (2009). Assessing non-superiority, non-inferiority or equivalence when comparing two regression models over a restricted covariate region. Biometrics 65, 1279–1287.

Möllenhoff, K., Bretz, F. and Dette, H. (2020). Equivalence of regression curves sharing common parameters. Biometrics 76, 518–529.

Möllenhoff, K., Dette, H., Kotzagiorgis, E., Volgushev, S. and Collignon, O. (2018). Regulatory assessment of drug dissolution profiles comparability via maximum deviation. Statistics in Medicine 37, 2968–2981.

Murtaugh, P. and Fisher, L. (1990). Bivariate binary models of efficacy and toxicity in dose-ranging trials. Communications in Statistics Theory and Methods 19, 2003–2020.