Flexible evaluation of surrogacy in platform studies

Table 1

Means and standard deviations of the median absolute prediction error based on leave-one-out CV, using the DPM model and the simple multivariate normal model (Simple). |$\hat{D}$| denotes the prediction error relative to the estimate based on the full data model, and |$D$| the prediction error relative to the true treatment effect.

Setting-|$c_z$|-|$c_u$|	DPM |$\hat{D}$|	Simple |$\hat{D}$|	DPM |$D$|	Simple |$D$|
inter-0.0-0.0	0.32 (0.04)	0.82 (0.07)	0.29 (0.04)	0.81 (0.07)
inter-0.0-0.3	0.45 (0.05)	0.87 (0.08)	0.44 (0.04)	0.87 (0.08)
interhide-0.0-0.0	0.49 (0.07)	0.81 (0.08)	0.47 (0.07)	0.81 (0.08)
interhide-0.3-0.0	0.49 (0.06)	0.82 (0.09)	0.46 (0.07)	0.81 (0.09)
linear-0.0-0.0	0.25 (0.02)	0.41 (0.02)	0.23 (0.02)	0.41 (0.02)
linear-0.3-0.0	0.30 (0.02)	0.41 (0.02)	0.29 (0.02)	0.41 (0.02)
linear-0.3-0.3	0.41 (0.04)	0.50 (0.03)	0.41 (0.03)	0.49 (0.03)
manybiom-0.0-0.0	0.44 (0.05)	1.80 (0.12)	0.42 (0.05)	1.80 (0.12)
manybiom-0.3-0.0	0.42 (0.05)	1.82 (0.13)	0.40 (0.05)	1.82 (0.13)
manybiom-0.3-0.3	0.54 (0.06)	1.85 (0.13)	0.54 (0.06)	1.85 (0.13)
nonlinear-0.0-0.0	0.42 (0.04)	0.66 (0.05)	0.40 (0.04)	0.65 (0.05)
nonlinear-0.3-0.0	0.45 (0.04)	0.71 (0.06)	0.43 (0.04)	0.70 (0.06)
nonlinear-0.3-0.3	0.54 (0.06)	0.76 (0.07)	0.53 (0.06)	0.75 (0.06)
nonlinearskew-0.0-0.0	0.41 (0.05)	0.61 (0.05)	0.39 (0.05)	0.61 (0.05)
null-0.0-0.0	0.50 (0.15)	0.86 (0.04)	0.48 (0.16)	0.85 (0.04)
onetrt-0.0-0.0	0.37 (0.07)	0.89 (0.06)	0.34 (0.08)	0.89 (0.06)
onetrt-0.0-0.3	0.54 (0.07)	0.94 (0.06)	0.53 (0.07)	0.94 (0.06)
simple-0.0-0.0	0.26 (0.02)	0.42 (0.02)	0.24 (0.02)	0.41 (0.02)
simple-0.3-0.0	0.26 (0.02)	0.42 (0.02)	0.24 (0.02)	0.41 (0.02)
simple-0.3-0.3	0.38 (0.03)	0.50 (0.03)	0.37 (0.03)	0.49 (0.03)
simplestrong-0.0-0.0	0.46 (0.04)	1.16 (0.06)	0.45 (0.04)	1.16 (0.06)
twotrt-0.0-0.0	1.84 (0.99)	2.20 (0.35)	1.83 (1.00)	2.20 (0.35)
twotrt-0.3-0.0	1.67 (0.77)	2.11 (0.32)	1.67 (0.79)	2.11 (0.33)

Table 2 shows the means and standard deviations over the simulation replicates of the posterior probability |$P(\hat{D} < \hat{D}^0 | \boldsymbol{x}, \boldsymbol{z})$|, that is, the probability that the surrogate has predictive value. With the exception of the linear and simple settings, where the two models are within 0.02 of each other, the DPM model has larger average probabilities of superiority than the simple model in all settings. In the null setting, where there is no surrogate value, the DPM model still has a substantially higher probability of superiority over the null model than the simple model does. Its average probability of 0.53 is not too high, however; rather, the probability of superiority for the simple model (0.37) appears to be too low. The largest absolute probabilities are in the linear (but nonlinear in |$Z_j$|), simple and nonlinear settings, where the model is most easily able to detect differences from the null.

Table 2

Means and standard deviations over the simulation replicates of the posterior probability of superiority over the null model, using the DPM model and the simple multivariate normal model (Simple). |$\hat{D}$| denotes the prediction error relative to the estimate based on the full data DPM model, and |$D$| the prediction error relative to the true treatment effect.

Setting-|$c_z$|-|$c_u$|	DPM |$P(\hat{D} < \hat{D}^0)$|	Simple |$P(\hat{D} < \hat{D}^0)$|	DPM |$P({D} < {D}^0)$|	Simple |$P({D} < {D}^0)$|
inter-0.0-0.0	0.57 (0.04)	0.42 (0.05)	0.57 (0.04)	0.41 (0.05)
inter-0.0-0.3	0.56 (0.04)	0.44 (0.06)	0.56 (0.04)	0.43 (0.06)
interhide-0.0-0.0	0.54 (0.06)	0.45 (0.07)	0.53 (0.06)	0.45 (0.07)
interhide-0.3-0.0	0.54 (0.06)	0.45 (0.08)	0.54 (0.06)	0.45 (0.08)
linear-0.0-0.0	0.80 (0.03)	0.82 (0.01)	0.80 (0.03)	0.82 (0.01)
linear-0.3-0.0	0.81 (0.03)	0.83 (0.02)	0.81 (0.03)	0.83 (0.02)
linear-0.3-0.3	0.80 (0.02)	0.80 (0.02)	0.80 (0.02)	0.80 (0.02)
manybiom-0.0-0.0	0.64 (0.04)	0.31 (0.03)	0.64 (0.04)	0.31 (0.03)
manybiom-0.3-0.0	0.66 (0.04)	0.31 (0.02)	0.66 (0.04)	0.31 (0.02)
manybiom-0.3-0.3	0.63 (0.04)	0.32 (0.03)	0.63 (0.04)	0.32 (0.03)
nonlinear-0.0-0.0	0.71 (0.05)	0.66 (0.06)	0.72 (0.04)	0.66 (0.06)
nonlinear-0.3-0.0	0.70 (0.05)	0.64 (0.06)	0.71 (0.05)	0.63 (0.06)
nonlinear-0.3-0.3	0.70 (0.05)	0.64 (0.05)	0.70 (0.05)	0.64 (0.05)
nonlinearskew-0.0-0.0	0.73 (0.04)	0.68 (0.06)	0.73 (0.04)	0.68 (0.07)
null-0.0-0.0	0.53 (0.19)	0.37 (0.13)	0.52 (0.22)	0.36 (0.14)
onetrt-0.0-0.0	0.43 (0.04)	0.27 (0.03)	0.41 (0.05)	0.25 (0.03)
onetrt-0.0-0.3	0.42 (0.03)	0.31 (0.04)	0.42 (0.03)	0.30 (0.04)
simple-0.0-0.0	0.80 (0.04)	0.82 (0.02)	0.80 (0.04)	0.82 (0.02)
simple-0.3-0.0	0.80 (0.03)	0.82 (0.02)	0.80 (0.03)	0.82 (0.02)
simple-0.3-0.3	0.80 (0.03)	0.80 (0.02)	0.80 (0.03)	0.80 (0.02)
simplestrong-0.0-0.0	0.78 (0.04)	0.77 (0.02)	0.78 (0.04)	0.77 (0.02)
twotrt-0.0-0.0	0.56 (0.07)	0.49 (0.07)	0.56 (0.07)	0.48 (0.07)
twotrt-0.3-0.0	0.57 (0.06)	0.49 (0.06)	0.57 (0.06)	0.49 (0.06)
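
As a minimal sketch of how a probability such as |$P(\hat{D} < \hat{D}^0 | \boldsymbol{x}, \boldsymbol{z})$| could be approximated from posterior output, assume that each MCMC iteration yields one draw of the prediction error under the surrogate model and one under the null model, paired by iteration. The function name `prob_superiority` and the example draws are hypothetical and are not taken from the software accompanying this article.

```python
from typing import Sequence


def prob_superiority(d_model: Sequence[float], d_null: Sequence[float]) -> float:
    """Share of paired posterior draws in which the surrogate model's
    prediction error is strictly smaller than the null model's."""
    if len(d_model) != len(d_null) or len(d_model) == 0:
        raise ValueError("need two non-empty, equal-length sequences of draws")
    wins = sum(1 for d, d0 in zip(d_model, d_null) if d < d0)
    return wins / len(d_model)


# Hypothetical draws, for illustration only (not simulation output).
d_hat = [0.31, 0.45, 0.28, 0.52, 0.40]
d_hat_null = [0.50, 0.44, 0.61, 0.58, 0.39]
print(prob_superiority(d_hat, d_hat_null))  # 0.6
```

Under this reading, a value near 0.5 means the surrogate model predicts no better than the null model, which is how the null setting in Table 2 would be interpreted.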

Figures 1 and 2 show a more detailed summary of the posterior distributions from a single illustrative replicate from the settings with |$c_z = c_u = 0$|. The DPM model creates multiple clusters when appropriate, particularly in the nonlinear and observed interaction settings, to flexibly model complex associations. In the null setting, the DPM model tends to have more precise estimates of the treatment effect on the clinical outcome despite there being no association with the treatment effect on the potential surrogate (Figure S3 of the Supplementary material available at Biostatistics online).

Fig. 1

Medians of leave-one-out predictions versus true values, with contours for the posterior density in the linear and nonlinear settings.

Fig. 2

Medians of leave-one-out predictions versus true values, with contours for the posterior density in the two interaction settings.

For the given sample size configuration, our proposed method has difficulty picking up the appropriate cluster in the one and two treatment settings, and hence it is not able to detect the improved value of the surrogate in the subgroups for which there is surrogate value (Figure S3 of the Supplementary material available at Biostatistics online); however, these are very difficult scenarios. When we consider variation over multiple biomarker groups, as in the manybiom setting, cluster detection improves and we are able to detect the difference in surrogate quality (Figure S6 of the Supplementary material available at Biostatistics online). In settings 6, 7, and 8, where there are clearly defined clusters, the mean (standard deviation) over the simulation replicates of the proportion of clusters that are correctly identified for each leave-one-out group is 0.78 (0.11) in the onetrt setting with |$c_z = c_u = 0$|, 0.80 (0.13) in the onetrt setting with |$c_z = 0.3, c_u = 0$|, 0.48 (0.06) in the twotrt setting with |$c_z = c_u = 0$|, 0.48 (0.06) in the twotrt setting with |$c_z = 0, c_u = 0.3$|, 0.90 (0.05) in the manybiom setting with |$c_z = c_u = 0$|, 0.93 (0.04) in the manybiom setting with |$c_z = 0.3, c_u = 0$|, and 0.94 (0.04) in the manybiom setting with |$c_z = c_u = 0.3$|. Thus, the DPM model identifies the correct cluster a large proportion of the time when there are distinct clusters and enough data.

Additionally, our method is able to identify the correct clustering not only when surrogate quality differs by observed subgroups (as in the inter settings) but also, at least in some replicates, when this variation is based on unobserved latent variables (as in the interhide settings). Figures S7 and S8 of the Supplementary material available at Biostatistics online show the posterior density of |$\hat{D}$| by clusters (identified using our model), with the different colors indicating different clusters with different surrogate quality. The top panel of Figure S7 of the Supplementary material available at Biostatistics online is for the setting where the clusters are based on a latent variable and the lower panel for the setting where the clusters are based on a biomarker-treatment-level observed covariate. Figure S8 of the Supplementary material available at Biostatistics online shows the settings manybiom (top panel) and twotrt (bottom panel). In these settings, the DPM model is able to detect the differential surrogate value over the clusters.

In summary, the simulation results show that we are able to detect and estimate the quality of a surrogate that is useful over all treatments and biomarkers, even when the relationship between the surrogate effect and the outcome effect is complex. In terms of median absolute prediction error, our proposed method outperforms the simple model in every case, even when the true data generating mechanism exactly matches the simple parametric model. Additionally, we show that our proposed method is able to detect variations in surrogate quality in most settings with adequate power. However, to detect this variation on average, we need a large enough subgroup with high surrogate quality and a large difference in surrogate quality between subgroups (i.e., clearly defined clusters).

4.4. Illustrative example

To illustrate how our proposed methods would be used in practice, from evaluation to the use of the evaluated surrogate in the next setting, we consider a single data set generated under the two treatment (twotrt) setting, with |$c_z = c_u = 0$|, and with independent uniform censoring. As in all of the simulation scenarios, the data generation is based on the code used during the planning phase of the ProBio trial and fits one of the scenarios that the trial team believed to be plausible.

We assume that both the clinical outcome and surrogate are observed for groups |$1, \ldots, KM = 63$|⁠, and the clinical outcome is not yet observed in the group |$KM+1 = 64$|⁠. Using our model, we would like to predict the unknown treatment effect on the clinical outcome in this group, based on the observed data in the 63 other treatment by biomarker groups in which both the candidate surrogate endpoint and the clinical outcome are observed. To do this, we fit our model using the available data, which does not include the clinical outcome for group |$KM+1 = 64$|⁠, to obtain samples from the posterior of the treatment effect on the clinical outcome for that group. We compute estimates of the prediction error by running the leave-one-out procedure among the remaining 63 groups with complete data.
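
The leave-one-out procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not our implementation: each group is reduced to a pair of point estimates, the held-out group is dropped entirely from the fit, and an ordinary least squares line (`fit_line`, a hypothetical stand-in) takes the place of the DPM model. All function and variable names are illustrative.

```python
from statistics import median
from typing import Callable, List, Sequence, Tuple

# One group = (treatment effect on surrogate, treatment effect on clinical outcome).
Group = Tuple[float, float]
Predictor = Callable[[float], float]


def fit_line(train: Sequence[Group]) -> Predictor:
    """Stand-in for the model fit: a least squares line, NOT the DPM model."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x: mean_y + slope * (x - mean_x)


def loo_median_abs_error(
    groups: Sequence[Group],
    fit: Callable[[Sequence[Group]], Predictor] = fit_line,
) -> float:
    """Median over groups of |prediction - held-out outcome effect|."""
    groups = list(groups)
    if len(groups) < 3:
        raise ValueError("need at least three groups")
    errors: List[float] = []
    for j, (x_j, y_j) in enumerate(groups):
        train = groups[:j] + groups[j + 1:]  # drop group j from the fit
        predict = fit(train)                 # refit without group j
        errors.append(abs(predict(x_j) - y_j))
    return median(errors)


# Hypothetical toy effects, for illustration only.
toy = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
print(loo_median_abs_error(toy))  # 2.0
```

Passing a different `fit` function lets the same loop be reused with any model that returns a predictor of the outcome effect from the surrogate effect.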

The results are shown in Figure 3, which plots the posterior estimates of the treatment effects. The target is to predict the treatment effect on the clinical outcome for a new group where the clinical outcome is not measured; the median prediction is indicated by the open circle. This particular effect is correctly assigned to the cluster where there is high surrogate value, and over the iterations the cluster is correctly assigned 97.2% of the time.

Fig. 3

Illustrative example. The filled points represent the posterior medians of the treatment effects with fully observed data, while the unfilled point represents the posterior predicted median for the group where the clinical outcome is unobserved and the vertical line is a prediction interval obtained by adding and subtracting the median leave-one-out error in that cluster.

The overall median prediction error of the DPM model is 1.96, but by cluster the median error is 2.22 in cluster 2 and 1.63 in cluster 1. In reference to the null model, which has an overall median prediction error of 2.18, the posterior probability of surrogate value |$P(\hat{D} < \hat{D}^0 | \boldsymbol{x}, \boldsymbol{z})$| is 0.47 overall, 0.41 in cluster 2, and 0.54 in cluster 1. It is evident from these estimates and from Figure 3 that these clusters have differential surrogate value, and the model is able to correctly estimate cluster membership for the new group based on the surrogate and group-level covariates alone.
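
The prediction interval drawn in Figure 3 adds and subtracts the median leave-one-out error of the cluster to which the new group is assigned. A minimal sketch of that arithmetic is given below, assuming the leave-one-out absolute errors and cluster labels are available as plain lists; the function name `cluster_interval` and the numbers in the example are hypothetical and are not the values from our illustrative data set.

```python
from statistics import median
from typing import Sequence, Tuple


def cluster_interval(
    pred_median: float,
    cluster: int,
    loo_errors: Sequence[float],
    loo_clusters: Sequence[int],
) -> Tuple[float, float]:
    """Interval = predicted median -/+ median leave-one-out error in `cluster`."""
    if len(loo_errors) != len(loo_clusters):
        raise ValueError("errors and cluster labels must have equal length")
    in_cluster = [e for e, c in zip(loo_errors, loo_clusters) if c == cluster]
    if not in_cluster:
        raise ValueError("no leave-one-out errors available for this cluster")
    half_width = median(in_cluster)
    return pred_median - half_width, pred_median + half_width


# Hypothetical leave-one-out errors and cluster labels, for illustration only.
errors = [1.0, 2.0, 3.0, 10.0]
clusters = [1, 1, 1, 2]
print(cluster_interval(0.5, 1, errors, clusters))  # (-1.5, 2.5)
```

Using the cluster-specific error rather than the overall error gives a narrower interval in the cluster with higher surrogate value, which is the behavior described for clusters 1 and 2 above.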

For future use of the surrogate, one could fit the model with the new treatment effect on the surrogate to determine which cluster the new trial or group is assigned to, then form posterior predictions of the treatment effect on the clinical outcome as we have done here. It would also be of interest to investigate how the treatment effect on the surrogate and the group-level covariates determine cluster membership so that, if there are observable determinants of the clustering, those could be used in the future (rather than rerunning the full model with the new treatment effect on the surrogate). In this example, the clusters are almost fully identified by two of the treatments, so one could potentially rerun the model using only these two treatments and then use the resulting model moving forward.

5. Discussion

We propose a flexible and efficient model for assessing the value of a potential surrogate in the context of platform trials. Although the goal of our method is the evaluation of surrogates and the characterization of surrogate heterogeneity in a general platform study, our motivating example is a Bayesian adaptive platform trial with the goal of identifying effective biomarker–treatment combinations in metastatic, castrate-resistant prostate cancer. Our simulation study and illustrative example based on the ProBio trial demonstrate that this method is fit-for-purpose for evaluating the potential surrogate, ctDNA, in that study, but also in general platform studies with or without adaptive randomization or early stopping. The value of ctDNA as a potential surrogate is important to characterize as additional treatment–biomarker combinations are added to the platform study, and also for future trials in earlier-stage prostate cancer, where the time to recurrence or death is much longer. If ctDNA is found to be a high-quality surrogate in a biomarker–treatment combination that is planned for study in early-stage prostate cancer, then the results can potentially be put into clinical practice sooner than they would be without the surrogate. Conversely, evidence that ctDNA is a poor surrogate in that group would be equally valuable to ensure that a treatment is not put into practice on the basis of low-quality evidence.

The strengths of the method are the flexibility and data-adaptive nature of the priors in the hierarchical model. In addition, the approach allows for baseline group-level covariates to be conditioned on in a flexible manner. To our knowledge, this is a novel approach for surrogate evaluation. Allowing for inclusion of covariates may make the surrogate quality look poorer, as these covariates may have some predictive ability; however, if these baseline measures are available, this is likely a fairer estimate of the surrogate quality for a surrogate used in practice. Additionally, if surrogate quality varies with these baseline covariates, they allow for better use of the surrogate moving forward.

Due to the clustering in our proposed method, we can assess surrogate quality variation over not only observed subgroups but also data-identified subgroups. The clustering of the DPM is the manner by which it is flexible and data adaptive. More clusters means a more flexible model, and since the number of clusters is data adaptive, our proposed model is as flexible as the data can inform. When distinct clusters are clearly identified by the data, it is worthwhile to explore what those clusters mean and if they indicate differential surrogate value. By its nature, our DPM model allows for such exploration and explanation, something not as straightforward with other types of flexible models, such as splines or locally weighted smoothers. Although simulations show that there are limits to how well this works, particularly when the observed subgroups are very small or the surrogate quality varies only by a latent variable, we are able to detect variations in surrogate quality between large observed subgroups and in some settings with subgroups defined by latent variables. This is an important extension to previous work with the goal of identifying surrogate quality variation (Papanikos and others, 2020) or that allowed for such variation, without evaluating it, in a less flexible manner (Gabriel and others, 2016, 2019).

The shared control arm, adaptive nature of the trial, and stopping rules mean that the treatment effect estimates based on the ProBio trial may be biased, especially for those that stopped due to superiority or futility (Emerson and Fleming, 1990). Exploration of whether additional modifications to our method can be made to further reduce bias is an area of future work. Our method specifies a parametric model in the first stage of the hierarchy and a nonparametric model (DPM) at the second stage. Completely nonparametric hierarchical DPMs have been developed (Teh and others, 2006). Another avenue for future work would be to implement such mixtures to flexibly model the treatment effects at the first stage of the hierarchy as well. Finally, although we demonstrate how independent censoring can be accounted for, investigation of dependent censoring may be useful for trial settings without registers and whose outcomes do not involve all-cause death.

Software

Software in the form of an R package and complete documentation is available on the corresponding author’s GitHub at https://github.com/sachsmc/dpsurrogate.

Supplementary material

Supplementary material is available online at http://biostatistics.oxfordjournals.org.

Acknowledgments

Conflict of Interest: None declared.

Funding

Swedish Research Council (2019-00227 to M.C.S.); Swedish Research Council (2017-01898 to E.E.G.); and National Institutes of Health (R01 HL158963 to M.J.D.).

References

Baker, S. G. (2006). A simple meta-analytic approach for using a binary surrogate endpoint to predict the effect of intervention on true endpoint. Biostatistics 7, 58–70.

Burzykowski, T., Molenberghs, G. and Buyse, M. (2006). The Evaluation of Surrogate Endpoints. New York: Springer Science & Business Media.

Buyse, M., Molenberghs, G., Burzykowski, T., Renard, D. and Geys, H. (2000). The validation of surrogate endpoints in meta-analyses of randomized experiments. Biostatistics 1, 49–67.

Carreras, M. and Brannath, W. (2013). Shrinkage estimation in two-stage adaptive designs with midtrial treatment selection. Statistics in Medicine 32, 1677–1690.

Crippa, A., De Laere, B., Discacciati, A., Larsson, B., Connor, J. T., Gabriel, E. E., Thellenberg, C., Jänes, E., Enblad, G., Ullen, A. and others. (2020). The ProBio trial: molecular biomarkers for advancing personalized treatment decision in patients with metastatic castration-resistant prostate cancer. Trials 21, 1–10.

Dahl, D. B. (2006). Model-based clustering for expression data via a Dirichlet process mixture model. Bayesian Inference for Gene Expression and Proteomics 4, 201–218.

Dai, J. Y. and Hughes, J. P. (2012). A unified procedure for meta-analytic evaluation of surrogate end points in randomized clinical trials. Biostatistics 13, 609–624.

Daniels, M. J. and Hughes, M. D. (1997). Meta-analysis for the evaluation of potential surrogate markers. Statistics in Medicine 16, 1965–1982.

De Laere, B., Crippa, A., Discacciati, A., Larsson, B., Oldenburg, J., Mortezavi, A., Ost, P., Eklund, M., Lindberg, J., Grönberg, H. and others. (2022). Clinical trial protocol for ProBio: an outcome-adaptive and randomised multiarm biomarker-driven study in patients with metastatic prostate cancer. European Urology Focus 8, 1617–1621.

Emerson, S. S. and Fleming, T. R. (1990). Parameter estimation following group sequential hypothesis testing. Biometrika 77, 875–892.

FDA-NIH Biomarker Working Group. (2016). BEST (Biomarkers, Endpoints, and Other Tools) Resource [Internet]. Silver Spring (MD): Food and Drug Administration (US); Bethesda (MD): National Institutes of Health (US); 2016. https://www.ncbi.nlm.nih.gov/books/NBK326791/.

Gabriel, E. E., Daniels, M. J. and Halloran, M. E. (2016). Comparing biomarkers as trial level general surrogates. Biometrics 72, 1046–1054.

Gabriel, E. E., Sachs, M. C., Daniels, M. J. and Halloran, M. E. (2019). Optimizing and evaluating biomarker combinations as trial-level general surrogates. Statistics in Medicine 38, 1135–1146.

Gail, M., Pfeiffer, R., van Houwelingen, H. and Carroll, R. (2000). On meta-analytic assessment of surrogate outcomes. Biostatistics 1, 231–246.

Korn, E. L., Albert, P. S. and McShane, L. M. (2005). Assessing surrogates as trial endpoints using mixed models. Statistics in Medicine 24, 163–182.

Li, Y., Taylor, J. M. and Elliott, M. R. (2010). A Bayesian approach to surrogacy assessment using principal stratification in clinical trials. Biometrics 66, 523–531.

Meyer, E. L., Mesenbrink, P., Dunger-Baldauf, C., Fülle, H.-J., Glimm, E., Li, Y., Posch, M. and König, F. (2020). The evolution of master protocol clinical trial designs: a systematic literature review. Clinical Therapeutics 42, 1330–1360.

Müller, P., Erkanli, A. and West, M. (1996). Bayesian curve fitting using multivariate normal mixtures. Biometrika 83, 67–79.

Murray, J. S. and Reiter, J. P. (2016). Multiple imputation of missing categorical and continuous values via Bayesian mixture models with local dependence. Journal of the American Statistical Association 111, 1466–1479.

Papanikos, T., Thompson, J. R., Abrams, K. R., Städler, N., Ciani, O., Taylor, R. and Bujkiewicz, S. (2020). Bayesian hierarchical meta-analytic methods for modeling surrogate relationships that vary across treatment classes using aggregate data. Statistics in Medicine 39, 1103–1124.

Plummer, M. (2021). rjags: Bayesian Graphical Models using MCMC. R package version 4-11. https://cran.r-project.org/package=rjags.

Qi, X., Zhou, S. and Plummer, M. (2022). On Bayesian modeling of censored data in JAGS. BMC Bioinformatics 23, 1–13.

Ross, G. J. and Markwick, D. (2020). Dirichletprocess: Build Dirichlet Process Objects for Bayesian Modelling. R package version 0.4.0. https://cran.r-project.org/package=dirichletprocess.

Shahbaba, B. and Neal, R. (2009). Nonlinear models using Dirichlet process mixtures. Journal of Machine Learning Research 10, 1829–1850.

Teh, Y. W., Jordan, M. I., Beal, M. J. and Blei, D. M. (2006). Hierarchical Dirichlet processes. Journal of the American Statistical Association 101, 1566–1581.

Vandekerkhove, G., Struss, W. J., Annala, M., Kallio, H. M., Khalaf, D., Warner, E. W., Herberts, C., Ritch, E., Beja, K., Loktionova, Y. and others. (2019). Circulating tumor DNA abundance and potential utility in de novo metastatic prostate cancer. European Urology 75, 667–675.

Wade, S., Mongelluzzo, S. and Petrone, S. (2011). An enriched conjugate prior for Bayesian nonparametric inference. Bayesian Analysis 6, 359–385.