Statistical methods for biomarker data pooled from multiple nested case–control studies

Table 1.

Comparison of operating characteristics for |$\beta_x$| under the model |$\textrm{logit}(P(Y_{sji}=1|X_{sji})) = \beta_{0sj} + \beta_xX_{sji}$|, for naive (|$\hat{\beta}_{N}$|), internalized (|$\hat{\beta}_{IN}$|), full calibration (|$\hat{\beta}_{FC}$|), and two-stage (|$\hat{\beta}_{TS}$|) methods

	Mean percent bias (SE)				MSE				Coverage rate
\|$\beta_x$\|	\|$\hat{\beta}_N$\|	\|$\hat{\beta}_{IN}$\|	\|$\hat{\beta}_{FC}$\|	\|$\hat{\beta}_{TS}$\|	\|$\hat{\beta}_N$\|	\|$\hat{\beta}_{IN}$\|	\|$\hat{\beta}_{FC}$\|	\|$\hat{\beta}_{TS}$\|	\|$\hat{\beta}_N$\|	\|$\hat{\beta}_{IN}$\|	\|$\hat{\beta}_{FC}$\|	\|$\hat{\beta}_{TS}$\|
log(1.25)	\|$-$\|29.4 (0.029)	\|$-$\|3.1 (0.037)	0.1 (0.038)	\|$-$\|0.8 (0.038)	0.0051	0.0014	0.0014	0.0014	0.37	0.96	0.95	0.96
log(1.50)	\|$-$\|29.0 (0.032)	\|$-$\|3.2 (0.040)	0.0 (0.042)	\|$-$\|0.9 (0.042)	0.0149	0.0018	0.0018	0.0018	0.05	0.95	0.94	0.95
log(1.75)	\|$-$\|28.6 (0.035)	\|$-$\|3.5 (0.043)	\|$-$\|0.1 (0.045)	\|$-$\|1.0 (0.045)	0.0269	0.0023	0.0021	0.0021	0.01	0.93	0.94	0.95
log(2.00)	\|$-$\|28.2 (0.039)	\|$-$\|3.5 (0.049)	0.0 (0.051)	\|$-$\|1.0 (0.051)	0.0396	0.0030	0.0027	0.0027	0.00	0.91	0.93	0.93
log(2.25)	\|$-$\|28.0 (0.042)	\|$-$\|3.6 (0.052)	\|$-$\|0.1 (0.055)	\|$-$\|1.1 (0.055)	0.0532	0.0036	0.0030	0.0031	0.00	0.90	0.94	0.94
log(2.50)	\|$-$\|27.9 (0.044)	\|$-$\|3.9 (0.055)	\|$-$\|0.3 (0.058)	\|$-$\|1.3 (0.058)	0.0671	0.0043	0.0034	0.0035	0.00	0.90	0.96	0.94

Percent bias and MSE are computed as |$(\hat{\beta}-\beta)/\beta$| (expressed as a percentage) and |$(\beta - \hat{\beta})^2$|, respectively, and the reported values are averages over 1000 simulations. Standard error is the square root of the empirical variance over all replicates. The coverage rate is the proportion of simulations whose estimated 95% confidence intervals covered the true effect |$\beta_x$|.
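The summary measures defined in the table note can be sketched in code as follows. This is a minimal illustration, not the authors' implementation: the function name, its inputs, the use of Wald-type intervals (estimate plus or minus 1.96 standard errors), and the sample-variance divisor are all assumptions made for the example.

```python
import numpy as np

def operating_characteristics(beta_hat, se_hat, beta_true, z=1.96):
    """Summarize simulation replicates for one estimator.

    beta_hat, se_hat: point estimates and their estimated standard
    errors, one entry per replicate (hypothetical inputs).
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    se_hat = np.asarray(se_hat, dtype=float)

    # Mean percent bias: average of (beta_hat - beta) / beta, times 100.
    pct_bias = 100.0 * np.mean((beta_hat - beta_true) / beta_true)

    # MSE: average squared distance from the true value.
    mse = np.mean((beta_true - beta_hat) ** 2)

    # Empirical SE: square root of the variance across replicates
    # (ASSUMPTION: sample variance with divisor n - 1).
    emp_se = np.std(beta_hat, ddof=1)

    # Coverage: share of intervals containing the true value
    # (ASSUMPTION: Wald-type 95% intervals).
    lower = beta_hat - z * se_hat
    upper = beta_hat + z * se_hat
    coverage = np.mean((lower <= beta_true) & (beta_true <= upper))

    return {"pct_bias": pct_bias, "mse": mse,
            "emp_se": emp_se, "coverage": coverage}
```

Calling the function once per method and per true effect size would produce one row of a table laid out like Table 1.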

We also performed simulations that fixed the total sample size at 1000 participants while varying the calibration subset size across 30, 50, and 150 subjects (participation rates of 3%, 5%, and 15%, respectively). As shown in Figure 1, the full calibration method offered nearly unbiased point estimates at all calibration study sizes. With larger calibration study sizes, the MSEs decreased as a result of the improvement in efficiency. However, the internalized method estimates showed increasing downward bias as the proportion of subjects participating in the calibration subset increased, owing to increasingly differential calibration of cases and controls. As calibration study size increased, the two-stage point estimates became less biased and their MSEs decreased, owing to the improved bias and efficiency of the calibration parameter estimates.

Fig. 1.

Comparison of methods as the number of participants in the calibration study increases. The number of subjects in each study remains fixed at 1000, or equivalently, 500 case–control pairs. The calibration study participation rates considered are 3%, 5%, and 15%, or 30, 50, and 150 individuals, respectively. Panels a–c depict the percent bias of the parameter estimate while panels d–f display the MSE of the estimate.

3.2. Model with an interaction term

For the simulations involving a model with an interaction term, we also generated |$V_{sji}$| in the multivariate normal model for each stratum such that

$$\begin{equation*} \begin{split} \begin{pmatrix} W_{sji} \\ V_{sji} \\ e_{sji} \end{pmatrix} & \sim \textrm{MVN} \left( \begin{pmatrix} \mu_{ws} \\ \mu_v \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^2_{ws} & \sigma_{wvs} & 0 \\ \cdot & \sigma_v^2 & 0 \\ \cdot & \cdot & \sigma^2_e \end{pmatrix}\!\!\right), \\ X_{sji} & = a_s +b_sW_{sji} +e_{sji}, \end{split} \end{equation*}$$

where |$\mu_{ws} = (\mu_x-a_s)/b_s$|, |$\sigma_{wvs} = \text{Cov}(W_{sji},V_{sji})$|, and |$\sigma_{ws}^2 =(\sigma_x^2-\sigma^2_e)/b_s^2$|. We again used 1:1 matching and omitted other covariates such that the risk model was |$\text{logit}(P(Y_{sji}=1|X_{sji},V_{sji})) = \beta_{0sj} + \beta_x X_{sji} + \beta_v V_{sji} + \beta_{xv} X_{sji}V_{sji}$|. As before, we assumed four studies contributed to the analysis, each with 1000 total subjects and 100 individuals in the calibration subset. We set |$\mu_x=\mu_v=0$|, |$\sigma^2_x=\sigma_v^2=1$|, |$\boldsymbol{a}=(-3,1,-1,3)$|, |$\boldsymbol{b}=(0.5, 0.75, 1.25, 1.5)$|, and |$\text{Corr}(X_{sji},V_{sji})=0.2$|, which in turn induced |$\sigma_{wvs}$| for each study. We considered the same range of RRs for the main effect of the biomarker measurement and also chose four combinations of |$(\beta_{v},\beta_{xv})$| to address all possible qualitative effects of the covariate and interaction term, namely |$(\exp(\beta_v),\exp(\beta_{xv})) \in \{(0.8,0.8),(0.8,1.2),(1.2,0.8),(1.2,1.2)\}$|. The simulation results for |$(\exp(\beta_v),\exp(\beta_{xv}))=(1.2,1.2)$| are reported in Table 2, while the results for |$(\exp(\beta_v),\exp(\beta_{xv})) \in \{(0.8,0.8),(0.8,1.2),(1.2,0.8)\}$| are presented in Section 7 of the supplementary material available at Biostatistics online.
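As a rough sketch of this data-generating step, the code below draws |$(W_{sji}, V_{sji}, e_{sji})$| for one study and forms |$X_{sji}$|. It covers the exposure and covariate draws only, not outcome generation or matching. The value of |$\sigma^2_e$| is not given in this section, so the 0.25 used here is a placeholder assumption, and the function and constant names are hypothetical. The induced covariance uses the fact that |$\text{Cov}(X_{sji},V_{sji}) = b_s\sigma_{wvs}$| because |$e_{sji}$| is independent of |$V_{sji}$|.

```python
import numpy as np

# Values stated in the text.
MU_X, MU_V = 0.0, 0.0
SIGMA2_X, SIGMA2_V = 1.0, 1.0
A = np.array([-3.0, 1.0, -1.0, 3.0])
B = np.array([0.5, 0.75, 1.25, 1.5])
CORR_XV = 0.2

# ASSUMPTION: sigma_e^2 is not stated in this section; 0.25 is a placeholder.
SIGMA2_E = 0.25

def simulate_study(s, n, rng):
    """Draw (W, V, X) for study s (0-based index) with n subjects."""
    a_s, b_s = A[s], B[s]
    mu_w = (MU_X - a_s) / b_s
    sigma2_w = (SIGMA2_X - SIGMA2_E) / b_s ** 2
    # Corr(X, V) = b_s * sigma_wv / (sigma_x * sigma_v), solved for sigma_wv.
    sigma_wv = CORR_XV * np.sqrt(SIGMA2_X * SIGMA2_V) / b_s
    mean = np.array([mu_w, MU_V, 0.0])
    cov = np.array([
        [sigma2_w, sigma_wv, 0.0],
        [sigma_wv, SIGMA2_V, 0.0],
        [0.0,      0.0,      SIGMA2_E],
    ])
    w, v, e = rng.multivariate_normal(mean, cov, size=n).T
    x = a_s + b_s * w + e  # reference-laboratory scale
    return w, v, x
```

By construction, |$X_{sji}$| has mean |$\mu_x$| and variance |$\sigma^2_x$| in every study, while the local measurement |$W_{sji}$| differs across studies through |$a_s$| and |$b_s$|.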

Table 2.

Comparison of operating characteristics for (|$\beta_x$|, |$\beta_v$|, |$\beta_{xv}$|) for naive (N), internalized (IN), full calibration (FC), and two-stage (TS) methods under the model |$\text{logit}(P(Y_{sji}=1|X_{sji}, V_{sji}, \boldsymbol{Z}_{sji}))= \beta_{0sj} + \beta_xX_{sji}+\beta_vV_{sji}+\beta_{xv}X_{sji}V_{sji}$|, with |$(\exp(\beta_v),\exp(\beta_{xv}))=(1.2,1.2)$|

	Percent bias (SE) of \|$\beta_x$\|				MSE of \|$\beta_x$\|				Coverage rate of \|$\beta_x$\|
\|$\beta_x$\|	N	IN	FC	TS	N	IN	FC	TS	N	IN	FC	TS
\|$\log(0.5)$\|	\|$-$\|25.8 (0.036)	\|$-$\|2.9 (0.045)	0.9 (0.048)	0.4 (0.048)	0.0342	0.0023	0.0023	0.0023	0.00	0.95	0.96	0.96
\|$\log(0.8)$\|	\|$-$\|29.4 (0.031)	\|$-$\|2.7 (0.037)	0.6 (0.039)	0.2 (0.039)	0.0050	0.0015	0.0015	0.0015	0.41	0.96	0.96	0.95
\|$\log(1.2)$\|	\|$-$\|28.6 (0.030)	\|$-$\|2.8 (0.037)	0.3 (0.038)	\|$-$\|0.1 (0.038)	0.0038	0.0014	0.0015	0.0014	0.54	0.96	0.96	0.96
\|$\log(1.5)$\|	\|$-$\|27.6 (0.032)	\|$-$\|2.8 (0.041)	0.4 (0.043)	\|$-$\|0.2 (0.043)	0.0140	0.0018	0.0018	0.0018	0.06	0.96	0.95	0.95
\|$\log(2.0)$\|	\|$-$\|25.7 (0.040)	\|$-$\|2.6 (0.048)	0.9 (0.050)	0.8 (0.050)	0.0333	0.0026	0.0026	0.0026	0.00	0.95	0.96	0.95
\|$\log(2.5)$\|	\|$-$\|23.4 (0.045)	\|$-$\|2.5 (0.054)	1.4 (0.057)	1.1 (0.057)	0.0478	0.0034	0.0035	0.0034	0.00	0.96	0.96	0.95

	Percent bias (SE) of \|$\beta_v$\|				MSE of \|$\beta_v$\|				Coverage rate of \|$\beta_v$\|
\|$\beta_x$\|	N	IN	FC	TS	N	IN	FC	TS	N	IN	FC	TS
\|$\log(0.5)$\|	\|$-$\|26.9 (0.035)	\|$-$\|6.5 (0.034)	\|$-$\|1.0 (0.035)	\|$-$\|1.0 (0.035)	0.0019	0.0012	0.0012	0.0012	0.88	0.96	0.95	0.95
\|$\log(0.8)$\|	\|$-$\|15.6 (0.034)	\|$-$\|2.7 (0.033)	\|$-$\|0.9 (0.034)	\|$-$\|1.0 (0.034)	0.0014	0.0011	0.0011	0.0011	0.93	0.95	0.95	0.94
\|$\log(1.2)$\|	\|$-$\|0.8 (0.033)	1.5 (0.033)	0.3 (0.033)	0.3 (0.033)	0.0011	0.0011	0.0011	0.0011	0.95	0.95	0.95	0.95
\|$\log(1.5)$\|	5.9 (0.034)	3.1 (0.033)	0.3 (0.033)	0.2 (0.033)	0.0012	0.0011	0.0011	0.0011	0.95	0.96	0.96	0.96
\|$\log(2.0)$\|	12.1 (0.035)	5.4 (0.035)	0.6 (0.035)	0.4 (0.035)	0.0013	0.0012	0.0012	0.0012	0.94	0.96	0.96	0.96
\|$\log(2.5)$\|	15.0 (0.035)	7.6 (0.037)	1.3 (0.037)	1.2 (0.037)	0.0015	0.0014	0.0014	0.0014	0.94	0.95	0.95	0.95

	Percent bias (SE) of \|$\beta_{xv}$\|				MSE of \|$\beta_{xv}$\|				Coverage rate of \|$\beta_{xv}$\|
\|$\beta_x$\|	N	IN	FC	TS	N	IN	FC	TS	N	IN	FC	TS
\|$\log(0.5)$\|	\|$-$\|82.8 (0.011)	1.3 (0.042)	2.3 (0.043)	1.6 (0.043)	0.0063	0.0018	0.0019	0.0018	0.00	0.95	0.95	0.96
\|$\log(0.8)$\|	\|$-$\|88.7 (0.010)	0.4 (0.038)	0.9 (0.038)	0.3 (0.038)	0.0072	0.0014	0.0015	0.0015	0.00	0.95	0.95	0.95
\|$\log(1.2)$\|	\|$-$\|93.9 (0.010)	0.4 (0.037)	0.0 (0.038)	\|$-$\|0.7 (0.038)	0.0081	0.0014	0.0014	0.0014	0.00	0.95	0.95	0.95
\|$\log(1.5)$\|	\|$-$\|96.6 (0.010)	1.9 (0.039)	0.3 (0.039)	\|$-$\|0.4 (0.039)	0.0086	0.0015	0.0015	0.0015	0.00	0.95	0.95	0.95
\|$\log(2.0)$\|	\|$-$\|99.0 (0.011)	0.7 (0.044)	\|$-$\|2.1 (0.044)	\|$-$\|2.7 (0.044)	0.0090	0.0019	0.0020	0.0020	0.00	0.95	0.96	0.95
\|$\log(2.5)$\|	\|$-$\|100.3 (0.013)	3.2 (0.048)	\|$-$\|1.1 (0.049)	\|$-$\|1.9 (0.048)	0.0093	0.0023	0.0024	0.0023	0.00	0.95	0.96	0.95

We focus first on the operating characteristics of |$\hat{\beta}_x$| (Table 2, and Table 1 and Figure 3 of Section 7 of the supplementary material available at Biostatistics online). The mean |$\hat{\beta}_x$| estimate under the internalized method was consistently biased downward by 2–4% across all effect sizes and all combinations of |$(\exp(\beta_v),\exp(\beta_{xv}))$|. The full calibration and two-stage point estimates were separated by no more than 1% and were approximately unbiased for the true effect. All three calibration methods performed markedly better than the naive approach, whose percent bias ranged from |$-$|32% to |$-$|23%. The MSEs of the two-stage and aggregated methods were similar across all effect combinations.

For |$\beta_v$|, the full calibration and two-stage approaches also performed similarly with regard to percent bias (Table 2, and Table 2 and Figure 3 of Section 7 of the supplementary material available at Biostatistics online). The internalized estimate was least biased when |$\beta_x$| was near the null but became more biased for increasingly protective or deleterious effects. For example, under the internalized method, when |$(\exp(\beta_x), \exp(\beta_v), \exp(\beta_{xv}))=(1.2,1.2,1.2)$|, the percent bias of |$\hat{\beta}_v$| was 1.5%, and when |$(\exp(\beta_x), \exp(\beta_v), \exp(\beta_{xv}))=(2.5,1.2,1.2)$|, it was 7.6%. The naive estimates of |$\beta_v$| were increasingly biased as |$|\beta_x|$| increased, with biases reaching |$-$|28.3% and 23.0% in some settings.

Across all combinations of effects, the percent bias in estimates of |$\beta_{xv}$| from all three calibration methods was contained within (|$-$|4.9%, 3.9%) (Table 2, and Table 3 and Figure 3 of Section 7 of the supplementary material available at Biostatistics online). The |$\hat{\beta}_{xv}$| estimates from the calibration methods were generally similar and strongly outperformed the naive method, whose percent bias in |$\hat{\beta}_{xv}$| ranged from |$-$|105.4% to |$-$|81.7%.

Table 3.

Point and RR estimates for the association of circulating 25(OH)D|$^a$| and stroke, adjusting for years of follow-up after blood draw, BMI (overweight or not), smoking (never/ever), family history of myocardial infarction (yes/no), hypertension (yes/no), and diabetes (yes/no)

Method	\|$\hat{\beta}_x$\|	RR	RR 95% CI
Internalized	\|$-$\|0.051	0.950	(0.721, 1.253)
Full calibration	\|$-$\|0.046	0.955	(0.715, 1.276)
Two-stage	\|$-$\|0.048	0.953	(0.717, 1.266)
Naive	\|$-$\|0.018	0.983	(0.833, 1.159)

|$^a$|Estimates correspond to a 20 nmol/L increase in circulating 25(OH)D.

4. Applied example

We completed two data examples to illustrate the methods. In the first example, we investigated the impact of circulating 25-hydroxyvitamin D (25(OH)D) levels on risk of stroke. In the second example, we investigated the impact of 25(OH)D levels and their interaction with a dichotomized body mass index (BMI) term on the risk of a composite outcome of fatal or nonfatal stroke or myocardial infarction (henceforth referred to as the CVD endpoint). In both examples, we matched each case to a single control based on sex and age at blood draw.

We applied the two aggregated methods (i.e. full calibration and internalized), the two-stage method, and the naive method to data combined from three large prospective cohort studies in the United States: the HPFS (Wu and others, 2011), the NHS1 (Eliassen and others, 2016), and the NHS2 (Eliassen and others, 2011). The HPFS began enrollment in 1986 and includes 51 529 male health professionals aged 40–75 years at baseline. The NHS1 enrolled 121 701 female nurses aged 30–55 years at baseline in 1976. The NHS2, a younger counterpart to the NHS1, was established in 1989 with the enrollment of 116 671 female nurses aged 25–42 years at baseline. In each cohort, participants completed biennial questionnaires providing information about medical history, diet, and lifestyle conditions. Between 1989 and 1997, each study completed laboratory assays on blood samples for a host of biomarkers, including 25(OH)D, from a subset of participants. Subjects with a previous cancer diagnosis were not eligible for random selection. Individuals were excluded from the pooled analysis if they lacked 25(OH)D measurements or outcome data on stroke or myocardial infarction.

Each study obtained calibration data among a subset of controls by re-assaying their blood samples at Heartland Assays, LLC between 2011 and 2013. Circulating 25(OH)D levels were modeled continuously and reported using 20 nmol/L increments. Table 4 of the supplementary material available at Biostatistics online lists information about the main studies and the calibration subsets, including the parameter estimates of the study-specific calibration models.
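To illustrate what a study-specific calibration model involves, the sketch below fits a straight line relating reference-laboratory values to local-laboratory values among re-assayed controls and then applies it to other local measurements. It assumes the linear form |$X = a_s + b_sW + e$| used in the simulations and ordinary least squares; the function names are hypothetical and the code is not the authors' implementation.

```python
import numpy as np

def fit_calibration(w_local, x_reference):
    """Least-squares fit of x_reference = a + b * w_local; returns (a, b).

    w_local: local-laboratory values for re-assayed controls.
    x_reference: the same subjects' reference-laboratory values.
    """
    w = np.asarray(w_local, dtype=float)
    x = np.asarray(x_reference, dtype=float)
    # np.polyfit returns coefficients from highest degree down: (slope, intercept).
    b, a = np.polyfit(w, x, deg=1)
    return a, b

def calibrate(w_local, a, b):
    """Map local-laboratory measurements onto the reference-laboratory scale."""
    return a + b * np.asarray(w_local, dtype=float)
```

In a pooled analysis one such pair |$(\hat{a}_s, \hat{b}_s)$| would be estimated for each contributing study, since each study used its own laboratory.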

Table 4.

Point estimates with 95% confidence intervals for circulating 25(OH)D, BMI|$^a$|, and their interaction with CVD as the outcome event, adjusting for years of follow-up after blood draw, smoking (never/ever), family history of myocardial infarction (yes/no), hypertension (yes/no), and diabetes (yes/no)

Method	25(OH)D	BMI	BMI \|$\times$\| 25(OH)D
Internalized	\|$-$\|0.091 (⁠\|$-$\|0.232, 0.050)	\|$-$\|0.592 (⁠\|$-$\|0.663, \|$-$\|0.521)	\|$-$\|0.072 (⁠\|$-$\|0.267, 0.123)
Full calibration	\|$-$\|0.089 (⁠\|$-$\|0.246, 0.068)	\|$-$\|0.587 (⁠\|$-$\|0.661, \|$-$\|0.513)	\|$-$\|0.069 (⁠\|$-$\|0.271, 0.133)
Two-stage	\|$-$\|0.086 (⁠\|$-$\|0.244, 0.072)	\|$-$\|0.569 (⁠\|$-$\|0.643, \|$-$\|0.495)	\|$-$\|0.066 (⁠\|$-$\|0.266, 0.135)
Naive	\|$-$\|0.173 (⁠\|$-$\|0.380, 0.032)	\|$-$\|0.163 (⁠\|$-$\|0.374, 0.053)	\|$-$\|0.082 (⁠\|$-$\|0.180, 0.016)

|$^a$|BMI is treated as a dichotomized variable taking value 1 if less than 25 kg/m|$^2$| and 0 otherwise.

In the first example involving the stroke endpoint, we pooled 179 matched case–control pairs. Previous work by Sun and others (2012) showed that individuals with 25(OH)D measurements in the top tertile of the population had reduced risk of stroke compared to individuals with measurements in the bottom tertile. Our analyses coarsely matched on age (grouped into tertiles in each cohort) and adjusted for years of follow-up after blood draw, smoking status (never/ever), family history of myocardial infarction (yes/no), personal history of hypertension (yes/no), BMI (less than, or greater than or equal to, 25 kg/m|$^2$|), and personal history of diabetes (yes/no). The internalized, full calibration, and two-stage methods all demonstrated a nonsignificant inverse association between 25(OH)D levels and risk of stroke, with RRs of 0.95, 0.96, and 0.95, respectively (Table 3). The naive approach, which pooled all local laboratory measurements directly without calibration, yielded an RR of 0.98 and a confidence interval that was narrower than those estimated under the calibration methods.

In the second example involving the CVD endpoint, we pooled 624 case–control pairs. Some literature suggests that vitamin D deficiency is more deleterious among individuals with high BMI (Levi-Vardi and Yagil, 2017). We dichotomized BMI into a binary variable with value 1 if the subject was not overweight (BMI|$<$|25 kg/m|$^2$|) and value 0 otherwise. In addition to the matching factors, analyses were adjusted for years of follow-up after blood draw, smoking status (never/ever), family history of myocardial infarction (yes/no), personal history of hypertension (yes/no), and personal history of diabetes (yes/no). The estimated regression coefficients and confidence intervals for 25(OH)D, BMI, and their interaction are presented in Table 4. All pooling methods indicated that having higher circulating 25(OH)D and a BMI less than 25 kg/m|$^2$| were associated with a lower risk of CVD. Based on the results in Table 4, the RR from the aggregated and two-stage methods associated with a 20 nmol/L increase in 25(OH)D among subjects with a BMI less than 25 kg/m|$^2$| ranged from 0.85 to 0.86, while the RR from the naive analysis was 0.78. Among overweight subjects, the corresponding RR from the calibration methods ranged from 0.91 to 0.92, and the naive RR estimate was 0.84 (see Table 5 of Section 7 in the supplementary material available at Biostatistics online). As this illustrative example demonstrates, the bias in the naive RR estimate for models including an interaction term is not necessarily toward the null. The bias may be toward the alternative hypothesis and thus lead to false-positive scientific findings.
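The stratum-specific RRs quoted above follow from the coefficients in Table 4: with the BMI indicator equal to 1 for BMI below 25 kg/m|$^2$|, the RR per 20 nmol/L increase is |$\exp(\hat{\beta}_x + \hat{\beta}_{xv})$| for non-overweight subjects and |$\exp(\hat{\beta}_x)$| for overweight subjects. A small sketch of that arithmetic, using the rounded point estimates printed in Table 4 (the dictionary and function names are hypothetical):

```python
import math

# Rounded point estimates transcribed from Table 4:
# (coefficient for 25(OH)D, coefficient for BMI x 25(OH)D).
TABLE4 = {
    "Internalized": (-0.091, -0.072),
    "Full calibration": (-0.089, -0.069),
    "Two-stage": (-0.086, -0.066),
    "Naive": (-0.173, -0.082),
}

def rr_per_20nmol(beta_x, beta_xv, bmi_indicator):
    """RR for a 20 nmol/L increase in 25(OH)D at a given BMI indicator.

    bmi_indicator is 1 for BMI < 25 kg/m^2 and 0 otherwise.
    """
    return math.exp(beta_x + beta_xv * bmi_indicator)

for method, (bx, bxv) in TABLE4.items():
    lean = rr_per_20nmol(bx, bxv, 1)
    overweight = rr_per_20nmol(bx, bxv, 0)
    print(f"{method}: BMI<25 RR={lean:.3f}, BMI>=25 RR={overweight:.3f}")
```

Because the inputs are coefficients rounded to three decimals, the computed values can differ from the reported RRs in the last digit.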

5. Discussion

In this work, we proposed statistical methods for analyzing calibrated biomarker data pooled across multiple nested case–control studies. Our methods facilitate inference on the main effect of the biomarker as well as a biomarker–covariate interaction term. In keeping with common practice, we estimated study-specific calibration models from subsets of controls reassayed at the reference lab. The methods developed here can also be used to contend with exposure measurement error when pooling data from multiple studies with internal validation subsets.

Several observations stem from our work. We consistently observed that the full calibration and two-stage methods offered similar point estimates, standard errors, and coverage rates. In simulation, the difference in effect estimates between the full calibration and two-stage methods was less than 2% regardless of the inclusion of an interaction term. When incorporating an interaction term, the direction and degree of bias in the |$\hat{\beta}_v$| estimate were not consistent and depended on the direction and magnitude of |$\beta_x$| and |$\beta_v$|.

Comparison of the two aggregated methods (i.e. the full calibration and internalized methods) showed that, regardless of whether one is interested in the main effect of the biomarker or the interaction term, the full calibration approach is the preferred aggregated method. Average percent bias in the |$\beta_x$| and |$\beta_{xv}$| estimates was minimized by the full calibration method. Under a controls-only calibration scheme, |$(\hat{a}_{s,co},\hat{b}_{s,co})$| were slightly biased for the parameters in model (2.2) for strong biomarker effects. Under the full calibration approach, any bias in these estimates is incorporated uniformly in both cases and controls, so that bias in the |$\beta_x$| and |$\beta_{xv}$| estimates is minimized. In fact, the intercept |$\hat{a}_{s,co}$| cancels out in the approximate conditional likelihood contribution for the full calibration approach, which does not occur for the internalized method. In practice, the size of the calibration subset is often determined by logistical concerns (e.g. available budget) rather than statistical ones. We recommend pursuing the largest calibration subset within these constraints and note that even small calibration subsets (e.g. 30 subjects) can yield good results if the sample is representative of the underlying distribution of biomarker values (Figure 1).
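The cancellation of the intercept can be checked numerically. The sketch below evaluates the conditional likelihood contribution of a single matched set under two different values of the calibration intercept. It assumes a linear calibration of local laboratory values and assumes that the internalized method substitutes the reference laboratory value for any subject in the calibration subset; all numbers and function names are hypothetical.

```python
import math


def conditional_likelihood(beta_x, x_case, x_controls):
    """Conditional likelihood contribution of one matched set with one case."""
    numerator = math.exp(beta_x * x_case)
    denominator = numerator + sum(math.exp(beta_x * x) for x in x_controls)
    return numerator / denominator


def calibrate(w, a_hat, b_hat):
    """Linear calibration of a local laboratory value w."""
    return a_hat + b_hat * w


# Hypothetical values for one matched set (one case, two controls).
beta_x = -0.01
b_hat = 0.9
w_case, w_controls = 55.0, [70.0, 62.0]


def full_calibration_contribution(a_hat):
    # Every member of the matched set receives a calibrated value.
    return conditional_likelihood(
        beta_x,
        calibrate(w_case, a_hat, b_hat),
        [calibrate(w, a_hat, b_hat) for w in w_controls],
    )


def internalized_contribution(a_hat, x_ref_first_control=66.0):
    # The first control was reassayed, so its reference value is used directly;
    # the remaining members still receive calibrated values.
    return conditional_likelihood(
        beta_x,
        calibrate(w_case, a_hat, b_hat),
        [x_ref_first_control, calibrate(w_controls[1], a_hat, b_hat)],
    )


fc_a0 = full_calibration_contribution(0.0)
fc_a5 = full_calibration_contribution(5.0)
in_a0 = internalized_contribution(0.0)
in_a5 = internalized_contribution(5.0)

print("full calibration:", fc_a0, fc_a5)  # identical up to rounding error
print("internalized:    ", in_a0, in_a5)  # these differ
```

Under full calibration the factor |$\exp(\beta_x \hat{a})$| appears in the numerator and in every term of the denominator, so it divides out; in the internalized set the reassayed control carries no such factor, so a change in the intercept changes the contribution.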

Naive estimates were typically quite biased and illustrated the risk of failing to implement a calibration step when necessary. More problematically, the naive estimates were sometimes biased toward the alternative, resulting in an inflated type I error rate.

Although this article focuses on the common scenario of a controls-only calibration study, all the methods discussed also apply if the calibration subset includes both cases and controls. Furthermore, both the full calibration and internalized methods accommodate nonlinear calibration models: if necessary, one can include nonlinear terms in the calibration model when applying either method. Note, however, that the two-stage method does require the linear calibration model in (2.2).

Regarding inclusion of covariates, if covariates are correlated with the biomarker and not the outcome, they may be included in the calibration model but not in the conditional logistic regression model. Covariates that are correlated with both the biomarker and the outcome can be included in both models.

Although the aggregated and two-stage methods are equally viable and valid options for analyzing outcome–exposure relationships in pooled data, logistical considerations may dictate the preferred approach for the statistical analysis. For instance, aggregated methods often lend themselves better to subgroup analyses because they reduce issues resulting from sparse data for a single study in specific strata. If the main exposure effect and at least some covariate effects are homogeneous, the aggregated method may also offer efficiency gains in covariate estimation relative to the two-stage method (Lin and Zeng, 2010). However, the two-stage method may be more appealing than the aggregated methods at times for its intuitive and simple implementation, and its robustness to these covariate homogeneity assumptions.
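For readers less familiar with two-stage pooling, the sketch below shows one common way of combining study-specific estimates in a second stage, namely fixed-effect inverse-variance weighting. The discussion above does not spell out the weighting scheme, so this choice is an assumption made for illustration; the study-specific estimates, standard errors, and function name are hypothetical.

```python
import math


def inverse_variance_pool(estimates, std_errors):
    """Fixed-effect pooled estimate and standard error from study-level results."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical calibrated log-RR estimates (per 1 nmol/L) from three studies.
study_betas = [-0.010, -0.006, -0.008]
study_ses = [0.004, 0.005, 0.004]

pooled_beta, pooled_se = inverse_variance_pool(study_betas, study_ses)
ci_low = pooled_beta - 1.96 * pooled_se
ci_high = pooled_beta + 1.96 * pooled_se

print(f"pooled beta = {pooled_beta:.4f} (SE {pooled_se:.4f})")
print(f"95% CI: ({ci_low:.4f}, {ci_high:.4f})")
```

A study with sparse data in a given stratum contributes an estimate with a large standard error, or none at all, which is one reason subgroup analyses can be easier under an aggregated approach.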

6. Software

Functions in the form of R code (with an example) are available at the first author’s GitHub account, https://github.com/agsloan/PoolingBiomarkerData, and at the last author’s website, https://www.hsph.harvard.edu/molin-wang/software.

Supplementary material

Supplementary material is available at http://biostatistics.oxfordjournals.org.

Acknowledgments

We are grateful to Tao Hou and Shiaw-Shyuan (Sherry) Yaun for their assistance in accessing the data. We also thank the Circulating Biomarkers and Breast and Colorectal Cancer Consortium team (R01CA152071, PI: Stephanie Smith-Warner; Intramural Research Program, Division of Cancer Epidemiology and Genetics, National Cancer Institute: Regina Ziegler) for conducting the calibration study in the vitamin D examples.

Conflict of Interest: None declared.

Funding

This work was supported by the NIH (T32-NS048005 to A.S.) and by the NIH/NCI (R03CA212799 to M.W.).

References

Breslow, N. E., Day, N. E., Halvorsen, K. T., Prentice, R. L. and Sabai, C. (1978). Estimation of multiple relative risk functions in matched case-control studies. American Journal of Epidemiology 108(4), 299–307.

Carroll, R., Ruppert, D., Stefanski, L. and Crainiceanu, C. (2006). Measurement error in nonlinear models: a modern perspective, 2nd ed. Monographs on Statistics and Applied Probability. Boca Raton, FL: Chapman and Hall.

Debray, T. P. A., Moons, K. G. M., Abo-Zaid, G. M. A., Koffijberg, H. and Riley, R. D. (2013). Individual participant data meta-analysis for a binary outcome: one-stage or two-stage? PLoS One 8(4), e60650.

Eliassen, A. H., Spiegelman, D., Hollis, B. W., Horst, R. L., Willett, W. C. and Hankinson, S. E. (2011). Plasma 25-hydroxyvitamin D and risk of breast cancer in the Nurses’ Health Study II. Breast Cancer Research 13(3), R50.

Eliassen, A. H., Warner, E. T., Rosner, B., Collins, L. C., Beck, A. H., Quintana, L. M., Tamimi, M. and Hankinson, S. E. (2016). Plasma 25-hydroxyvitamin D and risk of breast cancer in women followed over 20 years. Cancer Research 76(18), 5423–5430.

Gail, M. H., Wu, J., Wang, M., Yaun, S., Cook, N. R., Eliassen, A. H., McCullough, M. L., Yu, K., Zeleniuch-Jacquotte, A., Smith-Warner, S. A., Ziegler, R. G. and others (2016). Calibration and seasonal adjustment for matched case–control studies of vitamin D and cancer. Statistics in Medicine 35(13), 2133–2148.

Gong, G. and Samaniego, F. J. (1981). Pseudo maximum likelihood estimation: theory and applications. The Annals of Statistics 9(4), 861–869.

Guolo, A. and Brazzale, A. R. (2008). A simulation-based comparison of techniques to correct for measurement error in matched case–control studies. Statistics in Medicine 27(19), 3755–3775.

Key, T. J., Appleby, P. N., Allen, N. E. and Reeves, G. K. (2010). Pooling biomarker data from different studies of disease risk, with a focus on endogenous hormones. Cancer Epidemiology and Prevention Biomarkers 19(4), 960–965.

Lai, J. K. C., Lucas, R. M., Banks, E. and Ponsonby, A. (2012). Variability in vitamin D assays impairs clinical assessment of vitamin D status. Internal Medicine Journal 42(1), 43–50.

Levi-Vardi, R. and Yagil, Y. (2017). Vitamin D, hypertension, and ischemic stroke: an unresolved relationship. American Heart Association 70(3), 496–498.


Lin, D. Y. and Zeng, D. (2010). On the relative efficiency of using summary statistics versus individual-level data in meta-analysis. Biometrika 97(2), 321–332.

McCullough, M. L., Zoltick, E. S., Weinstein, S. J., Fedirko, V., Wang, M., Cook, N. R., Eliassen, A. H., Zeleniuch-Jacquotte, A., Agnoli, C., Albanes, D. and others (2018). Circulating vitamin D and colorectal cancer risk: an international pooling project of 17 cohorts. JNCI: Journal of the National Cancer Institute 111(2), 158–169.

McShane, L. M., Midthune, D. N., Dorgan, J. F., Freedman, L. S. and Carroll, R. J. (2001). Covariate measurement error adjustment for matched case–control studies. Biometrics 57(1), 62–73.

Prentice, R. L. and Breslow, N. E. (1978). Retrospective studies and failure time models. Biometrika 65(1), 153–158.

Rosner, B., Spiegelman, D. and Willett, W. C. (1990). Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error. American Journal of Epidemiology 132(4), 734–745.

Sloan, A., Song, Y., Gail, M. H., Betensky, R., Rosner, B., Ziegler, R. G., Smith-Warner, S. A. and Wang, M. (2019). Design and analysis considerations for combining data from multiple biomarker studies. Statistics in Medicine 38(8), 1303–1320.

Smith-Warner, S. A., Spiegelman, D., Ritz, J., Albanes, D., Beeson, W. L., Bernstein, L., Berrino, F., Van Den Brandt, P. A., Buring, J. E. and Cho, E. (2006). Methods for pooling results of epidemiologic studies: the pooling project of prospective studies of diet and cancer. American Journal of Epidemiology 163(11), 1053–1064.

Spiegelman, D., Carroll, R. J. and Kipnis, V. (2001). Efficient regression calibration for logistic regression in main study/internal validation study designs with an imperfect reference instrument. Statistics in Medicine 20(1), 139–160.

Spiegelman, D., McDermott, A. and Rosner, B. (1997). Regression calibration method for correcting measurement-error bias in nutritional epidemiology. The American Journal of Clinical Nutrition 65(4), 1179S–1186S.

Sun, Q., Pan, A., Hu, F., Manson, J. and Rexrode, K. (2012). 25-hydroxyvitamin D levels and the risk of stroke: a prospective study and meta-analysis. Stroke 43(6), 1470–1477.

Tabberer, M., Benson, V. S., Gelhorn, H., Wilson, H., Karlsson, N., Mullerova, H., Menjoge, S., Rennard, S. I., Tal-Singer, R. and Merrill, D. (2017). The COPD biomarkers qualification consortium database: baseline characteristics of the St. George’s respiratory questionnaire dataset. Chronic Obstructive Pulmonary Diseases: Journal of the COPD Foundation 4(2), 112.
