SUMMARY

Two-phase sampling designs are common practice in many medical studies. Generally, the first-phase classification is fallible but relatively cheap, while the accurate second-phase, state-of-the-art medical diagnosis is complex and rather expensive to perform. When constructed efficiently, such a design offers great potential for higher true case detection as well as for higher precision at a limited cost. In this article, we consider epidemiological studies with a two-phase sampling design. However, instead of a single two-phase study, we consider a scenario where a series of two-phase studies is carried out in a longitudinal fashion on a cohort of interest. Another major design issue is the non-curable nature of certain diseases (e.g. dementia, Alzheimer's disease). Identified disease-positive subjects are therefore often removed from the original population under observation, as they require clinical attention quite different from that of the yet unidentified group. We motivate our methodological development with two real-life studies. We consider efficient and simultaneous estimation of prevalence as well as incidence at multiple time points from a sampling design-based approach. We explicitly show the benefit of the developed methodology for an elderly population with a significant burden of home health care usage and at high risk of major depressive disorder.

1. Introduction

The aim of any multi-phase design is two-fold: first, to detect as many cases as possible, and second, the efficient estimation of prevalence at a limited cost. The two-phase design in particular has been very popular in epidemiological studies (Pickles and others, 1995; Dunn and others, 1999). In a standard two-phase design (Neyman, 1938), at the first phase all subjects under study receive a low-cost, easy to administer but fallible screening test. Depending upon the first-phase result, subjects are then classified into two (or more) categories. In the second phase, a random sample is drawn from each of these categories and undergoes a state-of-the-art (or "gold-standard") and rather expensive diagnostic procedure to determine the true disease status. Optimal design strategies for two-phase surveys have been discussed in Deming (1977), Shrout and Newman (1989), and McNamee (2004). For a more detailed review of the literature, please see Pickles and others (1995). Two-phase studies are popular in psychometric research and commonly found in mental health studies (Beckett and others, 1992; Hendrie and others, 2001).

In a two-phase study, the most important task is the estimation of prevalence, i.e. the frequency of previously undetected and untreated "cases". However, often we are presented with a scenario where multiple two-phase studies are performed at different time points. For such a scenario, not only the prevalence but also the incidence, i.e. the frequency of fresh cases, becomes significant. Clayton and others (1998) treated such a scenario in a regression estimation setup. However, that study is limited in two senses: first, it focuses on incidence estimation only and, second, it considers only two time points. Extension of the same to multiple time points in the longitudinal data framework is complicated and so far unexplored. In this article, we present simultaneous estimation of prevalence and incidence from the sampling design perspective. Our method is fairly simple and applicable to multiple time points.

1.1. Motivating example 1

Our first example comes from an NIH-funded study for detecting Alzheimer's disease and dementia in two communities. The study design is presented in detail in Hendrie and others (1995, 2001). A fixed number of elderly subjects are followed in a longitudinal fashion for about 5 years in two communities (African Americans in Indiana, USA and Yoruba in Ibadan, Nigeria). The estimation of prevalence and incidence of dementia is carried out separately at two time points (or waves), at the 2-year follow-up and at the 5-year follow-up. However, due to the irreversible nature of the disease, at each successive time point those who are identified as demented via the second-phase gold-standard test are progressively excluded. Hence, the population under investigation decreases monotonically over time (see Figure 1(a)). Though new subjects could be included as part of the existing cohort, we do not consider those in the current context. At each wave a two-phase design is carried out. In the first phase, a screening test is done based on the CSI'D' score (Hall and others, 1999). Based on this score, subjects are categorized into four groups (Good, Intermediate, Poor, and Impaired). A random sample with a fixed percentage from each category is drawn for the second-phase clinical assessment, which is considered to be the gold standard. An ethical sampling plan (see Section 2) is followed. The percentage to be sampled for the second phase from each outcome group is not guided by any optimality property, but rather based on associated cost and convenience. More importantly, the wave 1 and wave 2 estimations are carried out independently. There exist other interesting aspects of this cohort, which are studied in many subsequent papers (Callahan and others, 1996; Shen and others, 2006), but we do not elaborate further as that is not the main goal of this article.

Fig. 1. Schematic diagram of the longitudinal two-phase study in (a). Schematic diagram of the two-phase study at $T=t$ in (b). Expected frequencies in the cells of the survey at $T=t$ are presented in (c).

1.2. Motivating example 2

Our second example comes from another NIH-funded study to detect late life depression (LLD) in medical home care. LLD is under-identified, under-diagnosed, and under-treated. The identification of LLD is complicated by co-morbid physical illness, impaired cognitive function, and the stigma associated with being diagnosed as depressed at an older age. Thus, the main goal of the study was to assess the prevalence, 1-month persistence and other clinical features, and the clinical, functional, and health care outcomes at 12 months of major depression and subsyndromal depression in elderly patients newly admitted for medical home care at a large regional Visiting Nurses Agency (VNS Westchester County, NY, USA). The subjects were a random representative sample of primarily homebound elderly patients (age 65 or more) newly admitted to the VNS, sampled on a weekly basis over a period of 2 years. For this specific study, two-phase sampling was not originally proposed; rather, a single gold-standard test was carried out. A team comprising a geriatric psychiatrist, a geriatrician, a clinical psychologist, and a sociologist evaluated the HAM-D, GDS, SCID (patient and informant), and a tape-recorded semi-structured nurse interview together to create a new "gold standard" of major depression based on consensus using DSM-IV diagnostic criteria (Bruce and others, 2002; Weinberger and others, 2009). Undoubtedly, this was time consuming and expensive, albeit it followed DSM-IV's "etiologic" approach for the diagnosis of depression. Though the primary end-point of the original study was 12 months, data were collected at baseline, at 3 months, and at the end of the 12-month study period; subjects were, however, followed over a 24-month period. Our objective in the present situation is to show that if a well-constructed screening test were implemented using available surrogate information in a two-phase sampling design, one could estimate incidence and prevalence quite accurately but with a much smaller sample size (i.e. lower cost). This is because, unlike in the original study design, not all subjects are required to be evaluated by the gold-standard test, even when an "Ethical" sampling plan is followed. There exist many other interesting aspects of this study concerning subject characteristics, which are studied in many subsequent papers (Weissman and others, 2011a,b).

In the rest of the article, we closely follow the common design of our motivating examples, though the methodological development is general enough to be applicable to any longitudinal two-phase sampling plan. The rest of the article is organized as follows. In Section 2, we introduce some notation and relevant background material. In Section 3, we discuss estimation of the incidence rate over time. We carry out an efficiency analysis in Section 4, under the assumption that the cost of each phase is available. Section 5 concentrates on simulation studies with a screening test improving/degrading over time. In Section 6, we take one of the motivating examples to study the behavior of our estimates. We conclude the article with a brief discussion.

2. Design of two-phase survey: some notation

First, we introduce some notation, broadly taken from Shrout and Newman (1989): $D$, the true disease status (e.g. presence of disease, such as dementia or depression) indicated by some well-defined diagnostic procedure; $X$, explanatory variable(s) (e.g. informant scores, demographic, socioeconomic, and other surrogate information) used for predicting prevalence in phase one; $Y$, the fallible classification obtained using the screening test in phase one. In phase one, based on $X$, we first classify each subject as either $Y=1$ (presence of disease) or $Y=2$ (absence of disease, or $\overline{Y}$). Logistic regression has popularly been used for this (Gao and others, 2000); however, the labeling of $Y$ is more of a clustering problem than a classification one. This will be elucidated further in the simulation section. At baseline, we have $N$ individuals in the study, out of whom $n$ $(\ll N)$ have the disease. Of course, $n$ is unknown and needs to be estimated. Let the random variable $T=t$ denote the $t$-th two-phase study (also known as the $t$-th wave) for $T=1,2,3,\cdots$, which are not necessarily equispaced. At $T=t$, let $p_t$ denote the prevalence rate. For the transition from $T=t-1$ to $t$, let $\theta_{t}$ denote the incidence rate. At baseline, i.e. when $T=1$, our initial assumption is $\theta_1=0$, so one only has to estimate the prevalence ($p_1$). For the sake of simplicity, in this article we assume there is no loss due to death, absence, etc., at different phases or at different time points. Let $n_{t1}$ and $n_{t2}$ denote the numbers of people for whom we detect $D=1$ and $D=2$ (or, $\overline{D}$) at the second phase of $T=t$, respectively. Due to the irreversible nature of many diseases (e.g. dementia, Alzheimer's disease), it is assumed that once an accurate diagnosis is made (in the second phase), those people are excluded from the study at all successive time points. Hence, at the beginning of $T=t$, the number of subjects under study is $N_t=N-\sum_{i=1}^{t-1} n_{i1}$. Let $d_{tj}=\sum_{i=1}^{N_t} 1_{\{Y_i = j|X_i\}}$ denote the number of subjects classified as $Y=j$ for $j=1, 2$. Note that $N_t= \sum_{j=1}^2 d_{tj}$ for all $T=t$. Following standard notation, let $f_{tj}$ for $j=1,2$ denote the fraction of each group included as a random sample in the second-phase study at $T=t$. If $n_t$ is the second-phase sample size for $T=t$, then $n_t=\sum_{j=1}^2 f_{tj} d_{tj}$. For "Ethical" reasons (Shrout and Newman, 1989), in many two-phase studies $f_{t1}=1$ is chosen, i.e. all screened positive ($Y=1$) subjects are included in the second phase. This is the case for our motivating example 1; however, for general discussion we will assume $0\leq f_{tj}\leq 1$, $j=1,2$.

From Figure 1(b), at $T=t$, the sensitivity and specificity are given by $P(Y=1|D)=\frac{a_t}{n_{t1}}$ and $P(Y=2|\overline{D})=\frac{d_t}{n_{t2}}$, respectively. Let $\pi_{t1}$ denote the probability of an individual being screened positive at the first phase when $T=t$. Its maximum likelihood (ML) estimate is given by $\widehat{\pi}_{t1}=\frac{\sum_{i=1}^{N_t} 1_{\{Y_i =1|X_i\}}}{N_t}=\frac{d_{t1}}{N_t}$. Also define, at $T=t$, $\lambda_{t1}=P(D|Y)$ and $\lambda_{t2}=P(D|\overline{Y})$, the prevalence of disease in the screened positive and screened negative groups, respectively. The ML estimates of $\lambda_{t1}$ and $\lambda_{t2}$ are $\frac{a_t}{f_{t1} d_{t1}}$ and $\frac{c_t}{f_{t2} d_{t2}}$, respectively. Note that not all samples from the first phase are examined in the second phase. Figure 1(c) illustrates the expected frequencies in the cells of the survey at $T=t$. The estimate of the prevalence at $T=t$ is the weighted average of the prevalences in $Y=1$ and $Y=2$, which is given by,
$\widehat{p}_t=\widehat{\pi}_{t1}\widehat{\lambda}_{t1}+(1-\widehat{\pi}_{t1})\widehat{\lambda}_{t2}.$   (2.1)
The large sample variance of the above is given by,
$V(\widehat{p}_t)=\frac{1}{N_t}\left[\sum_{j=1}^{2}\frac{\pi_{tj}\,\lambda_{tj}(1-\lambda_{tj})}{f_{tj}}+\sum_{j=1}^{2}\pi_{tj}\,(\lambda_{tj}-p_t)^2\right],$   (2.2) where $\pi_{t2}=1-\pi_{t1}$.
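For concreteness, the following minimal Python sketch computes $\widehat{p}_t$ of equation (2.1) from the observed two-phase counts; the cell counts in the usage line are hypothetical.

```python
def two_phase_prevalence(d_t1, d_t2, f_t1, f_t2, a_t, c_t):
    """Two-phase prevalence estimate, equation (2.1).

    d_t1, d_t2 : first-phase counts screened positive / negative
    f_t1, f_t2 : second-phase sampling fractions for each group
    a_t, c_t   : confirmed cases among verified positives / negatives
    """
    N_t = d_t1 + d_t2
    pi_t1 = d_t1 / N_t                # estimated P(screened positive)
    lam_t1 = a_t / (f_t1 * d_t1)      # prevalence among screened positives
    lam_t2 = c_t / (f_t2 * d_t2)      # prevalence among screened negatives
    return pi_t1 * lam_t1 + (1 - pi_t1) * lam_t2

# Hypothetical wave: 150 of 1000 screen positive; all positives and 10% of
# negatives are verified, yielding 60 and 17 confirmed cases, respectively.
print(two_phase_prevalence(150, 850, 1.0, 0.1, 60, 17))   # 0.23
```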
Suppose each screening test costs $c_S$ in the first phase and each diagnostic evaluation costs $c_D$ in the second phase, with $c_S\ll c_D$. Under the constraint that the total study cost is fixed, the optimal choice of $f_{t1}$ and $f_{t2}$ is given in Shrout and Newman (1989) (also in Cochran (1977)), and is obtained by minimizing (2.2):
$f_{tj}^{\ast}=\sqrt{\dfrac{c_S\,\lambda_{tj}(1-\lambda_{tj})}{c_D\,\sum_{k=1}^{2}\pi_{tk}(\lambda_{tk}-p_t)^2}},\qquad j=1,2.$   (2.3)
For "Ethical" reasons, often $f_{t1}=1$ for all $T=t$. In that situation, under the assumption that the resources remaining after the first phase do not cover the expense of including every member in the second phase (i.e. they are $<N_t (c_D-c_S)$), the optimal value of $f_{t2}$ is:
(2.4)
where $c=\frac{c_S}{c_D}$. If $f_{t2}^{\ast\ast}>1$ for some $T=t$, this implies that the two-phase design is less efficient than a single-phase design with only the gold-standard test. In this article, we do not consider $c_S$ and $c_D$ to be time varying. However, when the time gap between two successive two-phase studies is rather long, it makes sense to consider them time varying.

3. Estimation of incidence rate

Prevalence here is essentially the number of persons having the true disease at the beginning of the study in the cohort or population of interest. At all other time points, estimation of incidence is more important and meaningful, since at those points prevalence has contributions both from fresh cases of disease and from previously undetected cases. A general outline of the above sampling design at $T=t$ is presented in Figure 1(b), in which the observed outcomes of the first and second phases are depicted in a $2\times2$ contingency table. The true unobserved disease status in Figure 1(b) requires some algebra and is given in Theorem 3.1 below.

 
Theorem 3.1

Suppose we have $N_t$ subjects under study at the beginning of $T=t$, with $\theta_{t}$ being the incidence rate for the transition from $T=t-1$ to $T=t$. Then the number of subjects with true disease is given by $n-\sum_{i=1}^{t-1}n_{i1}+(N-n)\left\{1- \prod_{i=1}^{t}(1-\theta_i)\right\}$, while its complement is $(N-n)\left\{\prod_{i=1}^{t}(1-\theta_i)\right\}$.

For brevity, all proofs are provided in the supplementary material available at Biostatistics online. To calculate the prevalence at any $T=t$ we may use equation (2.1). Note that from this we can get an estimate of $n$ and its variance as $\widehat{n}=N\widehat{p}_1$ and $V(\widehat{n})=N^2V(\widehat{p}_1)$, respectively. To estimate the incidence at any $T=t$ we use the identity $1-p_t=\frac{(N-n)\left\{\prod_{i=1}^{t}(1-\theta_i)\right\}}{N-\sum_{i=1}^{t-1}n_{i1}}$. Solving the above yields,
$\widehat{\theta}_{t}=1-\dfrac{N_t\,(1-\widehat{p}_t)}{(N-\widehat{n})\prod_{i=1}^{t-1}(1-\widehat{\theta}_i)}.$   (3.1)
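The recursion in (3.1) is easy to implement once the sequence of prevalence estimates and confirmed-case counts is available. A minimal Python sketch, assuming $\widehat{\theta}_1=0$ and $\widehat{n}=N\widehat{p}_1$ as above (input sequences are hypothetical):

```python
def incidence_recursive(N, p_hat, n_detect):
    """Incidence estimates via the recursion in equation (3.1).

    N        : baseline cohort size
    p_hat    : prevalence estimates [p_1, ..., p_T], one per wave
    n_detect : confirmed cases removed at each wave [n_11, ..., n_T1]
    """
    n_hat = N * p_hat[0]          # estimated diseased at baseline
    thetas = [0.0]                # theta_1 = 0 by assumption
    surv = 1.0                    # running product of (1 - theta_i), i < t
    N_t = N
    for t in range(1, len(p_hat)):
        N_t -= n_detect[t - 1]    # drop cases detected at the previous wave
        theta = 1 - N_t * (1 - p_hat[t]) / ((N - n_hat) * surv)
        thetas.append(theta)
        surv *= 1 - theta
    return thetas
```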

An exact formula for the variance of a product of many random variables is given in Goodman (1962). Unfortunately, even if we assume independence of the random variables involved, the variance calculation for $\theta_{t}$ is rather prohibitive. Next, we present another, equivalent formulation of $\theta_{t}$, which is computationally much simpler.

3.1. Equivalent form for $\theta_t$

The estimating equation (3.1) for the incidence rate, though useful, is a little complicated for interpretation purposes. An equivalent expression for $\theta_{t}$ in terms of prevalences only is presented in this section. We consider two adjacent time points, say $T=t-1$ and $T=t$, with prevalence rates $p_{t-1}$ and $p_{t}$; from the experimental design, $N_{t-1}=N_t+n_{t-1,1}$. At $T=t-1$, the number of people with true disease status is $N_{t-1} p_{t-1}$, out of which $n_{t-1,1}$ are truly detected and removed from the study. At $T=t$, the number of undetected people with true disease status is therefore $N_{t-1} p_{t-1}-n_{t-1,1}$. Hence, the number of fresh cases of disease at $T=t$ is $\left(N_t-N_{t-1} p_{t-1}+n_{t-1,1}\right)\theta_t$. The same quantity can also be derived from the prevalence estimates at $T=t-1$ and $t$, giving $N_{t} p_{t}-N_{t-1} p_{t-1}+n_{t-1,1}$. Equating the two, we get,
$\widehat{\theta}_t=\dfrac{N_t\,\widehat{p}_t-N_{t-1}\,\widehat{p}_{t-1}+n_{t-1,1}}{N_t-N_{t-1}\,\widehat{p}_{t-1}+n_{t-1,1}}.$   (3.2)

The interpretation of the above estimate is straightforward: it is essentially the number of new disease-positive cases divided by the effective at-risk sample size at the $t$-th time point. The above estimate of $\theta_t$ has an interesting property under the "Ethical" sampling design (Shrout and Newman, 1989), depending upon the sensitivity of the first-phase test, which is described below.

 
Theorem 3.2

For the "Ethical" sampling design, $p_t=\theta_t$ if and only if the sensitivity at $T=t-1$ is equal to $1$.

Given $f_{t-1,1}=1$, if the sensitivity turns out to be one, it essentially tells us that the cases at $T=t$ are all attributable to incidence only. We next describe the variance of $\widehat{\theta}_{t}$, given as,
(3.3)
An exact formula for the variance of the product of two random variables is given in Goodman (1960), which states that for two independent random variables $A$ and $B$, $V(AB)$ is given by $V(AB)=E[A]^2V(B)+E[B]^2V(A)+V(A)V(B)$. The unbiased estimator of the above is obtained by using the usual sample estimates, $\widehat{V}(AB)=\overline{a}^2 s(B)+ \overline{b}^2 s(A) - s(A)s(B)$. Goodman (1960) also provided a consistent estimate in the case of non-independence, which is a little more involved. Notably, if $p_t$ and $p_{t-1}$ are assumed to be independent, then using the equation above and the delta method, the approximate variance of $\widehat{\theta}_{t}$ is given by,
$V(\widehat{\theta}_t)\approx\dfrac{N_t^2\,V(\widehat{p}_t)+N_{t-1}^2\,(1-\widehat{\theta}_t)^2\,V(\widehat{p}_{t-1})}{\left(N_t-N_{t-1}\,\widehat{p}_{t-1}+n_{t-1,1}\right)^2}.$   (3.4)
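A short Python sketch of the two-wave estimator (3.2), with the approximate variance in the delta-method form reconstructed in (3.4) (so that exact form is an assumption), under independence of $\widehat{p}_t$ and $\widehat{p}_{t-1}$:

```python
def incidence_two_wave(N_t, N_prev, p_t, p_prev, n_prev1,
                       var_p_t=None, var_p_prev=None):
    """Incidence estimate via equation (3.2); if the prevalence variances
    are supplied, also return the approximate variance (3.4), assuming
    the two prevalence estimates are independent."""
    denom = N_t - N_prev * p_prev + n_prev1        # at-risk pool at wave t
    theta = (N_t * p_t - N_prev * p_prev + n_prev1) / denom
    if var_p_t is None or var_p_prev is None:
        return theta, None
    var_theta = (N_t ** 2 * var_p_t
                 + N_prev ** 2 * (1 - theta) ** 2 * var_p_prev) / denom ** 2
    return theta, var_theta
```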

Equations (3.1) and (3.2) may look unrelated, but in fact they are equivalent. To show this, we next present a lemma.

 
Lemma 3.3

The estimates of $\theta_{t}$ given in equations (3.1) and (3.2) are equivalent in the sense that the following identity connects them: $\prod_{i=1}^t (1-\theta_{i})=\frac{N_t (1-p_{t})}{N (1-p_{1})}.$

4. Efficiency comparison: single- vs. two-phase design

For the cross-sectional setup, McNamee (2003) described in detail the efficiency of a two-phase design relative to a simple "single-phase" design. In this section, we deduce the same in a longitudinal setting. Suppose the total allowable cost at the $t$-th time point is fixed at $C_{t}$. For the sake of simplicity, we assume $c_S$ and $c_D$ (the first- and second-phase costs) do not vary considerably with time. Also note that $C_{t}=c_S N_t + c_D n_t$ must hold for the two-phase design. It is easy to show that the numbers of subjects under investigation in the two sampling designs are related as $n_{0t}=N_t\left[ \frac{c_S}{c_D}+ \pi_{t1}f_{t1}+(1-\pi_{t1})f_{t2} \right]$, where $n_{0t}$ denotes the number of persons in a single-phase design diagnosed by the gold-standard procedure only, at cost $c_D$ per subject. For this single-phase design, the prevalence estimate is a simple sample proportion of cases with variance $\frac{c_D p(1-p)}{C_{t}}$. For the relative efficiency (RE), McNamee (2003) compared the smallest two-phase standard error (SE) with the standard error of the single-phase prevalence estimate:
$\mbox{RE}_t=\dfrac{\mbox{min SE}_{2\mbox{-}phase}}{\mbox{SE}_{1\mbox{-}phase}}$   (4.1)
at the $t$-th time point. The above can be simplified in terms of specificity and sensitivity. We denote $S_{t1}=$ specificity at the $t$-th time $=P(Y=2|\bar{D})=\frac{d_t}{d_t+b_t}$ and $S_{t2}=$ sensitivity at the $t$-th time $=P(Y=1|D)=\frac{a_t}{a_t+c_t}$. Equation (4.1) can be equivalently expressed as,
$\dfrac{\mbox{min SE}_{2\mbox{-}phase}}{\mbox{SE}_{1\mbox{-}phase}}=\sqrt{1-\rho_t^2}+\rho_t\sqrt{\dfrac{c_S}{c_D}},$   (4.2)
where $\rho_t=\mbox{Correlation}(Y,D)$ at the $t$-th time. Using the facts that $|\rho_t|\leq 1$ and $\frac{c_S}{c_D}<1$, McNamee (2003) also provided a lower bound for the above in terms of specificity and sensitivity. The lower bound holds even when we fix $f_{t1}=1$ and $f_{t2}<1$, parallel to our motivating examples involving "Ethical" sampling. McNamee (2003) also concluded that, except for high specificity and sensitivity, a simple random sample design will usually yield a more precise estimate. However, this does not account for the ethical considerations, as also pointed out by McNamee (2004). In the longitudinal setup, we present two situations for efficiency comparison.

4.1. Screening test improves over time

For ease of exposition, we assume that there exists a monotonic improvement in the screening test. We assume without loss of generality that the explanatory variable(s) $X$ is (are) used for classifying $Y$ correctly, so that specificity and sensitivity approach $1$ as $t\rightarrow\infty$. Essentially this means $b_t,c_t\rightarrow 0$ as $t\rightarrow\infty$ in the contingency table of Figure 1(b). Let us assume $S_{t1}=\frac{\alpha}{\alpha+e^{-t}}$ and $S_{t2}=\frac{\beta}{\beta+e^{-t}}$, which satisfy the above properties. Of course, there exist other functional forms which also satisfy these properties; however, we choose the above due to its simplicity of exposition and closeness to the logistic link function. A test is hardly considered to be of any practical use if both specificity and sensitivity are below $0.5$. McNamee (2004) pointed out some simplification if we agree to take $S_{t1}=S_{t2}$, in which case $\rho_{max}=S_{t1}+S_{t2}-1$. Considering both, let us take $\alpha=\beta=1$ for the time being, which yields,
$\dfrac{\mbox{min SE}_{2\mbox{-}phase}}{\mbox{SE}_{1\mbox{-}phase}}=\dfrac{2\sqrt{e^{-t}}}{1+e^{-t}}+\dfrac{1-e^{-t}}{1+e^{-t}}\,\sqrt{c},$   (4.3)
where $0\leq c=\frac{c_S}{c_D}\leq 1$. Hence, the reduction in SE is bounded by $1-\frac{\mbox{min SE}_{2\mbox{-}phase}}{\mbox{SE}_{1\mbox{-}phase}}<1-\frac{2\sqrt{e^{-t}}}{1+e^{-t}}$. For the case $S_{t1}\neq S_{t2}$, $\rho_{max}=\frac{S_{t1}+S_{t2}-1}{\sqrt{(S_{t1}+S_{t2}-1)^2+\left(\sqrt{(1-S_{t1})S_{t1}}+\sqrt{(1-S_{t2})S_{t2}} \right)^2}}$, the simplification of which is rather involved. For our specific functional choices of $S_{t1}$ and $S_{t2}$, $\rho_{max}$ is a function of $\alpha$, $\beta$, and $t$. If we replace $\rho_{max}$ in equation (4.2), it yields $\frac{\mbox{min SE}_{2\mbox{-}phase}}{\mbox{SE}_{1\mbox{-}phase}}=f(\alpha,\beta,c,t)$, the simplification of which is not possible without further restrictive assumptions.

4.2. Screening test degrades over time

Here, we assume that the screening test performance degrades monotonically with time. In other words, the classification performance (of $Y$) by the explanatory variable(s) $X$ fails as time progresses. In real life this can happen when the screening test is constructed on baseline variables and the disease characteristics in the population change significantly over time. The performance of the screening test then degrades over time, yielding an increasing number of false positives and false negatives. This implies that specificity and sensitivity approach $0.5$ or lower as $t$ increases. The case when both specificity and sensitivity fall below $0.5$ corresponds to random guessing and hence is not of much practical value. In practice, however, specificity and sensitivity are often inversely related, and a screening test that is high in both may be difficult to produce unless considerable time and resources are spent on its construction. This is contrary to the idea of a "cheap" screening test in two-phase sampling. Hence, when constructing a screening test for a low prevalence disease (as in our data example in Section 6), more emphasis is given to achieving high sensitivity (Gordis, 2009). It is recommended that for preventable or curable diseases we optimize sensitivity first, followed by specificity. If we assume $S_{t1}=\frac{\alpha}{\alpha+e^{-1/t}}$, $S_{t2}=\frac{\beta}{\beta+e^{-1/t}}$ and also $\alpha=\beta=1$, this will yield,
$\dfrac{\mbox{min SE}_{2\mbox{-}phase}}{\mbox{SE}_{1\mbox{-}phase}}=\dfrac{2\sqrt{e^{-1/t}}}{1+e^{-1/t}}+\dfrac{1-e^{-1/t}}{1+e^{-1/t}}\,\sqrt{c}.$   (4.4)

Hence, the reduction in SE is bounded by $1-\frac{\mbox{min SE}_{2\mbox{-}phase}}{\mbox{SE}_{1\mbox{-}phase}}<1-\frac{2\sqrt{e^{-1/t}}}{1+e^{-1/t}}$. The expression for the case $S_{t1}\neq S_{t2}$ can be obtained in a similar fashion.
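To make the bounds concrete, the sketch below evaluates the SE ratio using the form of equation (4.2) as reconstructed above (so the exact published expression is an assumption), with $\rho_{max}=S_{t1}+S_{t2}-1$ for equal sensitivity and specificity, under the improving ($u=e^{-t}$) and degrading ($u=e^{-1/t}$) choices; the cost ratio $c=0.05$ is hypothetical.

```python
import math

def se_ratio(rho, c):
    """min SE(two-phase) / SE(one-phase), per the form used in (4.2)."""
    return math.sqrt(1 - rho ** 2) + rho * math.sqrt(c)

def rho_max_equal(t, improving=True):
    """rho_max when S_t1 = S_t2 = 1/(1 + u): rho_max = (1 - u)/(1 + u),
    with u = exp(-t) for an improving test, u = exp(-1/t) for a degrading one."""
    u = math.exp(-t) if improving else math.exp(-1.0 / t)
    return (1 - u) / (1 + u)

for t in (1, 3, 5, 10):
    print(t,
          round(se_ratio(rho_max_equal(t, True), c=0.05), 3),    # improving
          round(se_ratio(rho_max_equal(t, False), c=0.05), 3))   # degrading
```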

5. Simulation studies

As stated in the introduction, our work is motivated by the problem of estimating incidence and prevalence in a longitudinal setup. The two motivating examples have many common features (detection of disease status longitudinally) as well as variations unique to each specific study design. In our simulations, we assume a simplified setup common to both studies, to get a good idea of how our estimation method performs under different scenarios. In particular, we assume that a fixed number of subjects is followed over time, with no additional recruitment in between. There could also be data loss due to attrition and dropout caused by untimely death or refusal to participate in the study at a future date. Missing data and time-varying covariates also often accompany longitudinal studies; however, these are not considered in the present setup.

We assume that the covariate(s) $X$, which could be surrogate marker(s), informant scores, socio-clinical variables, etc. (e.g. CSI'D' in example 1 of Hall and others (1999)), are used or potentially could be used in the actual study to do the stratification in the first phase. Prevalence and incidence could also be highly correlated with other demographic variables such as age, sex, race, etc. A logistic regression based classification technique was used to create $Y$ in Shen and others (2006). However, in the present case the stratification of phase one is more of a clustering problem than a classification one. Clustering essentially involves creating labels ($Y$) from the explanatory variable(s) ($X$), while classification aims to create "rules" when both $Y$ and $X$ are available. Unfortunately, $Y$ is not available at the beginning of phase one in the present case. Hence, for the creation of the $Y$ label we have used mixture model based clustering (Fraley and Raftery, 2006) with two clusters (e.g. disease and non-disease) for all simulations. Details of the simulation steps are as follows (a code sketch of the first few steps appears after the list):

  1. We generate $900$ samples for the non-disease group such that $X_N\sim N(2,2)$. For the disease positive group, we generate $100$ samples from $X_D\sim N(-2,4)$. We store the original labels (disease and non-disease) as $D$. In motivating example 1 (Hall and others, 1999), subjects with lower CSI'D' scores are deemed to be demented, and the standard deviation in the demented group is higher than in the normal one. The choices of $X_D$ and $X_N$ are primarily governed by these considerations. The above distributional setup also ensures enough overlap between the two groups, thus creating some degree of fallibility in $Y$.

  2. We cluster the $N=1000$ samples into two different clusters using a mixture model. Denote $Y_i=1$ if the $i$-th subject is grouped in the disease positive cluster, and $Y_i=2$ otherwise. Clustering acts as a proxy for the fallible phase-one screening test in our simulation.

  3. Following the strategy of Shrout and Newman (1989), select every member from the disease positive cluster and randomly select $10\%$ of the subjects from the non-disease one.

  4. On the assumption that the gold-standard second-phase test is highly accurate, treat the original $D$ as its output. Create a $2\times 2$ contingency table comparing $Y$ and $D$ for those subjects selected at the second phase. Estimate the prevalence and incidence rate (if applicable).

  5. Remove those subjects who had true disease positive status confirmed in the second phase via $D$.

  6. Choose a $\theta_t$ (incidence rate) and, out of the $N-n$ non-disease individuals, change the status of $(N-n)\theta_t$ subjects from non-disease to disease. Note that for a variable incidence rate, $\theta_t$ will vary for each $T=t$, while for a fixed $\theta_t$ it needs to be selected only once. Those with changed status are assigned a new covariate $X$ following Step 1.

  7. The true number of samples from each category can be found from Figure 1(b) (see "Unobserved Truth") as a function of time.

  8. No further updating of the covariate $X$ is needed if we assume that it is invariant over time. However, if we assume monotonic changes (improvement/degradation) in $X$, adjustments are required. Improvement signifies further separation of the disease and non-disease groups, resulting in more accurate prediction of $Y$. For each member of the disease positive group, change the score of the $i$-th individual as $X_i^{new}=X_i^{old}-\delta_i \gamma_i$, where $\delta_i \sim U[0,2]$ denotes the rate of improvement and $\gamma_i\sim Bernoulli(0.5)$ is an indicator of such an improvement. $\delta_i$ and $\gamma_i$ vary among individuals. Choosing $\gamma_i=1$ for all $i$ indicates improvement for all subjects, while $\gamma_i=0$ for all $i$ indicates the invariant case. Similarly, for the non-disease group define $X_i^{new}=X_i^{old}+\delta_i\gamma_i$. For the degradation of the informant score we follow a similar strategy, defining $X_i^{new}=X_i^{old}+\delta_i\gamma_i$ for the disease positive group and $X_i^{new}=X_i^{old}-\delta_i\gamma_i$ for the non-disease group. This makes the separation between the two groups even harder, which in turn lowers the predictive accuracy of $Y$.
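The sketch below implements Steps 1-4 for the baseline wave in Python, with sklearn's GaussianMixture standing in for the mclust-type model-based clustering and the second parameter of $N(\cdot,\cdot)$ treated as a variance; both of these are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Step 1: covariate X and true status D (1 = disease, 2 = non-disease);
# the second parameter of N(., .) is treated here as a variance (assumption).
X = np.concatenate([rng.normal(-2, np.sqrt(4), 100),   # disease group
                    rng.normal(2, np.sqrt(2), 900)])   # non-disease group
D = np.concatenate([np.ones(100, int), np.full(900, 2, int)])

# Step 2: two-component mixture model as the fallible phase-one screen;
# the lower-mean cluster plays the role of the disease positive label.
gm = GaussianMixture(n_components=2, random_state=0).fit(X.reshape(-1, 1))
lab = gm.predict(X.reshape(-1, 1))
Y = np.where(lab == np.argmin(gm.means_.ravel()), 1, 2)

# Steps 3-4: "Ethical" second phase: all screened positives plus a
# 10% simple random sample of screened negatives; estimate p via (2.1).
pos = np.flatnonzero(Y == 1)
neg = np.flatnonzero(Y == 2)
neg_samp = rng.choice(neg, size=len(neg) // 10, replace=False)
pi1 = len(pos) / len(Y)
lam1 = np.mean(D[pos] == 1)        # prevalence among screened positives
lam2 = np.mean(D[neg_samp] == 1)   # prevalence among sampled negatives
print("baseline prevalence estimate:", pi1 * lam1 + (1 - pi1) * lam2)
```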

We repeat the above steps for $T=1,2,\ldots,10$. Note that our motivating examples have only a few (two in example 1 and three in example 2) time points including the baseline. Here, we consider six different possible scenarios:

  1. Time invariant |$X$| with fixed incidence rate.

  2. Classification via |$X$| improves with time and fixed incidence rate.

  3. Classification via |$X$| degrades with time and fixed incidence rate.

  4. Time invariant |$X$| with variable incidence rate.

  5. Classification via |$X$| improves with time and variable incidence rate.

  6. Classification via |$X$| degrades with time and variable incidence rate.

For each case, we estimate the prevalence and incidence rate via equations (2.1) and (3.2), along with their respective variances. The results for the six different cases are presented in Tables 1 and 2. In each table, we also report the sensitivity and specificity of the first-phase clustering result. This is important because, as pointed out by McNamee (2003), the efficiency of the two-phase design is often determined by high sensitivity and/or specificity. For comparison purposes we also report the true prevalence, which is obtained via $D$ in each wave. Table 1 represents the fixed incidence rate case with $\theta_t=0.05$ for all waves. For time invariant $X$, the estimated incidence rate is close to the true value. When $X$ improves in predicting $Y$ with time, the estimated incidence rate is highly accurate and numerically very close to the true value $0.05$. We also see that after wave five the first-phase clustering results are perfect, with sensitivity and specificity approaching one. While this is too good to be true in reality, it does indicate that the sensitivity and specificity of the first-phase test play a significant role, not only in the efficiency of the two-phase design but also in the accuracy of the estimated incidence rate. Similar statements can be made about the estimate of the prevalence. On the other hand, when $X$ degrades in predicting $Y$ with time, the first-phase clustering produces many misclassified $Y$. This results in low sensitivity and specificity as time progresses. Notably, the prevalence estimate is quite robust to this mis-specification; however, a similar statement cannot be made for the incidence estimate. Table 2 represents the variable incidence rate case. For the invariant case, the results are not as accurate as in the fixed incidence rate case (see Table 1), for both $p_t$ and $\theta_t$. However, the simple correlation between $\widehat{\theta}_t$ and the true $\theta_t$ is $0.91$. Notably, the specificity in both tables is relatively low. For the improving $X$, the estimates (of both $p_t$ and $\theta_t$) are quite accurate, with high first-phase sensitivity and specificity. The simulation studies presented above exhibit somewhat low specificity due to the non-separability of the disease and non-disease groups, resulting in a high number of false positives. This can easily be altered by lowering the standard deviation of each normal distribution. Additional results with high baseline specificity (and sensitivity) are available in the supplementary material available at Biostatistics online.

Table 1. Simulation results for fixed $\theta_t$ for three different scenarios

Scenario 1: Time invariant $X$

Time      $\widehat{p}_t$  V($\widehat{p}_t$)  $\widehat{\theta}_t$  V($\widehat{\theta}_t$)  Sensitivity  Specificity  True $p_t$  True $\theta_t$
Baseline  0.0998   0.0162   --       --       0.975  0.178  0.1    --
Wave 1    0.0771   0.0194   0.0588   0.0259   0.932  0.114  0.068  0.05
Wave 2    0.0729   0.0202   0.0436   0.0281   0.919  0.143  0.07   0.05
Wave 3    0.0825   0.0213   0.0522   0.029    0.928  0.143  0.078  0.05
Wave 4    0.0704   0.0188   0.0376   0.0288   0.947  0.134  0.079  0.05
Wave 5    0.07     0.0196   0.0474   0.0272   0.943  0.136  0.082  0.05
Wave 6    0.0706   0.016    0.0468   0.0254   0.976  0.098  0.086  0.05
Wave 7    0.0734   0.0169   0.061    0.0233   0.975  0.073  0.078  0.05
Wave 8    0.0788   0.023    0.0657   0.0284   0.937  0.074  0.067  0.05
Wave 9    0.0641   0.0182   0.0349   0.0298   0.967  0.082  0.067  0.05
Wave 10   0.0843   0.025    0.0699   0.0303   0.935  0.087  0.067  0.05

Scenario 2: Classification via $X$ improves with time

Baseline  0.0998   0.0162   --       --       0.975  0.178  0.1    --
Wave 1    0.0879   0.0218   0.0698   0.0275   0.911  0.112  0.068  0.05
Wave 2    0.0812   0.0177   0.0421   0.0283   0.962  0.265  0.069  0.05
Wave 3    0.0646   0.0143   0.0437   0.0231   0.977  0.173  0.06   0.05
Wave 4    0.0631   0.0149   0.0522   0.0207   0.975  0.798  0.057  0.05
Wave 5    0.0489   0.0079   0.0371   0.0172   1      1      0.056  0.05
Wave 6    0.0557   0.0087   0.0557   0.0116   1      1      0.057  0.05
Wave 7    0.0514   0.0086   0.0514   0.0122   1      1      0.051  0.05
Wave 8    0.0462   0.0083   0.0462   0.0121   1      1      0.049  0.05
Wave 9    0.0535   0.0092   0.0535   0.0123   1      1      0.053  0.05
Wave 10   0.0494   0.0091   0.0494   0.013    1      1      0.049  0.05

Scenario 3: Classification via $X$ degrades with time

Baseline  0.092    0.0091   --       --       1      0.185  0.1    --
Wave 1    0.0595   0.013    0.0595   0.0164   0.977  0.124  0.058  0.05
Wave 2    0.0635   0.0173   0.0536   0.0215   0.946  0.121  0.059  0.05
Wave 3    0.0583   0.0179   0.0376   0.025    0.933  0.088  0.066  0.05
Wave 4    0.1049   0.027    0.0839   0.0308   0.872  0.087  0.08   0.05
Wave 5    0.0929   0.0275   0.0363   0.0391   0.8    0.045  0.082  0.05
Wave 6    0.0751   0.0232   0.014    0.0367   0.892  0.049  0.098  0.05
Wave 7    0.1094   0.0286   0.0739   0.0354   0.848  0.033  0.109  0.05
Wave 8    0.0978   0.0278   0.0344   0.0404   0.867  0.037  0.112  0.05
Wave 9    0.117    0.0313   0.065    0.0409   0.833  0.035  0.117  0.05
Wave 10   0.165    0.0365   0.099    0.0454   0.733  0.027  0.119  0.05
Table 2. Simulation results for variable $\theta_t$ for three different scenarios

Scenario 1: Time invariant $X$

Time      $\widehat{p}_t$  V($\widehat{p}_t$)  $\widehat{\theta}_t$  V($\widehat{\theta}_t$)  Sensitivity  Specificity  True $p_t$  True $\theta_t$
Baseline  0.0998   0.0162   --       --       0.975  0.178  0.1    --
Wave 1    0.0216   0.0114   0.0023   0.0214   0.909  0.09   0.022  0.0005
Wave 2    0.0364   0.0122   0.0268   0.0162   0.958  0.108  0.039  0.03
Wave 3    0.0136   0.0114   0.0035   0.0169   0.666  0.093  0.016  0.002
Wave 4    0.0806   0.0176   0.0711   0.0195   0.962  0.152  0.077  0.065
Wave 5    0.135    0.0273   0.1159   0.0305   0.925  0.337  0.108  0.092
Wave 6    0.0909   0.0233   0.0339   0.0376   0.928  0.084  0.068  0.038
Wave 7    0.0666   0.0202   0.03     0.0316   0.933  0.093  0.056  0.042
Wave 8    0.0334   0.007    0.0077   0.0222   1      0.088  0.054  0.039
Wave 9    0.0631   0.016    0.0631   0.0176   0.969  0.075  0.067  0.047
Wave 10   0.0364   0.007    0.0225   0.0189   1      0.078  0.055  0.036

Scenario 2: Classification via $X$ improves with time

Baseline  0.095    0.0092   --       --       1      0.145  0.1    --
Wave 1    0.0442   0.0068   0.0442   0.0122   1      0.135  0.051  0.046
Wave 2    0.0254   0.0054   0.0254   0.0088   1      0.137  0.026  0.019
Wave 3    0.0878   0.0148   0.0877   0.0147   0.984  0.798  0.088  0.087
Wave 4    0.0437   0.0073   0.0325   0.0173   1      0.171  0.044  0.032
Wave 5    0.0727   0.0158   0.0727   0.0169   0.978  0.484  0.067  0.067
Wave 6    0.04     0.0074   0.0274   0.0181   1      0.356  0.044  0.037
Wave 7    0.0283   0.0064   0.0283   0.0098   1      0.078  0.028  0.024
Wave 8    0.0752   0.0177   0.0752   0.0179   0.975  1      0.069  0.069
Wave 9    0.0474   0.0086   0.033    0.0203   1      1      0.05   0.043
Wave 10   0.0566   0.0095   0.0566   0.0127   1      1      0.058  0.054

Scenario 3: Classification via $X$ degrades with time

Baseline  0.0879   0.013    --       --       0.987  0.181  0.1    --
Wave 1    0.0424   0.0184   0.033    0.0237   0.75   0.076  0.038  0.016
Wave 2    0.0943   0.027    0.0664   0.0309   0.681  0.063  0.074  0.049
Wave 3    0.1454   0.0296   0.0793   0.0378   0.859  0.071  0.142  0.096
Wave 4    0.1394   0.0283   0.0576   0.0412   0.901  0.064  0.155  0.078
Wave 5    0.15     0.0334   0.0853   0.0432   0.735  0.042  0.14   0.057
Wave 6    0.1267   0.033    0.0176   0.0483   0.636  0.043  0.122  0.024
Wave 7    0.1631   0.0254   0.0704   0.0399   0.829  0.007  0.181  0.095
Wave 8    0.1276   0.0186   0.0262   0.0328   0.384  0.001  0.136  0.015
Wave 9    0.196    0.0155   0.0963   0.0223   0.812  0      0.189  0.079
Wave 10   0.2198   0.0168   0.1009   0.0222   0.83   0      0.211  0.097

6. Analysis of home health care study

According to the National Institute of Mental Health, depression is a major mood disorder that hinders a person's daily mental and physical activities. Depression can arise from multiple causes, which vary among age groups. Studies have shown that depression among older individuals is strongly related to their history of illness and physical disability; although the majority of these individuals are not clinically depressed, they are at higher risk of developing depression in the future. Steffens and others (2009) reported an overall depression prevalence of $11.19\%$ based on a nationally representative cohort study of subjects aged over $71$. As discussed in Section 1.2, Bruce and others (2002) conducted a longitudinal study with clinical diagnosis data on older adults with medical comorbidity and functional disability, in order to identify potential risk factors associated with new depression cases. The goal of the study was early identification, intervention, and prevention for clinically depressed individuals. The original study was designed around a single consensus-based gold-standard test, which was deemed best from the feasibility point of view. The study also gathered a wealth of associated socio-clinical and demographic data on the recruited subjects (Weissman and others, 2011a,b). Our objective is to show that if some of those additionally gathered covariates are used to create a screening test, then using our developed methods one can obtain accurate estimates of prevalence and incidence. This can result in significant cost savings since, in a two-phase design, the time- and money-consuming gold-standard test needs to be carried out only for a fraction of all recruited subjects. Since the accuracy of the screening test determines the success of the two-phase design, we have used two different methods of screening-test construction. We used informant scores, demographic traits (age, gender, marital status, education, poverty status, race, and smoking status), mobility, MMSE, ADL, IADL, BMI, etc. to construct the screening test. Two clustering mechanisms, (i) model-based clustering and (ii) hierarchical clustering, are chosen as the screening tests. The data are used to obtain estimates at three separate time points: baseline, 3-month follow-up, and 1-year follow-up. The design of the two-phase sampling scheme is:

  1. The screening test is conducted on the entire available sample at each stage to separate the subjects into two groups: (i) depressed (screened positive) and (ii) non-depressed (screened negative).

  2. An "Ethical" sampling plan is followed, i.e. all those screened positive by the screening test are included in the second phase for the gold-standard test.

  3. A simple random sample of screened negative individuals receives the gold-standard test in the second phase. We consider three different fractions, namely 5%, 10%, and 20%, to study the accuracy of our estimation. Increasing the proportion of screened negative individuals pushes the cost up but reduces the variability of the estimates.

After the phase-two testing in each time period, the predicted prevalence and the predicted incidence rate are calculated via equations (2.1) and (3.2), respectively, along with their standard deviations. The goal of this two-phase sampling scheme is to compare the predicted prevalence to the observed truth, in order to determine the precision of the proposed estimates. Moreover, since the screening test is fairly cheap, being based on easily obtained additional information, and the gold-standard test needs to be administered to only a fraction of the total subjects, the effective cost of the entire study could be significantly reduced. When the original study was carried out, the longitudinal two-phase design was not popular, and to the best of our knowledge this article is the first endeavor of its kind from the statistical methodology point of view. Hence, we use the Home Health Care study for benchmarking purposes only and do not criticize the original design retrospectively. We hope that our methodological development will create synergy to consider the two-phase design as an attractive alternative even in longitudinal follow-up studies where the goal is true case detection over time. It is to be noted that the original Home Health Care study did not report any incidence rate; we estimate it from the available data at each wave. The following sections elaborate on the screening tests we constructed and their performance at each wave.

6.1. Model-based clustering

Note that the distributions of the variables considered for constructing the screening test are not homogeneous: some variables are continuous, some discrete valued, and the rest are nominal. This is a major violation of the assumptions of mixture-model based clustering. To alleviate this issue, principal component analysis (PCA) is performed first on the screening test variables to capture the maximum possible variation in the data. The numbers of principal components chosen for the clustering are 10, 9, and 9, respectively, for the three waves. Elbow plots of the PCs are available in the supplementary material available at Biostatistics online. Model-based clustering (Fraley and Raftery, 2002) is implemented on the derived principal components at each time point to classify the entire available sample at each wave into depressed and non-depressed groups. In order to check the accuracy of the proposed screening test, sensitivity and specificity are computed after the screening test is conducted. Table 3 demonstrates the performance of the model-based clustering at each wave. Note that since we are dealing with a low prevalence disease (e.g. depression), following the suggestion of Gordis (2009) more emphasis was given to sensitivity (see Section 4.2). Also, for a low prevalence population the screening test often produces a high number of false positives, thus yielding relatively low specificity.
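A sketch of this screening-test construction is given below, with sklearn's PCA and GaussianMixture standing in for the tools actually used; the standardization step and the smaller-cluster-as-depressed labeling rule are assumptions added for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

def screen_by_clustering(X_raw, n_pc=10, seed=0):
    """PCA on numerically coded screening variables, then a two-component
    mixture model on the leading components. Returns screen labels:
    1 = depressed cluster, 2 = non-depressed cluster."""
    Z = StandardScaler().fit_transform(X_raw)      # standardize (assumption)
    pcs = PCA(n_components=n_pc).fit_transform(Z)
    gm = GaussianMixture(n_components=2, random_state=seed).fit(pcs)
    lab = gm.predict(pcs)
    # Low-prevalence assumption: the smaller cluster is labeled "depressed".
    depressed = np.argmin(np.bincount(lab))
    return np.where(lab == depressed, 1, 2)
```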

Table 3. Sensitivity and specificity analysis of different clustering methods as screening test

               Model-based clustering       Hierarchical clustering
Time of study  Sensitivity  Specificity    Sensitivity  Specificity
Baseline       0.568        0.291          0.745        0.171
3 Month        0.885        0.137          0.529        0.290
12 Month       0.821        0.223          0.682        0.315

Following the sampling scheme mentioned above, we estimate the prevalence and incidence rate for each wave and for each fraction (5%, 10%, and 20%) of screened negative individuals included in the gold-standard test. The subsampling of screened negative individuals is repeated $500$ times to generate the respective mean prevalence and its dispersion measure. The corresponding incidence rates and their standard deviations, for each sampling scheme and wave, are reported in Table 4. The second and third columns of Table 4 are the cohort size and the true prevalence observed for each wave. The fourth column shows the proportion of screened negative individuals included in the second-phase test, and the final sample size is in the fifth column. Columns six, seven, eight, and nine exhibit the estimated prevalence, the estimated standard deviation of the prevalence, the estimated incidence, and the estimated standard deviation of the incidence, respectively.
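The repeated subsampling can be summarized in a short Python sketch (inputs hypothetical; the gold-standard status $D$ is known for every subject here only because the benchmark study verified everyone):

```python
import numpy as np

def replicate_prevalence(Y, D, frac, reps=500, seed=0):
    """Mean and SD of the prevalence estimate (2.1) over repeated
    second-phase subsampling of screened negatives.

    Y : phase-one screen labels (1 = positive, 2 = negative)
    D : gold-standard status (1 = depressed, 2 = not)
    """
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(Y == 1)
    neg = np.flatnonzero(Y == 2)
    pi1 = len(pos) / len(Y)
    lam1 = np.mean(D[pos] == 1)            # all screened positives verified
    est = []
    for _ in range(reps):
        samp = rng.choice(neg, size=max(1, int(frac * len(neg))), replace=False)
        est.append(pi1 * lam1 + (1 - pi1) * np.mean(D[samp] == 1))
    return float(np.mean(est)), float(np.std(est, ddof=1))
```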

Table 4. Detailed analysis for different clustering methods as screening test

Model-based clustering as screening test

Time      Cohort size  True $p_t$  Proportion  Sample size  $\widehat{p}_t$  SD($\widehat{p}_t$)  $\widehat{\theta}_t$  SD($\widehat{\theta}_t$)
Baseline  539          15.95%      5%          184          16.06%   5.73%   NA       NA
                                   10%         202          16.05%   4.24%   NA       NA
                                   20%         240          15.98%   2.64%   NA       NA
3 Month   401          10.22%      5%          234          10.41%   4.45%   12.30%   8.09%
                                   10%         243          10.11%   2.97%   12.02%   5.76%
                                   20%         260          10.24%   1.96%   12.03%   3.65%
12 Month  293          15.69%      5%          132          15.90%   6.98%   17.90%   8.83%
                                   10%         141          15.55%   4.25%   17.59%   5.49%
                                   20%         158          15.75%   2.95%   17.65%   3.76%

Hierarchical clustering as screening test

Baseline  539          15.95%      5%          299          15.77%   5.13%   NA       NA
                                   10%         311          15.64%   3.41%   NA       NA
                                   20%         337          15.52%   2.23%   NA       NA
3 Month   401          10.22%      5%          283          8.45%    6.99%   10.33%   9.99%
                                   10%         289          8.44%    4.76%   10.26%   6.75%
                                   20%         302          8.62%    3.22%   10.56%   4.51%
12 Month  293          15.69%      5%          99           16.59%   3.35%   17.97%   7.37%
                                   10%         109          16.48%   2.41%   18.27%   5.07%
                                   20%         130          16.22%   1.50%   18.10%   3.37%
Model-based clustering as screening test
TimeCohort SizeTrue |$p_t$|ProportionSample Size|$\widehat{p_t}$|SD(⁠|$\widehat{p_t}$|⁠)|$\widehat{\theta_t}$|SD(⁠|$\widehat{\theta_t}$|⁠)
Baseline53915.95|$\%$|5|$\%$|18416.06|$\%$|5.73|$\%$|NANA
10|$\%$|20216.05|$\%$|4.24|$\%$|NANA
20|$\%$|24015.98|$\%$|2.64|$\%$|NANA
3 Month40110.22|$\%$|5|$\%$|23410.41|$\%$|4.45|$\%$|12.30|$\%$|8.09|$\%$|
10|$\%$|24310.11|$\%$|2.97|$\%$|12.02|$\%$|5.76|$\%$|
20|$\%$|26010.24|$\%$|1.96|$\%$|12.03|$\%$|3.65|$\%$|
12 Month29315.69|$\%$|5|$\%$|13215.90|$\%$|6.98|$\%$|17.90|$\%$|8.83|$\%$|
10|$\%$|14115.55|$\%$|4.25|$\%$|17.59|$\%$|5.49|$\%$|
20|$\%$|15815.75|$\%$|2.95|$\%$|17.65|$\%$|3.76|$\%$|
Hierarchical clustering as screening test
Baseline53915.95|$\%$|5|$\%$|29915.77|$\%$|5.13|$\%$|NANA
10|$\%$|31115.64|$\%$|3.41|$\%$|NANA
20|$\%$|33715.52|$\%$|2.23|$\%$|NANA
3 Month40110.22|$\%$|5|$\%$|2838.45|$\%$|6.99|$\%$|10.33|$\%$|9.99|$\%$|
10|$\%$|2898.44|$\%$|4.76|$\%$|10.26|$\%$|6.75|$\%$|
20|$\%$|3028.62|$\%$|3.22|$\%$|10.56|$\%$|4.51|$\%$|
12 Month29315.69|$\%$|5|$\%$|9916.59|$\%$|3.35|$\%$|17.97|$\%$|7.37|$\%$|
10|$\%$|10916.48|$\%$|2.41|$\%$|18.27|$\%$|5.07|$\%$|
20|$\%$|13016.22|$\%$|1.50|$\%$|18.10|$\%$|3.37|$\%$|
Table 4.

Detailed analysis for different clustering methods as screening test

Model-based clustering as screening test
TimeCohort SizeTrue |$p_t$|ProportionSample Size|$\widehat{p_t}$|SD(⁠|$\widehat{p_t}$|⁠)|$\widehat{\theta_t}$|SD(⁠|$\widehat{\theta_t}$|⁠)
Baseline53915.95|$\%$|5|$\%$|18416.06|$\%$|5.73|$\%$|NANA
10|$\%$|20216.05|$\%$|4.24|$\%$|NANA
20|$\%$|24015.98|$\%$|2.64|$\%$|NANA
3 Month40110.22|$\%$|5|$\%$|23410.41|$\%$|4.45|$\%$|12.30|$\%$|8.09|$\%$|
10|$\%$|24310.11|$\%$|2.97|$\%$|12.02|$\%$|5.76|$\%$|
20|$\%$|26010.24|$\%$|1.96|$\%$|12.03|$\%$|3.65|$\%$|
12 Month29315.69|$\%$|5|$\%$|13215.90|$\%$|6.98|$\%$|17.90|$\%$|8.83|$\%$|
10|$\%$|14115.55|$\%$|4.25|$\%$|17.59|$\%$|5.49|$\%$|
20|$\%$|15815.75|$\%$|2.95|$\%$|17.65|$\%$|3.76|$\%$|
Hierarchical clustering as screening test
Baseline53915.95|$\%$|5|$\%$|29915.77|$\%$|5.13|$\%$|NANA
10|$\%$|31115.64|$\%$|3.41|$\%$|NANA
20|$\%$|33715.52|$\%$|2.23|$\%$|NANA
3 Month40110.22|$\%$|5|$\%$|2838.45|$\%$|6.99|$\%$|10.33|$\%$|9.99|$\%$|
10|$\%$|2898.44|$\%$|4.76|$\%$|10.26|$\%$|6.75|$\%$|
20|$\%$|3028.62|$\%$|3.22|$\%$|10.56|$\%$|4.51|$\%$|
12 Month29315.69|$\%$|5|$\%$|9916.59|$\%$|3.35|$\%$|17.97|$\%$|7.37|$\%$|
10|$\%$|10916.48|$\%$|2.41|$\%$|18.27|$\%$|5.07|$\%$|
20|$\%$|13016.22|$\%$|1.50|$\%$|18.10|$\%$|3.37|$\%$|
Model-based clustering as screening test
TimeCohort SizeTrue |$p_t$|ProportionSample Size|$\widehat{p_t}$|SD(⁠|$\widehat{p_t}$|⁠)|$\widehat{\theta_t}$|SD(⁠|$\widehat{\theta_t}$|⁠)
Baseline53915.95|$\%$|5|$\%$|18416.06|$\%$|5.73|$\%$|NANA
10|$\%$|20216.05|$\%$|4.24|$\%$|NANA
20|$\%$|24015.98|$\%$|2.64|$\%$|NANA
3 Month40110.22|$\%$|5|$\%$|23410.41|$\%$|4.45|$\%$|12.30|$\%$|8.09|$\%$|
10|$\%$|24310.11|$\%$|2.97|$\%$|12.02|$\%$|5.76|$\%$|
20|$\%$|26010.24|$\%$|1.96|$\%$|12.03|$\%$|3.65|$\%$|
12 Month29315.69|$\%$|5|$\%$|13215.90|$\%$|6.98|$\%$|17.90|$\%$|8.83|$\%$|
10|$\%$|14115.55|$\%$|4.25|$\%$|17.59|$\%$|5.49|$\%$|
20|$\%$|15815.75|$\%$|2.95|$\%$|17.65|$\%$|3.76|$\%$|
Hierarchical clustering as screening test
Baseline53915.95|$\%$|5|$\%$|29915.77|$\%$|5.13|$\%$|NANA
10|$\%$|31115.64|$\%$|3.41|$\%$|NANA
20|$\%$|33715.52|$\%$|2.23|$\%$|NANA
3 Month40110.22|$\%$|5|$\%$|2838.45|$\%$|6.99|$\%$|10.33|$\%$|9.99|$\%$|
10|$\%$|2898.44|$\%$|4.76|$\%$|10.26|$\%$|6.75|$\%$|
20|$\%$|3028.62|$\%$|3.22|$\%$|10.56|$\%$|4.51|$\%$|
12 Month29315.69|$\%$|5|$\%$|9916.59|$\%$|3.35|$\%$|17.97|$\%$|7.37|$\%$|
10|$\%$|10916.48|$\%$|2.41|$\%$|18.27|$\%$|5.07|$\%$|
20|$\%$|13016.22|$\%$|1.50|$\%$|18.10|$\%$|3.37|$\%$|

6.2. Hierarchical clustering

We also considered a hierarchical clustering mechanism (Ward, 1963; Murtagh and Legendre, 2014) as the screening test. The test subjects are partitioned into depressed and non-depressed groups by applying the clustering directly to the screening test variables. An advantage of this approach over model-based clustering is that the screening variables need not be of any specific distributional type. The screening test clustering results are provided in the supplementary material available at Biostatistics online, and the sensitivity and specificity of the screening test are given in Table 3.
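In the same hypothetical setting as the sketch of Section 6.1 (screening variables in a numeric-coded matrix X), this screen could be implemented in R as below; the "ward.D2" option is the implementation of Ward's criterion discussed by Murtagh and Legendre (2014), and a Gower dissimilarity (e.g., cluster::daisy) could replace the Euclidean distance when the variables are left in mixed form.

    ## Hierarchical (Ward) clustering on the screening variables (X hypothetical).
    d   <- dist(scale(X))                  # Euclidean distance on scaled variables
    hc  <- hclust(d, method = "ward.D2")   # Ward's criterion (Murtagh and Legendre, 2014)
    lab <- cutree(hc, k = 2)               # two screening groups: depressed / not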

Following the sampling scheme of Section 6.1, we drew fractions of 5%, 10%, and 20% from the phase-one screened non-depressed group and performed the gold-standard test on them, along with all subjects in the phase-one screened depressed group. The relatively large sample size at the first wave results from the high proportion of subjects screened as depressed in phase one. The results with hierarchical clustering as the screening test are displayed in Table 4, which gives the detailed analysis broken down by wave. The estimated prevalence $\widehat{p_t}$ at each wave is close to the actual $p_t$, with estimation variability decreasing as the phase-two sampling fraction increases. The estimated incidence rate is also presented for wave II (3-month follow-up) and wave III (12-month follow-up), with a similar trend in variability as the prevalence.

6.3. Discussion on screening test performance

To summarize, the first-phase screening test is taken to be a clustering (model-based or hierarchical) of the screening test variables. It should be noted that the PCA-based clustering lacks a meaningful interpretation, since information on the original variables is lost in constructing the principal components. To retain this information, hierarchical clustering can be considered a viable alternative. However, if the objective is not to interpret the screening test but to use it as a black box for classification, the PCA-based approach serves that purpose well, as is evident from its performance. Both screening tests significantly reduce the total number of gold-standard tests compared with the original study, while the estimated prevalence remains quite close to the observed truth. Table 4 shows that the prevalence estimate is more robust under model-based clustering than under hierarchical clustering. As mentioned earlier, the original study measured only the prevalence rate at each wave, so no incidence rate was reported. We also notice an increase in the estimated incidence rate from the 3-month to the 12-month screening. A possible explanation for this increase is that the chance of developing major depressive disorder rises rapidly over time among homebound geriatric individuals. Nevertheless, we have demonstrated that the proposed methodology can yield significant cost savings, since the gold-standard test is performed only on a smaller group of individuals from the entire cohort and no extra cost is incurred for the screening test. This comes without much compromise in estimation precision, while testing $<45\%$ of the total sample at each time point.

 
Remark:

As mentioned before, in the original study (Bruce and others, 2002; Weinberger and others, 2009) only the gold-standard test was carried out, as the study was not intended to be a two-phase design. As a result, no screening test was constructed and no cost comparison was made. Ideally, a prospectively designed two-phase study should first construct a screening test via a pilot study or from historical data, and should justify the parameters of the constructed screening test via cost-effectiveness and efficiency analyses. In this article, we have constructed retrospectively defined screening tests based on available auxiliary information to show considerable savings in sample size, which should potentially lead to lower cost. However, to compare the efficiency of two-phase sampling with its single-phase counterpart, information about the cost of each screening test is also required, along with that of the gold-standard test. Thus we cannot measure the efficiency of the two-phase mechanism as described in Section 4.

7. Discussion

This research is motivated by real-life studies and intends to address the estimation issues in two-phase longitudinal study designs. Although the simulation studies closely follow the "Ethical" sampling design, the developed methodology is applicable to any general two-phase design scheme. From all the explored cases we can summarize two significant findings. First, the sensitivity and specificity of the first-phase fallible test play a crucial role in determining the efficiency of the estimate; this adds to the comments made by McName (2003) from the cost-consideration context. Second, although the incidence and prevalence rates are closely related, the prevalence estimate shows remarkable robustness compared with the incidence estimate at any time point. This is somewhat surprising, as we expected the trends to be roughly parallel. Specifically, if the sensitivity is fairly close to unity, then the prevalence and incidence estimates coincide under the "Ethical" sampling scheme, and in that case the incidence estimate does inherit some degree of robustness. We would also like to point out that longitudinal estimation of prevalence and incidence has medical significance: a monotonic trend may well indicate the general health pattern of the community and whether any intervention is effective over time. As future work, we plan to extend our approach to the regression-estimation context. Another direction is to consider a more complicated sampling plan that can accommodate the inclusion of new subjects over time, and especially the estimation issues arising with missing data; both situations are quite common in practice. Yet another exciting direction is the design of efficient sampling plans under a fixed cost in the longitudinal setup. Nevertheless, we hope that the present article sheds some light on the estimation issues of two-phase sampling design from the longitudinal perspective.

Acknowledgements

The last author would like to thank Jianzhao Shen for proposing the problem related to motivating example 1. We also thank Dr P. E. Shrout for his comments on a previous version of the paper. Conflict of Interest: None declared.

Funding

Research of the last author is partly supported by PCORI contract ME-1409-21410 and NIH grant P30-ES020957.

References

Beckett, L. A., Scherr, P. A. and Evans, D. A. (1992). Population prevalence estimates from complex samples. Journal of Clinical Epidemiology 45, 393–402.

Bruce, M. L., McAvay, G. J., Raue, P. J., Brown, E. L., Meyers, B. S., Keohane, D. J., Jagoda, D. R. and Weber, C. (2002). Major depression in elderly home health care patients. American Journal of Psychiatry 159, 1367–1374.

Callahan, C. M., Hall, K. S., Hui, S. L., Musick, B. S., Unverzagt, F. W. and Hendrie, H. C. (1996). Relationship of age, education, and occupation with dementia among a community-based sample of African Americans. Archives of Neurology 53, 134–140.

Clayton, D., Spiegelhalter, D., Dunn, G. and Pickels, A. (1998). Analysis of longitudinal binary data from multiphase sampling. Journal of the Royal Statistical Society, Series B 60, 71–87.

Cochran, W. G. (1977). Sampling Techniques, 3rd edition. New York: Wiley.

Deming, W. (1977). An essay on screening, or two-phase sampling applied to surveys of a community. International Statistical Review 45, 29–37.

Dunn, G., Pickels, A., Tansella, M. and Vazquez-Barquero, J. (1999). Two-phase epidemiological surveys in psychiatric research. British Journal of Psychiatry 174, 359–363.

Fraley, C. and Raftery, A. (2002). Model-based clustering, discriminant analysis and density estimation. Journal of the American Statistical Association 97, 611–631.

Fraley, C. and Raftery, A. (2006). MCLUST version 4 for R: normal mixture modeling and model-based clustering. Technical Report tr504. University of Washington.

Gao, S., Hui, S. L., Hall, K. S. and Hendrie, H. C. (2000). Estimating disease prevalence from two-phase surveys with non-response at the second phase. Statistics in Medicine 19, 2101–2114.

Goodman, L. A. (1960). On the exact variance of products. Journal of the American Statistical Association 55, 708–713.

Goodman, L. A. (1962). The variance of the product of K random variables. Journal of the American Statistical Association 57, 54–60.

Gordis, L. (2009). Epidemiology. Philadelphia, PA: Saunders Elsevier.

Hall, K. S., Gao, S., Emsley, C. L., Ogunniyi, A., Morgan, O. and Hendrie, H. C. (1999). Community screening interview for dementia (CSI'D'); performance in five disparate study sites. International Journal of Geriatric Psychiatry 15, 521–531.

Hendrie, H. C., Ogunniyi, A. O., Hall, K. S., Baiyewu, O., Unverzagt, F. W., Gureje, O., Gao, S., Evans, R. M., Ogunseyinde, A. O., Adeyinka, A. O., Musick, B. and Hui, S. L. (2001). Incidence of dementia and Alzheimer disease in 2 communities. Journal of the American Medical Association 285, 739–747.

Hendrie, H. C., Osuntokun, B. O., Hall, K. S., Ogunniyi, A. O. and others (1995). Prevalence of Alzheimer's disease and dementia in two communities: Nigerian Africans and African Americans. American Journal of Psychiatry 152, 1485–1492.

McName, R. (2003). Efficiency of two-phase designs for prevalence estimation. International Journal of Epidemiology 32, 1072–1078.

McName, R. (2004). Two-phase sampling for simultaneous prevalence estimation and case detection. Biometrics 60, 783–792.

Murtagh, F. and Legendre, P. (2014). Ward's hierarchical agglomerative clustering method: which algorithms implement Ward's criterion? Journal of Classification 31, 274–295.

Neyman, J. (1938). Contribution to the theory of sampling human populations. Journal of the American Statistical Association 33, 101–116.

Pickels, A., Dunn, G. and Vazquez-Barquero, J. (1995). Screening for stratification in two-phase ("two-stage") epidemiological surveys. Statistical Methods in Medical Research 4, 73–89.

Shen, J., Gao, S., Unverzagt, F. W., Ogunniyi, A., Baiyewu, O., Gureje, O., Hendrie, H. C. and Hall, K. S. (2006). Validation analysis of informant's ratings of cognitive function in African Americans and Nigerians. International Journal of Geriatric Psychiatry 21, 618–625.

Shrout, P. E. and Newman, S. C. (1989). Design of two-phase prevalence surveys of rare disorders. Biometrics 45, 549–555.

Steffens, D. C., Fisher, G. G., Langa, K. M., Potter, G. G. and Plassman, B. L. (2009). Prevalence of depression among older Americans: the Aging, Demographics and Memory Study. International Psychogeriatrics 21, 879–888.

Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association 58, 236–244.

Weinberger, M. I., Raue, P. J., Meyers, B. S. and Bruce, M. L. (2009). Predictors of new onset depression in medically ill, disabled older adults at 1 year follow-up. The American Journal of Geriatric Psychiatry 17, 802–809.

Weissman, J., Meyers, B. S., Ghosh, S. and Bruce, M. L. (2011). Demographic, clinical, and functional factors associated with antidepressant use in the home healthcare elderly. The American Journal of Geriatric Psychiatry 19, 1042–1045.

Weissman, J., Meyers, B. S., Ghosh, S. and Bruce, M. L. (2011). Sociodemographic and clinical factors associated with antidepressant type in a national sample of the home health care elderly. General Hospital Psychiatry 33, 587–593.
