SUMMARY

Two-phase sampling designs are common practice in many medical studies. Generally, the first-phase classification is fallible but relatively cheap, while the accurate second-phase, state-of-the-art medical diagnosis is complex and rather expensive to perform. When constructed efficiently, such a design offers great potential for higher true case detection as well as for higher precision at a limited cost. In this article, we consider epidemiological studies with a two-phase sampling design. However, instead of a single two-phase study, we consider a scenario where a series of two-phase studies is carried out in a longitudinal fashion on a cohort of interest. Another major design issue is the non-curable nature of certain diseases (e.g. dementia, Alzheimer's disease). Identified disease-positive subjects are therefore often removed from the original population under observation, as they require clinical attention quite different from that of the yet unidentified group. We motivate our methodological development with two real-life studies. We consider efficient and simultaneous estimation of prevalence as well as incidence at multiple time points from a sampling design-based approach. We explicitly show the benefit of the developed methodology for an elderly population with a significant burden of home health care usage and at high risk of major depressive disorder.

1. Introduction

The aim of any multi-phase design is two-fold: first, to detect as many cases as possible, and second, the efficient estimation of prevalence at a limited cost. The two-phase design in particular has been very popular in epidemiological studies (Pickles and others, 1995; Dunn and others, 1999). In a standard two-phase design (Neyman, 1938), at the first phase all subjects under study receive a low-cost, easy to administer but fallible screening test. Depending upon the first-phase result, subjects are then classified into two (or more) categories. In the second phase, a random sample is drawn from each of these categories and undergoes a state-of-the-art (or "gold-standard") and rather expensive diagnostic procedure to determine the true disease status. Optimal design strategies for two-phase surveys have been discussed in Deming (1977), Shrout and Newman (1989), and McNamee (2004). For a more detailed review of the literature, please see Pickles and others (1995). Two-phase studies are popular in psychometric research and commonly found in mental health studies (Beckett and others, 1992; Hendrie and others, 2001).

In a two-phase study, the most important task is the estimation of prevalence, i.e. the frequency of previously undetected and untreated "cases". However, often we are presented with a scenario where multiple two-phase studies are performed at different time points. For such a scenario, not only the prevalence but also the incidence, i.e. the frequency of fresh cases, becomes significant. Clayton and others (1998) treated such a scenario in a regression estimation setup. However, that study is limited in two senses: first, it focuses on incidence estimation only and, second, it considers only two time points. Extension of the same to multiple time points in the longitudinal data framework is complicated and so far unexplored. In this article, we present simultaneous estimation of prevalence and incidence from the sampling design perspective. Our method is fairly simple and applicable to multiple time points.

1.1. Motivating example 1

Our first example comes from an NIH-funded study for detecting Alzheimer's disease and dementia in two communities. The study design is presented in detail in Hendrie and others (1995, 2001). A fixed number of elderly subjects are followed in a longitudinal fashion for about 5 years in two communities (African Americans in Indiana, USA and Yoruba in Ibadan, Nigeria). The estimation of prevalence and incidence of dementia is carried out separately at two time points (or waves), at the 2-year follow-up and at the 5-year follow-up. However, due to the irreversible nature of the disease, at each successive time point those who are identified as demented via the second-phase gold-standard test are progressively excluded. Hence, the population under investigation decreases monotonically over time (see Figure 1(a)). Though new subjects could be included as part of the existing cohort, we do not consider those in the current context. At each wave a two-phase design is carried out. In the first phase, a screening test is done based on the CSI'D' score (Hall and others, 1999). Based on this score, subjects are categorized into four groups (Good, Intermediate, Poor, and Impaired). A random sample with a fixed percentage from each category is drawn for the second-phase clinical assessment, which is considered to be the gold standard. An ethical sampling plan (see Section 2) is followed. The percentage to be sampled for the second phase from each outcome group is not guided by any optimality property, but rather based on associated cost and convenience. More importantly, the wave 1 and wave 2 estimations are carried out independently. There exist other interesting aspects of this cohort, which are studied in many subsequent papers (Callahan and others, 1996; Shen and others, 2006), but we do not elaborate further as that is not the main goal of this article.

Fig. 1. Schematic diagram of the longitudinal two-phase study in (a). Schematic diagram of the two-phase study at $T=t$ in (b). Expected frequencies in the cells of the survey at $T=t$ are presented in (c).

1.2. Motivating example 2

Our second example comes from another NIH-funded study to detect late life depression (LLD) in medical home care. LLD is under-identified, under-diagnosed, and under-treated. The identification of LLD is complicated by co-morbid physical illness, impaired cognitive function, and the stigma associated with being diagnosed as depressed at an older age. Thus, the main goal of the study was to assess the prevalence, 1-month persistence and other clinical features, and the clinical, functional, and health care outcomes at 12 months of major depression and subsyndromal depression in elderly patients newly admitted for medical home care at a large regional Visiting Nurses Agency (VNS Westchester County, NY, USA). The subjects were a random representative sample of primarily homebound elderly patients (age 65 or more) newly admitted to the VNS, sampled on a weekly basis over a period of 2 years. For this specific study, two-phase sampling was not originally proposed; rather, a single gold-standard test was carried out. A team comprising a geriatric psychiatrist, a geriatrician, a clinical psychologist, and a sociologist evaluated the HAM-D, GDS, SCID (patient and informant), and a tape-recorded semi-structured nurse interview together to create a new "gold standard" of major depression based on consensus using DSM-IV diagnostic criteria (Bruce and others, 2002; Weinberger and others, 2009). Undoubtedly, this was time consuming and expensive, albeit it followed DSM-IV's "etiologic" approach for the diagnosis of depression. Though the primary end-point of the original study was 12 months, data were collected at baseline, at 3 months, and at the end of the 12-month study period; subjects were, however, followed over a 24-month period. Our objective in the present situation is to show that if a well-constructed screening test were implemented using available surrogate information in a two-phase sampling design, one could estimate incidence and prevalence quite accurately but with a much smaller sample size (i.e. lower cost). This is because, unlike in the original study design, not all subjects are required to be evaluated by the gold-standard test, even when an "Ethical" sampling plan is followed. There exist many other interesting aspects of this study concerning subject characteristics, which are studied in many subsequent papers (Weissman and others, 2011a,b).

In the rest of the article, we closely follow the common design of our motivating examples, though the methodological development is general enough to be applicable to any longitudinal two-phase sampling plan. The rest of the article is organized as follows. In Section 2, we introduce some notation and relevant background material. In Section 3, we discuss estimation of the incidence rate over time. We carry out an efficiency analysis in Section 4, under the assumption that the cost of each phase is available. Section 5 concentrates on simulation studies with a screening test improving/degrading over time. In Section 6, we take one of the motivating examples to study the behavior of our estimates. We conclude the article with a brief discussion.

2. Design of two-phase survey: some notation

First, we introduce some notation, broadly taken from Shrout and Newman (1989): $D$, the true disease status (e.g. presence of disease, such as dementia or depression) indicated by some well-defined diagnostic procedure; $X$, explanatory variable(s) (e.g. informant scores, demographic, socioeconomic, and other surrogate information) used for predicting prevalence in phase one; $Y$, the fallible classification obtained using the screening test in phase one. In phase one, based on $X$, we first classify each subject as either $Y=1$ (presence of disease) or $Y=2$ (absence of disease, or $\overline{Y}$). Logistic regression has popularly been used for this (Gao and others, 2000); however, the labeling of $Y$ is more of a clustering problem than a classification one. This will be elucidated further in the simulation section. At baseline, we have $N$ individuals in the study, out of whom $n$ $(\ll N)$ have the disease. Of course, $n$ is unknown and needs to be estimated. Let the random variable $T=t$ denote the $t$-th two-phase study (also known as the $t$-th wave) for $T=1,2,3,\cdots$, which are not necessarily equispaced. At $T=t$, let $p_t$ denote the prevalence rate. For the transition from $T=t-1$ to $t$, let $\theta_{t}$ denote the incidence rate. At baseline, i.e. when $T=1$, our initial assumption is $\theta_1=0$, so one only has to estimate the prevalence ($p_1$). For the sake of simplicity, in this article we assume there is no loss due to death, absence, etc., at different phases or at different time points. Let $n_{t1}$ and $n_{t2}$ denote the numbers of people for whom we detect $D=1$ and $D=2$ (or, $\overline{D}$) at the second phase of $T=t$, respectively. Due to the irreversible nature of many diseases (e.g. dementia, Alzheimer's disease), it is assumed that once an accurate diagnosis is made (in the second phase), those people are excluded from the study at all successive time points. Hence, at the beginning of $T=t$, the number of subjects under study is $N_t=N-\sum_{i=1}^{t-1} n_{i1}$. Let $d_{tj}=\sum_{i=1}^{N_t} 1_{\{Y_i = j|X_i\}}$ denote the number of subjects classified as $Y=j$ for $j=1, 2$. Note that $N_t= \sum_{j=1}^2 d_{tj}$ for all $T=t$. Following standard notation, let $f_{tj}$ for $j=1,2$ denote the fraction of each group included as a random sample in the second-phase study at $T=t$. If $n_t$ is the second-phase sample size for $T=t$, then $n_t=\sum_{j=1}^2 f_{tj} d_{tj}$. For "Ethical" reasons (Shrout and Newman, 1989), in many two-phase studies $f_{t1}=1$ is chosen, i.e. all screened positive ($Y=1$) subjects are included in the second phase. This is the case for our motivating example 1; however, for general discussion we will assume $0\leq f_{tj}\leq 1$, $j=1,2$.

From Figure 1(b), at $T=t$, the sensitivity and specificity are given by $P(Y=1|D)=\frac{a_t}{n_{t1}}$ and $P(Y=2|\overline{D})=\frac{d_t}{n_{t2}}$, respectively. Let $\pi_{t1}$ denote the probability of an individual being screened positive at the first phase when $T=t$. Its maximum likelihood (ML) estimate is given by $\widehat{\pi}_{t1}=\frac{\sum_{i=1}^{N_t} 1_{\{Y_i =1|X_i\}}}{N_t}=\frac{d_{t1}}{N_t}$. Also define, at $T=t$, $\lambda_{t1}=P(D|Y)$ and $\lambda_{t2}=P(D|\overline{Y})$, the prevalence of disease in the screened positive and screened negative groups, respectively. The ML estimates of $\lambda_{t1}$ and $\lambda_{t2}$ are $\frac{a_t}{f_{t1} d_{t1}}$ and $\frac{c_t}{f_{t2} d_{t2}}$, respectively. Note that not all samples from the first phase are examined in the second phase. Figure 1(c) illustrates the expected frequencies in the cells of the survey at $T=t$. The estimate of the prevalence at $T=t$ is the weighted average of the prevalences in $Y=1$ and $Y=2$, which is given by,
$\widehat{p}_t=\widehat{\pi}_{t1}\widehat{\lambda}_{t1}+(1-\widehat{\pi}_{t1})\widehat{\lambda}_{t2}.$   (2.1)
The large sample variance of the above is given by,
$V(\widehat{p}_t)=\frac{1}{N_t}\left[\sum_{j=1}^{2}\frac{\pi_{tj}\,\lambda_{tj}(1-\lambda_{tj})}{f_{tj}}+\sum_{j=1}^{2}\pi_{tj}\,(\lambda_{tj}-p_t)^2\right],$   (2.2) where $\pi_{t2}=1-\pi_{t1}$.
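For concreteness, the following minimal Python sketch computes $\widehat{p}_t$ of equation (2.1) from the observed two-phase counts; the cell counts in the usage line are hypothetical.

```python
def two_phase_prevalence(d_t1, d_t2, f_t1, f_t2, a_t, c_t):
    """Two-phase prevalence estimate, equation (2.1).

    d_t1, d_t2 : first-phase counts screened positive / negative
    f_t1, f_t2 : second-phase sampling fractions for each group
    a_t, c_t   : confirmed cases among verified positives / negatives
    """
    N_t = d_t1 + d_t2
    pi_t1 = d_t1 / N_t                # estimated P(screened positive)
    lam_t1 = a_t / (f_t1 * d_t1)      # prevalence among screened positives
    lam_t2 = c_t / (f_t2 * d_t2)      # prevalence among screened negatives
    return pi_t1 * lam_t1 + (1 - pi_t1) * lam_t2

# Hypothetical wave: 150 of 1000 screen positive; all positives and 10% of
# negatives are verified, yielding 60 and 17 confirmed cases, respectively.
print(two_phase_prevalence(150, 850, 1.0, 0.1, 60, 17))   # 0.23
```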
Suppose each screening test costs $c_S$ in the first phase and each diagnostic evaluation costs $c_D$ in the second phase, with $c_S\ll c_D$. Under the constraint that the total study cost is fixed, the optimal choice of $f_{t1}$ and $f_{t2}$ is given in Shrout and Newman (1989) (also in Cochran (1977)), and is obtained by minimizing (2.2):
$f_{tj}^{\ast}=\sqrt{\dfrac{c_S\,\lambda_{tj}(1-\lambda_{tj})}{c_D\,\sum_{k=1}^{2}\pi_{tk}(\lambda_{tk}-p_t)^2}},\qquad j=1,2.$   (2.3)
For "Ethical" reasons, often $f_{t1}=1$ for all $T=t$. In that situation, under the assumption that the resources remaining after the first phase do not cover the expense of including every member in the second phase (i.e. they are $<N_t (c_D-c_S)$), the optimal value of $f_{t2}$ is:
(2.4)
where $c=\frac{c_S}{c_D}$. If $f_{t2}^{\ast\ast}>1$ for some $T=t$, this implies that the two-phase design is less efficient than a single-phase design with only the gold-standard test. In this article, we do not consider $c_S$ and $c_D$ to be time varying. However, when the time gap between two successive two-phase studies is rather long, it makes sense to consider them time varying.

3. Estimation of incidence rate

Prevalence here is essentially the number of persons having the true disease at the beginning of the study in the cohort or population of interest. At all other time points, estimation of incidence is more important and meaningful, since at those points prevalence has contributions both from fresh cases of disease and from previously undetected cases. A general outline of the above sampling design at $T=t$ is presented in Figure 1(b), in which the observed outcomes of the first and second phases are depicted in a $2\times2$ contingency table. The true unobserved disease status in Figure 1(b) requires some algebra and is given in Theorem 3.1 below.

 
Theorem 3.1

Suppose we have $N_t$ subjects under study at the beginning of $T=t$, with $\theta_{t}$ being the incidence rate for the transition from $T=t-1$ to $T=t$. Then the number of subjects with true disease is given by $n-\sum_{i=1}^{t-1}n_{i1}+(N-n)\left\{1- \prod_{i=1}^{t}(1-\theta_i)\right\}$, while its complement is $(N-n)\left\{\prod_{i=1}^{t}(1-\theta_i)\right\}$.

For brevity, all proofs are provided in the supplementary material available at Biostatistics online. To calculate the prevalence at any $T=t$ we may use equation (2.1). Note that from this we can get an estimate of $n$ and its variance as $\widehat{n}=N\widehat{p}_1$ and $V(\widehat{n})=N^2V(\widehat{p}_1)$, respectively. To estimate the incidence at any $T=t$ we use the identity $1-p_t=\frac{(N-n)\left\{\prod_{i=1}^{t}(1-\theta_i)\right\}}{N-\sum_{i=1}^{t-1}n_{i1}}$. Solving the above yields,
$\widehat{\theta}_{t}=1-\dfrac{N_t\,(1-\widehat{p}_t)}{(N-\widehat{n})\prod_{i=1}^{t-1}(1-\widehat{\theta}_i)}.$   (3.1)
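The recursion in (3.1) is easy to implement once the sequence of prevalence estimates and confirmed-case counts is available. A minimal Python sketch, assuming $\widehat{\theta}_1=0$ and $\widehat{n}=N\widehat{p}_1$ as above (input sequences are hypothetical):

```python
def incidence_recursive(N, p_hat, n_detect):
    """Incidence estimates via the recursion in equation (3.1).

    N        : baseline cohort size
    p_hat    : prevalence estimates [p_1, ..., p_T], one per wave
    n_detect : confirmed cases removed at each wave [n_11, ..., n_T1]
    """
    n_hat = N * p_hat[0]          # estimated diseased at baseline
    thetas = [0.0]                # theta_1 = 0 by assumption
    surv = 1.0                    # running product of (1 - theta_i), i < t
    N_t = N
    for t in range(1, len(p_hat)):
        N_t -= n_detect[t - 1]    # drop cases detected at the previous wave
        theta = 1 - N_t * (1 - p_hat[t]) / ((N - n_hat) * surv)
        thetas.append(theta)
        surv *= 1 - theta
    return thetas
```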

An exact formula for the variance of a product of many random variables is given in Goodman (1962). Unfortunately, even if we assume independence of the random variables involved, the variance calculation for $\theta_{t}$ is rather prohibitive. Next, we present another, equivalent formulation of $\theta_{t}$, which is computationally much simpler.

3.1. Equivalent form for $\theta_t$

The estimating equation (3.1) for the incidence rate, though useful, is a little complicated for interpretation purposes. An equivalent expression for $\theta_{t}$ in terms of prevalences only is presented in this section. We consider two adjacent time points, say $T=t-1$ and $T=t$, with prevalence rates $p_{t-1}$ and $p_{t}$; from the experimental design, $N_{t-1}=N_t+n_{t-1,1}$. At $T=t-1$, the number of people with true disease status is $N_{t-1} p_{t-1}$, out of which $n_{t-1,1}$ are truly detected and removed from the study. At $T=t$, the number of undetected people with true disease status is therefore $N_{t-1} p_{t-1}-n_{t-1,1}$. Hence, the number of fresh cases of disease at $T=t$ is $\left(N_t-N_{t-1} p_{t-1}+n_{t-1,1}\right)\theta_t$. The same quantity can also be derived from the prevalence estimates at $T=t-1$ and $t$, giving $N_{t} p_{t}-N_{t-1} p_{t-1}+n_{t-1,1}$. Equating the two, we get,
$\widehat{\theta}_t=\dfrac{N_t\,\widehat{p}_t-N_{t-1}\,\widehat{p}_{t-1}+n_{t-1,1}}{N_t-N_{t-1}\,\widehat{p}_{t-1}+n_{t-1,1}}.$   (3.2)

The interpretation of the above estimate is straightforward: it is essentially the number of new disease-positive cases divided by the effective at-risk sample size at the $t$-th time point. The above estimate of $\theta_t$ has an interesting property under the "Ethical" sampling design (Shrout and Newman, 1989), depending upon the sensitivity of the first-phase test, which is described below.

 
Theorem 3.2

For the "Ethical" sampling design, $p_t=\theta_t$ if and only if the sensitivity at $T=t-1$ is equal to $1$.

Given $f_{t-1,1}=1$, if the sensitivity turns out to be one, it essentially tells us that the cases at $T=t$ are all attributable to incidence only. We next describe the variance of $\widehat{\theta}_{t}$, given as,
(3.3)
An exact formula for the variance of the product of two random variables is given in Goodman (1960), which states that for two independent random variables $A$ and $B$, $V(AB)$ is given by $V(AB)=E[A]^2V(B)+E[B]^2V(A)+V(A)V(B)$. The unbiased estimator of the above is obtained by using the usual sample estimates, $\widehat{V}(AB)=\overline{a}^2 s(B)+ \overline{b}^2 s(A) - s(A)s(B)$. Goodman (1960) also provided a consistent estimate in the case of non-independence, which is a little more involved. Notably, if $p_t$ and $p_{t-1}$ are assumed to be independent, then using the equation above and the delta method, the approximate variance of $\widehat{\theta}_{t}$ is given by,
$V(\widehat{\theta}_t)\approx\dfrac{N_t^2\,V(\widehat{p}_t)+N_{t-1}^2\,(1-\widehat{\theta}_t)^2\,V(\widehat{p}_{t-1})}{\left(N_t-N_{t-1}\,\widehat{p}_{t-1}+n_{t-1,1}\right)^2}.$   (3.4)
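A short Python sketch of the two-wave estimator (3.2), with the approximate variance in the delta-method form reconstructed in (3.4) (so that exact form is an assumption), under independence of $\widehat{p}_t$ and $\widehat{p}_{t-1}$:

```python
def incidence_two_wave(N_t, N_prev, p_t, p_prev, n_prev1,
                       var_p_t=None, var_p_prev=None):
    """Incidence estimate via equation (3.2); if the prevalence variances
    are supplied, also return the approximate variance (3.4), assuming
    the two prevalence estimates are independent."""
    denom = N_t - N_prev * p_prev + n_prev1        # at-risk pool at wave t
    theta = (N_t * p_t - N_prev * p_prev + n_prev1) / denom
    if var_p_t is None or var_p_prev is None:
        return theta, None
    var_theta = (N_t ** 2 * var_p_t
                 + N_prev ** 2 * (1 - theta) ** 2 * var_p_prev) / denom ** 2
    return theta, var_theta
```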

Equations (3.1) and (3.2) may look unrelated, but in fact they are equivalent. To show this, we next present a lemma.

 
Lemma 3.3

The estimates of $\theta_{t}$ given in equations (3.1) and (3.2) are equivalent in the sense that the following identity connects them: $\prod_{i=1}^t (1-\theta_{i})=\frac{N_t (1-p_{t})}{N (1-p_{1})}.$

4. Efficiency comparison: single- vs. two-phase design

For the cross-sectional setup, McNamee (2003) described in detail the efficiency of a two-phase design relative to a simple "single-phase" design. In this section, we deduce the same in a longitudinal setting. Suppose the total allowable cost at the $t$-th time point is fixed at $C_{t}$. For the sake of simplicity, we assume $c_S$ and $c_D$ (the first- and second-phase costs) do not vary considerably with time. Also note that $C_{t}=c_S N_t + c_D n_t$ must hold for the two-phase design. It is easy to show that the numbers of subjects under investigation in the two sampling designs are related as $n_{0t}=N_t\left[ \frac{c_S}{c_D}+ \pi_{t1}f_{t1}+(1-\pi_{t1})f_{t2} \right]$, where $n_{0t}$ denotes the number of persons in a single-phase design diagnosed by the gold-standard procedure only, at cost $c_D$ per subject. For this single-phase design, the prevalence estimate is a simple sample proportion of cases with variance $\frac{c_D p(1-p)}{C_{t}}$. For the relative efficiency (RE), McNamee (2003) compared the smallest two-phase standard error (SE) with the standard error of the single-phase prevalence estimate:
$\mbox{RE}_t=\dfrac{\mbox{min SE}_{2\mbox{-}phase}}{\mbox{SE}_{1\mbox{-}phase}}$   (4.1)
at the $t$-th time point. The above can be simplified in terms of specificity and sensitivity. We denote $S_{t1}=$ specificity at the $t$-th time $=P(Y=2|\bar{D})=\frac{d_t}{d_t+b_t}$ and $S_{t2}=$ sensitivity at the $t$-th time $=P(Y=1|D)=\frac{a_t}{a_t+c_t}$. Equation (4.1) can be equivalently expressed as,
$\dfrac{\mbox{min SE}_{2\mbox{-}phase}}{\mbox{SE}_{1\mbox{-}phase}}=\sqrt{1-\rho_t^2}+\rho_t\sqrt{\dfrac{c_S}{c_D}},$   (4.2)
where $\rho_t=\mbox{Correlation}(Y,D)$ at the $t$-th time. Using the facts that $|\rho_t|\leq 1$ and $\frac{c_S}{c_D}<1$, McNamee (2003) also provided a lower bound for the above in terms of specificity and sensitivity. The lower bound holds even when we fix $f_{t1}=1$ and $f_{t2}<1$, parallel to our motivating examples involving "Ethical" sampling. McNamee (2003) also concluded that, except for high specificity and sensitivity, a simple random sample design will usually yield a more precise estimate. However, this does not account for the ethical considerations, as also pointed out by McNamee (2004). In the longitudinal setup, we present two situations for efficiency comparison.

4.1. Screening test improves over time

For ease of exposition, we assume that there exists a monotonic improvement in the screening test. We assume without loss of generality that the explanatory variable(s) $X$ is (are) used for classifying $Y$ correctly, so that specificity and sensitivity approach $1$ as $t\rightarrow\infty$. Essentially this means $b_t,c_t\rightarrow 0$ as $t\rightarrow\infty$ in the contingency table of Figure 1(b). Let us assume $S_{t1}=\frac{\alpha}{\alpha+e^{-t}}$ and $S_{t2}=\frac{\beta}{\beta+e^{-t}}$, which satisfy the above properties. Of course, there exist other functional forms which also satisfy these properties; however, we choose the above due to its simplicity of exposition and closeness to the logistic link function. A test is hardly considered to be of any practical use if both specificity and sensitivity are below $0.5$. McNamee (2004) pointed out some simplification if we agree to take $S_{t1}=S_{t2}$, in which case $\rho_{max}=S_{t1}+S_{t2}-1$. Considering both, let us take $\alpha=\beta=1$ for the time being, which yields,
$\dfrac{\mbox{min SE}_{2\mbox{-}phase}}{\mbox{SE}_{1\mbox{-}phase}}=\dfrac{2\sqrt{e^{-t}}}{1+e^{-t}}+\dfrac{1-e^{-t}}{1+e^{-t}}\,\sqrt{c},$   (4.3)
where $0\leq c=\frac{c_S}{c_D}\leq 1$. Hence, the reduction in SE is bounded by $1-\frac{\mbox{min SE}_{2\mbox{-}phase}}{\mbox{SE}_{1\mbox{-}phase}}<1-\frac{2\sqrt{e^{-t}}}{1+e^{-t}}$. For the case $S_{t1}\neq S_{t2}$, $\rho_{max}=\frac{S_{t1}+S_{t2}-1}{\sqrt{(S_{t1}+S_{t2}-1)^2+\left(\sqrt{(1-S_{t1})S_{t1}}+\sqrt{(1-S_{t2})S_{t2}} \right)^2}}$, the simplification of which is rather involved. For our specific functional choices of $S_{t1}$ and $S_{t2}$, $\rho_{max}$ is a function of $\alpha$, $\beta$, and $t$. If we replace $\rho_{max}$ in equation (4.2), it yields $\frac{\mbox{min SE}_{2\mbox{-}phase}}{\mbox{SE}_{1\mbox{-}phase}}=f(\alpha,\beta,c,t)$, the simplification of which is not possible without further restrictive assumptions.

4.2. Screening test degrades over time

Here, we assume that the screening test performance degrades monotonically with time. In other words, the classification performance (of $Y$) by the explanatory variable(s) $X$ fails as time progresses. In real life this can happen when the screening test is constructed on baseline variables and the disease characteristics in the population change significantly over time. The performance of the screening test then degrades over time, yielding an increasing number of false positives and false negatives. This implies that specificity and sensitivity approach $0.5$ or lower as $t$ increases. The case when both specificity and sensitivity fall below $0.5$ corresponds to random guessing and hence is not of much practical value. In practice, however, specificity and sensitivity are often inversely related, and a screening test that is high in both may be difficult to produce unless considerable time and resources are spent on its construction. This is contrary to the idea of a "cheap" screening test in two-phase sampling. Hence, when constructing a screening test for a low prevalence disease (as in our data example in Section 6), more emphasis is given to achieving high sensitivity (Gordis, 2009). It is recommended that for preventable or curable diseases we optimize sensitivity first, followed by specificity. If we assume $S_{t1}=\frac{\alpha}{\alpha+e^{-1/t}}$, $S_{t2}=\frac{\beta}{\beta+e^{-1/t}}$ and also $\alpha=\beta=1$, this will yield,
$\dfrac{\mbox{min SE}_{2\mbox{-}phase}}{\mbox{SE}_{1\mbox{-}phase}}=\dfrac{2\sqrt{e^{-1/t}}}{1+e^{-1/t}}+\dfrac{1-e^{-1/t}}{1+e^{-1/t}}\,\sqrt{c}.$   (4.4)

Hence, the reduction in SE is bounded by $1-\frac{\mbox{min SE}_{2\mbox{-}phase}}{\mbox{SE}_{1\mbox{-}phase}}<1-\frac{2\sqrt{e^{-1/t}}}{1+e^{-1/t}}$. The expression for the case $S_{t1}\neq S_{t2}$ can be obtained in a similar fashion.
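To make the bounds concrete, the sketch below evaluates the SE ratio using the form of equation (4.2) as reconstructed above (so the exact published expression is an assumption), with $\rho_{max}=S_{t1}+S_{t2}-1$ for equal sensitivity and specificity, under the improving ($u=e^{-t}$) and degrading ($u=e^{-1/t}$) choices; the cost ratio $c=0.05$ is hypothetical.

```python
import math

def se_ratio(rho, c):
    """min SE(two-phase) / SE(one-phase), per the form used in (4.2)."""
    return math.sqrt(1 - rho ** 2) + rho * math.sqrt(c)

def rho_max_equal(t, improving=True):
    """rho_max when S_t1 = S_t2 = 1/(1 + u): rho_max = (1 - u)/(1 + u),
    with u = exp(-t) for an improving test, u = exp(-1/t) for a degrading one."""
    u = math.exp(-t) if improving else math.exp(-1.0 / t)
    return (1 - u) / (1 + u)

for t in (1, 3, 5, 10):
    print(t,
          round(se_ratio(rho_max_equal(t, True), c=0.05), 3),    # improving
          round(se_ratio(rho_max_equal(t, False), c=0.05), 3))   # degrading
```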

5. Simulation studies

As stated in the introduction, our work is motivated by the problem of estimating incidence and prevalence in a longitudinal setup. The two motivating examples have many common features (detection of disease status longitudinally) as well as variations unique to each specific study design. In our simulations, we assume a simplified setup common to both studies, to get a good idea of how our estimation method performs under different scenarios. In particular, we assume that a fixed number of subjects is followed over time, with no additional recruitment in between. There could also be data loss due to attrition and dropout caused by untimely death or refusal to participate in the study at a future date. Missing data and time-varying covariates also often accompany longitudinal studies; however, these are not considered in the present setup.

We assume that the covariate(s) $X$, which could be surrogate marker(s), informant scores, socio-clinical variables, etc. (e.g. CSI'D' in example 1 of Hall and others (1999)), are used or potentially could be used in the actual study to do the stratification in the first phase. Prevalence and incidence could also be highly correlated with other demographic variables such as age, sex, race, etc. A logistic regression based classification technique was used to create $Y$ in Shen and others (2006). However, in the present case the stratification of phase one is more of a clustering problem than a classification one. Clustering essentially involves creating labels ($Y$) from the explanatory variable(s) ($X$), while classification aims to create "rules" when both $Y$ and $X$ are available. Unfortunately, $Y$ is not available at the beginning of phase one in the present case. Hence, for the creation of the $Y$ label we have used mixture model based clustering (Fraley and Raftery, 2006) with two clusters (e.g. disease and non-disease) for all simulations. Details of the simulation steps are as follows (a code sketch of the first few steps appears after the list):

  1. We generate $900$ samples for the non-disease group such that $X_N\sim N(2,2)$. For the disease positive group, we generate $100$ samples from $X_D\sim N(-2,4)$. We store the original labels (disease and non-disease) as $D$. In motivating example 1 (Hall and others, 1999), subjects with lower CSI'D' scores are deemed to be demented, and the standard deviation in the demented group is higher than in the normal one. The choices of $X_D$ and $X_N$ are primarily governed by these considerations. The above distributional setup also ensures enough overlap between the two groups, thus creating some degree of fallibility in $Y$.

  2. We cluster the $N=1000$ samples into two different clusters using a mixture model. Denote $Y_i=1$ if the $i$-th subject is grouped in the disease positive cluster, and $Y_i=2$ otherwise. Clustering acts as a proxy for the fallible phase-one screening test in our simulation.

  3. Following the strategy of Shrout and Newman (1989), select every member from the disease positive cluster and randomly select $10\%$ of the subjects from the non-disease one.

  4. On the assumption that the gold-standard second-phase test is highly accurate, treat the original $D$ as its output. Create a $2\times 2$ contingency table comparing $Y$ and $D$ for those subjects selected at the second phase. Estimate the prevalence and incidence rate (if applicable).

  5. Remove those subjects who had true disease positive status confirmed in the second phase via $D$.

  6. Choose a $\theta_t$ (incidence rate) and, out of the $N-n$ non-disease individuals, change the status of $(N-n)\theta_t$ subjects from non-disease to disease. Note that for a variable incidence rate, $\theta_t$ will vary for each $T=t$, while for a fixed $\theta_t$ it needs to be selected only once. Those with changed status are assigned a new covariate $X$ following Step 1.

  7. The true number of samples from each category can be found from Figure 1(b) (see "Unobserved Truth") as a function of time.

  8. No further updating of the covariate $X$ is needed if we assume that it is invariant over time. However, if we assume monotonic changes (improvement/degradation) in $X$, adjustments are required. Improvement signifies further separation of the disease and non-disease groups, resulting in more accurate prediction of $Y$. For each member of the disease positive group, change the score of the $i$-th individual as $X_i^{new}=X_i^{old}-\delta_i \gamma_i$, where $\delta_i \sim U[0,2]$ denotes the rate of improvement and $\gamma_i\sim Bernoulli(0.5)$ is an indicator of such an improvement. $\delta_i$ and $\gamma_i$ vary among individuals. Choosing $\gamma_i=1$ for all $i$ indicates improvement for all subjects, while $\gamma_i=0$ for all $i$ indicates the invariant case. Similarly, for the non-disease group define $X_i^{new}=X_i^{old}+\delta_i\gamma_i$. For the degradation of the informant score we follow a similar strategy, defining $X_i^{new}=X_i^{old}+\delta_i\gamma_i$ for the disease positive group and $X_i^{new}=X_i^{old}-\delta_i\gamma_i$ for the non-disease group. This makes the separation between the two groups even harder, which in turn lowers the predictive accuracy of $Y$.
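The sketch below implements Steps 1-4 for the baseline wave in Python, with sklearn's GaussianMixture standing in for the mclust-type model-based clustering and the second parameter of $N(\cdot,\cdot)$ treated as a variance; both of these are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Step 1: covariate X and true status D (1 = disease, 2 = non-disease);
# the second parameter of N(., .) is treated here as a variance (assumption).
X = np.concatenate([rng.normal(-2, np.sqrt(4), 100),   # disease group
                    rng.normal(2, np.sqrt(2), 900)])   # non-disease group
D = np.concatenate([np.ones(100, int), np.full(900, 2, int)])

# Step 2: two-component mixture model as the fallible phase-one screen;
# the lower-mean cluster plays the role of the disease positive label.
gm = GaussianMixture(n_components=2, random_state=0).fit(X.reshape(-1, 1))
lab = gm.predict(X.reshape(-1, 1))
Y = np.where(lab == np.argmin(gm.means_.ravel()), 1, 2)

# Steps 3-4: "Ethical" second phase: all screened positives plus a
# 10% simple random sample of screened negatives; estimate p via (2.1).
pos = np.flatnonzero(Y == 1)
neg = np.flatnonzero(Y == 2)
neg_samp = rng.choice(neg, size=len(neg) // 10, replace=False)
pi1 = len(pos) / len(Y)
lam1 = np.mean(D[pos] == 1)        # prevalence among screened positives
lam2 = np.mean(D[neg_samp] == 1)   # prevalence among sampled negatives
print("baseline prevalence estimate:", pi1 * lam1 + (1 - pi1) * lam2)
```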

We repeat the above steps for $T=1,2,\ldots,10$. Note that our motivating examples have only a few (two in example 1 and three in example 2) time points including the baseline. Here, we consider six different possible scenarios:

  1. Time invariant |$X$| with fixed incidence rate.

  2. Classification via |$X$| improves with time and fixed incidence rate.

  3. Classification via |$X$| degrades with time and fixed incidence rate.

  4. Time invariant |$X$| with variable incidence rate.

  5. Classification via |$X$| improves with time and variable incidence rate.

  6. Classification via |$X$| degrades with time and variable incidence rate.

For each case, we estimate the prevalence and incidence rate via equations (2.1) and (3.2), along with their respective variances. The results for the six different cases are presented in Tables 1 and 2. In each table, we also report the sensitivity and specificity of the first-phase clustering result. This is important because, as pointed out by McNamee (2003), the efficiency of the two-phase design is often determined by high sensitivity and/or specificity. For comparison purposes we also report the true prevalence, which is obtained via $D$ in each wave. Table 1 represents the fixed incidence rate case with $\theta_t=0.05$ for all waves. For time invariant $X$, the estimated incidence rate is close to the true value. When $X$ improves in predicting $Y$ with time, the estimated incidence rate is highly accurate and numerically very close to the true value $0.05$. We also see that after wave five the first-phase clustering results are perfect, with sensitivity and specificity approaching one. While this is too good to be true in reality, it does indicate that the sensitivity and specificity of the first-phase test play a significant role, not only in the efficiency of the two-phase design but also in the accuracy of the estimated incidence rate. Similar statements can be made about the estimate of the prevalence. On the other hand, when $X$ degrades in predicting $Y$ with time, the first-phase clustering produces many misclassified $Y$. This results in low sensitivity and specificity as time progresses. Notably, the prevalence estimate is quite robust to this mis-specification; however, a similar statement cannot be made for the incidence estimate. Table 2 represents the variable incidence rate case. For the invariant case, the results are not as accurate as in the fixed incidence rate case (see Table 1), for both $p_t$ and $\theta_t$. However, the simple correlation between $\widehat{\theta}_t$ and the true $\theta_t$ is $0.91$. Notably, the specificity in both tables is relatively low. For the improving $X$, the estimates (of both $p_t$ and $\theta_t$) are quite accurate, with high first-phase sensitivity and specificity. The simulation studies presented above exhibit somewhat low specificity due to the non-separability of the disease and non-disease groups, resulting in a high number of false positives. This can easily be altered by lowering the standard deviation of each normal distribution. Additional results with high baseline specificity (and sensitivity) are available in the supplementary material available at Biostatistics online.

Table 1. Simulation results for fixed $\theta_t$ for three different scenarios

Scenario 1: Time invariant $X$

Time      $\widehat{p}_t$  V($\widehat{p}_t$)  $\widehat{\theta}_t$  V($\widehat{\theta}_t$)  Sensitivity  Specificity  True $p_t$  True $\theta_t$
Baseline  0.0998   0.0162   --       --       0.975  0.178  0.1    --
Wave 1    0.0771   0.0194   0.0588   0.0259   0.932  0.114  0.068  0.05
Wave 2    0.0729   0.0202   0.0436   0.0281   0.919  0.143  0.07   0.05
Wave 3    0.0825   0.0213   0.0522   0.029    0.928  0.143  0.078  0.05
Wave 4    0.0704   0.0188   0.0376   0.0288   0.947  0.134  0.079  0.05
Wave 5    0.07     0.0196   0.0474   0.0272   0.943  0.136  0.082  0.05
Wave 6    0.0706   0.016    0.0468   0.0254   0.976  0.098  0.086  0.05
Wave 7    0.0734   0.0169   0.061    0.0233   0.975  0.073  0.078  0.05
Wave 8    0.0788   0.023    0.0657   0.0284   0.937  0.074  0.067  0.05
Wave 9    0.0641   0.0182   0.0349   0.0298   0.967  0.082  0.067  0.05
Wave 10   0.0843   0.025    0.0699   0.0303   0.935  0.087  0.067  0.05

Scenario 2: Classification via $X$ improves with time

Baseline  0.0998   0.0162   --       --       0.975  0.178  0.1    --
Wave 1    0.0879   0.0218   0.0698   0.0275   0.911  0.112  0.068  0.05
Wave 2    0.0812   0.0177   0.0421   0.0283   0.962  0.265  0.069  0.05
Wave 3    0.0646   0.0143   0.0437   0.0231   0.977  0.173  0.06   0.05
Wave 4    0.0631   0.0149   0.0522   0.0207   0.975  0.798  0.057  0.05
Wave 5    0.0489   0.0079   0.0371   0.0172   1      1      0.056  0.05
Wave 6    0.0557   0.0087   0.0557   0.0116   1      1      0.057  0.05
Wave 7    0.0514   0.0086   0.0514   0.0122   1      1      0.051  0.05
Wave 8    0.0462   0.0083   0.0462   0.0121   1      1      0.049  0.05
Wave 9    0.0535   0.0092   0.0535   0.0123   1      1      0.053  0.05
Wave 10   0.0494   0.0091   0.0494   0.013    1      1      0.049  0.05

Scenario 3: Classification via $X$ degrades with time

Baseline  0.092    0.0091   --       --       1      0.185  0.1    --
Wave 1    0.0595   0.013    0.0595   0.0164   0.977  0.124  0.058  0.05
Wave 2    0.0635   0.0173   0.0536   0.0215   0.946  0.121  0.059  0.05
Wave 3    0.0583   0.0179   0.0376   0.025    0.933  0.088  0.066  0.05
Wave 4    0.1049   0.027    0.0839   0.0308   0.872  0.087  0.08   0.05
Wave 5    0.0929   0.0275   0.0363   0.0391   0.8    0.045  0.082  0.05
Wave 6    0.0751   0.0232   0.014    0.0367   0.892  0.049  0.098  0.05
Wave 7    0.1094   0.0286   0.0739   0.0354   0.848  0.033  0.109  0.05
Wave 8    0.0978   0.0278   0.0344   0.0404   0.867  0.037  0.112  0.05
Wave 9    0.117    0.0313   0.065    0.0409   0.833  0.035  0.117  0.05
Wave 10   0.165    0.0365   0.099    0.0454   0.733  0.027  0.119  0.05
Table 2. Simulation results for variable $\theta_t$ for three different scenarios

Scenario 1: Time invariant $X$

Time      $\widehat{p}_t$  V($\widehat{p}_t$)  $\widehat{\theta}_t$  V($\widehat{\theta}_t$)  Sensitivity  Specificity  True $p_t$  True $\theta_t$
Baseline  0.0998   0.0162   --       --       0.975  0.178  0.1    --
Wave 1    0.0216   0.0114   0.0023   0.0214   0.909  0.09   0.022  0.0005
Wave 2    0.0364   0.0122   0.0268   0.0162   0.958  0.108  0.039  0.03
Wave 3    0.0136   0.0114   0.0035   0.0169   0.666  0.093  0.016  0.002
Wave 4    0.0806   0.0176   0.0711   0.0195   0.962  0.152  0.077  0.065
Wave 5    0.135    0.0273   0.1159   0.0305   0.925  0.337  0.108  0.092
Wave 6    0.0909   0.0233   0.0339   0.0376   0.928  0.084  0.068  0.038
Wave 7    0.0666   0.0202   0.03     0.0316   0.933  0.093  0.056  0.042
Wave 8    0.0334   0.007    0.0077   0.0222   1      0.088  0.054  0.039
Wave 9    0.0631   0.016    0.0631   0.0176   0.969  0.075  0.067  0.047
Wave 10   0.0364   0.007    0.0225   0.0189   1      0.078  0.055  0.036

Scenario 2: Classification via $X$ improves with time

Baseline  0.095    0.0092   --       --       1      0.145  0.1    --
Wave 1    0.0442   0.0068   0.0442   0.0122   1      0.135  0.051  0.046
Wave 2    0.0254   0.0054   0.0254   0.0088   1      0.137  0.026  0.019
Wave 3    0.0878   0.0148   0.0877   0.0147   0.984  0.798  0.088  0.087
Wave 4    0.0437   0.0073   0.0325   0.0173   1      0.171  0.044  0.032
Wave 5    0.0727   0.0158   0.0727   0.0169   0.978  0.484  0.067  0.067
Wave 6    0.04     0.0074   0.0274   0.0181   1      0.356  0.044  0.037
Wave 7    0.0283   0.0064   0.0283   0.0098   1      0.078  0.028  0.024
Wave 8    0.0752   0.0177   0.0752   0.0179   0.975  1      0.069  0.069
Wave 9    0.0474   0.0086   0.033    0.0203   1      1      0.05   0.043
Wave 10   0.0566   0.0095   0.0566   0.0127   1      1      0.058  0.054

Scenario 3: Classification via $X$ degrades with time

Baseline  0.0879   0.013    --       --       0.987  0.181  0.1    --
Wave 1    0.0424   0.0184   0.033    0.0237   0.75   0.076  0.038  0.016
Wave 2    0.0943   0.027    0.0664   0.0309   0.681  0.063  0.074  0.049
Wave 3    0.1454   0.0296   0.0793   0.0378   0.859  0.071  0.142  0.096
Wave 4    0.1394   0.0283   0.0576   0.0412   0.901  0.064  0.155  0.078
Wave 5    0.15     0.0334   0.0853   0.0432   0.735  0.042  0.14   0.057
Wave 6    0.1267   0.033    0.0176   0.0483   0.636  0.043  0.122  0.024
Wave 7    0.1631   0.0254   0.0704   0.0399   0.829  0.007  0.181  0.095
Wave 8    0.1276   0.0186   0.0262   0.0328   0.384  0.001  0.136  0.015
Wave 9    0.196    0.0155   0.0963   0.0223   0.812  0      0.189  0.079
Wave 10   0.2198   0.0168   0.1009   0.0222   0.83   0      0.211  0.097

6. Analysis of home health care study

According to the National Institute of Mental Health, depression is a major mood disorder that hinders a person's daily mental and physical activities. Depression can arise from multiple causes, which vary among age groups. Studies have shown that depression among older individuals is strongly related to their history of illness and physical disability; although the majority of these individuals are not clinically depressed, they are at higher risk of developing depression in the future. Steffens and others (2009) reported an overall depression prevalence of $11.19\%$ based on a nationally representative cohort study of subjects aged over $71$. As discussed in Section 1.2, Bruce and others (2002) conducted a longitudinal study with clinical diagnosis data on older adults with medical comorbidity and functional disability, in order to identify potential risk factors associated with new depression cases. The goal of the study was early identification, intervention, and prevention for clinically depressed individuals. The original study was designed around a single consensus-based gold-standard test, which was deemed best from the feasibility point of view. The study also gathered a wealth of associated socio-clinical and demographic data on the recruited subjects (Weissman and others, 2011a,b). Our objective is to show that if some of those additionally gathered covariates are used to create a screening test, then using our developed methods one can obtain accurate estimates of prevalence and incidence. This can result in significant cost savings since, in a two-phase design, the time- and money-consuming gold-standard test needs to be carried out only for a fraction of all recruited subjects. Since the accuracy of the screening test determines the success of the two-phase design, we have used two different methods of screening-test construction. We used informant scores, demographic traits (age, gender, marital status, education, poverty status, race, and smoking status), mobility, MMSE, ADL, IADL, BMI, etc. to construct the screening test. Two clustering mechanisms, (i) model-based clustering and (ii) hierarchical clustering, are chosen as the screening tests. The data are used to obtain estimates at three separate time points: baseline, 3-month follow-up, and 1-year follow-up. The design of the two-phase sampling scheme is:

  1. The screening test is conducted on the entire available sample at each stage to separate the subjects into two groups: (i) depressed (screened positive) and (ii) non-depressed (screened negative).

  2. An "Ethical" sampling plan is followed, i.e. all those screened positive by the screening test are included in the second phase for the gold-standard test.

  3. A simple random sample of screened negative individuals receives the gold-standard test in the second phase. We consider three different fractions, namely 5%, 10%, and 20%, to study the accuracy of our estimation. Increasing the proportion of screened negative individuals pushes the cost up but reduces the variability of the estimates.

After the phase-two testing in each time period, the predicted prevalence and the predicted incidence rate are calculated via equations (2.1) and (3.2), respectively, along with their standard deviations. The goal of this two-phase sampling scheme is to compare the predicted prevalence to the observed truth, in order to determine the precision of the proposed estimates. Moreover, since the screening test is fairly cheap, being based on easily obtained additional information, and the gold-standard test needs to be administered to only a fraction of the total subjects, the effective cost of the entire study could be significantly reduced. When the original study was carried out, the longitudinal two-phase design was not popular, and to the best of our knowledge this article is the first endeavor of its kind from the statistical methodology point of view. Hence, we use the Home Health Care study for benchmarking purposes only and do not criticize the original design retrospectively. We hope that our methodological development will create synergy to consider the two-phase design as an attractive alternative even in longitudinal follow-up studies where the goal is true case detection over time. It is to be noted that the original Home Health Care study did not report any incidence rate; we estimate it from the available data at each wave. The following sections elaborate on the screening tests we constructed and their performance at each wave.

6.1. Model-based clustering

Note that the distributions of the variables considered for constructing the screening test are not homogeneous: some variables are continuous, some discrete valued, and the rest are nominal. This is a major violation of the assumptions of mixture-model based clustering. To alleviate this issue, principal component analysis (PCA) is performed first on the screening test variables to capture the maximum possible variation in the data. The numbers of principal components chosen for the clustering are 10, 9, and 9, respectively, for the three waves. Elbow plots of the PCs are available in the supplementary material available at Biostatistics online. Model-based clustering (Fraley and Raftery, 2002) is implemented on the derived principal components at each time point to classify the entire available sample at each wave into depressed and non-depressed groups. In order to check the accuracy of the proposed screening test, sensitivity and specificity are computed after the screening test is conducted. Table 3 demonstrates the performance of the model-based clustering at each wave. Note that since we are dealing with a low prevalence disease (e.g. depression), following the suggestion of Gordis (2009) more emphasis was given to sensitivity (see Section 4.2). Also, for a low prevalence population the screening test often produces a high number of false positives, thus yielding relatively low specificity.
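A sketch of this screening-test construction is given below, with sklearn's PCA and GaussianMixture standing in for the tools actually used; the standardization step and the smaller-cluster-as-depressed labeling rule are assumptions added for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

def screen_by_clustering(X_raw, n_pc=10, seed=0):
    """PCA on numerically coded screening variables, then a two-component
    mixture model on the leading components. Returns screen labels:
    1 = depressed cluster, 2 = non-depressed cluster."""
    Z = StandardScaler().fit_transform(X_raw)      # standardize (assumption)
    pcs = PCA(n_components=n_pc).fit_transform(Z)
    gm = GaussianMixture(n_components=2, random_state=seed).fit(pcs)
    lab = gm.predict(pcs)
    # Low-prevalence assumption: the smaller cluster is labeled "depressed".
    depressed = np.argmin(np.bincount(lab))
    return np.where(lab == depressed, 1, 2)
```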

Table 3. Sensitivity and specificity analysis of different clustering methods as screening test

               Model-based clustering       Hierarchical clustering
Time of study  Sensitivity  Specificity    Sensitivity  Specificity
Baseline       0.568        0.291          0.745        0.171
3 Month        0.885        0.137          0.529        0.290
12 Month       0.821        0.223          0.682        0.315

Following the sampling scheme mentioned above, we estimate the prevalence and incidence rate for each wave and for each fraction (5%, 10%, and 20%) of screened negative individuals included in the gold-standard test. The subsampling of screened negative individuals is repeated $500$ times to generate the respective mean prevalence and its dispersion measure. The corresponding incidence rates and their standard deviations, for each sampling scheme and wave, are reported in Table 4. The second and third columns of Table 4 are the cohort size and the true prevalence observed for each wave. The fourth column shows the proportion of screened negative individuals included in the second-phase test, and the final sample size is in the fifth column. Columns six, seven, eight, and nine exhibit the estimated prevalence, the estimated standard deviation of the prevalence, the estimated incidence, and the estimated standard deviation of the incidence, respectively.
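The repeated subsampling can be summarized in a short Python sketch (inputs hypothetical; the gold-standard status $D$ is known for every subject here only because the benchmark study verified everyone):

```python
import numpy as np

def replicate_prevalence(Y, D, frac, reps=500, seed=0):
    """Mean and SD of the prevalence estimate (2.1) over repeated
    second-phase subsampling of screened negatives.

    Y : phase-one screen labels (1 = positive, 2 = negative)
    D : gold-standard status (1 = depressed, 2 = not)
    """
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(Y == 1)
    neg = np.flatnonzero(Y == 2)
    pi1 = len(pos) / len(Y)
    lam1 = np.mean(D[pos] == 1)            # all screened positives verified
    est = []
    for _ in range(reps):
        samp = rng.choice(neg, size=max(1, int(frac * len(neg))), replace=False)
        est.append(pi1 * lam1 + (1 - pi1) * np.mean(D[samp] == 1))
    return float(np.mean(est)), float(np.std(est, ddof=1))
```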

Table 4. Detailed analysis for different clustering methods as screening test

Model-based clustering as screening test

Time      Cohort size  True $p_t$  Proportion  Sample size  $\widehat{p}_t$  SD($\widehat{p}_t$)  $\widehat{\theta}_t$  SD($\widehat{\theta}_t$)
Baseline  539          15.95%      5%          184          16.06%   5.73%   NA       NA
                                   10%         202          16.05%   4.24%   NA       NA
                                   20%         240          15.98%   2.64%   NA       NA
3 Month   401          10.22%      5%          234          10.41%   4.45%   12.30%   8.09%
                                   10%         243          10.11%   2.97%   12.02%   5.76%
                                   20%         260          10.24%   1.96%   12.03%   3.65%
12 Month  293          15.69%      5%          132          15.90%   6.98%   17.90%   8.83%
                                   10%         141          15.55%   4.25%   17.59%   5.49%
                                   20%         158          15.75%   2.95%   17.65%   3.76%

Hierarchical clustering as screening test

Baseline  539          15.95%      5%          299          15.77%   5.13%   NA       NA
                                   10%         311          15.64%   3.41%   NA       NA
                                   20%         337          15.52%   2.23%   NA       NA
3 Month   401          10.22%      5%          283          8.45%    6.99%   10.33%   9.99%
                                   10%         289          8.44%    4.76%   10.26%   6.75%
                                   20%         302          8.62%    3.22%   10.56%   4.51%
12 Month  293          15.69%      5%          99           16.59%   3.35%   17.97%   7.37%
                                   10%         109          16.48%   2.41%   18.27%   5.07%
                                   20%         130          16.22%   1.50%   18.10%   3.37%
Model-based clustering as screening test
TimeCohort SizeTrue |$p_t$|ProportionSample Size|$\widehat{p_t}$|SD(⁠|$\widehat{p_t}$|⁠)|$\widehat{\theta_t}$|SD(⁠|$\widehat{\theta_t}$|⁠)
Baseline53915.95|$\%$|5|$\%$|18416.06|$\%$|5.73|$\%$|NANA
10|$\%$|20216.05|$\%$|4.24|$\%$|NANA
20|$\%$|24015.98|$\%$|2.64|$\%$|NANA
3 Month40110.22|$\%$|5|$\%$|23410.41|$\%$|4.45|$\%$|12.30|$\%$|8.09|$\%$|
10|$\%$|24310.11|$\%$|2.97|$\%$|12.02|$\%$|5.76|$\%$|
20|$\%$|26010.24|$\%$|1.96|$\%$|12.03|$\%$|3.65|$\%$|
12 Month29315.69|$\%$|5|$\%$|13215.90|$\%$|6.98|$\%$|17.90|$\%$|8.83|$\%$|
10|$\%$|14115.55|$\%$|4.25|$\%$|17.59|$\%$|5.49|$\%$|
20|$\%$|15815.75|$\%$|2.95|$\%$|17.65|$\%$|3.76|$\%$|
Hierarchical clustering as screening test
Baseline53915.95|$\%$|5|$\%$|29915.77|$\%$|5.13|$\%$|NANA
10|$\%$|31115.64|$\%$|3.41|$\%$|NANA
20|$\%$|33715.52|$\%$|2.23|$\%$|NANA
3 Month40110.22|$\%$|5|$\%$|2838.45|$\%$|6.99|$\%$|10.33|$\%$|9.99|$\%$|
10|$\%$|2898.44|$\%$|4.76|$\%$|10.26|$\%$|6.75|$\%$|
20|$\%$|3028.62|$\%$|3.22|$\%$|10.56|$\%$|4.51|$\%$|
12 Month29315.69|$\%$|5|$\%$|9916.59|$\%$|3.35|$\%$|17.97|$\%$|7.37|$\%$|
10|$\%$|10916.48|$\%$|2.41|$\%$|18.27|$\%$|5.07|$\%$|
20|$\%$|13016.22|$\%$|1.50|$\%$|18.10|$\%$|3.37|$\%$|
Table 4.

Detailed analysis for different clustering methods as screening test

Model-based clustering as screening test
TimeCohort SizeTrue |$p_t$|ProportionSample Size|$\widehat{p_t}$|SD(⁠|$\widehat{p_t}$|⁠)|$\widehat{\theta_t}$|SD(⁠|$\widehat{\theta_t}$|⁠)
Baseline53915.95|$\%$|5|$\%$|18416.06|$\%$|5.73|$\%$|NANA
10|$\%$|20216.05|$\%$|4.24|$\%$|NANA
20|$\%$|24015.98|$\%$|2.64|$\%$|NANA
3 Month40110.22|$\%$|5|$\%$|23410.41|$\%$|4.45|$\%$|12.30|$\%$|8.09|$\%$|
10|$\%$|24310.11|$\%$|2.97|$\%$|12.02|$\%$|5.76|$\%$|
20|$\%$|26010.24|$\%$|1.96|$\%$|12.03|$\%$|3.65|$\%$|
12 Month29315.69|$\%$|5|$\%$|13215.90|$\%$|6.98|$\%$|17.90|$\%$|8.83|$\%$|
10|$\%$|14115.55|$\%$|4.25|$\%$|17.59|$\%$|5.49|$\%$|
20|$\%$|15815.75|$\%$|2.95|$\%$|17.65|$\%$|3.76|$\%$|
Hierarchical clustering as screening test
Baseline53915.95|$\%$|5|$\%$|29915.77|$\%$|5.13|$\%$|NANA
10|$\%$|31115.64|$\%$|3.41|$\%$|NANA
20|$\%$|33715.52|$\%$|2.23|$\%$|NANA
3 Month40110.22|$\%$|5|$\%$|2838.45|$\%$|6.99|$\%$|10.33|$\%$|9.99|$\%$|
10|$\%$|2898.44|$\%$|4.76|$\%$|10.26|$\%$|6.75|$\%$|
20|$\%$|3028.62|$\%$|3.22|$\%$|10.56|$\%$|4.51|$\%$|
12 Month29315.69|$\%$|5|$\%$|9916.59|$\%$|3.35|$\%$|17.97|$\%$|7.37|$\%$|
10|$\%$|10916.48|$\%$|2.41|$\%$|18.27|$\%$|5.07|$\%$|
20|$\%$|13016.22|$\%$|1.50|$\%$|18.10|$\%$|3.37|$\%$|
Model-based clustering as screening test
TimeCohort SizeTrue |$p_t$|ProportionSample Size|$\widehat{p_t}$|SD(⁠|$\widehat{p_t}$|⁠)|$\widehat{\theta_t}$|SD(⁠|$\widehat{\theta_t}$|⁠)
Baseline53915.95|$\%$|5|$\%$|18416.06|$\%$|5.73|$\%$|NANA
10|$\%$|20216.05|$\%$|4.24|$\%$|NANA
20|$\%$|24015.98|$\%$|2.64|$\%$|NANA
3 Month40110.22|$\%$|5|$\%$|23410.41|$\%$|4.45|$\%$|12.30|$\%$|8.09|$\%$|
10|$\%$|24310.11|$\%$|2.97|$\%$|12.02|$\%$|5.76|$\%$|
20|$\%$|26010.24|$\%$|1.96|$\%$|12.03|$\%$|3.65|$\%$|
12 Month29315.69|$\%$|5|$\%$|13215.90|$\%$|6.98|$\%$|17.90|$\%$|8.83|$\%$|
10|$\%$|14115.55|$\%$|4.25|$\%$|17.59|$\%$|5.49|$\%$|
20|$\%$|15815.75|$\%$|2.95|$\%$|17.65|$\%$|3.76|$\%$|
Hierarchical clustering as screening test
Baseline53915.95|$\%$|5|$\%$|29915.77|$\%$|5.13|$\%$|NANA
10|$\%$|31115.64|$\%$|3.41|$\%$|NANA
20|$\%$|33715.52|$\%$|2.23|$\%$|NANA
3 Month40110.22|$\%$|5|$\%$|2838.45|$\%$|6.99|$\%$|10.33|$\%$|9.99|$\%$|
10|$\%$|2898.44|$\%$|4.76|$\%$|10.26|$\%$|6.75|$\%$|
20|$\%$|3028.62|$\%$|3.22|$\%$|10.56|$\%$|4.51|$\%$|
12 Month29315.69|$\%$|5|$\%$|9916.59|$\%$|3.35|$\%$|17.97|$\%$|7.37|$\%$|
10|$\%$|10916.48|$\%$|2.41|$\%$|18.27|$\%$|5.07|$\%$|
20|$\%$|13016.22|$\%$|1.50|$\%$|18.10|$\%$|3.37|$\%$|

6.2. Hierarchical clustering

We also considered a hierarchical clustering mechanism (Ward, 1963; Murtagh and Legendre, 2014) as the screening test. The test subjects are partitioned into depressed and non-depressed groups by applying the clustering directly to the screening test variables. An advantage of this approach over model-based clustering is that the screening variables need not be of any specific distributional type. The screening test clustering results are provided in the supplementary material available at Biostatistics online, and the sensitivity and specificity of the screening test are given in Table 3.
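In the same hypothetical setting as the sketch of Section 6.1 (screening variables in a numeric-coded matrix X), this screen could be implemented in R as below; the "ward.D2" option is the implementation of Ward's criterion discussed by Murtagh and Legendre (2014), and a Gower dissimilarity (e.g., cluster::daisy) could replace the Euclidean distance when the variables are left in mixed form.

    ## Hierarchical (Ward) clustering on the screening variables (X hypothetical).
    d   <- dist(scale(X))                  # Euclidean distance on scaled variables
    hc  <- hclust(d, method = "ward.D2")   # Ward's criterion (Murtagh and Legendre, 2014)
    lab <- cutree(hc, k = 2)               # two screening groups: depressed / not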

Following the sampling scheme of Section 6.1, we drew fractions of 5%, 10%, and 20% from the phase-one screened non-depressed group and performed the gold-standard test on them, along with all subjects in the phase-one screened depressed group. The relatively large sample size at the first wave results from the high proportion of subjects screened as depressed in phase one. The results with hierarchical clustering as the screening test are displayed in Table 4, which gives the detailed analysis broken down by wave. The estimated prevalence $\widehat{p_t}$ at each wave is close to the actual $p_t$, with estimation variability decreasing as the phase-two sampling fraction increases. The estimated incidence rate is also presented for wave II (3-month follow-up) and wave III (12-month follow-up), with a similar trend in variability as the prevalence.

6.3. Discussion on screening test performance

To summarize, the first-phase screening test is taken to be a clustering (model-based or hierarchical) of the screening test variables. It should be noted that the PCA-based clustering lacks a meaningful interpretation, since information on the original variables is lost in constructing the principal components. To retain this information, hierarchical clustering can be considered a viable alternative. However, if the objective is not to interpret the screening test but to use it as a black box for classification, the PCA-based approach serves that purpose well, as is evident from its performance. Both screening tests significantly reduce the total number of gold-standard tests compared with the original study, while the estimated prevalence remains quite close to the observed truth. Table 4 shows that the prevalence estimate is more robust under model-based clustering than under hierarchical clustering. As mentioned earlier, the original study measured only the prevalence rate at each wave, so no incidence rate was reported. We also notice an increase in the estimated incidence rate from the 3-month to the 12-month screening. A possible explanation for this increase is that the chance of developing major depressive disorder rises rapidly over time among homebound geriatric individuals. Nevertheless, we have demonstrated that the proposed methodology can yield significant cost savings, since the gold-standard test is performed only on a smaller group of individuals from the entire cohort and no extra cost is incurred for the screening test. This comes without much compromise in estimation precision, while testing $<45\%$ of the total sample at each time point.

 
Remark:

As mentioned before, in the original study (Bruce and others, 2002; Weinberger and others, 2009) only the gold-standard test was carried out, as the study was not intended to be a two-phase design. As a result, no screening test was constructed and no cost comparison was made. Ideally, a prospectively designed two-phase study should first construct a screening test via a pilot study or from historical data, and should justify the parameters of the constructed screening test via cost-effectiveness and efficiency analyses. In this article, we have constructed retrospectively defined screening tests based on available auxiliary information to show considerable savings in sample size, which should potentially lead to lower cost. However, to compare the efficiency of two-phase sampling with its single-phase counterpart, information about the cost of each screening test is also required, along with that of the gold-standard test. Thus we cannot measure the efficiency of the two-phase mechanism as described in Section 4.

7. Discussion

This research is motivated by real-life studies and intends to address the estimation issues in two-phase longitudinal study designs. Although the simulation studies closely follow the "Ethical" sampling design, the developed methodology is applicable to any general two-phase design scheme. From all the explored cases we can summarize two significant findings. First, the sensitivity and specificity of the first-phase fallible test play a crucial role in determining the efficiency of the estimate; this adds to the comments made by McName (2003) from the cost-consideration context. Second, although the incidence and prevalence rates are closely related, the prevalence estimate shows remarkable robustness compared with the incidence estimate at any time point. This is somewhat surprising, as we expected the trends to be roughly parallel. Specifically, if the sensitivity is fairly close to unity, then the prevalence and incidence estimates coincide under the "Ethical" sampling scheme, and in that case the incidence estimate does inherit some degree of robustness. We would also like to point out that longitudinal estimation of prevalence and incidence has medical significance: a monotonic trend may well indicate the general health pattern of the community and whether any intervention is effective over time. As future work, we plan to extend our approach to the regression-estimation context. Another direction is to consider a more complicated sampling plan that can accommodate the inclusion of new subjects over time, and especially the estimation issues arising with missing data; both situations are quite common in practice. Yet another exciting direction is the design of efficient sampling plans under a fixed cost in the longitudinal setup. Nevertheless, we hope that the present article sheds some light on the estimation issues of two-phase sampling design from the longitudinal perspective.

Acknowledgements

The last author would like to thank Jianzhao Shen for proposing the problem related to motivating example 1. We also thank Dr P. E. Shrout for his comments on a previous version of the paper. Conflict of Interest: None declared.

Funding

Research of the last author is partly supported by PCORI contract ME-1409-21410 and NIH grant P30-ES020957.

References

Beckett, L. A., Scherr, P. A. and Evans, D. A. (1992). Population prevalence estimates from complex samples. Journal of Clinical Epidemiology 45, 393–402.

Bruce, M. L., McAvay, G. J., Raue, P. J., Brown, E. L., Meyers, B. S., Keohane, D. J., Jagoda, D. R. and Weber, C. (2002). Major depression in elderly home health care patients. American Journal of Psychiatry 159, 1367–1374.

Callahan, C. M., Hall, K. S., Hui, S. L., Musick, B. S., Unverzagt, F. W. and Hendrie, H. C. (1996). Relationship of age, education, and occupation with dementia among a community-based sample of African Americans. Archives of Neurology 53, 134–140.

Clayton, D., Spiegelhalter, D., Dunn, G. and Pickels, A. (1998). Analysis of longitudinal binary data from multiphase sampling. Journal of the Royal Statistical Society, Series B 60, 71–87.

Cochran, W. G. (1977). Sampling Techniques, 3rd edition. New York: Wiley.

Deming, W. (1977). An essay on screening, or two-phase sampling applied to surveys of a community. International Statistical Review 45, 29–37.

Dunn, G., Pickels, A., Tansella, M. and Vazquez-Barquero, J. (1999). Two-phase epidemiological surveys in psychiatric research. British Journal of Psychiatry 174, 359–363.

Fraley, C. and Raftery, A. (2002). Model-based clustering, discriminant analysis and density estimation. Journal of the American Statistical Association 97, 611–631.

Fraley, C. and Raftery, A. (2006). MCLUST version 4 for R: normal mixture modeling and model-based clustering. Technical Report tr504. University of Washington.

Gao, S., Hui, S. L., Hall, K. S. and Hendrie, H. C. (2000). Estimating disease prevalence from two-phase surveys with non-response at the second phase. Statistics in Medicine 19, 2101–2114.

Goodman, L. A. (1960). On the exact variance of products. Journal of the American Statistical Association 55, 708–713.

Goodman, L. A. (1962). The variance of the product of K random variables. Journal of the American Statistical Association 57, 54–60.

Gordis, L. (2009). Epidemiology. Philadelphia, PA: Saunders Elsevier.

Hall, K. S., Gao, S., Emsley, C. L., Ogunniyi, A., Morgan, O. and Hendrie, H. C. (1999). Community screening interview for dementia (CSI'D'); performance in five disparate study sites. International Journal of Geriatric Psychiatry 15, 521–531.

Hendrie, H. C., Ogunniyi, A. O., Hall, K. S., Baiyewu, O., Unverzagt, F. W., Gureje, O., Gao, S., Evans, R. M., Ogunseyinde, A. O., Adeyinka, A. O., Musick, B. and Hui, S. L. (2001). Incidence of dementia and Alzheimer disease in 2 communities. Journal of the American Medical Association 285, 739–747.

Hendrie, H. C., Osuntokun, B. O., Hall, K. S., Ogunniyi, A. O. and others (1995). Prevalence of Alzheimer's disease and dementia in two communities: Nigerian Africans and African Americans. American Journal of Psychiatry 152, 1485–1492.

McName, R. (2003). Efficiency of two-phase designs for prevalence estimation. International Journal of Epidemiology 32, 1072–1078.

McName, R. (2004). Two-phase sampling for simultaneous prevalence estimation and case detection. Biometrics 60, 783–792.

Murtagh, F. and Legendre, P. (2014). Ward's hierarchical agglomerative clustering method: which algorithms implement Ward's criterion? Journal of Classification 31, 274–295.

Neyman, J. (1938). Contribution to the theory of sampling human populations. Journal of the American Statistical Association 33, 101–116.

Pickels, A., Dunn, G. and Vazquez-Barquero, J. (1995). Screening for stratification in two-phase ("two-stage") epidemiological surveys. Statistical Methods in Medical Research 4, 73–89.

Shen, J., Gao, S., Unverzagt, F. W., Ogunniyi, A., Baiyewu, O., Gureje, O., Hendrie, H. C. and Hall, K. S. (2006). Validation analysis of informant's ratings of cognitive function in African Americans and Nigerians. International Journal of Geriatric Psychiatry 21, 618–625.

Shrout, P. E. and Newman, S. C. (1989). Design of two-phase prevalence surveys of rare disorders. Biometrics 45, 549–555.

Steffens, D. C., Fisher, G. G., Langa, K. M., Potter, G. G. and Plassman, B. L. (2009). Prevalence of depression among older Americans: the Aging, Demographics and Memory Study. International Psychogeriatrics 21, 879–888.

Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association 58, 236–244.

Weinberger, M. I., Raue, P. J., Meyers, B. S. and Bruce, M. L. (2009). Predictors of new onset depression in medically ill, disabled older adults at 1 year follow-up. The American Journal of Geriatric Psychiatry 17, 802–809.

Weissman, J., Meyers, B. S., Ghosh, S. and Bruce, M. L. (2011). Demographic, clinical, and functional factors associated with antidepressant use in the home healthcare elderly. The American Journal of Geriatric Psychiatry 19, 1042–1045.

Weissman, J., Meyers, B. S., Ghosh, S. and Bruce, M. L. (2011). Sociodemographic and clinical factors associated with antidepressant type in a national sample of the home health care elderly. General Hospital Psychiatry 33, 587–593.
