Semiparametric Additive Time-Varying Coefficients Model for Longitudinal Data with Censored Time Origin

Sun, Yanqing; Shou, Qiong; Gilbert, Peter B.; Heng, Fei; Qian, Xiyuan

doi:10.1111/biom.13610

Abstract

Statistical analysis of longitudinal data often involves modeling treatment effects on clinically relevant longitudinal biomarkers since an initial event (the time origin). In some studies, including preventive HIV vaccine efficacy trials, some participants have biomarkers measured starting at the time origin, whereas others have biomarkers measured starting later with the time origin unknown. We investigate the semiparametric additive time-varying coefficient model, in which the effects of some covariates vary nonparametrically with time while the effects of others remain constant. Weighted profile least squares estimators coupled with kernel smoothing are developed. The method uses the expectation maximization (EM) approach to deal with the censored time origin. The Kaplan–Meier estimator and failure time regression models such as the Cox model can be used to estimate the distribution and the conditional distribution of the left censored event time related to the censored time origin. Asymptotic properties of the parametric and nonparametric estimators and consistent asymptotic variance estimators are derived. A two-stage estimation procedure for choosing the weight is proposed to improve estimation efficiency. Numerical simulations are conducted to examine finite sample properties of the proposed estimators. The simulation results show that the theory and methods work well. The efficiency gain of the two-stage estimation procedure depends on the distribution of the longitudinal error processes. The method is applied to analyze data from the Merck 023/HVTN 502 Step HIV vaccine study.

censored time origin, kernel smoothing, longitudinal data, random sampling times, Step vaccine trial, weight selection

1 Introduction

In preventive HIV vaccine efficacy trials, thousands of HIV-uninfected volunteers are randomized to receive vaccine or placebo and are monitored for HIV infection. Participants diagnosed with HIV infection have various endpoints measured longitudinally starting at the date of diagnosis; these endpoints include viral load and CD4 cell count as markers of HIV disease progression and secondary transmission. An objective of such trials is to assess the vaccine effect on these biomarkers, and all previous analyses assessed the biomarkers based on the time from HIV diagnosis (Fitzgerald et al., 2011; Rerks-Ngarm et al., 2013; Janes et al., 2015). However, it is more biologically meaningful to assess whether vaccination modifies the biomarkers over time since actual HIV acquisition. This assessment is challenging because exact times of HIV acquisition are generally unobtainable; rather, data are available only on bounds and between which the true time origin must lie , where, as shown in Figure 1, for example, is the date of the last antibody (Ab)-based HIV negative diagnostic test result and is the date of the first Ab-based HIV positive test result. Given details of the HIV testing algorithm described in the application, some participants have , at least approximately, such that is considered to be directly observed, whereas other participants have and is unknown. This setup also occurs in other multistage longitudinal studies, as depicted in Figure 1.

Figure 1. Censored time origin in the study of a longitudinal response. Based on the HIV testing algorithm, each infected participant was classified into one of two groups, defined by whether the earliest HIV positive sample was (Ab+, PCR+) or (Ab-, PCR+). (a) For participants with the earliest HIV positive sample (Ab+, PCR+), is left censored by ; (b) for participants with the earliest HIV positive sample (Ab-, PCR+), and is observed.

In Figure 1, for each participant ⁠, is the gap time between the true time origin and the time when longitudinal markers begin to be measured. The time lapse from and is ⁠. For participant i, the longitudinal markers are measured at times ⁠, where is the time between and the jth marker measurement, for ⁠. In the HIV vaccine study, most participants have first measurement time at ⁠; some participants have and others have ⁠. Formally, we write and such that is left censored by with censoring indicator ⁠; is observed if and if ⁠. The time from to the jth sampling time is ⁠. The time origin is considered censored because is left censored.

Semiparametric regression models for longitudinal data have been intensively studied; see Lin and Ying (2001), Hu et al. (2004), Qu and Li (2006), Fan et al. (2007) and Sun et al. (2013), among others. However, to the best of our knowledge, none of these methods addresses the problem in which the time origin may be censored. In this paper, we study the semiparametric additive model with time-varying effects for longitudinal data with a censored time origin. Weighted profile least squares estimators are developed for the unknown parameters as well as for the nonparametric coefficient functions. The expectation maximization approach is used to deal with the censored time origin. The proposed method does not assume any specific model for the sampling times and thus avoids misspecification of the sampling model. The method is applied to investigate the effect of an HIV vaccine on viral load over time since actual HIV acquisition in the Merck 023/HVTN 502 Step study. Asymptotic properties of the parametric and nonparametric estimators and consistent asymptotic variance estimators are derived. Our numerical study shows that the proposed methods work well, with satisfactory finite sample properties.

The rest of the paper is organized as follows. Section 2 introduces the semiparametric additive model and develops the estimation method. Preliminaries on the data structure and model assumptions are given in Section 2.1. The weighted profile least squares estimators coupled with kernel smoothing and the EM algorithm are developed in Section 2.2. Computational issues concerning estimation at the boundaries and weight selection are discussed in Section 2.3. In Section 3, we establish asymptotic properties of the nonparametric and parametric estimators, and in Section 4 we study their finite sample performance in simulations. The proposed method is applied to the Step trial in Section 5. Concluding remarks are given in Section 6. The proofs of the asymptotic results, additional simulations and data analysis, and a discussion of bandwidth choice are placed in Web Appendices available in the Supporting Information of this article.

2 Profile Weighted Least Squares Estimation through EM Algorithm

2.1 Preliminaries

Suppose that there is a random sample of n participants. For participant i, let be the response process and let and be possibly time-dependent covariates of dimensions and ⁠, respectively, where is the time since the actual time origin, and τ is the study duration. We consider the following semiparametric additive time-varying coefficients model

(1)

where is an unspecified vector of smooth regression functions, γ is a dimensional vector of parameters, and is a mean-zero process. The notation represents the transpose of a vector or matrix x. Specifying the first component of as 1 gives a model with a nonparametric baseline response process. The effect of is time-varying and modeled nonparametrically, while the effect of is time-invariant and modeled parametrically.

The observations of are taken at time points , where is the total number of observations from the ith subject. The can be written as the sum of two parts, as shown in Figure 1, where is the time from the actual time origin of the ith participant to left censoring by , and is the time from the right edge of the interval for to the jth visit for the ith subject, where visit 1 is at . Let be the end of follow-up time or censoring time for the ith participant since . The censoring time is allowed to depend on the covariates and . The responses for the ith participant can only be observed at time points before since the actual time origin.

The sampling times may vary among participants. The number of observations taken from the ith participant by time t is , where is the indicator function. Let be the conditional mean rate of the sampling times for participant i at time t, defined by , . The mean rate function is assumed to depend only on , which is the part of the covariates that affects the potential sampling times. We denote , where is an unspecified nonnegative smooth function.

Let ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, where is a collection of possible auxiliary variables that are not of interest in the modeling of but may be useful in predicting the distribution of ⁠. For the censored time origin, left-censored data and are observed. The observed data for participant i can be expressed as ⁠. The observation is if and if ⁠, where ⁠. Although exact times may be unobtainable, the values ⁠, and at are known. Assume that are independent identically distributed (iid). The observed data are denoted by ⁠.

We assume that and are independent conditional on ⁠, and that the censoring time is noninformative in that ⁠, and ⁠, ⁠. Let ⁠. Assume ⁠, ⁠, ⁠, ⁠, ⁠, ⁠. Assume also that and are independent conditional on ⁠, ⁠, and ⁠. This assumption implies that, conditional on covariate processes, sampling times are noninformative for the longitudinal response.

2.2 Estimation Procedures

When all 's are observed, estimation procedures such as those in Sun and Wu (2005) and Sun et al. (2013) can be used to fit model (1). If the unobserved or censored 's are treated as missing, then is not missing at random. The inverse probability weighted complete-case method and the augmented inverse probability weighted complete-case method of Robins et al. (1994), which have been successfully adapted by Sun and Gilbert (2012), Sun et al. (2017), Yang et al. (2017) and many other authors, do not work in this situation. We propose an estimation procedure based on the missing-data principle using the EM algorithm.

The conditional distribution of equals for and 1 for ⁠. Let be the estimated conditional distribution of ⁠. Let ⁠. The estimation of model (1) can be based on minimizing the following objective function:

(2)

where is a nonnegative weight function, and is the estimate of the conditional expectation ⁠, which can be obtained through estimation of as we show below.

For ease of presentation, we adopt the notation for ⁠, ⁠, ⁠, where is a smooth function of ⁠. The above objective function can be written as

(3)

Note that ⁠, and are observed. The conditional expectation for participant i with equals

(4)

where the last equality holds because for ⁠. Since ⁠, estimating by for ⁠, we have

(5)
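The E-step in (4) and (5) averages a left-censored participant's contribution over the estimated conditional distribution of the gap time. The sketch below illustrates that averaging under stated assumptions: the estimated gap-time distribution is discrete (for example, a Kaplan–Meier estimate) and is passed in as atoms and probability masses, and, for a participant left censored at c, the conditional distribution function is taken to be F(v)/F(c) for v < c and 1 otherwise. The names `conditional_expectation` and `loss`, and the linear mean in the usage lines, are hypothetical and are not the paper's notation.

```python
import numpy as np


def conditional_expectation(g, support, mass, c):
    """Approximate E[g(V) | V <= c] for a gap time V left censored at c.

    Assumes the estimated distribution of V is discrete, with atoms `support`
    and probabilities `mass`, so P(V <= v | V <= c) = F(v) / F(c) for v < c.
    """
    support = np.asarray(support, dtype=float)
    mass = np.asarray(mass, dtype=float)
    keep = support <= c                  # atoms compatible with V <= c
    total = mass[keep].sum()             # estimated F(c)
    if total <= 0:
        raise ValueError("estimated F(c) is zero; no mass at or below c")
    probs = mass[keep] / total           # conditional probabilities
    values = np.array([g(v) for v in support[keep]])
    return float(np.sum(probs * values))


# Usage (hypothetical): visit times since the observed start are s, so times
# since the true origin are v + s; the loss compares y with a linear mean.
s = np.array([0.0, 0.25, 0.5])
y = np.array([1.0, 1.4, 1.9])


def loss(v, slope=2.0):
    return float(np.sum((y - slope * (v + s)) ** 2))


support = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
mass = np.full(5, 0.2)
expected_loss = conditional_expectation(loss, support, mass, c=0.35)
```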

The basic idea of estimating the conditional distribution is to transform the data set from left censored to right censored. Assume that is bounded by a predetermined constant L. This is reasonable since, for the application considered here, is less than the time interval between two consecutive testing times, which is usually between 3 and 6 months. The distribution of based on the left censored data can be estimated by the methods developed for right censored data through the transformation . The Kaplan–Meier estimator can be used to estimate the distribution of when is independent of . Otherwise, a failure time regression model such as the Cox model can be used to estimate the conditional distribution .
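A minimal sketch of the left-to-right transformation, assuming the usual left-censoring convention in which one observes x = max(V, C) together with δ = 1 when V is observed (V ≥ C). Setting W = L − V turns the data into right-censored observations L − x = min(W, L − C) with the same indicator, so a Kaplan–Meier estimate for W yields F(v) = P(V ≤ v) = P(W ≥ L − v). The symbols V, C, W and the function names are assumptions made for illustration, and the Kaplan–Meier product is coded directly rather than taken from a survival library.

```python
import numpy as np


def km_left_limit(times, events, s):
    """Kaplan-Meier estimate of P(T >= s) = S(s-) for right-censored data."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    surv = 1.0
    for tk in np.unique(times[events]):      # distinct event times, ascending
        if tk >= s:
            break
        d = np.sum((times == tk) & events)   # events at tk
        r = np.sum(times >= tk)              # number at risk at tk
        surv *= 1.0 - d / r
    return surv


def left_censored_cdf(x, delta, v, L):
    """Estimate F(v) = P(V <= v) from left-censored data through W = L - V.

    Assumes x = max(V, C) and delta = 1 when V is observed (V >= C), so that
    L - x = min(W, L - C) is right censored with the same indicator.
    """
    w = L - np.asarray(x, dtype=float)
    if np.any(w < 0):
        raise ValueError("L must bound every observed value")
    # P(V <= v) = P(W >= L - v), the left limit of the survival function of W
    return km_left_limit(w, delta, L - v)


# Usage: gap times uniform on (0, 0.8) as in Section 4; C is hypothetical.
rng = np.random.default_rng(1)
V = rng.uniform(0.0, 0.8, 5000)
C = rng.uniform(0.0, 1.0, 5000)
x, delta = np.maximum(V, C), V >= C
F_hat = left_censored_cdf(x, delta, v=0.4, L=1.0)   # true F(0.4) = 0.5
```

The uniform gap times mirror the simulation design in Section 4, while the uniform censoring distribution is chosen only for illustration.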

Next, we present an estimation procedure that estimates the nonparametric component with the kernel smoothing and the parametric component γ with the profile weighted least squares method. Let be a symmetric kernel function with compact support. For fixed γ and at time t, we estimate by minimizing the following objective function with respect to β:

(6)

where and h is the bandwidth depending on n.

Taking the derivative of with respect to β for a fixed γ yields

(7)

which leads to the following estimating function

(8)

Let ⁠, and define and similarly by replacing with ⁠, and with ⁠, respectively. Solving for fixed γ and t yields ⁠, where and ⁠.

Replacing by in (3) and taking derivative with respect to γ, we obtain the profile estimating function for γ:

(9)

where is taken as a subinterval of [0, τ] to avoid boundary problems in the theoretical justifications. In practice, can be taken to be very close to [0, τ]. Setting the estimating function (9) to zero and solving explicitly yields the weighted profile estimator , where

(10)

The local profile estimator of is given by ⁠.
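To make the profiling step concrete, here is a minimal NumPy sketch for complete data, that is, assuming every time origin is observed so that no E-step is needed, with pooled observations (t, X, Z, Y) and unit weight by default. Because the local constant estimate of the coefficient functions is linear in γ, it can be written as a smooth of Y minus a smooth of Z times γ; substituting it back leaves an ordinary weighted least squares problem in the profiled residuals. This is one standard form of profile kernel least squares with an Epanechnikov kernel; the function names are hypothetical, and the paper's estimating function (9) and estimator (10), written over a subinterval, may differ in detail.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1]."""
    return 0.75 * np.clip(1.0 - u ** 2, 0.0, None)


def kernel_sums(t0, t, X, Z, Y, h, w):
    """Weighted kernel sums A = sum K X X', C = sum K X Z', b = sum K X Y at t0."""
    k = w * epanechnikov((t - t0) / h) / h
    KX = X * k[:, None]
    return KX.T @ X, KX.T @ Z, KX.T @ Y


def beta_hat(t0, gamma, t, X, Z, Y, h, w):
    """Local constant estimate of the coefficient functions at t0, gamma fixed."""
    A, C, b = kernel_sums(t0, t, X, Z, Y, h, w)
    return np.linalg.solve(A, b - C @ gamma)


def profile_gamma(t, X, Z, Y, h, w=None, interval=(0.0, 1.0)):
    """Profile least squares estimate of gamma from pooled observations."""
    w = np.ones(len(t)) if w is None else np.asarray(w, dtype=float)
    Y_res = np.empty(len(t))
    Z_res = np.empty(Z.shape)
    for m, t0 in enumerate(t):
        A, C, b = kernel_sums(t0, t, X, Z, Y, h, w)
        # beta_hat(t0; gamma) = A^{-1} b - A^{-1} C gamma is linear in gamma
        Y_res[m] = Y[m] - X[m] @ np.linalg.solve(A, b)
        Z_res[m] = Z[m] - X[m] @ np.linalg.solve(A, C)
    # outer sum restricted to a subinterval to avoid boundary effects
    inside = w * ((t >= interval[0]) & (t <= interval[1]))
    lhs = (Z_res * inside[:, None]).T @ Z_res
    rhs = (Z_res * inside[:, None]).T @ Y_res
    return np.linalg.solve(lhs, rhs)


# Usage with simulated complete data (hypothetical design):
rng = np.random.default_rng(0)
n_obs = 2000
t = rng.uniform(0.0, 1.0, n_obs)
x1 = rng.uniform(0.0, 1.0, n_obs)
X = np.column_stack([np.ones(n_obs), x1])          # intercept: baseline curve
Z = rng.binomial(1, 0.5, n_obs).astype(float)[:, None]
Y = np.sin(2 * np.pi * t) + t * x1 + 1.5 * Z[:, 0] + rng.normal(0.0, 0.3, n_obs)
gamma_hat = profile_gamma(t, X, Z, Y, h=0.1, interval=(0.1, 0.9))
beta_mid = beta_hat(0.5, gamma_hat, t, X, Z, Y, h=0.1, w=np.ones(n_obs))
```

In this design the true values are γ = 1.5 and coefficient functions sin(2πt) and t, so the estimates at t = 0.5 should be near 0 and 0.5.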

2.3 Computational Issues and the Weight Selection

Our estimation procedure uses local constant kernel smoothing for estimating . It is known that the local linear estimation technique, cf. Fan and Gijbels (1996), can improve the performance of estimation at boundaries. For estimation at interior points, the local linear and local constant estimators are equivalent, with the same asymptotic distributions. As shown in Fan and Gijbels (1996), the boundary effects of the local constant estimator can be reduced by applying the equivalent kernel of the local linear approach.

Following Fan and Gijbels (1996, Sections 2.3.1 and 3.2.2), to reduce the estimation bias for at boundary points, for example, , we replace by the equivalent kernel of the local linear fit, modified for the time-varying coefficient model for longitudinal data and defined by , where . Up to a normalizing constant, the equivalent kernel is a kernel satisfying the finite sample moment condition . This property reduces bias at boundary points in the same way that a symmetric kernel does at interior points. Because standard kernel smoothing is simpler and faster, we suggest estimating using the equivalent kernel at boundary time points and the standard kernel at interior time points. Our simulations show that this adjustment works well.
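The boundary adjustment can be illustrated with the textbook finite-sample equivalent kernel of a local linear fit at a single time point (Fan and Gijbels, 1996), computed from pooled time points with an Epanechnikov kernel. This is a simplified stand-in: the paper's version is modified for the longitudinal varying-coefficient model, and its exact expression is not reproduced here. The sketch checks the two moment conditions behind the bias reduction (the weights sum to 1 and annihilate a linear trend) and compares the result with the one-sided standard kernel at t = 0. Function names are hypothetical.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1]."""
    return 0.75 * np.clip(1.0 - u ** 2, 0.0, None)


def equivalent_kernel_weights(t0, t, h):
    """Finite-sample equivalent kernel weights of a local linear fit at t0.

    w_j = K_h(t_j - t0) (S2 - S1 u_j) / (S0 S2 - S1^2), u_j = (t_j - t0) / h,
    S_l = sum_j K_h(t_j - t0) u_j^l. By construction sum_j w_j = 1 and
    sum_j w_j (t_j - t0) = 0, which removes the first-order boundary bias.
    """
    t = np.asarray(t, dtype=float)
    u = (t - t0) / h
    k = epanechnikov(u) / h
    s0, s1, s2 = (np.sum(k * u ** l) for l in range(3))
    return k * (s2 - s1 * u) / (s0 * s2 - s1 ** 2)


# Usage at the boundary point t0 = 0 with a linear trend f(t) = 1 + 3t:
rng = np.random.default_rng(2)
t = rng.uniform(0.0, 1.0, 500)
f = 1.0 + 3.0 * t
w_eq = equivalent_kernel_weights(0.0, t, h=0.2)
k_std = epanechnikov(t / 0.2)
k_std /= k_std.sum()                 # one-sided standard kernel weights
est_eq = w_eq @ f                    # recovers f(0) = 1 up to rounding
est_std = k_std @ f                  # pulled upward by the one-sided window
```

Some equivalent-kernel weights can be negative near the boundary; that is expected and is what lets the fit extrapolate the local trend.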

We proposed a weighted profile least squares estimation method coupled with the EM approach for the semiparametric additive time-varying coefficient model for longitudinal data whose time origin is a possibly censored event. The proposed estimators are consistent and asymptotically normal as long as the weight process converges in probability to a deterministic function . The weight can be selected to improve estimation efficiency, although it is often conveniently taken to be 1. Lin and Carroll (2000) showed that the most efficient estimation of the nonparametric component can be achieved by ignoring the within-subject correlation. However, more efficient estimation of the parametric component γ is obtained by using the inverse of the true covariance matrix of the longitudinal responses (Lin and Carroll, 2001; Wang et al., 2005). In the simplified situation where the error processes are uncorrelated at different times, the optimal weight is inversely proportional to the conditional variance of the error process (Bickel et al., 1993; Sun et al., 2013; Qi et al., 2017).

We investigated a two-stage estimation procedure for choosing the weight within the framework of the marginal approach that ignores the within-subject correlation. In the first stage, the unit weight function is used to obtain and . Suppose that does not depend on ; then can be consistently estimated by , where is the residual from the first stage estimation. In the second stage, the updated estimators and are obtained by choosing the weight . When depends on , the optimal weight can be estimated using a multivariate kernel estimator (Qi et al., 2017, Web Appendix C). A simulation study is conducted in Section 4.2 to investigate the efficiency of the two-stage estimation procedure.
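For the case in which the error variance depends on time only, one natural way to carry out the two stages, assumed here rather than taken from the paper, is to kernel-smooth the squared first-stage residuals over time and use the reciprocal as the second-stage weight. The function names and the heteroscedastic residuals in the usage lines are hypothetical.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1]."""
    return 0.75 * np.clip(1.0 - u ** 2, 0.0, None)


def smoothed_variance(t0, t, resid, h):
    """Nadaraya-Watson estimate of the error variance at t0 from squared residuals."""
    k = epanechnikov((np.asarray(t) - t0) / h)
    return float(np.sum(k * resid ** 2) / np.sum(k))


def second_stage_weights(t, resid, h):
    """Second-stage weights 1 / sigma_hat^2(t) evaluated at every observation time."""
    t = np.asarray(t, dtype=float)
    sigma2 = np.array([smoothed_variance(t0, t, resid, h) for t0 in t])
    return 1.0 / sigma2


# Usage: pooled first-stage residuals whose spread grows with time (hypothetical).
rng = np.random.default_rng(3)
t = rng.uniform(0.0, 1.0, 5000)
resid = rng.normal(0.0, 0.5 + t)
sigma2_mid = smoothed_variance(0.5, t, resid, h=0.1)   # true variance at 0.5 is 1
w2 = second_stage_weights(t, resid, h=0.1)
```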

3 Asymptotic Properties

In this section, we present the asymptotic properties of the proposed estimators. Define ⁠, ⁠, and ⁠, where ⁠. Let and ⁠. Let γ₀ and be the true values of γ and under model (1), respectively. In addition to the conditional independence assumptions and noninformative censoring assumptions stated in Section 2.1, more regularity conditions are given in Condition A in Web Appendix A.

Ying (1989) showed that the consistency and weak convergence of the Kaplan–Meier estimator for the distribution of can be extended to the whole line under Condition (A.7). By Lemma 2 presented in Web Appendix B, uniformly in . Similar asymptotic results hold for and . It follows that and converge to and uniformly in , respectively. These results are the basic building blocks for proving the asymptotic results for and .

Note that is the minimizer of ⁠. In Part (a) of the proof of Theorem 1, we show that converges uniformly to a deterministic function of γ that minimizes at ⁠. The consistency of follows by Theorem 5.7 of van der Vaart (1998). By the first-order Taylor expansion of at γ₀, we have

(11)

where is on the line segment between and γ₀. To prove the asymptotic normality of ⁠, it is sufficient to prove that converges in probability to a nonsingular matrix, and that converges in distribution. The convergence of the information matrix can be obtained by applying Lemma 1 in Web Appendix B. We show in Part (b) of the proof of Theorem 1 that

(12)

where henceforth we adopt the notation ⁠.

The asymptotic properties of and are summarized in Theorems 1 and 2. Theorems 1 and 2 are proved in Web Appendix B, assuming that and are independent conditional on and both are independent of . Web Appendix C outlines the proofs when the conditional hazard function of depends on covariates through the Cox model. This model assumption is made for convenience of the theoretical development, so that the existing large sample results for the Cox model with right censored data (Andersen and Gill, 1982) can be used. Web Appendix C also discusses the possibility of using other failure time regression models to estimate the conditional distribution of the left censored event time.

Theorem 1. Under Condition A, we have

(a)
as ⁠;
(b)
as ⁠, where ⁠, ⁠.

The asymptotic variance of can be consistently estimated by ⁠, where

(13)

and ⁠.
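Variance estimators of this kind generally have a sandwich form, with an inverse information matrix on each side of a participant-level score covariance. The sketch below shows only that generic structure for a weighted least squares estimate of γ, with scores summed within participant to respect within-subject correlation. It treats the profiled covariates and residuals as given and ignores any contribution from estimating the gap-time distribution, which an estimator such as (13) may need to account for. All names are hypothetical.

```python
import numpy as np


def cluster_sandwich(Z_res, resid, subject, w=None):
    """Participant-clustered sandwich variance A^{-1} B A^{-1} for weighted LS.

    Z_res   : (n_obs, q) regressors for gamma, e.g. profiled covariates
    resid   : (n_obs,) residuals at the estimate
    subject : (n_obs,) participant labels; scores are summed within participant
    """
    Z_res = np.asarray(Z_res, dtype=float)
    resid = np.asarray(resid, dtype=float)
    subject = np.asarray(subject)
    w = np.ones(len(resid)) if w is None else np.asarray(w, dtype=float)
    A = (Z_res * w[:, None]).T @ Z_res           # "bread"
    scores = Z_res * (w * resid)[:, None]
    B = np.zeros((Z_res.shape[1], Z_res.shape[1]))
    for s in np.unique(subject):
        u = scores[subject == s].sum(axis=0)     # participant-level score
        B += np.outer(u, u)                      # "meat"
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv


# Usage: 100 participants with 5 visits each and a shared participant effect.
rng = np.random.default_rng(4)
subject = np.repeat(np.arange(100), 5)
Z_res = rng.normal(size=(500, 1))
resid = rng.normal(size=500) + np.repeat(rng.normal(size=100), 5)
V_gamma = cluster_sandwich(Z_res, resid, subject)
se_gamma = np.sqrt(np.diag(V_gamma))
```

With one observation per participant the formula reduces to the familiar heteroscedasticity-robust (HC0) variance.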

Next we present the asymptotic results for ⁠. First, we introduce a few quantities to be used in the expression of the asymptotic variance of ⁠. We define to be the expected number of sampling points by time t under possibly censored time origin, denoted by ⁠. Let be the natural filtration. Then the intensity of is given by ⁠. Hence is a martingale, with predictable variation process ⁠.

Theorem 2. Under Condition A, we have

(a)
converges to in probability uniformly in as ;
(b)
For each ⁠, as ⁠, where ⁠, ⁠. Here ⁠, .

The covariance matrix of can be estimated by ⁠, where ⁠. However, consider the approximation

(14)

A more refined variance estimation for with higher order accuracy can be based on ⁠, where

(15)

This variance estimator is used in the simulations and in the real data application.

4 Simulation Study

We conduct a numerical study to examine the finite sample performance of the proposed methods. Data are simulated using the following semiparametric additive model ⁠:

(16)

where ⁠, ⁠, ⁠, is uniformly distributed on [0,1], and is a Bernoulli random variable with ⁠. The error process has a normal distribution with mean and variance 1 for participant i where follows a standard normal distribution.

For participant i, is generated from a uniform distribution on (0,0.8). The left censoring time is generated from a uniform distribution on , with a and b adjusted to yield a desired percentage of left censoring for . The first sampling point is set as , and the rest of the 's are generated from a Poisson process with intensity rate , where , , and . Let be the responses at time points following model (16). The right censoring time is exponentially distributed, with the rate parameter adjusted to give a prespecified percentage of right censoring (drop-out or administrative censoring at τ) in the time interval [0, τ], which is the probability of . We set for the time interval . The average number of observations in the interval per participant is about 4.7. We use the Epanechnikov kernel , and set for the estimating function (9). The local constant kernel is used for time points in , while the equivalent kernel of local linear smoothing is applied at the boundary points in . For added protection against boundary bias effects, we treat as boundary points in the calculation instead of .
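Sampling times from a Poisson process with a time-varying intensity can be generated by thinning: propose points from a homogeneous process at a dominating rate and keep each with probability equal to the ratio of the target rate to the dominating rate. A minimal sketch follows. The intensity 2 + 4t, the exponential censoring scale, τ = 1, and placing the first visit at time 0 are all hypothetical choices for illustration, not the settings used in this section.

```python
import numpy as np


def poisson_process_thinning(rate, rate_max, start, end, rng):
    """Event times on (start, end] of a Poisson process with intensity rate(t).

    Requires 0 <= rate(t) <= rate_max on the interval (thinning method).
    """
    times = []
    t = start
    while True:
        t += rng.exponential(1.0 / rate_max)       # next candidate point
        if t > end:
            return np.array(times)
        if rng.uniform() <= rate(t) / rate_max:    # keep with prob rate/rate_max
            times.append(t)


def rate(t):
    """Hypothetical intensity, bounded by 6 on [0, 1]."""
    return 2.0 + 4.0 * t


rng = np.random.default_rng(5)
counts = [len(poisson_process_thinning(rate, 6.0, 0.0, 1.0, rng)) for _ in range(4000)]
mean_count = np.mean(counts)            # expected count is the integral of rate, 4

# One participant: first visit at time 0 (assumed), later visits until
# the earlier of a hypothetical exponential censoring time and tau.
tau = 1.0
censor = rng.exponential(scale=2.0)
later = poisson_process_thinning(rate, 6.0, 0.0, min(censor, tau), rng)
visits = np.concatenate([[0.0], later])
```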

The performance of the estimator for γ is measured through the bias (Bias), the sample standard error of the estimates (SSE), the estimated standard error of (ESE), and the coverage probability (CP) of a 95% confidence interval for γ. The performance of the estimator for the jth component on the interval is evaluated by the square root of average integrated squared error (RASE) defined by

(17)

where is the kth estimate of for and N is the number of simulations.
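A minimal sketch of the RASE computation on a time grid, assuming RASE is the square root of the integrated squared error averaged over the N replicates, with integrals approximated by the trapezoidal rule. Some authors also divide by the length of the interval; this sketch does not, so its values may differ from (17) by that factor if (17) is normalized. The function name and the usage data are hypothetical.

```python
import numpy as np


def rase(estimates, truth, grid):
    """Root of the integrated squared error averaged over simulation replicates.

    estimates : (N, G) estimated curve on `grid`, one row per replicate
    truth     : (G,) true curve on `grid`
    The integral over the grid uses the trapezoidal rule and is not divided by
    the interval length.
    """
    grid = np.asarray(grid, dtype=float)
    sq = (np.asarray(estimates, dtype=float) - np.asarray(truth, dtype=float)) ** 2
    ise = np.sum(0.5 * (sq[:, 1:] + sq[:, :-1]) * np.diff(grid), axis=1)
    return float(np.sqrt(np.mean(ise)))


# Usage with hypothetical replicates on the interior interval [0.1, 0.9]:
grid = np.linspace(0.1, 0.9, 81)
truth = np.sin(2 * np.pi * grid)
rng = np.random.default_rng(6)
estimates = truth + rng.normal(0.0, 0.05, size=(500, grid.size))
value = rase(estimates, truth, grid)
```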

4.1 Simulation Study Using Unit Weight

First, we consider four simulation settings to demonstrate the validity and advantage of the proposed method in handling the censored time origin using unit weight . The first three settings show the performance of the proposed estimators with , and 50% of left censoring for . The fourth setting compares the performance of the proposed estimators with a naive version of the approach that ignores the censored time origin (or ) by mistakenly treating as the measurement times since the actual time origin and as the response at .

For sample sizes , 300, and 500, and bandwidths , 0.4, and 0.5, Table 1 shows results based on 500 simulations. The biases of are small, and the sample standard errors of are close to the estimated standard errors. Both standard errors decrease as the sample size increases, and they tend to increase slightly with the left censoring percentage . The coverage probabilities of are close to the nominal 0.95 level. The , , decrease as the sample size increases. The RASEs for , , also increase as increases. The values of SSE and ESE for are similar for the three bandwidth choices, but and become smaller as the bandwidth increases.

TABLE 1


Summary statistics for the estimators of γ and of the coefficient functions under model (16) with 0%, 20%, and 50% left censoring and 30% right censoring. Each entry is based on 500 simulations

Left censoring	n	h	Bias	SSE	ESE	CP	RASE (1st component)	RASE (2nd component)
0%	200	0.3	0.0075	0.1963	0.1886	0.938	0.3798	0.6312
		0.4	0.0078	0.1958	0.1893	0.940	0.3444	0.5742
		0.5	0.0080	0.1952	0.1900	0.938	0.3194	0.5370
	300	0.3	−0.0030	0.1640	0.1552	0.940	0.3076	0.5157
		0.4	−0.0030	0.1637	0.1556	0.938	0.2798	0.4688
		0.5	−0.0028	0.1636	0.1559	0.940	0.2588	0.4359
	500	0.3	0.0002	0.1184	0.1212	0.950	0.2370	0.4048
		0.4	0.0000	0.1181	0.1214	0.950	0.2160	0.3738
		0.5	0.0000	0.1181	0.1216	0.952	0.2005	0.3571
20%	200	0.3	0.0003	0.1990	0.1891	0.930	0.3961	0.6778
		0.4	0.0005	0.1979	0.1899	0.932	0.3559	0.6091
		0.5	0.0005	0.1973	0.1903	0.938	0.3295	0.5670
	300	0.3	−0.0049	0.1650	0.1554	0.926	0.3242	0.5436
		0.4	−0.0048	0.1648	0.1558	0.928	0.2928	0.4918
		0.5	−0.0046	0.1648	0.1562	0.930	0.2713	0.4587
	500	0.3	0.0010	0.1164	0.1212	0.958	0.2480	0.4293
		0.4	0.0013	0.1164	0.1215	0.960	0.2234	0.3922
		0.5	0.0013	0.1164	0.1217	0.960	0.2069	0.3713
50%	200	0.3	0.0013	0.2005	0.1899	0.934	0.4474	0.7841
		0.4	0.0010	0.1996	0.1911	0.936	0.3928	0.6921
		0.5	0.0011	0.1992	0.1917	0.936	0.3584	0.6315
	300	0.3	−0.0050	0.1663	0.1563	0.922	0.3707	0.6415
		0.4	−0.0045	0.1663	0.1571	0.926	0.3265	0.5677
		0.5	−0.0042	0.1666	0.1575	0.926	0.2975	0.5157
	500	0.3	0.0021	0.1170	0.1222	0.962	0.2820	0.5166
		0.4	0.0023	0.1168	0.1226	0.964	0.2486	0.4630
		0.5	0.0025	0.1167	0.1228	0.960	0.2273	0.4237


Table 2 presents results for the estimation of γ and of the coefficient functions with the approach that ignores the censored time origin. It shows that , , increase dramatically compared with the corresponding results for the proposed estimators in Table 1, which account for the censored time origin. Table 2 also shows that there is little bias in the estimation of the constant effect γ when the censored time origin is ignored, although the standard errors are larger than those in Table 1. Table 3 gives a side-by-side comparison of the proposed estimator for versus the approach that misplaces the time origin under 50% left censoring in the presence (30%) and absence (0%) of right censoring.

TABLE 2


Summary statistics for the estimation of γ and of the coefficient functions under model (16) with a misplaced time origin, under 50% left censoring and 30% right censoring. Each entry is based on 500 simulations

Left censoring	n	h	Bias	SSE	ESE	CP	RASE (1st component)	RASE (2nd component)
50%	200	0.3	−0.0001	0.2341	0.2231	0.932	0.5742	1.5619
		0.4	−0.0007	0.2339	0.2240	0.934	0.5477	1.5296
		0.5	−0.0009	0.2336	0.2246	0.936	0.5265	1.4946
	300	0.3	−0.0055	0.2015	0.1847	0.920	0.5355	1.5214
		0.4	−0.0056	0.2014	0.1853	0.926	0.5171	1.4991
		0.5	−0.0058	0.2018	0.1856	0.924	0.5007	1.4689
	500	0.3	−0.0013	0.1469	0.1444	0.936	0.4809	1.4537
		0.4	−0.0013	0.1467	0.1446	0.934	0.4695	1.4396
		0.5	−0.0014	0.1465	0.1448	0.940	0.4579	1.4168


TABLE 3


Side-by-side comparison of the proposed estimator of the coefficient functions under model (16) with the approach that misplaces the time origin, under 50% left censoring and 0% or 30% right censoring. Each entry is based on 500 simulations

			RASE (1st component)		RASE (2nd component)
			Proposed	Misplaced	Proposed	Misplaced
Right censoring	n	h	method	origin	method	origin
0%	200	0.3	0.4078	0.5410	0.7111	1.5325
		0.4	0.3557	0.5207	0.6246	1.5081
		0.5	0.3231	0.5033	0.5683	1.4757
	300	0.3	0.3302	0.5084	0.5762	1.4941
		0.4	0.2903	0.4951	0.5100	1.4791
		0.5	0.2654	0.4818	0.4650	1.4528
	500	0.3	0.2528	0.4686	0.4642	1.4379
		0.4	0.2230	0.4605	0.4170	1.4294
		0.5	0.2042	0.4505	0.3829	1.4094
30%	200	0.3	0.4474	0.5742	0.7841	1.5619
		0.4	0.3928	0.5477	0.6921	1.5296
		0.5	0.3584	0.5265	0.6315	1.4946
	300	0.3	0.3707	0.5355	0.6415	1.5214
		0.4	0.3265	0.5171	0.5677	1.4991
		0.5	0.2975	0.5007	0.5157	1.4689
	500	0.3	0.2820	0.4809	0.5166	1.4537
		0.4	0.2486	0.4695	0.4630	1.4396
		0.5	0.2273	0.4579	0.4237	1.4168

Figure 2 shows the averages of the time-varying coefficient estimates over 500 simulations under the four simulation settings described above. Figure 2(a)–(c) plots the average estimates from the proposed method for left censoring rates of 0%, 20%, and 50%, and Figure 2(d) corresponds to the fourth case, in which the time origin is misplaced. Figure 2(a)–(c) shows that the biases are small, so the estimated curves track the true curves closely. In contrast, Figure 2(d) shows large biases and an obvious time shift in the estimated time-varying covariate effect.
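Tables 2 and 3 summarize curve accuracy by RASE. Its formula is not restated in this section, so the sketch below assumes the definition common in the varying-coefficient literature: the root of the mean squared difference between estimated and true curves over a time grid t_1, ..., t_K, then averaged over simulation replicates. The function names, grid, and toy curves are illustrative only.

```python
import numpy as np


def rase(estimate, truth):
    """Root average squared error of one estimated curve on a fixed time grid.

    estimate, truth: arrays of shape (n_grid,) with the estimated and true
    coefficient values at grid points t_1, ..., t_K (assumed definition).
    """
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean((estimate - truth) ** 2)))


def mean_rase(estimates, truth):
    """Average RASE over replicates; estimates has shape (n_reps, n_grid)."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.mean([rase(row, truth) for row in estimates]))


# Toy example: a curve that is shifted in time, mimicking a misplaced origin
t = np.linspace(0.0, 1.0, 101)
truth = np.sin(2 * np.pi * t)
shifted = np.sin(2 * np.pi * (t + 0.1))
print(rase(shifted, truth))
```

A time shift such as the one in Figure 2(d) inflates RASE even when the estimated curve has the right shape, which is consistent with the large "misplaced origin" RASE values in Table 3.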

Figure 2

Averages of the estimates of the two time-varying coefficients under 30% right censoring, based on 500 simulations. The solid black lines are for the first coefficient and the dashed black lines are for the second. (a)–(c) The biases for left censoring rates of 0%, 20%, and 50%, respectively. (d) The results with a misplaced time origin, obtained by ignoring left censoring.

Figure 3 shows the sample and estimated standard errors of the two time-varying coefficient estimates over 500 simulations, with one panel, (a)–(d), for each of the four simulation settings. In all four panels, the sample standard error curves are close to the estimated standard error curves. In the first three cases, the large variation for times near zero is typical of the local linear approach near a boundary; see page 73 of Fan and Gijbels (1996). The fourth case, in Figure 3(d), does not show large variation near zero because its time zero is shifted to a point that lies some duration after the ith subject's actual time origin.
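The two kinds of curves compared in Figure 3 can be computed from the simulation output as sketched below, assuming the sample standard error at each t is the standard deviation of the estimates across replicates and the estimated standard error is the average of the per-replicate variance-estimator values. The function name and array layout are hypothetical.

```python
import numpy as np


def se_curves(estimates, est_ses):
    """Pointwise sample SE and average estimated SE of a coefficient curve.

    estimates: (n_reps, n_grid) estimated curves, one row per simulated data set.
    est_ses:   (n_reps, n_grid) estimated standard errors for the same curves.
    Returns (sample_se, mean_estimated_se), each of length n_grid.
    """
    estimates = np.asarray(estimates, dtype=float)
    est_ses = np.asarray(est_ses, dtype=float)
    sample_se = estimates.std(axis=0, ddof=1)  # spread of the estimates across replicates
    mean_estimated_se = est_ses.mean(axis=0)   # average of the variance-estimator SEs
    return sample_se, mean_estimated_se
```

Close agreement of the two returned curves is what Figure 3 reports; a sample SE well above the estimated SE would indicate that the variance estimator understates uncertainty.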

Figure 3

Sample and estimated standard errors of the estimates of the two time-varying coefficients under 30% right censoring, based on 500 simulations. The solid lines are for the first coefficient and the dashed lines are for the second. The gray lines are the estimated standard errors and the black lines are the sample standard errors. (a)–(c) The results for left censoring rates of 0%, 20%, and 50%, respectively. (d) The results with a misplaced time origin, obtained by ignoring left censoring.

Figure 4 shows the coverage probabilities of 95% pointwise confidence intervals for the two time-varying coefficients over 500 simulations under the four simulation settings. Figure 4(a)–(c) shows that the proposed estimators have coverage probabilities close to the nominal 0.95 level except for times near zero, whereas Figure 4(d) shows very poor coverage for both coefficients under the approach that ignores the censored time origin.
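Pointwise coverage, as plotted in Figure 4, is the fraction of replicates whose interval contains the true value at each t. The sketch assumes Wald-type intervals of the form estimate ± 1.96 × estimated SE; the function name and arguments are hypothetical.

```python
import numpy as np


def pointwise_coverage(estimates, est_ses, truth, z=1.96):
    """Empirical coverage of estimate +/- z*SE intervals at each grid point.

    estimates, est_ses: (n_reps, n_grid) arrays; truth: (n_grid,) true curve.
    """
    estimates = np.asarray(estimates, dtype=float)
    est_ses = np.asarray(est_ses, dtype=float)
    truth = np.asarray(truth, dtype=float)
    lower = estimates - z * est_ses
    upper = estimates + z * est_ses
    covered = (lower <= truth) & (truth <= upper)  # truth broadcasts over replicates
    return covered.mean(axis=0)
```

With a misplaced origin the estimated curves are biased, so intervals centered on them miss the true curve regardless of their width, which explains the poor coverage in Figure 4(d).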

Figure 4

Coverage probabilities of 95% pointwise confidence intervals for the two time-varying coefficients under 30% right censoring, based on 500 simulations. The solid lines are the coverage probabilities for the first coefficient and the dashed lines are for the second. (a)–(c) The coverage probabilities of 95% pointwise confidence intervals for left censoring rates of 0%, 20%, and 50%, respectively. (d) The results with a misplaced time origin, obtained by ignoring left censoring.

An additional simulation study in Web Appendix D considers the case where the conditional hazard function of the left-censored event time depends on the baseline covariates through the Cox model. The results in Web Tables 1–3 and Web Figures 1–3 show that the theory and methods also work well in this setting.

4.2 Simulation Study Using the Estimated Weight

A simulation study is conducted to evaluate the efficiency gain of the two-stage estimation procedure. We consider four error models for the longitudinal error process. In Error Model I, is a normal distribution with mean and variance 1 conditional on the ith subject, and is N(0, 1). In Error Model II, has a normal distribution with mean and variance of conditional on the ith subject, and is N(0, 0.5²). Error Model III is the same as Error Model II except that is drawn from N(0, 0.3²). In Error Model IV, is a Gaussian process with mean 0 and variance , and and are independent for . Errors at different time points are dependent under Error Models I, II, and III, but independent under Error Model IV. The error variance is time-varying under Error Models II–IV but constant under Error Model I. The second-stage estimator is obtained using the estimated weight described in Section 2.3.

Define the empirical relative efficiency (Eff) of the estimated-weight estimator relative to the unit-weight estimator as the ratio of their sample standard errors, Eff(γ) = SSE(unit weight)/SSE(estimated weight); this ratio reproduces the Eff(γ) column of Table 4 up to rounding. The efficiency gain depends on the distribution of the error process. Table 4 shows the performance of the second-stage estimator and its empirical relative efficiency. Overall, the empirical relative efficiency ranges from about 1 to 1.35. No efficiency gain is observed when the error variance is constant (Error Model I). The second-stage estimator is most efficient when errors at different time points are uncorrelated (Error Model IV), and the efficiency gain is less pronounced when errors are correlated. The simulation study also shows that there is no clear efficiency gain of over for any of the error models.
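As a check on the definition of Eff(γ), the sketch below recomputes it from the SSE columns of Table 4 for n = 200, assuming Eff is the ratio of sample standard errors (unit weight over estimated weight). Because the tabulated SSEs are rounded to four decimals, the recomputed ratios agree with the published Eff(γ) values only to within about 0.001. The function and dictionary names are hypothetical.

```python
def relative_efficiency(sse_unit: float, sse_weighted: float) -> float:
    """Eff(gamma), taken as SSE(unit weight) / SSE(estimated weight)."""
    if sse_weighted <= 0:
        raise ValueError("sse_weighted must be positive")
    return sse_unit / sse_weighted


# (SSE with unit weight, SSE with estimated weight, published Eff) from Table 4, n = 200
table4_n200 = {
    "I": (0.1979, 0.1984, 0.9972),
    "II": (0.1282, 0.1197, 1.0707),
    "III": (0.1066, 0.0918, 1.1612),
    "IV": (0.0929, 0.0709, 1.3100),
}

for model, (unit, weighted, published) in table4_n200.items():
    eff = relative_efficiency(unit, weighted)
    print(f"Error Model {model}: recomputed {eff:.4f}, published {published:.4f}")
```

Values above 1 mean the estimated weight reduces the sampling variability of γ̂; the ordering I < II < III < IV matches the discussion above.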

TABLE 4

The empirical relative efficiency Eff(γ) of the two-stage estimator using the estimated weight relative to the estimator using unit weight, under model (16) with 20% left censoring, 30% right censoring, and n = 200, 300, 500. Each entry is based on 500 simulations.

	Unit Weight				Estimated Weight
n	Bias	SSE	ESE	CP	Bias	SSE	ESE	CP	Eff(γ)
	Error Model I
200	0.0005	0.1979	0.1899	0.932	0.0006	0.1984	0.1875	0.932	0.9972
300	−0.0048	0.1648	0.1558	0.928	−0.0050	0.1650	0.1545	0.928	0.9988
500	0.0013	0.1164	0.1215	0.960	0.0012	0.1162	0.1208	0.960	1.0017
	Error Model II
200	0.0001	0.1282	0.1263	0.952	0.0006	0.1197	0.1143	0.940	1.0707
300	−0.0041	0.1070	0.1035	0.940	−0.0021	0.0992	0.0939	0.928	1.0789
500	0.0000	0.0774	0.0803	0.954	0.0009	0.0706	0.0733	0.964	1.0959
	Error Model III
200	−0.0007	0.1066	0.1071	0.958	0.0000	0.0918	0.0896	0.956	1.1612
300	−0.0036	0.0901	0.0876	0.940	−0.0013	0.0773	0.0732	0.924	1.1665
500	−0.0003	0.0658	0.0679	0.956	0.0007	0.0550	0.0570	0.964	1.1970
	Error Model IV
200	−0.0018	0.0929	0.0954	0.954	−0.0009	0.0709	0.0715	0.950	1.3100
300	−0.0028	0.0794	0.0779	0.948	−0.0006	0.0602	0.0580	0.932	1.3187
500	−0.0009	0.0597	0.0603	0.950	0.0003	0.0443	0.0451	0.958	1.3481

5 Analysis of Step Study

The Step study was a multicenter, double-blind, randomized, placebo-controlled preventive HIV vaccine efficacy trial conducted in North America, the Caribbean, South America, and Australia from 2004 to 2009 (Buchbinder et al., 2008; Fitzgerald et al., 2011; Duerr et al., 2012). A co-primary objective of the study was to determine whether the MRKAd5 HIV gag/pol/nef vaccine, which elicits T cell immune responses to HIV proteins by delivering the gag, pol, and nef HIV genes to the immune system with the adenovirus type 5 (Ad5) common cold vector, is capable of controlling HIV replication among participants who acquired HIV infection after vaccination. Three thousand HIV-1 negative participants aged 18 to 45 and at high risk of HIV infection were enrolled and randomly assigned to receive vaccine or placebo in a 1:1 allocation, stratified by sex, study site, and baseline anti-Ad5 neutralizing antibody titer.

Of the 3000 participants, 174 acquired HIV infection during the trial, of whom 159 were male and 15 female. Because females made up less than 10% of the infected cases, we analyze the males only. Study participants received antibody (Ab)-based HIV diagnostic ELISA tests at study visits at Weeks 12 and 52 and every six months thereafter through 5 years. Participants with a positive ELISA test had HIV infection confirmed by an antigen-based HIV-specific RNA PCR test. Moreover, for all confirmed HIV-infected participants, a “look-back” procedure was applied wherein all earlier available blood samples were tested for HIV infection using the more sensitive RNA PCR test. The antibody tests used in Step have near-perfect sensitivity to detect HIV infections starting 4 weeks after HIV acquisition but miss infections before that, whereas the RNA PCR tests have near-perfect sensitivity starting about a week after HIV acquisition. Let be the time of the latest negative Ab- test result, the first positive Ab+ test result, and the actual HIV acquisition time. Based on the HIV testing algorithm, each infected participant was classified into one of two groups, defined by whether the earliest HIV positive sample was (Ab+, PCR+) or (Ab-, PCR+). The (Ab+, PCR+) group has such that and is left censored (Figure 1a), and the (Ab-, PCR+) group has and such that is observed (Figure 1b). The left censoring rate is 70.4%.
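The two-group rule can be made concrete with a small classifier keyed on the test results of each case's earliest HIV-positive sample. This is a sketch only: the record type EarliestPositiveSample and both functions are hypothetical names, and the sketch ignores the approximation involved in treating the (Ab-, PCR+) time origin as observed.

```python
from dataclasses import dataclass


@dataclass
class EarliestPositiveSample:
    """Test results on a participant's earliest HIV-positive sample (hypothetical record)."""
    antibody_positive: bool  # Ab+ on that sample
    pcr_positive: bool       # PCR+ on that sample


def origin_status(sample: EarliestPositiveSample) -> str:
    """Return 'observed' or 'left_censored' for the time origin.

    (Ab-, PCR+): infection detected before antibodies appear, origin treated as observed.
    (Ab+, PCR+): antibodies already present, origin only known to lie earlier.
    """
    if not sample.pcr_positive:
        raise ValueError("the earliest HIV-positive sample is expected to be PCR+")
    return "left_censored" if sample.antibody_positive else "observed"


def left_censoring_rate(samples) -> float:
    """Fraction of cases whose time origin is left censored."""
    statuses = [origin_status(s) for s in samples]
    return statuses.count("left_censored") / len(statuses)
```

Applied to all 159 male cases, the fraction returned by left_censoring_rate corresponds to the 70.4% left censoring rate reported above.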

Starting from a participant's first antibody-based positive HIV diagnosis, 18 post-infection visits were scheduled at weeks 0, 1, 2, 8, 12, and 26 and every 26 weeks thereafter through week 338, although the actual visit dates varied. We define as the time from to the jth visit for the ith infected participant. HIV viral load was measured at each study visit. A participant is considered censored once he starts antiretroviral therapy (ART), which interferes with the assessment of vaccine effects on viral load and other biomarkers of interest. The censoring time is the time from to ART initiation, study drop-out, or the end of follow-up, whichever comes first. The right censoring rate was 69.8%.
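A quick arithmetic check confirms that the stated schedule yields 18 visits: six early visits (weeks 0, 1, 2, 8, 12, 26) plus twelve visits every 26 weeks from week 52 through week 338. The function name is illustrative.

```python
def scheduled_visit_weeks(last_week: int = 338) -> list[int]:
    """Post-diagnosis visit schedule in weeks: 0, 1, 2, 8, 12, 26, then every 26 weeks."""
    early = [0, 1, 2, 8, 12, 26]
    every_26 = list(range(52, last_week + 1, 26))  # 52, 78, ..., last_week
    return early + every_26


weeks = scheduled_visit_weeks()
print(len(weeks), weeks[-1])  # prints: 18 338
```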

The analysis data include 159 participants (97 in the vaccine group, 62 in the placebo group) with 785 visits from HIV infection diagnosis prior to ART initiation. One hundred and twenty-two participants were in North America or Australia and the rest were in the Caribbean or South America. Information was available on whether each participant was fully adherent to vaccinations and on baseline anti-Ad5 neutralizing antibody titer (Ad5 titer), each of which affects the T cell response to the vaccine and hence could be associated with the viral load response. A spaghetti plot of the raw viral load data, with one line per participant by vaccination status and study region, is given in Web Figure 4 in Web Appendix F. We investigate the effect of the MRKAd5 HIV-1 gag/pol/nef vaccine versus placebo on longitudinal HIV viral loads among the 159 HIV-infected men. Taking the response to be the log₁₀ HIV viral load at time t (in years), we analyze the data with the following model:

(18)

where the covariates are the treatment indicator (1 if participant i was assigned vaccine and 0 if placebo), the study site indicator (1 if North America or Australia and 0 otherwise), the natural logarithm of baseline Ad5 titer, and the per-protocol indicator (1 if participant i was fully adherent to vaccinations and 0 otherwise).
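The display for model (18) is not reproduced here. Based only on the covariates listed above and on Figure 6, which shows a time-varying intercept and a time-varying vaccine effect, one plausible form is sketched below. The symbols Y_i(t), α(t), β(t), Z_i, X_i, and ε_i(t) are our own notation, and the published equation may differ, including in which covariate each of γ₁, γ₂, and γ₃ multiplies.

```latex
Y_i(t) = \alpha(t) + \beta(t)\, Z_i + \gamma^{\top} X_i + \varepsilon_i(t),
\qquad \gamma = (\gamma_1, \gamma_2, \gamma_3)^{\top},
```

Here Y_i(t) is the log₁₀ viral load at t years after actual HIV acquisition, Z_i is the vaccine indicator, X_i collects the study site indicator, log Ad5 titer, and per-protocol indicator, and ε_i(t) is a mean-zero error process.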

We choose years since there are very few observations after and . The Kaplan–Meier estimator is used to estimate the distribution of the left-censored time from HIV acquisition to the first positive antibody test, because none of the covariates in model (18) is significant (all have large p-values) in a Cox model fit. Figure 5 plots this Kaplan–Meier estimator. We let for as the smallest uncensored is 0.1397 and the smallest value of is 0.0493.
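The text does not spell out how the Kaplan–Meier estimator handles left censoring. One standard device, assumed here, is time reversal: with W = M − V for a constant M larger than every observed value, "V is at most c" becomes "W is at least M − c", so the ordinary right-censored Kaplan–Meier estimator applies to W and its jumps map back to point masses for V. The second function shows, as a generic illustration rather than the paper's exact E-step, how such masses give the conditional distribution P(V = v | V ≤ c) needed to average over a left-censored time origin. All names are hypothetical.

```python
import numpy as np


def km_right_censored(times, events):
    """Kaplan-Meier estimate for right-censored data.

    Returns the distinct event times and the survival estimate just after each one.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    surv, s = [], 1.0
    for u in event_times:
        at_risk = np.sum(times >= u)             # censored at u stay in the risk set
        deaths = np.sum((times == u) & events)
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)


def km_left_censored_masses(values, observed):
    """Point masses of V estimated from left-censored data by time reversal.

    values[i] is V_i if observed[i] is True, else a bound c_i with V_i <= c_i.
    Returns support points (ascending) and masses; np.cumsum(masses) gives F(v).
    Masses sum to less than 1 if the smallest value is left censored.
    """
    values = np.asarray(values, dtype=float)
    big_m = values.max() + 1.0                   # any constant above all values works
    w_times, w_surv = km_right_censored(big_m - values, observed)
    masses = -np.diff(np.concatenate(([1.0], w_surv)))  # KM jumps for W
    support = big_m - w_times                    # map back to the V scale
    order = np.argsort(support)
    return support[order], masses[order]


def conditional_masses(support, masses, c):
    """P(V = v_k | V <= c): weights for averaging over a left-censored origin."""
    keep = support <= c
    total = masses[keep].sum()
    if total <= 0:
        raise ValueError("no estimated mass at or below c")
    return support[keep], masses[keep] / total
```

In the reversed scale, the mass of a left-censored case is redistributed to smaller values of V, which is the behavior expected when only an upper bound on V is known.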

Figure 5

Kaplan–Meier estimator of the distribution function of the time from actual HIV acquisition to the first positive ELISA confirmed by Western blot or RNA test, for the male HIV-infected cases in the Step trial

We select the bandwidth using the formula , where . Here is the sample variance of , for and each participant i, and is the sample variance of . Some discussion of bandwidth choice is given in Web Appendix E. With , the bandwidth choice is . The estimates of γ₁, γ₂, and γ₃ using unit weight are 0.0241, −0.0127, and −0.0124, with standard errors 0.0406, 0.1935, and 0.1573, respectively. The results indicate no significant associations of baseline Ad5 titer, study site, or per-protocol status with HIV viral load. The estimated time-varying effects with 95% pointwise confidence intervals are shown in Figure 6. Figure 6(b) suggests that the vaccine effect on reducing post-infection viral load is not statistically significant. However, the vaccine group tends to have lower viral load than the placebo group, and this apparent benefit increases over time.
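The lack of significance for γ₁, γ₂, and γ₃ can be checked from the reported estimates and standard errors with two-sided Wald z-tests, assuming the estimators are approximately normal in large samples. The function and variable names are illustrative.

```python
import math


def wald_test(estimate: float, se: float) -> tuple[float, float]:
    """Two-sided Wald z-test of H0: coefficient = 0 under approximate normality."""
    z = estimate / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Unit-weight estimates and standard errors reported for model (18)
reported = {
    "gamma1": (0.0241, 0.0406),
    "gamma2": (-0.0127, 0.1935),
    "gamma3": (-0.0124, 0.1573),
}

for name, (est, se) in reported.items():
    z, p = wald_test(est, se)
    lower, upper = est - 1.96 * se, est + 1.96 * se  # 95% Wald interval
    print(f"{name}: z = {z:.2f}, p = {p:.2f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

All three p-values exceed 0.5 and every 95% interval contains 0, consistent with the statement that none of these covariates is significantly associated with viral load.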

Figure 6

Analysis of HIV viral load for the male HIV-infected cases in the Step trial with model (18). (a) The estimated intercept and its 95% pointwise confidence interval. (b) The estimated vaccine effect and its 95% pointwise confidence interval. The solid curves are the point estimates and the dashed curves are the confidence limits.

We also applied the two-stage estimation procedure with the estimated weight to the Step study data. The results are similar to those obtained with unit weight and are presented in Web Appendix F.

6 Concluding Remarks

This paper investigated the semiparametric additive time-varying coefficient model for longitudinal data measured from a possibly censored initiating event. The method can be applied to assess the HIV vaccine effect on post-infection longitudinal biomarkers, such as HIV viral load, measured from the time of infection, even though the exact time of infection may be censored. Censoring status can be determined by an HIV antibody test followed by a more sensitive antigen-based HIV-specific PCR assay, or by an expanded set of HIV diagnostic assays with varying sensitivity properties (Grebe et al., 2019) and/or sequence diversification molecular clock models (Giorgi et al., 2010; Rossenkhan et al., 2019). The time from HIV acquisition to the first HIV antibody positive test is subject to left censoring, and its distribution can be estimated with the Kaplan–Meier estimator or a failure time regression model such as the Cox model. We used the expectation maximization approach to deal with the censored time origin and developed a weighted profile least squares estimation procedure, employing nonparametric kernel smoothing to estimate time-varying covariate effects. We investigated the asymptotic properties of both the parametric and nonparametric estimators and proposed consistent variance estimators.

Our simulation study showed that the proposed estimators work well, with small biases and satisfactory coverage probabilities, across different levels of left censoring and moderate sample sizes, whereas misplacing the time origin by ignoring left censoring incurred substantial bias in estimating the time-varying effects. Applied to the male HIV-infected cases from the Step study, the method suggested a borderline significant vaccine effect that lowered viral load by about half to three-quarters of a log (base 10) beyond 1.5 years after HIV acquisition. This result should be interpreted with caution given the potential for post-randomization selection bias, which could occur if baseline factors not adjusted for in the analysis predict both viral load and the vaccine effect on HIV acquisition (Shepherd et al., 2006).
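On the original viral load scale, a drop of d log₁₀ units is a 10^d-fold reduction, so the effect size above corresponds to roughly a 3.2- to 5.6-fold lower viral load. The function name is illustrative.

```python
def fold_reduction(log10_drop: float) -> float:
    """Multiplicative reduction on the original scale implied by a log10-scale drop."""
    return 10.0 ** log10_drop


for drop in (0.5, 0.75):
    print(f"{drop} log10 -> {fold_reduction(drop):.1f}-fold lower viral load")
```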

As with most statistical modeling, the application of the proposed method should be accompanied by model assessment. Web Figure 5 shows a scatter plot of residuals of the participants with for the Step study application. The residuals are approximately centered at 0 and lie between −2 and 2, which indicates that the model fits the data reasonably well. However, formal goodness-of-fit tests would be preferable for assessing model fit. Test statistics could be constructed from test processes that are functionals of residual processes, and their critical values could be estimated with a technique similar to that of Sun et al. (2019). Further work is warranted to understand the empirical and theoretical properties of such tests. Although the development for the censored time origin is based on the semiparametric additive time-varying coefficients model, extensions to the generalized semiparametric regression models of Sun et al. (2013) are possible by applying the EM approach within the estimation framework of Sun et al. (2013). It would also be interesting to investigate a random mixed-effects model with censored time origin that could be used to estimate subject-specific covariate effects.

Data Availability Statement

The data that support the findings in this paper are available in the Supporting Information of this article.

Supporting Information

Web Appendices A–F, Tables 1–3, and Figures 1–5 referenced in Sections 2–5 together with the data and computer code are posted online with this paper at the Biometrics website on Wiley Online Library.

Acknowledgments

The authors thank an associate editor and two referees for their valuable comments and suggestions that have greatly improved the paper. This research was partially supported by the National Institutes of Health NIAID [grant number R37 AI054165]. The research of Yanqing Sun was also partially supported by the National Science Foundation [grant numbers DMS-1208978 and DMS-1915829] and the Reassignment of Duties fund provided by the University of North Carolina at Charlotte. We thank the HIV Vaccine Trials Network (HVTN) and Merck for providing the data analyzed in this article. The HVTN is supported through a cooperative agreement with the National Institutes of Health Division of AIDS, grant 5 U01 AI068635. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

References

Andersen

,

P.K.

&

Gill

,

R.D.

(

1982

)

Cox's regression model for counting processes: A large sample study

.

Annals of Statistics

,

10

,

1100

–

1120

.

Google Scholar

Crossref

WorldCat

Bickel

,

P.J.

,

Klaassen

,

C.A.

,

Bickel

,

P.J.

,

Ritov

,

Y.

,

Klaassen

,

J.

,

Wellner

,

J.A.

et al. (

1993

)

Efficient and adaptive estimation for semiparametric models

.

Baltimore, MD

:

Johns Hopkins University Press

.

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

Buchbinder

,

S.

,

Mehrotra

,

D.

,

Duerr

,

A.

,

Fitzgerald

,

D.

,

Mogg

,

R.

,

Li

,

D.

et al. (

2008

)

Efficacy assessment of a cell-mediated immunity HIV-1 vaccine (the Step Study): a double-blind, randomised, placebo-controlled, test-of-concept trial

.

Lancet

,

372

,

1881

–

1893

.

Duerr

,

A.

,

Huang

,

Y.

,

Buchbinder

,

S.

,

Coombs

,

R.W.

,

Sanchez

,

J.

,

del Rio

,

C.

et al. (

2012

)

Extended follow-up confirms early vaccine-enhanced risk of HIV acquisition and demonstrates waning effect over time among participants in a randomized trial of recombinant adenovirus HIV vaccine (Step study)

.

Journal of Infectious Diseases

,

206

,

258

–

266

.

Fan

,

J.

&

Gijbels

,

I.

(

1996

)

Local polynomial modelling and its applications: Monographs on statistics and applied probability

.

Boca Raton, FL

:

CRC Press

.

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

Fan

,

J.

,

Huang

,

T.

&

Li

,

R.

(

2007

)

Analysis of longitudinal data with semiparametric estimation of covariance function

.

Journal of the American Statistical Association

,

102

,

632

–

641

.

Fitzgerald

,

D.

,

Janes

,

H.

,

Robertson

,

M.

,

Coombs

,

R.

,

Frank

,

I.

,

Gilbert

,

P.

et al. (

2011

)

An ad5-vectored HIV-1 vaccine elicitscell-mediated immunity but does not affect disease progression in HIV-1- infected male subjects: results from a randomized placebo-controlled trial (the step study)

.

Journal of Infectious Diseases

,

203

,

765

–

772

.

Giorgi

,

E.

,

Funkhouser

,

B.

,

Athreya

,

G.

,

Perelson

,

A.

,

Korber

,

B.

&

Bhattacharya

,

T.

(

2010

)

Estimating time since infection in early homogeneous HIV-1 samples using a Poisson model

.

BMC Bioinformatics

,

11

,

532

.

Grebe

,

E.

,

Facente

,

S.N.

,

Bingham

,

J.

,

Pilcher

,

C.D.

,

Powrie

,

A.

,

Gerber

,

J.

et al. (

2019

)

Interpreting diagnostic histories into HIV infection time estimates: analytical framework and online tool

.

BMC Infectious Diseases

,

19

,

894

.

Hu

,

Z.

,

Wang

,

N.

&

Carroll

,

R.J.

(

2004

)

Profile-kernel versus backfitting in the partially linear models for longitudinal/clustered data

.

Biometrika

,

91

,

251

–

262

.

Google Scholar

Crossref

WorldCat

Janes

,

H.

,

Herbeck

,

J.T.

,

Tovanabutra

,

S.

,

Thomas

,

R.

,

Frahm

,

N.

,

Duerr

,

A.

et al. (

2015

)

HIV-1 infections with multiple founders are associated with higher viral loads than infections with single founders

.

Nature Medicine

,

21

,

1139

.

Lin

,

X.

&

Carroll

,

R.J.

(

2000

)

Nonparametric function estimation for clustered data when the predictor is measured without/with error

.

Journal of the American statistical Association

,

95

,

520

–

534

.

Google Scholar

Crossref

WorldCat

Lin

,

X.

&

Carroll

,

R.J.

(

2001

)

Semiparametric regression for clustered data using generalized estimating equations

.

Journal of the American Statistical Association

,

96

,

1045

–

1056

.

Google Scholar

Crossref

WorldCat

Lin

,

D.Y.

&

Ying

,

Z.

(

2001

)

Semiparametric and nonparametric regression analysis of longitudinal data (with discussion)

.

Journal of the American Statistical Association

,

96

,

103

–

113

.

Google Scholar

Crossref

WorldCat

Qi

,

L.

,

Sun

,

Y.

&

Gilbert

,

P.

(

2017

)

Generalized semiparametric varying-coefficient model for longitudinal data with applications to adaptive treatment randomizations

.

Biometrics

,

73

,

441

–

451

.

Qu

,

A.

&

Li

,

R.

(

2006

)

Quadratic inference functions for varying-coefficient models with longitudinal data

.

Biometrics

,

62

,

379

–

391

.

Rerks-Ngarm

,

S.

,

Paris

,

R.

,

Chunsutthiwat

,

S.

,

Premsri

,

N.

,

Namwat

,

C.

,

Bowonwatanuwong

,

C.

et al. (

2013

)

Extended evaluation of the virologic, immunologic, and clinical course of volunteers who acquired HIV-1 infection in a phase III vaccine trial of ALVAC-HIV and AIDSVAX B/E

.

Journal of Infectious Diseases

,

207

(

8

),

1195

–

1205

.

Robins

,

J.

,

Rotnitzky

,

A.

&

Zhao

,

L.

(

1994

)

Estimation of regression coefficients when some regressors are not always observed

.

Journal of the American Statistical Association

,

89

,

846

–

866

.

Google Scholar

Crossref

WorldCat

Rossenkhan

,

R.

,

Rolland

,

M.

,

Labuschagne

,

J.

,

Ferreira

,

R.

,

Magaret

,

C.

,

Carpp

,

L.

et al. (

2019

)

Combining viral genetics and statistical modeling to improve HIV-1 time-of-infection estimation towards enhanced vaccine efficacy assessment

.

Viruses

,

11

,

607

.

Shepherd

,

B.

,

Gilbert

,

P.B.

,

Jemiai

,

Y.

&

Rotnitzky

,

A.

(

2006

)

Sensitivity analyses comparing outcomes only existing in a subset selected post-randomization, conditional on covariates, with application to HIV vaccine trials

.

Biometrics

,

62

,

332

–

342

.

Sun

,

Y.

&

Gilbert

,

P.

(

2012

)

Estimation of stratified mark-specific proportional hazards models with missing marks

.

Scandinavian Journal of Statistics

,

39

,

34

–

52

. PMCID: PMC3601495.

Google Scholar

Crossref

WorldCat

Sun

,

Y.

,

Qi

,

L.

,

Heng

,

F.

&

Gilbert

,

P.B.

(

2019

)

Analysis of generalized semiparametric mixed varying-coefficients models for longitudinal data

.

Canadian Journal of Statistics

,

47

,

352

–

373

.

Google Scholar

Crossref

WorldCat

Sun

,

Y.

,

Qian

,

X.

,

Shou

,

Q.

&

Gilbert

,

P.

(

2017

)

Analysis of two-phase sampling data with semiparametric additive hazards models

.

Lifetime Data Analysis

,

23

,

377

–

399

.

Sun

,

Y.

,

Sun

,

L.

&

Zhou

,

J.

(

2013

)

Profile local linear estimation of generalized semiparametric regression model for longitudinal data

.

Lifetime Data Analysis

,

19

,

317

–

349

.

Sun

,

Y.

&

Wu

,

H.

(

2005

)

Semiparametric time-varying coefficients regression model for longitudinal data

.

Scandinavian Journal of Statistics

,

32

,

21

–

48

.

Google Scholar

Crossref

WorldCat

van der Vaart

,

A.W.

(

1998

)

Asymptotic statistics

.

Cambridge

:

Cambridge University Press

.

Wang

,

N.

,

Carroll

,

R.J.

&

Lin

,

X.

(

2005

)

Efficient semiparametric marginal estimation for longitudinal/clustered data

.

Journal of the American Statistical Association

,

100

,

147

–

157

.

Google Scholar

Crossref

WorldCat

Yang, G., Sun, Y., Qi, L. & Gilbert, P. (2017) Estimation of stratified mark-specific proportional hazards models under two-phase sampling with application to HIV vaccine efficacy trials. Statistics in Biosciences, 9, 259–283.

Ying, Z. (1989) A note on the asymptotic properties of the product-limit estimator on the whole line. Statistics & Probability Letters, 7, 311–314.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
