Summary

Identifying biomarkers as surrogates for clinical endpoints in randomized vaccine trials is useful for reducing study duration and costs, relieving participants of unnecessary discomfort, and understanding the mechanism of the vaccine effect. In this article, we use risk models with multiple vaccine-induced immune response biomarkers to measure the causal association between a vaccine’s effects on these biomarkers and its effect on the clinical endpoint. In this setup, our main objective is to combine and select markers with high surrogacy from a list of many candidate markers, yielding a more parsimonious model that can potentially increase the predictive quality of the true markers. To address the missing “potential” biomarker value when a subject receives placebo, we utilize the baseline immunogenicity predictor design augmented with a “closeout placebo vaccination” group. We then impute the missing potential marker values and conduct marker selection through a stepwise resampling and imputation method called stability selection. We test our proposed strategy under relevant simulation settings and on (partially simulated) biomarker data from an HIV vaccine trial (RV144).

1. Introduction

The identification of immune response biomarkers as surrogate endpoints is an important research area in HIV vaccine development. If the effect of a vaccine on some immune response biomarkers, measured shortly after vaccination, can reliably predict the effect of the vaccine on the clinical endpoint (e.g. HIV infection), then that information can be used to promptly screen vaccine candidates for further refinement, and to predict vaccine efficacy when designing a large efficacy trial. Moreover, analyzing these surrogate biomarkers might help understand the mechanism behind the effect of a given vaccine on the primary endpoint.

Existing work in this field has mainly focused on evaluating a univariate immune response marker as a surrogate endpoint. Several authors (Gilbert and Hudgens, 2008; Huang and Gilbert, 2011; Wolfson and Gilbert, 2010, and others) have studied methods within the principal surrogate framework because of its advantages, and this is the setup we adopt in this article as well. To understand a marker’s surrogacy under this framework, it is necessary to model the clinical outcome conditional on treatment status, the pair of potential biomarker values under assignment to each treatment, and their interaction with the treatment. Evaluation of principal surrogate markers is quite challenging, however, because one of the potential marker values is always missing: each individual is assigned to either the intervention or the placebo, but not both. Even in simpler settings, such as HIV vaccine trials, where there is no variability in immune response biomarker values under placebo (called the “Constant Biomarker” setting; see Gilbert and Hudgens, 2008; Wolfson and Gilbert, 2010), principal surrogate estimands remain unidentifiable in standard vaccine trials.

Follmann (2006) proposed two ways to improve the study design for better identification and estimation of the principal surrogate in the case of a univariate surrogate. The first strategy, the baseline immunogenicity predictor (BIP) design, imputes the unobserved immune response biomarker based on the observed relationship between baseline covariates and the biomarker values. The second strategy, “closeout placebo vaccination” (CPV), enhances the design by vaccinating uninfected placebo recipients at the end of the trial and measuring their subsequent biomarker values, which are then treated as if they had been measured on participants assigned to vaccine at the beginning of the trial. For univariate markers, Gilbert and Hudgens (2008) showed that strong, untestable assumptions on the risk model are required to estimate the surrogacy measures in a BIP design, while under the relatively weaker assumptions of “no infections during closeout” and “time constancy” (see Section 2.4), the inclusion of the CPV component allows estimation of the risk model and evaluation of the risk model assumptions.

Despite these developments, research on the selection and modeling of surrogate endpoints in the presence of a large number of candidates is lacking. In a vaccine trial, multiple immune response markers are typically measured, including binding antibody (Ab), neutralizing Ab, and T-cell responses. Moreover, the search for a “perfect” surrogate has typically been unsuccessful (Fleming and DeMets, 1996; Weir and Walley, 2006). Thus, it may well happen that no single biomarker has more than mediocre surrogate value on its own, but a combination of several of these biomarkers shows much improved surrogacy. A crucial challenge in combining multiple biomarkers in a regression model is the “curse of dimensionality,” which is expected when most of the candidate markers have no principal surrogacy value of their own, or none once we condition on some of the other “more relevant” biomarkers in the model. Thus, it is important to find a parsimonious combination of markers from a given list that has high surrogacy value, which also helps ensure that the risk model does not suffer from overfitting. This is our main research objective in this article.

We make two major contributions in this article. First, to our knowledge, this is the first article to investigate the identifiability issue when evaluating multiple biomarkers together. We show that if the number of biomarkers considered in the risk model exceeds the dimension of the baseline covariates, the identifiability promised by the BIP design breaks down. Augmenting the BIP design with a CPV component in this situation, however, restores identifiability of the risk model parameters. Second, for the BIP + CPV design, to select useful surrogates in the presence of missing potential outcomes, we develop multiple-imputation based feature selection methods that utilize parametric and nonparametric imputation models, building on the idea of combining bootstrap imputation with stability selection (Long and Johnson, 2015). Our results have important applications in the planning of future vaccine trials.

In Section 2, we present the problem setting, our main objectives, and the assumptions underlying the BIP and BIP + CPV designs. In Section 3, we show that these assumptions may still not guarantee identifiability in risk models with multiple markers, and that by augmenting the BIP design with a CPV component, we can identify all the estimands in the conditional risk model. In Section 4, we propose methods to impute the missing potential marker values and estimate the risk model parameters. In Section 5, we make a case for our proposed methodology through various simulation studies. In Section 6, we apply our methods to marker data from an HIV vaccine trial with a simulated CPV component. Finally, in Section 7, we discuss the relative advantages and disadvantages of the proposed methodology for surrogate marker identification.

2. Background

In this section, we first introduce the notation that we follow in the rest of the article, then state our main objectives, and finally present the problem setup in detail.

2.1. Notation

We consider a two-arm randomized trial. Let |$Z$| be the binary treatment indicator, 0 for placebo and 1 for vaccine. Let |$\textbf{S}=\{S_1,\dots,S_J\}$| be a set of candidate surrogates in consideration, such that each |$S_j,\: j=1,\dots,J$| is measured on the continuous scale at fixed time |$\tau$| after randomization. Let |$Y$| be the binary clinical endpoint, which takes value 0 for the non-diseased and 1 for the diseased, and we assume that |$\textbf{S}$| is measured only if |$Y^{\tau}=0$| where |$Y^{\tau}$| is the indicator that a person develops disease before |$\tau$|⁠. We also measure baseline covariates |$\textbf{W}$| of size |$K$| that include demographics and laboratory measurements. The observed data are thus |$n$| iid copies of |$O_i = (Z_i,\textbf{W}_i, \textbf{S}_i, Y_i, Y^{\tau}_i)', i=1,\cdots, n$|⁠. Let |$\textbf{S}(z)$|⁠, |$Y^{\tau}(z)$|⁠, |$Y(z)$| be the corresponding potential outcomes under treatment assignment |$z$|⁠, for |$z=0,1$|⁠. If |$Y^{\tau}(z)=1$|⁠, |${\textbf{S}}(z)$| is undefined and we set |${\textbf{S}}(z)=\ast$|⁠.

Here, we assume the “Equal Early Clinical Risk” condition (see Huang and Gilbert, 2011), which assumes |$Y^{\tau}(1) = Y^{\tau}(0)$| for all subjects. This assumption may be violated if the vaccine is efficacious at an early time point, but it becomes plausible if relatively few clinical events happen before the biomarker is measured, which is typically the case in HIV vaccine trials. Under this assumption, |$Y^{\tau}= 0$| indicates |$Y^{\tau}(1) = Y^{\tau}(0) = 0$|⁠. Note that the principal stratum characterized by |$Y^{\tau}(1) = Y^{\tau} (0) = 0$| is the population of individuals who will be disease-free at time |$\tau$| when the biomarker is measured, whether or not they receive vaccine or placebo. This is the target population within which the modeling of biomarker/disease relation will be performed in this article, and within this target population, all individuals have well-defined |${\textbf{S}}(1)$| and |${\textbf{S}}(0)$|⁠. Henceforth, we simplify the notation by dropping the conditioning of all probabilities on |$Y^{\tau}(1) = Y^{\tau} (0) = 0$|⁠.
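To fix ideas, the following toy R sketch generates data with the structure of the observed |$O_i = (Z_i,W_i,\textbf{S}_i, Y_i, Y^{\tau}_i)$| described above. The distributions, coefficients, and dimensions (|$J=2$| markers, |$K=1$| covariate) are arbitrary illustrative choices, not taken from any real trial.

set.seed(1)
n <- 1000                                    # number of trial participants (illustrative)
Z    <- rbinom(n, 1, 0.5)                    # 1:1 randomization: 1 = vaccine, 0 = placebo
W    <- rnorm(n)                             # a single continuous baseline covariate (K = 1)
Ytau <- rbinom(n, 1, 0.01)                   # rare early infections before the markers are measured
# Potential markers under vaccine, correlated with W; S(0) = 0 under Case CB (assumption A3)
S_pot <- cbind(0.8 * W + rnorm(n), 0.5 * W + rnorm(n))
# Late clinical endpoint from a toy logistic risk depending on Z, S(1), and W
p <- plogis(-3 + 0.5 * Z - 0.6 * S_pot[, 1] * Z + 0.2 * W)
Y <- rbinom(n, 1, p)
# Observed markers: measured only on vaccinees who are disease-free at time tau;
# under a BIP + CPV design, S would additionally be measured at closeout on placebo recipients with Y == 0
S_obs <- S_pot
S_obs[!(Z == 1 & Ytau == 0), ] <- NA
obs <- data.frame(Z = Z, W = W, S1 = S_obs[, 1], S2 = S_obs[, 2], Y = Y, Ytau = Ytau)
head(obs)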

2.2. Motivation and objectives

In a vaccine trial, it is an important objective to model and select the true surrogate endpoints among S, the list of the collected biomarkers, for potential gain in surrogacy. However, under the principal surrogate framework, identifiability is a serious concern for the risk model conditional on potential biomarker values, because of missing potential outcomes (⁠|$\textbf{S}(1)$| is unobserved in the placebo recipients, while |$\textbf{S}(0)$| is unobserved in the vaccinees).

Our first objective here is to show that under the BIP design (see Section 1), the risk model parameters (as defined in Assumption (A4) in Section 2.3) become non-identifiable whenever the number of markers exceeds the number of baseline predictors (BIPs) in the model. However, the additional CPV component in the BIP + CPV design alleviates this issue and allows for identification and estimation of the risk model parameters through proper imputation of the remaining missing (potential) marker values. Another crucial objective for us is to address the challenge of “curse of dimensionality” in the presence of missing potential outcomes, which is expected when most of these candidate markers have no principal surrogacy values. To achieve this, we develop imputation models built on the resampling approach of Long and Johnson (2015), that combines bootstrap imputation and stability selection for the BIP + CPV design (Section 4).

2.3. Assumptions in a BIP design

The following assumptions are necessary for estimation in the BIP design (Follmann, 2006).

(A1) SUTVA and consistency: |$\{\textbf{S}(1),\textbf{S}(0),Y(1),Y(0)\}$| is independent of the treatment assignments of other subjects, that is, |$\{\textbf{S}_i(1),\textbf{S}_i(0),Y_i(1),Y_i(0)\} \perp Z_j$| for |$i\neq j$|⁠. Also, given the treatment |$Z=z$| that subject |$i$| actually received, his/her potential outcome will equal the observed outcome, or |$Y_i = Y_i(z)$|⁠.

(A2) Ignorable treatment assignments: |$Z\perp \textbf{W},\textbf{S}(1),\textbf{S}(0),Y(1),Y(0)$|⁠.

(A3) Constant biomarkers (Case CB): |$\textbf{S}(0)=0$| for every individual.

(A4) Risk of |$Y(z)$| given |$\textbf{S}(1)$| and |$\textbf{W}$| for |$z\in\{0,1\}$| follows a generalized linear model with a pre-specified link function |$g$|⁠. In our article, we will consider risk models of the following form:

|$risk_{(Z)}(S_1(1),\ldots,S_J(1),\textbf{W})=g\big(\beta_0+\beta_1 Z+\beta_2^{T}\textbf{S}(1)+\beta_3^{T}\textbf{S}(1)Z+\beta_4^{T}\textbf{W}+\beta_5^{T}\textbf{W}Z\big),$| (2.1)

where |$\beta_0$|⁠, |$\beta_1$| are scalars, |$\beta_2$| and |$\beta_3$| are |$J\times 1$| vectors, |$\beta_4$| and |$\beta_5$| are |$K\times 1$| vectors.

In short, A1 assumes that participants in the trial do not interact with one another, while A2 encodes the premise of a randomized blinded design. A3 becomes relevant in applications such as HIV vaccine trials, where the biomarkers of interest are immune responses to HIV targets. In these trials, only individuals without previous exposure to HIV-specific antigens are enrolled, and subjects receiving placebo have no HIV-specific immune response. Thus, |$\textbf{S}(0)$| equals the zero vector in such cases and can safely be omitted from the risk model. Finally, assumption A4 posits a generalized linear model for the structural risks |$risk_{(Z)}(S_1(1),\ldots, S_J(1),\textbf{W})=P\left(Y(Z)=1|S_1(1),\ldots,S_J(1),\textbf{W}\right)$|, our model of interest for this article.
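As a concrete illustration of A4, the small R sketch below evaluates the risk in (2.1) for a logit or probit choice of |$g$|. The coefficient values are arbitrary placeholders rather than estimates from any data.

# Risk model of A4 / equation (2.1): g maps the linear predictor to a probability
risk_z <- function(S1vec, W, z, beta, link = c("logit", "probit")) {
  link <- match.arg(link)
  eta <- beta$b0 + beta$b1 * z +
    sum(beta$b2 * S1vec) + sum(beta$b3 * S1vec) * z +   # marker main effects and marker-by-vaccine interactions
    sum(beta$b4 * W)     + sum(beta$b5 * W) * z          # covariate main effects and covariate-by-vaccine interactions
  if (link == "logit") plogis(eta) else pnorm(eta)
}

# Illustrative coefficients for J = 2 markers and K = 1 baseline covariate
beta <- list(b0 = -3, b1 = 0.5, b2 = c(0.2, 0), b3 = c(-0.6, 0), b4 = 0.2, b5 = 0)
risk_z(S1vec = c(1.2, -0.4), W = 0.3, z = 1, beta)   # P(Y(1) = 1 | S(1), W)
risk_z(S1vec = c(1.2, -0.4), W = 0.3, z = 0, beta)   # P(Y(0) = 1 | S(1), W)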

Note that from assumption A2, we have that the conditional marker distribution |$\textbf{S}(1)|\textbf{W}$| is independent of |$Z$|:

|$P(\textbf{S}(1)\mid Z=1,\textbf{W})=P(\textbf{S}(1)\mid Z=0,\textbf{W})=P(\textbf{S}(1)\mid \textbf{W}).$| (2.2)

This is the main foundation of the BIP design, where the former distribution can be estimated from subjects in the |$Z=1$| arm and applied unbiasedly to participants in the |$Z=0$| arm. |$\textbf{W}$| is a set of baseline immunogenicity variables that not only helps predict the risk of |$Y$|⁠, but more importantly helps predict |$\textbf{S}(1)$|⁠. In a BIP design, the precision of the principal surrogacy estimand depends critically on the strength of the correlation between the BIP and the potential biomarker. Covariates that are easy to acquire from participants (e.g. demographics) typically have low correlations with vaccine-induced immune responses and are, therefore, not useful as BIPs. However, biomarker measures related to the immune response of interest are potentially “good” baseline predictors (see Huang, 2018).
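A minimal R sketch of how this identity is used in practice is given below: the relationship between |$\textbf{S}(1)$| and |$\textbf{W}$| is fitted among the vaccinees and then carried over to the placebo recipients. The toy data and the simple linear working model are purely illustrative; in the actual estimation one draws from the full conditional distribution of |$\textbf{S}(1)$| given |$\textbf{W}$| rather than plugging in its conditional mean.

# BIP idea: learn S(1) | W on vaccinees (Z = 1) and, by randomization (2.2),
# carry the same relationship over to placebo recipients (Z = 0)
set.seed(2)
n <- 2000
Z <- rbinom(n, 1, 0.5)
W <- rnorm(n)                                  # baseline immunogenicity predictor
S1 <- 0.8 * W + rnorm(n, sd = 0.5)             # potential marker under vaccine (toy model)
S_obs <- ifelse(Z == 1, S1, NA)                # S(1) is observed only in the vaccine arm

fit <- lm(S_obs ~ W, subset = Z == 1)          # estimate the S(1) | W relationship from vaccinees
# Predicted (conditional-mean) S(1) for placebo recipients at their observed W:
S_pred <- predict(fit, newdata = data.frame(W = W[Z == 0]))
head(S_pred)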

2.4. Assumptions in BIP+CPV design

This enhanced design (Follmann, 2006) includes a CPV component, where a proportion of placebo recipients who are uninfected at the end of the trial receive vaccine at closeout and their immune response |$\textbf{S}^c$| is measured |$\tau$| time-units after close-out. If further conditions A5 and A6 (stated below) hold, then for uninfected placebo recipients we can substitute |$\textbf{S}^c$| for their unobserved counterfactuals |$\textbf{S}$|(1).

(A5) No infections during the closeout period. That is, no placebo recipients uninfected at close-out experienced a disease event over the next |$\tau$| time-units.

(A6) Time constancy of the immune response distribution for uninfected placebo recipients: |$(S_{i1}(1),\ldots, S_{iJ}(1))=(S_1^{true},\ldots, S_J^{true})+(U_{i1},\ldots,U_{iJ})$|⁠, |$(S_{i1}^c,\ldots, S_{iJ}^c)=(S_1^{true},\ldots, S_J^{true})+(U_{i1}^{\star},\ldots, U_{iJ}^{\star})$|⁠, where |$S_1^{true},\ldots,S_J^{true}$| are the “true” means of the immune response markers under vaccine, |$\textbf{S}(1)=\{S_1(1),\ldots,S_J(1)\}$|⁠, while |$(U_{i1},\ldots, U_{iJ})$| and |$(U_{i1}^{\star},\ldots, U_{iJ}^{\star})$| are iid vectors of random errors with mean zero and the same distribution. Thus, A6 implies |$\textbf{S}^c$| can be used in place of |$\textbf{S}(1)$| for the uninfected placebo recipients without changing the risk model.

Note: Since |$\textbf{S}(0)$| is inconsequential for our purpose, we will simplify the notation and use |$\textbf{S}$| to denote |$\textbf{S}(1)$| (or its substitute |$\textbf{S}^c$|) from now on.

3. Identifiability in the BIP and in the BIP + CPV designs

Now, we investigate identifiability of the risk model under the BIP and the BIP + CPV design. Previously, a one marker - one BIP model (⁠|${\textbf{S}}:=\{S_1\}$| and |$\textbf{W}:=W$|⁠) was considered in Gilbert and Hudgens (2008),

|$risk_{(Z)}(S_1(1),W)=g\big(\beta_0+\beta_1 Z+\beta_2 S_1(1)+\beta_3 S_1(1)Z+\beta_4 W+\beta_5 WZ\big),$| (3.1)

where parametric and nonparametric estimated-likelihood approaches were used to estimate the parameters in the risk model. Although identifiability could be established for the above risk model for |$g:=\Phi$| (the standard normal cumulative distribution function), it could only be achieved through imposition of the untestable constraint |$\beta_5=0$| (see web appendix of Gilbert and Hudgens, 2008).

Huang and Gilbert (2011) extended the above setup by considering a combination of more than one marker in the model, since adding new markers to the risk model might improve the surrogate value of the combination, and proposed a semiparametric approach to maximize the likelihood. Although their estimation methods are applicable to the BIP design, we show below that the risk model becomes non-identifiable when the number of markers in the model is larger than the number of BIPs (which is often the case), and thus estimation of the risk model parameters becomes intractable.

Thus, we consider model (2.1) from Assumption A4 (restated below with revised notation), a multimarker extension of the univariate model in (3.1), with similar distributional assumptions on |$\textbf{S}|\textbf{W}$| as in Gilbert and Hudgens (2008), that is, we assume |$\textbf{S}|\textbf{W} \sim N_J(\mu(\textbf{W}),\Sigma)$|⁠, where |$\mu(\textbf{W})$| is a linear function of |$\textbf{W}$|⁠.

|$risk_{(Z)}(\textbf{S},\textbf{W})=g\big(\beta_0+\beta_1 Z+\beta_2^{T}\textbf{S}+\beta_3^{T}\textbf{S}Z+\beta_4^{T}\textbf{W}+\beta_5^{T}\textbf{W}Z\big).$| (3.2)

We will show that identifiability is difficult to achieve for model (3.2), even after imposing several untestable constraints. As noted before, |$\beta_0$|, |$\beta_1$| are scalars, |$\beta_2$| and |$\beta_3$| are |$J\times 1$| vectors, and |$\beta_4$| and |$\beta_5$| are |$K\times 1$| vectors. Note that, provided the usual regularity conditions are satisfied, we can identify |$1+J+K$| parameters from the |$Z=1$| group, |$\{\beta_0+\beta_1,\beta_{21}+\beta_{31},\dots,\beta_{2J}+\beta_{3J},\beta_{41}+\beta_{51},\dots, \beta_{4K}+\beta_{5K}\}$|, from the above risk model (3.2). We now use a convenient fact: when |$\textbf{S}|\textbf{W}$| follows |$N_J(\mu(\textbf{W}),\Sigma)$|, |$Y|Z=0, \textbf{W}$| follows a probit model, given by |$P(Y=1|Z=0,\textbf{W})=\Phi(\alpha_0+\alpha_1^T\textbf{W})$|, where |$\alpha_0$| is a scalar and |$\alpha_1$| is a |$K\times 1$| vector. A proof of this convenient fact is provided in supplementary material Appendix A available at Biostatistics online. This convenient fact allows us to identify another |$1+K$| parameters based on the observed data (|$Y$| and |$\textbf{W}$|) from the placebo recipients. This means we can only identify |$2+J+2K$| parameters in the model, while there are |$2+2J+2K$| parameters in total. Even if we enforce an untestable constraint on the interaction effect between each of the baseline variables |$\textbf{W}$| and the treatment |$Z$|, |$\beta_{5k}=0,\:k=1,\dots,K$|, the risk model will still have |$2+2J+K$| parameters, which means that identifiability will still be an issue if |$\dim(\textbf{W})=K<J$|.
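To make the counting explicit, consider as an illustration the dimensions used later in our simulations: |$J=25$| candidate markers, a single BIP so that |$K=1$|, and the constraint |$\beta_5=0$| imposed. The model then contains |$2+2J+K=53$| unknown parameters, whereas the two treatment arms supply at most |$(1+J+K)+(1+K)=27+2=29$| identifiable quantities; identifiability under the BIP design alone would require |$J\le K$|, which fails by a wide margin here.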

One obvious way to solve this problem is by increasing the dimension of |$\textbf{W}$|⁠. However, this approach is not very practical as good BIPs are usually hard to find. Thus in BIP designs, when we have a relatively large number of markers, identifiability becomes an issue that cannot be ignored. It is more likely that from the list of available biomarkers, only a few will contribute towards the optimal biomarker combination that achieves the best surrogacy. Under no prior knowledge of this combination, we need to consider the entire list of markers for specifying the risk model, making the BIP design an insufficient approach for estimation of the risk model parameters in (3.2).

3.1. Reinforced identifiability in BIP+CPV design

Follmann (2006) proposed the BIP+CPV design as an alternative to counter the identifiability issues in the BIP design. It is worthwhile to note that while the CPV design was proposed initially for a single marker, it can be applied in the same way if there are multiple markers, and under A1–A3 and A5–A6, the |$risk_{(0)}$| model becomes fully testable. This is because under A1–A3 and A5–A6, we can directly observe each of the two conditional distribution functions |$F_1^W(\cdot)$| and |$F_{00}^W(\cdot)$|⁠, such that,

  • |$\textbf{S}|Z=1,\textbf{W}\sim F_1^W(\cdot)$|⁠.

  • |$\textbf{S}|Z=0,Y=0,\textbf{W} \sim F_{00}^W(\cdot)$|⁠.

And additionally, we can also observe the binomial probability function |$\pi_0(\textbf{W})=P(Y=1|Z=0,\textbf{W})$| from the placebo arm.

Now from A2 (see Section 2.3) we have that |$\textbf{S}|Z=0,\textbf{W} \stackrel{d}{=} \textbf{S}|Z=1,\textbf{W}$|, which means that we can fully identify the missing conditional distribution |$\textbf{S}|Z=0,Y=1,\textbf{W}$| using the observed conditional distributions from above. Now letting |$\textbf{S}|Z=0,Y=1,\textbf{W} \sim F_{01}^W(\cdot)$| and noting that, by the law of total probability, |$F_1^W(\cdot)=(1-\pi_0(\textbf{W}))F_{00}^W(\cdot)+\pi_0(\textbf{W})F_{01}^W(\cdot)$|, we have

|$F_{01}^W(\cdot)=\frac{F_1^W(\cdot)-(1-\pi_0(\textbf{W}))\,F_{00}^W(\cdot)}{\pi_0(\textbf{W})}.$| (3.3)

Assuming |$\pi_0(\textbf{W})>0$| almost surely, the BIP+CPV design allows us to consistently identify all the conditional distributions in the model, estimate the risk model parameters correctly and assess the goodness of fit of the estimated risk model.

To see the importance of result (3.3), note that although A2 in the BIP design allows us to draw instances from |$P(\textbf{S}|Z=0, \textbf{W})$| by using the distribution of |$\textbf{S}|Z=1, \textbf{W}$| among the vaccinees, it does not allow us to identify the two conditional distributions |$\textbf{S}|Z=0,Y=0,\textbf{W}$| and |$\textbf{S}|Z=0,Y=1,\textbf{W}$| separately. Hence, while we can impute |$\textbf{S}$| for a placebo recipient (at a given level of |$\textbf{W}$|⁠) from its marginal distribution and assess the reasonableness of the assumed |$risk_{(0)}$| model, we cannot separately impute them for the diseased and nondiseased individuals among the placebo recipients. But with the BIP + CPV design and additional conditions A5 and A6, we can now impute consistent values of |$\textbf{S}$| for the two groups |$(Z=0, Y=1)$| and |$(Z=0, Y=0)$| separately. This lets us complete the dataset (⁠|$D_{\text{obs}}\rightarrow D_{\text{comp}}$|⁠) and use the generalized linear models framework to estimate the risk model parameters in equation (3.2).

4. Estimation and marker selection in the BIP+CPV design

Analysis of high-dimensional biomedical data poses many challenges for proper inference and prediction from the risk model. In high-dimensional data, it is often desirable to have parsimonious or sparse representations of prediction models (Donoho, 2000). One way to overcome the challenges of analyzing such data is by finding a smaller set of biomarkers that can perform the task of prediction sufficiently well. Hence, it is crucial that we overcome this “curse of high dimensionality” by removing the non-essential biomarkers from the model, which can be achieved through the use of an appropriate and effective feature selection method, resulting in a model with better predictive accuracy.

Regularization penalizes highly complex models and can thus help control the complexity of a binary classifier by limiting over-fitting of the training data; the goal is to simultaneously maximize the likelihood and minimize the number of biomarkers retained. One straightforward way to include regularization in our analysis is to add a sparsity penalty to the risk model fitted on the completed data |$D_{\text{comp}}$|: instead of maximizing the likelihood, we minimize

|$-\ell(\beta\,;D_{\text{comp}})+\lambda\lVert\beta\rVert_1,$|

where |$\ell(\cdot)$| denotes the log likelihood given the data and |$\lambda\ge 0$| is a tuning parameter. But since we are dealing with missing data and imputed datasets here, a better alternative is to combine regularization with imputation. For that purpose, we develop a resampling-based approach that combines bootstrap imputation from the conditional distribution |$F^{W}_{01}(\cdot)$|, using equation (3.3), with stability selection, following the idea of Long and Johnson (2015). We describe the proposed methodology below.

4.1. Bootstrapped imputation with stability selection (BI-SS)

To obtain the risk model parameters, our methods combine the stability selection procedure with bootstrap imputation, as originally proposed in Long and Johnson (2015). One necessary assumption for this procedure is that the data are missing at random (MAR), and it can be seen that this assumption is satisfied in our design. The procedure contains two basic steps: (i) generate |$B$| bootstrap data sets |${D^{(b)}, b =1, \dots , B}$| with associated missing indicator matrices |${\delta^{(b)}, b =1, \dots , B}$| based on the observed data |$D_{\text{obs}} =\{Y, \textbf{S}_{\text{obs}}, Z, \textbf{W}\}$| and |$\delta$|, and (ii) conduct imputation for each bootstrap data set |$D^{(b)}$| and |$\delta^{(b)}$| using an imputation method of choice. The resulting imputed data sets are denoted by |${D^{(b)}_I =\{Y^{(b)}, \textbf{S}^{(b)}_I, Z^{(b)}, \textbf{W}^{(b)}\}, b =1, \dots , B}$|, where |$\textbf{S}^{(b)}_I$| is the completed set of marker values (constructed through the imputation step) from the |$b^{th}$| bootstrapped data set. The detailed steps of the algorithm are given below (a minimal code sketch of the full loop follows the list):

  1. Start with a set |$\Lambda$| of feasible values of |$\lambda$|. For example, for a threshold parameter |$\Theta$|, define |$\lambda_{\Theta}$| as the value of |$\lambda$| such that, for every |$\lambda \geq \lambda_{\Theta}$|, at least a proportion |$\Theta$| of the |$\beta$| coefficients are removed from the model, and set |$\Lambda=\{\lambda:\lambda \geq \lambda_{\Theta}\}$|.

  2. For |$b=\{1,\dots,B\}$|⁠, do the following:

    • (a) Generate bootstrap sample |$D_{\text{obs}}^{(b)} =\{Y^{(b)}, \textbf{S}^{(b)}_{\text{obs}}, Z^{(b)}, \textbf{W}^{(b)}\}$|⁠.

    • (b) Impute |$\textbf{S}^{(b)}_{\text{mis}}$| by a chosen imputation method to complete the data by augmenting |$\textbf{S}^{(b)}_{I}=\{\textbf{S}^{(b)}_{\text{obs}},\textbf{S}^{(b)}_{\text{mis}}\}$|⁠.

    • (c) Generate |$v_j^{(b)}, j=1,\dots,p$|, independently and identically distributed random variables on |$[\alpha,1]$| with |$\alpha \in (0,1)$|.

    • (d) Using |$D_{I}^{(b)}=\{Y^{(b)}, \textbf{S}^{(b)}_{I}, Z^{(b)}, \textbf{W}^{(b)}\}$|, we obtain the randomized lasso estimate |$\widehat{\beta}^{(b)}_{\lambda}$| as
      |$\widehat{\beta}^{(b)}_{\lambda}=\arg\min_{\beta}\Big\{-\sum_{i}\ell\big(\beta;Y^{(b)}_i,X_{(b),i}\big)+\lambda\sum_{j=1}^{p}\frac{|\beta_j|}{v_j^{(b)}}\Big\},$|
      where |$X_{(b),i}$| is the |$i^{th}$| row of |$X_{(b)}$|, the covariate matrix obtained from the dataset |$D_I^{(b)}$|, |$p$| is the number of columns of |$X_{(b)}$| (i.e. the total number of regression coefficients), and |$\beta=\{\beta_0,\dots,\beta_5\}$|.
    • (e) Compute |$\widehat{\beta}^{(b)}_{\lambda}$| for all |$\lambda \in \Lambda$|⁠. Let |$\widehat{\mathcal{S}}_\lambda^{(b)}=\{I(\beta^{(b)}_{\lambda,1}\neq 0),\dots,I(\beta^{(b)}_{\lambda, p}\neq 0)\}$|⁠.

  3. The final estimated active set is given as |$\widehat{\mathcal{S}}_{\pi}=\{j:\max_{\lambda \in \Lambda}\Pi_j^{\lambda}\geq \pi\}$|⁠, where |$\Pi_j^{\lambda}=(1/B)\sum_{b=1}^B I(j\in \widehat{\mathcal{S}}^{(b)}_{\lambda})$|⁠. According to Long and Johnson (2015), a good choice for |$\pi$| is |$\pi \in (0.6,0.9)$|⁠.
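A schematic R sketch of steps 1–3 is given below. It uses the penalty.factor argument of the glmnet package to implement the randomized-lasso weights |$v_j^{(b)}$|, and a user-supplied function impute_fn() as a placeholder for the imputation of step (b) (discussed next). This is an outline under these simplifications, not the exact implementation behind our numerical results.

# Schematic BI-SS loop: bootstrap, impute, randomized lasso, selection frequencies.
# Assumes D_obs is a data frame with columns Y, Z, W, and the markers (NA where unobserved).
library(glmnet)

bi_ss <- function(D_obs, impute_fn, B = 100, alpha_w = 0.5, pi_thr = 0.75, lambda = NULL) {
  sel_count <- NULL
  for (b in seq_len(B)) {
    Db <- D_obs[sample(nrow(D_obs), replace = TRUE), ]   # step (a): bootstrap sample
    Db <- impute_fn(Db)                                  # step (b): impute missing S, e.g. via (3.3)
    X  <- model.matrix(Y ~ . * Z, data = Db)[, -1]       # main effects plus interactions with Z
    v  <- runif(ncol(X), min = alpha_w, max = 1)         # step (c): randomized-lasso weights in [alpha, 1]
    fit <- glmnet(X, Db$Y, family = "binomial",
                  penalty.factor = 1 / v, lambda = lambda)  # step (d): weighted L1-penalized logistic fit
    sel <- apply(as.matrix(fit$beta) != 0, 1, any)       # step (e): selected at some lambda on the path
    sel_count <- if (is.null(sel_count)) sel else sel_count + sel
  }
  freq <- sel_count / B                                  # selection proportions (a simplification of Pi_j^lambda)
  names(freq)[freq >= pi_thr]                            # step 3: stable set at threshold pi
}

For a strict implementation of step (e), a common grid |$\Lambda$| should be supplied through the lambda argument so that selection frequencies are comparable across bootstrap samples.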

The imputation step (b) can be approached in several ways. Long and Johnson (2015) proposed to use standard imputation programs such as the R packages |$mi$| and |$mice$| or to conduct a single imputation for each bootstrap sample when |$p<n$|⁠. This, however, does not seem to be the appropriate approach in our problem setting. Even with the enhanced BIP + CPV design, each infected placebo recipient has zero probability of observing |$\textbf{S}$|⁠. This will make it difficult for |$mice$| to build a correct predictive model for |$\textbf{S}|Z=0,Y=1,\textbf{W}$| in the absence of any observable data from this cohort, and without the knowledge of assumption A2 (or equation (2.2)).

Hence, we propose to conduct step (b) in the following way,

  • (b1) Use equation 3.3 to obtain the estimated probability distribution |$\widehat{F}(\textbf{S}|Z=0,Y=1,\textbf{W})$|⁠.

We develop different parametric and nonparametric methods to estimate |$\widehat{F}(\textbf{S}|Z=0,Y=1,\textbf{W})$| in (b1), with more details provided in Section 5.1 and supplementary material Appendix B available at Biostatistics online.

  • (b2) Impute |$\textbf{S}^{(b)}_{\text{mis}, i}\sim \widehat{F}(\textbf{S}|Z=0,Y=1,\textbf{W})$|⁠.

This method utilizes equation (3.3) to estimate the distribution |$F(\textbf{S}|Z=0,Y=1,\textbf{W})$| correctly, and thus allows us to impute the missing potential marker values unbiasedly.
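One simple way to carry out (b1)–(b2) for a single marker is sketched below under a Gaussian working model. Since (3.3) implies the density relation |$f_{01}=\{f_1-(1-\pi_0)f_{00}\}/\pi_0$|, proposals drawn from the fitted |$f_1$| can be accepted with probability |$\max\{0,\,1-(1-\pi_0)f_{00}(s)/f_1(s)\}$| to obtain draws from |$F_{01}^W$|. This rejection-sampling sketch is only an illustration of the idea, not necessarily the exact parametric imputation implemented in our analyses.

# Draw missing S values for infected placebo recipients with covariate value w,
# using Gaussian working models for S | Z=1, W and S | Z=0, Y=0, W and identity (3.3).
impute_S01 <- function(w, fit1, fit00, pi0_fn, n_draw = 1) {
  # fit1, fit00: lm fits of the form lm(S ~ W, ...) on vaccinees and on uninfected placebo (CPV) recipients
  m1  <- predict(fit1,  newdata = data.frame(W = w)); s1  <- summary(fit1)$sigma
  m00 <- predict(fit00, newdata = data.frame(W = w)); s00 <- summary(fit00)$sigma
  pi0 <- pi0_fn(w)                                     # estimate of P(Y = 1 | Z = 0, W = w)
  out <- numeric(0)
  while (length(out) < n_draw) {
    s <- rnorm(1, m1, s1)                              # proposal from the fitted f_1
    acc <- max(0, 1 - (1 - pi0) * dnorm(s, m00, s00) / dnorm(s, m1, s1))
    if (runif(1) < acc) out <- c(out, s)               # accepted draws follow f_01 of (3.3)
  }
  out
}

Here pi0_fn could be, for example, the fitted values of a logistic regression of |$Y$| on |$W$| in the placebo arm. The acceptance rate is roughly |$\pi_0(w)$|, so the loop remains fast even when infections are rare; if the two fitted working models are slightly incompatible (a negative implied density), the truncation at zero introduces a small approximation error.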

It is worth noting that our main focus in this article lies in identifying markers that interact with the vaccine: for a marker to be a good surrogate, a large vaccine effect on the marker must indicate a large vaccine effect on the infection outcome as well, which is why it is important to select candidate markers that modify vaccine efficacy for further study. That being said, other parameters in the risk model, such as the vaccine effect |$\beta_1$|, are also important for the assessment of principal surrogates (see Huang and Gilbert, 2011; Gilbert and Hudgens, 2008, for assessing surrogacy based on the estimated risk model). In a practical setting, only a few markers in the risk model will ultimately show surrogacy, and thus for the rest, their |$\beta_3$| coefficients will be zero. We therefore introduce a new notation, |$J_0$|, to denote the number of markers whose |$\beta_3$| coefficients are nonzero in the risk model.

5. A simulation study

5.1. Description of the simulation settings

We consider a vaccine trial, where |$N = 4000$| subjects are randomized in a 1:1 ratio to vaccine (|$Z = 1$|) and placebo (|$Z = 0$|). We observe |$\textbf{W}:=W$|, a baseline covariate measured for everyone, the treatment indicator |$Z$|, and the response |$Y$|. Additionally, we assume that we have measurements of the vaccine-induced immune response biomarkers, |$\{S_1, \dots, S_J\}$|, for all individuals in the vaccine arm, and that all uninfected individuals in the placebo arm (the CPV component) have |$\textbf{S}$| measured. Throughout the entire simulation exercise, we consider 25 markers (|$J=25$|), of which only |$J_0=2$| markers have nonzero interactions with |$Z$|. We study five different simulation settings, which differ from one another with respect to three aspects:

  • Number of main effects in the model, that is, the number of markers that have an effect on the response |$Y$|, regardless of whether that effect is modified by vaccination. We consider three different sizes: (i) low main effects, with only five markers having main effects in the risk model; (ii) medium main effects, with 12 such markers; and (iii) high main effects, with all 25 markers having main effects in the risk model.

  • Type of |$W$|, where both continuous and discrete forms of |$W$| are considered.

  • Type of the conditional marker distributions: we use two different parametric models for the conditional distributions |$F(\textbf{S}|Z=0,Y=1,W)$| and |$F(\textbf{S}|Z=0,Y=0,W)$|, namely,

    • (a) Gaussian, and
    • (b) Exponential.
The first three settings consider Gaussian models for the two conditional distributions with a continuous |$W$|. Among them, the first considers low main effects, the second medium main effects, and the third high main effects. The fourth setting also considers Gaussian models for the conditional distributions, but with a discrete |$W$| and medium main effects. The last setting considers exponential models for the two conditional distributions, with a discrete |$W$| and medium main effects. Additionally, in each setting, we assume a logit model for the risk of |$Y$| conditional on the baseline covariate |$W$| among the vaccine recipients.

This assumption ensures that the GLM model as specified in A4 holds (see supplementary material Appendix B available at Biostatistics online). The inference procedure consists of three distinct steps. The first step is selecting the form of the risk model. In our simulation studies, we consider the logit link, although other forms of generalized linear model can potentially be used in this step, depending on the assumptions on the risk model and the marker distribution. The second step is choosing the imputation model. We consider three different imputation approaches (parametric, nonparametric, and MICE), grouped as follows:

  1. Parametric and nonparametric: We use identity (2.2) to impute marker values in the infected placebo recipients. For the parametric method, parametric assumptions are placed on the observable conditional marker distributions, |$F(\textbf{S}|Z=1,W)$| and |$F(\textbf{S}|Z=0,Y=0,W)$|, and then (3.3) is used to simulate from |$\widehat{F}(\textbf{S}|Z=0,Y=1,W)$|. We discuss this in more detail in supplementary material Appendix B available at Biostatistics online. For the nonparametric method, however, we propose to estimate |$\widehat{F}(\textbf{S}|Z=0,Y=1,W)$| nonparametrically. This is achieved by first empirically estimating |$\widehat{F}(\textbf{S}|Z=0,Y=0,w)$| and |$\widehat{F}(\textbf{S}|Z=1,w)$| for a given |$W=w$| from the observed data, and then using identity (3.3) to simulate from |$\widehat{F}(\textbf{S}|Z=0,Y=1,w)$|. Here, we consider this method only in studies with discrete (and univariate) |$W$|, but note the approach can be generalized to continuous (and multivariate) settings using smoothing techniques.

  2. MICE: As a comparative tool, we also use the mice (Multivariate Imputation by Chained Equations) package from CRAN for imputing the missing values. This was suggested as the ad hoc imputation step for stability selection by the authors in Long and Johnson (2015).

The third step of the inference procedure is the variable selection step. We evaluate the proposed BI-SS procedure for variable selection as well as two other variable selection/estimation mechanisms for comparison.

  1. BI-SS. Used as described in Section 4.1, with one of the above imputation steps. The tuning parameter |$\pi$| in BI-SS controls the amount of feature selection, and a lower value of |$\pi$| tends to allow retention of more covariates in the model than a higher value of |$\pi$|⁠. Thus in this simulation study, we run BI-SS over a grid of values for |$\pi \in (0.6,0.9)$| as recommended by Long and Johnson (2015), which we then optimize based on performance.

  2. Direct selection. We alternatively conduct a single imputation step on the originally observed data and then perform direct |$\ell_1$| regularization on the imputed data. We conduct this step over multiple replicates (10 replicates have been used in all of our simulation settings). The estimated effects are calculated as the average of estimates from these 10 replicates.

  3. No selection. We only consider imputation and fitting the GLM model without pursuing any variable selection step.

To evaluate the proposed methodology, we look at the performance of each of the inferred risk models with respect to four different performance metrics. While misclassification error measures the test set performance of the chosen risk model, the other three measure the performance of the risk models based on our oracle knowledge of the simulation parameters.

  1. Sensitivity in feature selection: The probability that the truly nonzero interaction effect of a marker with the vaccine indicator |$Z$| is correctly selected for retention in the risk model.

  2. Specificity in feature selection: The probability that a truly zero interaction effect of a marker with the vaccine indicator |$Z$| is correctly discarded for retention in the risk model.

It can be seen that under these definitions, sensitivity in feature selection for the BI-SS procedure will be a decreasing function of the tuning parameter |$\pi$|⁠, while the specificity will be an increasing function of |$\pi$|⁠. Hence, besides looking at sensitivity and specificity scores separately, we also look at a weighted score of type

|$Score_w = w\cdot \text{Sensitivity} + (1-w)\cdot \text{Specificity}.$| (5.1)

For a given weight |$w$|, we then report the threshold |$\pi$| at which the score attains its best value.

  • 3. Misclassification error: The probability of being misclassified, that is, the proportion of individuals in the test data set whose predictions from the model do not match their actual disease status.

Misclassification error is used to assess the test set performance of the chosen risk models. To do that, we generate a test dataset of size 5000, and then apply the fitted model to predict the disease status for each individual |$i$| in the test data set. We then report the proportion of predictions that do not match with the actual disease status.

  • 4. Model prediction error (ME): As proposed by Fan and Li (2001), if |$\widehat{\mu}(x)$| is the prediction procedure constructed using the above procedures, then the ME for a set of new observations |$(Y,x)$| is defined as |$\text{ME}(\widehat{\mu})=E\{\widehat{\mu}(x)-E(Y|x)\}^2$|, where |$E(Y|x)$| denotes the true risk model and the expectation is taken over the distribution of the new observations.
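For clarity, both error metrics can be computed on a simulated test set as in the short sketch below, where mu_hat() is the fitted prediction rule and true_risk() the oracle risk |$E(Y|x)$| known in a simulation; both function names are placeholders.

# Misclassification error and Fan-Li model error on a test set (schematic).
evaluate_fit <- function(mu_hat, true_risk, test_x, test_y, cutoff = 0.5) {
  p_hat    <- mu_hat(test_x)                           # fitted risk for each test subject
  misclass <- mean((p_hat > cutoff) != test_y)         # proportion of wrong predictions
  mod_err  <- mean((p_hat - true_risk(test_x))^2)      # ME = E{mu_hat(x) - E(Y|x)}^2
  c(misclassification = misclass, model_error = mod_err)
}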

For each simulation setting, 100 simulations are conducted. For each simulated dataset, 800 bootstrapped datasets are first generated to perform the variable selection process under the BI-SS procedure. Once the risk model is selected, 500 additional imputed bootstrapped datasets are generated to estimate the final risk model parameters, which are averaged over these replicates.

5.2. Results

The results of our simulation study are summarized in Tables 1–3 in the main text (for simulation Settings 1, 2, and 4, respectively) and supplementary material Appendix E Tables S1 and S2 available at Biostatistics online (simulation Settings 3 and 5, respectively). Each table compares the performance between selection using BI-SS, direct selection, and no selection. The first eight columns show results for |$Score_w$|, as defined in (5.1), for different values of weight |$w$|. We consider four different weights in each of our tables, (i) sensitivity (|$w=1$|), (ii) specificity (|$w=0$|), (iii) equal weights to both (|$w=1/2$|), and (iv) more weight on specificity (|$w=1/3$|). The last option is important in this context, because one might want to put more emphasis on specificity as there are often a large number of irrelevant markers in the model, and being able to discard them becomes an important goal. For each value of |$w$|, we also show the threshold for the BI-SS procedure that obtains the best |$Score_w$| value. Some of our observations from this exercise are summarized below.

Table 1.

Gaussian conditional distributions, continuous |$W$|, low main effects

Method  Penalty  Impute   Sens(w=1)  π          Spec(w=0)  π         w=1/2  π      w=1/3  π      Misclas  π      ModErr  π
BI-SS   Lasso    Para     1          0.725–0.7  1          0.9–0.85  0.83   0.725  0.82   0.75   0.005    0.775  0.001   0.775
BI-SS   Lasso    Mice     1          0.9–0.6    0.1        0.9       0.55   0.9    0.4    0.9    0.19     0.9    0.117   0.9
Direct  Lasso    Para     1          -          0.14       -         0.57   -      0.43   -      0.008    -      0.006   -
Direct  Lasso    Mice     1          -          0          -         0.5    -      0.33   -      0.26     -      0.025   -
NoSel   -        Para     1          -          0          -         0.5    -      0.33   -      0.053    -      0.051   -
NoSel   -        Mice     1          -          0          -         0.5    -      0.33   -      0.053    -      0.051   -

(The π columns give the BI-SS threshold(s) at which each value is attained; they do not apply to the Direct and NoSel methods.)
Table 2.

Gaussian conditional distributions, continuous |$W$|, medium main effects

Method  Penalty  Impute   Sens(w=1)  π          Spec(w=0)  π          w=1/2  π      w=1/3  π      Misclas  π      ModErr  π
BI-SS   Lasso    Para     1          0.675–0.6  1          0.9–0.875  0.88   0.725  0.86   0.725  0.011    0.75   0.007   0.75
BI-SS   Lasso    Mice     1          0.9–0.6    0.04       0.9        0.52   0.9    0.36   0.9    0.021    0.875  0.127   0.875
Direct  Lasso    Para     0.98       -          0.12       -          0.55   -      0.41   -      0.009    -      0.006   -
Direct  Lasso    Mice     1          -          0          -          0.5    -      0.33   -      0.02     -      0.022   -
NoSel   -        Para     1          -          0          -          0.5    -      0.33   -      0.058    -      0.057   -
NoSel   -        Mice     1          -          0          -          0.5    -      0.33   -      0.058    -      0.057   -

(The π columns give the BI-SS threshold(s) at which each value is attained; they do not apply to the Direct and NoSel methods.)
Table 3.

Gaussian conditional distributions, discrete |$W$|, medium main effects

Method  Penalty  Impute    Sens(w=1)  π          Spec(w=0)  π          w=1/2  π      w=1/3  π      Misclas  π      ModErr  π
BI-SS   Lasso    Para      1          0.675–0.6  1          0.9–0.875  0.89   0.725  0.87   0.725  0.013    0.75   0.011   0.775
BI-SS   Lasso    Nonpara   1          0.875–0.6  1          0.9–0.875  1      0.875  1      0.875  0.017    0.9    0.013   0.9
BI-SS   Lasso    Mice      1          0.9–0.6    0.14       0.9        0.57   0.9    0.42   0.9    0.015    0.875  0.018   0.875
Direct  Lasso    Para      0.99       -          0.13       -          0.56   -      0.42   -      0.006    -      0.004   -
Direct  Lasso    Nonpara   1          -          0.11       -          0.55   -      0.41   -      0.004    -      0.003   -
Direct  Lasso    Mice      1          -          0          -          0.5    -      0.33   -      0.015    -      0.022   -
NoSel   -        Para      1          -          0          -          0.5    -      0.33   -      0.053    -      0.051   -
NoSel   -        Nonpara   1          -          0          -          0.5    -      0.33   -      0.053    -      0.051   -
NoSel   -        Mice      1          -          0          -          0.5    -      0.33   -      0.053    -      0.051   -

(The π columns give the BI-SS threshold(s) at which each value is attained; they do not apply to the Direct and NoSel methods.)
  1. All variable selection methods show very high sensitivity (|$w=1$|) in retaining the interaction markers in all of the settings, except in some cases (e.g. in Setting 3; see supplementary material Table S1 available at Biostatistics online) for the direct selection method with parametric imputation.

  2. The BI-SS with LASSO method outperforms the direct selection method in feature selection performance (|$Score_w$| values), such as specificity (when |$w=0$|), and for the other weights |$w<1$| as well. This was seen in every setting (Tables 1–3 and supplementary material Tables S1 and S2 available at Biostatistics online). The no selection method retains every marker in the model and thus has specificity 0 in all settings.

  3. The BI-SS procedure with the parametric imputation method outperforms the BI-SS procedure with imputation using the MICE package in every setting, both in feature selection performance |$Score_w$| for all |$w<1$|, and in model and misclassification errors (Tables 1–3 and supplementary material Tables S1 and S2 available at Biostatistics online).

  4. The same trend as in point 3 is seen when we compare the parametric imputation method and MICE under the direct selection procedure.

  5. The direct selection method with the parametric imputation method and the corresponding BI-SS procedure consistently achieve the lowest misclassification and model error rates, and their performance remains similar with BI-SS performing slightly better than the direct method in some settings (e.g. Settings 1 and 5; see Table 1 and supplementary material Table S2 available at Biostatistics online), and vice versa in the others (e.g. Settings 2 and 4; see Tables 2 and 3).

  6. The no selection procedure is clearly the worst performing method, with inflated misclassification and model error rates compared to the others, showing the importance of marker selection in this context (Tables 1–3 and supplementary material Tables S1 and S2 available at Biostatistics online).

  7. The nonparametric imputation method, used with BI-SS and the direct selection method in Settings 4 and 5 when |$W$| is discrete, performs quite well: for example, in Setting 4, it achieves better |$Score_w$| performance with BI-SS than the parametric method both when |$w=1/2$| and |$w=1/3$| (see Table 3). However, this method can sometimes show inflated misclassification error; in Setting 4, for example, it achieves the highest misclassification rate of the three imputation methods.

  8. In Setting 5, where the biomarker values were exponentially distributed, the parametric imputation of the missing potential marker values was conducted using both (i) the correctly specified parametric model based on the exponential distribution (“Exp”), and (ii) the misspecified parametric model based on the Gaussian distribution (“Gauss”). All methods (including the one based on the misspecified parametric model) perform quite well, with high |$Score_w$| values. The correctly specified parametric model dominates all others in terms of model error rates, and, along with the nonparametric method, does slightly better than MICE and the misspecified parametric model in misclassification rates (see supplementary material Table S2 available at Biostatistics online).

6. Real data analysis

We apply our proposed methodology to data from an HIV vaccine trial (RV144). The RV144 trial (n = 16402, randomized with 8200 on placebo and 8202 on canarypox vector vaccine [ALVAC-HIV vCP1521] boosted with gp120 AIDSVAX B/E vaccine) showed an estimated vaccine efficacy of 31.2% for the prevention of HIV-1 infection over 42 months (Rerks-Ngarm and others, 2009). A follow-up immune correlates study assessed vaccine-induced immune responses at peak immunogenicity in 41 vaccinees who acquired HIV-1 infection as compared with 205 vaccinees who did not acquire infection. Among others, six primary variables and eight secondary variables were of particular interest to study (Haynes and others, 2012). However, the RV144 trial does not have a CPV component; nor are we aware of any existing trial dataset with one. Still, to further the discussion about the scope of our approach and to plan for future BIP + CPV designs, we have used the available data from RV144 and synthetically produced a CPV component. We consider all six primary and seven secondary (one was removed for collinearity) variables to construct the list of potential biomarkers of interest (see supplementary material Appendix C available at Biostatistics online). Sex (two levels) and baseline self-reported behavioral risk factors (three levels) are used to create |$W$|, a BIP with six levels.

To synthetically create the CPV component, we assume a normal model for |$P(\textbf{S}|Z=0,Y=0,W=W_j)=N_{13}(\mu_{W_j},\Sigma)$|. Since the goal of our proposed method is to identify markers with high surrogacy from a list of many markers, our synthetic CPV component needs to be simulated in a way that allows one or more of the biomarkers considered in the risk model (supplementary material Appendix C available at Biostatistics online) to be identified as principal surrogates. In Haynes and others (2012), the biomarkers “IgA antibodies binding to Env” and “gp70-V1V2 binding” were identified as correlates of risk for HIV, and we generate the synthetic CPV component assuming that these two biomarkers (so, |$J_0=2$|) are also the true correlates of protection that modify the vaccine’s effect on infection. In particular, the parameters |$\mu_{W_j}$| for each level |$W:=W_j$| are chosen such that the infection risk conditional on the potential biomarker values and intervention status follows a logistic model (as in A4), where nonzero interactions exist only between the above-mentioned biomarkers and the vaccination status (details can be found in supplementary material Appendix D available at Biostatistics online). The estimation methods proceed as before, except that inverse probability weighting (Horvitz and Thompson, 1952) was applied during LASSO to account for the case-control sampling. The chosen risk models for the different approaches (BI-SS, direct selection, and no selection with parametric, nonparametric, and MICE imputation methods) are analyzed, and the feature selection performance of the first two (BI-SS and direct) is measured (in their ability to retain the two aforementioned biomarkers and to screen out the rest) at different values of |$\pi$| within the proposed range of 0.6–0.9. A five-fold cross validation is then performed on the observed data, where in each iteration the competing methods are used to estimate the risk model from the four training folds and its predictions are compared with the observed disease levels in the remaining fold to calculate the cross-validated misclassification error of each method at the given values of |$\pi$|. For each measure, we also calculate the 95% bootstrap confidence intervals, based on 100 bootstrap samples of the observed data.
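The inverse probability weighting used with the LASSO here can be implemented through observation weights in glmnet, as sketched below. The sampling-probability vector p_sample is a hypothetical input (in case-control subsamples of this kind, cases are typically included with probability one); it is not the actual RV144 sampling design.

# Inverse-probability-weighted LASSO for a case-control subsample (schematic).
library(glmnet)

# p_sample: probability that each subject in (X, Y) was sampled into the
# immune-correlates subset (hypothetical values supplied by the analyst).
ipw_lasso <- function(X, Y, p_sample, ...) {
  w <- 1 / p_sample                          # Horvitz-Thompson weights
  cv.glmnet(X, Y, family = "binomial", weights = w, ...)
}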

The results of the analysis are presented in Tables 4 (feature selection) and 5 (prediction). For feature selection, we consider IgA and gp70-V1V2 to be the only “true” correlates of protection to be selected among the 13 biomarkers. As can be seen from Table 4, the BI-SS procedures with parametric and nonparametric imputation perform quite well in this regard. For |$w=1$|, the BI-SS procedure with each of the three imputation methods yields a perfect (feature selection) score of 1 (meaning both IgA and gp70-V1V2 were correctly chosen by each) for |$\pi=0.6$| and |$\pi=0.75$|, while only the parametric method achieves a score of 1 for |$\pi=0.9$|. For |$w=0$|, the BI-SS procedure with the nonparametric imputation yields the highest score (0.91), followed by BI-SS with parametric imputation, with a score of 0.82, both at |$\pi=0.9$|. For |$0<w<1$|, the performance of the different imputation methods combined with the BI-SS procedure decreases in the following order: parametric, then nonparametric, and then MICE. Also, the performance of the BI-SS procedure (with each of the three imputation methods) is never worse than that of its direct selection counterpart, and at optimal values of |$\pi$| it performs much better than the direct method. The cross-validated misclassification errors (Table 5) are also the lowest for BI-SS with parametric imputation at |$\pi=0.9$|, followed by |$\pi=0.75$|. The BI-SS procedures with nonparametric imputation and MICE perform poorly for |$\pi=0.9$| (even worse than when we do not perform any marker selection), most probably because, as their sensitivity scores in Table 4 show, they missed one of the two signals more often than not. The No Selection method yields similar results under each imputation method.

Table 4.

Feature selection performance for each method in the RV144 trial (with 95% bootstrap confidence intervals)

Method  Impute    π      w=1 (Sensitivity)  w=0 (Specificity)  w=1/2             w=1/3
BI-SS   Para      0.6    1 (0.5–1.0)        0.55 (0.36–0.73)   0.77 (0.48–0.86)  0.70 (0.47–0.82)
BI-SS   Para      0.75   1 (0.5–1.0)        0.73 (0.55–0.91)   0.86 (0.57–0.95)  0.82 (0.59–0.94)
BI-SS   Para      0.9    1 (0.0–1.0)        0.82 (0.73–1.00)   0.91 (0.41–1.00)  0.88 (0.55–1.00)
BI-SS   Nonpara   0.6    1 (0.5–1.0)        0.45 (0.36–0.73)   0.73 (0.43–0.82)  0.64 (0.41–0.76)
BI-SS   Nonpara   0.75   1 (0.5–1.0)        0.73 (0.55–0.91)   0.86 (0.52–0.93)  0.82 (0.53–0.90)
BI-SS   Nonpara   0.9    0.5 (0.0–1.0)      0.91 (0.73–1.00)   0.70 (0.41–1.00)  0.77 (0.55–1.00)
BI-SS   Mice      0.6    1 (0.5–1.0)        0.09 (0.09–0.30)   0.55 (0.53–0.65)  0.39 (0.38–0.53)
BI-SS   Mice      0.75   1 (0.5–1.0)        0.36 (0.19–0.55)   0.68 (0.34–0.77)  0.58 (0.29–0.70)
BI-SS   Mice      0.9    0.5 (0.5–1.0)      0.64 (0.45–0.82)   0.57 (0.48–0.86)  0.59 (0.47–0.82)
Direct  Para      -      0.5 (0.0–0.5)      0.55 (0.45–0.82)   0.53 (0.27–0.66)  0.53 (0.36–0.71)
Direct  Nonpara   -      0.5 (0.0–0.5)      0.45 (0.12–0.64)   0.48 (0.21–0.58)  0.47 (0.23–0.55)
Direct  Mice      -      1 (0.5–1)          0.09 (0.00–0.36)   0.55 (0.25–0.64)  0.39 (0.17–0.51)
Table 5.

Cross-validated misclassification errors in prediction from the risk model estimated by each method in the RV144 trial (with 95% bootstrap confidence intervals)

                  BI-SS w LASSO                                            Direct w LASSO     NoSel
Impute    π=0.6              π=0.75             π=0.9
Para      0.0061             0.0058             0.0057              0.0065             0.0064
          (0.0047–0.0071)    (0.0047–0.0075)    (0.0047–0.0074)     (0.0053–0.0082)    (0.0046–0.0072)
Nonpara   0.0064             0.0062             0.0069              0.0067             0.0064
          (0.0050–0.0082)    (0.0050–0.0079)    (0.0051–0.0081)     (0.0051–0.0081)    (0.0045–0.0070)
Mice      0.0068             0.0059             0.0073              0.0064             0.0063
          (0.0049–0.0078)    (0.0047–0.0076)    (0.0051–0.0084)     (0.0050–0.0075)    (0.0046–0.0072)

7. Discussion

In this article, we proposed methodology for estimating a risk model conditional on multiple vaccine-induced immune response biomarkers, in order to identify principal surrogate endpoints that modify a vaccine's protective effect. To address the missing potential marker values, we used the BIP design augmented with a CPV group, following the recommendation of Follmann (2006), and developed methods to impute the missing potential marker values and to perform biomarker selection through stepwise resampling and imputation.

Across the numerical studies in this article, the BI-SS procedure with parametric imputation is the best-performing method overall. MICE frequently fails to impute the missing potential marker values correctly, because it does not exploit the randomization condition in (2.2) that is needed for valid imputation in our problem setting; this is reflected in the better performance of both the parametric and the nonparametric imputation methods relative to MICE.
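
To make the role of randomization in the imputation step concrete, the following is a minimal sketch (in Python, not the authors' implementation) of a Gaussian parametric imputation of a single missing potential marker $S(1)$ from baseline predictors $\textbf{W}$. The working model is fit among participants whose marker is observed (vaccine recipients and, under the time-constancy assumption, CPV recipients), and imputations for infected placebo recipients are drawn from the fitted conditional distribution; randomization is what licenses transporting that conditional distribution across arms. All variable names here (`W_obs`, `S1_obs`, `W_miss`) are illustrative.

```python
import numpy as np

def fit_gaussian_imputer(W_obs, S1_obs):
    """Gaussian working model S(1) | W = w ~ N(w'beta, sigma^2), fit on
    participants whose potential marker S(1) is observed (vaccinees and,
    under time constancy, CPV placebo recipients)."""
    X = np.column_stack([np.ones(len(W_obs)), W_obs])   # add intercept
    beta, *_ = np.linalg.lstsq(X, S1_obs, rcond=None)   # least-squares fit
    sigma = (S1_obs - X @ beta).std(ddof=X.shape[1])    # residual SD
    return beta, sigma

def impute_S1(W_miss, beta, sigma, rng):
    """Draw S(1) for placebo recipients whose marker is missing, using the
    same conditional law; randomization makes W independent of treatment,
    which justifies carrying the fitted model over to the placebo arm."""
    X = np.column_stack([np.ones(len(W_miss)), W_miss])
    return X @ beta + rng.normal(0.0, sigma, size=len(W_miss))

# Illustrative usage with simulated data.
rng = np.random.default_rng(1)
W_obs = rng.normal(size=(200, 3))
S1_obs = W_obs @ np.array([0.5, -0.3, 0.2]) + rng.normal(0, 0.4, 200)
W_miss = rng.normal(size=(20, 3))         # e.g. infected placebo recipients
beta, sigma = fit_gaussian_imputer(W_obs, S1_obs)
S1_imputed = impute_S1(W_miss, beta, sigma, rng)
```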

As discussed in Long and Johnson (2015), there is usually no single best value for the tuning parameter $\pi$ in BI-SS, but values between 0.6 and 0.9 are generally regarded as reasonable. In most of our simulation settings the optimal $\pi$ fell in this range (0.6–0.9), and the performance of the proposed estimator was relatively robust within it. Note also that the performance metrics move in opposite directions with $\pi$ (specificity increases while sensitivity decreases as $\pi$ grows), so the choice of $\pi$ depends on the researcher's specific goal.
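
For readers unfamiliar with how the threshold $\pi$ enters, the sketch below shows only the generic resample-impute-select loop and the thresholding of selection proportions at $\pi$; it is not the exact BI-SS algorithm. The `impute_and_select` argument is a hypothetical user-supplied routine that, on one bootstrap resample, imputes the missing markers and returns which candidates a penalized (e.g. LASSO) fit selected.

```python
import numpy as np

def stability_selection(n, impute_and_select, B=100, pi=0.75, seed=0):
    """Run B resample-impute-select iterations and keep the markers whose
    selection proportion is at least pi.

    impute_and_select(idx, rng) must return a boolean vector of length p
    indicating which candidate markers the penalized fit selected on the
    bootstrap sample indexed by idx.
    """
    rng = np.random.default_rng(seed)
    counts = None
    for _ in range(B):
        idx = rng.choice(n, size=n, replace=True)            # bootstrap resample
        sel = np.asarray(impute_and_select(idx, rng), bool)   # impute + fit + select
        counts = sel.astype(int) if counts is None else counts + sel
    props = counts / B
    # Larger pi -> fewer false positives (higher specificity) but more misses.
    return props >= pi, props

# Toy usage with a stand-in selector (a real one would impute S(1) and run LASSO).
toy = lambda idx, rng: rng.random(5) < np.array([0.9, 0.8, 0.2, 0.1, 0.5])
keep, props = stability_selection(n=50, impute_and_select=toy, B=200, pi=0.75)
```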

The parametric imputation method in our simulations was based on either (i) a Gaussian or (ii) an exponential assumption on the conditional marker distributions, although it can in principle be implemented with any parametric working model. Setting 5 in our simulation section considers a misspecified parametric model for imputation: imputing under a Gaussian working model when the true model is exponential produced only a modest drop in performance relative to the correctly specified model, suggesting that the common Gaussian working model is reasonably robust. When the baseline predictors $\textbf{W}$ are discrete, our proposed nonparametric method for imputing the missing potential marker values provides a valid alternative.

Lastly, note that Assumption A4 is not necessary for identifying the disease risk model conditional on $\textbf{S}(1)$ in a BIP+CPV design. The GLM assumption is, however, important for model estimation in a BIP-only design and has commonly been adopted for principal surrogate evaluation in vaccine research. We also adopt it here for two reasons: (i) it is compatible with parametric imputation of the missing $\textbf{S}(1)$ values in infected placebo recipients using our parametric method; and (ii) it allows us to use standard, efficient feature selection algorithms such as the LASSO.
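
As an illustration of point (ii), the sketch below fits an L1-penalized logistic risk model in the treatment indicator $Z$, the (imputed) markers $\textbf{S}(1)$, and their interactions with $Z$; nonzero interaction coefficients flag markers that modify the vaccine effect. This is a generic sketch on simulated data, not the authors' implementation, and the regularization strength `C` is arbitrary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n, p = 500, 6
Z = rng.integers(0, 2, n)                    # treatment assignment
S1 = rng.normal(size=(n, p))                 # completed marker matrix S(1)
# Toy outcome: only markers 0 and 1 modify the vaccine effect.
logit = -3 + 0.5 * Z - Z * (1.0 * S1[:, 0] + 0.8 * S1[:, 1])
Y = rng.random(n) < 1 / (1 + np.exp(-logit))

# GLM (logistic) risk model with treatment-by-marker interaction columns;
# the L1 penalty performs the feature selection over these terms.
X = np.column_stack([Z, S1, Z[:, None] * S1])
fit = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, Y)
interaction_coefs = fit.coef_[0][1 + p:]     # coefficients on the Z * S(1) terms
```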

8. Supplementary Material

Supplementary material is available at http://biostatistics.oxfordjournals.org.

Acknowledgments

The authors thank the participants, investigators, and sponsors of the RV144 trial. Conflict of Interest: None declared.

Funding

National Institutes of Health (R01 GM106177-01 and 2R37AI05465-10).

References

Donoho, D. L. (2000). High-dimensional data analysis: the curses and blessings of dimensionality. AMS Math Challenges Lecture 32.

Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96, 1348–1360.

Fleming, T. R. and DeMets, D. L. (1996). Surrogate endpoints in clinical trials: are we being misled? Annals of Internal Medicine 125, 605–613.

Follmann, D. A. (2006). Augmented designs to assess immune response in vaccine trials. Biometrics 62, 1161–1169.

Gilbert, P. B. and Hudgens, M. G. (2008). Evaluating candidate principal surrogate endpoints. Biometrics 64, 1146–1154.

Haynes, B. F., Gilbert, P. B., McElrath, M. J., Zolla-Pazner, S., Tomaras, G. D., Alam, S. M., Evans, D. T., Montefiori, D. C., Karnasuta, C., Sutthent, R. L. and others (2012). Immune-correlates analysis of an HIV-1 vaccine efficacy trial. New England Journal of Medicine 366, 1275–1286.

Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association 47, 663–685.

Huang, Y. (2018). Evaluating principal surrogate markers in vaccine trials in the presence of multiphase sampling. Biometrics 74, 27–39.

Huang, Y. and Gilbert, P. B. (2011). Comparing biomarkers as principal surrogate endpoints. Biometrics 67, 1442–1451.

Long, Q. and Johnson, B. A. (2015). Variable selection in the presence of missing data: resampling and imputation. Biostatistics 16, 596–610.

Rerks-Ngarm, S., Pitisuttithum, P., Nitayaphan, S., Kaewkungwal, J., Chiu, J., Paris, R., Premsri, N., Namwat, C., de Souza, M., Adams, E. and others (2009). Vaccination with ALVAC and AIDSVAX to prevent HIV-1 infection in Thailand. New England Journal of Medicine 361, 2209–2220.

Weir, C. J. and Walley, R. J. (2006). Statistical evaluation of biomarkers as surrogate endpoints: a literature review. Statistics in Medicine 25, 183–203.

Wolfson, J. and Gilbert, P. B. (2010). Statistical identifiability and the surrogate endpoint problem, with application to vaccine trials. Biometrics 66, 1153–1161.
