Summary

Nonparametric covariate adjustment is considered for log-rank-type tests of the treatment effect with right-censored time-to-event data from clinical trials applying covariate-adaptive randomization. Our proposed covariate-adjusted log-rank test has a simple explicit formula and a guaranteed efficiency gain over the unadjusted test. We also show that our proposed test achieves universal applicability, in the sense that the same test formula can be applied to simple randomization and to all commonly used covariate-adaptive randomization schemes, such as the stratified permuted block and the Pocock–Simon minimization; the unadjusted log-rank test does not have this property. Our method is supported by novel asymptotic theory and by empirical results on the Type-I error and power of tests.

1 Introduction

In clinical trials, adjusting for baseline covariates has been widely advocated as a way to improve efficiency for demonstrating treatment effects ‘under approximately the same minimal statistical assumptions that would be needed for unadjusted estimation’ (ICH E9, 1998; EMA, 2015; FDA, 2023). In testing for an effect between two treatments with right-censored time-to-event outcomes, adjusting for covariates using the Cox proportional hazards model has been demonstrated to yield valid tests even if the Cox model is misspecified (Lin & Wei, 1989; Kong & Slud, 1997; DiRienzo & Lagakos, 2002). However, these tests may be less powerful than the log-rank test that does not adjust for any covariates when the Cox model is misspecified (Kong & Slud, 1997). Although efforts have been made to improve the efficiency of the log-rank test through covariate adjustment from semiparametric theory (Lu & Tsiatis, 2008; Moore & van der Laan, 2009), the solutions are complicated and their validity is established only under simple randomization, i.e., treatments are assigned to patients completely at random.

To balance the number of patients in each treatment arm across baseline prognostic factors in clinical trials with sequentially arriving patients, covariate-adaptive randomization has become the new norm. From 1989 to 2008, covariate-adaptive randomization was used in more than 500 clinical trials (Taves, 2010); among nearly 300 trials published in two years, 2009 and 2014, 237 applied covariate-adaptive randomization (Ciolino et al., 2019). The two most popular covariate-adaptive randomization schemes are the stratified permuted block (Zelen, 1974) and the Pocock–Simon minimization (Taves, 1974; Pocock & Simon, 1975). Other schemes can be found in the reviews of Schulz & Grimes (2002) and Shao (2021). Unlike simple randomization, covariate-adaptive randomization generates a dependent sequence of treatment assignments, so conventional methods developed under simple randomization may not be valid under covariate-adaptive randomization (EMA, 2015; FDA, 2023). For time-to-event data under covariate-adaptive randomization, Ye & Shao (2020) showed that some conventional tests, including the log-rank test, are conservative, and Wang et al. (2023) showed that the Kaplan–Meier estimator of the survival function has reduced variance compared with that under simple randomization.

The discussion so far has brought up two issues in adjusting for covariates. First is the need for guaranteed efficiency gains over unadjusted methods, without requiring additional assumptions. Second is the need for methods with wide applicability to all commonly used covariate-adaptive randomizations. These issues have been well addressed when adjustments are made under linear working models for non-time-to-event data (Tsiatis et al., 2008; Zhang et al., 2008; Lin, 2013; Ye et al., 2022). Ye et al. (2022) also showed that adjustment via linear working models can achieve universal applicability in the sense that the same inference procedure can be universally applied to all commonly used covariate-adaptive randomization schemes, a desirable property for application. For right-censored time-to-event outcomes, to the best of our knowledge, no result has been established for covariate adjustment with guaranteed efficiency gain and universal applicability.

In this paper we propose a nonparametric covariate adjustment method for the log-rank test, which has a simple explicit form and achieves both a guaranteed efficiency gain over the unadjusted log-rank test and universal applicability to simple randomization and all commonly used covariate-adaptive randomization schemes. The unadjusted log-rank test is not valid under covariate-adaptive randomization; although it can be modified to be applicable to some randomization schemes (Ye & Shao, 2020), the modification needs to be tailored to each randomization scheme, i.e., it lacks universal applicability. Our main idea is to obtain a particular derived outcome for each patient by linearizing the log-rank test statistic and then to apply the generalized regression adjustment or augmentation (Cassel et al., 1976; Lu & Tsiatis, 2008; Tsiatis et al., 2008; Zhang et al., 2008) to the derived outcomes. We also develop parallel results for the stratified log-rank test with adjustment for additional covariates. Our proposed tests are supported by novel asymptotic theory for the existing and proposed statistics under the null and alternative hypotheses, without requiring any specific model assumption, and under all commonly used covariate-adaptive randomization schemes. Estimation and confidence intervals for treatment effects after testing are also discussed. Our theoretical results are corroborated by a simulation study that examines the finite-sample Type-I error and power of tests. A real data example is included for illustration.

2 Preliminaries

For a patient from the population under investigation, let $T_j$ and $C_j$ be the potential failure time and right-censoring time, respectively, under treatment $j = 0$ or 1, and let $W$ be a vector containing all observed baseline covariates. Suppose that a random sample of $n$ patients is obtained from the population, with $(T_{i0}, C_{i0}, T_{i1}, C_{i1}, W_i)$ $(i = 1, \ldots, n)$ independent and identically distributed as $(T_0, C_0, T_1, C_1, W)$. Each patient receives only one of the two treatments. Thus, if patient $i$ receives treatment $j$, then the observed outcome with possible right censoring is $\{\min(T_{ij}, C_{ij}), \delta_{ij}\}$, where $\delta_{ij}$ is the indicator of the event $T_{ij} \le C_{ij}$.

Let $I_i$ be a binary treatment indicator for patient $i$ and $0 < \pi < 1$ be the prespecified treatment assignment proportion for treatment 1. Consider the design, i.e., the generation of the $I_i$ for $n$ sequentially arriving patients. Simple randomization assigns patients to treatments completely at random with $\mathrm{pr}(I_i = 1) = \pi$ for all $i$; it does not make use of baseline covariates and may yield treatment proportions that deviate substantially from the target $\pi$ across levels of some prognostic factors. For this reason, covariate-adaptive randomization using a subvector $Z$ of $W$ is widely applied; it does not use any model and is nonparametric. All commonly used covariate-adaptive randomization schemes satisfy the following mild condition (Baldi Antognini & Zagoraiou, 2015).
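To make the design concrete, here is a minimal sketch of stratified permuted block randomization with $\pi = 1/2$ and block size 4, the setting used later in § 5. It is an illustration under these stated assumptions, not the authors' code; the function name `stratified_permuted_block` and its interface are hypothetical.

```python
import random


def stratified_permuted_block(strata, block_size=4, seed=None):
    """Assign treatments (0/1) to patients arriving in the order of `strata`.

    Within each stratum, patients fill consecutive blocks that contain
    block_size // 2 assignments to each arm in random order, so the target
    proportion is pi = 1/2. Hypothetical helper for illustration only.
    """
    if block_size % 2:
        raise ValueError("block_size must be even for 1:1 allocation")
    rng = random.Random(seed)
    pending = {}  # stratum level -> remaining assignments of its current block
    arms = []
    for z in strata:
        if not pending.get(z):  # start a new block when the current one is used up
            block = [1] * (block_size // 2) + [0] * (block_size // 2)
            rng.shuffle(block)
            pending[z] = block
        arms.append(pending[z].pop())
    return arms


# Usage: 500 patients with a three-level stratification factor.
rng = random.Random(0)
Z = [rng.choice(["a", "b", "c"]) for _ in range(500)]
I = stratified_permuted_block(Z, seed=1)
```

Within every stratum the imbalance $|n_{z1} - n_{z0}|$ never exceeds half the block size, which is why $n_{z1}/n_z \to \pi$ holds for this scheme.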

 
Condition 1.

The covariate $Z$ on which we want to balance treatment assignment is an observed discrete baseline covariate with finitely many joint levels; conditional on $(Z_i, i = 1, \ldots, n)$, $(I_i, i = 1, \ldots, n)$ is independent of $(T_{i1}, C_{i1}, T_{i0}, C_{i0}, W_i, i = 1, \ldots, n)$; $E(I_i \mid Z_1, \ldots, Z_n) = \pi$ for all $i$; and, for every level $z$ of $Z$, $n_{z1}/n_z \to \pi$ in probability as $n \to \infty$, where $n_z$ is the number of patients with $Z_i = z$ and $n_{z1}$ is the number of patients with $Z_i = z$ and $I_i = 1$.

Although simple randomization is not counted as covariate-adaptive randomization, it also satisfies Condition 1.

We focus on testing the null hypothesis of no treatment effect, which is the null hypothesis when the conventional log-rank test is applied: $H_0: \lambda_1(t) = \lambda_0(t)$ for all times $t$, versus the alternative that $H_0$ does not hold, where $\lambda_j(t)$ is the unspecified hazard function of $T_j$, unconditional on covariates.

After data are collected from all patients, a test statistic $T$ is a function of the observed data, constructed such that $H_0$ is rejected if and only if $|T| > z_{\alpha/2}$, where $\alpha$ is a given significance level and $z_{\alpha/2}$ is the $(1-\alpha/2)$th quantile of the standard normal distribution. A test $T$ is asymptotically valid if, under $H_0$, $\lim_{n\to\infty} \mathrm{pr}(|T| > z_{\alpha/2}) \le \alpha$, with equality holding for at least one parameter value under $H_0$. A test $T$ is asymptotically conservative if, under $H_0$, there exists an $\alpha_0$ such that $\lim_{n\to\infty} \mathrm{pr}(|T| > z_{\alpha/2}) \le \alpha_0 < \alpha$.
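The rejection rule and the notion of conservativeness can be checked numerically. The sketch below, under the assumption that a statistic is available as a plain float, applies the two-sided rule $|T| > z_{\alpha/2}$ and estimates a rejection rate from simulated statistics, which is how Type-I error rates such as those in § 5 are tabulated. The helper names are hypothetical.

```python
import random
from statistics import NormalDist


def two_sided_test(T, alpha=0.05):
    """Reject H0 iff |T| > z_{alpha/2}; also return the two-sided p-value."""
    z_half = NormalDist().inv_cdf(1 - alpha / 2)  # (1 - alpha/2)th normal quantile
    p_value = 2 * (1 - NormalDist().cdf(abs(T)))
    return abs(T) > z_half, p_value


def empirical_rejection_rate(T_values, alpha=0.05):
    """Monte Carlo estimate of pr(|T| > z_{alpha/2}) from simulated statistics."""
    return sum(two_sided_test(T, alpha)[0] for T in T_values) / len(T_values)


# A statistic that is N(0, 1) under H0 rejects about 5% of the time; one that is
# N(0, 0.6) under H0 rejects far less often, i.e., it is conservative.
rng = random.Random(0)
valid = [rng.gauss(0, 1) for _ in range(20000)]
conservative = [rng.gauss(0, 0.6 ** 0.5) for _ in range(20000)]
```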

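The test statistic of the log-rank test is

$$T_L = \frac{n^{1/2}\hat U_L}{\hat\sigma_L}, \qquad \hat U_L = \frac{1}{n}\sum_{i=1}^n \int_0^\tau \left\{ I_i - \frac{\bar Y_1(t)}{\bar Y(t)} \right\} dN_i(t), \qquad \hat\sigma_L^2 = \frac{1}{n}\sum_{i=1}^n \int_0^\tau \frac{\bar Y_1(t)\bar Y_0(t)}{\bar Y(t)^2}\, dN_i(t) \qquad (1)$$

(Mantel, 1966; Kalbfleisch & Prentice, 2011), where $\bar Y_1(t) = n^{-1}\sum_{i=1}^n I_i Y_i(t)$, $\bar Y_0(t) = n^{-1}\sum_{i=1}^n (1-I_i) Y_i(t)$, $\bar Y(t) = \bar Y_1(t) + \bar Y_0(t)$ and $Y_i(t) = I_i Y_{i1}(t) + (1-I_i) Y_{i0}(t)$, with $Y_{ij}(t)$ the indicator of the event $\min(T_{ij}, C_{ij}) \ge t$; $N_i(t) = I_i N_{i1}(t) + (1-I_i) N_{i0}(t)$ is the counting process of observed failures, $N_{ij}(t)$ is the indicator of the event $T_{ij} \le \min(t, C_{ij})$, and the upper limit $\tau$ of the integrals is a point satisfying $\mathrm{pr}\{\min(T_{ij}, C_{ij}) \ge \tau\} > 0$ for $j = 0, 1$.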
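A direct transcription of (1) is sketched below, assuming data are given as lists of observed times $\min(T_i, C_i)$, event indicators and treatment indicators. Each integral against $dN_i(t)$ becomes a sum over observed failure times up to $\tau$; no tie correction is applied, matching (1). The function name `logrank_test` is hypothetical.

```python
import math


def logrank_test(time, event, arm, tau=math.inf):
    """Return (T_L, U_L, sigma_L^2) as in (1).

    time : observed times min(T_i, C_i); event: 1 if the failure is observed;
    arm  : treatment indicators I_i in {0, 1}. Y_i(t) = 1{time_i >= t}.
    """
    n = len(time)
    U = 0.0
    V = 0.0
    for k in range(n):
        if not event[k] or time[k] > tau:
            continue  # dN_k(t) has no jump in (0, tau]
        t = time[k]
        at_risk = [i for i in range(n) if time[i] >= t]
        y1 = sum(arm[i] for i in at_risk) / n  # \bar Y_1(t)
        y = len(at_risk) / n                   # \bar Y(t)
        y0 = y - y1                            # \bar Y_0(t)
        U += arm[k] - y1 / y
        V += y1 * y0 / y ** 2
    U /= n
    V /= n
    return math.sqrt(n) * U / math.sqrt(V), U, V


# Four patients, all failures observed, alternating arms.
T_L, U_L, s2_L = logrank_test([1, 2, 3, 4], [1, 1, 1, 1], [1, 0, 1, 0])
```

For this toy input the statistic equals the familiar (observed minus expected)/standard deviation form, $(2/3)/\sqrt{13/18} \approx 0.784$.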

The log-rank test TL in (1) is valid under simple randomization and the following assumption.

 
Assumption 1.

We have $C_I \perp T_I \mid I$, where $I$ is the treatment indicator, $\perp$ denotes independence and the vertical line denotes conditioning.

Assumption 1 is needed for a valid nonparametric log-rank test without requiring any model on Tj or Cj (Kong & Slud, 1997; DiRienzo & Lagakos, 2002; Lu & Tsiatis, 2008; Parast et al., 2014; Zhang, 2015).

As $T_L$ does not utilize any baseline covariate information, it is used as the benchmark in considering baseline covariate adjustment for efficiency gain, under the same Assumption 1 ‘that would be needed for unadjusted’ $T_L$ (FDA, 2023).

There is a line of research weakening Assumption 1 to censoring at random (Robins & Finkelstein, 2000; Lu & Tsiatis, 2011; Díaz et al., 2019), under which, however, the log-rank test is not valid and needs to be replaced by a weighted log-rank test that requires a correctly specified censoring distribution as the weights are inverse probabilities of censoring. Thus, the conditions and properties of weighted log-rank tests are not comparable with those of the log-rank test. Furthermore, the validity of weighted log-rank tests has only been established under simple randomization. The study of weighted log-rank tests under covariate-adaptive randomization is left for future work.

3 Covariate-adjusted log-rank test

Let $X$ be a subvector of $W$ containing the observed baseline covariates to be adjusted for in the construction of tests, with a nonsingular covariance matrix $\Sigma_X = \mathrm{var}(X)$. In this section, we develop a nonparametric covariate-adjusted log-rank test that has a simple and explicit formula, enjoys a guaranteed efficiency gain over the log-rank test and is universally valid under all covariate-adaptive randomization schemes satisfying Condition 1.

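To develop our covariate adjustment method, we first consider the following linearization of $\hat U_L$ in (1):

$$n^{1/2}\hat U_L = n^{1/2} U_{\rm lin} + o_p(1), \qquad U_{\rm lin} = \frac{1}{n}\sum_{i=1}^n \{ I_i O_{i1} - (1-I_i) O_{i0} \},$$

$$O_{i1} = \int_0^\tau \{1-\mu(t)\}\{dN_{i1}(t) - Y_{i1}(t)p(t)\,dt\}, \qquad O_{i0} = \int_0^\tau \mu(t)\{dN_{i0}(t) - Y_{i0}(t)p(t)\,dt\}$$

(Lin & Wei, 1989; Ye & Shao, 2020). Here $\mu(t) = E\{I_i \mid Y_i(t) = 1\}$, $p(t)\,dt = E\{dN_i(t)\}/E\{Y_i(t)\}$, and $o_p(1)$ denotes a term converging to 0 in probability as $n \to \infty$. Note that $U_{\rm lin}$ is an average of random variables that are independent and identically distributed under simple randomization. If we treat the $O_{ij}$ in $U_{\rm lin}$ as outcomes and apply the generalized regression adjustment or augmentation (Cassel et al., 1976; Tsiatis et al., 2008), then we obtain the covariate-adjusted statistic

$$U_{CL,\rm lin} = \frac{1}{n}\sum_{i=1}^n \left[ I_i\{O_{i1} - \beta_1^T(X_i - \bar X)\} - (1-I_i)\{O_{i0} - \beta_0^T(X_i - \bar X)\} \right] = U_{\rm lin} - \frac{1}{n}\sum_{i=1}^n (I_i - \pi)(\beta_1 + \beta_0)^T(X_i - \bar X), \qquad (2)$$

where $\bar X$ is the sample mean of all the $X_i$, $a^T$ is the transpose of a vector $a$, and $\beta_j = \Sigma_X^{-1}\mathrm{cov}(X_i, O_{ij})$ for $j = 0, 1$. Because the distribution of the baseline covariate $X_i$ is not affected by treatment, the last term on the right-hand side of (2) has mean 0. Under simple randomization, it follows from the theory of generalized regression (Cassel et al., 1976) that $\mathrm{var}(U_{CL,\rm lin}) \le \mathrm{var}(U_{\rm lin})$, and thus the covariate-adjusted $U_{CL,\rm lin}$ in (2) has a guaranteed efficiency gain over the unadjusted $U_{\rm lin}$. This also holds under covariate-adaptive randomization; see Theorem S1 in the Supplementary Material.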
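To derive our covariate-adjusted procedure, it remains to find appropriate statistics to replace the $O_{ij}$ and $\beta_j$ in (2), because they involve unknown quantities. We consider the following sample analog of $O_{ij}$:

$$\hat O_{i1} = \int_0^\tau \{1 - \hat\mu(t)\}\left\{dN_{i1}(t) - \frac{Y_{i1}(t)\,d\bar N(t)}{\bar Y(t)}\right\}, \qquad \hat O_{i0} = \int_0^\tau \hat\mu(t)\left\{dN_{i0}(t) - \frac{Y_{i0}(t)\,d\bar N(t)}{\bar Y(t)}\right\}, \qquad (3)$$

with $\hat\mu(t) = \bar Y_1(t)/\bar Y(t)$ and $\bar N(t) = \sum_{i=1}^n N_i(t)/n$. Using a correct form of $\hat O_{ij}$ is important, as it captures the true correlation between $O_{ij}$ and $X_i$; see the discussion after Theorem S1 in the Supplementary Material. Replacing $O_{ij}$ in (2) by the derived outcome $\hat O_{ij}$ in (3), we obtain the following covariate-adjusted version of $\hat U_L$:

$$\hat U_{CL} = \frac{1}{n}\sum_{i=1}^n \left[ I_i\{\hat O_{i1} - \hat\beta_1^T(X_i - \bar X)\} - (1-I_i)\{\hat O_{i0} - \hat\beta_0^T(X_i - \bar X)\} \right] = \hat U_L - \frac{1}{n}\sum_{i=1}^n (I_i - \pi)(\hat\beta_1 + \hat\beta_0)^T(X_i - \bar X). \qquad (4)$$

The second equality follows from the algebraic identity $\hat U_L = n^{-1}\sum_{i=1}^n \{I_i \hat O_{i1} - (1-I_i)\hat O_{i0}\}$, and

$$\hat\beta_j = \left\{ \sum_{i: I_i = j} (X_i - \bar X_j)(X_i - \bar X_j)^T \right\}^{-1} \sum_{i: I_i = j} (X_i - \bar X_j)\hat O_{ij} \qquad (5)$$

is a sample analog of $\beta_j = \Sigma_X^{-1}\mathrm{cov}(X_i, O_{ij})$, where $\bar X_j$ is the sample mean of the $X_i$ with $I_i = j$. By Lemma S1 in the Supplementary Material, $\hat\beta_j$ in (5) converges to $\beta_j$ in probability, which guarantees that $\hat U_{CL}$ in (4) reduces the variability of $\hat U_L$ in (1). Thus, we propose the covariate-adjusted log-rank test

$$T_{CL} = \frac{n^{1/2}\hat U_{CL}}{\hat\sigma_{CL}}, \qquad (6)$$

where $\hat\sigma_{CL}^2 = \hat\sigma_L^2 - \pi(1-\pi)(\hat\beta_1 + \hat\beta_0)^T\hat\Sigma_X(\hat\beta_1 + \hat\beta_0)$, whose form is suggested by $\sigma_{CL}^2$ in Theorem 1, $\hat\sigma_L^2$ is defined in (1), and $\hat\Sigma_X$ is the sample covariance matrix of all the $X_i$.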
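The steps from the derived outcomes (3) to the test (6) can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions: $\hat\beta_j$ is taken as the within-arm least-squares coefficient in (5), $\hat\Sigma_X$ uses divisor $n$, and a small Gauss–Jordan solver replaces a linear-algebra library so that the block has no dependencies. The names `covariate_adjusted_logrank` and `_solve` are hypothetical.

```python
import math


def _solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(b)
    M = [list(A[r]) + [b[r]] for r in range(m)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(m):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * q for a, q in zip(M[r], M[c])]
    return [M[r][m] / M[r][r] for r in range(m)]


def covariate_adjusted_logrank(time, event, arm, X, pi=0.5, tau=math.inf):
    """Sketch of T_CL in (6); X is a list of covariate vectors (lists of floats)."""
    n, d = len(time), len(X[0])
    # mu_hat(t_k) = Ybar_1/Ybar and r_k = n * Ybar(t_k) at each failure time <= tau.
    fails = []
    for k in range(n):
        if event[k] and time[k] <= tau:
            risk = [i for i in range(n) if time[i] >= time[k]]
            fails.append((time[k], sum(arm[i] for i in risk) / len(risk), len(risk)))
    mu_at = {t: mu for t, mu, _ in fails}
    sigma2_L = sum(mu * (1 - mu) for _, mu, _ in fails) / n  # hat sigma_L^2 in (1)

    # Derived outcomes (3): O_hat_{i1} if I_i = 1, O_hat_{i0} if I_i = 0.
    O = []
    for i in range(n):
        weight = (lambda mu: 1 - mu) if arm[i] == 1 else (lambda mu: mu)
        own = weight(mu_at[time[i]]) if event[i] and time[i] <= tau else 0.0
        comp = sum(weight(mu) / r for t, mu, r in fails if time[i] >= t)
        O.append(own - comp)
    U_L = sum(O[i] if arm[i] == 1 else -O[i] for i in range(n)) / n  # identity below (4)

    # Within-arm least-squares coefficients (5).
    beta = {}
    for j in (0, 1):
        idx = [i for i in range(n) if arm[i] == j]
        xbar_j = [sum(X[i][c] for i in idx) / len(idx) for c in range(d)]
        S = [[sum((X[i][a] - xbar_j[a]) * (X[i][b] - xbar_j[b]) for i in idx)
              for b in range(d)] for a in range(d)]
        s = [sum((X[i][a] - xbar_j[a]) * O[i] for i in idx) for a in range(d)]
        beta[j] = _solve(S, s)
    bsum = [beta[1][a] + beta[0][a] for a in range(d)]

    # Adjustment term in (4) and the variance estimate below (6).
    xbar = [sum(X[i][c] for i in range(n)) / n for c in range(d)]
    fitted = [sum(bsum[a] * (X[i][a] - xbar[a]) for a in range(d)) for i in range(n)]
    U_CL = U_L - sum((arm[i] - pi) * fitted[i] for i in range(n)) / n
    quad = sum(f * f for f in fitted) / n  # bsum^T Sigma_hat_X bsum, divisor n
    sigma2_CL = sigma2_L - pi * (1 - pi) * quad
    if sigma2_CL <= 0:
        raise ValueError("non-positive variance estimate; sample too small?")
    return {"U_L": U_L, "sigma2_L": sigma2_L, "U_CL": U_CL, "sigma2_CL": sigma2_CL,
            "T_CL": math.sqrt(n) * U_CL / math.sqrt(sigma2_CL),
            "O": O, "beta1": beta[1], "beta0": beta[0]}
```

Computing $\hat U_L$ through the derived outcomes, rather than directly from (1), exercises the algebraic identity used in (4); the quadratic form is obtained from the fitted values $(\hat\beta_1 + \hat\beta_0)^T(X_i - \bar X)$, which avoids forming $\hat\Sigma_X$ explicitly.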

Asymptotic properties of the covariate-adjusted log-rank test $T_{CL}$ in (6) are established in the following theorem. All technical proofs are given in the Supplementary Material. In what follows, $\to_d$ and $\to_p$ denote convergence in distribution and in probability, respectively, as $n \to \infty$.

 
Theorem 1.

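Suppose that Condition 1 and Assumption 1 hold, and that all levels of $Z_i$ used in covariate-adaptive randomization are included in $X_i$ as a subvector. Then the following results hold regardless of which covariate-adaptive randomization scheme is applied.

  • (a) Under the null hypothesis $H_0$ or an alternative hypothesis,

    $$n^{1/2}\left\{\hat U_{CL} - \frac{n_1\theta_1 - n_0\theta_0}{n}\right\} \to_d N(0, \sigma_{CL}^2),$$

    where $\theta_j = E(O_{ij})$, $n_j$ is the number of patients in treatment $j$, $\sigma_{CL}^2 = \sigma_L^2 - \pi(1-\pi)(\beta_1+\beta_0)^T\Sigma_X(\beta_1+\beta_0)$ and $\sigma_L^2 = \pi\,\mathrm{var}(O_{i1}) + (1-\pi)\,\mathrm{var}(O_{i0})$.

  • (b) Under the null hypothesis $H_0$, $T_{CL} \to_d N(0, 1)$, i.e., $T_{CL}$ is valid.

  • (c) Under the local alternative hypothesis that $\theta_j = c_j n^{-1/2}$, with the $c_j$ not depending on $n$, and that $\lambda_1(t)/\lambda_0(t)$ is bounded and tends to 1 for every $t$,

    $$T_{CL} \to_d N\left\{\frac{\pi c_1 - (1-\pi)c_0}{\sigma_{CL}}, 1\right\}.$$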

The results under an alternative hypothesis in Theorem 1 are obtained without any specific model on the distribution of $T_j$ or $C_j$, unlike many published articles that assume a specific model under an alternative hypothesis, such as the Cox proportional hazards model for $T_j$.

Theorem 1 shows that $T_{CL}$ in (6) is applicable to all randomization schemes satisfying Condition 1 with a universal formula, provided all levels of $Z_i$ are included in $X_i$. Tests with universal applicability are desirable in practice, as they avoid the complication of using tailored formulas for different randomization schemes.

To show that $T_{CL}$ in (6) has a guaranteed efficiency gain over the benchmark $T_L$ in (1), we establish an asymptotic result for $T_L$ under covariate-adaptive randomization satisfying the following additional condition.

 
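Condition 1′.

As $n \to \infty$, $n^{1/2}(n_{z1}/n_z - \pi, z \in \mathcal{Z})^T \mid Z_1, \ldots, Z_n \to_d N(0, \Omega)$, where $\mathcal{Z}$ is the set containing all levels of $Z$, $\Omega$ is the diagonal matrix whose diagonal entries are $\nu/\mathrm{pr}(Z = z)$, $z \in \mathcal{Z}$, and $\nu \le \pi(1-\pi)$ is a known constant depending on the randomization scheme.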

 
Theorem 2.

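Suppose that Conditions 1 and 1′ and Assumption 1 hold. Then the following results hold.

  • (a) Under the null hypothesis $H_0$ or an alternative hypothesis,

    $$n^{1/2}\left\{\hat U_L - \frac{n_1\theta_1 - n_0\theta_0}{n}\right\} \to_d N\{0, \sigma_L^2(\nu)\},$$

    where $n_j$ and $\theta_j$ are given in Theorem 1, $\sigma_L^2(\nu) = \sigma_L^2 - \{\pi(1-\pi) - \nu\}\,\mathrm{var}\{E(O_{i1}\mid Z_i) + E(O_{i0}\mid Z_i)\}$ for $\nu$ given in Condition 1′, and $\sigma_L^2$ is defined in Theorem 1.

  • (b) Under the null hypothesis $H_0$, $T_L \to_d N\{0, \sigma_L^2(\nu)/\sigma_L^2\}$. Hence, $T_L$ is conservative unless $\nu = \pi(1-\pi)$ or $E(O_{i1}\mid Z_i) + E(O_{i0}\mid Z_i) = 0$ almost surely under $H_0$.

  • (c) Under the local alternative hypothesis in Theorem 1(c),

    $$T_L \to_d N\left\{\frac{\pi c_1 - (1-\pi)c_0}{\sigma_L}, \frac{\sigma_L^2(\nu)}{\sigma_L^2}\right\}.$$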
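Theorem 2(b) translates directly into a limiting Type-I error for $T_L$. The sketch below evaluates $\mathrm{pr}(|T_L| > z_{\alpha/2})$ for $T_L \sim N\{0, \sigma_L^2(\nu)/\sigma_L^2\}$, together with the limit $\sigma_L/\sigma_L(\nu)$ of the correction factor discussed after Theorem 2. The inputs are population quantities that a user would have to supply or estimate, and the numerical values in the usage line are purely hypothetical; the function name is also hypothetical.

```python
import math
from statistics import NormalDist


def limiting_size_logrank(sigma2_L, var_cond_sum, pi=0.5, nu=0.0, alpha=0.05):
    """Limiting Type-I error of T_L from Theorem 2(b), and sigma_L / sigma_L(nu).

    var_cond_sum stands for var{E(O_i1 | Z_i) + E(O_i0 | Z_i)}.
    """
    sigma2_nu = sigma2_L - (pi * (1 - pi) - nu) * var_cond_sum  # sigma_L^2(nu)
    if sigma2_nu <= 0:
        raise ValueError("inputs imply a non-positive sigma_L^2(nu)")
    z_half = NormalDist().inv_cdf(1 - alpha / 2)
    ratio = math.sqrt(sigma2_L / sigma2_nu)  # 1 / sd of the limiting N(0, .) law
    size = 2 * (1 - NormalDist().cdf(z_half * ratio))
    return size, ratio


# Hypothetical inputs: nu = 0 mimics stratified permuted block randomization.
size_pb, r_pb = limiting_size_logrank(0.2, 0.3, nu=0.0)
```

With $\nu = \pi(1-\pi)$, as under simple randomization, the size returns to $\alpha$ and the factor to 1; with $\nu = 0$ the limiting size falls below $\alpha$, which is the conservativeness seen for $T_L$ in Table 1.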

Under simple randomization, Condition 1′ holds with $\nu = \pi(1-\pi)$ and, hence, Theorem 2 also applies with $\sigma_L^2(\nu) = \sigma_L^2$. Under the local alternative specified in Theorem 1(c) with $\pi c_1 - (1-\pi)c_0 \ne 0$, by Theorems 1(c) and 2(c), Pitman's asymptotic relative efficiency of $T_{CL}$ in (6) with respect to the benchmark $T_L$ in (1) is $\sigma_L^2/\sigma_{CL}^2 = 1 + \pi(1-\pi)(\beta_1+\beta_0)^T\Sigma_X(\beta_1+\beta_0)/\sigma_{CL}^2 \ge 1$, with the strict inequality holding unless $\beta_1 + \beta_0 = 0$. Thus, $T_{CL}$ has a guaranteed efficiency gain over $T_L$ under simple randomization.
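The relative-efficiency formula is simple arithmetic once $\sigma_L^2$, $\beta_1$, $\beta_0$ and $\Sigma_X$ are known. The following sketch evaluates $\sigma_L^2/\sigma_{CL}^2$ for user-supplied values; the function name and the example numbers are hypothetical and only illustrate the formula.

```python
def pitman_are_simple(sigma2_L, beta1, beta0, Sigma_X, pi=0.5):
    """sigma_L^2 / sigma_CL^2 under simple randomization, with b = beta1 + beta0."""
    b = [u + v for u, v in zip(beta1, beta0)]
    d = len(b)
    quad = sum(b[r] * Sigma_X[r][c] * b[c] for r in range(d) for c in range(d))
    sigma2_CL = sigma2_L - pi * (1 - pi) * quad
    return sigma2_L / sigma2_CL


# Hypothetical one-dimensional example: b = 0.5, var(X) = 1, sigma_L^2 = 0.2.
are = pitman_are_simple(0.2, [0.3], [0.2], [[1.0]])
```

Here $\sigma_{CL}^2 = 0.2 - 0.25 \times 0.25 = 0.1375$, so the efficiency ratio is about 1.45; it equals 1 exactly when $\beta_1 + \beta_0 = 0$.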

Under covariate-adaptive randomization satisfying Condition 1′ with $\nu < \pi(1-\pi)$, Theorem 2(b) shows that $T_L$ is not valid but conservative, as $\sigma_L^2(\nu) < \sigma_L^2$ unless $E(O_{i1}\mid Z_i) + E(O_{i0}\mid Z_i) = 0$ almost surely under $H_0$, which holds only in some extreme scenarios, e.g., when the $Z$ used for randomization is independent of the outcome. This conservativeness can be corrected by a multiplication factor $\hat r(\nu) \to_p \sigma_L/\sigma_L(\nu)$ under $H_0$. The resulting $\hat r(\nu)T_L$ is the modified log-rank test in Ye & Shao (2020), which is valid and always more powerful than $T_L$. Under the local alternative specified in Theorem 1(c) with $\pi c_1 - (1-\pi)c_0 \ne 0$, Pitman's asymptotic relative efficiency of $\hat r(\nu)T_L$ with respect to $T_{CL}$ in (6) is

$$\frac{\sigma_{CL}^2}{\sigma_L^2(\nu)} \le 1,$$

with the strict inequality holding unless $\beta_1 + \beta_0 = 0$, e.g., when $X_i$ is uncorrelated with $O_{ij}$, or $\nu = 0$ and $E\{\mathrm{cov}(X_i, O_{i1}\mid Z_i) + \mathrm{cov}(X_i, O_{i0}\mid Z_i)\} = 0$, e.g., when the covariates in $X_i$ but not in $Z_i$ are uncorrelated with $O_{ij}$ conditional on $Z_i$. Hence, the adjusted $T_{CL}$ has a guaranteed efficiency gain over both the log-rank test $T_L$ and the modified log-rank test $\hat r(\nu)T_L$ under any covariate-adaptive randomization scheme satisfying Conditions 1 and 1′.

The Pocock–Simon minimization satisfies Condition 1 but not necessarily Condition 1′, as the $I_i$ are correlated across strata. Hence, under the Pocock–Simon minimization, Theorem 2 is not applicable and $T_L$ may not be valid, whereas $T_{CL}$ is valid according to Theorem 1, another advantage of covariate adjustment.

The numerator $\hat U_{CL}$ of (6) is the same as the augmented score in Lu & Tsiatis (2008), which shares the same idea as those in Tsiatis et al. (2008) and Zhang et al. (2008) for noncensored data. However, the denominator $\hat\sigma_{CL}$ in (6) differs from that used by Lu & Tsiatis (2008). The key difference between our result on guaranteed efficiency gain and that in Lu & Tsiatis (2008) is that ours is obtained under covariate-adaptive randomization and an alternative hypothesis without any specific model on the distribution of $T_j$ or $C_j$, whereas the result in Lu & Tsiatis (2008) is for simple randomization and an alternative hypothesis under a correctly specified Cox proportional hazards model for $T_j$.

After testing $H_0$, it is often of interest to estimate and construct a confidence interval for an effect size (Lu & Tsiatis, 2008; Parast et al., 2014; Zhang, 2015; Díaz et al., 2019). A commonly considered effect size is the hazard ratio $e^\theta$ under the Cox proportional hazards model $\lambda_1(t) = \lambda_0(t)e^\theta$. The hazard ratio $e^\theta$ is interpretable only when the Cox proportional hazards model is correctly specified. Thus, in the rest of this section we consider covariate-adjusted estimation and a confidence interval for $\theta$, assuming that $\lambda_1(t) = \lambda_0(t)e^\theta$.

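Without using any covariate, the score from the partial likelihood under the model $\lambda_1(t) = \lambda_0(t)e^\theta$ is

$$\hat U_L(\vartheta) = \frac{1}{n}\sum_{i=1}^n \int_0^\tau \left\{ I_i - \frac{e^\vartheta \bar Y_1(t)}{e^\vartheta \bar Y_1(t) + \bar Y_0(t)} \right\} dN_i(t).$$

The maximum partial likelihood estimator $\hat\theta_L$ of $\theta$ is a solution to $\hat U_L(\vartheta) = 0$. Using the idea in (4) with $X_i$ containing all levels of $Z_i$ used in covariate-adaptive randomization, our covariate-adjusted score is

$$\hat U_{CL}(\vartheta) = \hat U_L(\vartheta) - \frac{1}{n}\sum_{i=1}^n (I_i - \pi)\{\hat\beta_1(\vartheta) + \hat\beta_0(\vartheta)\}^T(X_i - \bar X),$$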
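where, for $j = 0, 1$, $\hat\beta_j(\vartheta)$ is equal to $\hat\beta_j$ in (5) with $\hat O_{ij}$ replaced by

$$\hat O_{i1}(\vartheta) = \int_0^\tau \{1 - \hat\mu_\vartheta(t)\}\left\{dN_{i1}(t) - \frac{e^\vartheta Y_{i1}(t)\,d\bar N(t)}{e^\vartheta\bar Y_1(t) + \bar Y_0(t)}\right\}, \qquad \hat O_{i0}(\vartheta) = \int_0^\tau \hat\mu_\vartheta(t)\left\{dN_{i0}(t) - \frac{Y_{i0}(t)\,d\bar N(t)}{e^\vartheta\bar Y_1(t) + \bar Y_0(t)}\right\},$$

with $\hat\mu_\vartheta(t) = e^\vartheta\bar Y_1(t)/\{e^\vartheta\bar Y_1(t) + \bar Y_0(t)\}$. Solving $\hat U_{CL}(\vartheta) = 0$ gives the covariate-adjusted estimator $\hat\theta_{CL}$. As $\hat U_{CL}(\theta)$ has reduced variability compared with $\hat U_L(\theta)$, and $\partial\hat U_{CL}(\vartheta)/\partial\vartheta = \partial\hat U_L(\vartheta)/\partial\vartheta$, a standard argument for M-estimators shows that $\hat\theta_{CL}$ is guaranteed to have smaller variance than $\hat\theta_L$. It is established in the Supplementary Material that $n^{1/2}(\hat\theta_{CL} - \theta) \to_d N\{0, \sigma^2(\theta)\}$ under any covariate-adaptive randomization satisfying Condition 1, with $\sigma^2(\theta)$ given in Theorem S2 in the Supplementary Material. An asymptotic confidence interval for $\theta$ can be obtained based on this result and a consistent estimator of $\sigma^2(\theta)$ given by

where $g(\vartheta) = \partial\hat U_L(\vartheta)/\partial\vartheta$.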

4 Covariate-adjusted stratified log-rank test

The stratified log-rank test (Peto et al., 1976) is a weighted average of the stratum-specific log-rank test statistics, with finitely many strata constructed using a discrete baseline covariate. We consider stratification with all levels of $Z_i$. Results can be obtained similarly when stratifying on more levels than those of $Z_i$, or on fewer levels, provided the levels of $Z_i$ not used in stratification are included in $X_i$. Here, we remove the part of $X_i$ that can be linearly represented by $Z_i$ and still denote the remainder by $X_i$. As such, it is reasonable to assume that $E\{\mathrm{var}(X_i\mid Z_i)\}$ is positive definite.

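The stratified log-rank test using the levels of $Z_i$ as strata is

$$T_{SL} = \frac{n^{1/2}\hat U_{SL}}{\hat\sigma_{SL}}, \qquad \hat U_{SL} = \frac{1}{n}\sum_z \sum_{i: Z_i = z}\int_0^\tau \left\{ I_i - \frac{\bar Y_{z1}(t)}{\bar Y_z(t)} \right\} dN_i(t), \qquad \hat\sigma_{SL}^2 = \frac{1}{n}\sum_z \sum_{i: Z_i = z}\int_0^\tau \frac{\bar Y_{z1}(t)\bar Y_{z0}(t)}{\bar Y_z(t)^2}\,dN_i(t), \qquad (7)$$

where $\bar Y_{z1}(t) = \sum_{i: Z_i = z} I_i Y_i(t)/n$, $\bar Y_{z0}(t) = \sum_{i: Z_i = z}(1-I_i)Y_i(t)/n$ and $\bar Y_z(t) = \bar Y_{z1}(t) + \bar Y_{z0}(t)$.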
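The only change from (1) to (7) is that risk sets are formed within strata. A minimal sketch, with the same data layout as for the unstratified test plus a list of stratum labels (the function name is hypothetical):

```python
import math


def stratified_logrank(time, event, arm, strata, tau=math.inf):
    """T_SL = n^{1/2} U_SL / sigma_SL as in (7); risk sets are formed within strata."""
    n = len(time)
    U = V = 0.0
    for k in range(n):
        if not event[k] or time[k] > tau:
            continue
        risk = [i for i in range(n) if strata[i] == strata[k] and time[i] >= time[k]]
        mu = sum(arm[i] for i in risk) / len(risk)  # \bar Y_{z1}(t) / \bar Y_z(t)
        U += arm[k] - mu
        V += mu * (1 - mu)  # \bar Y_{z1} \bar Y_{z0} / \bar Y_z^2
    U, V = U / n, V / n
    return math.sqrt(n) * U / math.sqrt(V)


# With a single stratum, T_SL reduces to the unstratified T_L in (1).
T_one = stratified_logrank([1, 2, 3, 4], [1, 1, 1, 1], [1, 0, 1, 0], ["a"] * 4)
T_two = stratified_logrank([1, 2, 3, 4], [1, 1, 1, 1], [1, 0, 1, 0], ["a", "a", "b", "b"])
```

For the toy data, one stratum gives about 0.784, while splitting the four patients into two strata gives exactly $\sqrt 2$, since each stratum then contributes $1/2$ to the numerator sum and $1/4$ to the variance sum.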

With stratification, $T_{SL}$ in (7) actually tests the null hypothesis $\tilde H_0: \lambda_1(t\mid z) = \lambda_0(t\mid z)$ for all $(t, z)$, where $\lambda_j(t\mid z)$ is the hazard function of $T_j$ conditional on $Z = z$. Hypothesis $\tilde H_0$ may be stronger than $H_0: \lambda_1(t) = \lambda_0(t)$ for all $t$, the null hypothesis for the unstratified log-rank test $T_L$ and its adjustment $T_{CL}$ considered in §§ 2–3. In some scenarios, $\tilde H_0 = H_0$. For example, the two hypotheses are the same when there exists a transformation model $h\{\mathrm{pr}(T_0 \ge t\mid W)\} = \theta + h\{\mathrm{pr}(T_1 \ge t\mid W)\}$ for all $(t, W)$ and an unknown constant $\theta$, where $h$ is a strictly monotone function that is possibly unknown (Cheng et al., 1995). This transformation model includes many commonly used semiparametric models as special cases, for example the Cox proportional hazards model with $h(s) = \log\{-\log(s)\}$.

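To further adjust for the baseline covariate $X_i$, we again linearize $\hat U_{SL}$ as (Ye & Shao, 2020)

$$n^{1/2}\hat U_{SL} = n^{-1/2}\sum_z\sum_{i: Z_i = z}\{I_i O_{zi1} - (1-I_i)O_{zi0}\} + o_p(1),$$

where

$$O_{zi1} = \int_0^\tau \{1 - \mu_z(t)\}\{dN_{i1}(t) - Y_{i1}(t)p_z(t)\,dt\}, \qquad O_{zi0} = \int_0^\tau \mu_z(t)\{dN_{i0}(t) - Y_{i0}(t)p_z(t)\,dt\},$$

with $\mu_z(t) = E\{I_i \mid Y_i(t) = 1, Z_i = z\}$ and $p_z(t)\,dt = E\{dN_i(t)\mid Z_i = z\}/E\{Y_i(t)\mid Z_i = z\}$. Following the same idea as in § 3, we apply the generalized regression adjustment by using

$$\hat O_{zi1} = \int_0^\tau \{1 - \hat\mu_z(t)\}\left\{dN_{i1}(t) - \frac{Y_{i1}(t)\,d\bar N_z(t)}{\bar Y_z(t)}\right\}, \qquad \hat O_{zi0} = \int_0^\tau \hat\mu_z(t)\left\{dN_{i0}(t) - \frac{Y_{i0}(t)\,d\bar N_z(t)}{\bar Y_z(t)}\right\},$$

with $\hat\mu_z(t) = \bar Y_{z1}(t)/\bar Y_z(t)$, as derived outcomes, where $\bar N_z(t) = \sum_{i: Z_i = z}N_i(t)/n$. The resulting covariate-adjusted version of $\hat U_{SL}$ is

$$\hat U_{CSL} = \frac{1}{n}\sum_z\sum_{i: Z_i = z}\left[I_i\{\hat O_{zi1} - \hat\gamma_1^T(X_i - \bar X_z)\} - (1-I_i)\{\hat O_{zi0} - \hat\gamma_0^T(X_i - \bar X_z)\}\right] = \hat U_{SL} - \frac{1}{n}\sum_z\sum_{i: Z_i = z}(I_i - \pi)(\hat\gamma_1 + \hat\gamma_0)^T(X_i - \bar X_z),$$

where the second equality follows from the algebraic identity $\hat U_{SL} = n^{-1}\sum_z\sum_{i: Z_i = z}\{I_i\hat O_{zi1} - (1-I_i)\hat O_{zi0}\}$, $\bar X_z$ is the sample mean of the $X_i$ with $Z_i = z$,

$$\hat\gamma_j = \left\{\sum_z\sum_{i: Z_i = z, I_i = j}(X_i - \bar X_{zj})(X_i - \bar X_{zj})^T\right\}^{-1}\sum_z\sum_{i: Z_i = z, I_i = j}(X_i - \bar X_{zj})\hat O_{zij}$$

converges in probability to a limit value $\gamma_j$, and $\bar X_{zj}$ is the sample mean of the $X_i$ with $Z_i = z$ and treatment $j$, $j = 0, 1$. Our proposed covariate-adjusted stratified log-rank test is

$$T_{CSL} = \frac{n^{1/2}\hat U_{CSL}}{\hat\sigma_{CSL}}, \qquad (8)$$

where $\hat\sigma_{CSL}^2 = \hat\sigma_{SL}^2 - \pi(1-\pi)(\hat\gamma_1+\hat\gamma_0)^T\{\sum_z (n_z/n)\hat\Sigma_{X\mid z}\}(\hat\gamma_1+\hat\gamma_0)$ and $\hat\Sigma_{X\mid z}$ is the sample covariance matrix of the $X_i$ within stratum $z$.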
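A sketch of (8) for a scalar covariate is given below; the scalar case is enough for the simulation in § 5, where $X$ for $T_{CSL}$ is a single component of $W$. Assumptions beyond the text: $\tau = \infty$, the within-stratum covariances use divisor $n_z$, and $\hat\gamma_j$ is the pooled within-(stratum, arm) least-squares slope. The function name is hypothetical.

```python
import math


def covariate_adjusted_stratified_logrank(time, event, arm, strata, x, pi=0.5):
    """Sketch of T_SL in (7) and T_CSL in (8) for a scalar covariate x (tau = +inf)."""
    n = len(time)
    fails = []  # (stratum, t_k, mu_hat_z(t_k), n * Ybar_z(t_k))
    for k in range(n):
        if event[k]:
            risk = [i for i in range(n) if strata[i] == strata[k] and time[i] >= time[k]]
            fails.append((strata[k], time[k], sum(arm[i] for i in risk) / len(risk), len(risk)))
    mu_at = {(z, t): mu for z, t, mu, _ in fails}
    sigma2_SL = sum(mu * (1 - mu) for _, _, mu, _ in fails) / n
    # Stratum-specific derived outcomes O_hat_{zij} for each patient's own arm.
    O = []
    for i in range(n):
        weight = (lambda mu: 1 - mu) if arm[i] == 1 else (lambda mu: mu)
        own = weight(mu_at[(strata[i], time[i])]) if event[i] else 0.0
        comp = sum(weight(mu) / r for z, t, mu, r in fails if z == strata[i] and time[i] >= t)
        O.append(own - comp)
    U_SL = sum(O[i] if arm[i] == 1 else -O[i] for i in range(n)) / n

    levels = sorted(set(strata))

    def mean(idx):
        return sum(x[i] for i in idx) / len(idx)

    # gamma_hat_j: pooled within-(stratum, arm) least-squares slope of O_hat on x.
    gamma = {}
    for j in (0, 1):
        sxx = sxo = 0.0
        for z in levels:
            idx = [i for i in range(n) if strata[i] == z and arm[i] == j]
            if idx:
                m = mean(idx)
                sxx += sum((x[i] - m) ** 2 for i in idx)
                sxo += sum((x[i] - m) * O[i] for i in idx)
        gamma[j] = sxo / sxx
    g = gamma[1] + gamma[0]
    adj = 0.0
    pooled_var = 0.0  # sum_z (n_z / n) * within-stratum variance of x
    for z in levels:
        idx = [i for i in range(n) if strata[i] == z]
        m = mean(idx)
        adj += sum((arm[i] - pi) * (x[i] - m) for i in idx)
        pooled_var += sum((x[i] - m) ** 2 for i in idx) / n
    U_CSL = U_SL - g * adj / n
    sigma2_CSL = sigma2_SL - pi * (1 - pi) * g * g * pooled_var
    return {"U_SL": U_SL, "sigma2_SL": sigma2_SL, "U_CSL": U_CSL, "sigma2_CSL": sigma2_CSL,
            "T_SL": math.sqrt(n) * U_SL / math.sqrt(sigma2_SL),
            "T_CSL": math.sqrt(n) * U_CSL / math.sqrt(sigma2_CSL)}
```

Centering $x$ within each stratum is what removes the part of the covariate that is linearly explained by $Z$, as assumed at the start of this section.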

The following theorem establishes the asymptotic properties of the stratified log-rank test TSL and covariate-adjusted stratified log-rank test TCSL.

 
Theorem 3.

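Suppose that Condition 1 holds and that $C_I \perp T_I \mid (I, Z)$. Then the following results hold regardless of which covariate-adaptive randomization scheme is applied.

  • (a) Under the null hypothesis $\tilde H_0$ or an alternative hypothesis,

    $$n^{1/2}\left\{\hat U_{CSL} - \sum_z\frac{n_{z1}\theta_{z1} - n_{z0}\theta_{z0}}{n}\right\} \to_d N(0, \sigma_{CSL}^2),$$

    and the same result holds with $\hat U_{CSL}$ and $\sigma_{CSL}^2$ replaced by $\hat U_{SL}$ and $\sigma_{SL}^2$, respectively, where $\theta_{zj} = E(O_{zij}\mid Z_i = z)$, $n_{zj}$ is the number of patients with treatment $j$ in stratum $z$, $j = 0, 1$, $\sigma_{CSL}^2 = \sigma_{SL}^2 - \pi(1-\pi)(\gamma_1+\gamma_0)^T E\{\mathrm{var}(X_i\mid Z_i)\}(\gamma_1+\gamma_0)$ and $\sigma_{SL}^2 = \sum_z \mathrm{pr}(Z_i = z)\{\pi\,\mathrm{var}(O_{zi1}\mid Z_i = z) + (1-\pi)\,\mathrm{var}(O_{zi0}\mid Z_i = z)\}$.

  • (b) Under the null hypothesis $\tilde H_0$, $T_{SL} \to_d N(0, 1)$ and $T_{CSL} \to_d N(0, 1)$, i.e., both $T_{SL}$ and $T_{CSL}$ are valid for testing the null hypothesis $\tilde H_0$.

  • (c) Under the local alternative hypothesis that $\theta_{zj} = c_{zj}n^{-1/2}$, with the $c_{zj}$ not depending on $n$, and that $\lambda_1(t\mid z)/\lambda_0(t\mid z)$ is bounded and tends to 1 for every $t$ and $z$,

    $$T_{CSL} \to_d N\left[\frac{\sum_z \mathrm{pr}(Z = z)\{\pi c_{z1} - (1-\pi)c_{z0}\}}{\sigma_{CSL}}, 1\right],$$

    and the same result holds with $T_{CSL}$ and $\sigma_{CSL}$ replaced by $T_{SL}$ and $\sigma_{SL}$, respectively.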

Like $T_{CL}$ in (6), both $T_{SL}$ in (7) and $T_{CSL}$ in (8) are applicable to all covariate-adaptive randomization schemes with universal formulas, i.e., they achieve universal applicability. In terms of Pitman's asymptotic efficiency under the local alternative specified in Theorem 3(c), $T_{CSL}$ is always more efficient than $T_{SL}$, since $\sigma_{CSL}^2 \le \sigma_{SL}^2$, with the strict inequality holding unless $\gamma_1 + \gamma_0 = 0$.

The condition $C_I \perp T_I \mid (I, Z)$ in Theorem 3 for the stratified log-rank test and its adjustment is in general not comparable with Assumption 1 for the unstratified log-rank test.

Is $T_{SL}$ or $T_{CSL}$ more efficient than the unstratified log-rank test $T_L$? The answer is unclear because, first, the null hypotheses $\tilde H_0$ and $H_0$ may be different, as discussed earlier, and second, even if $\tilde H_0 = H_0$, under the alternative the asymptotic mean $(n_1\theta_1 - n_0\theta_0)/n$ of $\hat U_L$ may not be comparable with the asymptotic mean $\sum_z(n_{z1}\theta_{z1} - n_{z0}\theta_{z0})/n$ of $\hat U_{SL}$ or $\hat U_{CSL}$. In fact, the indefiniteness of the relative efficiency between the stratified and unstratified log-rank tests is a long-standing problem in the literature.

There is also no definite answer when comparing the efficiencies of $T_{CL}$ and the stratified $T_{CSL}$.

Similar to the discussion at the end of § 3, after testing the hypothesis $\tilde H_0$, we can obtain a covariate-adjusted confidence interval for the effect size $\theta$ under a stratified Cox proportional hazards model $\lambda_{1z}(t) = \lambda_{0z}(t)e^\theta$ for every $z$; see the Supplementary Material for further details.

5 Simulations

To supplement the theory and examine the finite-sample Type-I error and power of the tests $T_L$, $T_{CL}$, $T_{SL}$ and $T_{CSL}$, we carry out a simulation study under the following four cases.

 
Case I.

The conditional hazard function follows a Cox model, $\lambda_j(t\mid W) = (\log 2)\exp(\theta j + \eta^T W)$ for $j = 0, 1$, where $\theta$ denotes a scalar parameter, $\eta = (0.5, 0.5, 0.5)^T$ and $W$ is a three-dimensional covariate vector following the three-dimensional standard normal distribution. The censoring variables $C_0$ and $C_1$ follow the uniform distribution on the interval (10, 40) and are independent of $W$.

 
Case II.

The conditional hazard function is the same as that in Case I. Conditional on W and treatment assignment j, Cj(33j) follows a standard exponential distribution.

 
Case III.

We have $T_j = \exp(\theta j + \eta^T W) + E$, $j = 0, 1$, where $\theta$, $\eta$ and $W$ are the same as in Case I, and $E$ is a random variable independent of $(C_1, C_0, W)$ with the standard exponential distribution. The setting for censoring is the same as in Case I.

 
Case IV.

The models for the Tj and Cj are the same as those in Cases III and II, respectively.
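For readers who wish to reproduce the general design, the sketch below generates covariates and outcomes for Cases I and III. Several details are assumptions not given in the text: the cut points used to discretize $W$ into the two-level and three-level components of $Z$ (0 and roughly the normal terciles $\pm 0.43$), and the use of independent coin flips for the arms in the usage line, which stand in for any of the three randomization schemes. The helper names are hypothetical.

```python
import math
import random

ETA = (0.5, 0.5, 0.5)  # eta in Cases I-IV


def draw_covariates(n, rng):
    """W: three-dimensional standard normal covariates for n patients."""
    return [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n)]


def discretize(w):
    """Z = (2-level W1, 3-level W2); the cut points are illustrative assumptions."""
    return (int(w[0] > 0.0), 0 if w[1] < -0.43 else (1 if w[1] < 0.43 else 2))


def simulate_outcomes(case, W, arms, theta, rng):
    """Observed (time, event) under Case I or Case III with C ~ Uniform(10, 40)."""
    time, event = [], []
    for w, j in zip(W, arms):
        lin = theta * j + sum(e * v for e, v in zip(ETA, w))
        if case == "I":  # constant hazard (log 2) exp(theta j + eta^T W)
            t = rng.expovariate(math.log(2) * math.exp(lin))
        elif case == "III":  # exp(theta j + eta^T W) plus a standard exponential
            t = math.exp(lin) + rng.expovariate(1.0)
        else:
            raise ValueError("only Cases I and III are sketched here")
        c = rng.uniform(10.0, 40.0)
        time.append(min(t, c))
        event.append(int(t <= c))
    return time, event


# Usage: covariates first, then Z for randomization, then arms and outcomes.
rng = random.Random(2024)
W = draw_covariates(500, rng)
Z = [discretize(w) for w in W]
arms = [rng.randint(0, 1) for _ in W]
```

Generating $W$ before the treatment assignments matters: covariate-adaptive schemes use $Z$, a function of $W$, to decide each $I_i$.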

In this simulation, the significance level is $\alpha = 5\%$, the target treatment assignment proportion is $\pi = 0.5$, the overall sample size is $n = 500$, the null hypothesis is $H_0: \theta = 0$, and $\tilde H_0 = H_0$ since a transformation model described in § 4 holds in Cases I–IV. Three randomization schemes are considered: simple randomization; stratified permuted block randomization with block size 4 and levels of $Z$ as strata; and the Pocock–Simon minimization, which assigns a patient with probability 0.8 to the preferred arm minimizing the sum of balance scores over marginal levels of $Z$. Here $Z$ is the two-dimensional vector whose first component is a two-level discretization of the first component of $W$ and whose second component is a three-level discretization of the second component of $W$. For stratified log-rank tests, levels of $Z$ are used as strata. For covariate adjustment, $X$ is the vector containing $Z$ and the third component of $W$ for $T_{CL}$, and $X$ is the third component of $W$ for $T_{CSL}$.
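The minimization scheme described above can be sketched as follows, assuming two arms, a balance score equal to $|n_1 - n_0|$ at each of the new patient's marginal levels, and a fair coin when both arms have the same total score. This is one common variant of the Pocock–Simon procedure, not necessarily the exact implementation used in the simulation; the function name is hypothetical.

```python
import random


def pocock_simon(Z, p_pref=0.8, seed=None):
    """Pocock-Simon minimization for two arms using marginal imbalance scores.

    Z: list of tuples, one level per factor. For each arriving patient, the arm
    giving the smaller sum over factors of |n_1 - n_0| at that patient's levels
    is preferred and chosen with probability p_pref; ties are broken by a fair coin.
    """
    rng = random.Random(seed)
    counts = {}  # (factor index, level, arm) -> number of patients assigned so far
    arms = []
    for z in Z:
        score = []
        for a in (0, 1):  # total marginal imbalance if this patient joined arm a
            s = 0
            for f, level in enumerate(z):
                n1 = counts.get((f, level, 1), 0) + (a == 1)
                n0 = counts.get((f, level, 0), 0) + (a == 0)
                s += abs(n1 - n0)
            score.append(s)
        if score[0] == score[1]:
            arm = int(rng.random() < 0.5)
        else:
            preferred = 0 if score[0] < score[1] else 1
            arm = preferred if rng.random() < p_pref else 1 - preferred
        for f, level in enumerate(z):
            counts[(f, level, arm)] = counts.get((f, level, arm), 0) + 1
        arms.append(arm)
    return arms


# Usage with a two-level and a three-level factor, as in the simulation design.
rng = random.Random(0)
Z = [(rng.randint(0, 1), rng.randint(0, 2)) for _ in range(500)]
I = pocock_simon(Z, seed=1)
```

Because each assignment depends on the running counts over all strata, the resulting $I_i$ are correlated across strata, which is why this scheme need not satisfy Condition 1′.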

Based on 10 000 simulations, Type-I error rates of the four tests under four cases and three randomization schemes are shown in Table 1. The results agree with our theory. For $T_{CL}$, $T_{SL}$ and $T_{CSL}$, there is no substantial difference among the three randomization schemes. The log-rank test $T_L$ maintains the 5% level under simple randomization, but it is conservative under stratified permuted block randomization and minimization.

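Table 1

Type-I errors (in percentages) based on 10 000 simulations

| Case | Randomization | $T_L$ | $T_{CL}$ | $T_{SL}$ | $T_{CSL}$ |
|---|---|---|---|---|---|
| I | Simple | 4.91 | 5.16 | 4.86 | 4.78 |
| I | Permuted block | 3.25 | 5.22 | 4.80 | 4.85 |
| I | Minimization | 3.40 | 5.43 | 5.02 | 5.23 |
| II | Simple | 5.39 | 5.14 | 5.00 | 4.97 |
| II | Permuted block | 3.59 | 5.03 | 4.94 | 4.82 |
| II | Minimization | 4.01 | 5.23 | 5.11 | 5.28 |
| III | Simple | 5.07 | 5.43 | 5.27 | 5.16 |
| III | Permuted block | 2.29 | 4.79 | 4.76 | 4.82 |
| III | Minimization | 2.88 | 5.43 | 5.23 | 5.52 |
| IV | Simple | 5.41 | 5.30 | 5.39 | 5.21 |
| IV | Permuted block | 4.44 | 5.48 | 5.10 | 5.49 |
| IV | Minimization | 4.21 | 5.18 | 5.04 | 5.06 |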

Based on 10 000 simulations, power curves of the four tests for $\theta$ ranging from 0 to 0.6, under the four cases and stratified permuted block randomization, are plotted in Fig. 1. Similar figures for simple randomization and minimization are given in the Supplementary Material. In all cases, the power curves of the covariate-adjusted tests $T_{CL}$ and $T_{CSL}$ are above those of the unadjusted tests $T_L$ and $T_{SL}$, especially the benchmark $T_L$. Under the Cox model, $T_{CSL}$ is better than $T_{CL}$, but not necessarily under the non-Cox model. The stratified $T_{SL}$ is mostly better than the unstratified $T_L$, but unlike $T_{CL}$ and $T_{CSL}$, it has no guaranteed efficiency gain, e.g., in Case III when $\theta > 0.4$. The choice of censoring model also has some effect.

Fig. 1. Power curves based on 10 000 simulations.

More simulation results can be found in the Supplementary Material.

6 A real data application

We apply the four tests $T_L$, $T_{CL}$, $T_{SL}$ and $T_{CSL}$ to data from the AIDS Clinical Trials Group Study 175 (ACTG 175), a randomized controlled trial evaluating antiretroviral treatments in adults infected with human immunodeficiency virus type 1 whose CD4 cell counts were from 200 to 500 per cubic millimeter (Hammer et al., 1996). The primary endpoint was time to a composite event defined as a decline of at least 50% in the CD4 cell count, an AIDS-defining event, or death. Stratified permuted block randomization with equal allocation was applied, with the covariate $Z$ having three levels related to the length of prior antiretroviral therapy: $Z = 1$, 2 and 3, representing 0 weeks, between 1 and 52 weeks, and more than 52 weeks of prior antiretroviral therapy, respectively. The dataset is publicly available in the R package speff2trial (R Development Core Team, 2024).

We focus on the comparison of treatment 0 (zidovudine) versus treatment 1 (didanosine). For the stratified log-rank test $T_{SL}$, the three-level $Z$ is used as the stratification variable. For covariate adjustment, two additional prognostic baseline covariates are considered as $X$: the baseline CD4 cell count and the number of days of antiretroviral therapy received prior to treatment. In addition to testing the treatment effect for all patients, a subgroup analysis with the $Z$ strata as subgroups is also of interest, because responses to antiretroviral therapy may vary according to the extent of prior drug exposure. Within each subgroup defined by $Z$, the stratified tests become the same as their unstratified counterparts, and thus we only apply $T_L$ and $T_{CL}$ in the subgroup analysis.

Table 2 reports the number of patients, the numerator and denominator of each test, and the p-value for testing with all patients or within a subgroup, together with the estimated $\theta$ and its standard error. The effect of covariate adjustment is clear: the standard errors $\hat\sigma_{CL}$ and $\hat\sigma_{CSL}$ of the covariate-adjusted tests are smaller than $\hat\sigma_L$ and $\hat\sigma_{SL}$ in all analyses.

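Table 2

Statistics for the ACTG 175 example

| | All patients | Subgroup $Z = 1$ | Subgroup $Z = 2$ | Subgroup $Z = 3$ |
|---|---|---|---|---|
| Number of patients | 1093 | 461 | 198 | 434 |
| Log-rank test | | | | |
| $n^{1/2}\hat U_L$ | −1.223 | −0.542 | −0.144 | −1.292 |
| $\hat\sigma_L$ | 0.265 | 0.235 | 0.270 | 0.290 |
| p-value (adjusted for subgroup analysis) | < 0.001 | 0.064 | 1 | < 0.001 |
| Estimated $\theta$ | −0.528 | −0.455 | −0.140 | −0.740 |
| Standard error of the estimated $\theta$ | 0.116 | 0.199 | 0.263 | 0.171 |
| Covariate-adjusted log-rank test | | | | |
| $n^{1/2}\hat U_{CL}$ | −1.273 | −0.553 | −0.129 | −1.382 |
| $\hat\sigma_{CL}$ | 0.257 | 0.230 | 0.265 | 0.282 |
| p-value (adjusted for subgroup analysis) | < 0.001 | 0.049 | 1 | < 0.001 |
| Estimated $\theta$ | −0.550 | −0.464 | −0.127 | −0.793 |
| Standard error of the estimated $\theta$ | 0.113 | 0.195 | 0.257 | 0.166 |
| Stratified log-rank test | | | | |
| $n^{1/2}\hat U_{SL}$ | −1.228 | | | |
| $\hat\sigma_{SL}$ | 0.264 | | | |
| p-value | < 0.001 | | | |
| Estimated $\theta$ | −0.531 | | | |
| Standard error of the estimated $\theta$ | 0.116 | | | |
| Covariate-adjusted stratified log-rank test | | | | |
| $n^{1/2}\hat U_{CSL}$ | −1.284 | | | |
| $\hat\sigma_{CSL}$ | 0.258 | | | |
| p-value | < 0.001 | | | |
| Estimated $\theta$ | −0.556 | | | |
| Standard error of the estimated $\theta$ | 0.113 | | | |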
Subgroup
All patientsZ = 1Z = 2Z = 3
Number of patients1093461198434
Log-rank test
n1/2U^L–1.223–0.542–0.144–1.292
σ^L0.2650.2350.2700.290
p-value (adjusted for subgroup analysis)< 0.0010.0641< 0.001
 Estimated θ–0.528–0.455–0.140–0.740
 Standard error of the estimated θ0.1160.1990.2630.171
Covariate-adjusted log-rank test
n1/2U^CL–1.273–0.553–0.1291.382
σ^CL0.2570.2300.2650.282
p-value (adjusted for subgroup analysis)< 0.0010.0491< 0.001
 Estimated θ–0.550–0.464–0.1270.793
 Standard error of the estimated θ0.1130.1950.2570.166
Stratified log-rank test
n1/2U^SL–1.228
σ^SL0.264
p-value< 0.001
 Estimated θ–0.531
 Standard error of the estimated θ0.116
Covariate-adjusted stratified log-rank test
n1/2U^CSL1.284
σ^CSL0.258
p-value< 0.001
 Estimated θ0.556
 Standard error of the estimated θ0.113

Here θ denotes the log hazard ratio for all patients and for each subgroup.

For the analysis based on all patients, all four tests reject the null hypothesis $H_0$ of no treatment effect, with p-values below 0.001. In the subgroup analysis, the p-values are adjusted using Bonferroni's correction to control the familywise error rate. As Table 2 shows, p-values in the subgroup analysis are substantially larger than those in the analysis of all patients, because of the reduced sample sizes as well as Bonferroni's correction. This example illustrates the benefit of covariate adjustment in testing when the sample size is not very large: for Z = 1, the adjusted p-value is 0.049 for $T_{\rm CL}$ but 0.064 for $T_{\rm L}$, so only the covariate-adjusted test rejects $H_0$ at the 5% level. Using the adjusted log-rank test $T_{\rm CL}$, together with the estimated effect sizes and standard errors in Table 2, we conclude that treatment 1 is superior for both Z = 1 and Z = 3, which is consistent with the evidence of Hammer et al. (1996).
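The subgroup p-values in Table 2 can be checked from the reported numerators and denominators. The Python sketch below assumes two-sided p-values from the standard normal approximation to $n^{1/2}\hat U/\hat\sigma$ and a Bonferroni factor of 3, one per Z subgroup; these are our assumptions about how the table was produced, chosen because they reproduce the tabulated values up to rounding of the inputs. The dictionary labels and helper names are hypothetical.

```python
from math import erfc, sqrt

N_SUBGROUPS = 3  # assumed Bonferroni factor: one test per Z subgroup

# (numerator n^{1/2} U-hat, denominator sigma-hat) copied from Table 2
subgroup_stats = {
    "Z = 1, T_L": (-0.542, 0.235),
    "Z = 1, T_CL": (-0.553, 0.230),
    "Z = 2, T_L": (-0.144, 0.270),
    "Z = 2, T_CL": (-0.129, 0.265),
    "Z = 3, T_L": (-1.292, 0.290),
}


def two_sided_p(numerator, sigma):
    """Two-sided normal-approximation p-value, 2 * (1 - Phi(|z|))."""
    z = numerator / sigma
    return erfc(abs(z) / sqrt(2.0))


def bonferroni(p, m=N_SUBGROUPS):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, m * p)


p_adj = {name: bonferroni(two_sided_p(*stats)) for name, stats in subgroup_stats.items()}
for name, p in p_adj.items():
    print(f"{name}: adjusted p = {p:.4f}")
```

Because the inputs are rounded to three decimals, the recomputed value for $T_{\rm L}$ with Z = 1 comes out at about 0.063 rather than exactly 0.064; either way it stays above 0.05 while the value for $T_{\rm CL}$ falls below it.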

Acknowledgement

We thank the reviewers for useful comments and suggestions. Our research was supported by the National Natural Science Foundation of China and the U.S. National Science Foundation. Shao is also affiliated with East China Normal University.

Supplementary material

The Supplementary Material contains all technical proofs and some additional results.

References

Baldi Antognini, A. & Zagoraiou, M. (2015). On the almost sure convergence of adaptive allocation procedures. Bernoulli 21, 881–908.

Cassel, C. M., Särndal, C. E. & Wretman, J. H. (1976). Some results on generalized difference estimation and generalized regression estimation for finite populations. Biometrika 63, 615–20.

Cheng, S., Wei, L. & Ying, Z. (1995). Analysis of transformation models with censored data. Biometrika 82, 835–45.

Ciolino, J. D., Palac, H. L., Yang, A., Vaca, M. & Belli, H. M. (2019). Ideal vs. real: a systematic review on handling covariates in randomized controlled trials. BMC Med. Res. Methodol. 19, 136.

Díaz, I., Colantuoni, E., Hanley, D. F. & Rosenblum, M. (2019). Improved precision in the analysis of randomized trials with survival outcomes, without assuming proportional hazards. Lifetime Data Anal. 25, 439–68.

DiRienzo, A. G. & Lagakos, S. W. (2002). Effects of model misspecification on tests of no randomized treatment effect arising from Cox's proportional hazards model. J. R. Statist. Soc. B 63, 745–57.

EMA (2015). Guideline on adjustment for baseline covariates in clinical trials, EMA/CHMP/295050/2013. London, UK: European Medicines Agency (EMA). https://www.ema.europa.eu/en/documents/scientific-guideline/guideline-adjustment-baseline-covariates-clinical-trials_en.pdf. Accessed August 3, 2023.

FDA (2023). Adjusting for covariates in randomized clinical trials for drugs and biological products. Guidance for Industry. Center for Drug Evaluation and Research and Center for Biologics Evaluation and Research, Food and Drug Administration (FDA), U.S. Department of Health and Human Services. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/adjusting-covariates-randomized-clinical-trials-drugs-and-biological-products. Accessed August 3, 2023.

Hammer, S. M., Katzenstein, D. A., Hughes, M. D., Gundacker, H., Schooley, R. T., Haubrich, R. H., Henry, W. K., Lederman, M. M., Phair, J. P., Niu, M. et al. (1996). A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter. New Engl. J. Med. 335, 1081–90.

ICH E9 (1998). Statistical principles for clinical trials E9. International Council for Harmonisation (ICH). https://database.ich.org/sites/default/files/E9_Guideline.pdf. Accessed August 3, 2023.

Kalbfleisch, J. D. & Prentice, R. L. (2011). The Statistical Analysis of Failure Time Data. New York: John Wiley.

Kong, F. H. & Slud, E. (1997). Robust covariate-adjusted logrank tests. Biometrika 84, 847–62.

Lin, D. Y. & Wei, L. J. (1989). The robust inference for the Cox proportional hazards model. J. Am. Statist. Assoc. 84, 1074–8.

Lin, W. (2013). Agnostic notes on regression adjustments to experimental data: reexamining Freedman's critique. Ann. Appl. Statist. 7, 295–318.

Lu, X. & Tsiatis, A. A. (2008). Improving the efficiency of the log-rank test using auxiliary covariates. Biometrika 95, 679–94.

Lu, X. & Tsiatis, A. A. (2011). Semiparametric estimation of treatment effect with time-lagged response in the presence of informative censoring. Lifetime Data Anal. 17, 566–93.

Mantel, N. (1966). Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemother. Rep. 50, 163–70.

Moore, K. L. & van der Laan, M. J. (2009). Increasing power in randomized trials with right censored outcomes through covariate adjustment. J. Biopharm. Statist. 19, 1099–131.

Parast, L., Tian, L. & Cai, T. (2014). Landmark estimation of survival and treatment effect in a randomized clinical trial. J. Am. Statist. Assoc. 109, 384–94.

Peto, R., Pike, M. C., Armitage, P., Breslow, N. E., Cox, D. R., Howard, S. V., Mantel, N., McPherson, K., Peto, J. & Smith, P. G. (1976). Design and analysis of randomized clinical trials requiring prolonged observation of each patient. I. Introduction and design. Br. J. Cancer 34, 585–612.

Pocock, S. J. & Simon, R. (1975). Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial. Biometrics 31, 103–15.

R Development Core Team (2024). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0, http://www.R-project.org.

Robins, J. M. & Finkelstein, D. M. (2000). Correcting for noncompliance and dependent censoring in an AIDS clinical trial with inverse probability of censoring weighted (IPCW) log-rank tests. Biometrics 56, 779–88.

Schulz, K. F. & Grimes, D. A. (2002). Generation of allocation sequences in randomised trials: chance, not choice. Lancet 359, 515–9.

Shao, J. (2021). Inference for covariate-adaptive randomization: aspects of methodology and theory (with discussions). Statist. Theory Rel. Fields 5, 172–86.

Taves, D. R. (1974). Minimization: a new method of assigning patients to treatment and control groups. Clin. Pharmacol. Ther. 15, 443–53.

Taves, D. R. (2010). The use of minimization in clinical trials. Contemp. Clin. Trials 31, 180–4.

Tsiatis, A. A., Davidian, M., Zhang, M. & Lu, X. (2008). Covariate adjustment for two-sample treatment comparisons in randomized clinical trials: a principled yet flexible approach. Statist. Med. 27, 4658–77.

Wang, B., Susukida, R., Mojtabai, R., Amin-Esmaeili, M. & Rosenblum, M. (2023). Model-robust inference for clinical trials that improve precision by stratified randomization and covariate adjustment. J. Am. Statist. Assoc. 118, 1152–63.

Ye, T. & Shao, J. (2020). Robust tests for treatment effect in survival analysis under covariate-adaptive randomization. J. R. Statist. Soc. B 82, 1301–23.

Ye, T., Shao, J., Yi, Y. & Zhao, Q. (2022). Toward better practice of covariate adjustment in analyzing randomized clinical trials. J. Am. Statist. Assoc., doi:

Zelen, M. (1974). The randomization and stratification of patients to clinical trials. J. Chronic Dis. 27, 365–75.

Zhang, M. (2015). Robust methods to improve efficiency and reduce bias in estimating survival curves in randomized clinical trials. Lifetime Data Anal. 21, 119–37.

Zhang, M., Tsiatis, A. A. & Davidian, M. (2008). Improving efficiency of inferences in randomized clinical trials using auxiliary covariates. Biometrics 64, 707–15.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/pages/standard-publication-reuse-rights)
