-
PDF
- Split View
-
Views
-
Cite
Cite
Yang Li, Hao Liu, Xiaoshen Wang, Wanzhu Tu, Semi-Parametric Time-to-Event Modelling of Lengths of Hospital Stays, Journal of the Royal Statistical Society Series C: Applied Statistics, Volume 71, Issue 5, November 2022, Pages 1623–1647, https://doi.org/10.1111/rssc.12593
- Share Icon Share
Abstract
Length of stay (LOS) is an essential metric for the quality of hospital care. Published works on LOS analysis have primarily focused on skewed LOS distributions and the influences of patient diagnostic characteristics. Few authors have considered the events that terminate a hospital stay: Both successful discharge and death could end a hospital stay but with completely different implications. Modelling the time to the first occurrence of discharge or death obscures the true nature of LOS. In this research, we propose a structure that simultaneously models the probabilities of discharge and death. The model has a flexible formulation that accounts for both additive and multiplicative effects of factors influencing the occurrence of death and discharge. We present asymptotic properties of the parameter estimates so that valid inference can be performed for the parametric as well as nonparametric model components. Simulation studies confirmed the good finite-sample performance of the proposed method. As the research is motivated by practical issues encountered in LOS analysis, we analysed data from two real clinical studies to showcase the general applicability of the proposed model.
1 INTRODUCTION
Length of hospital stay (LOS), that is, the number of inpatient days, is an important indicator of patient's health and care status. The concept is extensively used in clinical practice and health services research for quantification of resource consumption and quality of care (Chima et al., 1997; Freitas et al., 2012). Prolonged hospitalisation adds care cost, reduces operational efficiency and increases patient stress and risks for hospital-acquired infections (Clarke & Rosen, 2001; Graves et al., 2005). Reducing LOS has become one of the generally accepted policy goals of health care organisations (Pearson et al., 2001). In the United States, Centers for Medicare and Medicaid Services (CMS) use a prospective payment system that reimburses care cost by procedures performed, regardless of hospital days (CMS.gov, 2021), thus creating a great incentive for hospitals to reduce LOS.
Data analyses involving LOS are abundant in health services research. A simple Google Scholar search of ‘hospital length of stay analysis’ yields more than 2.1 million published items. Published analyses of LOS data tended to focus on skewed LOS distributions and employed common statistical models such as logistic regression and time-to-event analysis (Faddy et al., 2009; Marazzi et al., 1998; Vekaria et al., 2021; von Cube et al., 2017). Most do not differentiate the events that terminate a hospital stay. Discharge and death are treated the same, whichever comes first defines the LOS, despite the obvious clinical differences between the two. The situation resembles that described by various competing risk or semi-competing risk models, where the former depicts the occurrence of the first event by cause-specific hazards (Poguntke et al., 2018; Vekaria et al., 2021), and the latter quantifies the transition rates among multiple states with pre-defined subject-specific frailties or copula-based correlation structures (Hsieh et al., 2008; Lee et al., 2015; Peng & Fine, 2007; Xu et al., 2010).
With LOS data, analysts face two challenges: (1) Estimating the hazard function of discharge among survivors in the presence of censoring; (2) accommodating independent variables that are not adequately modelled by the Cox regression model. We present a model that addresses both challenges. Specifically, we describe a structure that simultaneously models the survival probabilities and discharge hazards, without pre-specifying the correlation structure between the two, by extending an approach as advocated by Sun et al. (2004, 2006), Zhao et al. (2011, 2013), Lee et al. (2015), and Wang et al. (2015). To enhance the model's flexibility, we allow independent variable effects to modify both the linear predictor and also the baseline hazard function, as suggested by Lin et al. (1995), Liu et al. (2010), Martinussen and Scheike (2007), and Sarker et al. (2015). To illustrate the proposed method, we analyse data from two clinical applications, one is a randomised clinical trial of anti-thrombotic therapies in patients suffering from stroke, and the other is a group of patients hospitalised for treatment of COVID.
The rest of this paper is organised as follows: In Section 2, we introduce the model, describe the estimation and inference procedures, as well as related theoretical results. In Section 3, we present simulation results. In Section 4, we use the proposed method to analyse data from two clinical studies. We conclude the paper in Section 5 with a few remarks.
2 METHOD
2.1 Model specification
Let and represent the event times for successful discharge and death, both are defined from the date of hospital admission. LOS is the time to or , whichever occurs first, that is, LOS where . When LOS , in-hospital death occurs and the hazard of is terminated. If LOS and the clinical follow-up continues, could still be observed at a future time LOS. We write the counting processes for and as and , respectively, where is the indicator function.
The conventional hazard function for is
For LOS analysis, the above hazard function is appropriate if is subject only to censoring. But with death, the formulation becomes questionable because cannot happen after , that is, whenever . This is a fact that (1) does not reflect.
To describe the hazard of discharge conditional on survival, we write the discharge intensity among survivors as , where
represents the corresponding hazard function. The conditional hazard function in (2) is the same as the ‘cause-specific’ hazard discussed in the competing-risks literature (Prentice et al., 1978). However, unlike the the competing-risks models that focus on the occurrence of or , whichever occurs first, in LOS data, could still happen after if LOS .
To assess the effects of risk factors on , we let and be random covariate vectors of dimensions and , respectively. Letting , , we present a general model as follows:
where is an unknown baseline hazard function, and are pre-specified twice differentiable functions representing the additive and multiplicative covariate effects, and and are the corresponding regression parameters. We assume for any , and , to avoid identifiability issues. To reflect the fact that no live discharge is possible after death, we define . In this model, modifies the baseline hazard function, whereas influences the linear predictor. In practical data analysis, the composition of and should be determined by scientific consideration and hypotheses of interest, as we shall demonstrate in the real data applications.
Model (3) differs from most of the existing models that depict time to the first occurrence of either or (Poguntke et al., 2018; Vekaria et al., 2021), and also from the semi-competing risk models (Lee et al., 2015; Peng & Fine, 2007; Xu et al., 2010) as Model (3) does not require specification of a correlation structure between and . The discharge hazards associated with the covariates are defined by (2), instead of by the marginal distribution of , as done by Peng and Fine (2007) among others. Of note, the existing semi-competing risk models typically require copula-based correlations or shared frailties among successful discharge, in-hospital death, and death after discharge; estimation and inference, therefore, are susceptible to the risk of correlation misspecification. To avoid superimposing an arbitrary dependency structure or distributional assumptions, we extend an approach from event history analyses (Cheng et al., 2015; Sun et al., 2004, 2006; Zhao et al., 2011, 2013). Model (3) describes the conditional hazard function of given , allowing for dependent and without requiring correlation structure specification.
For a special case of model (3), the survival function of can involve a latent variable
where depends on , and satisfies . Other correlation structures between and are also possible but the marginal survival function of would take a different form. In addition to its flexibility in accommodating correlation structures, model (3) provides a desired conditional hazard interpretation, which is not always ascertainable from either the copula (Peng & Fine, 2007) or the frailty semi-competing risk models (Lee et al., 2015; Sun & Shen, 2009; Wang et al., 2015; Zhang et al., 2005, 2007).
The simultaneous inclusion of and creates a structure that covers a broader class of models. Choices of , , , and can be made based on investigators understanding of the variables' influences, and the intended interpretations of the resulting parameters. For example, when or , model (3) is reduced to the Cox proportional hazards model or the Aalen additive hazards model, respectively (Sun et al., 2004).
Model (3) can be used to assess proportional hazards across different levels of ; within each level, the baseline hazard rate can be further modified by , whose effects may deviate from the exponential form. This structure lends enhanced flexibility to parameter interpretation in the LOS analyses. For example, including the comorbidity and treatment indicators in and continuous characteristics (such as age) in gives the comorbidity/treatment indicators' hazard ratio interpretations at a given age. Similarly, for subjects of the same comorbidity profile, represents an additive effect associated with per unit change of age on the baseline hazard, as one year change in age typically does not alter the hazard exponentially. More generally, the model depicts the proportional hazard of at different levels of . In particular, when and , for one-dimensional covariates and , one unit change in leads to the interpretation of as a hazard ratio:
Similarly, one unit change in while keeping constant gives a covariate-dependent hazard difference interpretation:
In the absence of death or when death functions as a noninformative censoring event, model (3) is reduced to the Cox-Aalen regression model (Scheike & Zhang, 2002), except for the additive term . But as we shall demonstrate in the simulation study, unbiased inference cannot be derived from the Cox–Aalen regression by treating death time as noninformative censoring, even with the time-varying covariate effects averaged out over time.
The terminal event time can be modelled by a standard Cox model:
where is the baseline hazard function, is a vector of regression coefficients for the covariates .
Here, we let be a non-informaitve censoring time that is independent of both and given and . Letting , , , , we write independent and identically distributed quadruplets as . Let be a counting process that jumps by one at if and only if and . The survival information for the th subject is contained in , regardless of and whether and are dependent.
2.2 Estimation and inference
For narrative convenience, we discuss estimation and inference of the proposed model when and , while noting that the estimation and inference procedures described below are readily extendable to other forms of and .
2.2.1 Estimation
To estimate the unknown parameters associated with model (3), we define
where and represents the subject-specific survival function, that is,
Under model (3), it is straightforward to show 's are zero-mean stochastic processes. Hence one could estimate , , and by solving the following estimating equations:
and
where and a pre-specified value representing a known finite maximum duration of the study. Combining Equations (5) and (6), estimates of and can be obtained by solving
where
Let be the cumulative baseline hazard function. When and , can be estimated by
based on Equation (5).
We note that corresponding to model (3), the estimation involves , which is in general unknown. On the other hand, it can be estimated under model (4) with right-censored survival data from the study cohort. Specifically, can be estimated by , where
where , is the solution to the equation
, for any vector and .
We denote the true values of and by and , and their estimators by and , solved from (7) with replaced by . Let , , and . The estimate of given by Equation (8) is now updated by
Since the baseline cumulative hazard estimator is not necessarily non-decreasing in . Without altering the asymptotic properties of , we modify the estimator as . Following arguments similar to Lin et al. (1994), we show that and are asymptotically equivalent as .
2.2.2 Prediction
Survival and discharge probabilities are both important clinical endpoints. Patient-specific trajectories of predictive probabilities are useful in aiding hospital management and patient care planning (Li et al., 2021; Li & Cheng, 2016; Sun et al., 2006). Although prediction is not the focus of the current research, we note that the existing semi-competing risk models with shared frailty terms do not lend themselves to a straightforward predictive probability interpretation unless the frailty variables are integrated out over pre-specified prior distributions (Lee et al., 2015). From models (3) and (4), predictive probabilities can be ascertained in a straightforward fashion. More specifically, for a patient with covariates , the probability of death can be calculated from . Suppose the patient is alive at time , the cumulative hazard that the patient has a successful discharge at time can be estimated from:
where , , , are the estimators obtained from Section 2.2.1. The predicted cumulative incidence probability is then given by .
2.2.3 Asymptotic properties
The following theorems state the asymptotic properties of and .
Theorem 1Under regularity conditions C1–C6 in the Appendix, andare consistent estimators ofand, respectively. Additionally, converges weakly to a zero-mean normal distribution with a covariance matrixconsistently estimated by, where
Theorem 2Under regularity conditions C1–C6 in the Appendix, converges in probability touniformly inandconverges weakly to a zero-mean Gaussian process with a covariance function that can be consistently estimated byfor, where
besides other quantities defined in Theorem1. In particular, , and the covariancecan be estimated consistently by. Theorems1and2imply that the SEs of, andcan be consistently estimated by, and , respectively, where, are diagonal entries ofin Theorem1.
3 SIMULATION STUDIES
We conducted an extensive simulation study to assess the finite sample performance of the proposed estimators , and . We evaluated the estimation performance under the sample sizes of and 200, with 1000 replication for each parameter setting.
3.1 Data generation
We considered two major scenarios:
(i) and were independent. for subject was generated with the hazard function:
where and were generated from the uniform and Bernoulli distribution with a probability of success of 0.5, respectively. was generated from the Cox model with covariates with a baseline hazard .
(ii) and were correlated. Besides sharing covariates and generated in the same way as in Scenario (i), we involved in the distribution functions of and a latent variable from the positive stable distribution with Laplace transformation for for subject (Luong & Doray, 2009). From and , had a hazard function . It can be shown that satisfies model (4) with . The discharge time was generated from survival function
where
. One can show that . Hence among survivors,
which satisfies model (3). Different values of , and yield event times and with various magnitudes of correlation. In all simulated settings, and were negatively correlated with the empirical estimates of the Spearman correlation coefficient ranged around to (Schemper et al., 2013). The censoring time was taken to be a common value at , which represented the study duration.
3.2 Metrics of evaluation
We calculated summaries of estimation bias (Bias), sample standard error (SSE), averaged standard error estimate (SEE) according to the theorems, and empirical 95% coverage probability (CP). When the estimators worked as the theorems dictated, the evaluation metrics were expected to show a bias approaching zero, strong agreement between SSE and SEE, and CP close to the 95% nominal level.
3.3 Simulation results
The estimation results for and are in Tables 1 and 2 and Tables S1–S3, where Table 1 corresponds to settings of independent and in Scenario (i) with , and Tables 2 and S1–S3 represent correlated and . The Spearman correlation coefficient between and ranged around to as described in Scenario (ii). All proposed estimators performed well with small biases, values of SSEs and SEEs close to each other, and empirical CPs approached the nominal level. The SEEs decreased with an increasing sample size, showing improved estimation efficiency.
Simulation results on and for Scenario (i) when and were generated independently
. | . | . | ||||||
---|---|---|---|---|---|---|---|---|
n . | Bias . | SEE . | SSE . | CP . | Bias . | SEE . | SSE . | CP . |
= | ||||||||
100 | 0.341 | 0.358 | 0.944 | 0.197 | 0.200 | 0.953 | ||
200 | 0.238 | 0.237 | 0.955 | 0.008 | 0.138 | 0.143 | 0.948 | |
= | ||||||||
100 | 0.028 | 0.374 | 0.391 | 0.945 | 0.201 | 0.220 | 0.929 | |
200 | 0.008 | 0.260 | 0.269 | 0.948 | 0.003 | 0.140 | 0.143 | 0.945 |
= | ||||||||
100 | 0.005 | 0.330 | 0.356 | 0.939 | 0.214 | 0.225 | 0.929 | |
200 | 0.234 | 0.239 | 0.943 | 0.151 | 0.155 | 0.936 | ||
= | ||||||||
100 | 0.012 | 0.361 | 0.397 | 0.923 | 0.215 | 0.227 | 0.938 | |
200 | 0.005 | 0.255 | 0.267 | 0.937 | 0.152 | 0.155 | 0.947 | |
= | ||||||||
100 | 0.341 | 0.361 | 0.932 | 0.004 | 0.199 | 0.207 | 0.943 | |
200 | 0.005 | 0.238 | 0.240 | 0.953 | 0.007 | 0.139 | 0.141 | 0.952 |
= | ||||||||
100 | 0.017 | 0.376 | 0.398 | 0.937 | 0.006 | 0.202 | 0.200 | 0.957 |
200 | 0.261 | 0.254 | 0.942 | 0.141 | 0.137 | 0.956 | ||
= | ||||||||
100 | 0.333 | 0.356 | 0.922 | 0.217 | 0.225 | 0.932 | ||
200 | 0.012 | 0.234 | 0.240 | 0.937 | 0.152 | 0.157 | 0.940 | |
= | ||||||||
100 | 0.029 | 0.362 | 0.389 | 0.932 | 0.218 | 0.233 | 0.930 | |
200 | 0.009 | 0.254 | 0.255 | 0.947 | 0.153 | 0.152 | 0.954 |
. | . | . | ||||||
---|---|---|---|---|---|---|---|---|
n . | Bias . | SEE . | SSE . | CP . | Bias . | SEE . | SSE . | CP . |
= | ||||||||
100 | 0.341 | 0.358 | 0.944 | 0.197 | 0.200 | 0.953 | ||
200 | 0.238 | 0.237 | 0.955 | 0.008 | 0.138 | 0.143 | 0.948 | |
= | ||||||||
100 | 0.028 | 0.374 | 0.391 | 0.945 | 0.201 | 0.220 | 0.929 | |
200 | 0.008 | 0.260 | 0.269 | 0.948 | 0.003 | 0.140 | 0.143 | 0.945 |
= | ||||||||
100 | 0.005 | 0.330 | 0.356 | 0.939 | 0.214 | 0.225 | 0.929 | |
200 | 0.234 | 0.239 | 0.943 | 0.151 | 0.155 | 0.936 | ||
= | ||||||||
100 | 0.012 | 0.361 | 0.397 | 0.923 | 0.215 | 0.227 | 0.938 | |
200 | 0.005 | 0.255 | 0.267 | 0.937 | 0.152 | 0.155 | 0.947 | |
= | ||||||||
100 | 0.341 | 0.361 | 0.932 | 0.004 | 0.199 | 0.207 | 0.943 | |
200 | 0.005 | 0.238 | 0.240 | 0.953 | 0.007 | 0.139 | 0.141 | 0.952 |
= | ||||||||
100 | 0.017 | 0.376 | 0.398 | 0.937 | 0.006 | 0.202 | 0.200 | 0.957 |
200 | 0.261 | 0.254 | 0.942 | 0.141 | 0.137 | 0.956 | ||
= | ||||||||
100 | 0.333 | 0.356 | 0.922 | 0.217 | 0.225 | 0.932 | ||
200 | 0.012 | 0.234 | 0.240 | 0.937 | 0.152 | 0.157 | 0.940 | |
= | ||||||||
100 | 0.029 | 0.362 | 0.389 | 0.932 | 0.218 | 0.233 | 0.930 | |
200 | 0.009 | 0.254 | 0.255 | 0.947 | 0.153 | 0.152 | 0.954 |
Note: Summaries of biasses, sample standard errors (SSE), standard error estimates (SEE) and coverage probabilities (CP) for and 200, based on 1000 simulations.
Simulation results on and for Scenario (i) when and were generated independently
. | . | . | ||||||
---|---|---|---|---|---|---|---|---|
n . | Bias . | SEE . | SSE . | CP . | Bias . | SEE . | SSE . | CP . |
= | ||||||||
100 | 0.341 | 0.358 | 0.944 | 0.197 | 0.200 | 0.953 | ||
200 | 0.238 | 0.237 | 0.955 | 0.008 | 0.138 | 0.143 | 0.948 | |
= | ||||||||
100 | 0.028 | 0.374 | 0.391 | 0.945 | 0.201 | 0.220 | 0.929 | |
200 | 0.008 | 0.260 | 0.269 | 0.948 | 0.003 | 0.140 | 0.143 | 0.945 |
= | ||||||||
100 | 0.005 | 0.330 | 0.356 | 0.939 | 0.214 | 0.225 | 0.929 | |
200 | 0.234 | 0.239 | 0.943 | 0.151 | 0.155 | 0.936 | ||
= | ||||||||
100 | 0.012 | 0.361 | 0.397 | 0.923 | 0.215 | 0.227 | 0.938 | |
200 | 0.005 | 0.255 | 0.267 | 0.937 | 0.152 | 0.155 | 0.947 | |
= | ||||||||
100 | 0.341 | 0.361 | 0.932 | 0.004 | 0.199 | 0.207 | 0.943 | |
200 | 0.005 | 0.238 | 0.240 | 0.953 | 0.007 | 0.139 | 0.141 | 0.952 |
= | ||||||||
100 | 0.017 | 0.376 | 0.398 | 0.937 | 0.006 | 0.202 | 0.200 | 0.957 |
200 | 0.261 | 0.254 | 0.942 | 0.141 | 0.137 | 0.956 | ||
= | ||||||||
100 | 0.333 | 0.356 | 0.922 | 0.217 | 0.225 | 0.932 | ||
200 | 0.012 | 0.234 | 0.240 | 0.937 | 0.152 | 0.157 | 0.940 | |
= | ||||||||
100 | 0.029 | 0.362 | 0.389 | 0.932 | 0.218 | 0.233 | 0.930 | |
200 | 0.009 | 0.254 | 0.255 | 0.947 | 0.153 | 0.152 | 0.954 |
. | . | . | ||||||
---|---|---|---|---|---|---|---|---|
n . | Bias . | SEE . | SSE . | CP . | Bias . | SEE . | SSE . | CP . |
= | ||||||||
100 | 0.341 | 0.358 | 0.944 | 0.197 | 0.200 | 0.953 | ||
200 | 0.238 | 0.237 | 0.955 | 0.008 | 0.138 | 0.143 | 0.948 | |
= | ||||||||
100 | 0.028 | 0.374 | 0.391 | 0.945 | 0.201 | 0.220 | 0.929 | |
200 | 0.008 | 0.260 | 0.269 | 0.948 | 0.003 | 0.140 | 0.143 | 0.945 |
= | ||||||||
100 | 0.005 | 0.330 | 0.356 | 0.939 | 0.214 | 0.225 | 0.929 | |
200 | 0.234 | 0.239 | 0.943 | 0.151 | 0.155 | 0.936 | ||
= | ||||||||
100 | 0.012 | 0.361 | 0.397 | 0.923 | 0.215 | 0.227 | 0.938 | |
200 | 0.005 | 0.255 | 0.267 | 0.937 | 0.152 | 0.155 | 0.947 | |
= | ||||||||
100 | 0.341 | 0.361 | 0.932 | 0.004 | 0.199 | 0.207 | 0.943 | |
200 | 0.005 | 0.238 | 0.240 | 0.953 | 0.007 | 0.139 | 0.141 | 0.952 |
= | ||||||||
100 | 0.017 | 0.376 | 0.398 | 0.937 | 0.006 | 0.202 | 0.200 | 0.957 |
200 | 0.261 | 0.254 | 0.942 | 0.141 | 0.137 | 0.956 | ||
= | ||||||||
100 | 0.333 | 0.356 | 0.922 | 0.217 | 0.225 | 0.932 | ||
200 | 0.012 | 0.234 | 0.240 | 0.937 | 0.152 | 0.157 | 0.940 | |
= | ||||||||
100 | 0.029 | 0.362 | 0.389 | 0.932 | 0.218 | 0.233 | 0.930 | |
200 | 0.009 | 0.254 | 0.255 | 0.947 | 0.153 | 0.152 | 0.954 |
Note: Summaries of biasses, sample standard errors (SSE), standard error estimates (SEE) and coverage probabilities (CP) for and 200, based on 1000 simulations.
Simulation results on and for Scenario (ii) when correlated and were generated, , and
. | . | . | ||||||
---|---|---|---|---|---|---|---|---|
n . | Bias . | SEE . | SSE . | CP . | Bias . | SEE . | SSE . | CP . |
= | ||||||||
100 | 0.014 | 0.401 | 0.419 | 0.939 | 0.203 | 0.208 | 0.952 | |
200 | 0.281 | 0.284 | 0.950 | 0.002 | 0.142 | 0.145 | 0.942 | |
= | ||||||||
100 | 0.026 | 0.437 | 0.453 | 0.941 | 0.205 | 0.210 | 0.956 | |
200 | 0.306 | 0.311 | 0.950 | 0.001 | 0.144 | 0.148 | 0.945 | |
= | ||||||||
100 | 0.009 | 0.400 | 0.424 | 0.930 | 0.218 | 0.223 | 0.943 | |
200 | 0.283 | 0.288 | 0.939 | 0.154 | 0.159 | 0.940 | ||
= | ||||||||
100 | 0.025 | 0.435 | 0.463 | 0.935 | 0.218 | 0.222 | 0.947 | |
200 | 0.308 | 0.311 | 0.941 | 0.154 | 0.158 | 0.936 | ||
= | ||||||||
100 | 0.015 | 0.405 | 0.421 | 0.940 | 0.205 | 0.210 | 0.950 | |
200 | 0.284 | 0.287 | 0.952 | 0.143 | 0.146 | 0.946 | ||
= | ||||||||
100 | 0.030 | 0.440 | 0.455 | 0.945 | 0.207 | 0.213 | 0.951 | |
200 | 0.308 | 0.315 | 0.947 | 0.145 | 0.150 | 0.943 | ||
= | ||||||||
100 | 0.012 | 0.402 | 0.425 | 0.935 | 0.220 | 0.225 | 0.939 | |
200 | 0.284 | 0.290 | 0.943 | 0.155 | 0.161 | 0.933 | ||
= | ||||||||
100 | 0.029 | 0.437 | 0.461 | 0.937 | 0.219 | 0.223 | 0.948 | |
200 | 0.308 | 0.313 | 0.941 | 0.155 | 0.160 | 0.933 |
. | . | . | ||||||
---|---|---|---|---|---|---|---|---|
n . | Bias . | SEE . | SSE . | CP . | Bias . | SEE . | SSE . | CP . |
= | ||||||||
100 | 0.014 | 0.401 | 0.419 | 0.939 | 0.203 | 0.208 | 0.952 | |
200 | 0.281 | 0.284 | 0.950 | 0.002 | 0.142 | 0.145 | 0.942 | |
= | ||||||||
100 | 0.026 | 0.437 | 0.453 | 0.941 | 0.205 | 0.210 | 0.956 | |
200 | 0.306 | 0.311 | 0.950 | 0.001 | 0.144 | 0.148 | 0.945 | |
= | ||||||||
100 | 0.009 | 0.400 | 0.424 | 0.930 | 0.218 | 0.223 | 0.943 | |
200 | 0.283 | 0.288 | 0.939 | 0.154 | 0.159 | 0.940 | ||
= | ||||||||
100 | 0.025 | 0.435 | 0.463 | 0.935 | 0.218 | 0.222 | 0.947 | |
200 | 0.308 | 0.311 | 0.941 | 0.154 | 0.158 | 0.936 | ||
= | ||||||||
100 | 0.015 | 0.405 | 0.421 | 0.940 | 0.205 | 0.210 | 0.950 | |
200 | 0.284 | 0.287 | 0.952 | 0.143 | 0.146 | 0.946 | ||
= | ||||||||
100 | 0.030 | 0.440 | 0.455 | 0.945 | 0.207 | 0.213 | 0.951 | |
200 | 0.308 | 0.315 | 0.947 | 0.145 | 0.150 | 0.943 | ||
= | ||||||||
100 | 0.012 | 0.402 | 0.425 | 0.935 | 0.220 | 0.225 | 0.939 | |
200 | 0.284 | 0.290 | 0.943 | 0.155 | 0.161 | 0.933 | ||
= | ||||||||
100 | 0.029 | 0.437 | 0.461 | 0.937 | 0.219 | 0.223 | 0.948 | |
200 | 0.308 | 0.313 | 0.941 | 0.155 | 0.160 | 0.933 |
Note: Summaries of biases, sample standard errors (SSE), standard error estimates (SEE) and coverage probabilities (CP) for and 200, based on 1000 simulations.
Simulation results on and for Scenario (ii) when correlated and were generated, , and
. | . | . | ||||||
---|---|---|---|---|---|---|---|---|
n . | Bias . | SEE . | SSE . | CP . | Bias . | SEE . | SSE . | CP . |
= | ||||||||
100 | 0.014 | 0.401 | 0.419 | 0.939 | 0.203 | 0.208 | 0.952 | |
200 | 0.281 | 0.284 | 0.950 | 0.002 | 0.142 | 0.145 | 0.942 | |
= | ||||||||
100 | 0.026 | 0.437 | 0.453 | 0.941 | 0.205 | 0.210 | 0.956 | |
200 | 0.306 | 0.311 | 0.950 | 0.001 | 0.144 | 0.148 | 0.945 | |
= | ||||||||
100 | 0.009 | 0.400 | 0.424 | 0.930 | 0.218 | 0.223 | 0.943 | |
200 | 0.283 | 0.288 | 0.939 | 0.154 | 0.159 | 0.940 | ||
= | ||||||||
100 | 0.025 | 0.435 | 0.463 | 0.935 | 0.218 | 0.222 | 0.947 | |
200 | 0.308 | 0.311 | 0.941 | 0.154 | 0.158 | 0.936 | ||
= | ||||||||
100 | 0.015 | 0.405 | 0.421 | 0.940 | 0.205 | 0.210 | 0.950 | |
200 | 0.284 | 0.287 | 0.952 | 0.143 | 0.146 | 0.946 | ||
= | ||||||||
100 | 0.030 | 0.440 | 0.455 | 0.945 | 0.207 | 0.213 | 0.951 | |
200 | 0.308 | 0.315 | 0.947 | 0.145 | 0.150 | 0.943 | ||
= | ||||||||
100 | 0.012 | 0.402 | 0.425 | 0.935 | 0.220 | 0.225 | 0.939 | |
200 | 0.284 | 0.290 | 0.943 | 0.155 | 0.161 | 0.933 | ||
= | ||||||||
100 | 0.029 | 0.437 | 0.461 | 0.937 | 0.219 | 0.223 | 0.948 | |
200 | 0.308 | 0.313 | 0.941 | 0.155 | 0.160 | 0.933 |
. | . | . | ||||||
---|---|---|---|---|---|---|---|---|
n . | Bias . | SEE . | SSE . | CP . | Bias . | SEE . | SSE . | CP . |
= | ||||||||
100 | 0.014 | 0.401 | 0.419 | 0.939 | 0.203 | 0.208 | 0.952 | |
200 | 0.281 | 0.284 | 0.950 | 0.002 | 0.142 | 0.145 | 0.942 | |
= | ||||||||
100 | 0.026 | 0.437 | 0.453 | 0.941 | 0.205 | 0.210 | 0.956 | |
200 | 0.306 | 0.311 | 0.950 | 0.001 | 0.144 | 0.148 | 0.945 | |
= | ||||||||
100 | 0.009 | 0.400 | 0.424 | 0.930 | 0.218 | 0.223 | 0.943 | |
200 | 0.283 | 0.288 | 0.939 | 0.154 | 0.159 | 0.940 | ||
= | ||||||||
100 | 0.025 | 0.435 | 0.463 | 0.935 | 0.218 | 0.222 | 0.947 | |
200 | 0.308 | 0.311 | 0.941 | 0.154 | 0.158 | 0.936 | ||
= | ||||||||
100 | 0.015 | 0.405 | 0.421 | 0.940 | 0.205 | 0.210 | 0.950 | |
200 | 0.284 | 0.287 | 0.952 | 0.143 | 0.146 | 0.946 | ||
= | ||||||||
100 | 0.030 | 0.440 | 0.455 | 0.945 | 0.207 | 0.213 | 0.951 | |
200 | 0.308 | 0.315 | 0.947 | 0.145 | 0.150 | 0.943 | ||
= | ||||||||
100 | 0.012 | 0.402 | 0.425 | 0.935 | 0.220 | 0.225 | 0.939 | |
200 | 0.284 | 0.290 | 0.943 | 0.155 | 0.161 | 0.933 | ||
= | ||||||||
100 | 0.029 | 0.437 | 0.461 | 0.937 | 0.219 | 0.223 | 0.948 | |
200 | 0.308 | 0.313 | 0.941 | 0.155 | 0.160 | 0.933 |
Note: Summaries of biases, sample standard errors (SSE), standard error estimates (SEE) and coverage probabilities (CP) for and 200, based on 1000 simulations.
In addition to the parametric estimators, we also calculated the pointwise SSE, SEE and CP for the non-parametric estimator , which is asymptotically equivalent to as defined in Equation (10). Observations showed a similar performance as with the parametric estimators. For example, Figure 1 shows the pointwise estimates on when = , as one set-up in Table 2. We noted that had generally accurate SEEs, and the empirical coverage probabilities fluctuated closely around the nominal-level. When the sample size increased, the estimates showed improved efficiency with reduced standard errors. To gauge the magnitude of bias over time, we calculated the integrated squared bias (ISB) defined as
where , ,…, are equally spaced time points separated by , ,
and is the mth replicate of . The ISB values were 0.0022 and 0.0019, respectively, when and . The results indicated the 's were unbiassed.
![Pointwise SE estimates and the coverage probabilities for Scenario (ii) when T and U are correlated, n=100 and 200, β=0, θ=0.5, γ=0, based on 1000 simulations. CP, coverage probabilities; SEE, standard error estimates; SSE, sample standard errors [Colour figure can be viewed at https://dbpia.nl.go.kr]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssc/71/5/10.1111_rssc.12593/1/m_rssc_71_5_1623_fig-1.jpeg?Expires=1747952872&Signature=xMszpsWLXPKf63XcLWa9CfSR5T6Zs0EjFDRZRl-wqRY5xnO4wHEmoLBM85wkSMC24s4WZKqp3Apinhqd5X5L~ZtCGKsXMjtd8DjrERIzT8VRaxo4QEU-KnuSPduHCBXKN~VZsyWdhit2ehP6BP3n9yBI7z~83AK4ssOfNSd9~6MTlCYuHJpAj7zjoBDdJxJqFLYK7Vp9i3bc0MAdFqL~wnbUC0qn-V~eRF2PNc2YvDgZPf3r2xmuj6ehSk1Mssi75cqWZvt1gAFTT69uSWRvDEiR170UAQTEYxHLRH8fFOWa~3SpDeEOlRV99G2bSAmsR6PBeZtQ6Me4UiCKiOdtVw__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Pointwise SE estimates and the coverage probabilities for Scenario (ii) when and are correlated, and 200, , , , based on 1000 simulations. CP, coverage probabilities; SEE, standard error estimates; SSE, sample standard errors [Colour figure can be viewed at https://dbpia.nl.go.kr]
We also note that when there were no deaths or when was treated as non-informative censoring, Scheike and Zhang (2002) proposed the following model:
coupled with an estimation procedure for and cumulative time-varying effects , where . This model yields a parameter estimator , where is the largest observed. We refer to as the SZ2002 estimator, which is different from the estimator we proposed under the conditional hazard in (2). We presented simulation results showing that the SZ2002 estimator has considerably larger biases whereas our estimator performed well in both simulation scenarios (i and ii). See Table 3. The results, therefore, confirm that treating deaths as noninformative censoring events leads to biased inference for model (3).
Comparisons of the biases associated with the proposed estimator and a modified SZ2002 estimator, for Scenarios (i) and (ii) when
. | Proposed . | Modified SZ2002 . | ||
---|---|---|---|---|
. | . | . | . | . |
Scenario (i) | ||||
0.029 | 0.820 | 1.241 | ||
0.009 | 0.275 | |||
Scenario (ii) | ||||
0.015 | 0.029 | 0.264 | ||
. | Proposed . | Modified SZ2002 . | ||
---|---|---|---|---|
. | . | . | . | . |
Scenario (i) | ||||
0.029 | 0.820 | 1.241 | ||
0.009 | 0.275 | |||
Scenario (ii) | ||||
0.015 | 0.029 | 0.264 | ||
Comparisons of the biases associated with the proposed estimator and a modified SZ2002 estimator, for Scenarios (i) and (ii) when
. | Proposed . | Modified SZ2002 . | ||
---|---|---|---|---|
. | . | . | . | . |
Scenario (i) | ||||
0.029 | 0.820 | 1.241 | ||
0.009 | 0.275 | |||
Scenario (ii) | ||||
0.015 | 0.029 | 0.264 | ||
. | Proposed . | Modified SZ2002 . | ||
---|---|---|---|---|
. | . | . | . | . |
Scenario (i) | ||||
0.029 | 0.820 | 1.241 | ||
0.009 | 0.275 | |||
Scenario (ii) | ||||
0.015 | 0.029 | 0.264 | ||
3.4 Sensitivity analyses
Finally, we assessed the estimation robustness against covariate mis-specification by simulating various situations where in model (4) is incorrectly specified. We generated correlated and for Scenario (ii) with , and ; Bernoulli(). was misspecified in various ways: (i) Inclusion of an extra variable Uniform, that is, using when the true . (ii) Inclusion of a wrong variable. We used when the true , where Uniform and Uniform. (iii) Leaving out a variable in . We used or when the true , where Uniform. Simulation results are presented in Tables S4 and S5. Briefly, estimates of and were not strongly influenced by the misspecification of , probably because only the survival function estimate () but not was needed in the estimating equations for and . For all tested settings, the performance appeared to improve with sample size.
4 REAL DATA APPLICATIONS
The model proposed in the paper is general enough to be applicable to a broad range of LOS analyses. Here we present analyses of data from two clinical studies, each of which requires some of the features of the proposed model. As discussed in earlier sections, although existing frailty and copula-based semi-competing risk models can be used to analyse such data, those methods do not accommodate deaths that occurred after discharge, nor do they offer the conditional hazard interpretation, and often require pre-specification of correlation structures that are not easily verifiable in practical data analysis.
4.1 LOS with the International Stroke Trial
We first considered the International Stroke Trial (IST), a randomised clinical trial of anti-thrombotic therapies in patients suffering from strokes (Group, 1997; Sandercock et al., 2011). The study followed a factorial design, testing the effects of various combinations of aspirin and heparin at different dose levels on the survival of stroke patients. Treatment started upon enrollment and continued for 14 days or until hospital discharge or death, whichever came first. Among the 6415 trial participants whose medical images showed visible infarctions at randomisation, 1351 had died before discharge; 3159 had been successfully discharged, of which 296 died after discharge. The maximum follow-up length was 567 days, at which time patients without a discharge or death record were censored.
The IST trial has four treatment groups. Among the 6415 patients, 1662 were on aspirin, 1564 on heparin, 1675 on both, and 1514 on neither. In this secondary analysis, we focused on the time from treatment initiation to hospital discharge, with death as the terminal event. Indicators of treatment groups were the main independent variables of interest, with other patient characteristics such as age and conscious state at enrollment as relevant covariates (Kwah & Diong, 2014).
We first examined the treatment effects on death while adjusting for the influences of age and the baseline level of consciousness by using the following Cox model:
where represented the hazard of death. Because of the wide age range (16–98 years) of the trial participants, we categorised age by tertiles. A total of six covariates were included in : Age , Age , full consciousness at randomisation (CONSC), aspirin alone (ASP), heparin alone (HEP), and both aspirin and heparin (Both). All independent variables were binary. Table 4 shows that being in the younger age groups and being fully alert at randomisation were associated with reduced hazards of death. Aspirin reduced the hazards of death slightly but its effect did not reach the level of statistical significance.
Results from Cox regression analysis containing independent variable effects on death hazards of the 6415 patients whose medical images showed visible infarctions at randomisation in International Stroke Trial
. | EST (SE) . | -value . | 95%CI . |
---|---|---|---|
Age | 0.0000 | () | |
Age | 0.0000 | () | |
CONSC | 0.0000 | () | |
ASP | 0.6978 | (, 0.1093) | |
HEP | 0.0082 (0.0684) | 0.9045 | (, 0.1422) |
Both | 0.7124 | (, 0.1113) |
. | EST (SE) . | -value . | 95%CI . |
---|---|---|---|
Age | 0.0000 | () | |
Age | 0.0000 | () | |
CONSC | 0.0000 | () | |
ASP | 0.6978 | (, 0.1093) | |
HEP | 0.0082 (0.0684) | 0.9045 | (, 0.1422) |
Both | 0.7124 | (, 0.1113) |
Abbreviations: 95% CI, 95% confidence interval; EST, point estimate; SE, standard error estimate.
Results from Cox regression analysis containing independent variable effects on death hazards of the 6415 patients whose medical images showed visible infarctions at randomisation in International Stroke Trial
. | EST (SE) . | -value . | 95%CI . |
---|---|---|---|
Age | 0.0000 | () | |
Age | 0.0000 | () | |
CONSC | 0.0000 | () | |
ASP | 0.6978 | (, 0.1093) | |
HEP | 0.0082 (0.0684) | 0.9045 | (, 0.1422) |
Both | 0.7124 | (, 0.1113) |
. | EST (SE) . | -value . | 95%CI . |
---|---|---|---|
Age | 0.0000 | () | |
Age | 0.0000 | () | |
CONSC | 0.0000 | () | |
ASP | 0.6978 | (, 0.1093) | |
HEP | 0.0082 (0.0684) | 0.9045 | (, 0.1422) |
Both | 0.7124 | (, 0.1113) |
Abbreviations: 95% CI, 95% confidence interval; EST, point estimate; SE, standard error estimate.
We then examined LOS with successful discharge by fitting the model below:
We determined the composition of and primarily based on the objective of the analysis, which is to determine the effects of the four treatments on LOS with successful discharge. We, therefore, included the treatment indicators in to retain the hazard ratio interpretation. Age and levels of consciousness were included in as adjustment factors to the baseline hazard. Interestingly, previous studies suggested that age was a poor predictor of functional recovery from stroke (Kugler et al., 2003) and levels of consciousness correlated with various stroke outcomes but their predictive power was not high (Li et al., 2020). The studies have not tested hypotheses whether age and conscientious state alter the baseline hazards. To test, we included them in as modifiers of the baseline hazards. Analytical results supported the hypothesis that younger patients and those that had been fully alert at the randomisation had greater baseline hazards for a successful discharge (shorter LOS). For the treatment effects, the analysis showed that the average treatment effects of aspirin and heparin corresponded to shorter LOS, but the effects did not reach the level of significance. The findings were generally consistent with the findings of the primary paper, which did not find significant effects for either heparin or aspirin on mortality (Group, 1997). Detailed results are presented in the left panel of Table 5.
Results from the proposed model containing independent variable effects on length of stay among survivors from 6415 International Stroke Trial participants
. | Proposed method . | Scheike and Zhang (2002) . | ||||
---|---|---|---|---|---|---|
. | EST (SE) . | -value . | 95% CI . | EST (SE) . | -value . | 95% CI . |
Age | 0.0010 (0.0002) | 0.0000 | (0.0006, 0.0013) | — | — | — |
Age | 0.0004 (0.0002) | 0.0132 | (0.0001, 0.0007) | — | — | — |
CONSC | 0.0012 (0.0001) | 0.0000 | (0.0009, 0.0015) | — | — | — |
ASP | 0.0993 (0.0534) | 0.0631 | (, 0.2041) | 0.1105 (0.0502) | 0.0276 | (0.0122, 0.2088) |
HEP | 0.0256 (0.0534) | 0.6319 | (, 0.1301) | 0.0282 (0.0360) | 0.4336 | (, 0.0986) |
Both | 0.0099 (0.0540) | 0.8542 | (, 0.1158) | 0.0249 (0.0359) | 0.4890 | (, 0.0953) |
. | Proposed method . | Scheike and Zhang (2002) . | ||||
---|---|---|---|---|---|---|
. | EST (SE) . | -value . | 95% CI . | EST (SE) . | -value . | 95% CI . |
Age | 0.0010 (0.0002) | 0.0000 | (0.0006, 0.0013) | — | — | — |
Age | 0.0004 (0.0002) | 0.0132 | (0.0001, 0.0007) | — | — | — |
CONSC | 0.0012 (0.0001) | 0.0000 | (0.0009, 0.0015) | — | — | — |
ASP | 0.0993 (0.0534) | 0.0631 | (, 0.2041) | 0.1105 (0.0502) | 0.0276 | (0.0122, 0.2088) |
HEP | 0.0256 (0.0534) | 0.6319 | (, 0.1301) | 0.0282 (0.0360) | 0.4336 | (, 0.0986) |
Both | 0.0099 (0.0540) | 0.8542 | (, 0.1158) | 0.0249 (0.0359) | 0.4890 | (, 0.0953) |
Notes: Age groups and conscious status were included in as additive effects, and treatment groups included in as multiplicative effects. The left panel presents the results from the proposed model and the right panel from Scheike and Zhang's (2002) model by treating deaths as noninformative censoring events.
Abbreviations: 95% CI, 95% confidence interval; EST, point estimate; SE, standard error estimate.
Results from the proposed model containing independent variable effects on length of stay among survivors from 6415 International Stroke Trial participants
. | Proposed method . | Scheike and Zhang (2002) . | ||||
---|---|---|---|---|---|---|
. | EST (SE) . | -value . | 95% CI . | EST (SE) . | -value . | 95% CI . |
Age | 0.0010 (0.0002) | 0.0000 | (0.0006, 0.0013) | — | — | — |
Age | 0.0004 (0.0002) | 0.0132 | (0.0001, 0.0007) | — | — | — |
CONSC | 0.0012 (0.0001) | 0.0000 | (0.0009, 0.0015) | — | — | — |
ASP | 0.0993 (0.0534) | 0.0631 | (, 0.2041) | 0.1105 (0.0502) | 0.0276 | (0.0122, 0.2088) |
HEP | 0.0256 (0.0534) | 0.6319 | (, 0.1301) | 0.0282 (0.0360) | 0.4336 | (, 0.0986) |
Both | 0.0099 (0.0540) | 0.8542 | (, 0.1158) | 0.0249 (0.0359) | 0.4890 | (, 0.0953) |
. | Proposed method . | Scheike and Zhang (2002) . | ||||
---|---|---|---|---|---|---|
. | EST (SE) . | -value . | 95% CI . | EST (SE) . | -value . | 95% CI . |
Age | 0.0010 (0.0002) | 0.0000 | (0.0006, 0.0013) | — | — | — |
Age | 0.0004 (0.0002) | 0.0132 | (0.0001, 0.0007) | — | — | — |
CONSC | 0.0012 (0.0001) | 0.0000 | (0.0009, 0.0015) | — | — | — |
ASP | 0.0993 (0.0534) | 0.0631 | (, 0.2041) | 0.1105 (0.0502) | 0.0276 | (0.0122, 0.2088) |
HEP | 0.0256 (0.0534) | 0.6319 | (, 0.1301) | 0.0282 (0.0360) | 0.4336 | (, 0.0986) |
Both | 0.0099 (0.0540) | 0.8542 | (, 0.1158) | 0.0249 (0.0359) | 0.4890 | (, 0.0953) |
Notes: Age groups and conscious status were included in as additive effects, and treatment groups included in as multiplicative effects. The left panel presents the results from the proposed model and the right panel from Scheike and Zhang's (2002) model by treating deaths as noninformative censoring events.
Abbreviations: 95% CI, 95% confidence interval; EST, point estimate; SE, standard error estimate.
For comparison, we also analysed the LOS data by treating death as non-informative censoring by using the model described by Scheike and Zhang (2002):
where and were defined similarly as in model (16). The cumulative coefficient estimates of are presented in Figures S1 and S2, and the estimates of elements in the right panel of Table 5. The analysis showed a significant effect of aspirin in reducing LOS. The finding, however, had a questionable validity because death was treated as non-informative censoring, and the hazards of discharge should be interpreted in the full sample which included people that had died. To characterise the hazards of discharge, it is more justified to use the conditional hazards in model (16) per definition (2). On the other hand, the hazards of discharge could be underestimated among survivors if death is ignored. This may explain why most positive estimates from the right panel are slightly larger than the corresponding estimates on the left.
As an illustration for prediction, we presented the trajectories of survival and successful discharge probabilities for patients with otherwise comparable health conditions and mortality rates as within the IST cohort. For example, for a patient of 69 years of age, drowsy at admission and assigned to aspirin (Patient #1), and another patient aged 63, fully alert at admission and assigned heparin (Patient #2), Figure 2 present their death and successful discharge probabilities within 90 days. It can be seen the two sets of different covariates lead to quite different trajectories, where Patient #2 had overall lower death risks and higher probabilities of successful discharge.
![Predictive trajectories of (a) death probabilities and (b) successful discharge probabilities for observed International Stroke Trial covariates [Colour figure can be viewed at https://dbpia.nl.go.kr]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssc/71/5/10.1111_rssc.12593/1/m_rssc_71_5_1623_fig-2.jpeg?Expires=1747952872&Signature=MvLg1HrYFSTwVj6IzCQ49-~dsAqgHmtUoUnRfWfwO84PirTNn2gD3vSNflOeFq-8ah7SwV27z0vxKzwQFCpXEsdoIHGhNKSlC7KgAEJh3FMgQMOP7~93XozQDKJyYd4eNLkORSdojcmjMWkx78ESele4AY--dKlg~BXNaoA5e5iZ9NP4t6cv4dPOZ5q~l5-4llMsUGAyGRIeyTPVMdpvmEpGVcymyk~aXYCHKtQ7CpNC9OtokawFZcYgvx-LXlivu00OIIjR9j-LO~SbyPuIB244wZ7o4gcHinIO7Jh9Q2mGLdSFheyYC4-H-vbIentHmTBBnVx8CBm1AKpq25knMw__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Predictive trajectories of (a) death probabilities and (b) successful discharge probabilities for observed International Stroke Trial covariates [Colour figure can be viewed at https://dbpia.nl.go.kr]
4.2 LOS in patients with COVID
The second application is a study of LOS of patients hospitalised for COVID in a midwestern hospital. In this application, censoring happened when a patient remained in the hospital at the time of data extraction. Among the patients that were not censored, six died with survival times ranging from 0.8 to 30.5 days since admission; 480 survived and were discharged within 20 days. The maximum length of the observational window was 31 days.
In this application, LOS was calculated from the date of admission to discharge or death, whichever came first. No patients were followed up after LOS. Patient information was ascertained at the time of admission, as recorded by an electronic health record system: Age (AGE), smoking status (SMO), and a set of pre-specified medical conditions associated with COVID-related disease severity (CDC, 2020), including dyspnea (DYS), diabetes (DIA), hypertension (HYP), chronic obstructive pulmonary disease (COPD), asthma (AST), cancer (CANC), chronic kidney disease (CKD) and congestive heart failure (CHF). In this analysis, we assessed the influences of these comorbidities on successful discharge with death as the terminal event. For the convenience of interpretation, we characterised AGE by decade (ranged from 0 to 9). Binary indicators were created for all comobidities. Because of the low number of mortality in the data set, we used the following model to screen the potential correlates of death:
Our analysis showed age was the only significant factor associated with increased hazard of death (; ), which is consistent with the existing literature (Weiss & Murdoch, 2020).
We then examined the comorbidity effects on LOS with discharge alive, while adjusting for the influence of AGE as an additive effect in model (3),
The independent variables of primary interest were comorbidity and smoking status. We included them in to retain the hazard ratio interpretation. Age was included in as a modifier to the baseline hazard because we do not believe it would affect the hazard of discharge exponentially. Here we coded age as , representing the age difference from the oldest age category (9 in decades) to ensure (). We then screened the comorbidity and smoking indicators one at a time. Significant variables were included in for the final model (19), which also included age in . We presented the model fitting results in the left panel of Table 6. Briefly, SMO, DIA, CKD, CHF and AGE were significantly associated with longer LOS. Based on the point estimates, a decade increase in age was associated with a reduced baseline hazard for discharge by 8.7%. SMO, DIA, CKD and CHF reduced discharge hazards by 31.3%, 19.5%, 28.2%, 24.0%, respectively. Among patients that were successfully discharged but had LOS greater than the median value (0.15 days), 15.8% were smokers, 35.8% had DIA, 13.3% CKD, 11.7% CHF, and the mean age was 50. A vast majority of the patients (98.3%) were successfully discharged and had LOS in less than 10 days.
Results of the length of stay analysis for patients that tested positive for COVID and were admitted to the hospital
. | The proposed method . | Scheike and Zhang (2002) . | ||||
---|---|---|---|---|---|---|
. | EST (SE) . | -value . | 95% CI . | EST (SE) . | -value . | 95% CI . |
AGE | (0.0182) | 0.0000 | () | — | — | — |
SMO | (0.1716) | 0.0288 | () | (0.1605) | 0.2365 | (, 0.1246) |
DIA | (0.0835) | 0.0094 | () | (0.1229) | 0.2919 | (, 0.1113) |
CKD | (0.1280) | 0.0097 | () | (0.2073) | 0.1051 | (, 0.0704) |
CHF | (0.1104) | 0.0129 | () | (0.2222) | 0.1221 | (, 0.0920) |
. | The proposed method . | Scheike and Zhang (2002) . | ||||
---|---|---|---|---|---|---|
. | EST (SE) . | -value . | 95% CI . | EST (SE) . | -value . | 95% CI . |
AGE | (0.0182) | 0.0000 | () | — | — | — |
SMO | (0.1716) | 0.0288 | () | (0.1605) | 0.2365 | (, 0.1246) |
DIA | (0.0835) | 0.0094 | () | (0.1229) | 0.2919 | (, 0.1113) |
CKD | (0.1280) | 0.0097 | () | (0.2073) | 0.1051 | (, 0.0704) |
CHF | (0.1104) | 0.0129 | () | (0.2222) | 0.1221 | (, 0.0920) |
Notes: The left panel presents results from the proposed model. Significant multiplicative effects included smoking status (SMO), diabetes (DIA), chronic kidney disease (CKD), and congestive heart failure (CHF). AGE was included in as an additive effect. The right panel are result from Scheike and Zhang's (2002) model where deaths are treated as noninformative censoring events.
Abbreviations: 95% CI: 95% confidence interval; EST, point estimate; SE, standard error estimate.
Results of the length of stay analysis for patients that tested positive for COVID and were admitted to the hospital
. | The proposed method . | Scheike and Zhang (2002) . | ||||
---|---|---|---|---|---|---|
. | EST (SE) . | -value . | 95% CI . | EST (SE) . | -value . | 95% CI . |
AGE | (0.0182) | 0.0000 | () | — | — | — |
SMO | (0.1716) | 0.0288 | () | (0.1605) | 0.2365 | (, 0.1246) |
DIA | (0.0835) | 0.0094 | () | (0.1229) | 0.2919 | (, 0.1113) |
CKD | (0.1280) | 0.0097 | () | (0.2073) | 0.1051 | (, 0.0704) |
CHF | (0.1104) | 0.0129 | () | (0.2222) | 0.1221 | (, 0.0920) |
. | The proposed method . | Scheike and Zhang (2002) . | ||||
---|---|---|---|---|---|---|
. | EST (SE) . | -value . | 95% CI . | EST (SE) . | -value . | 95% CI . |
AGE | (0.0182) | 0.0000 | () | — | — | — |
SMO | (0.1716) | 0.0288 | () | (0.1605) | 0.2365 | (, 0.1246) |
DIA | (0.0835) | 0.0094 | () | (0.1229) | 0.2919 | (, 0.1113) |
CKD | (0.1280) | 0.0097 | () | (0.2073) | 0.1051 | (, 0.0704) |
CHF | (0.1104) | 0.0129 | () | (0.2222) | 0.1221 | (, 0.0920) |
Notes: The left panel presents results from the proposed model. Significant multiplicative effects included smoking status (SMO), diabetes (DIA), chronic kidney disease (CKD), and congestive heart failure (CHF). AGE was included in as an additive effect. The right panel are result from Scheike and Zhang's (2002) model where deaths are treated as noninformative censoring events.
Abbreviations: 95% CI: 95% confidence interval; EST, point estimate; SE, standard error estimate.
We also analysed the COVID LOS data by treating death as a form of noninformative censoring by applying the method from Scheike and Zhang (2002) to fit model (17), where and are defined as the same as with (19). The results are presented in Figure S3 for the cumulative coefficient estimates of from AGE, and the right panel of Table 6 for the estimates of elements representing comorbidity and smoking effects. Again, the model fitting results were different from those obtained from the proposed model. In particular, all comorbidities lost their significance. Such comparisons showed that ignoring the terminating events of death and treating them as noninformative censoring could lead to different and possibly questionable conclusions.
The median discharge time is 0.15 days, suggesting a significant portion of the patients were admitted for observation instead of treatment. We visually compared the predictive trajectories of death and discharge probabilities up to 7 days from two hypothetical patients as displayed by Figure 3. Suppose Patients #1 and #2 were both nonsmokers, free of DIA, CKD and CHF at admission, but one was 90 and the other one was 63 years of age. The younger patient is expected to be discharged sooner with a much lower probability of death.
![Predictive trajectories of (a) death probabilities and (b) successful discharge probabilities for observed COVID covariates [Colour figure can be viewed at https://dbpia.nl.go.kr]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssc/71/5/10.1111_rssc.12593/1/m_rssc_71_5_1623_fig-3.jpeg?Expires=1747952872&Signature=vaxZnkUMPGsrZGxYxSzaYivvLiVy0dS0IKWt9eHL78qAYt7n12nAMJR64OeccSWq9pMsdyVmsImfjttrjboA-abfqO3YKlHS-FRcHwmAZjSTR3LVm9bKuBd6aFF0SkuZXFu2-kPyVSlWcmQvtQ9QRi65KKZQluY5UypIymQqYKr693Bb~lQAY5xIaGbQsHdu1rWke1bZfrD9aIyUQIuM44DtDW7Kp3EALyBNLz6Ydv9dN7JUwPjaKM7UQTrbRtZaGcM0o4UNykUWxxj-M0EyRYaYAZs9XBb9eoXsk-jpz8BbYwGRypvCfPu6YOnHhGBcbAFDGpi41kIryUXC8WIJuA__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Predictive trajectories of (a) death probabilities and (b) successful discharge probabilities for observed COVID covariates [Colour figure can be viewed at https://dbpia.nl.go.kr]
5 DISCUSSION
LOS is a critically important metric for the health status of hospitalised patients and a widely used indicator for care quality. A major challenge in analysing LOS with successful discharge is the handling of patient death, a terminating event that prevents the full observation of a hospital stay. LOS analyses in the medical literature have often ignored the influence of terminal events, and as a result, provided no interpretation for the factors associated with LOS among survivors. On the other hand, the competing risk or semi-competing risk models have focused on analysing either the first occurrence of the discharge and death events or the transition rates among different multi-states under unverifiable correlation assumptions. In this research, by modelling the survival probabilities and conditional discharge hazards, we present an alternative analysis that is more closely aligned with the practical interest of LOS analysis. The proposed model is flexible enough to accommodate a more general correlation structure instead of operating on assumed dependency. The accommodation of the additive and multiplicative effects has extended the model's flexibility. We show that the regression estimates also can be used for predicting both deaths and discharge probability trajectories. Coupled with asymptotic properties, the proposed method lays the foundation for formal statistical inference.
Methodologically, this research represents a more comprehensive approach for modelling hospital LOS. The model that we presented has a flexible structure that allows for further extensions. For example, one could extend the model to include other functional forms for and , and thus further enhancing its flexibility. An intuitive formulation is to replace and with and , respectively, in Equations (5) and (6), as long as and yield positive hazard values on the time domain. Although this extension will not fundamentally alter the modelling structure, it will provide alternative interpretations for the additive and multiplicative effects. Another extension is to include time-varying effects in the additive or multiplicative components, by using appropriate smoothing techniques.
Through two real clinical studies, we have illustrated the use of the proposed model. We wish to emphasise that the model provides not only a structure for incorporating patient deaths when analysing LOS but also enhances the analysts' ability to account for the effects of different patient characteristics. This said, as with other sophisticated statistical models, increased modelling flexibility typically comes at the expense of model justification. For practical data analysts, a question of foremost importance is what factors should be included in and what factors should be included in . Such decisions ultimately rest with the investigators. How each factor exerts its influence on the care process and what hypotheses the investigators intend to test should be the primary consideration. As we have shown in the example data analysis, one might want to include the variables of primary interest in to retain the usual hazard ratio interpretation. , on the other hand, can be viewed as adjustments to the baseline hazard. When independent variables are of higher dimensions, variable selection may become necessary. Penalisation or regularisation steps prior to employing the proposed inference could help alleviate the challenge (Li et al., 2021). All things considered, we have put forward a new analytical approach for studying the LOS, one that could be used in a broad range of clinical investigations. Computational code for method implementation is available upon request.
DATA AVAILABILITY STATEMENT
The IST data that partly support the findings of this study are available from the International Stroke Trial database published by Edinburgh DataShare: https://datashare.ed.ac.uk/handle/10283/124. The authors do not currently have the permission to publish or share the raw COVID data that were analysed in Section 4.2.
ACKNOWLEDGEMENTS
The authors are grateful to the Editor, the Associate Editor, and two anonymous reviewers for their many insightful comments and suggestions, which have greatly improved this manuscript. The work was partially supported by the National Institutes of Health Grant U24 AA026969 and the Indiana Clinical and Translational Sciences Institute UL1TR002529.
REFERENCES
The following regularity conditions are needed in the proof:
- (C1)
are independent and identically distributed.
- (C2)
, and .
- (C3)
The covariates and are almost surely bounded on .
- (C4)
The quantity is non-singular, and , .
- (C5)
There is a convex compact set containing such that (, ) is the unique solution to the system , where , , .
- (C6)
= is non-singular, where and
The above regularity conditions are frequently used in the failure-time data analysis literature, and they are easily satisfied in many applications. Conditions (C1) and (C2) are similar to those of Andersen and Gill (1982, theorem 4.1) and are the standard conditions for failure-time data. Condition (C3) is about the boundedness and continuity and is commonly used in survival analysis. Conditions (C4)–(C6) are needed to ensure the existence and uniqueness of the proposed estimator. They are also needed to avoid technical distractions in the proofs, which are analogous to Scheike and Zhang (2002). For (C6), if is positive definite, will be the only solution to so that the uniqueness in (C5) holds automatically.
A.1 Proof of Theorem 1
To prove the consistency and asymptotic normality of , from Equations (5) and (6) we have
where . Define
It follows that uniformly in by uniform consistency of to in probability (Fleming & Harrington, 2011) and the uniform strong law of large numbers (Pollard, 1990). The consistency of follows under condition (C6) by theorem 5.9 of Van der Vaart (2000). To show that the asymptotic normality of , we note that by the Taylor series expansion,
where
According to the functional delta method (van der Vaart & Wellner, 1996, theorem 3.9.4, p. 374), we have
Back to (A2),
where
From (A1), (A4) and multivariate central limit theorem Theorem 1 holds.
A.2 Proof of Theorem 2
The uniform consistency of follows using arguments similar to those in Spiekerman and Lin (1998) (Appendix B). To establish asymptotic normality of for , where
denote . Consider
where
and
where
By the multivariate central limit theorem and following similar arguments in Zhao et al. (2013, supplementary materials), Theorem 2 holds.
Author notes
Funding information Indiana Clinical and Translational Sciences Institute, Grant/Award Number: UL1TR002529; National Institutes of Health, Grant/Award Number: U24 AA026969