Abstract

Data collected from wearable devices can shed light on an individual's pattern of behavioral and circadian routine. Phone use can be modeled as alternating processes, between the state of active use and the state of being idle. Markov chains and alternating recurrent event models are commonly used to model state transitions in cases such as these, and the incorporation of random effects can be used to introduce diurnal effects. While state labels can be derived prior to modeling dynamics, this approach omits informative regression covariates that can influence state memberships. We instead propose an alternating recurrent event proportional hazards (PH) regression to model the transitions between latent states. We propose an expectation–maximization algorithm for imputing latent state labels and estimating parameters. We show that our E-step simplifies to the hidden Markov model (HMM) forward–backward algorithm, allowing us to recover an HMM with logistic regression transition probabilities. In addition, we show that PH modeling of discrete-time transitions implicitly penalizes the logistic regression likelihood and results in shrinkage estimators for the relative risk. This new estimator favors an extended stay in a state and is useful for modeling diurnal rhythms. We derive asymptotic distributions for our parameter estimates and compare our approach against competing methods through simulation as well as in a digital phenotyping study that followed smartphone use in a cohort of adolescents with mood disorders.

1 Introduction

Diurnal and circadian rhythm studies often model physiological processes as periodic cycles, such as a person's active and rest cycle. Sleep and diurnal rhythm are essential components of many circadian physiological processes with a clear time-of-day effect on active and rest cycles (Lagona et al., 2014). Here, we consider this problem of estimating an individual's cycles between active and rest states in a mobile health (mHealth) setting based on wearable device or smartphone sensor data. If the true state labels are known, then a Markov chain can be used to model state transitions over time, otherwise a hidden Markov model (HMM) can be used to simultaneously perform classification and state transition estimation (Langrock et al., 2013). In addition, HMMs have been extended to incorporate time-of-day effects as periodicity or seasonality using random effects (Bartolucci & Farcomeni, 2015; Holsclaw et al., 2017; Stoner & Economou, 2020). Continuous-time hidden Markov models (CT-HMM) have also been used for state classification in similar contexts but have difficulty accounting for random effects (Bartolucci & Farcomeni, 2019; Bureau et al., 2003; Jackson et al., 2003). Most mixed effects HMM estimation procedures estimate logistic regression for discrete-time processes and do not account for time between state transitions (Maruotti & Rocci, 2012).

It is important to note that active and rest state transitions are ergodic processes and the sojourn time between these states can be modeled with a proportional hazards (PH) regression in both directions, active-to-rest and rest-to-active. These two directions of transitions can be viewed as an alternating recurrent event PH model (Król & Saint-Pierre, 2015; Shinohara et al., 2018; Wang et al., 2020). However, the ability for alternating recurrent event processes to accurately model sojourn times complements HMMs and provides many useful properties in addition to being computationally scalable. This novel model can be viewed as a latent state analog of an alternating recurrent event process (Król & Saint-Pierre, 2015; Wang et al., 2020). Mainly, if the underlying data-generating process is a continuous-time process, then PH models are a more appropriate modeling choice (Abbott, 1985; Ingram & Kleinman, 1989). If the data-generating process involves discrete-time transitions, we showed that PH modeling penalizes the probability of leaving a state.

We propose an approach that takes advantage of the strengths of both HMMs and alternating recurrent event models to jointly estimate latent states while simultaneously providing flexible modeling of sojourn times. Our expectation-maximization (EM) algorithm imputes latent active and rest state labels while modeling state transitions with an alternating recurrent event process using exponential PH regressions. Informative regression covariates, often omitted in state labeling, are incorporated into the latent state imputation using the EM algorithm. Under the EM algorithm, we show that the E-step in this case simplifies to imputations involving an HMM forward–backward algorithm, where state transition probabilities are defined as logistic or multinomial regression probabilities (Altman, 2007; Baum et al., 1970). We also show that the M-step reduces to fitting independent PH models weighted by E-step imputations allowing for a potentially large number of latent states, as well as, providing a means to obtain large-sample theory inference. Our EM approach involving PH models provides a scaleable M-step while returning multinomial regression transition probabilities commonly found in HMMs (Altman, 2007; Holsclaw et al., 2017; Maruotti & Rocci, 2012). Furthermore, we show that applying PH models to discrete-time transitions, implicitly penalizes logistic regression to shrink the transition probability matrices of HMMs toward the identity matrix and mitigates overfitting in many practical settings. As a result, the PH models favor processes with a low incidence of state transitions such as diurnal cycles, where we expect few state transitions within a 24 h period. This direction of penalization is in line with hidden semi-Markov models (HSMMs) extending the dwell time of staying in a state while providing a means to incorporate random effects (Zucchini et al., 2017).

We apply this approach to estimate active–rest diurnal cycles in a sample of patients with affective disorders using their passively collected smartphone sensor data, namely through the accelerometer, screen on/off data of patient smartphones and time-of-day random intercepts. This quantification of routine can be correlated with a myriad of relevant clinical outcomes, as the regularity of diurnal rhythms plays an important role in psychopathology with past studies having shown associations between irregular rhythms and adverse health outcomes (Monk et al., 1991). In addition, we fit a population level HMM to study effects of individual specific covariates on state transitions.

2 Data and Methods

Our data consist of formula individuals, with each individual i having a sequence formula of covariates to be modeled with separate HMMs. The HMM of the ith individual has formula sequences of active or rest states formula, where hourly time-stamps formula are increasing in j. In our example, we denote active and rest states as formula and formula, respectively, with an outline of our HMM in Figure 1. Within each sequence, we define event times for formula state transitions as formula, which follow from an exponential distribution and can be fitted with a PH regression. Linking multiple exponential event time processes results in a recurrent event model. Furthermore, recurrent exponential PH models are analogous with a non-homogeneous Poisson processes which retains independent increments, allowing us to chain multiple transitions together.

Diagram of hidden Markov model. State-dependent distribution of screen-on counts  follow a Poisson distribution. The previous hour's average acceleration magnitudes  are used to estimate state transitions probabilities . This figure appears in color in the electronic version of this paper, and any mention of color refers to that version.
Figure 1

Diagram of hidden Markov model. State-dependent distribution of screen-on counts formula follow a Poisson distribution. The previous hour's average acceleration magnitudes formula are used to estimate state transitions probabilities formula. This figure appears in color in the electronic version of this paper, and any mention of color refers to that version.

The covariates used in the exponential PH regression are mean acceleration magnitudes (Euclidean norm) from the preceding hour evaluated at formula transitions and are outlined in Figure 1. We denote the intercept and covariates as formula. We make an ergodic state transition assumption, where states will inevitably communicate with each other, that is, the active-to-rest and rest-to-active transitions will eventually occur. This allows the survival function to capture the likelihood contribution of when a state transition did not occur, meaning the transition will occur at some future time. Because we do not have the true state labels formula, we must rely on state-dependent observations. We use screen-on counts for each time-stamp formula as observations from state-dependent distributions formula, where formula, formula are state-specific parameters and we expect formula for the rest state.

2.1 Alternating Recurrent Event Proportional Hazards Model

In our two states setting, rates of transition from state s to the other state are defined as formula. For example, formula denotes the rate of transition from state 1-to-2 (active-to-rest). Alternating recurrent event PH models often need to account for the longitudinal nature of the data, that is, repeated measurements. Mixed effects or frailties can be used to account for the recurrent nature of the data (McGilchrist & Aisbett, 1991; Wang et al., 2020). Modifying the standard exponential PH model with a shared log-normal frailties or normal random intercepts, state transition hazards become formula, where formula. Here, formula are 24 hour-of-day indicators, one-hot vectors designed to toggle the appropriate random intercepts within formula.

An HMM for individual i is given by the complete data likelihood

(1)

where formula are the true state labels. The PH likelihoods for state transitions are given as

where formula and formula are derived from the exponential distribution. Note that formula is the likelihood of an alternating event process (Król & Saint-Pierre, 2015; Wang et al., 2020). We denote indicators for state 1-to-2 transitions formula, as formula and 2-to-1 transitions as formula. We interpret failure to transition out of state 1, formula as censoring, denoted as formula and we similarly define formula. The screen-on count state conditional Poisson likelihood is given as formula where state memberships formula are denoted as indicators formula. Since the true labels are unknown, formula, formula, and formula are latent variables and Equation (1) becomes a mixture model.

2.2 Expectation–Maximization Algorithm for Proportional Hazards Regression and Hidden Markov Model Parameters

The log-likelihood of Equation (1) is linear functions of latent variables formula, formula, and formula, allowing us to use an EM algorithm our optimization (Dempster et al., 1977). Through the EM algorithm, indicators formula, formula, and formula are imputed as continuous probabilities, in the process of obtaining maximum likelihood estimates (MLEs). As a result, the alternating recurrent event exponential PH model reduces to two weighted frailty models. We denote PH model weights as formula, which belong to a four-dimensional probability simplex, that is, values are non-negative and formula. Poisson mixture model (PMM) weights formula belong to a two-dimensional probability simplex. Our EM algorithm iteratively estimates the weights formula and formula using the forward–backward algorithm of Baum et al. (1970) and formula using survival modeling. While the complete data likelihood is written as an alternating event processes, we see an equivalence with logistic regression transition probabilities in the E-step calculations, effectively retooling alternating event processes to fit HMMs to data that have heterogeneous event times transitions.

2.2.1 E-step

In the E-step, we derive the expectation of formula conditional on model parameters and observed data in the ith HMM denoted by formula and formula, as

formula, formula and

are forward and backward probabilities of an HMM. Vectors formula are of initial-state distribution probabilities for the HMM. The transition probability matrix formula, derived by normalizing the alternating recurrent event exponential PH models are

where formula. Under PH modeling of multiple-state transitions, EM updates simplify to multinomial transitions (see Web Appendix A). The weights from formula can more generally be written as formula. The E-step involves imputing formula through an HMM, using a forward–backward algorithm, where transition probabilities are standard logistic functions. We denote forward and backward probabilities vectors formula and formula, where formula and formula are 2 × 1 vectors. The state-dependent distributions are contained in the 2 × 2 diagonal matrix formula. Note that formula, and are used to normalize E-step probabilities. The E-step update for iteration formula, simplifies to calculating probabilities formula as

such that the sum of all elements is equal to one, formula. Operations ⊗ and ⊙ are Kronecker and Hadamard products, respectively. The update for the PMM weights are formula, such that formula.

2.2.2 M-step

The M-step update for formula involves solving the mixture model formula, where solutions are known for most distributions and formula in the Poisson setting. From formula, we have updates formula. The update for formula involves fitting two frailty models for formula which can be accomplished by recognizing formula can be factored into indicators and log-likelihood weights or case weights. The weights can be interpreted as the probability that a specific state transition, for example, if formula, then an active-to-rest transition likely occurred. We duplicate each row of data to be both a transition event and a censored outcome and then weight the rows by formula, and formula, respectively. A data augmentation example is outlined in Web Table 1.

While there are numerous survival packages in R, we need a package that can incorporate weights, parametric PH models and normally distributed random intercepts (Therneau & Lumley, 2015). The R package tramME can be used to fit our weighted exponential parametrization of shared log-normal frailty models (Tamási & Hothorn, 2021). The package tramME uses a transformation model approach combined with an efficient implementation of the Laplace approximation to fit the shared log-normal frailty models (Hothorn et al., 2018; Hothorn, 2020; Kristensen et al., 2016). By updating formula we also update our transition rates, formula and formula which are used to calculate formula.

The M-step updates for the initial-state distribution are formula, such that formula. Finally, we iteratively calculate the E-step: formula, and M-step: formula until convergence to obtain the MLEs. Estimating shared population parameters can be done by taking the product of all individual likelihoods and applying EM.

2.3 Comparison of Relative Risks from Proportional Hazards and Logistic Regression

The estimated relative risk or coefficient estimates from PH and logistic regression have been shown to be similar under a variety of situations (Abbott, 1985; Callas et al., 1998; Ingram & Kleinman, 1989; Thompson Jr, 1977). Many useful properties of PH modeling can be leveraged in Markov chains which can improve the robustness of parameter estimation and reduce computational burden. Though many analyses look at the Cox PH case, we can adapt these approaches to the exponential PH which is a special case of the Cox PH. Following the derivations of Abbott (1985), we have formula and formula with formula, where formula is the baseline hazard from the exponential distribution. When the event times are discrete, formula can be arbitrarily assigned as the event time and formula. Under this setting, formula and the Taylor expansion of the survival function formula at formula results in the power series formula where formula when the incidence rate formula. The respective power series of formula is given as formula where we set formula when the incidence rates are low. Note that in the case of low incidence rates, formula and writing formula as a function of formula or formula, we get formula. As noted in previous studies, PH and logistic regression estimate similar relative risk under a group event time, low incidence rate setting (Abbott, 1985; Ingram & Kleinman, 1989). In many practical settings, discrete-time applications of binary outcome models involve low incidence rates, letting exponential PH regression serve as an alternative to logistic regression. The analogous Markov chain with rare outcomes is a process with an extended stay in the current state, commonly found in diurnal biological processes. However, when event times are heterogeneous, then PH regression is the more appropriate choice for estimating relative risk. As noted in Abbott (1985), in many cases, we observed shrinkage toward zero for the relative risk under PH regression when compared to the logistic regression due to the inequality between the remainder terms formula and formula (Callas et al., 1998; Thompson Jr, 1977).

We further expand on this observed shrinkage by showing that the exponential PH regression is a penalized logistic regression. The exponential canonical form of exponential PH and logistic regression is given as formula, with formula. Here, formula for logistic regression and formula for exponential PH regression under the discrete-time setting. We see that the logistic regression likelihood is uniformly bounded below by the PH likelihood due to formula, which simply follows from the inequality formula for formula. The convex optimization problems of PH and logistic regression are to minimize loss functions: formula and formula. There is a positive convex penalty difference between the logistic and PH optimization problems, formula and formula with the penalty as the difference of formula and formula. It is straightforward to see that formula is convex with hessian formula and Ω as a diagonal matrix with elements formula. Under a simple parameterization, such as the intercept only model, it follows that the penalty formula favors a low incidence rate; this relationship is further discussed in Web Appendix E. The function formula is relatively flat for formula and dominated by formula for formula. However, in practice this will shrink coefficients to the solution of formula which tends to be near formula when considering that each formula is a different linear combination of formula. The penalty formula modifies the convex hull of the logistic regression loss function to induce shrinkage and results in a PH loss function which has many readily available software for estimation.

As noted in our E-step, the conditional expectations of the EM algorithms reduces to transition probability matrices that are comprised of logistic regression probabilities. In the case of an HMM with three or more states, the E-step reduces to transition probabilities comprised of multinomial logistic regression probabilities. However, the M-step conveniently involves fitting several independent exponential PH models. In order to illustrate this point, we define the complete data likelihood of transitioning out of state 1 in a three-state HMM as:

(2)

with formula for formula, formula and formula for formula. The minimum of two or more independent exponential random variables follows an exponential distribution with a new rate equal to the sum of rates, in our case: formula. The likelihood contribution of staying in the same state is the survival function formula. Continuing our example, calculating the E-step for the transition from state 1 to 2 is given as multinomial probability formula, where we leave the derivation details to the Web Appendix A and B. Note that in the three-state HMM, E-step imputations are constrained to a nine-dimensional simplex. The M-step for the three-state HMM parameters from likelihood (2), can be estimated with two independent exponential PH models. This EM procedure can be extended to an arbitrary number of states. The M-step fits independent weighted PH models rather than a cumbersome multinomial regression.

As noted in the previous work, when the event times are heterogenous, PH regression is the correct model for estimating relative risk (Abbott, 1985; Callas et al., 1998; Ingram & Kleinman, 1989; Thompson Jr, 1977). However, PH and logistic regression estimate similar relative risk for the discrete and grouped event time setting with low incidence rates. We showed that under the discrete-time setting, exponential PH regression is an implicitly penalized logistic regression resulting in shrinkage of the relative risk estimates. This is desirable in many situations, specifically in our case where we are building HMMs, a model with a complicated error in the response mechanism. The penalty formula slightly favors low probabilities of transitioning out of a state, which is useful to mitigate false positives. As a result, penalization shrinks the transition probability matrices formula, toward the identity matrix and favors an extended sojourn time. Of note, post-estimation HMM procedures such as the Viterbi algorithm for finding the most likely path also favors lower probabilities of transitioning out of a state (Forney, 1973; Viterbi, 1967). In addition, we also invoke mixed effect modeling which numerically benefits from penalization. Next, fitting multiple independent weighted frailty models in the M-step is a more straight forward procedure than fitting an analogous weighted mixed multinomial logistic regression. The PH model is the correct model for heterogenous event times, but even in the case of discrete-time transitions, PH modeling's implicit penalization has several useful properties for estimating HMMs.

2.4 Competing Methods

We denote the alternating recurrent event exponential PH model outlined in Section 2.2 as PH-HMM. For our competing methods, we use the CT-HMM and discrete-time mixed effect logistic regression HMM (denoted as DT-HMM), described in our Web Appendix F. In addition, we define a two-step estimator by calculating PMM maximum a posteriori (MAP) estimates and fitting PH regression using MAP labels (Web Appendix F). All competing methods are initialized using PMM MAP estimates for state labels and EM is repeated until formula.

3 Asymptotic Results

The EM algorithm converges and obtains the MLEs formula, with the final likelihood evaluation being equivalent to weighted exponential frailty models. As a result, we can generalize existing large-sample theory asymptotic inference for mixed effect models for our weighted setting, using weights derived in Section 2.2.1.

 
Theorem 1.
Regression coefficients formula are asymptotically normally distributed formula, where formula are corresponding formula elements of the inverse observed informations formula. The observed informations are given as formula where formula,

and formula.

4 Simulation Studies

For the alternating event process, we use a 24 h period sine function as our time-varying covariate, formula and independently draw censoring times from a uniform distribution, formula. For the PH models, we sequentially increment formula by formula and simulate covariates formula. We simulate the complete data likelihood with multiple individuals but shared formula, by drawing from:

and formula with formula, formula, and formula. Pseudocode for generating the alternating survival data can be found in Algorithm 1 of Web Appendix G. Similarly, for a discrete-time alternating event process, we simulate transition events as formula and increment formula by formula. Pseudocode for generating the discrete alternating event data can be found in Algorithm 2 of Web Appendix G. Once true state labels formula are obtained, we simulate state-dependent observations from Poisson distributions to complete the simulation of the HMMs.

We simulated 500 replicates using four different sets of parameters and used three methods for generating the data. For each set of parameters, we simulated data using Algorithm 1 with maximum censoring times set to formula and formula to evaluate performance under heterogeneous and grouped event times. We also used Algorithm 2 in order study models fitted to data simulated from a discrete-time process. We have a total of 12 different cases outlined in Table 1. In summary, Cases 1.1–1.3 looked at low incidence rate of state transition and a large distributional difference between formula and formula; Cases 2.1–2.3 looked at high incidence rate and large distributional difference; Cases 3.1–3.3 looked at low incidence rate and small distributional difference; and Cases 4.1–4.3 looked at high incidence rate and small distributional difference. For each case, we fit models using the PMM, DT-HMM, CT-HMM, and PH-HMM approaches. The E-step update can be computed quickly in parallel where each ith HMM is processed independently. We present mean accuracy of recovering the true state labels using the MAP and empirical standard error (SE), over 500 replicates in Table 1. We also present mean parameter estimates, empirical SEs and mean square errors (MSE) in Tables 2 and 3 and coverage results in Web Appendix H.

Table 1

Simulation: case parameters and accuracy for competing methods.

Methods: mean accuracy (SE)
CaseSimulationParametersPMMDT-HMMCT-HMMPH-HMM
1.1formulaformula0.985(0.003)0.990(0.003)0.983(0.003)0.984(0.004)
1.2formulaformula0.985(0.004)0.999(0.001)0.998(0.001)0.998(0.001)
1.3formulaformula0.985(0.003)0.997(0.001)0.997(0.001)0.997(0.001)
2.1formulaformula0.985(0.003)0.994(0.002)0.979(0.005)0.991(0.003)
2.2formulaformula0.984(0.026)0.998(0.001)0.997(0.001)0.998(0.001)
2.3formulaformula0.985(0.003)0.997(0.002)0.997(0.002)0.997(0.002)
3.1formulaformula0.898(0.008)0.926(0.008)0.896(0.010)0.873(0.015)
3.2formulaformula0.894(0.034)0.988(0.004)0.986(0.004)0.982(0.007)
3.3formulaformula0.896(0.016)0.977(0.005)0.977(0.005)0.977(0.005)
4.1formulaformula0.897(0.019)0.964(0.005)0.801(0.022)0.904(0.013)
4.2formulaformula0.897(0.009)0.986(0.004)0.984(0.005)0.978(0.007)
4.3formulaformula0.897(0.013)0.977(0.005)0.975(0.005)0.974(0.005)
Methods: mean accuracy (SE)
CaseSimulationParametersPMMDT-HMMCT-HMMPH-HMM
1.1formulaformula0.985(0.003)0.990(0.003)0.983(0.003)0.984(0.004)
1.2formulaformula0.985(0.004)0.999(0.001)0.998(0.001)0.998(0.001)
1.3formulaformula0.985(0.003)0.997(0.001)0.997(0.001)0.997(0.001)
2.1formulaformula0.985(0.003)0.994(0.002)0.979(0.005)0.991(0.003)
2.2formulaformula0.984(0.026)0.998(0.001)0.997(0.001)0.998(0.001)
2.3formulaformula0.985(0.003)0.997(0.002)0.997(0.002)0.997(0.002)
3.1formulaformula0.898(0.008)0.926(0.008)0.896(0.010)0.873(0.015)
3.2formulaformula0.894(0.034)0.988(0.004)0.986(0.004)0.982(0.007)
3.3formulaformula0.896(0.016)0.977(0.005)0.977(0.005)0.977(0.005)
4.1formulaformula0.897(0.019)0.964(0.005)0.801(0.022)0.904(0.013)
4.2formulaformula0.897(0.009)0.986(0.004)0.984(0.005)0.978(0.007)
4.3formulaformula0.897(0.013)0.977(0.005)0.975(0.005)0.974(0.005)

Note: Mean accuracy and empirical standard errors based on 500 replicates. Each replicate had formula individuals with formula state transitions. The survival models were simulate with Algorithm 1 and logistic models were simulate with Algorithm 2 from the Web Appendix. Cases 1.1–1.3 looked at low incidence rate and large distributional difference; Cases 2.1–2.3 looked at high incidence rate and large distributional difference; Cases 3.1–3.3 looked at low incidence rate and small distributional difference; and Cases 4.1–4.3 looked at high incidence rate and small distributional difference.

Table 1

Simulation: case parameters and accuracy for competing methods.

Methods: mean accuracy (SE)
CaseSimulationParametersPMMDT-HMMCT-HMMPH-HMM
1.1formulaformula0.985(0.003)0.990(0.003)0.983(0.003)0.984(0.004)
1.2formulaformula0.985(0.004)0.999(0.001)0.998(0.001)0.998(0.001)
1.3formulaformula0.985(0.003)0.997(0.001)0.997(0.001)0.997(0.001)
2.1formulaformula0.985(0.003)0.994(0.002)0.979(0.005)0.991(0.003)
2.2formulaformula0.984(0.026)0.998(0.001)0.997(0.001)0.998(0.001)
2.3formulaformula0.985(0.003)0.997(0.002)0.997(0.002)0.997(0.002)
3.1formulaformula0.898(0.008)0.926(0.008)0.896(0.010)0.873(0.015)
3.2formulaformula0.894(0.034)0.988(0.004)0.986(0.004)0.982(0.007)
3.3formulaformula0.896(0.016)0.977(0.005)0.977(0.005)0.977(0.005)
4.1formulaformula0.897(0.019)0.964(0.005)0.801(0.022)0.904(0.013)
4.2formulaformula0.897(0.009)0.986(0.004)0.984(0.005)0.978(0.007)
4.3formulaformula0.897(0.013)0.977(0.005)0.975(0.005)0.974(0.005)
Methods: mean accuracy (SE)
CaseSimulationParametersPMMDT-HMMCT-HMMPH-HMM
1.1formulaformula0.985(0.003)0.990(0.003)0.983(0.003)0.984(0.004)
1.2formulaformula0.985(0.004)0.999(0.001)0.998(0.001)0.998(0.001)
1.3formulaformula0.985(0.003)0.997(0.001)0.997(0.001)0.997(0.001)
2.1formulaformula0.985(0.003)0.994(0.002)0.979(0.005)0.991(0.003)
2.2formulaformula0.984(0.026)0.998(0.001)0.997(0.001)0.998(0.001)
2.3formulaformula0.985(0.003)0.997(0.002)0.997(0.002)0.997(0.002)
3.1formulaformula0.898(0.008)0.926(0.008)0.896(0.010)0.873(0.015)
3.2formulaformula0.894(0.034)0.988(0.004)0.986(0.004)0.982(0.007)
3.3formulaformula0.896(0.016)0.977(0.005)0.977(0.005)0.977(0.005)
4.1formulaformula0.897(0.019)0.964(0.005)0.801(0.022)0.904(0.013)
4.2formulaformula0.897(0.009)0.986(0.004)0.984(0.005)0.978(0.007)
4.3formulaformula0.897(0.013)0.977(0.005)0.975(0.005)0.974(0.005)

Note: Mean accuracy and empirical standard errors based on 500 replicates. Each replicate had formula individuals with formula state transitions. The survival models were simulate with Algorithm 1 and logistic models were simulate with Algorithm 2 from the Web Appendix. Cases 1.1–1.3 looked at low incidence rate and large distributional difference; Cases 2.1–2.3 looked at high incidence rate and large distributional difference; Cases 3.1–3.3 looked at low incidence rate and small distributional difference; and Cases 4.1–4.3 looked at high incidence rate and small distributional difference.

Table 2

Simulation: estimates, standard errors and mean square errors for μ1 and β1.

CaseModelμ1Est.SEMSEβ10Est.SEMSEβ11Est.SEMSE
1.1PMM1010.0000.1340.018−3−2.9130.0930.016−1−0.9190.1260.022
survivalDT-HMM109.9960.1310.017−3−1.2900.1072.935−1−1.0870.1610.034
formulaCT-HMM109.9020.1380.029−3−3.0720.0980.015−1−1.0280.1410.021
PH-HMM109.8840.1400.033−3−3.1380.1040.030−1−1.0710.1490.027
1.2PMM1010.0090.1350.018−3−2.1090.2500.856−1−1.1620.5170.293
survivalDT-HMM1010.0040.1290.017−3−3.7040.3960.651−1−1.0860.5540.313
formulaCT-HMM109.9990.1300.017−3−3.0580.3950.159−1−1.0790.5500.309
PH-HMM1010.0100.1280.017−3−2.9520.3660.136−1−1.0540.5410.295
1.3PMM109.9970.1280.016−3−2.6090.1780.184−1−0.7720.2410.110
logisticDT-HMM109.9960.1220.015−3−3.0070.2180.048−1−1.0290.3200.103
CT-HMM109.9960.1220.015−3−3.0720.2080.048−1−0.9700.2980.089
PH-HMM109.9960.1220.015−3−3.0720.2080.048−1−0.9700.2980.089
2.1PMM109.9950.1280.016−2−1.9910.1270.016−5−3.7570.4751.769
survivalDT-HMM109.9920.1260.016−2−0.1340.1703.509−5−5.6160.5610.693
formulaCT-HMM109.8350.1450.048−2−2.2500.1040.073−5−3.3480.3642.860
PH-HMM109.9600.1280.018−2−2.1230.1070.027−5−4.3560.3600.544
2.2PMM109.9730.3220.104−2−1.5120.2710.312−5−2.7850.4815.137
survivalDT-HMM1010.0000.1270.016−2−2.6630.4520.644−5−5.6081.5092.643
formulaCT-HMM109.9890.1290.017−2−2.1200.4020.176−5−5.4251.0831.350
PH-HMM1010.0060.1270.016−2−1.9240.3410.122−5−4.8020.9110.868
2.3PMM109.9940.1320.017−2−2.0140.1480.022−5−2.6280.2485.689
logisticDT-HMM109.9900.1290.017−2−2.0040.2350.055−5−5.1510.7360.564
CT-HMM109.9880.1290.017−2−2.3390.1630.141−5−3.4340.3142.552
PH-HMM109.9900.1290.017−2−2.3270.1620.133−5−3.3920.3092.680
3.1PMM55.0240.1200.015−3−2.5140.0700.241−1−0.5710.0990.194
survivalDT-HMM55.0280.1070.012−3−1.3090.1482.880−1−1.1040.2130.056
formulaCT-HMM54.8260.1200.044−3−3.4300.1460.206−1−1.1010.1930.047
PHformulaHMM54.6410.1450.150−3−4.1220.2511.322−1−1.4210.3170.277
3.2PMM54.9940.2430.059−3−0.6890.1945.378−1−0.7490.2100.107
survivalDTformulaHMM55.0120.0940.009−3−3.6690.4820.679−1−1.1350.6180.400
formulaCTformulaHMM54.9990.0950.009−3−3.1610.4800.256−1−1.1330.6120.392
PH−HMM55.0670.1000.015−3−2.3000.3960.647−1−0.8950.6760.467
3.3PMM55.0010.1570.024−3−1.4540.1132.404−1−0.3820.1200.396
logisticDTformulaHMM55.0040.0930.009−3−3.0110.2530.064−1−1.0540.3320.113
CTformulaHMM55.0010.0930.009−3−3.1000.2390.067−1−1.0050.3060.094
PHformulaHMM55.0010.0930.009−3−3.0990.2390.067−1−1.0000.3060.093
4.1PMM55.0020.1530.023−2−1.8580.0960.029−5−2.0780.1858.572
survivalDTformulaHMM55.0040.0900.008−2−0.1350.2213.525−5−5.8210.9091.500
formulaCT−HMM54.2950.2070.539−2−3.2190.2281.539−5−1.7100.13610.845
PH-HMM54.8500.1290.039−2−2.9520.1350.924−5−2.7980.2264.900
4.2PMM55.0030.1350.018−2−0.4870.1112.302−5−1.2340.13714.205
survivalDT-HMM55.0060.0940.009−2−2.5320.4480.484−5−5.5971.8403.737
formulaCT-HMM54.9860.0950.009−2−2.2680.4880.310−5−5.0401.1291.273
PH-HMM55.0800.1030.017−2−1.2430.2360.628−5−2.4930.4286.470
4.3PMM55.0020.1620.026−2−1.1570.0810.716−5−1.2000.10014.451
logisticDT-HMM55.0080.0940.009−2−1.9670.3310.111−5−5.1951.0911.225
CT-HMM55.0070.0930.009−2−2.3890.1850.185−5−3.2460.3033.167
PH-HMM55.0160.0930.009−2−2.3280.1770.139−5−3.0100.2834.038
CaseModelμ1Est.SEMSEβ10Est.SEMSEβ11Est.SEMSE
1.1PMM1010.0000.1340.018−3−2.9130.0930.016−1−0.9190.1260.022
survivalDT-HMM109.9960.1310.017−3−1.2900.1072.935−1−1.0870.1610.034
formulaCT-HMM109.9020.1380.029−3−3.0720.0980.015−1−1.0280.1410.021
PH-HMM109.8840.1400.033−3−3.1380.1040.030−1−1.0710.1490.027
1.2PMM1010.0090.1350.018−3−2.1090.2500.856−1−1.1620.5170.293
survivalDT-HMM1010.0040.1290.017−3−3.7040.3960.651−1−1.0860.5540.313
formulaCT-HMM109.9990.1300.017−3−3.0580.3950.159−1−1.0790.5500.309
PH-HMM1010.0100.1280.017−3−2.9520.3660.136−1−1.0540.5410.295
1.3PMM109.9970.1280.016−3−2.6090.1780.184−1−0.7720.2410.110
logisticDT-HMM109.9960.1220.015−3−3.0070.2180.048−1−1.0290.3200.103
CT-HMM109.9960.1220.015−3−3.0720.2080.048−1−0.9700.2980.089
PH-HMM109.9960.1220.015−3−3.0720.2080.048−1−0.9700.2980.089
2.1PMM109.9950.1280.016−2−1.9910.1270.016−5−3.7570.4751.769
survivalDT-HMM109.9920.1260.016−2−0.1340.1703.509−5−5.6160.5610.693
formulaCT-HMM109.8350.1450.048−2−2.2500.1040.073−5−3.3480.3642.860
PH-HMM109.9600.1280.018−2−2.1230.1070.027−5−4.3560.3600.544
2.2PMM109.9730.3220.104−2−1.5120.2710.312−5−2.7850.4815.137
survivalDT-HMM1010.0000.1270.016−2−2.6630.4520.644−5−5.6081.5092.643
formulaCT-HMM109.9890.1290.017−2−2.1200.4020.176−5−5.4251.0831.350
PH-HMM1010.0060.1270.016−2−1.9240.3410.122−5−4.8020.9110.868
2.3PMM109.9940.1320.017−2−2.0140.1480.022−5−2.6280.2485.689
logisticDT-HMM109.9900.1290.017−2−2.0040.2350.055−5−5.1510.7360.564
CT-HMM109.9880.1290.017−2−2.3390.1630.141−5−3.4340.3142.552
PH-HMM109.9900.1290.017−2−2.3270.1620.133−5−3.3920.3092.680
3.1PMM55.0240.1200.015−3−2.5140.0700.241−1−0.5710.0990.194
survivalDT-HMM55.0280.1070.012−3−1.3090.1482.880−1−1.1040.2130.056
formulaCT-HMM54.8260.1200.044−3−3.4300.1460.206−1−1.1010.1930.047
PHformulaHMM54.6410.1450.150−3−4.1220.2511.322−1−1.4210.3170.277
3.2PMM54.9940.2430.059−3−0.6890.1945.378−1−0.7490.2100.107
survivalDTformulaHMM55.0120.0940.009−3−3.6690.4820.679−1−1.1350.6180.400
formulaCTformulaHMM54.9990.0950.009−3−3.1610.4800.256−1−1.1330.6120.392
PH−HMM55.0670.1000.015−3−2.3000.3960.647−1−0.8950.6760.467
3.3PMM55.0010.1570.024−3−1.4540.1132.404−1−0.3820.1200.396
logisticDTformulaHMM55.0040.0930.009−3−3.0110.2530.064−1−1.0540.3320.113
CTformulaHMM55.0010.0930.009−3−3.1000.2390.067−1−1.0050.3060.094
PHformulaHMM55.0010.0930.009−3−3.0990.2390.067−1−1.0000.3060.093
4.1PMM55.0020.1530.023−2−1.8580.0960.029−5−2.0780.1858.572
survivalDTformulaHMM55.0040.0900.008−2−0.1350.2213.525−5−5.8210.9091.500
formulaCT−HMM54.2950.2070.539−2−3.2190.2281.539−5−1.7100.13610.845
PH-HMM54.8500.1290.039−2−2.9520.1350.924−5−2.7980.2264.900
4.2PMM55.0030.1350.018−2−0.4870.1112.302−5−1.2340.13714.205
survivalDT-HMM55.0060.0940.009−2−2.5320.4480.484−5−5.5971.8403.737
formulaCT-HMM54.9860.0950.009−2−2.2680.4880.310−5−5.0401.1291.273
PH-HMM55.0800.1030.017−2−1.2430.2360.628−5−2.4930.4286.470
4.3PMM55.0020.1620.026−2−1.1570.0810.716−5−1.2000.10014.451
logisticDT-HMM55.0080.0940.009−2−1.9670.3310.111−5−5.1951.0911.225
CT-HMM55.0070.0930.009−2−2.3890.1850.185−5−3.2460.3033.167
PH-HMM55.0160.0930.009−2−2.3280.1770.139−5−3.0100.2834.038

Note: Mean parameter estimates (Est.), empirical standard error (SE) and mean squared error (MSE) based on 500 replicates. Cases 1.1–1.3 looked at low incidence rate and large distributional difference; Cases 2.1–2.3 looked at high incidence rate and large distributional difference; Cases 3.1–3.3 looked at low incidence rate and small distributional difference; and Cases 4.1–4.3 looked at high incidence rate and small distributional difference.

Table 2

Simulation: estimates, standard errors and mean square errors for μ1 and β1.

CaseModelμ1Est.SEMSEβ10Est.SEMSEβ11Est.SEMSE
1.1PMM1010.0000.1340.018−3−2.9130.0930.016−1−0.9190.1260.022
survivalDT-HMM109.9960.1310.017−3−1.2900.1072.935−1−1.0870.1610.034
formulaCT-HMM109.9020.1380.029−3−3.0720.0980.015−1−1.0280.1410.021
PH-HMM109.8840.1400.033−3−3.1380.1040.030−1−1.0710.1490.027
1.2PMM1010.0090.1350.018−3−2.1090.2500.856−1−1.1620.5170.293
survivalDT-HMM1010.0040.1290.017−3−3.7040.3960.651−1−1.0860.5540.313
formulaCT-HMM109.9990.1300.017−3−3.0580.3950.159−1−1.0790.5500.309
PH-HMM1010.0100.1280.017−3−2.9520.3660.136−1−1.0540.5410.295
1.3PMM109.9970.1280.016−3−2.6090.1780.184−1−0.7720.2410.110
logisticDT-HMM109.9960.1220.015−3−3.0070.2180.048−1−1.0290.3200.103
CT-HMM109.9960.1220.015−3−3.0720.2080.048−1−0.9700.2980.089
PH-HMM109.9960.1220.015−3−3.0720.2080.048−1−0.9700.2980.089
2.1PMM109.9950.1280.016−2−1.9910.1270.016−5−3.7570.4751.769
survivalDT-HMM109.9920.1260.016−2−0.1340.1703.509−5−5.6160.5610.693
formulaCT-HMM109.8350.1450.048−2−2.2500.1040.073−5−3.3480.3642.860
PH-HMM109.9600.1280.018−2−2.1230.1070.027−5−4.3560.3600.544
2.2PMM109.9730.3220.104−2−1.5120.2710.312−5−2.7850.4815.137
survivalDT-HMM1010.0000.1270.016−2−2.6630.4520.644−5−5.6081.5092.643
formulaCT-HMM109.9890.1290.017−2−2.1200.4020.176−5−5.4251.0831.350
PH-HMM1010.0060.1270.016−2−1.9240.3410.122−5−4.8020.9110.868
2.3PMM109.9940.1320.017−2−2.0140.1480.022−5−2.6280.2485.689
logisticDT-HMM109.9900.1290.017−2−2.0040.2350.055−5−5.1510.7360.564
CT-HMM109.9880.1290.017−2−2.3390.1630.141−5−3.4340.3142.552
PH-HMM109.9900.1290.017−2−2.3270.1620.133−5−3.3920.3092.680
3.1PMM55.0240.1200.015−3−2.5140.0700.241−1−0.5710.0990.194
survivalDT-HMM55.0280.1070.012−3−1.3090.1482.880−1−1.1040.2130.056
formulaCT-HMM54.8260.1200.044−3−3.4300.1460.206−1−1.1010.1930.047
PHformulaHMM54.6410.1450.150−3−4.1220.2511.322−1−1.4210.3170.277
3.2PMM54.9940.2430.059−3−0.6890.1945.378−1−0.7490.2100.107
survivalDTformulaHMM55.0120.0940.009−3−3.6690.4820.679−1−1.1350.6180.400
formulaCTformulaHMM54.9990.0950.009−3−3.1610.4800.256−1−1.1330.6120.392
PH−HMM55.0670.1000.015−3−2.3000.3960.647−1−0.8950.6760.467
3.3PMM55.0010.1570.024−3−1.4540.1132.404−1−0.3820.1200.396
logisticDTformulaHMM55.0040.0930.009−3−3.0110.2530.064−1−1.0540.3320.113
CTformulaHMM55.0010.0930.009−3−3.1000.2390.067−1−1.0050.3060.094
PHformulaHMM55.0010.0930.009−3−3.0990.2390.067−1−1.0000.3060.093
4.1PMM55.0020.1530.023−2−1.8580.0960.029−5−2.0780.1858.572
survivalDTformulaHMM55.0040.0900.008−2−0.1350.2213.525−5−5.8210.9091.500
formulaCT−HMM54.2950.2070.539−2−3.2190.2281.539−5−1.7100.13610.845
PH-HMM54.8500.1290.039−2−2.9520.1350.924−5−2.7980.2264.900
4.2PMM55.0030.1350.018−2−0.4870.1112.302−5−1.2340.13714.205
survivalDT-HMM55.0060.0940.009−2−2.5320.4480.484−5−5.5971.8403.737
formulaCT-HMM54.9860.0950.009−2−2.2680.4880.310−5−5.0401.1291.273
PH-HMM55.0800.1030.017−2−1.2430.2360.628−5−2.4930.4286.470
4.3PMM55.0020.1620.026−2−1.1570.0810.716−5−1.2000.10014.451
logisticDT-HMM55.0080.0940.009−2−1.9670.3310.111−5−5.1951.0911.225
CT-HMM55.0070.0930.009−2−2.3890.1850.185−5−3.2460.3033.167
PH-HMM55.0160.0930.009−2−2.3280.1770.139−5−3.0100.2834.038
CaseModelμ1Est.SEMSEβ10Est.SEMSEβ11Est.SEMSE
1.1PMM1010.0000.1340.018−3−2.9130.0930.016−1−0.9190.1260.022
survivalDT-HMM109.9960.1310.017−3−1.2900.1072.935−1−1.0870.1610.034
formulaCT-HMM109.9020.1380.029−3−3.0720.0980.015−1−1.0280.1410.021
PH-HMM109.8840.1400.033−3−3.1380.1040.030−1−1.0710.1490.027
1.2PMM1010.0090.1350.018−3−2.1090.2500.856−1−1.1620.5170.293
survivalDT-HMM1010.0040.1290.017−3−3.7040.3960.651−1−1.0860.5540.313
formulaCT-HMM109.9990.1300.017−3−3.0580.3950.159−1−1.0790.5500.309
PH-HMM1010.0100.1280.017−3−2.9520.3660.136−1−1.0540.5410.295
1.3PMM109.9970.1280.016−3−2.6090.1780.184−1−0.7720.2410.110
logisticDT-HMM109.9960.1220.015−3−3.0070.2180.048−1−1.0290.3200.103
CT-HMM109.9960.1220.015−3−3.0720.2080.048−1−0.9700.2980.089
PH-HMM109.9960.1220.015−3−3.0720.2080.048−1−0.9700.2980.089
2.1PMM109.9950.1280.016−2−1.9910.1270.016−5−3.7570.4751.769
survivalDT-HMM109.9920.1260.016−2−0.1340.1703.509−5−5.6160.5610.693
formulaCT-HMM109.8350.1450.048−2−2.2500.1040.073−5−3.3480.3642.860
PH-HMM109.9600.1280.018−2−2.1230.1070.027−5−4.3560.3600.544
2.2PMM109.9730.3220.104−2−1.5120.2710.312−5−2.7850.4815.137
survivalDT-HMM1010.0000.1270.016−2−2.6630.4520.644−5−5.6081.5092.643
formulaCT-HMM109.9890.1290.017−2−2.1200.4020.176−5−5.4251.0831.350
PH-HMM1010.0060.1270.016−2−1.9240.3410.122−5−4.8020.9110.868
2.3PMM109.9940.1320.017−2−2.0140.1480.022−5−2.6280.2485.689
logisticDT-HMM109.9900.1290.017−2−2.0040.2350.055−5−5.1510.7360.564
CT-HMM109.9880.1290.017−2−2.3390.1630.141−5−3.4340.3142.552
PH-HMM109.9900.1290.017−2−2.3270.1620.133−5−3.3920.3092.680
3.1PMM55.0240.1200.015−3−2.5140.0700.241−1−0.5710.0990.194
survivalDT-HMM55.0280.1070.012−3−1.3090.1482.880−1−1.1040.2130.056
formulaCT-HMM54.8260.1200.044−3−3.4300.1460.206−1−1.1010.1930.047
PHformulaHMM54.6410.1450.150−3−4.1220.2511.322−1−1.4210.3170.277
3.2PMM54.9940.2430.059−3−0.6890.1945.378−1−0.7490.2100.107
survivalDTformulaHMM55.0120.0940.009−3−3.6690.4820.679−1−1.1350.6180.400
formulaCTformulaHMM54.9990.0950.009−3−3.1610.4800.256−1−1.1330.6120.392
PH−HMM55.0670.1000.015−3−2.3000.3960.647−1−0.8950.6760.467
3.3PMM55.0010.1570.024−3−1.4540.1132.404−1−0.3820.1200.396
logisticDTformulaHMM55.0040.0930.009−3−3.0110.2530.064−1−1.0540.3320.113
CTformulaHMM55.0010.0930.009−3−3.1000.2390.067−1−1.0050.3060.094
PHformulaHMM55.0010.0930.009−3−3.0990.2390.067−1−1.0000.3060.093
4.1PMM55.0020.1530.023−2−1.8580.0960.029−5−2.0780.1858.572
survivalDTformulaHMM55.0040.0900.008−2−0.1350.2213.525−5−5.8210.9091.500
formulaCT−HMM54.2950.2070.539−2−3.2190.2281.539−5−1.7100.13610.845
PH-HMM54.8500.1290.039−2−2.9520.1350.924−5−2.7980.2264.900
4.2PMM55.0030.1350.018−2−0.4870.1112.302−5−1.2340.13714.205
survivalDT-HMM55.0060.0940.009−2−2.5320.4480.484−5−5.5971.8403.737
formulaCT-HMM54.9860.0950.009−2−2.2680.4880.310−5−5.0401.1291.273
PH-HMM55.0800.1030.017−2−1.2430.2360.628−5−2.4930.4286.470
4.3PMM55.0020.1620.026−2−1.1570.0810.716−5−1.2000.10014.451
logisticDT-HMM55.0080.0940.009−2−1.9670.3310.111−5−5.1951.0911.225
CT-HMM55.0070.0930.009−2−2.3890.1850.185−5−3.2460.3033.167
PH-HMM55.0160.0930.009−2−2.3280.1770.139−5−3.0100.2834.038

Note: Mean parameter estimates (Est.), empirical standard error (SE) and mean squared error (MSE) based on 500 replicates. Cases 1.1–1.3 looked at low incidence rate and large distributional difference; Cases 2.1–2.3 looked at high incidence rate and large distributional difference; Cases 3.1–3.3 looked at low incidence rate and small distributional difference; and Cases 4.1–4.3 looked at high incidence rate and small distributional difference.

Table 3

Simulation: estimates, standard errors and mean square errors for μ2 and β2.

CaseModelμ2Est.SEMSEβ20Est.SEMSEβ21Est.SEMSE
1.1PMM11.0010.0450.002−3−2.9160.0920.01610.9180.1230.022
survivalDT-HMM10.9980.0430.002−3−1.3040.1092.88711.0950.1570.034
formulaCT-HMM11.0000.0450.002−3−3.0690.0980.01411.0320.1380.020
PH-HMM11.0030.0450.002−3−3.1300.1020.02711.0810.1430.027
1.2PMM11.0030.0460.002−3−2.1170.2530.84411.1390.4900.259
survivalDT-HMM11.0000.0410.002−3−3.7260.3640.66011.0510.6030.366
formulaCT-HMM11.0010.0410.002−3−3.0750.3560.13211.0220.5920.350
PH-HMM11.0000.0410.002−3−2.9760.3470.12111.0850.5950.361
1.3PMM11.0000.0430.002−3−2.5990.1720.19010.7340.2270.122
logisticDT-HMM10.9980.0390.002−3−3.0040.2030.04110.9990.3020.091
CT-HMM10.9980.0390.002−3−3.0680.1930.04210.9410.2810.082
PH-HMM10.9980.0390.002−3−3.0680.1930.04210.9410.2810.082
2.1PMM11.0000.0430.002−2−1.9540.1250.01853.9470.5011.359
survivalDT-HMM10.9980.0410.002−2−0.1480.1733.46055.6050.5550.673
formulaCT−HMM10.9980.0490.002−2−2.1290.1000.02753.8260.4301.563
PH-HMM10.9990.0420.002−2−2.1100.1070.02454.3580.3980.570
2.2PMM11.0310.4050.165−2−1.5571.2421.73752.7781.0926.124
survivalDT-HMM10.9990.0360.001−2−2.6700.3550.57455.5911.2621.940
formulaCT-HMM11.0010.0370.001−2−2.0910.3330.11955.1511.0061.033
PH-HMM10.9990.0370.001−2−1.9800.2980.08954.6800.8050.750
2.3PMM10.9990.0420.002−2−1.9420.1360.02252.5170.2306.219
logisticDT-HMM10.9950.0380.002−2−2.0200.2180.04855.1690.6710.478
CT-HMM10.9950.0380.002−2−2.3510.1530.14753.4430.2992.513
PH-HMM10.9950.0390.002−2−2.3460.1520.14353.4060.2882.623
3.1PMM11.0080.0680.005−3−2.6120.0730.15610.6220.1010.153
survivalDT-HMM10.9800.0540.003−3−1.2960.1432.92411.1180.2130.059
formulaCT-HMM11.0070.0620.004−3−3.3390.1400.13511.1260.1970.055
PH-HMM11.0580.0800.010−3−4.0100.2821.09911.6020.3660.495
3.2PMM11.0330.2450.061−3−0.8540.3564.73111.0230.2100.045
survivalDT-HMM10.9880.0410.002−3−3.7490.4690.78111.2180.7330.584
formulaCT-HMM10.9920.0410.002−3−3.2670.7120.57711.1470.8640.766
PH-HMM10.9780.0420.002−3−2.3580.4110.58111.4490.6550.630
3.3PMM11.0100.1140.013−3−1.5700.1192.05910.5100.1290.257
logisticDT-HMM10.9860.0440.002−3−3.0800.2540.07111.1140.3490.135
CT-HMM10.9870.0440.002−3−3.1670.2410.08611.0590.3250.109
PH-HMM10.9870.0440.002−3−3.1660.2400.08511.0530.3240.107
4.1PMM11.0080.1110.012−2−2.1160.1760.04452.0380.1498.793
survivalDT-HMM10.9940.0430.002−2−0.1320.2203.53955.7720.9311.461
formulaCT-HMM10.9010.1120.022−2−2.2560.1700.09451.5950.16611.618
PH−HMM10.9890.0650.004−2−2.7980.1430.65852.9440.2404.286
4.2PMM11.0090.0710.005−2−0.6750.1231.77251.4820.15512.401
survivalDT-HMM10.9900.0420.002−2−2.6360.4820.63656.1202.2496.304
formulaCT-HMM10.9940.0430.002−2−2.2660.3930.22554.9001.0771.168
PH-HMM10.9780.0440.002−2−1.4510.2440.36152.7330.3475.260
4.3PMM11.0110.1110.012−2−1.3420.0850.44051.4370.11412.706
logisticDT-HMM10.9920.0420.002−2−2.0500.2970.09155.2760.9861.046
CT-HMM10.9890.0420.002−2−2.4540.1780.23753.2630.3003.106
PH-HMM10.9850.0430.002−2−2.4120.1710.19953.0410.2673.907
CaseModelμ2Est.SEMSEβ20Est.SEMSEβ21Est.SEMSE
1.1PMM11.0010.0450.002−3−2.9160.0920.01610.9180.1230.022
survivalDT-HMM10.9980.0430.002−3−1.3040.1092.88711.0950.1570.034
formulaCT-HMM11.0000.0450.002−3−3.0690.0980.01411.0320.1380.020
PH-HMM11.0030.0450.002−3−3.1300.1020.02711.0810.1430.027
1.2PMM11.0030.0460.002−3−2.1170.2530.84411.1390.4900.259
survivalDT-HMM11.0000.0410.002−3−3.7260.3640.66011.0510.6030.366
formulaCT-HMM11.0010.0410.002−3−3.0750.3560.13211.0220.5920.350
PH-HMM11.0000.0410.002−3−2.9760.3470.12111.0850.5950.361
1.3PMM11.0000.0430.002−3−2.5990.1720.19010.7340.2270.122
logisticDT-HMM10.9980.0390.002−3−3.0040.2030.04110.9990.3020.091
CT-HMM10.9980.0390.002−3−3.0680.1930.04210.9410.2810.082
PH-HMM10.9980.0390.002−3−3.0680.1930.04210.9410.2810.082
2.1PMM11.0000.0430.002−2−1.9540.1250.01853.9470.5011.359
survivalDT-HMM10.9980.0410.002−2−0.1480.1733.46055.6050.5550.673
formulaCT−HMM10.9980.0490.002−2−2.1290.1000.02753.8260.4301.563
PH-HMM10.9990.0420.002−2−2.1100.1070.02454.3580.3980.570
2.2PMM11.0310.4050.165−2−1.5571.2421.73752.7781.0926.124
survivalDT-HMM10.9990.0360.001−2−2.6700.3550.57455.5911.2621.940
formulaCT-HMM11.0010.0370.001−2−2.0910.3330.11955.1511.0061.033
PH-HMM10.9990.0370.001−2−1.9800.2980.08954.6800.8050.750
2.3PMM10.9990.0420.002−2−1.9420.1360.02252.5170.2306.219
logisticDT-HMM10.9950.0380.002−2−2.0200.2180.04855.1690.6710.478
CT-HMM10.9950.0380.002−2−2.3510.1530.14753.4430.2992.513
PH-HMM10.9950.0390.002−2−2.3460.1520.14353.4060.2882.623
3.1PMM11.0080.0680.005−3−2.6120.0730.15610.6220.1010.153
survivalDT-HMM10.9800.0540.003−3−1.2960.1432.92411.1180.2130.059
formulaCT-HMM11.0070.0620.004−3−3.3390.1400.13511.1260.1970.055
PH-HMM11.0580.0800.010−3−4.0100.2821.09911.6020.3660.495
3.2PMM11.0330.2450.061−3−0.8540.3564.73111.0230.2100.045
survivalDT-HMM10.9880.0410.002−3−3.7490.4690.78111.2180.7330.584
formulaCT-HMM10.9920.0410.002−3−3.2670.7120.57711.1470.8640.766
PH-HMM10.9780.0420.002−3−2.3580.4110.58111.4490.6550.630
3.3PMM11.0100.1140.013−3−1.5700.1192.05910.5100.1290.257
logisticDT-HMM10.9860.0440.002−3−3.0800.2540.07111.1140.3490.135
CT-HMM10.9870.0440.002−3−3.1670.2410.08611.0590.3250.109
PH-HMM10.9870.0440.002−3−3.1660.2400.08511.0530.3240.107
4.1PMM11.0080.1110.012−2−2.1160.1760.04452.0380.1498.793
survivalDT-HMM10.9940.0430.002−2−0.1320.2203.53955.7720.9311.461
formulaCT-HMM10.9010.1120.022−2−2.2560.1700.09451.5950.16611.618
PH−HMM10.9890.0650.004−2−2.7980.1430.65852.9440.2404.286
4.2PMM11.0090.0710.005−2−0.6750.1231.77251.4820.15512.401
survivalDT-HMM10.9900.0420.002−2−2.6360.4820.63656.1202.2496.304
formulaCT-HMM10.9940.0430.002−2−2.2660.3930.22554.9001.0771.168
PH-HMM10.9780.0440.002−2−1.4510.2440.36152.7330.3475.260
4.3PMM11.0110.1110.012−2−1.3420.0850.44051.4370.11412.706
logisticDT-HMM10.9920.0420.002−2−2.0500.2970.09155.2760.9861.046
CT-HMM10.9890.0420.002−2−2.4540.1780.23753.2630.3003.106
PH-HMM10.9850.0430.002−2−2.4120.1710.19953.0410.2673.907

Note: Mean parameter estimates, empirical standard error, and mean squared error based on 500 replicates. Cases 1.1–1.3 looked at low incidence rate and large distributional difference; Cases 2.1–2.3 looked at high incidence rate and large distributional difference; Cases 3.1–3.3 looked at low incidence rate and small distributional difference; and Cases 4.1–4.3 looked at high incidence rate and small distributional difference.

Table 3

Simulation: estimates, standard errors and mean square errors for μ2 and β2.

CaseModelμ2Est.SEMSEβ20Est.SEMSEβ21Est.SEMSE
1.1PMM11.0010.0450.002−3−2.9160.0920.01610.9180.1230.022
survivalDT-HMM10.9980.0430.002−3−1.3040.1092.88711.0950.1570.034
formulaCT-HMM11.0000.0450.002−3−3.0690.0980.01411.0320.1380.020
PH-HMM11.0030.0450.002−3−3.1300.1020.02711.0810.1430.027
1.2PMM11.0030.0460.002−3−2.1170.2530.84411.1390.4900.259
survivalDT-HMM11.0000.0410.002−3−3.7260.3640.66011.0510.6030.366
formulaCT-HMM11.0010.0410.002−3−3.0750.3560.13211.0220.5920.350
PH-HMM11.0000.0410.002−3−2.9760.3470.12111.0850.5950.361
1.3PMM11.0000.0430.002−3−2.5990.1720.19010.7340.2270.122
logisticDT-HMM10.9980.0390.002−3−3.0040.2030.04110.9990.3020.091
CT-HMM10.9980.0390.002−3−3.0680.1930.04210.9410.2810.082
PH-HMM10.9980.0390.002−3−3.0680.1930.04210.9410.2810.082
2.1PMM11.0000.0430.002−2−1.9540.1250.01853.9470.5011.359
survivalDT-HMM10.9980.0410.002−2−0.1480.1733.46055.6050.5550.673
formulaCT−HMM10.9980.0490.002−2−2.1290.1000.02753.8260.4301.563
PH-HMM10.9990.0420.002−2−2.1100.1070.02454.3580.3980.570
2.2PMM11.0310.4050.165−2−1.5571.2421.73752.7781.0926.124
survivalDT-HMM10.9990.0360.001−2−2.6700.3550.57455.5911.2621.940
formulaCT-HMM11.0010.0370.001−2−2.0910.3330.11955.1511.0061.033
PH-HMM10.9990.0370.001−2−1.9800.2980.08954.6800.8050.750
2.3PMM10.9990.0420.002−2−1.9420.1360.02252.5170.2306.219
logisticDT-HMM10.9950.0380.002−2−2.0200.2180.04855.1690.6710.478
CT-HMM10.9950.0380.002−2−2.3510.1530.14753.4430.2992.513
PH-HMM10.9950.0390.002−2−2.3460.1520.14353.4060.2882.623
3.1PMM11.0080.0680.005−3−2.6120.0730.15610.6220.1010.153
survivalDT-HMM10.9800.0540.003−3−1.2960.1432.92411.1180.2130.059
formulaCT-HMM11.0070.0620.004−3−3.3390.1400.13511.1260.1970.055
PH-HMM11.0580.0800.010−3−4.0100.2821.09911.6020.3660.495
3.2PMM11.0330.2450.061−3−0.8540.3564.73111.0230.2100.045
survivalDT-HMM10.9880.0410.002−3−3.7490.4690.78111.2180.7330.584
formulaCT-HMM10.9920.0410.002−3−3.2670.7120.57711.1470.8640.766
PH-HMM10.9780.0420.002−3−2.3580.4110.58111.4490.6550.630
3.3PMM11.0100.1140.013−3−1.5700.1192.05910.5100.1290.257
logisticDT-HMM10.9860.0440.002−3−3.0800.2540.07111.1140.3490.135
CT-HMM10.9870.0440.002−3−3.1670.2410.08611.0590.3250.109
PH-HMM10.9870.0440.002−3−3.1660.2400.08511.0530.3240.107
4.1PMM11.0080.1110.012−2−2.1160.1760.04452.0380.1498.793
survivalDT-HMM10.9940.0430.002−2−0.1320.2203.53955.7720.9311.461
formulaCT-HMM10.9010.1120.022−2−2.2560.1700.09451.5950.16611.618
PH−HMM10.9890.0650.004−2−2.7980.1430.65852.9440.2404.286
4.2PMM11.0090.0710.005−2−0.6750.1231.77251.4820.15512.401
survivalDT-HMM10.9900.0420.002−2−2.6360.4820.63656.1202.2496.304
formulaCT-HMM10.9940.0430.002−2−2.2660.3930.22554.9001.0771.168
PH-HMM10.9780.0440.002−2−1.4510.2440.36152.7330.3475.260
4.3PMM11.0110.1110.012−2−1.3420.0850.44051.4370.11412.706
logisticDT-HMM10.9920.0420.002−2−2.0500.2970.09155.2760.9861.046
CT-HMM10.9890.0420.002−2−2.4540.1780.23753.2630.3003.106
PH-HMM10.9850.0430.002−2−2.4120.1710.19953.0410.2673.907
CaseModelμ2Est.SEMSEβ20Est.SEMSEβ21Est.SEMSE
1.1PMM11.0010.0450.002−3−2.9160.0920.01610.9180.1230.022
survivalDT-HMM10.9980.0430.002−3−1.3040.1092.88711.0950.1570.034
formulaCT-HMM11.0000.0450.002−3−3.0690.0980.01411.0320.1380.020
PH-HMM11.0030.0450.002−3−3.1300.1020.02711.0810.1430.027
1.2PMM11.0030.0460.002−3−2.1170.2530.84411.1390.4900.259
survivalDT-HMM11.0000.0410.002−3−3.7260.3640.66011.0510.6030.366
formulaCT-HMM11.0010.0410.002−3−3.0750.3560.13211.0220.5920.350
PH-HMM11.0000.0410.002−3−2.9760.3470.12111.0850.5950.361
1.3PMM11.0000.0430.002−3−2.5990.1720.19010.7340.2270.122
logisticDT-HMM10.9980.0390.002−3−3.0040.2030.04110.9990.3020.091
CT-HMM10.9980.0390.002−3−3.0680.1930.04210.9410.2810.082
PH-HMM10.9980.0390.002−3−3.0680.1930.04210.9410.2810.082
2.1PMM11.0000.0430.002−2−1.9540.1250.01853.9470.5011.359
survivalDT-HMM10.9980.0410.002−2−0.1480.1733.46055.6050.5550.673
formulaCT−HMM10.9980.0490.002−2−2.1290.1000.02753.8260.4301.563
PH-HMM10.9990.0420.002−2−2.1100.1070.02454.3580.3980.570
2.2PMM11.0310.4050.165−2−1.5571.2421.73752.7781.0926.124
survivalDT-HMM10.9990.0360.001−2−2.6700.3550.57455.5911.2621.940
formulaCT-HMM11.0010.0370.001−2−2.0910.3330.11955.1511.0061.033
PH-HMM10.9990.0370.001−2−1.9800.2980.08954.6800.8050.750
2.3PMM10.9990.0420.002−2−1.9420.1360.02252.5170.2306.219
logisticDT-HMM10.9950.0380.002−2−2.0200.2180.04855.1690.6710.478
CT-HMM10.9950.0380.002−2−2.3510.1530.14753.4430.2992.513
PH-HMM10.9950.0390.002−2−2.3460.1520.14353.4060.2882.623
3.1PMM11.0080.0680.005−3−2.6120.0730.15610.6220.1010.153
survivalDT-HMM10.9800.0540.003−3−1.2960.1432.92411.1180.2130.059
formulaCT-HMM11.0070.0620.004−3−3.3390.1400.13511.1260.1970.055
PH-HMM11.0580.0800.010−3−4.0100.2821.09911.6020.3660.495
3.2PMM11.0330.2450.061−3−0.8540.3564.73111.0230.2100.045
survivalDT-HMM10.9880.0410.002−3−3.7490.4690.78111.2180.7330.584
formulaCT-HMM10.9920.0410.002−3−3.2670.7120.57711.1470.8640.766
PH-HMM10.9780.0420.002−3−2.3580.4110.58111.4490.6550.630
3.3PMM11.0100.1140.013−3−1.5700.1192.05910.5100.1290.257
logisticDT-HMM10.9860.0440.002−3−3.0800.2540.07111.1140.3490.135
CT-HMM10.9870.0440.002−3−3.1670.2410.08611.0590.3250.109
PH-HMM10.9870.0440.002−3−3.1660.2400.08511.0530.3240.107
4.1PMM11.0080.1110.012−2−2.1160.1760.04452.0380.1498.793
survivalDT-HMM10.9940.0430.002−2−0.1320.2203.53955.7720.9311.461
formulaCT-HMM10.9010.1120.022−2−2.2560.1700.09451.5950.16611.618
PH−HMM10.9890.0650.004−2−2.7980.1430.65852.9440.2404.286
4.2PMM11.0090.0710.005−2−0.6750.1231.77251.4820.15512.401
survivalDT-HMM10.9900.0420.002−2−2.6360.4820.63656.1202.2496.304
formulaCT-HMM10.9940.0430.002−2−2.2660.3930.22554.9001.0771.168
PH-HMM10.9780.0440.002−2−1.4510.2440.36152.7330.3475.260
4.3PMM11.0110.1110.012−2−1.3420.0850.44051.4370.11412.706
logisticDT-HMM10.9920.0420.002−2−2.0500.2970.09155.2760.9861.046
CT-HMM10.9890.0420.002−2−2.4540.1780.23753.2630.3003.106
PH-HMM10.9850.0430.002−2−2.4120.1710.19953.0410.2673.907

Note: Mean parameter estimates, empirical standard error, and mean squared error based on 500 replicates. Cases 1.1–1.3 looked at low incidence rate and large distributional difference; Cases 2.1–2.3 looked at high incidence rate and large distributional difference; Cases 3.1–3.3 looked at low incidence rate and small distributional difference; and Cases 4.1–4.3 looked at high incidence rate and small distributional difference.

Cases 1.1–1.3 represent data that is often associated with HMMs. These non-stationary processes often involve an extended stay in a state after a transition with clear separation in their state-dependent distributions (Holsclaw et al., 2017). We found that PH-HMM performed well in Cases 1.1–1.3 settings, due to the PH and logistic likelihoods being similar for low incidence settings as described in Section 2.3. In Cases 1.1–1.3, formula a region where there is little difference in PH and logistic likelihoods, with a plot of the penalty difference given in Web Figure 1. When there are heterogeneous event times (Case 1.1), PH-HMM results in more accurate coefficient estimates by specifying the correct parametric form, while DT-HMM does not achieve the same accuracy. PMM, as a two-step estimation, is less accurate due to not incorporating covariate information throughout the estimation process. Under common data generative settings of low incidence rates and for both heterogeneous and discrete times, PH-HMM is a highly reliable and robust parameterization for fitting HMMs, while logistic regression DT-HMMs does not have comparable utility. In addition, PH regression, being a derivative of the Poisson regression link function, is more numerically stable in the rare outcome settings of Cases 1.1–1.3.

In Cases 2.1–2.3, settings with clear separation in their state-dependent distributions and high incidence of state transitions, PH-HMM has low bias when the data-generating process is a survival model and DT-HMM yields a poor fit in this setting. For Case 2.3, when the data-generating model is a logistic regression, PH-HMM results in biased parameter estimation. In Cases 2.1–2.3 where formula, parameter estimation for Case 2.3 from PH-HMM fits formula with similar results for formula. This attenuation is in line with Section 2.3, penalizing formula, noting that formula for formula. Though this induces bias, it is useful for robust estimation of HMMs for common non-stationary processes with few incidences of state transitions. This bias is reflected in low coverage rates which can be found in Web Appendix H. Our data-generating model sets formula to illustrate a case with many incidences of state transitions. In practice, an extended region of the formula, where formula and formula is a level of certainty seldom encountered in real data. Existing HMM methods such as Viterbi algorithm and HSMMs work to attenuate transition probabilities formula in order to mitigate instances of erroneous state transitions (Bulla et al., 2010; Viterbi, 1967). Our PH-HMM method estimates milder transition probabilities through the implicit penalization of PH regression without changing the covariate link function to the HSMM parameterization which can be difficult to interpret.

For Cases 3.1–4.3, we present simulation studies that mirror the regression parameters used in Cases 1.1–2.3 but with small distributional differences. As we close the gap between state-dependent distributions, we observe less accurate coefficient estimation overall. In practice, settings with small distributional differences often require additional consideration when deciding model parameterization. In general, DT-HMM tends to over estimate the range of formula, while PH-HMM tends to be more conservative with probability values of formula being closer to 0. The CT-HMM estimation procedure that calculates transition probabilities using continuous-time Markov chains work well in grouped and discrete-time settings. Additional discussion of simulation studies can be found in Web Appendix I.

From our simulation studies, we found that PH-HMM excels in a number of different cases. DT-HMM and CT-HMM are sensitive to heterogeneous event times leading to inaccurate state label recovery and bias parameter estimation. While DT-HMM is more accurate than PH-HMM at state label recovery in Cases 3.1 and 4.1, their parameter estimates have higher MSE and bias. DT-HMM generally has higher MSE than PH-HMM for cases where the data are generated from PH models (Cases 1.1, 1.2, 2.1, 2.2, 3.1, 3.2, 4.1, and 4.2). The shrinkage properties outlined in Section 2.3 carry over into our simulation study. In most cases and when the data are generated from a logistic regression, DT-HMM and PH-HMM have comparable accuracy and state-dependent distribution estimates, even though PH-HMM coefficients exhibit shrinkage. This shrinkage property has many uses which extend beyond simple simulation studies which we outline next in our real-data analysis examples.

5 Application: mHealth Data

A sample of formula individuals, recruited via the Penn/CHOP Lifespan Brain Institute or through the Outpatient Psychiatry Clinic at the University of Pennsylvania as part of a study of affective instability in youth (Xia et al., 2022). All participants provided informed consent to all study procedures. This study was approved by the University of Pennsylvania Institutional Review Board. For each individual, roughly 3 months of data was collected using the Beiwe platform (Torous et al., 2016). Accelerometer measures (meters per second squared) for x, y, and z-axes, screen-on events for Android devices and screen-unlock events for iOS (Apple) devices were acquired through the Beiwe platform—we refer to both as “screen-on events” in this paper. In general, we recommend a minimum of 30 days of data in order to fit our individual-specific model with hour-of-day random intercepts.

First, we analyzed the data under a discrete-time setting where we impute data for each missing hour. By collecting both screen-on events and accelerometer data, we are able to construct a missing at random (MAR) model for imputing missing accelerometer data. In our case, there is missing accelerometer data during of periods screen-on activity, that is, accelerometer data are missing over a given hour, while there are many screen-on events observed over the same period. On the other hand, there are periods of dormancy: accelerometer data are missing and there are also no screen-on events. More specifically periods of dormancy occur when accelerometer measurements are missing due to user and device-related factors such as the phone being powered off or being in the airplane mode. Periods of dormancy have greater probability of missing accelerometer features and are identified using a two-state HSMM with Bernoulli state-dependent distributions (Bulla et al., 2010). Missing mean acceleration magnitudes from dormant periods were imputed using the minimum of acceleration features (excluding outliers). While missing data assigned to the periods screen-on activity were imputed by regressing accelerometer features on formula over all hours where data are completely observed. For the heterogeneous event time example, we did not impute missing data and absorbed the duration of dormancy into the event times formula. Periods of consecutive missing acceleration magnitudes over 24 h constitutes an end of a Markov chain and the start of a new chain, where the likelihoods of multiple HMM sequences can be multiplied together for parameter estimation.

5.1 Estimating Strength of Routine in Youth with Affective Disorders

In psychiatric studies, regularity of a rhythm is defined as the association of time-of-day and state membership; the effect of hour-of-day on activity and rest state membership represents the diurnal rhythm. As an illustrative example, we fit a model with hour-of-day effects, as normally distributed random intercepts in our HMMs and for each individual we fit PH models formula, where formula are 24 h-of-day random intercepts. By fitting hour-of-day random intercepts, formula and formula are each of length 24 and formula only if formula is in the rth h of the day. Rates of transition from active-to-rest states are given as formula, rates of transition from rest-to-active states are given as formula and an example of PH-HMM outputs can be found in Figure 2. The variances of the random intercepts can be interpreted as an L2 penalty on hour-of-day effects and disappears as formula. We quantify the strength of diurnal effects for an individual, by looking at the variances formula, where large variances correspond with large hour-of-day effect sizes and greater regularity in diurnal rhythms with an example in Figure 2.

Example of discrete-time mHealth data and fitted PH-HMM. Probabilities of being in the active state, screen-on counts, mean acceleration magnitude, and hour-of-day random intercepts are plotted against time (hours). Regression models: ,  were fitted for individual i and MAP estimates were calculated using final E-step probabilities . Random intercepts capture the diurnal rhythm of active–rest cycles, with active states mainly occurring between the hours of 6 am–10 pm. Large values of  correspond with a high magnitude in the cyclic diurnal effects. This figure appears in color in the electronic version of this paper, and any mention of color refers to that version.
Figure 2

Example of discrete-time mHealth data and fitted PH-HMM. Probabilities of being in the active state, screen-on counts, mean acceleration magnitude, and hour-of-day random intercepts are plotted against time (hours). Regression models: formula, formula were fitted for individual i and MAP estimates were calculated using final E-step probabilities formula. Random intercepts capture the diurnal rhythm of active–rest cycles, with active states mainly occurring between the hours of 6 am–10 pm. Large values of formula correspond with a high magnitude in the cyclic diurnal effects. This figure appears in color in the electronic version of this paper, and any mention of color refers to that version.

5.2 Population HMM: Differences between Operating Systems

In addition, we can fit a population model, with random intercepts being specific to each individual through estimation using the likelihood formula. However, for iOS devices we only have screen-unlock events, that is, entering in a passcode to unlock the phone. Android devices have screen-on events which occur when the phone screen turns on such as when receiving a message; the phone does not need to be unlocked for the screen to be turned on. Screen-unlock events are less frequent and a subset of screen-on events, causing the counts formula, to be lower for iOS devices. The relationship between acceleration, formula and formula, may be experience effect modification due to the operating system (OS). We can test interaction between OS and acceleration, while controlling for the interaction with user sex in our regression and other individual effects with random intercepts. Android devices and males serve as baseline in this analysis. For the active-to-rest model: formula and rest-to-active model: formula, formula, where formula, formula are 41 individual specific random intercepts and formula are individual specific indicators. We fit our competing methods and test interaction for the discrete and heterogeneous event time settings, where estimates can be found in Table 4. Sensitivity analysis for our discrete-time model with MAR missing data imputation can be found in Web Appendix J.

Table 4

Population HMM: parameter estimates for discrete-time and heterogeneous-time setting.

Methods: Estimate (SE)
TimeTransitionParametersPMMDT-HMMCT-HMMPH-HMM
DiscreteActiveβ102.8973(0.8384)10.8254(1.4074)9.0154(1.2162)8.5893(1.2036)
Timetoβ11−0.4182(0.0850)−1.2431(0.1428)−1.0828(0.1232)−1.0395(0.1219)
Restβ120.0288(0.0121)0.0379(0.0184)0.0257(0.0126)0.0257(0.0126)
β130.0070(0.0124)0.0052(0.0189)0.0054(0.0130)0.0050(0.0129)
formula0.10990.25850.11710.1160
Restβ20−6.6357(0.3846)−16.2898(1.3782)−6.0404(0.4941)−5.9631(0.5029)
toβ210.5359(0.0405)1.5280(0.1401)0.4671(0.0512)0.4599(0.0521)
Activeβ22−0.0278(0.0190)−0.0441(0.0203)−0.0357(0.0183)−0.0355(0.0183)
β23−0.0431(0.0191)−0.0541(0.0208)−0.0503(0.0187)−0.0505(0.0188)
formula0.28020.32100.26310.2634
μ18.53218.08668.02518.0114
μ20.52380.42630.41950.4153
HeterogeneousActiveβ100.3714(0.7637)6.4420(1.2828)2.8110(1.1055)2.0846(1.0867)
Timetoβ11−0.2567(0.0786)−0.7771(0.1305)−0.5526(0.1130)−0.5209(0.1120)
Restβ120.0541(0.0214)0.0496(0.0231)0.0528(0.0233)0.0685(0.0285)
β130.0867(0.0221)0.0094(0.0241)0.0840(0.0240)0.1127(0.0296)
formula0.35540.39410.41020.6132
Restβ20−6.4771(0.3775)−20.1172(1.2551)−6.9784(0.4544)−6.8518(0.4673)
toβ210.4597(0.0411)2.0046(0.1298)0.5196(0.0481)0.4889(0.0500)
Activeβ22−0.0211(0.0235)−0.0620(0.0294)−0.0383(0.0217)−0.0327(0.0241)
β230.0033(0.0240)−0.0874(0.0309)−0.0199(0.0225)−0.0077(0.0252)
formula0.40510.62070.34220.4040
μ19.39399.05308.95228.9428
μ21.05340.95790.95280.9534
Methods: Estimate (SE)
TimeTransitionParametersPMMDT-HMMCT-HMMPH-HMM
DiscreteActiveβ102.8973(0.8384)10.8254(1.4074)9.0154(1.2162)8.5893(1.2036)
Timetoβ11−0.4182(0.0850)−1.2431(0.1428)−1.0828(0.1232)−1.0395(0.1219)
Restβ120.0288(0.0121)0.0379(0.0184)0.0257(0.0126)0.0257(0.0126)
β130.0070(0.0124)0.0052(0.0189)0.0054(0.0130)0.0050(0.0129)
formula0.10990.25850.11710.1160
Restβ20−6.6357(0.3846)−16.2898(1.3782)−6.0404(0.4941)−5.9631(0.5029)
toβ210.5359(0.0405)1.5280(0.1401)0.4671(0.0512)0.4599(0.0521)
Activeβ22−0.0278(0.0190)−0.0441(0.0203)−0.0357(0.0183)−0.0355(0.0183)
β23−0.0431(0.0191)−0.0541(0.0208)−0.0503(0.0187)−0.0505(0.0188)
formula0.28020.32100.26310.2634
μ18.53218.08668.02518.0114
μ20.52380.42630.41950.4153
HeterogeneousActiveβ100.3714(0.7637)6.4420(1.2828)2.8110(1.1055)2.0846(1.0867)
Timetoβ11−0.2567(0.0786)−0.7771(0.1305)−0.5526(0.1130)−0.5209(0.1120)
Restβ120.0541(0.0214)0.0496(0.0231)0.0528(0.0233)0.0685(0.0285)
β130.0867(0.0221)0.0094(0.0241)0.0840(0.0240)0.1127(0.0296)
formula0.35540.39410.41020.6132
Restβ20−6.4771(0.3775)−20.1172(1.2551)−6.9784(0.4544)−6.8518(0.4673)
toβ210.4597(0.0411)2.0046(0.1298)0.5196(0.0481)0.4889(0.0500)
Activeβ22−0.0211(0.0235)−0.0620(0.0294)−0.0383(0.0217)−0.0327(0.0241)
β230.0033(0.0240)−0.0874(0.0309)−0.0199(0.0225)−0.0077(0.0252)
formula0.40510.62070.34220.4040
μ19.39399.05308.95228.9428
μ21.05340.95790.95280.9534

Note: Parameter estimates using EM algorithm and asymptotic standard errors. Regression for state transitions are given as formula, where formula are individual specific random intercepts, Android devices and males serve as baseline.

Table 4

Population HMM: parameter estimates for discrete-time and heterogeneous-time setting.

Methods: Estimate (SE)
TimeTransitionParametersPMMDT-HMMCT-HMMPH-HMM
DiscreteActiveβ102.8973(0.8384)10.8254(1.4074)9.0154(1.2162)8.5893(1.2036)
Timetoβ11−0.4182(0.0850)−1.2431(0.1428)−1.0828(0.1232)−1.0395(0.1219)
Restβ120.0288(0.0121)0.0379(0.0184)0.0257(0.0126)0.0257(0.0126)
β130.0070(0.0124)0.0052(0.0189)0.0054(0.0130)0.0050(0.0129)
formula0.10990.25850.11710.1160
Restβ20−6.6357(0.3846)−16.2898(1.3782)−6.0404(0.4941)−5.9631(0.5029)
toβ210.5359(0.0405)1.5280(0.1401)0.4671(0.0512)0.4599(0.0521)
Activeβ22−0.0278(0.0190)−0.0441(0.0203)−0.0357(0.0183)−0.0355(0.0183)
β23−0.0431(0.0191)−0.0541(0.0208)−0.0503(0.0187)−0.0505(0.0188)
formula0.28020.32100.26310.2634
μ18.53218.08668.02518.0114
μ20.52380.42630.41950.4153
HeterogeneousActiveβ100.3714(0.7637)6.4420(1.2828)2.8110(1.1055)2.0846(1.0867)
Timetoβ11−0.2567(0.0786)−0.7771(0.1305)−0.5526(0.1130)−0.5209(0.1120)
Restβ120.0541(0.0214)0.0496(0.0231)0.0528(0.0233)0.0685(0.0285)
β130.0867(0.0221)0.0094(0.0241)0.0840(0.0240)0.1127(0.0296)
formula0.35540.39410.41020.6132
Restβ20−6.4771(0.3775)−20.1172(1.2551)−6.9784(0.4544)−6.8518(0.4673)
toβ210.4597(0.0411)2.0046(0.1298)0.5196(0.0481)0.4889(0.0500)
Activeβ22−0.0211(0.0235)−0.0620(0.0294)−0.0383(0.0217)−0.0327(0.0241)
β230.0033(0.0240)−0.0874(0.0309)−0.0199(0.0225)−0.0077(0.0252)
formula0.40510.62070.34220.4040
μ19.39399.05308.95228.9428
μ21.05340.95790.95280.9534
Methods: Estimate (SE)
TimeTransitionParametersPMMDT-HMMCT-HMMPH-HMM
DiscreteActiveβ102.8973(0.8384)10.8254(1.4074)9.0154(1.2162)8.5893(1.2036)
Timetoβ11−0.4182(0.0850)−1.2431(0.1428)−1.0828(0.1232)−1.0395(0.1219)
Restβ120.0288(0.0121)0.0379(0.0184)0.0257(0.0126)0.0257(0.0126)
β130.0070(0.0124)0.0052(0.0189)0.0054(0.0130)0.0050(0.0129)
formula0.10990.25850.11710.1160
Restβ20−6.6357(0.3846)−16.2898(1.3782)−6.0404(0.4941)−5.9631(0.5029)
toβ210.5359(0.0405)1.5280(0.1401)0.4671(0.0512)0.4599(0.0521)
Activeβ22−0.0278(0.0190)−0.0441(0.0203)−0.0357(0.0183)−0.0355(0.0183)
β23−0.0431(0.0191)−0.0541(0.0208)−0.0503(0.0187)−0.0505(0.0188)
formula0.28020.32100.26310.2634
μ18.53218.08668.02518.0114
μ20.52380.42630.41950.4153
HeterogeneousActiveβ100.3714(0.7637)6.4420(1.2828)2.8110(1.1055)2.0846(1.0867)
Timetoβ11−0.2567(0.0786)−0.7771(0.1305)−0.5526(0.1130)−0.5209(0.1120)
Restβ120.0541(0.0214)0.0496(0.0231)0.0528(0.0233)0.0685(0.0285)
β130.0867(0.0221)0.0094(0.0241)0.0840(0.0240)0.1127(0.0296)
formula0.35540.39410.41020.6132
Restβ20−6.4771(0.3775)−20.1172(1.2551)−6.9784(0.4544)−6.8518(0.4673)
toβ210.4597(0.0411)2.0046(0.1298)0.5196(0.0481)0.4889(0.0500)
Activeβ22−0.0211(0.0235)−0.0620(0.0294)−0.0383(0.0217)−0.0327(0.0241)
β230.0033(0.0240)−0.0874(0.0309)−0.0199(0.0225)−0.0077(0.0252)
formula0.40510.62070.34220.4040
μ19.39399.05308.95228.9428
μ21.05340.95790.95280.9534

Note: Parameter estimates using EM algorithm and asymptotic standard errors. Regression for state transitions are given as formula, where formula are individual specific random intercepts, Android devices and males serve as baseline.

We found that there was no significant interaction between OS and acceleration in the discrete-time active-to-rest model but did find significant interaction in the heterogeneous event time active-to-rest model. In addition, we found that there was significant interaction between OS and acceleration in the discrete-time rest-to-active model but did not find significant interaction in the heterogeneous event time rest-to-active model. During the active state, we know that counts, formula, are lower for iOS devices. In order to compensate for lower screen-on counts in iOS devices, iOS rest-to-active transitions require a higher magnitude of acceleration to achieve the same transition rate as an Android device, hence the negative sign of β23. During the rest state, we know that iOS devices have zero-inflated counts, where excess zeros are due to not being able to record a screen-on event. In the active-to-rest model, we may achieve the same transition rate while having higher acceleration in iOS devices than Android devices, hence the positive sign of β13. Our results suggest that the magnitude of effect acceleration has on state transition depends on OS, but further investigation is needed.

For the discrete-time setting, we estimated active-state distribution formula and rest-state distribution formula. For the heterogeneous event time setting, we estimated formula and formula. Screen-on counts separate into stark clusters, where μ1 and μ2 are similar to the large distributional difference from our simulation study. We found that rate of transition from active-to-rest states is negatively associated with acceleration; rate of transition for rest-to-active states is positively associated with acceleration and is statistically significant (formula) for all competing methods. These HMM parameter estimates related to screen-on counts and acceleration align with common intuition. For the PMM method, modeling the MAP estimates first and then combining the estimates to obtain a population level model, does not account for acceleration when imputing state labels and resulted in a poor fit. CT-HMM and PH-HMM estimates are comparable and aligned with the findings from the simulation study. In addition to large distributional differences, diurnal active–rest cycles are expected to be low-incidence rate processes, where we anticipate a few transition between states in a 24 h period. We observed that the magnitude of the relative risk estimates for CT-HMM and PH-HMM is comparable to each other but less than the DT-HMM estimates. This difference becomes more noticeable for the heterogeneous event time setting, which DT-HMM is not equipped to handle. For many parameters, DT-HMM relative risk estimates are three times that of PH-HMM estimates, while estimates for state-dependent distribution parameters formula are similar. PH-HMM allows us to achieve comparable estimation of state-dependent distributions while leveraging shrinkage to avoid overfitting coefficients.

6 Discussion

For a latent state setting, we proposed a method for estimating alternating recurrent event exponential PH model with shared log-normal frailties using the EM algorithm. Our E-step imputations involves a discrete-time HMM using logistic or multinomial regression transition probabilities with normally distributed random intercepts. The HMM obtained during the E-step of our EM algorithm is an alternative method for estimating mixed HMM which are typically obtained by estimating logistic or multinomial regressions (Altman, 2007; Maruotti & Rocci, 2012). The M-step conveniently involves fitting several independent PH models rather than multinomial regression with many states. In addition, we showed that the PH model applied to the discrete-time setting is a penalized logistic regression which shrinks transition probability matrices toward the identity matrix. Our framework can accommodate random intercepts to account for longitudinal data, such as data collected from the same individual or hour-of-day periodic effects. We derived asymptotic distributions for the PH regression coefficients and random intercepts, where coefficients have a hazard ratio interpretation akin to the Cox PH model. Our PH-HMM approach is a flexible method for modeling complex mHealth datasets, where heterogeneous event times can be incorporated into the PH regression while accounting for latent states. If the underlying data are a discrete-time process, then PH modeling offers penalization to mitigate overfitting, otherwise PH models are more appropriate for heterogeneous event time processes. Detail coverage rate results are presented in Web Appendix H and show that DT-HMM performs favorably in settings with frequent state changes, despite the PH-HMM excelling in numerous other situations.

Though we focused on alternating two-state models, our proposed framework can be generalized to arbitrary number of states. However, we caution that the number of model parameters grows quadratically with the number of states. In this case, model selection criteria such as Akaike information criterion/Bayesian information criterion can be used to inform the number of states that best fit the data. Furthermore, we provide R code for our analyses in Web Appendix K, which can be easily extended to consider model selection in addition to alternative parameter initialization.

As for our MAR assumption in our imputation model, we demonstrate via a sensitivity analysis that this assumption is robust even in the presence of moderate MNAR-induced systematic bias. Web Appendix J shows robust estimates for regression coefficients in the presence of biased imputation. Overall we present a flexible regression procedure that can accommodate different degrees of missing data, parameterization of random effects, and the use of other statistical structures such as semiparametric regression and multilevel models. However, parameter initialization, computational complexity, and interpretability should be considered when parameterizing complicated models such as HMMs (Maruotti & Punzo, 2021). Finally, a strong appeal of our approach is that complicated statistical structures can be incorporated into independent PH regressions which then ultimately simplify to multinomial transition probabilities during the E-step. Indeed, our application to quantify routine activity data is but one example of a framework that can flexibly adapt to a variety of different data types and corresponding modeling choices.

Data Availability Statement

The data that support the findings in this paper are available on request from the corresponding author (Benny Ren [email protected]). The data are not publicly available due to privacy or ethical restrictions.

Acknowledgments

This work was supported by grants from the NIMH: R01MH116884. Support for data collection was provided by the AE Foundation: R01MH113550 and R01MH107703. The authors wish to thank the Lifespan Informatics & Neuroimaging Center for providing the data

References

Abbott
,
R.D.
(
1985
)
Logistic regression in survival analysis
.
American Journal of Epidemiology
,
121
(
3
),
465
471
.

Altman
,
R.M.
(
2007
)
Mixed hidden Markov models: an extension of the hidden Markov model to the longitudinal data setting
.
Journal of the American Statistical Association
,
102
(
477
),
201
210
.

Bartolucci
,
F.
&
Farcomeni
,
A.
(
2015
)
A discrete time event-history approach to informative drop-out in mixed latent Markov models with covariates
.
Biometrics
,
71
(
1
),
80
89
.

Bartolucci
,
F.
&
Farcomeni
,
A.
(
2019
)
A shared-parameter continuous-time hidden Markov and survival model for longitudinal data with informative dropout
.
Statistics in Medicine
,
38
(
6
),
1056
1073
.

Baum
,
L.E.
,
Petrie
,
T.
,
Soules
,
G.
&
Weiss
,
N.
(
1970
)
A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains
.
The Annals of Mathematical Statistics
,
41
(
1
),
164
171
.

Bulla
,
J.
,
Bulla
,
I.
&
Nenadić
,
O.
(
2010
)
HSMM-AN R package for analyzing hidden semi-Markov models
.
Computational Statistics & Data Analysis
,
54
(
3
),
611
619
.

Bureau
,
A.
,
Shiboski
,
S.
&
Hughes
,
J.P.
(
2003
)
Applications of continuous time hidden Markov models to the study of misclassified disease outcomes
.
Statistics in Medicine
,
22
(
3
),
441
462
.

Callas
,
P.W.
,
Pastides
,
H.
&
Hosmer
,
D.W.
(
1998
)
Empirical comparisons of proportional hazards, Poisson, and logistic regression modeling of occupational cohort data
.
American Journal of Industrial Medicine
,
33
(
1
),
33
47
.

Dempster
,
A.P.
,
Laird
,
N.M.
&
Rubin
,
D.B.
(
1977
)
Maximum likelihood from incomplete data via the EM algorithm
.
Journal of the Royal Statistical Society: Series B (Methodological)
,
39
(
1
),
1
22
.

Forney
,
G.D.
(
1973
)
The viterbi algorithm
.
Proceedings of the IEEE
,
61
(
3
),
268
278
.

Holsclaw
,
T.
,
Greene
,
A.M.
,
Robertson
,
A.W.
&
Smyth
,
P.
(
2017
)
Bayesian nonhomogeneous Markov models via Pólya-gamma data augmentation with applications to rainfall modeling
.
The Annals of Applied Statistics
,
11
(
1
),
393
426
.

Hothorn
,
T.
(
2020
)
Most likely transformations: the mlt package
.
Journal of Statistical Software
,
92
(
1
),
v092
i01
.

Hothorn
,
T.
,
Moest
,
L.
&
Buehlmann
,
P.
(
2018
)
Most likely transformations
.
Scandinavian Journal of Statistics
,
45
(
1
),
110
134
.

Ingram
,
D.D.
&
Kleinman
,
J.C.
(
1989
)
Empirical comparisons of proportional hazards and logistic regression models
.
Statistics in Medicine
,
8
(
5
),
525
538
.

Jackson
,
C.H.
,
Sharples
,
L.D.
,
Thompson
,
S.G.
,
Duffy
,
S.W.
&
Couto
,
E.
(
2003
)
Multistate Markov models for disease progression with classification error
.
Journal of the Royal Statistical Society: Series D (The Statistician)
,
52
(
2
),
193
209
.

Kristensen
,
K.
,
Nielsen
,
A.
,
Berg
,
C.W.
,
Skaug
,
H.
&
Bell
,
B.M.
(
2016
)
Tmb: automatic differentiation and Laplace approximation
.
Journal of Statistical Software
,
70
(
5
),
1
21
.

Król
,
A.
&
Saint-Pierre
,
P.
(
2015
)
Semimarkov: an R package for parametric estimation in multi-state semi-Markov models
.
Journal of Statistical Software
,
66
,
1
16
.

Lagona
,
F.
,
Jdanov
,
D.
&
Shkolnikova
,
M.
(
2014
)
Latent time-varying factors in longitudinal analysis: a linear mixed hidden Markov model for heart rates
.
Statistics in Medicine
,
33
(
23
),
4116
4134
.

Langrock
,
R.
,
Swihart
,
B.J.
,
Caffo
,
B.S.
,
Punjabi
,
N.M.
&
Crainiceanu
,
C.M.
(
2013
)
Combining hidden Markov models for comparing the dynamics of multiple sleep electroencephalograms
.
Statistics in Medicine
,
32
(
19
),
3342
3356
.

Maruotti
,
A.
&
Punzo
,
A.
(
2021
)
Initialization of hidden Markov and semi-Markov models: a critical evaluation of several strategies
.
International Statistical Review
,
89
(
3
),
447
480
.

Maruotti
,
A.
&
Rocci
,
R.
(
2012
)
A mixed non-homogeneous hidden Markov model for categorical data, with application to alcohol consumption
.
Statistics in Medicine
,
31
(
9
),
871
886
.

McGilchrist
,
C.
&
Aisbett
,
C.
(
1991
)
Regression with frailty in survival analysis
.
Biometrics
,
461
466
.

Monk
,
T.H.
,
Kupfer
,
D.J.
,
Frank
,
E.
&
Ritenour
,
A.M.
(
1991
)
The social rhythm metric (srm): measuring daily social rhythms over 12 weeks
.
Psychiatry Research
,
36
(
2
),
195
207
.

Shinohara
,
R.
,
Sun
,
Y.
&
Wang
,
M.-C.
(
2018
)
Alternating event processes during lifetimes: population dynamics and statistical inference
.
Lifetime Data Analysis
,
24
(
1
),
110
125
.

Stoner
,
O.
&
Economou
,
T.
(
2020
)
An advanced hidden Markov model for hourly rainfall time series
.
Computational Statistics & Data Analysis
,
152
, 107045.

Tamási
,
B.
&
Hothorn
,
T.
(
2021
)
tramme: Mixed-effects transformation models using template model builder
.

Therneau
,
T.M.
&
Lumley
,
T.
(
2015
)
Package ‘survival'
.
R Top Doc
,
128
(
10
),
28
33
.

Thompson Jr
,
W.
(
1977
)
On the treatment of grouped observations in life studies
.
Biometrics
,
33
(
3
),
463
470
.

Torous
,
J.
,
Kiang
,
M.V.
,
Lorme
,
J.
&
Onnela
,
J.-P.
(
2016
)
New tools for new research in psychiatry: a scalable and customizable platform to empower data driven smartphone research
.
JMIR Mental Health
,
3
(
2
), e16.

Viterbi
,
A.
(
1967
)
Error bounds for convolutional codes and an asymptotically optimum decoding algorithm
.
IEEE Transactions on Information Theory
,
13
(
2
),
260
269
.

Wang
,
L.
,
He
,
K.
&
Schaubel
,
D.E.
(
2020
)
Penalized survival models for the analysis of alternating recurrent event data
.
Biometrics
,
76
(
2
),
448
459
.

Xia
,
C.H.
,
Barnett
,
I.
,
Tapera
,
T.M.
,
Adebimpe
,
A.
,
Baker
,
J.T.
,
Bassett
,
D.S.
, et al. (
2022
)
Mobile footprinting: linking individual distinctiveness in mobility patterns to mood, sleep, and brain functional connectivity
.
Neuropsychopharmacology
,
47
,
1
10
.

Zucchini
,
W.
,
MacDonald
,
I.L.
&
Langrock
,
R.
(
2017
)
Hidden Markov models for time series: an introduction using R
.
Boca Raton, FL
:
CRC Press
.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

Supplementary data