Multiphasic stochastic epidemic models

Author Notes

Abstract

At the onset of the COVID-19 pandemic, various non-pharmaceutical interventions aimed to reduce infection levels, leading to multiple phases of transmission. The disease reproduction number, $R_{t}$ ⁠, quantifies transmissibility and is central to evaluating these interventions. This article discusses hierarchical stochastic epidemic models with piece-wise constant $R_{t}$ ⁠, suitable for capturing distinct epidemic phases and estimating disease magnitude. The timing and scale of $R_{t}$ changes are inferred from data, while the number of phases is allowed to vary. The model uses Poisson point processes and Dirichlet process components to learn the number of phases, providing insight into epidemic dynamics. We test the models on synthetic data and apply them to freely available data from the UK, Greece, California, and New York. We estimate the true number of infections and $R_{t}$ and independently validate this approach via a large seroprevalence study. The results show that key disease characteristics can be derived from publicly available data without imposing strong assumptions.

Bayesian learning, disease transmission, effective reproduction number, latent infection path, multiple transmission phases

1 Introduction

At the start of 2020 the emergence of COVID-19, an infectious disease caused by the virus SARS-CoV-2, has placed health systems around the globe under immense pressure. In March 2020, the World Health Organization declared COVID-19 as a global pandemic, and as of the end of September 2022, more than 6.5 million have died due to illness or complications of it. At the beginning of the pandemic in the absence of available vaccines or suitable medication, the majority of governments around the globe resorted to Non-Pharmaceutical-Interventions (NPIs) in an attempt to stop the exponential spreading of the virus and reduce transmissibility. Such NPIs involved measures like work-from-home policies, school, and university closures, stay-at-home guidance for people in high-risk groups and full lockdowns.

These measures had an effect on reducing the transmissibility and resulted in spreading trajectories that could not be properly described by the standard epidemic models due to the resulting multiphasic nature of transmission. The first systematic technique to assess these interventions was due to Flaxman et al. (2020) who proposed a renewal equation model whose infection dynamics were modelled through a multilevel framework incorporating NPIs. We amend this model by inferring the points in time at which the transmissibility changes as well as the magnitude of infectiousness in a data-driven manner. We determine the number of phases by using appropriate stochastic processes based upon variations of the Poisson process (PP) and Dirichlet process (DP)-based priors via their stick-breaking constructions (Miller & Harrison, 2018; Sethuraman, 1994).

Several models have been proposed in the literature for the estimation of multiphasic infectious diseases, particularly COVID-19. Briefly, a stochastic Susceptible-Exposed-Infectious-Removed (SEIR) model with a regression framework for the effect of the NPIs on transmissibility is used in Knock et al. (2021) while Birrell et al. (2021), Li et al. (2021), and Chatzilena et al. (2022) use stochastic SEIR models where the transmission mechanism is described by a system of non-linear ordinary differential equations and the transmission rate is modelled by a diffusion process. Modelling the transmission rate as a random walk facilitates gradual and smooth changes in time. A piecewise linear quantile trend model was proposed by Jiang et al. (2021), a kernel-based SIR model distinguishing the different phases of the transmissibility in space was developed by Geng et al. (2021) while Wistuba et al. (2022) incorporated splines to estimate the reproduction number in Germany.

Simpler forms of deterministic and stochastic multiphasic epidemic models have been considered before. In the context of modelling SARS-CoV-2 transmission, (Flaxman et al., 2020) used an approach with a fixed number, date, and scale of the $R_{t}$ change. Related work based upon variations of DP mixtures is presented in Hu and Geng (2021) and Creswell et al. (2023). In the former, the authors used a mixture of finite mixtures (MFM) model on a Susceptible-Infected-Recovered-Susceptible model, while in the latter the authors used a suitably modified Pitman-Yor process but only fitted it to the reported cases, thus dispensing with the effort to estimate the complete epidemic burden and the suitable adjustment for the reproduction number. The main advantage of this class of methods, based on piece-wise constant transmission rates, is the intuitive characterization of the epidemic in terms of multiple phases of transmissibility. The number and magnitude of the distinct phases are determined purely by data without forced assumptions on the effect of policy changes and NPIs. This approach should be central to a retrospective assessment of the NPIs: an evidence-based method for estimating the timing and effect of those interventions, minimizing the risk of introducing several types of bias.

The article is organized as follows. In Section 2, we define the proposed compartmental process, elucidate its equivalence with renewal process-based models and describe the observation regimes of the data. In Section 3, we complete the model definition by characterizing the complexity regimes. Section 4 assesses the proposed models via simulation experiments while Section 5 contains the application to data from California and New York state, the UK and Greece. The article concludes with a discussion.

2 Modelling disease transmission

2.1 Model definition and related characterizations

The methodology for modelling the time-varying disease transmissibility has been implemented under two distinct but equivalent models, the compartmental Susceptible-Infectious-Removed (SIR) model and the seemingly simpler time-since-infection model with population susceptibility reduction. Here, we define both models and delineate their equivalence.

The model assumes that the population has size n, is closed (demographic changes during the course of the epidemic are ignored) homogeneous and homogeneously mixing. In the stochastic SIR model, an infected individual makes contact with any other individual on day t at the points of a PP with time-varying intensity $\frac{λ_{t}}{n}$ ⁠. This scaling is commonly adopted as it makes the contact process independent of the size of the population (e.g. Andersson & Britton, 2000). If these (close) contacts of an infected individual occur with a susceptible individual they result in an infection. Each individual remains infectious for a random time period Y. All PPs in this construction are assumed to be independent. The disease reproduction number is defined as $R_{t} = λ_{t} * E [Y]$ ⁠, $t = 1, \dots, T$ ⁠, where T is the time horizon of the study.

The infection rate $λ_{t}$ will generally be assumed piece-wise constant in this article and further details are given below. For this model, let $c_{t}$ denote the number of new infections on day t. The expected number of new infections on day $t + 1$ is given by:

E [c_{t + 1}] = S_{t} * \frac{λ_{t}}{n} * I_{t},

(1)

where $S_{t}$ represents the number of individuals that remain susceptible:

S_{t} = n - \sum_{s = 0}^{t} c_{s}

(2)

and $I_{t}$ denotes the active set of infectives:

I_{t} = \sum_{s = 0}^{t} \sum_{j = 1}^{c_{s}} I_{{Y_{j} > t - s}}

(3)

and $I_{{Y_{j} > t - s}}$ the indicator function of the event that the individual j, infected on day s, remains infectious on day t. This event is implicitly determined by the disease characteristics. Then, (1) can be rewritten as

\begin{aligned} E [c_{t + 1}] & = S_{t} * \frac{R_{t}}{n} * \frac{\sum_{s = 0}^{t} \sum_{j = 1}^{c_{s}} I_{{Y_{j} > t - s}}}{E [Y]} \\ \approx \frac{S_{t}}{n} * R_{t} * \sum_{s = 0}^{t} c_{s} * g_{s} (t) \end{aligned}

(4)

since $I_{{Y_{j} > t - s}} \approx P (Y > t - s)$ and $g_{s} (t) = \frac{P (Y > t - s)}{E [Y]}$ denotes the generation interval which characterises the time from the infection of an individual until they generate their first infection, see for example (Champredon et al., 2018; Svensson, 2007, 2015). Such approximation works reasonably well in homogeneous populations but there are notable exceptions, including on spatial epidemics, see Mollison (1991) and Durrett and Levin (1994) for extensive discussion and counter examples.

Note that equation (4) is used in the commonly adopted technique of Cori et al. (2013) for estimating the instantaneous reproduction number and this approach gives a different justification to the derivation of the $\frac{S_{t}}{n}$ correction factor in Bhatt et al. (2023). The $\frac{S_{t}}{n}$ term accounts for the depletion of the susceptible population and is sometimes ignored when the aim of modelling disease transmissibility is somewhat different. We will use equation (4) for the subsequent analyses as it is computationally more efficient compared to the SIR structure. One should also consider potential ‘superspreading’ events when certain individuals infected unusually large numbers of secondary cases (Lipsitch et al., 2003; Shen et al., 2004). We account for this variability assuming that the individual reproduction number is gamma distributed with mean $R_{t}$ and dispersion parameter $k_{C}$ ⁠, yielding $c_{t} \sim NegativeBinomial (μ_{c_{t}}, k_{C})$ (Lloyd-Smith et al., 2005), where $μ_{c_{t}}$ is equal to $E [c_{t + 1}]$ of equation (4).

The reproduction number $R_{t}$ is of great practical interest as it is used to assess if the epidemic is growing or shrinking. Here, we consider two distinct instances of reproduction number. The effective reproduction number $R_{e} (t) = S_{t} * R_{t}$ describes the expected number of secondary cases generated by an infectious individual. Then, $R_{e} (t) > 1$ and $R_{e} (t) < 1$ indicate that the epidemic is growing or shrinking, respectively and reducing $R_{e} (t)$ below unity is the typical target of public health authorities. In contrast, $R_{t}$ quantifies contacts that may not always result in new infections, due to mixing with the immune proportion of the population. Therefore, $R_{t} > 1$ does not necessarily mean that the epidemic is growing. A detailed discussion about reproduction numbers can be found in Pellis et al. (2022).

2.2 Observation regimes

We consider two distinct observation regimes, one where the observed number of cases corresponds to the total number of infections and one where the total number of infections is indirectly estimated, outlined below.

2.2.1 Observed infections

The regime where the total number of infections is observed may be of interest in its own right but may also be used for certain transmissible diseases, for example in the analysis of influenza-like-illness data when seroprevalence study information is available. Epidemic models are attractive for analysing such data and are naturally defined in terms of infector–infectee pair and the timing of such events. In reality, however, this type of data is rarely available. Disease monitoring is based on the daily reported infections, which are known to be susceptible to multiple problems, including a time lag between the timing of infection and symptom onset or testing positive.

In the case of COVID-19, a large proportion of the population experiences asymptomatic or mild disease (Ward et al., 2021) leading to severe under-reporting. Inference about the reproduction number can be robust when the reported cases are used if depletion of the susceptible population is accounted for, or if the observed proportion of cases remains constant over time. One way to validate this assumption is by sequentially performing seroprevalence studies to estimate the true disease prevalence and the proportion of unreported incidences. However, such information was not available in most countries. In the following subsection, we describe an alternative approach that dispenses with the need for this assumption.

2.2.2 Unobserved cases

The case where infections may not be directly observed has been studied in a different context by Demiris et al. (2014). In the case of the pandemic, it became immediately apparent that the observed number of infections only partially accounts for the complete epidemic burden. An alternative technique was proposed by Flaxman et al. (2020) where the true cases were estimated by back-calculating infections from the daily reported deaths which are likely less prone to under-reporting. This method has the additional advantage of yielding an estimate of $S_{t}$ ⁠. We adopt this approach by introducing another hierarchical level into our model and the daily deaths are linked with the true cases via:

\begin{matrix} D_{t} \sim NegativeBinomial (d_{t}, k_{D}) \\ d_{t} = IFR * \sum_{i = 0}^{t - 1} c_{i} * π (t - i), \end{matrix}

(5)

where $d_{t}$ and $k_{D}$ are the mean and dispersion parameters of the negative binomial distribution, respectively. IFR is the infection fatality ratio and π is the distribution of the time from infection to death, with $π (i)$ denoting the probability an infected individual will die i days after becoming infected. Accurate estimates of the IFR and π are necessary for estimating incidence, which is treated here as a latent parameter. The IFR and π parameters may be calculated independently from external data or in a single stage, leveraging additional evidence from seroprevalence studies.

3 Epidemic complexity determination

The number of phases may be treated as a fixed but unknown integer or as a random quantity to be modelled and estimated from data. We describe two such frameworks in the following two subsections.

3.1 Deterministic number of phases

In Flaxman et al. (2020), the number of phases was a priori selected and the times when the reproduction number $R_{t}$ changed were also predefined. The dates of these points were informed by the NPIs implemented by each government leading to a piece-wise constant reproduction number $R_{t}$ ⁠, effectively assuming the immediate effect of those NPIs. We study the limitations of this approach in the case when someone makes inaccurate assumptions regarding the start or the end of an epidemic phase or indeed their number. We do this using the simulated data of the Section 4 and present the results in the Supplementary material. We follow Flaxman et al. (2020) considering $R_{t}$ as a piece-wise constant function and infer the point in time and magnitude of $R_{t}$ changes directly from the data. The number, K, of epidemic phases is investigated using models with different K values and the best model is selected using the Watanabe–Akaike information criterion (WAIC) (Watanabe, 2013) and approximate leave-one-out cross-validation (LOO) (Vehtari et al., 2017). Note that the time-ordering of the data suggests that k-step ahead forecasting would be a better predictive model selection process. However, model complexity renders such repeated evaluations practically infeasible and therefore we proceed with the commonly used approximation via information criteria. The model is defined as follows:

\begin{aligned} R_{t} = {\begin{matrix} r_{1}, & t \leq T_{1} \\ \dots \\ r_{j + 1}, & T_{j} < t \leq T_{j + 1} \\ \dots \\ r_{K}, & T_{K - 1} < t \leq T \end{matrix} \\ r_{j} \sim f (\cdot), r_{j} \in (0, \infty), j = 1, \dots, K \\ T_{i + 1} = T_{i} + e_{i} \\ T_{1} \sim Uniform (3, T) \\ e_{i} \sim Uniform (0, 100), i = 1, \dots, K - 1, \end{aligned}

(6)

where $f (\cdot)$ is any suitable positive defined probability distribution. We initially search for the first point in time, $T_{1}$ ⁠, when the epidemic changes phase for the first time. This search spans the entire duration of our study. To avoid identifiability issues with timepoints $T_{i}$ ⁠, each new point is determined by adding a positive quantity $e_{i}$ to the previous one.

3.2 Stochastic number of phases

Under the Bayesian paradigm, a natural but not trivial way is to treat the number of epidemic phases K as a parameter and learn its posterior distribution. The ‘reversible jump’ algorithm (e.g. Richardson & Green, 1997) could be used to explore the joint space of K and within-K models. Here we adopt a different approach and model K as a characteristic of two stochastic models, the PP and variations of the DP (Ferguson, 1973). For both processes, we use the stick-breaking representation, see Miller and Harrison (2018) and Sethuraman (1994) for the PP and DP respectively, facilitating inference for K. The directed acyclic graph (Figure 1) represents the general structure of our modelling framework.

Figure 1.

Directed acyclic graph of the model. Ellipses denote parameters to be learned by the model. The number of phases K is estimated by the DP/PP model or via model selection criteria.

Open in new tab Download slide

Estimating the number of phases of the epidemic and the associated date and magnitude of the $R_{t}$ changes can lead to identifiability problems for $R_{t}$ and its generative quantities, notably the total number of infections. In order to overcome such issues, we explore a multi-stage modelling procedure (e.g. Bhatt et al., 2023) for the stochastic number of phases models under the unobserved cases paradigm. At the first stage, the latent disease cases are estimated using a Gaussian Process (GP) model or a GP mixture with Student-t marginals (Heyde & Leonenko, 2005; Shah et al., 2014) and then the smoothed medians of these latent cases are treated as data with the likelihood given in (4). The GP model was used as our baseline since both models performed similarly based on WAIC. More details for the estimation of cases can be found in the supplementary material.

3.2.1 Poisson point process-based model

We consider that the arrival of new phases in the time horizon (0,T] is driven by a time-homogeneous PP with rate λ, with K growing linearly with time. Hence, following the first epidemic phase, the number, $K - 1$ ⁠, of new phases follows a Poisson distribution with rate $λ * T$ while the duration of each phase a priori follows an Exponential distribution with rate λ. We follow Miller and Harrison (2018) and use the representation:

\begin{aligned} R_{t} & = r_{z_{t}} \\ r_{j} & \sim f (\cdot), r_{j} \in (0, \infty), j = 1, \dots, K \\ z_{t} & \sim Categorical (p_{1 : K}), t = 1, \dots, T \\ p_{K} & = 1 - \sum_{k = 1}^{K} p_{k}, K = min {j : \sum_{i = 1}^{j} T_{i} \geq T} \\ p_{k} & = \frac{T_{k}}{T}, k = 1, \dots, K - 1 \\ T_{i} & \sim Exponential (λ), i = 1, \dots, K_{max} \\ λ & \sim Gamma (0.02, 1) \end{aligned}

(7)

truncating K at $K \max = 100$ ⁠, far higher than data-supported estimates. We use an informative prior on λ with a mean of 0.02, so that the mean of the Poisson distribution for the number of phases $λ T$ is around 5 for 250 days. This is a broad estimate for the prior mean, following a visual inspection of the simulated daily deaths.

3.2.2 DP-based model

An alternative model for the number of phases is based on the DP and its stick-breaking construction. This is based on a stick of unit length, where one samples iteratively the $v_{i}$ from a $Beta (1, θ)$ distribution as a portion $v_{i}$ of the remaining stick, deriving the weights of the DP:

\begin{aligned} R_{t} & = r_{z_{t}} \\ r_{j} & \sim f (\cdot), r_{j} \in (0, \infty), j = 1, \dots, L \\ z_{t} & \sim Categorical (w_{1 : L}), t = 1, \dots, T \\ w_{L} & = \prod_{k < L} (1 - v_{k}), K = \sum_{k = 1}^{L} I {w_{k} \geq 0} \\ w_{l} & = v_{l} * \prod_{j = 1}^{l - 1} (1 - v_{j}), l = 2, 3, \dots, L - 1 \\ w_{1} & = v_{1}, v_{i} \sim Beta (1, θ), i = 1, \dots, L - 1 \\ θ & \sim Gamma (1, 1), \end{aligned}

(8)

where L is the truncation point of the DP, set here to 36. Here, K is increasing with the scaling parameter θ. While inherently very different, the PP and DP models share a similar intuition in their construction. The goal is to construct weights that represent the proportion of each epidemic phase over the complete duration of the study, T. These weights are then used in a Categorical distribution that assigns each daily observation to a specific phase.

4 Synthetic data experiments

Simultaneously learning the parameters and the dimension of a model is typically a challenging statistical task. We assess the performance of our methods by simulating epidemics of various characteristics for 250 days. The epidemic model defined in Section 2 was used for simulating daily infections and deaths. The population size was set at $10^{8}$ with $IFR = 2 %$ ⁠. The discretized infectious period and the infection-to-death interval are described in the supplementary material. The epidemic was simulated with 5 distinct increasing/decreasing phases resembling the observed COVID-19 outbreaks after the introduction of multiple NPIs. The time-varying reproduction number was set as follows:

R_{t} = {\begin{matrix} 1.5, & t \leq 60 \\ 0.95, & 60 < t \leq 100 \\ 1.35, & 100 < t \leq 150 \\ 0.8, & 150 < t \leq 200 \\ 1.8, & 200 < t \leq 250 \end{matrix}

Using the model in (6) and the daily deaths as data (unobserved infections regime) the lowest WAIC and LOO selected six phases (Table 1). Models with varying (5, 6, and 7) number of phases incorrectly identified the first 10 days of the simulation as a distinct phase (Figure 2). This can be attributed to the lack of information at the start, where we essentially only draw information from our prior distribution, a common issue in epidemic models. Following this period the model with six phases correctly identifies the different epidemic phases, including their timing and magnitude of change. The total daily infections (Figure 2) are also accurately recovered. Inference was initiated on the day that 10 cumulative deaths were observed. Plots for the other models may be found in the supplementary material.

Figure 2.

Simulation and estimates based on observing deaths. (a) Simulated (triangles) and estimated daily infections with 95% credible intervals (line) and (b) Real (solid line) and estimated reproduction number $R_{t}$ with 95% credible intervals (dashed line).

Open in new tab Download slide

Table 1.

Open in new tab

Fixed number of phases—simulated dataset

Number of phases	WAIC	LOO
Five phases	3,635.0	3,635.1
Six phases	2,252.9	2,253.4
Seven phases	2,260.6	2,261.8

Table 1.

Open in new tab

Fixed number of phases—simulated dataset

Number of phases	WAIC	LOO
Five phases	3,635.0	3,635.1
Six phases	2,252.9	2,253.4
Seven phases	2,260.6	2,261.8

In addition to the findings that the models correctly select the right complexity, it is interesting to summarize the model behaviour when investigating model misspecification. Broadly, these findings may be summarized as follows; when we fix the number of phases to be smaller than the true one then the model is correctly recovering the early phases while it is averaging the final ones leading to poorly fitted models. In contrast, when fixing K to be larger than the true one, we recover the true patterns and get a good fit. Hence, overestimating the number of phases does not materially affect the recovery of the true signal. A list of detailed results is outlined in the supplement.

When fitting the models with a stochastic number of phases using as data the daily infections (observed infections regime), both the PP and DP models precisely estimate the number of epidemic phases, the time of change and the true $R_{t}$ value (Figure 3). The model was run for 1,00,000 iterations and 8 chains. The analysis based on observing deaths is included in the supplementary material (unobserved infections regime—multi-stage approach). Briefly, the intermediate phases of the epidemic are well estimated while the first and final phases are recovered with noise. The level of smoothing of the noisy estimated cases affects the estimation of the reproduction number; the smoother the estimation of cases the smoother and with fewer phases the reproduction number is recovered.

Figure 3.

True (solid line) and estimated reproduction number $R_{t}$ with 95% credible intervals (dashed line) based on observing infections. (a) Dirichlet process model and (b) Poisson process model.

Open in new tab Download slide

We utilized readily available software such as Rstan and Nimble within the R statistical programming language. In Rstan, we used the No-U-Turn Sampler (NUTS) algorithm for the model in (6), while in Nimble, we used a combination of the Random Walk Metropolis–Hastings sampler and a categorical sampler for the DP and PP models. The code replicating all the experiments may be found in https://github.com/pbarmpounakis/Multiphasic-stochastic-epidemic-models.

5 Real-data application

5.1 Data description and prepocessing

The models were fitted to daily reported deaths (unobserved cases regime) from two US states, California and New York, and two European countries, the UK and Greece. The data are accessible from John Hopkins University and ECDC and the time horizon ran to the end of June 2021 when many NPIs were lifted. Due to a lack of data availability, the model does not account for reinfections. The age-standardized IFR for each country was informed by the meta-analysis from COVID-19 Forecasting Team (2022) accounting for time, geography, and population characteristics. We allowed the IFR to vary over time, accounting for the age structure of those infected, the burden of health systems and amendments in treating the disease. The infection-to-death time and generation interval were given a Gamma distribution with (mean, standard deviation) set to (19, 8.5) and (6.5, 4.4) days, respectively following Flaxman et al. (2020).

5.2 Analyses and results

California was one of the first US states to report cases on the 26th of January, 2020. A state of emergency was declared on March 4, 2020, and mass/social gatherings were banned while a mandatory statewide stay-at-home order was issued on 19 March 2020. We fitted the model to daily deaths (unobserved cases regime) and using WAIC/LOO selected seven phases. Figures 4 and 5 suggest that $R_{e} (t)$ was reduced after imposing restrictions and fell below the critical value of 1 after April 2020 when school closure was decided for the remainder of the 2019–2020 academic year. The epidemic remained under control until the summer of 2020 when $R_{e} (t)$ jumped slightly above 1 following a gradual relaxation of measures. On 31 August 2020, a new set of measures called ‘Blueprint for a Safer Economy’ was applied and all models show that they were effective, alongside the gained immunity of the population, at reducing the effective reproduction number below one and keeping the epidemic under control until the first half of October 2020. All models estimate a sharp increase in $R_{e} (t)$ ⁠, which resulted in an increase in the daily reported cases and deaths between November 2020 and January 2021. Nighttime curfew and regional stay-at-home orders were announced at the start of December 2020 whence $R_{e} (t)$ remained stable and began declining. The initiation of the vaccination program in early 2021 brought the epidemic under control with $R_{e} (t)$ remaining below 1.

Figure 4.

Estimation of effective reproduction number $R_{e} (t)$ with 95% credible intervals (solid and dashed lines) based on observing deaths, fixed number of phases model. (a) California state. (b) New York state. (c) The UK and (d) Greece.

Open in new tab Download slide

Figure 5.

Estimation of effective reproduction number $R_{e} (t)$ with 95% credible intervals (solid and dashed lines) based on observing deaths, multi-stage approach. (a) California state—DP model. (b) California state—PP model. (c) New York state—DP model and (d) New York state—PP.

Open in new tab Download slide

New York state had, by 10 April 2020, more confirmed cases than any country outside the US and was heavily affected at the start of the pandemic, with daily recorded deaths reaching a thousand in April. On 15 March, all New York City schools were closed and on 20 March state-wide stay-at-home order was declared. As a result, the models show a drop of $R_{e} (t)$ below 1 from mid-March 2020 until August 2020 (Figures 4 and 5). The best-performing models based on WAIC and LOO had eight distinct phases. This model estimates that after the summer of 2020, $R_{e} (t)$ remained above 1 up until the start of 2021 with a small increase during November and the holiday season. The DP and PP models (unobserved cases regime $-$ multi-stage approach) show similar estimates for $R_{e} (t)$ (Figure 5).

For the UK, a model with nine phases was selected by WAIC and LOO. Until early March 2020, when a lockdown was imposed we estimate that $R_{t} \approx 3.5$ (Figure 4). These measures were lifted in early June and during the lockdown $R_{e} (t)$ remained below 1, and therefore under control. After the summer $R_{e} (t)$ increased above 1 and the so-called rule of six was imposed while on 5 November 2020, the second lockdown was announced. The number of reported deaths was reduced after the initiation of the vaccination program on 4 January 2021. Virtually identical estimates for the UK $R_{e} (t)$ are inferred by the DP and PP models (figures and additional details in the supplementary material).

We conducted an independent (or ‘external’) validation of the model performance based upon REACT-2, an antibody prevalence study conducted in the UK with the participation of more than 100,000 adults (Ward et al., 2021). This appeared like an optimal choice as it took place in early July 2020 when waning immunity was relatively unlikely. As such, it provides a reasonable estimate of the total disease burden up to that time. The estimated prevalence for the adult population (children were excluded) was 6.0% (95% CI: 5.8, 6.1) and our estimate for the whole population is 7.5% (95% credible intervals (Cr.I.): 5.7, 10.) (Figure 6) well compatible with that independent estimate.

Figure 6.

Cumulative sum of estimated daily infections with 95% credible intervals (dashed lines) and the estimation of REACT-2 with 95% confidence intervals (solid lines) for the UK.

Open in new tab Download slide

For Greece WAIC and LOO selected the eight-phases model. At the starting phase, we estimate $R_{e} (t) = 3.36$ $(s d = 0.88)$ and a decrease below 1 in the first half of March 2020 (Figure 4). On 10 March, the government suspended most activities, including educational, shopping, and recreational while a week later all nonessential movement was restricted. The $R_{e} (t)$ estimate remained below 1 until early June 2020 when it increased following the lifting of restrictions. During summer $R_{e} (t)$ remained over 1 until November 2020 since a case spike in October led to new measures. Similar estimates for the $R_{e} (t)$ are obtained by the DP and PP models (Supplementary material).

The computation time was similar for the PP and DP models with the DP being faster. More importantly, we get valuable insights into the effectiveness of the measures imposed by the governments. For New York and the UK, it appears that the school closures and the strict lockdowns predate the reductions in transmissibility, and we can assume that they were pivotal in achieving that. California and Greece adopted the measures before a large first wave, like other EU countries and US states. All regions were similar when these two measures were relaxed: multiple epidemic waves emerged and the estimated $R_{e} (t)$ remained above 1. These findings closely align with similar results documented in the existing literature, where authors found that the national lockdowns consistently helped with the spread of the disease (Flaxman et al., 2020; Knock et al., 2021).

The results of our simulation experiments corroborate the findings of the application to real data from different areas. The time-ordering of the data facilitates avoiding label-switching problems typically encountered when fitting mixture-type models. By selecting the number of phases we capture mortality changes in all the real-world examples (Figure 7). The DP and PP models can infer a higher number of phases but the conclusions are not materially affected. This observation is in line with Rousseau and Mengersen (2011) who show a generally stable behaviour of such so-called overfitted mixture models, theoretically verifying the robust behaviour of the developed models.

Figure 7.

Reported (triangles) and estimated deaths with 95% credible intervals (solid and dashed lines) based on observing deaths, fixed number of phases model. (a) California state. (b) New York state. (c) The UK and (d) Greece.

Open in new tab Download slide

6 Discussion

In this article, we propose three models for the transmission mechanism of infectious diseases with multiple epidemic phases. We use freely available data to estimate the points in time when transmissibility changes and the realized magnitude of the NPI effects. We focus on publicly available data, as data availability and usability were raised as concerns during the acute COVID-19 pandemic phase. Several potentially useful datasets, including those regarding the pressure on a country’s health system, such as the number of patients in ICU beds, may not be generally available.

A number of interventions were applied in overlapping time intervals, and identifiability issues can arise when trying to disentangle individual effects and the associated time lags. In particular, it is generally hard to estimate the timing of the effect and the possible time lag between cause and effect. In this work, we retrospectively assess the effect of the NPIs by comparing changes in the reproduction number with the dates that these measures were imposed. Such a purely evidence-based approach represents an alternative to forced assumptions that aim for (potentially unwarranted) causal statements about intervention effects. Our code is made freely available, and the reliance on publicly available data facilitates reproducibility and potential uptake by other researchers.

Selecting the number of phases requires multiple runs and the computation time can be an issue when nowcasting is essential for decision-making. Estimating the number of phases via the DP and PP models represents an alternative approach that is computationally efficient and statistically robust. The DP and PP models may overestimate the number of epidemic phases and this issue is discussed in detail in Rousseau and Mengersen (2011), see also Miller and Harrison (2013). In our setting, this effect essentially relates to the start and end of the epidemic and the inherent challenges of limited information. At the start of the epidemic, such uncertainty dictates that estimates should be interpreted with caution. In the end, this is less of an issue and is mostly due to the time lag between cases and deaths. When one is working with the observed infections these issues largely disappear and inference is generally accurate throughout the duration of the data as indicated by our simulation experiments.

In times of crisis, there is a demand for rapid predictions that facilitate decision support and one may employ the multi-stage approach of the DP or PP models to gauge the evolution of the transmissibility. After obtaining an initial estimate of the reproduction number from these models, a more focused exploration of the number of phases using the deterministic number of phases model could be conducted. This targeted investigation aims to minimize the need for extensive model fitting by narrowing down the range of potential values for the number of epidemic phases, K.

It is not apparent how our models can be optimally used for predicting new epidemic phases. Now-casting or short-term future predictions are typically performed under the assumption that the main conditions, including transmissibility, remain the same, or by simulating particular scenarios. In this article, we focus on the retrospective assessment of the imposed NPIs and the inferred quantities may be used in feeding forward simulations in a similar manner.

The models developed in this work assume a homogeneous and homogeneously mixed population similar to many studies on SARS-CoV-2 transmission. This assumption may be appropriate for large populations such as working at the state or country level since functional central limit theorems can reasonably be thought of as applicable (e.g. Andersson & Britton, 2000). Incorporating additional information or structure from data sources such as hospitalizations, seroprevalence studies, or vaccination programs is desirable in principle but may also pose challenges. The inclusion of data sources such as those discussed in Knock et al. (2021) would necessitate model expansion. Our models can naturally be extended when more detailed information is available and this is the subject of current research.

Acknowledgments

The authors are grateful to the two referees whose comments substantially improved the content and presentation of this work. We also wish to thank Kostas Kalogeropoulos and Petros Dellaportas for useful comments on an earlier version of this article and the editor and associate editor for a number of useful suggestions.

Funding

This article is part of the first author’s doctoral thesis, co-financed by Greece and the European Union (European Social Fund-ESF) through the Operational Programme ‘Human Resources Development, Education and Lifelong Learning’ in the context of the Act ‘Enhancing Human Resources Research Potential by undertaking a Doctoral Research’ Sub-action 2: State Scholarships Foundation (IKY, Greece) Scholarship Programme for PhD candidates in the Greek Universities.

Data availability

All the data used in the synthetic data experiments resulted from simulation studies, which were carefully designed to replicate realistic conditions and are detailed within the methodology to provide transparency and allow for potential replication by others. The data for the real-data application were derived from publicly available sources, no proprietary or restricted data were used. All real-world data are collectively available at COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University at GitHub.

Author contributions

P.B. and N.D. conceived the experiment(s); P.B. conducted the experiment(s); P.B. and N.D. analysed the results; P.B. and N.D. wrote and reviewed the manuscript.

Supplementary material

Supplementary material is available online at Journal of the Royal Statistical Society: Series C.

References

Andersson

, &

Britton

(

2000

Stochastic epidemic models and their statistical analysis

Lecture notes in statistics

(2000 ed.).

Springer

Bhatt

Ferguson

Flaxman

Gandy

Mishra

, &

Scott

J. A.

(

2023

Semi-mechanistic Bayesian modelling of COVID-19 with renewal processes

Journal of the Royal Statistical Society Series A: Statistics in Society

186

(

601

–

615

10.1093/jrsssa/qnad030

Google Scholar

Crossref

WorldCat

Birrell

Blake

van Leeuwen

Gent

, &

De Angelis

(

2021

Real-time nowcasting and forecasting of covid-19 dynamics in England: The first wave

Philosophical Transactions of the Royal Society B: Biological Sciences

376

(

1829

), Article 20200279.

10.1098/rstb.2020.0279

Google Scholar

OpenURL Placeholder Text

WorldCat

Crossref

Champredon

Dushoff

, &

Earn

D. J. D.

(

2018

Equivalence of the Erlang-distributed Seir epidemic model and the renewal equation

SIAM Journal on Applied Mathematics

(

3258

–

3278

Chatzilena

Demiris

, &

Kalogeropoulos

(

2022

A modelling framework for the analysis of the transmission of SARS-CoV2

Cori

Ferguson

N. M.

Fraser

, &

Cauchemez

(

2013

A new framework and software to estimate time-varying reproduction numbers during epidemics

American Journal of Epidemiology

178

(

1505

–

1512

COVID-19 Forecasting Team

(

2022

Variation in the COVID-19 infection-fatality ratio by age, time, and geography during the pre-vaccine era: A systematic analysis

Lancet

399

(

10334

1469

–

1488

10.1016/S0140-6736(21)02867-1

Crossref

PubMed

WorldCat

Creswell

Robinson

Gavaghan

Parag

K. V.

Lei

C. L.

, &

Lambert

(

2023

A Bayesian nonparametric method for detecting rapid changes in disease transmission

Journal of Theoretical Biology

558

, Article 111351.

10.1016/j.jtbi.2022.111351

Google Scholar

OpenURL Placeholder Text

WorldCat

Crossref

Demiris

Kypraios

, &

Smith

L. V.

(

2014

On the epidemic of financial crises

Journal of the Royal Statistical Society. Series A (Statistics in Society)

177

(

697

–

723

Durrett

, &

Levin

(

1994

The importance of being discrete (and spatial)

Theoretical Population Biology

(

363

–

394

10.1006/tpbi.1994.1032

Google Scholar

Crossref

WorldCat

Ferguson

T. S.

(

1973

A Bayesian analysis of some nonparametric problems

Annals of Statistics

(

209

–

230

10.1214/aos/1176342360

Google Scholar

Crossref

WorldCat

Flaxman

Mishra

Gandy

Unwin

H. J. T.

Mellan

T. A.

Coupland

Whittaker

Zhu

Berah

Eaton

J. W.

Monod

Perez-Guzman

P. N.

Schmit

Cilloni

Ainslie

K. E. C.

Baguelin

Boonyasiri

Boyd

Cattarino

, …

Team

I. C. C.-. R.

(

2020

Estimating the effects of non-pharmaceutical interventions on covid-19 in Europe

Nature

584

(

7820

257

–

261

10.1038/s41586-020-2405-7

Geng

Katul

G. G.

Gerges

Bou-Zeid

Nassif

, &

Boufadel

M. C.

(

2021

A kernel-modulated sir model for covid-19 contagious spread from county to continent

Proceedings of the National Academy of Sciences of the United States of America

118

(

e2023321118

10.1073/pnas.2023321118

Heyde

C. C.

, &

Leonenko

N. N.

(

2005

Student processes

Advances in Applied Probability

(

342

–

365

10.1239/aap/1118858629

Google Scholar

Crossref

WorldCat

, &

Geng

(

2021

Heterogeneity learning for sirs model: An application to the covid-19

Statistics and Its Interface

(

–

Jiang

Zhao

, &

Shao

(

2021

Modelling the covid-19 infection trajectory: A piecewise linear quantile trend model

Journal of the Royal Statistical Society: Series B, Statistical Methodology

(

1589

–

1607

Knock

E. S.

Whittles

L. K.

Lees

J. A.

Perez-Guzman

P. N.

Verity

FitzJohn

R. G.

Gaythorpe

K. A. M.

Imai

Hinsley

Okell

L. C.

Rosello

Kantas

Walters

C. E.

Bhatia

Watson

O. J.

Whittaker

Cattarino

Boonyasiri

Djaafara

B. A.

, …

Baguelin

(

2021

Key epidemiological drivers and impact of interventions in the 2020 SARS-CoV-2 epidemic in England

Science Translational Medicine

(

602

eabg4262

10.1126/scitranslmed.abg4262

Y. I.

Turk

Rohrbach

P. B.

Pietzonka

Kappler

Singh

Dolezal

Ekeh

Kikuchi

Peterson

Bolitho

Kobayashi

Cates

M. E.

Adhikari

, &

Jack

R. L.

(

2021

Efficient Bayesian inference of fully stochastic epidemiological models with applications to COVID-19

Royal Society Open Science

(

), Article 211065.

10.1098/rsos.211065

Google Scholar

OpenURL Placeholder Text

WorldCat

Crossref

Lipsitch

Cohen

Cooper

Robins

J. M.

James

Gopalakrishna

Chew

S. K.

Tan

C. C.

Samore

M. H.

Fisman

, &

Murray

(

2003

Transmission dynamics and control of severe acute respiratory syndrome

Science

300

(

5627

1966

–

1970

10.1126/science.1086616

Lloyd-Smith

J. O.

Schreiber

S. J.

Kopp

P. E.

, &

Getz

W. M.

(

2005

Superspreading and the effect of individual variation on disease emergence

Nature

438

(

7066

355

–

359

Miller

J. W.

, &

Harrison

M. T.

(

2013

Inconsistency of Pitman-Yor process mixtures for the number of components

OpenURL Placeholder Text

WorldCat

Miller

J. W.

, &

Harrison

M. T.

(

2018

Mixture models with a prior on the number of components

Journal of the American Statistical Association

113

(

521

340

–

356

10.1080/01621459.2016.1255636

Mollison

(

1991

Dependence of epidemic and population velocities on basic parameters

Mathematical Biosciences

107

(

255

–

287

10.1016/0025-5564(91)90009-8

Pellis

Birrell

P. J.

Blake

Overton

C. E.

Scarabel

Stage

H. B.

Brooks-Pollock

Danon

Hall

House

T. A.

Keeling

M. J.

Read

J. M.

JUNIPER Consortium

, &

De Angelis

. (

2022

Estimation of reproduction numbers in real time: Conceptual and statistical challenges

Journal of the Royal Statistical Society: Series A (Statistics in Society)

185

(

S112

–

S130

Richardson

, &

Green

P. J.

(

1997

On Bayesian analysis of mixtures with an unknown number of components (with discussion)

Journal of the Royal Statistical Society: Series B, Statistical Methodology

(

731

–

792

10.1111/1467-9868.00095

Google Scholar

Crossref

WorldCat

Rousseau

, &

Mengersen

(

2011

Asymptotic behaviour of the posterior distribution in overfitted mixture models

Journal of the Royal Statistical Society: Series B, Statistical Methodology

(

689

–

710

10.1111/j.1467-9868.2011.00781.x

Google Scholar

Crossref

WorldCat

Sethuraman

(

1994

A constructive definition of Dirichlet priors

Statistica Sinica

(

639

–

650

Google Scholar

OpenURL Placeholder Text

WorldCat

Shah

Wilson

, &

Ghahramani

(

2014

Student-t processes as alternatives to Gaussian processes

Journal of Machine Learning Research

877

–

885

Google Scholar

OpenURL Placeholder Text

WorldCat

Shen

Ning

Zhou

Lin

Chin

D. P.

Zhu

, &

Schuchat

(

2004

Superspreading SARS events, Beijing, 2003

Emerging Infectious Diseases

(

256

–

260

10.3201/eid1002.030732

Svensson

Å.

(

2007

A note on generation times in epidemic models

Mathematical Biosciences

208

(

300

–

311

10.1016/j.mbs.2006.10.010

Svensson

Å.

(

2015

The influence of assumptions on generation time distributions in epidemic models

Mathematical Biosciences

270

–

10.1016/j.mbs.2015.10.006

Vehtari

Gelman

, &

Gabry

(

2017

Practical Bayesian model evaluation using leave-one-out cross-validation and Waic

Statistics and Computing

(

1413

–

1432

10.1007/s11222-016-9696-4

Google Scholar

Crossref

WorldCat

Ward

Atchison

Whitaker

Ainslie

K. E. C.

Elliott

Okell

Redd

Ashby

Donnelly

C. A.

Barclay

Darzi

Cooke

Riley

, &

Elliott

(

2021

SARS-CoV-2 antibody prevalence in England following the first peak of the pandemic

Nature Communications

(

905

10.1038/s41467-021-21237-w

Watanabe

(

2013

A widely applicable Bayesian information criterion

Journal of Machine Learning Research

(

867

–

897

Google Scholar

OpenURL Placeholder Text

WorldCat

Wistuba

Mayr

, &

Staerk

(

2022

Estimating the course of the covid-19 pandemic in Germany via spline-based hierarchical modelling of death counts

Scientific Reports

(

9784

10.1038/s41598-022-13723-y

Author notes

Conflicts of interest: The authors declare that there is no conflict of interest.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Download all slides

Month:	Total Views:
November 2024	158
December 2024	30
January 2025	60
February 2025	66
March 2025	80
April 2025	83

Article Contents

Multiphasic stochastic epidemic models

Abstract

1 Introduction

2 Modelling disease transmission

2.1 Model definition and related characterizations

2.2 Observation regimes

2.2.1 Observed infections

2.2.2 Unobserved cases

3 Epidemic complexity determination

3.1 Deterministic number of phases

3.2 Stochastic number of phases

3.2.1 Poisson point process-based model

3.2.2 DP-based model

4 Synthetic data experiments

5 Real-data application

5.1 Data description and prepocessing

5.2 Analyses and results

6 Discussion

Acknowledgments

Funding

Data availability

Author contributions

Supplementary material

References

Author notes

Supplementary data

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

Article Contents

Multiphasic stochastic epidemic models

Abstract

1 Introduction

2 Modelling disease transmission

2.1 Model definition and related characterizations

2.2 Observation regimes

2.2.1 Observed infections

2.2.2 Unobserved cases

3 Epidemic complexity determination

3.1 Deterministic number of phases

3.2 Stochastic number of phases

3.2.1 Poisson point process-based model

3.2.2 DP-based model

4 Synthetic data experiments

5 Real-data application

5.1 Data description and prepocessing

5.2 Analyses and results

6 Discussion

Acknowledgments

Funding

Data availability

Author contributions

Supplementary material

References

Author notes

Supplementary data

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

This Feature Is Available To Subscribers Only