Petros Barmpounakis, Nikolaos Demiris, Multiphasic stochastic epidemic models, Journal of the Royal Statistical Society Series C: Applied Statistics, Volume 74, Issue 2, March 2025, Pages 491–505, https://doi.org/10.1093/jrsssc/qlae064
Abstract
At the onset of the COVID-19 pandemic, various non-pharmaceutical interventions aimed to reduce infection levels, leading to multiple phases of transmission. The disease reproduction number, $R_t$, quantifies transmissibility and is central to evaluating these interventions. This article discusses hierarchical stochastic epidemic models with piece-wise constant $R_t$, suitable for capturing distinct epidemic phases and estimating disease magnitude. The timing and scale of $R_t$ changes are inferred from data, while the number of phases is allowed to vary. The model uses Poisson point processes and Dirichlet process components to learn the number of phases, providing insight into epidemic dynamics. We test the models on synthetic data and apply them to freely available data from the UK, Greece, California, and New York. We estimate the true number of infections and $R_t$, and independently validate this approach via a large seroprevalence study. The results show that key disease characteristics can be derived from publicly available data without imposing strong assumptions.
1 Introduction
At the start of 2020, the emergence of COVID-19, an infectious disease caused by the virus SARS-CoV-2, placed health systems around the globe under immense pressure. In March 2020, the World Health Organization declared COVID-19 a global pandemic, and by the end of September 2022 more than 6.5 million people had died from the illness or its complications. At the beginning of the pandemic, in the absence of available vaccines or suitable medication, the majority of governments around the globe resorted to Non-Pharmaceutical Interventions (NPIs) in an attempt to stop the exponential spread of the virus and reduce transmissibility. Such NPIs involved measures like work-from-home policies, school and university closures, stay-at-home guidance for people in high-risk groups, and full lockdowns.
These measures had an effect on reducing the transmissibility and resulted in spreading trajectories that could not be properly described by the standard epidemic models due to the resulting multiphasic nature of transmission. The first systematic technique to assess these interventions was due to Flaxman et al. (2020) who proposed a renewal equation model whose infection dynamics were modelled through a multilevel framework incorporating NPIs. We amend this model by inferring the points in time at which the transmissibility changes as well as the magnitude of infectiousness in a data-driven manner. We determine the number of phases by using appropriate stochastic processes based upon variations of the Poisson process (PP) and Dirichlet process (DP)-based priors via their stick-breaking constructions (Miller & Harrison, 2018; Sethuraman, 1994).
Several models have been proposed in the literature for the estimation of multiphasic infectious diseases, particularly COVID-19. Briefly, a stochastic Susceptible-Exposed-Infectious-Removed (SEIR) model with a regression framework for the effect of the NPIs on transmissibility is used in Knock et al. (2021) while Birrell et al. (2021), Li et al. (2021), and Chatzilena et al. (2022) use stochastic SEIR models where the transmission mechanism is described by a system of non-linear ordinary differential equations and the transmission rate is modelled by a diffusion process. Modelling the transmission rate as a random walk facilitates gradual and smooth changes in time. A piecewise linear quantile trend model was proposed by Jiang et al. (2021), a kernel-based SIR model distinguishing the different phases of the transmissibility in space was developed by Geng et al. (2021) while Wistuba et al. (2022) incorporated splines to estimate the reproduction number in Germany.
Simpler forms of deterministic and stochastic multiphasic epidemic models have been considered before. In the context of modelling SARS-CoV-2 transmission, Flaxman et al. (2020) used an approach with a fixed number, date, and scale of the change. Related work based upon variations of DP mixtures is presented in Hu and Geng (2021) and Creswell et al. (2023). In the former, the authors used a mixture of finite mixtures (MFM) model on a Susceptible-Infected-Recovered-Susceptible model, while in the latter the authors used a suitably modified Pitman-Yor process but only fitted it to the reported cases, thus dispensing with the effort to estimate the complete epidemic burden and the suitable adjustment for the reproduction number. The main advantage of this class of methods, based on piece-wise constant transmission rates, is the intuitive characterization of the epidemic in terms of multiple phases of transmissibility. The number and magnitude of the distinct phases are determined purely by data, without forced assumptions on the effect of policy changes and NPIs. This approach should be central to a retrospective assessment of the NPIs: an evidence-based method for estimating the timing and effect of those interventions, minimizing the risk of introducing several types of bias.
The article is organized as follows. In Section 2, we define the proposed compartmental process, elucidate its equivalence with renewal process-based models and describe the observation regimes of the data. In Section 3, we complete the model definition by characterizing the complexity regimes. Section 4 assesses the proposed models via simulation experiments while Section 5 contains the application to data from California and New York state, the UK and Greece. The article concludes with a discussion.
2 Modelling disease transmission
2.1 Model definition and related characterizations
The methodology for modelling the time-varying disease transmissibility has been implemented under two distinct but equivalent models, the compartmental Susceptible-Infectious-Removed (SIR) model and the seemingly simpler time-since-infection model with population susceptibility reduction. Here, we define both models and delineate their equivalence.
The model assumes that the population has size n, is closed (demographic changes during the course of the epidemic are ignored), homogeneous and homogeneously mixing. In the stochastic SIR model, an infected individual makes contact with any other individual on day t at the points of a PP with time-varying intensity $\beta_t / n$. This scaling is commonly adopted as it makes the contact process independent of the size of the population (e.g. Andersson & Britton, 2000). If these (close) contacts of an infected individual occur with a susceptible individual they result in an infection. Each individual remains infectious for a random time period Y. All PPs in this construction are assumed to be independent. The disease reproduction number is defined as $R_t = \beta_t E(Y)$, $t = 1, \dots, T$, where T is the time horizon of the study.
The infection rate $\beta_t$ will generally be assumed piece-wise constant in this article and further details are given below. For this model, let $c_t$ denote the number of new infections on day t. The expected number of new infections on day $t$ is given by:
$$E(c_t) = \frac{\beta_t}{n} S_t I_t, \tag{1}$$
where $S_t$ represents the number of individuals that remain susceptible:
$$S_t = n - \sum_{s=1}^{t-1} c_s, \tag{2}$$
and $I_t$ denotes the active set of infectives:
$$I_t = \sum_{s=1}^{t-1} \sum_{j=1}^{c_s} \mathbb{1}_j(s, t), \tag{3}$$
with $\mathbb{1}_j(s, t)$ the indicator function of the event that the individual j, infected on day s, remains infectious on day t. This event is implicitly determined by the disease characteristics. Then, (1) can be rewritten as
$$E(c_t) = R_t \frac{S_t}{n} \sum_{s=1}^{t-1} c_s \, g_{t-s}, \tag{4}$$
since $R_t = \beta_t E(Y)$ and $g$ denotes the generation interval, which characterises the time from the infection of an individual until they generate their first infection; see, for example, Champredon et al. (2018) and Svensson (2007, 2015). Such an approximation works reasonably well in homogeneous populations but there are notable exceptions, including spatial epidemics; see Mollison (1991) and Durrett and Levin (1994) for extensive discussion and counterexamples.
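The renewal form with susceptible depletion described above can be illustrated with a short sketch. This is not the authors' implementation: it iterates only the mean dynamics, and the function names, the three-day generation-interval weights, the population size and the two-phase reproduction number are hypothetical values chosen for illustration.

```python
def expected_new_infections(r_t, past_infections, gen_interval, population):
    """Expected infections on the next day under a renewal equation with
    susceptible depletion: R_t * (S_t / n) * sum_s c_s * g_{t-s}."""
    susceptible = population - sum(past_infections)
    # Infection pressure: past cases weighted by the generation interval,
    # where gen_interval[k - 1] is the weight for cases infected k days ago.
    pressure = 0.0
    for lag, weight in enumerate(gen_interval, start=1):
        if lag <= len(past_infections):
            pressure += past_infections[-lag] * weight
    return r_t * (susceptible / population) * pressure


def simulate_mean_epidemic(r_values, gen_interval, population, seed_cases):
    """Iterate the renewal equation deterministically (mean dynamics only)."""
    infections = [float(seed_cases)]
    for r_t in r_values:
        infections.append(
            expected_new_infections(r_t, infections, gen_interval, population)
        )
    return infections


# Hypothetical inputs, for illustration only.
g = [0.2, 0.5, 0.3]          # toy generation-interval weights, summing to 1
r = [2.0] * 10 + [0.7] * 10  # two phases: growth, then decline
path = simulate_mean_epidemic(r, g, population=1_000_000, seed_cases=10)
```

Dropping the `susceptible / population` factor gives the version without depletion of susceptibles, which is adequate only while the infected fraction of the population is small.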
Note that equation (4) is used in the commonly adopted technique of Cori et al. (2013) for estimating the instantaneous reproduction number, and this approach gives a different justification for the derivation of the correction factor in Bhatt et al. (2023). The term $S_t / n$ accounts for the depletion of the susceptible population and is sometimes ignored when the aim of modelling disease transmissibility is somewhat different. We will use equation (4) for the subsequent analyses as it is computationally more efficient than the SIR structure. One should also consider potential ‘superspreading’ events, in which certain individuals infect unusually large numbers of secondary cases (Lipsitch et al., 2003; Shen et al., 2004). We account for this variability by assuming that the individual reproduction number is gamma distributed with mean $R_t$ and dispersion parameter $\psi$, yielding $c_t \sim \text{NegBin}(\mu_t, \psi)$ (Lloyd-Smith et al., 2005), where $\mu_t$ is equal to $E(c_t)$ of equation (4).
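For intuition on why a gamma-distributed individual reproduction number leads to a negative binomial count, the standard gamma-Poisson mixture can be written out as below. The symbols $\nu$ (individual reproduction number), $R$ (its mean), $k$ (dispersion) and $Z$ (secondary cases generated by one individual) are notation chosen here for illustration and need not match the article's own.

```latex
\nu \sim \text{Gamma}\!\left(\text{shape} = k,\ \text{rate} = \tfrac{k}{R}\right),
\qquad
Z \mid \nu \sim \text{Poisson}(\nu)
\;\Longrightarrow\;
Z \sim \text{NegBin}(R, k),
\qquad
E(Z) = R,
\qquad
\operatorname{Var}(Z) = R + \frac{R^{2}}{k}.
```

Smaller values of $k$ inflate the variance relative to the Poisson case, which is how superspreading enters; as $k \to \infty$ the Poisson model is recovered.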
The reproduction number is of great practical interest as it is used to assess whether the epidemic is growing or shrinking. Here, we consider two distinct instances of the reproduction number. The effective reproduction number, $R_t^{e}$, obtained by scaling $R_t$ by the susceptible fraction of the population, describes the expected number of secondary cases generated by an infectious individual. Then, $R_t^{e} > 1$ and $R_t^{e} < 1$ indicate that the epidemic is growing or shrinking, respectively, and reducing $R_t^{e}$ below unity is the typical target of public health authorities. In contrast, $R_t$ quantifies contacts that may not always result in new infections, due to mixing with the immune proportion of the population. Therefore, $R_t > 1$ does not necessarily mean that the epidemic is growing. A detailed discussion about reproduction numbers can be found in Pellis et al. (2022).
2.2 Observation regimes
We consider two distinct observation regimes, one where the observed number of cases corresponds to the total number of infections and one where the total number of infections is indirectly estimated, outlined below.
2.2.1 Observed infections
The regime where the total number of infections is observed may be of interest in its own right but may also be used for certain transmissible diseases, for example in the analysis of influenza-like-illness data when seroprevalence study information is available. Epidemic models are attractive for analysing such data and are naturally defined in terms of infector–infectee pair and the timing of such events. In reality, however, this type of data is rarely available. Disease monitoring is based on the daily reported infections, which are known to be susceptible to multiple problems, including a time lag between the timing of infection and symptom onset or testing positive.
In the case of COVID-19, a large proportion of the population experiences asymptomatic or mild disease (Ward et al., 2021) leading to severe under-reporting. Inference about the reproduction number can be robust when the reported cases are used if depletion of the susceptible population is accounted for, or if the observed proportion of cases remains constant over time. One way to validate this assumption is by sequentially performing seroprevalence studies to estimate the true disease prevalence and the proportion of unreported incidences. However, such information was not available in most countries. In the following subsection, we describe an alternative approach that dispenses with the need for this assumption.
2.2.2 Unobserved cases
The case where infections may not be directly observed has been studied in a different context by Demiris et al. (2014). In the case of the pandemic, it became immediately apparent that the observed number of infections only partially accounts for the complete epidemic burden. An alternative technique was proposed by Flaxman et al. (2020), where the true cases were estimated by back-calculating infections from the daily reported deaths, which are likely less prone to under-reporting. This method has the additional advantage of yielding an estimate of the effective reproduction number. We adopt this approach by introducing another hierarchical level into our model, linking the daily deaths $d_t$ with the true cases $c_t$ via:
$$d_t \sim \text{NegBin}(\mu_t^{d}, \phi), \qquad \mu_t^{d} = \text{IFR} \sum_{s=1}^{t-1} c_s \, \pi_{t-s}, \tag{5}$$
where $\mu_t^{d}$ and $\phi$ are the mean and dispersion parameters of the negative binomial distribution, respectively. IFR is the infection fatality ratio and π is the distribution of the time from infection to death, with $\pi_i$ denoting the probability that an infected individual will die i days after becoming infected. Accurate estimates of the IFR and π are necessary for estimating incidence, which is treated here as a latent parameter. The IFR and π parameters may be calculated independently from external data or in a single stage, leveraging additional evidence from seroprevalence studies.
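The link between latent infections and expected deaths described above is a discrete convolution, sketched below under stated assumptions. The function name, the toy infection series, the IFR of 1% and the four-day delay distribution are hypothetical values for illustration, not estimates from the article.

```python
def expected_deaths(infections, ifr, death_delay):
    """Mean daily deaths: IFR * sum_s c_s * pi_{t-s}.

    death_delay[i - 1] is the probability of dying i days after infection.
    """
    means = []
    for t in range(len(infections)):
        total = 0.0
        for lag, prob in enumerate(death_delay, start=1):
            if t - lag >= 0:
                total += infections[t - lag] * prob
        means.append(ifr * total)
    return means


# Hypothetical inputs, for illustration only.
infections = [100, 200, 400, 800, 400, 200]
pi = [0.0, 0.25, 0.5, 0.25]  # toy infection-to-death distribution
mu = expected_deaths(infections, ifr=0.01, death_delay=pi)
```

In a full model these means would feed a negative binomial likelihood for the reported deaths, with the infections treated as latent quantities rather than fixed inputs.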
3 Epidemic complexity determination
The number of phases may be treated as a fixed but unknown integer or as a random quantity to be modelled and estimated from data. We describe two such frameworks in the following two subsections.
3.1 Deterministic number of phases
In Flaxman et al. (2020), the number of phases was selected a priori and the times when the reproduction number changed were also predefined. The dates of these points were informed by the NPIs implemented by each government, leading to a piece-wise constant reproduction number $R_t$ and effectively assuming an immediate effect of those NPIs. We study the limitations of this approach when one makes inaccurate assumptions regarding the start or the end of an epidemic phase, or indeed their number. We do this using the simulated data of Section 4 and present the results in the Supplementary material. We follow Flaxman et al. (2020) in considering $R_t$ as a piece-wise constant function, but infer the point in time and magnitude of changes directly from the data. The number, K, of epidemic phases is investigated using models with different K values and the best model is selected using the Watanabe–Akaike information criterion (WAIC) (Watanabe, 2013) and approximate leave-one-out cross-validation (LOO) (Vehtari et al., 2017). Note that the time-ordering of the data suggests that k-step ahead forecasting would be a better predictive model selection process. However, model complexity renders such repeated evaluations practically infeasible and therefore we proceed with the commonly used approximation via information criteria. The model is defined as follows:
$$R_t = r_k \ \text{ for } T_{k-1} < t \le T_k, \quad k = 1, \dots, K, \qquad r_k \sim f(\cdot), \qquad T_k = T_{k-1} + e_k, \quad e_k \sim f(\cdot), \tag{6}$$
where $f(\cdot)$ is any suitable probability distribution with positive support, $T_0 = 0$ and $T_K = T$. We initially search for the first point in time, $T_1$, when the epidemic changes phase for the first time. This search spans the entire duration of our study. To avoid identifiability issues with the timepoints $T_k$, each new point is determined by adding a positive quantity to the previous one.
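The ordering device described above, building each change point by adding a positive quantity to the previous one, can be sketched as follows. The function names, the example levels and increments, and the convention that a level applies up to and including its change point are assumptions made for illustration.

```python
from itertools import accumulate


def change_points(increments):
    """Ordered change points built by adding positive increments
    to the previous point, so that T_1 < T_2 < ... by construction."""
    if any(e <= 0 for e in increments):
        raise ValueError("increments must be strictly positive")
    return list(accumulate(increments))


def piecewise_r(levels, points, horizon):
    """Reproduction number for days 1..horizon.

    levels[k] applies up to and including day points[k]; the last level
    applies after the final change point.
    """
    if len(levels) != len(points) + 1:
        raise ValueError("need exactly one more level than change points")
    series = []
    for t in range(1, horizon + 1):
        phase = sum(1 for p in points if t > p)  # change points already passed
        series.append(levels[phase])
    return series


# Hypothetical values, for illustration only.
points = change_points([3, 4])  # change points at days 3 and 7
series = piecewise_r([2.0, 0.8, 1.2], points, horizon=10)
```

Parameterizing by increments rather than by the change points themselves means no ordering constraint has to be enforced during sampling.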
3.2 Stochastic number of phases
Under the Bayesian paradigm, a natural but not trivial way is to treat the number of epidemic phases K as a parameter and learn its posterior distribution. The ‘reversible jump’ algorithm (e.g. Richardson & Green, 1997) could be used to explore the joint space of K and within-K models. Here we adopt a different approach and model K as a characteristic of two stochastic models, the PP and variations of the DP (Ferguson, 1973). For both processes, we use the stick-breaking representation, see Miller and Harrison (2018) and Sethuraman (1994) for the PP and DP respectively, facilitating inference for K. The directed acyclic graph (Figure 1) represents the general structure of our modelling framework.

Directed acyclic graph of the model. Ellipses denote parameters to be learned by the model. The number of phases K is estimated by the DP/PP model or via model selection criteria.
Estimating the number of phases of the epidemic and the associated date and magnitude of the changes can lead to identifiability problems for $R_t$ and its generative quantities, notably the total number of infections. In order to overcome such issues, we explore a multi-stage modelling procedure (e.g. Bhatt et al., 2023) for the stochastic number of phases models under the unobserved cases paradigm. At the first stage, the latent disease cases are estimated using a Gaussian Process (GP) model or a GP mixture with Student-t marginals (Heyde & Leonenko, 2005; Shah et al., 2014), and then the smoothed medians of these latent cases are treated as data with the likelihood given in (4). The GP model was used as our baseline since both models performed similarly based on WAIC. More details on the estimation of cases can be found in the supplementary material.
3.2.1 Poisson point process-based model
We consider that the arrival of new phases in the time horizon (0,T] is driven by a time-homogeneous PP with rate λ, with K growing linearly with time. Hence, following the first epidemic phase, the number, $K - 1$, of new phases follows a Poisson distribution with rate $\lambda T$, while the duration of each phase a priori follows an Exponential distribution with rate λ. We follow Miller and Harrison (2018) and use the representation:
$$e_1, e_2, \dots \overset{\text{iid}}{\sim} \text{Exp}(\lambda), \qquad K = \min\Big\{k : \sum_{i=1}^{k} e_i \ge T\Big\}, \qquad w_k = \frac{e_k}{T} \ (k < K), \qquad w_K = 1 - \sum_{k < K} w_k, \tag{7}$$
truncating K at a value far higher than data-supported estimates. We use an informative prior on λ with a mean of 0.02, so that the mean of the Poisson distribution for the number of phases is around 5 for 250 days ($0.02 \times 250 = 5$). This is a broad estimate for the prior mean, following a visual inspection of the simulated daily deaths.
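The prior implied by exponential phase durations can be checked by simulation. The sketch below fixes the rate at the stated prior mean of 0.02 instead of placing a prior on it, omits the truncation of K, and uses a hypothetical function name; it is a prior-simulation sketch only, not the inference scheme used in the article.

```python
import random


def sample_phase_durations(rate, horizon, rng):
    """Draw Exponential(rate) phase durations until they cover (0, horizon].

    The final phase is cut at the horizon, so the durations sum to horizon.
    """
    durations, elapsed = [], 0.0
    while True:
        draw = rng.expovariate(rate)
        if elapsed + draw >= horizon:
            durations.append(horizon - elapsed)
            return durations
        durations.append(draw)
        elapsed += draw


rng = random.Random(1)  # fixed seed so the sketch is reproducible
draws = [sample_phase_durations(0.02, 250.0, rng) for _ in range(20000)]

# Phases beyond the first one; the prior mean should be close to 0.02 * 250 = 5.
mean_new_phases = sum(len(d) - 1 for d in draws) / len(draws)

# Phase weights: proportion of the study period covered by each phase.
weights = [d / 250.0 for d in draws[0]]
```

Dividing the durations by the horizon gives weights that sum to one, which is the sense in which this construction resembles stick-breaking.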
3.2.2 DP-based model
An alternative model for the number of phases is based on the DP and its stick-breaking construction. This is based on a stick of unit length, where one iteratively samples the $v_k$ from a $\text{Beta}(1, \theta)$ distribution as a portion of the remaining stick, deriving the weights $w_k$ of the DP:
$$v_k \sim \text{Beta}(1, \theta), \qquad w_k = v_k \prod_{j=1}^{k-1} (1 - v_j), \qquad k = 1, \dots, L, \tag{8}$$
where L is the truncation point of the DP, set here to 36. Here, K is increasing with the scaling parameter θ. While inherently very different, the PP and DP models share a similar intuition in their construction. The goal is to construct weights that represent the proportion of each epidemic phase over the complete duration of the study, T. These weights are then used in a Categorical distribution that assigns each daily observation to a specific phase.
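A truncated stick-breaking draw can be sketched as below. The Beta(1, θ) break proportions are the standard DP choice and are an assumption here, as are the value θ = 2, the function name, and the convention of assigning the leftover stick to the last component; only the truncation level 36 comes from the text.

```python
import random


def stick_breaking_weights(theta, truncation, rng):
    """Truncated stick-breaking weights for a Dirichlet process.

    Each break takes a Beta(1, theta) fraction of the remaining stick;
    the last component receives whatever is left, so the weights sum to 1.
    """
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, theta)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


rng = random.Random(0)  # fixed seed so the sketch is reproducible
w = stick_breaking_weights(theta=2.0, truncation=36, rng=rng)

# Categorical assignment of 250 daily observations to phase labels.
labels = rng.choices(range(len(w)), weights=w, k=250)
```

The independent categorical draws shown here do not by themselves impose any time ordering on the phases; how the ordering of the data is used in the fitted models is not reproduced in this sketch.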
4 Synthetic data experiments
Simultaneously learning the parameters and the dimension of a model is typically a challenging statistical task. We assess the performance of our methods by simulating epidemics of various characteristics for 250 days. The epidemic model defined in Section 2 was used for simulating daily infections and deaths. The population size was set at with . The discretized infectious period and the infection-to-death interval are described in the supplementary material. The epidemic was simulated with 5 distinct increasing/decreasing phases resembling the observed COVID-19 outbreaks after the introduction of multiple NPIs. The time-varying reproduction number was set as follows:
Using the model in (6) and the daily deaths as data (unobserved infections regime) the lowest WAIC and LOO selected six phases (Table 1). Models with varying (5, 6, and 7) number of phases incorrectly identified the first 10 days of the simulation as a distinct phase (Figure 2). This can be attributed to the lack of information at the start, where we essentially only draw information from our prior distribution, a common issue in epidemic models. Following this period the model with six phases correctly identifies the different epidemic phases, including their timing and magnitude of change. The total daily infections (Figure 2) are also accurately recovered. Inference was initiated on the day that 10 cumulative deaths were observed. Plots for the other models may be found in the supplementary material.

Simulation and estimates based on observing deaths. (a) Simulated (triangles) and estimated daily infections with 95% credible intervals (line) and (b) Real (solid line) and estimated reproduction number with 95% credible intervals (dashed line).
Number of phases | WAIC | LOO |
---|---|---|
Five phases | 3,635.0 | 3,635.1 |
Six phases | 2,252.9 | 2,253.4 |
Seven phases | 2,260.6 | 2,261.8 |
Beyond the finding that the models select the correct complexity, it is instructive to summarize their behaviour under model misspecification. Broadly, when the number of phases is fixed to be smaller than the true one, the model correctly recovers the early phases but averages over the final ones, leading to a poor fit. In contrast, when K is fixed to be larger than the true one, we recover the true patterns and obtain a good fit. Hence, overestimating the number of phases does not materially affect the recovery of the true signal. Detailed results are given in the supplement.
When fitting the models with a stochastic number of phases using the daily infections as data (observed infections regime), both the PP and DP models precisely estimate the number of epidemic phases, the time of change and the true $R_t$ value (Figure 3). The model was run for 100,000 iterations and 8 chains. The analysis based on observing deaths (unobserved infections regime, multi-stage approach) is included in the supplementary material. Briefly, the intermediate phases of the epidemic are well estimated while the first and final phases are recovered with noise. The level of smoothing of the noisy estimated cases affects the estimation of the reproduction number: the smoother the estimated cases, the smoother the recovered reproduction number and the fewer its phases.

True (solid line) and estimated reproduction number with 95% credible intervals (dashed line) based on observing infections. (a) Dirichlet process model and (b) Poisson process model.
We utilized readily available software such as Rstan and Nimble within the R statistical programming language. In Rstan, we used the No-U-Turn Sampler (NUTS) algorithm for the model in (6), while in Nimble, we used a combination of the Random Walk Metropolis–Hastings sampler and a categorical sampler for the DP and PP models. The code replicating all the experiments may be found in https://github.com/pbarmpounakis/Multiphasic-stochastic-epidemic-models.
5 Real-data application
5.1 Data description and preprocessing
The models were fitted to daily reported deaths (unobserved cases regime) from two US states, California and New York, and two European countries, the UK and Greece. The data are accessible from Johns Hopkins University and ECDC, and the time horizon ran to the end of June 2021, when many NPIs were lifted. Due to a lack of data availability, the model does not account for reinfections. The age-standardized IFR for each country was informed by the meta-analysis from COVID-19 Forecasting Team (2022), accounting for time, geography, and population characteristics. We allowed the IFR to vary over time, accounting for the age structure of those infected, the burden on health systems and improvements in treating the disease. The infection-to-death time and generation interval were given Gamma distributions with (mean, standard deviation) set to (19, 8.5) and (6.5, 4.4) days, respectively, following Flaxman et al. (2020).
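Since the Gamma distributions above are specified by mean and standard deviation, most software needs them converted to shape and rate. The moment-matching identities are standard; the function and variable names below are illustrative only.

```python
def gamma_shape_rate(mean, sd):
    """Shape and rate of a Gamma distribution with the given mean and sd.

    Uses mean = shape / rate and variance = shape / rate**2.
    """
    shape = (mean / sd) ** 2
    rate = mean / sd ** 2
    return shape, rate


# Infection-to-death time: mean 19 days, sd 8.5 days.
death_shape, death_rate = gamma_shape_rate(19.0, 8.5)

# Generation interval: mean 6.5 days, sd 4.4 days.
gen_shape, gen_rate = gamma_shape_rate(6.5, 4.4)
```

A daily probability vector for use in a discrete-time model can then be obtained by differencing the Gamma distribution function at integer days; that step is omitted here because the exact discretization used is described only in the supplementary material.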
5.2 Analyses and results
California was one of the first US states to report cases, on 26 January 2020. A state of emergency was declared on 4 March 2020 and mass/social gatherings were banned, while a mandatory statewide stay-at-home order was issued on 19 March 2020. We fitted the model to daily deaths (unobserved cases regime) and WAIC/LOO selected seven phases. Figures 4 and 5 suggest that the effective reproduction number was reduced after restrictions were imposed and fell below the critical value of 1 after April 2020, when school closure was decided for the remainder of the 2019–2020 academic year. The epidemic remained under control until the summer of 2020, when the effective reproduction number rose slightly above 1 following a gradual relaxation of measures. On 31 August 2020, a new set of measures called ‘Blueprint for a Safer Economy’ was applied and all models show that they were effective, alongside the gained immunity of the population, at reducing the effective reproduction number below one and keeping the epidemic under control until the first half of October 2020. All models estimate a sharp increase in the effective reproduction number, which resulted in an increase in the daily reported cases and deaths between November 2020 and January 2021. A nighttime curfew and regional stay-at-home orders were announced at the start of December 2020, after which the effective reproduction number remained stable and began declining. The initiation of the vaccination program in early 2021 brought the epidemic under control, with the effective reproduction number remaining below 1.

Estimation of effective reproduction number with 95% credible intervals (solid and dashed lines) based on observing deaths, fixed number of phases model. (a) California state. (b) New York state. (c) The UK and (d) Greece.

Estimation of effective reproduction number with 95% credible intervals (solid and dashed lines) based on observing deaths, multi-stage approach. (a) California state—DP model. (b) California state—PP model. (c) New York state—DP model and (d) New York state—PP.
New York state had, by 10 April 2020, more confirmed cases than any country outside the US and was heavily affected at the start of the pandemic, with daily recorded deaths reaching a thousand in April. On 15 March, all New York City schools were closed and on 20 March a state-wide stay-at-home order was declared. As a result, the models show a drop of the effective reproduction number below 1 from mid-March 2020 until August 2020 (Figures 4 and 5). The best-performing models based on WAIC and LOO had eight distinct phases. This model estimates that after the summer of 2020, the effective reproduction number remained above 1 up until the start of 2021, with a small increase during November and the holiday season. The DP and PP models (unobserved cases regime, multi-stage approach) show similar estimates for the effective reproduction number (Figure 5).
For the UK, a model with nine phases was selected by WAIC and LOO. Until early March 2020, when a lockdown was imposed, we estimate that the effective reproduction number was above 1 (Figure 4). These measures were lifted in early June; during the lockdown the effective reproduction number remained below 1, and the epidemic was therefore under control. After the summer it increased above 1 and the so-called rule of six was imposed, while on 5 November 2020 the second lockdown was announced. The number of reported deaths was reduced after the initiation of the vaccination program on 4 January 2021. Virtually identical estimates for the UK are inferred by the DP and PP models (figures and additional details in the supplementary material).
We conducted an independent (or ‘external’) validation of the model performance based upon REACT-2, an antibody prevalence study conducted in the UK with the participation of more than 100,000 adults (Ward et al., 2021). This appeared to be an optimal choice as it took place in early July 2020, when waning immunity was relatively unlikely. As such, it provides a reasonable estimate of the total disease burden up to that time. The estimated prevalence for the adult population (children were excluded) was 6.0% (95% CI: 5.8, 6.1) and our estimate for the whole population is 7.5% (95% credible intervals (Cr.I.): 5.7, 10.) (Figure 6), which is compatible with that independent estimate.

Cumulative sum of estimated daily infections with 95% credible intervals (dashed lines) and the estimation of REACT-2 with 95% confidence intervals (solid lines) for the UK.
For Greece, WAIC and LOO selected the eight-phase model. At the starting phase, we estimate an effective reproduction number above 1 and a decrease below 1 in the first half of March 2020 (Figure 4). On 10 March, the government suspended most activities, including educational, shopping, and recreational ones, while a week later all nonessential movement was restricted. The estimate remained below 1 until early June 2020, when it increased following the lifting of restrictions. From the summer it remained over 1 until November 2020, after a case spike in October led to new measures. Similar estimates for the effective reproduction number are obtained by the DP and PP models (Supplementary material).
The computation time was broadly similar for the PP and DP models, with the DP model being the faster of the two. More importantly, we get valuable insights into the effectiveness of the measures imposed by the governments. For New York and the UK, it appears that the school closures and the strict lockdowns predate the reductions in transmissibility, and we can assume that they were pivotal in achieving that. California and Greece adopted the measures before a large first wave, like other EU countries and US states. All regions were similar when these two measures were relaxed: multiple epidemic waves emerged and the estimated effective reproduction number remained above 1. These findings closely align with similar results documented in the existing literature, where authors found that the national lockdowns consistently helped to contain the spread of the disease (Flaxman et al., 2020; Knock et al., 2021).
The results of our simulation experiments corroborate the findings of the application to real data from different areas. The time-ordering of the data facilitates avoiding label-switching problems typically encountered when fitting mixture-type models. By selecting the number of phases we capture mortality changes in all the real-world examples (Figure 7). The DP and PP models can infer a higher number of phases but the conclusions are not materially affected. This observation is in line with Rousseau and Mengersen (2011) who show a generally stable behaviour of such so-called overfitted mixture models, theoretically verifying the robust behaviour of the developed models.

Reported (triangles) and estimated deaths with 95% credible intervals (solid and dashed lines) based on observing deaths, fixed number of phases model. (a) California state. (b) New York state. (c) The UK and (d) Greece.
6 Discussion
In this article, we propose three models for the transmission mechanism of infectious diseases with multiple epidemic phases. We use freely available data to estimate the points in time when transmissibility changes and the realized magnitude of the NPI effects. We focus on publicly available data, as data availability and usability were raised as concerns during the acute COVID-19 pandemic phase. Several potentially useful datasets, including those regarding the pressure on a country’s health system, such as the number of patients in ICU beds, may not be generally available.
A number of interventions were applied in overlapping time intervals, and identifiability issues can arise when trying to disentangle individual effects and the associated time lags. In particular, it is generally hard to estimate the timing of the effect and the possible time lag between cause and effect. In this work, we retrospectively assess the effect of the NPIs by comparing changes in the reproduction number with the dates that these measures were imposed. Such a purely evidence-based approach represents an alternative to forced assumptions that aim for (potentially unwarranted) causal statements about intervention effects. Our code is made freely available, and the reliance on publicly available data facilitates reproducibility and potential uptake by other researchers.
Selecting the number of phases requires multiple runs, and the computation time can be an issue when nowcasting is essential for decision-making. Estimating the number of phases via the DP and PP models represents an alternative approach that is computationally efficient and statistically robust. The DP and PP models may overestimate the number of epidemic phases and this issue is discussed in detail in Rousseau and Mengersen (2011); see also Miller and Harrison (2013). In our setting, this effect essentially relates to the start and end of the epidemic and the inherent challenges of limited information. At the start of the epidemic, such uncertainty dictates that estimates should be interpreted with caution. At the end, this is less of an issue and is mostly due to the time lag between cases and deaths. When one is working with the observed infections these issues largely disappear and inference is generally accurate throughout the duration of the data, as indicated by our simulation experiments.
In times of crisis, there is a demand for rapid predictions that facilitate decision support and one may employ the multi-stage approach of the DP or PP models to gauge the evolution of the transmissibility. After obtaining an initial estimate of the reproduction number from these models, a more focused exploration of the number of phases using the deterministic number of phases model could be conducted. This targeted investigation aims to minimize the need for extensive model fitting by narrowing down the range of potential values for the number of epidemic phases, K.
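The narrowing step described above can be sketched as follows. This is a minimal illustration, assuming that posterior draws of the number of phases from a DP or PP fit are available as an array of integers. The function name `candidate_phase_range` and the 90% coverage level are hypothetical choices made for this sketch and are not part of the analysis reported in this article.

```python
import numpy as np

def candidate_phase_range(k_draws, coverage=0.9):
    """Return the smallest set of K values whose posterior mass reaches `coverage`.

    `k_draws` is assumed to hold posterior draws of the number of phases
    from a DP or PP fit (hypothetical input for this sketch).
    """
    k_draws = np.asarray(k_draws, dtype=int)
    values, counts = np.unique(k_draws, return_counts=True)
    probs = counts / counts.sum()
    # Visit the K values from most to least probable.
    order = np.argsort(-probs, kind="stable")
    cum = np.cumsum(probs[order])
    # First position at which the cumulative mass reaches the target.
    n_keep = int(np.searchsorted(cum, coverage)) + 1
    n_keep = min(n_keep, len(values))
    return sorted(values[order][:n_keep].tolist())

# Illustrative draws only, not results from the article.
draws = [3] * 50 + [4] * 30 + [5] * 15 + [6] * 5
candidates = candidate_phase_range(draws)
```

The deterministic number of phases model would then be fitted only for the values of K in `candidates`. Since the DP and PP models may overestimate the number of phases, the upper end of such a range is best read as a cautious bound.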
It is not apparent how our models can be optimally used for predicting new epidemic phases. Nowcasting or short-term future predictions are typically performed under the assumption that the main conditions, including transmissibility, remain the same, or by simulating particular scenarios. In this article, we focus on the retrospective assessment of the imposed NPIs, and the inferred quantities may be fed into forward simulations in a similar manner.
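A forward simulation under constant transmissibility might look like the sketch below. It uses a generic Poisson renewal recursion, swapped in purely for illustration; it is not the hierarchical model of this article. The function name `simulate_forward`, the generation-interval weights and the input values are all assumptions.

```python
import numpy as np

def simulate_forward(past_infections, r_value, gen_weights, horizon, rng):
    """Simulate daily infections forward while holding the reproduction number fixed.

    Assumed generic renewal recursion (not the article's model):
    I_t ~ Poisson(R * sum_s w_s * I_{t-s}), with w_s the generation-interval weights.
    """
    w = np.asarray(gen_weights, dtype=float)
    w = w / w.sum()  # normalise the weights so that they sum to one
    series = [float(x) for x in past_infections]
    for _ in range(horizon):
        # Most recent day first, so that w[0] multiplies lag 1.
        recent = np.array(series[-len(w):][::-1])
        force = r_value * float(np.dot(recent, w[:len(recent)]))
        series.append(float(rng.poisson(force)))
    return np.array(series[len(past_infections):])

# Illustrative inputs only; in practice R would be drawn from the posterior
# of the final epidemic phase rather than fixed at a single value.
rng = np.random.default_rng(0)
future = simulate_forward([10, 12, 15], 1.2, [0.2, 0.5, 0.3], 14, rng)
```

Repeating the simulation over posterior draws of the final-phase reproduction number would carry the estimation uncertainty into the projections, and scenario analysis amounts to replacing `r_value` with a hypothesised value.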
The models developed in this work assume a homogeneous and homogeneously mixing population, similar to many studies on SARS-CoV-2 transmission. This assumption may be appropriate for large populations, such as those at the state or country level, since functional central limit theorems can reasonably be thought of as applicable (e.g. Andersson & Britton, 2000). Incorporating additional information or structure from data sources such as hospitalizations, seroprevalence studies, or vaccination programs is desirable in principle but may also pose challenges. The inclusion of data sources such as those discussed in Knock et al. (2021) would necessitate model expansion. Our models can naturally be extended when more detailed information is available, and this is the subject of current research.
Acknowledgments
The authors are grateful to the two referees whose comments substantially improved the content and presentation of this work. We also wish to thank Kostas Kalogeropoulos and Petros Dellaportas for useful comments on an earlier version of this article and the editor and associate editor for a number of useful suggestions.
Funding
This article is part of the first author’s doctoral thesis, co-financed by Greece and the European Union (European Social Fund-ESF) through the Operational Programme ‘Human Resources Development, Education and Lifelong Learning’ in the context of the Act ‘Enhancing Human Resources Research Potential by undertaking a Doctoral Research’ Sub-action 2: State Scholarships Foundation (IKY, Greece) Scholarship Programme for PhD candidates in the Greek Universities.
Data availability
All the data used in the synthetic data experiments resulted from simulation studies, which were carefully designed to replicate realistic conditions and are detailed within the methodology to provide transparency and allow for potential replication by others. The data for the real-data application were derived from publicly available sources; no proprietary or restricted data were used. All real-world data are collectively available in the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University on GitHub.
Author contributions
P.B. and N.D. conceived the experiment(s); P.B. conducted the experiment(s); P.B. and N.D. analysed the results; P.B. and N.D. wrote and reviewed the manuscript.
Supplementary material
Supplementary material is available online at Journal of the Royal Statistical Society: Series C.
References
Author notes
Conflicts of interest: The authors declare that there is no conflict of interest.