Sylvia Richardson, Proposer of the vote of thanks and contribution to the Discussion of ‘The Second Discussion Meeting on Statistical aspects of the Covid-19 Pandemic’, Journal of the Royal Statistical Society Series A: Statistics in Society, Volume 186, Issue 4, October 2023, Pages 633–636, https://doi.org/10.1093/jrsssa/qnad045
It is a pleasure to discuss these two papers, which shine light on how, in response to urgent public health needs and policy questions about the effectiveness of interventions, epidemic modelling coupled with Bayesian inference and computation was judiciously used to provide decision-makers with informative evidence in near real time.
Throughout the pandemic, analysts implemented a range of modelling and simulation-based approaches to quantify the state of the epidemic and produce short-term forecasts; see, e.g., the ensemble of models considered in the UK by the expert group Scientific Pandemic Influenza Group on Modelling, an ensemble which mirrored the main modelling approaches used internationally. The papers presented today represent two of the main modelling approaches and share common inference principles, but otherwise differ in several aspects. Before commenting on each paper, I thought it would be useful to highlight areas of commonality, differences, and complementarities (see Table 1).
Table 1. Areas of commonality, differences and complementarity between Bhatt et al. and Storvik et al.
Aspect | Bhatt et al. | Storvik et al. |
---|---|---|
Modelling approach | Semi-mechanistic stochastic model | Semi-mechanistic stochastic model |
Purpose | Part of a body of work; focus on developing a hierarchical framework for renewal processes and on evaluation of NPIs and mobility | Standalone model for transmission in Norway; short-term forecasts for public health surveillance |
Latent epidemic process | Discrete renewal equation (focus on Rt) | Regional SEIR compartmental model |
Literature review | Mostly to previous papers of the Imperial modelling group | Discussion of other modelling approaches and of SMC literature |
Data | Deaths in 11 countries, first wave (up to 4th May) | Hospitalisation and positive counts in Norway; between-region mobility using phone data; 18 months up to 01 July 2021 |
Observation process | Generic discussion | Beta Binomial |
Estimation framework | Bayesian inference: (a) two-stage and (b) extended renewal model with log linear component | Joint Bayesian inference of SEIR compartment and transmission, large number of parameters |
Computations | MCMC implemented using Stan | SMC for posterior inference produced daily, requiring tuning |
Model evaluation/comparison | Limited to model comparison of different two-stage models | Sensitivity analyses, comparison of observed and predicted, comparison with EpiEstim |
Starting from the common root of semi-mechanistic stochastic models, as opposed, e.g., to agent-based models, the two papers are also aligned in using Bayesian inference and stochastic algorithms, though of a different nature. What the Table highlights is that the focus, style, data sources, and level of technical detail of the two papers are quite different.
Bhatt et al. has to be read as part of a series of papers by the Imperial team, providing a high-level discussion and outlining model extensions, while Storvik et al. is a standalone paper modelling transmission in Norway, which presents in detail the steps of the model building, computational issues, and comparative model evaluations. In line with their different purposes, the two papers adopt different formulations for the latent epidemic process: a renewal equation versus a susceptible-exposed-infectious-recovered (SEIR) compartmental model. However, with regard to the transmission process dynamics and the instantaneous reproduction number (Rt), it has been shown that these two formulations are equivalent under additional distributional assumptions (Champredon et al., 2018).
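To make the renewal formulation concrete, here is a minimal sketch of a deterministic discrete renewal recursion, in which new infections on a given day equal Rt times past infections weighted by the generation interval. It is a generic textbook form and not the specific model of Bhatt et al.; the function name, argument names, and the treatment of Rt as known are assumptions made for illustration only.

```python
def renewal_incidence(rt, gen_interval, seed_infections):
    """Expected new infections under a discrete renewal equation.

    rt[t]           : instantaneous reproduction number on day t (assumed known here)
    gen_interval[s] : probability that the generation interval is s + 1 days
    seed_infections : infections on the days before the recursion starts
    """
    infections = list(seed_infections)
    n_seed = len(seed_infections)
    for t in range(len(rt)):
        now = n_seed + t
        # Infectious pressure: past infections weighted by the generation interval.
        pressure = sum(
            g * infections[now - 1 - s]
            for s, g in enumerate(gen_interval)
            if now - 1 - s >= 0
        )
        infections.append(rt[t] * pressure)
    # Return only the days driven by the recursion, not the seeds.
    return infections[n_seed:]
```

With Rt held at 1 and a constant seed history, the recursion returns a flat incidence curve; values of Rt above or below 1 produce growth or decline at a speed governed by the generation interval.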
In Storvik et al., an informative comparison between the inference provided by the SEIR compartmental model and EpiEstim, which is based on the renewal equation, is presented (Figure 5). It shows a good overall correspondence between the time patterns of the respective Rt time series estimated on identical data, but a notable difference in variability estimation. That EpiEstim is overconfident in its assessment of uncertainty had already been pointed out in Teh et al. (2022) (Figure 3). Demonstrating a good calibration of uncertainty, as done in Storvik et al., is of paramount importance for decision-makers, something that deserves to be given additional prominence by the statistical community.
From Sections 2 to 6, Bhatt et al. discuss possible extensions of the renewal approach needed: (a) to account for characteristics of the data collection and (b) to accommodate additional data sources.
The observation equation, equation (2) and its alternative versions (4), (8), or (9), has two crucial elements: the ascertainment process captured through αt, and the distribution π of the lag between infection and recorded observations. Allowing for the possibility of ascertainment bias is indeed important for data such as counts of Covid-19 positive tests, one of the statistics routinely reported worldwide. People who come forward to be tested are self-selected and more likely to be infected than the population at large, leading to ascertainment bias. What additional information would Bhatt et al. propose to use to identify the ascertainment process parameter αt? This is far from straightforward. Work carried out in our Turing Royal Statistical Society Health Data Lab showed that it was essential to combine data on infection prevalence from randomised surveillance with observational data to get a handle on the ascertainment bias (Nicholson et al., 2022). However, randomised surveillance data are not commonly available in many countries, and such a data synthesis exercise necessitated a carefully constructed approach to account for the different granularity and quality of the data sources to be combined.
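The identification difficulty can be illustrated with a small sketch of a generic observation model in which expected recorded counts are the ascertainment rate times a delay-weighted sum of past infections. This is an assumed generic form chosen for illustration, not a transcription of equation (2) in Bhatt et al.; the function and argument names are hypothetical.

```python
def expected_observations(infections, ascertainment, delay_pmf):
    """Expected recorded counts: alpha_t * sum_s pi_s * infections[t - s].

    infections[t]    : latent new infections on day t
    ascertainment[t] : alpha_t, fraction of infections eventually recorded
    delay_pmf[s]     : pi_s, probability of a lag of s days to recording
    """
    expected = []
    for t in range(len(infections)):
        # Convolve past infections with the infection-to-record delay.
        lagged = sum(
            p * infections[t - s]
            for s, p in enumerate(delay_pmf)
            if t - s >= 0
        )
        expected.append(ascertainment[t] * lagged)
    return expected


# Two different epidemics that yield the same expected data.
delay = [0.0, 0.5, 0.5]
small_epidemic = expected_observations([100.0, 0.0, 0.0, 0.0], [0.5] * 4, delay)
large_epidemic = expected_observations([200.0, 0.0, 0.0, 0.0], [0.25] * 4, delay)
```

In this sketch, doubling the infections while halving αt leaves the expected counts unchanged, so count data of this kind cannot separate the two without external information such as randomised prevalence surveys.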
Bhatt et al. rightly encourage the simultaneous use of different types of data to inform the latent epidemic and suggest indexing all unknown quantities by data types. Planning to use simultaneously several types of data (e.g., death, hospitalisation, test counts) is natural, but making Bayesian data synthesis work concretely is more often than not challenging! In particular, delicate issues of conflict between the data sources may arise.
As discussed by Bhatt et al., delay distributions between infection and recorded events need to be estimated on external data. The quantitative estimates of these distributions will be influential on downstream inference on transmission. When several data sources are available, misspecification of the corresponding delay distributions might lead to conflicting information on the state of the epidemic. It would be good to hear from Storvik et al. whether they encountered any issues of data conflict and how these were resolved.
Overall, substantial progress has been made on the inferential side for semi-mechanistic models. In the future, it would be useful to pay renewed attention to the observation process and the design of data collection contributing to anchor the different parameters in semi-mechanistic models.
In Section 8.2, Bhatt et al. present a two-stage approach, where a time series of Rt for the first wave is first extracted for each of the 11 countries included in the previous Flaxman et al. (2020) paper. Non-pharmaceutical interventions (NPIs) and mobility data are then used as regressors with random slopes in a hierarchical linear model indexed by countries, with a view to assessing the influence of mobility on the variability of transmission, independently of NPIs. It would have been helpful to include details of the source of the mobility data and its potential normalisation.
Using a two-step hierarchical model (referred to here as partial pooling) for synthesising information from different geographical units (e.g., countries, regions or cities) has a long history in environmental health (see e.g., Dominici et al., 2000). Typically, the variability of the extracted summaries, here the time series of estimated Rt, is introduced in the first level of the hierarchical model. This does not seem to have been done here despite analysing a period of rapid changes in Rt, likely accompanied by larger uncertainties. Moreover, unmodelled residual autocorrelation in the errors of the linear model might affect the displayed credibility intervals. Finally, the use of a shrinkage prior, whilst useful for avoiding singularity, could potentially distort the relative contribution of the regressors as they are highly correlated. All these points warrant further consideration.
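As a minimal illustration of why first-stage variability matters, the sketch below computes the pooled mean under a standard normal-normal hierarchical model with a known between-unit variance. It is a simplified stand-in, not the model of Bhatt et al. or of Dominici et al.; the function name, the fixed `tau2`, and the toy numbers are assumptions.

```python
def pooled_estimate(estimates, variances, tau2):
    """Precision-weighted pooled mean under a normal-normal hierarchy.

    estimates[i] : first-stage estimate for unit i (e.g. a country)
    variances[i] : first-stage sampling variance of that estimate
    tau2         : between-unit variance (assumed known in this sketch)
    """
    # Each unit is weighted by the inverse of its total variance.
    weights = [1.0 / (v + tau2) for v in variances]
    total = sum(weights)
    mean = sum(w * e for w, e in zip(weights, estimates)) / total
    return mean, 1.0 / total


# Same first-stage estimates, with and without acknowledging that the
# second unit was estimated far less precisely than the first.
equal_mean, equal_var = pooled_estimate([1.0, 3.0], [1.0, 1.0], tau2=0.0)
weighted_mean, weighted_var = pooled_estimate([1.0, 3.0], [1.0, 9.0], tau2=0.0)
```

Treating both units as equally precise gives a pooled mean halfway between them, whereas carrying the first-stage variances forward pulls the pooled mean towards the better-estimated unit and widens the pooled variance.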
In Section 8.3, the model in Flaxman et al. (2020) is usefully extended to include log linear modelling of Rt. In equations (12) and (13), besides lockdown, mobility data as well as country specific random slopes and a random walk for the residuals have been added. The causal mediation analysis interpretation of equations (12) and (13) relies on strong assumptions, in particular a clear directionality in the effect pathway (NPIs reducing mobility, in turn reducing transmission) and the absence of other confounding variables that could influence both changes in mobility and transmission. As social behaviour influences both mobility and transmission and can in turn be affected by the current epidemic level, one must remain cautious in going down the path of a causal mediation interpretation in such a complex situation. In fine, the time pattern in the covariates reduces essentially to a ‘one degree of freedom’ step down. It is not clear whether assuming a constant variance for the random walks modelling the error processes ε1tm and ε2tm in (12) and (13) during such a non-stationary period might have distorted the estimation of the fixed effects of lockdown in (12) and (13).
The use of compartmental models in Storvik et al. provides additional information besides estimation of Rt, such as current estimates of the number of people in the compartments and short-term predictions, which are directly useful for planning by public health and hospital authorities. Stratification by region, together with mobile phone data, was employed to refine mass action assumptions, leading to over 3,000 parameters to estimate! Hence, Storvik et al. had to dig deep into the sequential Monte Carlo (SMC) machinery to be able to provide a workable real-time implementation, which was used daily by the Norwegian public health authorities. To build their SMC proposal, they project data 24 days ahead of the current time, a computationally expensive step. Is there a trade-off to be sought between the computational burden of using such smoothing at each time step, versus reusing the same proposal for a set number of iteration steps, thus reducing the computational burden but also potentially the acceptance rates? What generic conclusions could they draw from their tailored implementation which could help future large-scale SMC implementations for compartmental models?
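For readers less familiar with SMC, the following toy bootstrap particle filter shows the propagate, weight, and resample cycle on which such implementations build. It is not the algorithm of Storvik et al., whose proposal uses look-ahead information; here the proposal is simply the prior, and the model (a Gaussian random walk on a log rate with Poisson counts), the function name, and all default values are hypothetical.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500, step_sd=0.1, seed=0):
    """Toy bootstrap particle filter.

    Hypothetical model: the log rate x_t follows a Gaussian random walk
    and y_t ~ Poisson(exp(x_t)). Returns the filtered mean of exp(x_t)
    and the effective sample size (before resampling) at each time.
    """
    rng = random.Random(seed)
    start = math.log(max(observations[0], 1))
    particles = [rng.gauss(start, 1.0) for _ in range(n_particles)]
    means, ess_trace = [], []
    for y in observations:
        # Propagate through the random-walk prior (the bootstrap proposal).
        particles = [x + rng.gauss(0.0, step_sd) for x in particles]
        # Poisson log-likelihood up to a constant in x: y * x - exp(x).
        log_w = [y * x - math.exp(x) for x in particles]
        top = max(log_w)  # subtract the maximum for numerical stability
        weights = [math.exp(lw - top) for lw in log_w]
        total = sum(weights)
        weights = [w / total for w in weights]
        means.append(sum(w * math.exp(x) for w, x in zip(weights, particles)))
        ess_trace.append(1.0 / sum(w * w for w in weights))
        # Multinomial resampling to counter weight degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return means, ess_trace


filtered_means, ess = bootstrap_particle_filter([20] * 10)
```

The effective sample size trace is the usual diagnostic for the trade-off raised above: a cheaper or staler proposal typically shows up as a lower effective sample size, which then has to be compensated for with more particles.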
Storvik et al. should be congratulated for their careful evaluation and sensitivity analyses. In Figure 4, three-week-ahead forecasts are compared with actual data, showing good fit. It is important to display such comparisons in real time during the course of the epidemic, so that understanding and trust are built between modelling teams, public health policy makers, and the public at large, and so that the assumptions, limitations, and good use of a modelling framework are appreciated by non-specialists.
I very much enjoyed reading both papers and would like to congratulate both teams for the important insights they have given us into the implementation of Bayesian inference for epidemic models to help public health policy makers.
Author notes
Conflicts of interest: none declared.