Fernando G Zampieri, Gurmeet Singh, Using Bayesian statistics to foster interpretation of small clinical trials in extracorporeal cardiopulmonary resuscitation after cardiac arrest, European Heart Journal. Acute Cardiovascular Care, Volume 13, Issue 2, February 2024, Pages 201–202, https://doi.org/10.1093/ehjacc/zuae010
Augmentation of out-of-hospital cardiac arrest (OHCA) survival by deploying extracorporeal life support (ECLS) continues to perplex clinicians. The post hoc Bayesian re-analysis of the INCEPTION trial by Heuts et al. in this issue of the European Heart Journal: Acute Cardiovascular Care attempts to frame favourable neurologic outcomes with a probabilistic interpretation. Extracorporeal life support after cardiac arrest is challenging to study, however. Designing and conducting trials under very acute scenarios with narrow inclusion windows pose several challenges for trialists and clinicians alike. These include defining the appropriate question, defining inclusion and exclusion criteria that capture the population of interest, training, and intervention deployment. Unsurprisingly, randomized clinical trials (RCTs) under this scenario tend to be small due to a combination of costs, number of eligible patients, and recruitability.1,2
Randomized clinical trials are generally designed under a specific frequentist framework.2 Defining the trial sample size is a key step: the trial is designed with the capacity to detect a given effect size of interest (‘power’) while controlling false-positive results (type 1 errors). The estimated sample size follows from the effect size of interest, the desired power, and the tolerated type 1 error rate. The smaller the effect size of interest, the higher the desired power, and the lower the tolerance for type 1 errors, the larger the required sample size. In addition, the less granular the primary endpoint, the larger the required sample size. For example, trials targeting mortality, a binary endpoint, will require sample sizes orders of magnitude larger than trials aimed at detecting changes in continuous endpoints. This explains why many trials in critical care medicine moved from traditional binary endpoints to more granular outcomes, such as ordinal scales (common during the COVID-19 pandemic) and continuous scales that penalize death, such as days alive and free of a given condition.3 Trials in acute respiratory distress syndrome commonly use days alive and free of the ventilator as the endpoint of choice, penalizing death by attributing zero or minus one to patients who die. While this strategy greatly increases statistical power, it hampers interpretability of the results.
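As an illustration of how endpoint granularity drives sample size, the standard normal-approximation formulas for two groups can be sketched as follows. The event rates and effect sizes below are hypothetical, chosen only to contrast a binary endpoint with a continuous one; they are not taken from INCEPTION.

```python
from statistics import NormalDist

def n_two_proportions(p1, p2, alpha=0.05, power=0.90):
    """Per-group sample size to compare two proportions (normal approximation)."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    z_b = z.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_a + z_b) ** 2 * variance / (p1 - p2) ** 2

def n_continuous(d, alpha=0.05, power=0.90):
    """Per-group sample size to detect a standardized mean difference d."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    z_b = z.inv_cdf(power)
    return 2 * (z_a + z_b) ** 2 / d ** 2

# Hypothetical binary endpoint: 30% vs. 35% event rate
print(round(n_two_proportions(0.30, 0.35)))  # ~1839 per group
# Hypothetical continuous endpoint: a 0.5 SD shift
print(round(n_continuous(0.5)))              # ~84 per group
```

Under these assumptions, the binary endpoint needs roughly twenty times more patients per group than the continuous one, which is the trade-off the paragraph above describes.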
Over the last decade, Bayesian methods have grown in popularity as an alternative to frequentist methods for RCTs. Bayesian statistics are not new, but they differ conceptually from frequentist analyses, especially regarding ‘null hypothesis testing’ (NHT) in the frequentist framework. A null hypothesis is set as an ‘absence of difference’, and the compatibility of the collected data with the condition that the null hypothesis is true is assessed using a myriad of statistical tests. Frequentist analyses therefore provide evidence that may be considered ‘indirect’, and the INCEPTION trial is an example.2
Although the trial originally collected the outcome as an ordinal variable (Cerebral Performance Category [CPC]), the analysis was performed by dichotomizing the CPC score into 1–2 (favourable outcome) and >2 (unfavourable). The original paper reported 14 patients in the early ECLS group with a favourable outcome at 30 days vs. 10 in the control group (20% and 16%, respectively). The main analysis of the trial therefore imposed an implicit utility function that lumped together CPC 1–2 (full recovery or mild disability) and CPC 3–5 (severe disability, vegetative state, or death), as if the categories within each group were equally relevant to patients, families, and the community. The trial reported an odds ratio (OR) for CPC 1–2 vs. 3–5 for ECLS of 1.4, with a 95% confidence interval (CI) from 0.5 to 3.5 (P = 0.52), concluding that ‘in patients with refractory out-of-hospital cardiac arrest, extracorporeal CPR and conventional CPR had similar effects on survival with a favorable neurologic outcome’.
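The point estimate can be recovered from the reported proportions alone (the CI would additionally require the group sizes, which are not restated here); a quick sketch:

```python
# Odds ratio implied by the reported favourable-outcome proportions:
# 20% with early ECLS vs. 16% with conventional CPR.
p_ecls, p_control = 0.20, 0.16

odds_ecls = p_ecls / (1 - p_ecls)            # 0.25
odds_control = p_control / (1 - p_control)   # ~0.19

or_point = odds_ecls / odds_control
print(round(or_point, 2))  # ~1.31, consistent with the reported OR of 1.4
# (the published 1.4 reflects the exact group sizes and rounded percentages)
```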
A more nuanced interpretation would read: ‘given that the null hypothesis of absence of difference is true, the frequentist probability of observing results as or more extreme than those found is 0.52, with the data being compatible with odds ratios as low as 0.5 and as high as 3.5’. That is, frequentist reasoning is conditioned on the veracity of the null hypothesis and presents intervals compatible with the data observed under this specific condition. The wide CI (and the large P value) should therefore make the community aware that the trial was unable to provide a clear message: important benefits and harms could not be excluded from the obtained data. Assuming that the effect is neutral, as concluded in the original publication, is as troubling as suggesting that ORs of 0.5 and 3.5 are equivalent. Treating failure to reject the null hypothesis as if it demonstrated a null effect is a common challenge in the interpretation of clinical trials. Of course, there are many frequentist and Bayesian alternatives for interpreting this, but they are seldom employed in clinical trials. The neutral result of INCEPTION under a frequentist framework should not come as a surprise, given that the trial was powered to detect an increase in CPC 1–2 from 0.08 to 0.30.2
Under a Bayesian framework, the prior knowledge available (the prior) and the collected data (the likelihood) are combined to generate a posterior distribution.4 This posterior distribution provides the effect sizes compatible with the data and the prior under a probabilistic framework. Priors usually reflect the current understanding of the effect size on a given scale. The prior is the most contentious element of a Bayesian analysis, as there is a concern that ‘any prior can be used’ to drive results towards a desired conclusion. In practice, many priors are weakly informative priors that simply regularize the model, that is, make the results sceptical of very large effect sizes. When previous information is available, the prior can be updated by the data to generate a posterior; this posterior can, in turn, serve as the prior for the next trial. This updating is at the core of Bayesian analysis. When sample sizes are large, results are usually insensitive to any reasonable prior. Small studies like INCEPTION, however, can be very sensitive to the definition of the prior.
Once the posterior is generated, it can be queried as desired by the authors. This allows the generation of useful summaries, such as the probability of benefit or harm (the probability mass of the posterior suggesting benefit or harm). For example, investigators may be interested in how probable it is that the effect size falls within a given range. This is frequently coupled with the definition of a minimal clinically important difference (MCID), so that investigators can quantify how much of the posterior probability mass lies within a margin considered practically equivalent to no effect. The MCID is hard to define and depends on costs, availability, patient preferences, and other factors.
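The mechanics of updating a prior and querying the posterior can be sketched with a normal–normal approximation on the log-odds-ratio scale. This is not the model Heuts et al. fitted: the likelihood below is reconstructed from the published summary (OR 1.4, 95% CI 0.5–3.5), and both priors are illustrative assumptions.

```python
from math import log, sqrt
from statistics import NormalDist

# Approximate the INCEPTION likelihood on the log-OR scale:
# mean = log(1.4), SE = width of the 95% CI on the log scale / (2 * 1.96).
lik_mean = log(1.4)
lik_sd = (log(3.5) - log(0.5)) / (2 * 1.959964)

def posterior(prior_mean, prior_sd):
    """Conjugate normal-normal update: precision-weighted average."""
    w_prior, w_lik = 1 / prior_sd**2, 1 / lik_sd**2
    post_var = 1 / (w_prior + w_lik)
    post_mean = (prior_mean * w_prior + lik_mean * w_lik) * post_var
    return post_mean, sqrt(post_var)

def p_benefit(mean, sd):
    """Posterior probability that the true OR exceeds 1 (log-OR > 0)."""
    return 1 - NormalDist(mean, sd).cdf(0.0)

# Minimally informative vs. sceptical prior; the sceptical prior assumed
# here puts 95% of its mass on ORs between 0.5 and 2.0.
priors = {"flat": (0.0, 10.0), "sceptical": (0.0, log(2.0) / 1.959964)}
for label, (m, s) in priors.items():
    mean, sd = posterior(m, s)
    print(label, round(p_benefit(mean, sd), 2))
```

Under these assumed priors the probabilities of benefit land near 0.75 and 0.65, in the same neighbourhood as the values Heuts et al. report for their minimally informative and sceptical priors, illustrating how the prior moves the conclusion when the data are sparse.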
Heuts et al. explored the results of the INCEPTION trial under a Bayesian framework. The authors properly stress-tested the data using several different priors, including a sceptical prior and two data-driven priors based on previous data, one derived from an additional clinical trial. They concluded that the probability of benefit, as assessed by a CPC score of 1–2 at 30 days, was close to 0.70, and that around 0.40 of the probability mass was above the MCID of 0.05 under a minimally informative prior. When more favourable (‘enthusiastic’) priors were used, the probability of benefit was as high as ∼0.99, with 0.82 of the probability mass for an effect larger than the MCID. These apparent differences reflect the strong effect of priors when data are scarce. The reader may be puzzled about how to interpret these results, since NHT is not straightforward under a Bayesian framework. ‘Significance’ is not an absolute definition, but an effect can usually be declared if there is a high probability of benefit together with a low probability that the effect size lies within the MCID margins. For example, a probability of benefit above 0.95, coupled with a probability lower than 0.05 that the effect lies within the MCID, can be considered strong evidence of an effect.
A conservative interpretation of Heuts et al.’s findings is that, given the INCEPTION trial data, the probability of benefit was sensitive to the priors used, being as low as 0.648 for a sceptical prior and as high as 0.988 for a data-based prior using a previous trial as reference. The probability of a benefit considered relevant, set at a 0.05 margin, was also highly variable (0.250–0.820, respectively). From a purely Bayesian perspective, the prior that uses the best available information on the effect size should be favoured whenever available; hence, one may argue that the PRAGUE-OHCA trial should be considered,1 but this is open to discussion given the many differences between the two studies.
Heuts et al.’s results provide a more nuanced interpretation of small extracorporeal cardiopulmonary resuscitation (ECPR) trials and highlight that larger trials are desperately needed. Pursuing individual, non-adaptive trials (that is, trials that do not adapt parts of their design, such as enrolment ratio and/or sample size, according to interim results, or trials that ignore previous information) will not foster this field. For example, under a frequentist framework, close to 2500 patients are needed to show an increase in CPC 1–2 from 0.16 (the observed outcome in INCEPTION) to 0.21, assuming a power of 0.90 and an alpha of 0.05. Conducting such a massive trial for an expensive resource that requires several levels of expertise is daunting and likely impossible.
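The ∼2500 figure can be reproduced with the standard normal-approximation sample-size formula for two proportions; this is a sketch, and real trial planning would also account for drop-out and interim analyses:

```python
from statistics import NormalDist

def total_n(p1, p2, alpha=0.05, power=0.90):
    """Total sample size (two equal groups) to compare two proportions."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    z_b = z.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n_per_group = (z_a + z_b) ** 2 * variance / (p1 - p2) ** 2
    return 2 * n_per_group

# 0.16 -> 0.21, power 0.90, alpha 0.05, as stated in the text
print(round(total_n(0.16, 0.21)))  # ~2524 patients in total
```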
A series of strategies could be applied in future ECPR trials. These include using previous information, including that provided by Heuts et al., to avoid ‘starting over’ with every trial (more easily done under a Bayesian framework), adopting more granular endpoints for reporting and sample size calculation, and using adaptive randomization to adjust group enrolment. One of the first reports of ECLS in children pioneered adaptive randomization (using the ‘randomized play-the-winner’ method, a form of probabilistic adaptation) almost 40 years ago.5
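The randomized play-the-winner rule can be sketched as an urn model: each arm starts with one ball, allocation is drawn in proportion to the urn contents, a success adds a ball for the allocated arm, and a failure adds a ball for the other arm, so allocation drifts towards the better-performing arm. The arm labels and success rates below are purely illustrative.

```python
import random

def play_the_winner(p_success, n_patients, seed=42):
    """Randomized play-the-winner urn for two arms (illustrative sketch)."""
    rng = random.Random(seed)
    urn = {"A": 1, "B": 1}            # initial urn composition
    allocated = {"A": 0, "B": 0}
    for _ in range(n_patients):
        total = urn["A"] + urn["B"]
        arm = "A" if rng.random() < urn["A"] / total else "B"
        allocated[arm] += 1
        success = rng.random() < p_success[arm]
        other = "B" if arm == "A" else "A"
        urn[arm if success else other] += 1  # reward success, penalize failure
    return allocated

# If arm A truly performs better, allocation shifts towards it over time.
print(play_the_winner({"A": 0.7, "B": 0.2}, n_patients=1000))
```

Unlike fixed 1:1 randomization, more patients end up in the arm that is doing better, which is ethically attractive in rescue therapies such as ECPR, at the cost of more complex inference.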
Regrettably, efforts over the past decades have all adopted a conservative approach that does not appear suitable for studying complex interventions in severely ill patients. The cost and resource implications of broad ECPR adoption remain formidable. Healthcare and hospital systems are variably stressed, with unique national and jurisdictional challenges. ECPR trials, no matter how well designed, cannot resolve these with certitude. Medical and operational leaders and clinicians are left struggling to interpret the data as they apply to their individual context. We hope that the present work by Heuts et al. can trigger and rejuvenate interest in more advanced methods to evaluate ECPR.
Funding
None declared.
Data availability
No new data were generated or analysed in support of this research.
References
Author notes
The opinions expressed in this article are not necessarily those of the Editors of the European Heart Journal – Acute Cardiovascular Care or of the European Society of Cardiology.
Conflict of interest: none declared.