Extract

We congratulate the authors on their paper introducing a ‘safe’ inference framework. However, despite the claimed safety of these tests, we have reservations about whether they can make the scientific enterprise more reliable. First, researchers may still prefer classical non-sequential study designs over sequential ones, as the latter can be difficult to implement in practice: interim analyses in randomized clinical trials require unblinding and may thereby threaten the integrity of the trial (Ellenberg et al., 2019, Chapter 5). In addition, it has been argued that practical problems in the use of traditional p-values (rather than e-values) stem more from cognitive factors than from the choice of inferential statistics (Gelman, 2016). For example, an excessive focus on testing parameter values of ‘no effect’ while ignoring other relevant parameter values (‘nullism’), unnecessary dichotomization of quantitative information (‘dichotomania’), and treating statistical models as known physical laws rather than speculative assumptions (‘statistical reification’) can all distort the interpretation of study results (Greenland, 2017). With respect to the last point in particular, we believe that labelling a statistical test ‘safe’ can mislead researchers into thinking that the method is ‘always valid’, when this holds only in a specific technical sense and under speculative assumptions (e.g. that the underlying data model is correctly specified). In meta-analysis, for instance, a promising application according to the authors, inferences based on e-values will not be ‘safe’ unless the analysis accounts for possible bias, such as publication bias (Cooper et al., 2019, Chapter 18). In that case, can e-values be adjusted for potential model misspecification (Copas & Eguchi, 2005)?
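
To spell out the technical sense in which e-values are valid, consider the following brief sketch; the notation is ours rather than the authors’, and it assumes the null model is correctly specified. An e-variable for a null hypothesis $\mathcal{H}_0$ is a non-negative statistic $E$ with $\mathrm{E}_P[E] \le 1$ for every $P \in \mathcal{H}_0$, so that Markov's inequality gives
\[
P\!\left(E \ge \tfrac{1}{\alpha}\right) \;\le\; \alpha\,\mathrm{E}_P[E] \;\le\; \alpha
\qquad \text{for all } P \in \mathcal{H}_0 \text{ and } \alpha \in (0,1),
\]
and for an e-process this type-I error guarantee extends, via Ville's inequality, to arbitrary stopping times. The guarantee is therefore exact, but only relative to the specified $\mathcal{H}_0$: if the data-generating mechanism is distorted by, say, publication bias, it need no longer belong to $\mathcal{H}_0$, and the bound above then offers no protection.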
