Collins and Le Manach [1] have expressed concerns over what they consider the flawed methodology of our recent study [2]. Their unfavourable comments centre on our use of the Hosmer–Lemeshow calibration test, which in their opinion represents an intentionally inappropriate statistical approach. This test has been used to evaluate calibration in a large number of validation studies, although recent consensus recommends other tests for that purpose. Unfortunately, these recommendations were not available at the time our project was designed, as they date back to early 2015 [3]. It may therefore be inappropriate to hold authors to standards that were not yet readily available.

Concerns surrounding the Hosmer–Lemeshow test relate to the influence of sample size on its results [4]. Recently, Lemeshow and coworkers set out rules to optimize statistical power and obtain meaningful results from the test through an adequate choice of the sample size and the number of groups in the analysis [5].
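For readers less familiar with the mechanics behind this discussion, the grouped statistic can be sketched as follows. This is a generic decile-of-risk illustration, not the code used in our study; the function name, the equal-sized grouping and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y_true, y_prob, n_groups=10):
    """Illustrative Hosmer-Lemeshow goodness-of-fit statistic
    using equal-sized groups ordered by predicted risk."""
    order = np.argsort(y_prob)
    y_true = np.asarray(y_true)[order]
    y_prob = np.asarray(y_prob)[order]
    h = 0.0
    for idx in np.array_split(np.arange(len(y_prob)), n_groups):
        obs = y_true[idx].sum()   # observed events in the group
        exp = y_prob[idx].sum()   # expected events in the group
        n_g = len(idx)
        pi = exp / n_g            # mean predicted risk in the group
        h += (obs - exp) ** 2 / (n_g * pi * (1 - pi))
    # The statistic is referred to a chi-square distribution
    # with n_groups - 2 degrees of freedom
    p_value = chi2.sf(h, df=n_groups - 2)
    return h, p_value
```

The sketch makes the sample-size dependence visible: with very large samples, even trivial miscalibration yields a large statistic and a small p-value, which is why the choice of sample size and number of groups discussed above matters.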

Novel calibration methods were considered during the design of our study, although we found some advantages in the Hosmer–Lemeshow test. Readers are familiar with it owing to its wide use, which makes it possible to compare results easily with previous studies in the same terms. Perhaps we failed to describe this adequately in our article, and we apologize for this. Although we acknowledge these limitations, we do not share this negative opinion of the Hosmer–Lemeshow test. Consequently, we cannot accept the suggestion of an intentional flaw that they convey. Good performance of the test was anticipated during the sample size calculation in the design phase, and this is why we decided to use it in our study. We were careful in our conclusions, conscious of the possible interpretations of our results and of the limitations of the calibration test. These issues have been addressed in the ‘Discussion’ section, which is the place meant for them according to the journal instructions. The reader understands that the ‘Methods’ and ‘Results’ sections describe, respectively, how things were done and what the results were, regardless of the outcomes. Once again, the ‘Discussion’ section seems the appropriate place for all comments related to the topic. We are not in a position to discuss Collins and Le Manach's opinion about the actual amount of time readers dedicate to the ‘Discussion’ section, but it would seem appropriate for them to produce actual data to support their opinion, considering their strong scientific and methodological background. Our article underwent an exhaustive peer review process that involved 2 editors, 3 reviewers and 2 independent statisticians. Possible misinterpretations were reassessed, and no additional problems in our methodology were detected.

We thank Collins and Le Manach for reminding the community of their recommendations, which we will consider. It is appropriate that explanations of unclear methods or data should be requested. We understand the deep disappointment of Collins and Le Manach at what they consider a suboptimal methodology in our contribution. Scientific thinking should also stay away from radical ideas and disqualification, and should be respectful towards views that differ from one's own.

REFERENCES

1. Collins G, Le Manach Y. Knowingly repeating an incorrect and inefficient analysis is flawed logic. Eur J Cardiothorac Surg 2016;49:357–8.

2. Garcia-Valentin A, Mestres CA, Bernabeu E, Bahamonde JA, Martin I, Rueda C et al. Validation and quality measurements for EuroSCORE and EuroSCORE II in the Spanish cardiac surgical population: a prospective, multicentre study. Eur J Cardiothorac Surg 2015.

3. Moons KG, Altman DG, Reitsma JB, Ioannidis JP, Macaskill P, Steyerberg EW et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med 2015;162:W1–73.

4. Steyerberg E. Clinical Prediction Models, 1st edn. New York: Springer Science+Business Media, LLC, 2010.

5. Paul P, Pennell ML, Lemeshow S. Standardizing the power of the Hosmer-Lemeshow goodness of fit test in large data sets. Stat Med 2013;32:67–80.