Fabio Barili, Davide Pacini, Francesco Rosato, Maurizio Roberto, Alberto Battisti, Claudio Grossi, Francesco Alamanni, Roberto Di Bartolomeo, Alessandro Parolari, In-hospital mortality risk assessment in elective and non-elective cardiac surgery: a comparison between EuroSCORE II and age, creatinine, ejection fraction score, European Journal of Cardio-Thoracic Surgery, Volume 46, Issue 1, July 2014, Pages 44–48, https://doi.org/10.1093/ejcts/ezt581
Abstract
The age, creatinine, ejection fraction (ACEF) score is a simplified algorithm for predicting mortality after elective cardiac surgery. It was conceived mainly for elective cardiac surgery, and no information is available on its performance in non-elective surgery or on its comparison with the new EuroSCORE II. This study was undertaken to compare the performance of ACEF score and EuroSCORE II within classes of urgency.
Complete data on 13 871 consecutive patients who underwent major cardiac surgery in a 6-year period were retrieved from three prospective institutional databases. Discriminatory power was assessed using the c-index and compared with the DeLong, bootstrap and Venkatraman methods. Calibration was evaluated with calibration curves and associated statistics.
The in-hospital mortality rate was 2.5%. The discriminatory power of ACEF score within elective and non-elective surgery was similar (area under the curve (AUC) 0.71, 95% confidence interval (CI) 0.67–0.74 and AUC 0.68, 95% CI 0.62–0.73, respectively) but significantly lower than that of EuroSCORE II (AUC 0.80, 95% CI 0.77–0.83 for elective surgery; AUC 0.82, 95% CI 0.78–0.85 for non-elective surgery). The calibration patterns were different in the two subgroups, but the summary statistics underscored a miscalibration in both of them (U-statistic and Spiegelhalter Z-test P-values <0.05). Even the calibration of EuroSCORE II was insufficient, although it was demonstrated to be well calibrated in the first tertile of predicted risk.
This study demonstrated that the performance of ACEF score in predicting in-hospital mortality in elective and non-elective cardiac surgery is comparable. Nonetheless, it is not as satisfactory as the new EuroSCORE II, as its discrimination is significantly lower and it is also miscalibrated.
INTRODUCTION
Risk estimation is one of the most useful tools for improving the standard of care and for correctly allocating clinical and economic resources [1, 2]. Hence, several risk scores to predict perioperative mortality in cardiac surgery have been developed in the last two decades, and they are continuously updated to improve their performance, above all in specific subgroups of higher-risk patients who may need specific healthcare. Among the most widely used algorithms, the Society of Thoracic Surgeons score has been demonstrated to outperform the EuroSCORE [3, 4], as it is derived from a larger data set of patients operated on in a more recent era and its risk models were developed separately for different surgical subgroups [5]. To overcome the risk overestimation of the logistic EuroSCORE that emerged in several studies, the updated EuroSCORE II was developed within a modern cohort of patients [6]. It appears to be more complex than the previous versions, although the core of risk factors is almost the same. The first validation studies have shown that EuroSCORE II has better performance [7, 8], but some concerns about its parsimony have already been raised [9]. The inclusion in the model algorithm of non-significant factors (24% of the total) was shown not to be useful, and a simplified derived score could offer the same performance.
The issue of parsimony has recently been highlighted by Ranucci et al. [10], who developed a new score for predicting perioperative mortality in elective patients on the basis of only three risk factors, namely age, creatinine and ejection fraction (ACEF). The ACEF score was demonstrated by internal validation to have better global performance than more complex mortality scores, even in subgroups of patients such as those with aortic valve stenosis [11]. These data were confirmed in a large external validation study, in which ACEF score was demonstrated to have non-inferior accuracy and better clinical performance with respect to the additive and logistic EuroSCORE [12]. Hence, considering that general statistical advice is to be parsimonious in selecting independent regressors, the simpler ACEF score should always be preferred to the old and new EuroSCORE in elective cardiac surgery [10, 13]. However, no information is available on the performance of ACEF in non-elective patients or on performance differences among classes of urgency, and its potential extension to urgent and emergent surgery has not yet been tested.
This study was designed to externally validate ACEF score in elective and non-elective cardiac surgery and to compare performance differences among urgency levels. Moreover, we sought to compare the performance of the simple ACEF score with that of the more complex EuroSCORE II.
MATERIALS AND METHODS
Study population and study design
The study population included all patients who underwent cardiac surgery in a 6-year period (from 2006 to April 2012, 13 871 patients enrolled) within the departments of cardiac surgery of two university hospitals and one regional hospital. The study population was extracted from a larger database [9] that has been updated to April 2012. Preoperative and demographic information, operative data and perioperative mortality and complications for all patients were retrieved from the prospectively collected institutional databases. The Institutional Review Boards approved the dataset's use for research. The Institutional Ethics Committees approved the study, and informed written consent was waived on the condition that subjects' identities were masked. Data from the three centres were matched and stored in a dedicated data set.
The scores were tested on the prediction of in-hospital mortality. To evaluate the performance of the scores, EuroSCORE II and ACEF score were calculated for each patient in accordance with published guidelines using dedicated software [6, 10]. ACEF score is a new simplified score for predicting operative mortality in elective cardiac surgery based on three parameters, namely age, creatinine level and left ventricular ejection fraction (LVEF). It was developed to overcome the common limitations of existing risk scores, first of all the problem of overfitting.
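As an illustration of how simple the three-parameter score is, the following minimal Python sketch computes ACEF under the assumption that it follows the formula usually quoted from the original ACEF publication [10]: age (years) divided by LVEF (%), plus one point when serum creatinine exceeds 2.0 mg/dl. The formula is not restated in this paper, and the function name and the example patient are hypothetical; the study itself used dedicated software and R.

```python
def acef_score(age_years: float, lvef_percent: float, creatinine_mg_dl: float) -> float:
    """ACEF sketch: age / LVEF, plus 1 point if serum creatinine > 2.0 mg/dl.

    Assumed formula (as usually quoted from the ACEF development study);
    it is not taken from the text of this article.
    """
    if lvef_percent <= 0:
        raise ValueError("LVEF must be a positive percentage")
    score = age_years / lvef_percent
    if creatinine_mg_dl > 2.0:
        score += 1.0  # creatinine enters only through this cut-off
    return score


# Hypothetical patient with values close to the cohort means
print(round(acef_score(67, 58, 1.1), 2))  # 1.16
```

Because creatinine contributes only through a single cut-off, two patients with the same age and LVEF receive the same score unless one of them crosses the 2.0 mg/dl threshold.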
Data analysis
The performance of the two scores was analysed focusing on discrimination power and calibration [14, 15]. Discrimination indicates the extent to which the model distinguishes between patients who will die and those who will survive in the perioperative period. It was evaluated by constructing receiver operating characteristic (ROC) curves for each model and calculating the area under the curve (AUC) with 95% confidence intervals (CIs) [16–18]. Numerically, an area of 1.0 indicates perfect discrimination, whereas an area of 0.5 indicates no discrimination of the binary outcome. The curves were compared with the DeLong, bootstrap and Venkatraman methods, the first two comparing the AUCs and the last the ROC curves themselves [18]. Another index used to evaluate predictive ability was Somers' Dxy rank correlation between predicted probabilities and observed responses. When Dxy = 0, the model is making random predictions; when Dxy = 1, the predictions are perfectly discriminating [19].
Calibration refers to the agreement between observed outcomes and predictions. The calibration performance was evaluated by generating calibration plots that visually compare the predicted with the observed probability [14, 18–20]. Perfectly calibrated predictions lie on the 45-degree line, while a curve below or above the diagonal reflects overestimation or underestimation, respectively. For each model, the actual slope and intercept were compared with the ideal values of 1 and 0 using the U-statistic, which was tested against a χ² distribution with 2 degrees of freedom (d.f.). To test whether the calibration curve is ideal, we also employed the single-d.f. Spiegelhalter Z-test for calibration accuracy with its two-tailed P-value. Moreover, calibration was tested with the Hosmer–Lemeshow goodness-of-fit test, which compares observed with predicted values by decile of predicted probability.
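The two discrimination indices are directly related: for a binary outcome, Somers' Dxy equals 2 × (AUC − 0.5). The Python sketch below illustrates this with a simple pair-counting estimate of the AUC; it is not the authors' R code, and the function name and toy data are hypothetical.

```python
def auc_and_dxy(predicted, died):
    """Return (AUC, Somers' Dxy) for predicted risks and binary outcomes (1 = died)."""
    deaths = [p for p, y in zip(predicted, died) if y == 1]
    survivors = [p for p, y in zip(predicted, died) if y == 0]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for p_dead in deaths:
        for p_alive in survivors:
            if p_dead > p_alive:
                concordant += 1.0   # death ranked above survivor
            elif p_dead == p_alive:
                concordant += 0.5   # ties count as half
    auc = concordant / (len(deaths) * len(survivors))
    return auc, 2.0 * (auc - 0.5)


# Hypothetical toy data: six patients, two deaths
risks = [0.01, 0.02, 0.03, 0.05, 0.10, 0.20]
outcome = [0, 0, 1, 0, 0, 1]
auc, dxy = auc_and_dxy(risks, outcome)
print(f"AUC={auc:.2f} Dxy={dxy:.2f}")  # AUC=0.75 Dxy=0.50
```

Pair counting scales quadratically with cohort size, so it serves only as an illustration; rank-based formulas give the same AUC more efficiently on large data sets.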
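The calibration intercept and slope can be obtained by regressing the observed outcome on the logit of the predicted probability. The following Python sketch fits that two-parameter logistic model by Newton-Raphson; it is an assumption about the general recalibration approach, not a reproduction of the R routines used in the study, and the function names and toy cohort are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def calibration_intercept_slope(predicted, died, iterations=25):
    """Fit logit(P(death)) = a + b * logit(predicted) by Newton-Raphson."""
    x = [logit(p) for p in predicted]
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, died):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # score for the intercept
            g1 += (yi - mu) * xi     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            raise ValueError("singular information matrix")
        # Newton step: solve the 2x2 system (information matrix) * delta = score
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


# Hypothetical toy cohort: 10 patients predicted at 20% (1 death) and
# 10 patients predicted at 50% (3 deaths), i.e. risk is over-predicted.
predicted = [0.2] * 10 + [0.5] * 10
died = [1] + [0] * 9 + [1] * 3 + [0] * 7
intercept, slope = calibration_intercept_slope(predicted, died)
print(f"intercept={intercept:.3f} slope={slope:.3f}")
```

Roughly speaking, a negative intercept points to predictions that are too high on average, and a slope below 1 points to predictions that are too extreme; the ideal values are 0 and 1.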
The accuracy of the models was also tested with the Brier score (quadratic difference between predicted probability and observed outcome for each patient), an overall performance measure that is 0 when the prediction is perfect [18–20].
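As a minimal sketch of these overall measures, the Python code below computes the Brier score and the Spiegelhalter Z-statistic with its two-tailed P-value, using the standard textbook formulas. The function names and the four-patient example are hypothetical, and the study itself used R.

```python
import math


def brier_score(predicted, died):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(predicted, died)) / len(predicted)


def spiegelhalter_z(predicted, died):
    """Spiegelhalter Z-statistic; approximately N(0, 1) under perfect calibration."""
    numerator = sum((y - p) * (1.0 - 2.0 * p) for p, y in zip(predicted, died))
    variance = sum((1.0 - 2.0 * p) ** 2 * p * (1.0 - p) for p in predicted)
    if variance == 0.0:
        raise ValueError("variance is zero (all predictions equal 0, 0.5 or 1)")
    return numerator / math.sqrt(variance)


def two_tailed_p(z):
    """Two-tailed P-value of a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical toy data: four patients, one death
predicted = [0.1, 0.2, 0.3, 0.4]
died = [0, 0, 1, 0]
print(round(brier_score(predicted, died), 3))  # 0.175
z = spiegelhalter_z(predicted, died)
print(round(two_tailed_p(z), 3))
```

A Brier score should be read against the event rate: when deaths are rare, even uninformative predictions yield a small value, so differences between models appear in the third decimal place.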
In order to analyse the effect of urgency level on discriminatory power, we modelled the ROC curve with a parametric generalized linear model (GLM), using a binormal model.
Missing values occurred for variables ‘chronic pulmonary disease’ (0.20%), ‘extracardiac arteriopathy’ (0.12%), ‘neurological dysfunction disease’ (0.15%), ‘poor mobility’ (0.11%), ‘New York Heart Association class’ (0.22%), ‘LVEF’ (0.06%) and ‘recent myocardial infarction’ (0.08%). Missing values were substituted by means of multiple imputation, as previously described, in order to reduce bias and increase statistical power [19–21].
Two-sided statistics were performed with a significance level of 0.05. For all analyses, the R 2.15.1 software was used (R Development Core Team (2011). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/).
RESULTS
Table 1 reports the descriptive statistics of the study group. As expected, the risk profile differed between patients undergoing elective and those undergoing non-elective cardiac surgery, as the latter were older and had a significantly higher number of comorbidities. The mean values of ACEF score and EuroSCORE II were, respectively, 1.3 ± 0.5 and 2.5 ± 2.8 for elective surgery and 1.4 ± 0.7 and 6.2 ± 8.2 for non-elective surgery. The mean predicted mortality of ACEF score in the two subgroups was 2.8 ± 3.2% and 3.8 ± 5.5%. The observed mortality was 2.5% (335 patients), 1.7% (210 patients) in the elective surgery group (EG) and 8.1% (125 patients) in the non-elective surgery group (NEG). The baseline characteristics described in Table 1 were not significantly different among the three centres.
Variable | All patients | Elective surgery | Non-elective surgery | P-value |
---|---|---|---|---|
Preoperative data and comorbidities | ||||
Number of patients | 13 871 | 12 201 (88%) | 1670 (12%) | |
Logistic EuroSCORE | 7.7 ± 8.9 | 6.8 ± 7.0 | 14.1 ± 15.8 | <0.001 |
EuroSCORE II | 3.0 ± 4.1 | 2.5 ± 2.8 | 6.2 ± 8.2 | <0.001 |
ACEF score | 1.3 ± 0.5 | 1.3 ± 0.5 | 1.4 ± 0.7 | <0.001 |
ACEF predicted mortality (%) | 2.9 ± 3.5 | 2.8 ± 3.2 | 3.8 ± 5.5 | <0.001 |
Age (years) | 67.4 ± 11.7 | 67.3 ± 11.8 | 68.1 ± 11.4 | 0.009 |
Gender (female) | 4359 (31%) | 3918 (32%) | 441 (26%) | <0.001 |
Chronic pulmonary diseaseᵃ | 846 (6%) | 740 (6%) | 106 (6%) | 0.651 |
Extracardiac arteriopathyᵃ | 1660 (12%) | 1415 (12%) | 245 (15%) | <0.001 |
Neurological dysfunction diseaseᵃ | 100 (1%) | 68 (1%) | 32 (2%) | <0.001 |
Previous cardiac surgeryᵃ | 823 (6%) | 731 (6%) | 92 (6%) | 0.434 |
Serum creatinine (mg/dl) | 1.1 ± 1.1 | 1.1 ± 1.2 | 0.199 | |
Creatinine clearance (ml/min) | 76.5 ± 32.4 | 76.8 ± 33.8 | 73.8 ± 19.6 | <0.001 |
Creatinine clearance 50–85 ml/min | 9782 (70%) | 8537 (70%) | 1245 (75%) | <0.001 |
Creatinine clearance <50 ml/min | 1214 (9%) | 1034 (8%) | 180 (11%) | 0.002 |
Active endocarditisᵃ | 200 (1%) | 118 (1%) | 82 (5%) | <0.001 |
Critical preoperative stateᵃ | 298 (2%) | 63 (1%) | 235 (14%) | <0.001 |
Unstable anginaᵃ | 697 (5%) | 306 (2%) | 391 (23%) | <0.001 |
LVEF (%) | 57.6 ± 10.8 | 57.9 ± 10.7 | 54.5 ± 11.1 | <0.001 |
LVEF 30–50% | 2876 (21%) | 2426 (20%) | 450 (27%) | <0.001 |
LVEF <30% | 598 (4%) | 463 (4%) | 135 (8%) | <0.001 |
Surgical data | ||||
Number of surgical proceduresᵃ | ||||
1 non-CABG | 4408 (32%) | 4088 (33%) | 320 (19%) | <0.001 |
2 | 3690 (27%) | 3352 (27%) | 338 (20%) | <0.001 |
3 or more | 991 (7%) | 900 (7%) | 91 (5%) | 0.004 |
Coronary artery bypass grafting | 7449 (54%) | 6235 (51%) | 1214 (73%) | <0.001 |
Mitral valve surgery | 3449 (25%) | 3222 (26%) | 227 (14%) | <0.001 |
Aortic valve surgery | 5057 (36%) | 4764 (39%) | 293 (17%) | <0.001 |
Tricuspid valve surgery | 695 (5%) | 644 (5%) | 51 (3%) | <0.001 |
Surgery for left ventricular aneurysm | 242 (2%) | 199 (2%) | 43 (3%) | 0.001 |
Surgery on thoracic aortaᵃ | 1663 (12%) | 1437 (12%) | 226 (13%) | 0.038 |
ᵃ As defined by the EuroSCORE algorithms.
NIM: a factor not included in the model; Removed: factor removed in the modified version of EuroSCORE II as not significant in the EuroSCORE II development study.
The analysis of discrimination in the two subgroups demonstrated a worse performance of ACEF score compared with EuroSCORE II. The ROC curves of the two scores in the elective group are plotted in Fig. 1A. The AUC was 0.71 (95% CI: 0.67–0.74) for ACEF score and 0.80 (95% CI: 0.77–0.83) for EuroSCORE II. The comparison between the two scores showed a significant difference (P < 0.001) (Table 2). In the NEG as well, the AUC of ACEF score was lower than that of EuroSCORE II, being 0.68 (95% CI: 0.62–0.73) and 0.82 (95% CI: 0.78–0.85), respectively (P < 0.001 at comparison tests, Fig. 1B and Table 2). The direct parametric modelling of ROC curves with the GLM methodology demonstrated that the level of urgency does not significantly affect the discrimination performance of either EuroSCORE II or ACEF score.
Table 2: Predictive performance of ACEF score and EuroSCORE II in elective and non-elective surgery
Measure | Elective surgery: ACEF score | Elective surgery: EuroSCORE II | Non-elective surgery: ACEF score | Non-elective surgery: EuroSCORE II |
---|---|---|---|---|
Overall performance | ||||
Brier score | 0.017 | 0.016 | 0.068 | 0.062 |
Discrimination | ||||
AUC (95% CI) | 0.71 (0.67–0.74) | 0.80 (0.77–0.83) | 0.68 (0.62–0.73) | 0.82 (0.78–0.85) |
DeLong's test P-value (ACEF score vs EuroSCORE II) | <0.001 | | <0.001 | |
Bootstrap method P-value (ACEF score vs EuroSCORE II) | <0.001 | | <0.001 | |
Venkatraman P-value (ACEF score vs EuroSCORE II) | <0.001 | | <0.001 | |
Somers' Dxy | 0.40 | 0.59 | 0.35 | 0.64 |
Calibration | ||||
Slope | 0.904 | 1.249 | 0.732 | 1.030 |
Intercept | −0.825 | 0.391 | −0.059 | 0.269 |
U-statistic P-value | <0.001 | <0.001 | <0.001 | 0.138 |
Hosmer–Lemeshow test P-value | <0.001 | <0.001 | <0.001 | 0.046 |
Spiegelhalter Z-test P-value | <0.001 | <0.001 | <0.001 | 0.023 |
Best performance for: Brier score = 0, AUC = 1, Somers' Dxy = 1, Slope = 1, Intercept = 0, non-significant P-values of the U-statistic test, Hosmer–Lemeshow test and Spiegelhalter Z-test.

Figure 1: ROC curves for ACEF score and EuroSCORE II in elective and non-elective cardiac surgery (Panels A and B, respectively). The diagonal line represents no discriminatory power (AUC 0.50). In the two panels, the ROC curves are reported with their 95% CIs.
The calibration curves of ACEF score and EuroSCORE II are shown in Fig. 2. The pattern of calibration differed between the scores and between the subgroups of patients. In the EG, ACEF score showed a progressive trend towards over-prediction after the first decile of predicted mortality, while the calibration curve of EuroSCORE II lay close to the ideal diagonal up to a predicted probability of 30% and diverged significantly and markedly afterwards, showing over-prediction. The calibration patterns of the two scores were more similar in the NEG. In the EG, both scores had significant P-values (P < 0.05) for the related summary statistics (U-statistic [unreliability] test, Hosmer–Lemeshow test, Spiegelhalter Z-test), indicating that they do not provide accurate probabilities (Table 2). In the NEG as well, the summary statistics confirmed that ACEF score is not well calibrated.

Figure 2: Calibration plots of ACEF score and EuroSCORE II in elective and non-elective cardiac surgery. The diagonal line represents perfect calibration. EuroSCORE II appeared better calibrated than ACEF score in both elective and non-elective surgery. ACEF score shows a pattern of constant over-prediction (miscalibration in the large, negative intercept) in elective surgery.
DISCUSSION
ACEF score was mainly conceived to simplify the existing predictive scores without affecting performance and was demonstrated to have accuracy equivalent to or even better than that of more complex models in elective cardiac surgery, with good calibration and satisfactory clinical performance [10, 11]. The results of the present study in the elective setting are consistent with those of the previously performed external validation study, in which the calibration was found to be lower than that of the internal validation [12]. In contrast, the main difference emerges in the performance of the EuroSCORE system, which was shown to have better discrimination in our study group. Although we reported only the comparison with EuroSCORE II, the logistic EuroSCORE was also tested and demonstrated accuracy significantly better than that of ACEF score and similar to that of the new EuroSCORE II, as previously demonstrated (data not shown) [9]. The calibration of both ACEF score and EuroSCORE II in elective patients was found to be unsatisfactory (U-statistic test, Spiegelhalter Z-test and Hosmer–Lemeshow goodness-of-fit test P-values <0.05), although different patterns of miscalibration were highlighted (Fig. 2). ACEF score shows a somewhat constant over-prediction of mortality for all classes of risk except the first decile, while EuroSCORE II was confirmed to have optimal calibration up to a predicted probability of 30%, after which it progressively over-predicts, leading to a calibration line that lies below and progressively further from the diagonal. The discrimination performance of ACEF score in the non-elective setting was comparable with that in elective surgery and again was significantly worse than that of EuroSCORE II. The ACEF and EuroSCORE II calibration curves in the NEG had a similar pattern.
Nonetheless, the summary statistics diverged and demonstrated that ACEF calibration was significantly different from the ideal line, while EuroSCORE II was better calibrated. This paradox could be explained by analysing the first decile of predicted risk, which represents the most numerous part of the study group, and by considering the weight of the ACEF under-prediction at low predicted risk on the summary tests.
According to our findings, the ACEF score does not perform better than EuroSCORE II in terms of either discrimination or calibration. The low performance in the NEG can be easily explained, as ACEF score was developed in an elective setting [10], and the prediction of mortality in the more complex clinical scenario of urgent or emergent surgery cannot disregard a more complex panel of risk factors. Non-elective surgery is an independent risk factor for perioperative mortality and is also often associated with worse clinical conditions and comorbidities that can negatively affect outcomes [6, 22]. Nonetheless, the performance of the ACEF score does not differ between the NEG and the EG, although it is lower than that of EuroSCORE II.
Risk scoring in cardiac surgery is still an open issue, and several attempts to predict perioperative mortality have led to incomplete answers. The STS-PROM score has been demonstrated to perform well, while EuroSCORE II has significantly improved on the performance of its older versions, overcoming their overestimation [4–6]. Nonetheless, the development of EuroSCORE II has been shown to have major biases linked to study group selection and size. Moreover, both scores failed to predict mortality in high-risk subgroups of patients and are even more limited when applied to patients considered to be at prohibitive risk for cardiac surgery, a cohort that is considered for the application of new technologies [23]. The development of new specific scores is ongoing, and the current models could be improved by adding clinical and anatomical variables that affect mortality. The extreme simplification of the ACEF score can probably lead to better performance in the very low deciles of risk but seems insufficient to predict mortality beyond them. Moreover, the algorithm itself could be improved, as done by the EuroSCORE group, by including creatinine clearance instead of the cut-off value of creatinine, or at least by testing creatinine as a continuous variable and its correlation with death through linear or other transformations. Although several advances have been made in the prediction of perioperative mortality after cardiac surgery, the next step should be the identification of potential risk factors for mid-term and long-term outcomes. However, even new advanced scores created for special subsets of patients or developed for risk prediction of different outcomes will require the same caution that should be applied to existing models, as underscored by Nashef et al. [6].
Most importantly, a risk model does not predict the outcome of an individual patient; it should be an instrument for facilitating better decision-making and a benchmark for quality control.
Limitations
A potential limitation of the study is its retrospective nature. Data were derived from three institutional datasets that were prospectively collected. In all datasets, the original EuroSCORE factors had been collected in specific columns together with the score values; hence, the logistic and additive EuroSCORE were computed again for each patient to check the correctness of the values. The new factors required by EuroSCORE II and ACEF score were derived from the data sets, creatinine clearance was computed as suggested and categorizations were recalculated from continuous data.
Conclusions
This study demonstrated that the performance of ACEF score in predicting in-hospital mortality is comparable in elective and non-elective cardiac surgery. Nonetheless, it is not as satisfactory as that of the new EuroSCORE II, as its discrimination is significantly lower and it is also miscalibrated. In our population, EuroSCORE II performs better than ACEF score, although its lack of parsimony has been noted previously. EuroSCORE II should be preferred, provided that data collection is adequate and adheres to the risk factor definitions.
Conflict of interest: none declared.