R Chocron, T Laurenceau, Y Youssfi, W Bougouin, J P Empana, N Chopin, X Jouven, Importance and limits of cardiovascular risk factors on the prediction of SCD using machine learning approach, European Heart Journal, Volume 45, Issue Supplement_1, October 2024, ehae666.3465, https://doi.org/10.1093/eurheartj/ehae666.3465
Abstract
Sudden cardiac death (SCD) is mainly due to lethal ventricular arrhythmias and often occurs in patients with ischemic heart disease. SCD and ischemic heart disease share a similar pattern of cardiovascular risk factors (CVRF), and commonly used risk scores have focused on cardiac-related risk factors. We aimed to investigate whether non-CVRF variables could improve the prediction of SCD beyond standard CVRF.
We compared two strategies for variable inclusion. First, we built a prediction model using electronic health record (EHR) data restricted to cardiovascular diseases and risk factors (CVD MR model) recorded up to five years before SCD. We selected all available main cardiovascular variables as surrogate markers for coronary artery disease, stroke, diabetes, hypertension, smoking status, obesity, lipid disorders and chronic renal failure. Second, our global approach (All MR model) included all medical record codes, without any prior selection. To estimate the risk of SCD over three months, we trained a machine learning model on EHR data representing 8,566,229 drug prescriptions and 801,352 hospital diagnoses recorded up to five years prior to SCD. The data were obtained from a French cohort of 12,338 SCD cases and 12,338 controls from 2011 to 2015. We then validated the results on two external cohorts: a temporal cohort from the same area between 2016 and 2020, with 11,620 SCD cases and 11,620 controls, and a geographical cohort from the USA, with 892 SCD cases and 892 controls from 2013 to 2021.
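The difference between the two variable-inclusion strategies can be illustrated with a minimal sketch. It assumes each patient's record is a list of code strings and that features are simple code counts; the whitelist, the example codes and the `build_features` helper are hypothetical and are not taken from the study.

```python
from collections import Counter

# Hypothetical whitelist of cardiovascular codes (illustrative only; the
# abstract does not specify the coding system or the exact codes used).
CVD_CODES = {"I25", "I63", "E11", "I10", "E66", "E78", "N18"}


def build_features(records, allowed_codes=None):
    """Turn per-patient code lists into count vectors.

    allowed_codes=None keeps every code (the "all medical records" strategy);
    passing a set keeps only those codes (the restricted strategy).
    """
    counts = []
    for codes in records:
        kept = [c for c in codes if allowed_codes is None or c in allowed_codes]
        counts.append(Counter(kept))
    # Columns are the sorted set of codes that survive the filter.
    vocabulary = sorted({code for c in counts for code in c})
    matrix = [[c.get(code, 0) for code in vocabulary] for c in counts]
    return vocabulary, matrix


# Three made-up patients, each a list of diagnosis or prescription codes.
records = [
    ["I10", "E11", "J44", "N05A"],
    ["I25", "I10", "I10", "F32"],
    ["K21"],
]

cvd_vocab, cvd_matrix = build_features(records, CVD_CODES)  # restricted
all_vocab, all_matrix = build_features(records)             # no prior selection
```

In this toy example the third patient has no whitelisted code, so the restricted matrix gives that patient an all-zero row, while the unrestricted matrix still carries one non-zero feature.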
The CVD MR + ML model (CatBoost algorithm) yielded moderate performance, with an AUC of 0.68 (95% CI 0.65-0.68), a sensitivity of 45% (95% CI 42-47) and a specificity of 82% (95% CI 80-84). We found that the EHR model with all medical records (All MR + ML model) performed better. In the derivation cohort, this model achieved an AUC of 0.80 (95% CI 0.78-0.82), a sensitivity of 67% (95% CI 64-69) and a specificity of 80% (95% CI 78-82) (Figure 1). Logistic regression applied to all medical records (All MR + LR) performed better than the CatBoost algorithm restricted to CVD medical records (Figure 1).
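For readers less familiar with the reported metrics, the following sketch shows how AUC, sensitivity and specificity are defined for a binary classifier. The labels, scores and the 0.5 threshold are made-up illustrative values, and the abstract does not state how the study chose its operating threshold or computed its confidence intervals.

```python
def sensitivity_specificity(y_true, y_score, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    pairs = list(zip(y_true, y_score))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, y_score):
    """AUC as the probability that a random case outscores a random control.

    Ties count as one half (Mann-Whitney formulation).
    """
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    controls = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Toy data: 1 = case, 0 = control.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1]

sens, spec = sensitivity_specificity(y_true, y_score)
area = auc(y_true, y_score)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, which is why the three figures are reported together.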
Including all available variables beyond the usual CVRF significantly improves the performance of SCD prediction models, regardless of the method used (machine learning or logistic regression).

Author notes
Funding Acknowledgements: Type of funding sources: Public hospital(s). Main funding source(s): APHP
- obesity
- smoking
- myocardial ischemia
- hypertension
- coronary arteriosclerosis
- cardiovascular diseases
- diabetes mellitus
- medical records
- heart disease risk factors
- cerebrovascular accident
- ischemic stroke
- kidney failure, chronic
- diabetes mellitus, type 2
- cardiovascular system
- prescriptions, drug
- roc curve
- heart
- ventricular arrhythmia
- lipid disorders
- surrogate markers
- electronic medical records
- machine learning