Abstract

Background and Aims

Accurate near-term prediction of life-threatening ventricular arrhythmias would enable pre-emptive actions to prevent sudden cardiac arrest/death. A deep learning–enabled single-lead ambulatory electrocardiogram (ECG) may identify an ECG profile of individuals at imminent risk of sustained ventricular tachycardia (VT).

Methods

This retrospective study included 247 254, 14 day ambulatory ECG recordings from six countries. Using a deep learning–based model, the first 24 h of each recording were used to identify patients likely to experience sustained VT (primary outcome) in the subsequent 13 days. The development set consisted of 183 177 recordings. Performance was evaluated using internal (n = 43 580) and external (n = 20 497) validation data sets. Saliency mapping visualized the features influencing the model's risk predictions.

Results

Among all recordings, 1104 (.5%) had sustained ventricular arrhythmias. The model achieved an area under the receiver operating characteristic curve of .957 [95% confidence interval (CI) .943–.971] and .948 (95% CI .926–.967) in the internal and external validation sets, respectively. For a specificity fixed at 97.0%, the sensitivity reached 70.6% and 66.1% in the internal and external validation sets, respectively. The model correctly predicted future VT in 80.7% and 81.1% of recordings with rapid sustained VT (≥180 b.p.m.), respectively, and in 90.0% of recordings in which VT degenerated into ventricular fibrillation. Saliency maps suggested a role for premature ventricular complex burden and early depolarization time as predictors of VT.

Conclusions

A novel deep learning model using dynamic single-lead ambulatory ECGs accurately identifies patients at near-term risk of ventricular arrhythmias. It also uncovers an early depolarization pattern as a potential determinant of ventricular arrhythmia events.

Structured Graphical Abstract

Overview of the development and validation of the deep learning–based model. Prediction of near-term sustained ventricular tachycardia (VT) using a single-lead 24 h electrocardiogram (ECG) achieved an area under the receiver operating characteristic curve (AUROC) of .957 and .948 in the internal and external validation data sets, respectively. Model explainability confirmed the role of premature ventricular complex (PVC) burden and identified early depolarization time as potential determinants for VT. CNN, convolutional neural network.

See the editorial comment for this article ‘Predicting imminent ventricular arrhythmias from ambulatory ECG signals: far-reaching or too far to reach?’, by K. C. Siontis and P. A. Friedman, https://doi.org/10.1093/eurheartj/ehaf008.

Translational perspective
  • The potential applications of this deep learning model extend beyond traditional clinical settings. It paves the way for new real-time monitoring tools, which could be integrated as artificial intelligence–based ‘smart-monitoring’ systems.

  • The performance of this model using a ubiquitous single-lead electrocardiogram suggests opportunities for integration with wearable devices like smartwatches and implantable loop recorders. These innovations hold promise for remote patient monitoring and pre-emptive interventions, transforming the landscape of sudden cardiac death risk management and potentially improving patient outcomes.

Introduction

More than 40 years after the first implantable cardioverter defibrillator (ICD) implantation, sudden cardiac death (SCD) still accounts for >5 million deaths worldwide every year, with a majority occurring in the general population, among subjects without known heart disease.1,2 Ventricular tachycardia/fibrillation (VT/VF) represents one of the main mechanisms of SCD, with coronary artery disease being the aetiology in up to 80% of cases.3 The incidence of SCD has remained disappointingly stable over time, despite major efforts deployed in the field towards prevention.4

Sudden cardiac death results from a complex, dynamic process acting on a specific ventricular substrate, which remains incompletely understood. The prevention of SCD is traditionally based on the mid- and long-term prediction of life-threatening arrhythmias, with left ventricular ejection fraction being the cornerstone parameter used in clinical practice.5 The limited accuracy of this approach reflects the problem of using a fixed, non-specific structural parameter measured at a single time point for long-term risk stratification.6,7 It also assesses only the substrate, neglecting dynamic aspects of arrhythmia pathophysiology, including the autonomic nervous system and triggers such as premature ventricular complexes (PVCs).8,9 An alternative dynamic approach that would identify vulnerable subjects at high risk of SCD in the near term (within minutes, hours, or days before the potentially fatal event) is therefore particularly appealing.1

In this setting, artificial intelligence (AI), particularly deep learning,10 has shown the potential to detect subtle patterns indiscernible to the human eye and may thus help refine and improve the accuracy of risk assessment. Artificial intelligence applications have already demonstrated success, for instance, in predicting the risk of atrial fibrillation from sinus rhythm electrocardiograms (ECGs).11,12 Furthermore, contrary to the traditional perception of AI as a black box whose predictions cannot be understood, interpretability analysis may provide important insight into mechanisms of arrhythmogenesis.13 This study hypothesizes that a 24 h single-lead ECG recording contains key information that AI can use to identify subjects at imminent risk of life-threatening ventricular arrhythmias in the following days. This would enable prompt pre-emptive actions and enhance near-term SCD prevention.

Methods

Data sources and study setting

The study protocol was approved by the local Institutional Review Board, and the need for individual informed consent was waived.

In this retrospective international study, we developed and validated a deep learning–based model to predict the near-term risk of sustained VT from a single-lead ambulatory ECG. We used 14 day ambulatory ECG recordings to derive the model inputs and outputs. The first 24 h of each recording (which contained no sustained VT) were used as input to the deep learning model. Each recording was then labelled according to whether any sustained VT was documented in the subsequent 13 days, and this label was used as the output (Figure 1A).
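The derivation of the model input and label described above can be sketched as follows. This is a minimal illustration, not the authors' code. The function name `split_input_and_label` is hypothetical. Representing sustained VT as a list of onset times in seconds is an assumption. Excluding recordings with VT inside the first 24 h, which mirrors the statement that the input contained no sustained VT, is also an assumption.

```python
# Hypothetical sketch: derive the 24 h input window and the binary label
# from one 14 day recording, given sustained-VT onset times in seconds.
from typing import List, Optional, Tuple

DAY_S = 24 * 3600  # length of the input window (first 24 h)


def split_input_and_label(
    duration_s: float, vt_onsets_s: List[float]
) -> Optional[Tuple[Tuple[float, float], int]]:
    """Return ((start, end) of the input window, label), or None if excluded.

    Recordings with sustained VT inside the first 24 h are excluded,
    because the input window must be free of sustained VT (assumption).
    """
    if duration_s < DAY_S:
        return None  # too short to provide a full 24 h input
    if any(t < DAY_S for t in vt_onsets_s):
        return None  # VT already present in the input window
    # Label is 1 if any sustained VT starts after the input window.
    label = int(any(DAY_S <= t <= duration_s for t in vt_onsets_s))
    return (0.0, float(DAY_S)), label
```

For example, a 14 day recording with a single VT onset on Day 4 yields the window (0, 86 400 s) and label 1.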

Figure 1

Study design and deep learning–based model. (A) Example heart rate density plot of an ambulatory electrocardiogram recording with no ventricular tachycardia in the first 24 h and an episode of ventricular tachycardia degrading to ventricular fibrillation on Day 4. The first 24 h of the recording are used to derive inputs to the deep learning–based model, while the remaining duration is used to label whether sustained ventricular tachycardia occurred in the following days. (B) Patient age, sex, and various electrocardiogram measurements extracted from the first 24 h are passed to an encoder to generate a measurement embedding. (C) A heart rate density plot is constructed from the first 24 h and passed to a convolutional neural network to extract spatial feature maps, which are passed to a transformer encoder to generate a heart rate density plot embedding. (D) A collection of 10 s electrocardiogram strips is sampled from the 24 h recording and passed to a convolutional neural network to extract features from each strip; these are then aggregated using a transformer encoder to generate an electrocardiogram waveform embedding. The embeddings generated from each input are fused and passed to a classifier to predict a near-term ventricular tachycardia risk score.

All data included in the study were acquired from individuals receiving routine continuous cardiac monitoring and uploaded to the Cardiologs Holter platform for analysis. The model was developed and internally validated using an internal data set consisting of ambulatory ECG recordings collected from various Independent Diagnostic Testing Facilities (IDTFs) and centres across five countries (USA, UK, France, South Africa, and India) between 1 January 2019 and 1 January 2024 (see Supplementary data online, Table S5). The internal data set was divided into a development and held-out validation set in the following way: all recordings collected before 1 July 2023 were used for model development (80%) and all collected thereafter were held out for internal validation (20%). The development data set was randomly split into a training (80%) and tuning set (20%). The tuning set was used to select hyperparameters and operating points from the training process.
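The temporal split and the random development split can be sketched as shown below. This is a minimal sketch, assuming each recording is a dict with a hypothetical `date` field. The text does not state whether the random 80/20 split was made per recording or per patient, so splitting per recording is an assumption. Grouping by a patient identifier would instead keep each patient's recordings in a single subset.

```python
# Hypothetical sketch of the split described above: recordings before the
# cut-off date go to development, the rest to internal validation; the
# development set is then shuffled into training (80%) and tuning (20%).
import random
from datetime import date
from typing import Dict, List, Tuple

CUTOFF = date(2023, 7, 1)


def temporal_split(
    recordings: List[Dict], seed: int = 0, tuning_frac: float = 0.2
) -> Tuple[List[Dict], List[Dict], List[Dict]]:
    development = [r for r in recordings if r["date"] < CUTOFF]
    validation = [r for r in recordings if r["date"] >= CUTOFF]
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = development[:]
    rng.shuffle(shuffled)
    n_tuning = round(len(shuffled) * tuning_frac)
    tuning, training = shuffled[:n_tuning], shuffled[n_tuning:]
    return training, tuning, validation
```

A time-based hold-out like this tests whether the model holds up on recordings collected after those it was trained on. A purely random split would not test this.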

To assess the generalizability of the model across different sources with different patient populations and data collection strategies, we validated the model using a fully independent external validation data set. The external validation set consisted of recordings collected from two separate IDTFs (USA and Czech Republic) between 1 January 2019 and 1 January 2024. All individuals in the external validation set were excluded from model development.

Ambulatory recordings were acquired with devices from numerous manufacturers, including single- and multi-lead patch-based and traditional Holter monitors (see Supplementary data online, Table S7). Because electrode positioning varied across patch-based and multi-lead Holter monitors, we did not select a specific lead derivation; instead, the first available lead was used. All ECG recordings were stored in digital format and resampled to 250 Hz.

Because the study was retrospective, patients' race or ethnicity was not consistently documented during data acquisition, and statistics on these demographics are therefore not available. However, we included individuals from various regions across four continents to promote a diverse representation of the population.

Outcome definition

The primary outcome was defined as the occurrence of sustained ventricular arrhythmias during the 13 days immediately following a 24 h ambulatory ECG recording. All recordings included in the study were analysed by physicians or certified ECG technicians using the Cardiologs Holter analysis platform (see Supplementary data online, Table S6). Sustained VT was defined as a ventricular rhythm lasting ≥30 s at a rate of ≥100 b.p.m., in accordance with guidelines.14,15 Two certified academic electrophysiologists, blinded to the model predictions, verified the data. All documented sustained VT episodes were reviewed and adjudicated centrally by two experts in ECG interpretation. A third experienced cardiac electrophysiologist was consulted in case of discrepancy, to limit the impact of inter-rater variability. For the ‘non-VT’ recordings, two certified academic electrophysiologists verified a random sample of 3800 Holter recordings and identified no sustained VT.
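The sustained VT definition can be expressed as a rule over beat annotations, sketched below. The sketch is hypothetical and rests on several assumptions: beat times are given in seconds, beats carry single-letter labels ('V' for ventricular beats), a run of consecutive ventricular beats counts as one episode, and the rate is computed from the number of RR intervals in the run. It is not the Cardiologs algorithm or the adjudication procedure.

```python
# Hypothetical sketch: flag sustained VT episodes in a labelled beat
# sequence using the thresholds quoted above (ventricular rhythm lasting
# >= 30 s at a rate >= 100 b.p.m.). The label alphabet is an assumption.
from typing import List, Tuple

MIN_DURATION_S = 30.0
MIN_RATE_BPM = 100.0


def sustained_vt_episodes(
    beat_times_s: List[float], labels: List[str]
) -> List[Tuple[float, float]]:
    """Return (onset, offset) times of ventricular runs meeting both thresholds."""
    episodes = []
    i, n = 0, len(labels)
    while i < n:
        if labels[i] != "V":
            i += 1
            continue
        j = i
        while j + 1 < n and labels[j + 1] == "V":
            j += 1
        # The run of ventricular beats spans indices i..j inclusive.
        duration = beat_times_s[j] - beat_times_s[i]
        if duration > 0:
            rate_bpm = 60.0 * (j - i) / duration  # (beats - 1) RR intervals
            if duration >= MIN_DURATION_S and rate_bpm >= MIN_RATE_BPM:
                episodes.append((beat_times_s[i], beat_times_s[j]))
        i = j + 1
    return episodes
```

Under this rule, a 30 s run at 120 b.p.m. qualifies. A 30 s run at 60 b.p.m. and a 20 s run at 120 b.p.m. do not.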

Deep learning–based model

We developed a deep learning–based model (Figure 1), which utilizes three different modalities derived from a single-lead ambulatory ECG to predict the risk of VT. The model consists of three separate branches, which use as input: (i) patient demographics and quantitative measurements calculated from the recording (Figure 1B); (ii) a heart rate density plot (HRDP) (Figure 1C); and (iii) the ECG waveform (Figure 1D). It is trained using a co-learning approach by extracting features from each modality and learning interactions between them. Embeddings (i.e. high-level features) are learnt from the quantitative measurements, HRDP, and raw ECG waveform in parallel (Figure 1B–D). The embeddings extracted from each input are then fused and passed to three fully connected layers to aggregate the features. The final layer consists of a sigmoid activation function and outputs a probability of VT occurring in the following days. The three branches of the model used for feature extraction are described below. It is important to note that the model prediction ultimately relies exclusively on the raw ECG signal, with the exception of age and sex, and therefore does not require a preliminary analysis by a human or any other software.
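The fusion step described above can be sketched in plain Python for readability: the branch embeddings are concatenated, passed through three fully connected layers, and mapped to a probability by a sigmoid. Layer widths, ReLU activations between layers, and random initialization are illustrative assumptions. The study's model was implemented in PyTorch with learned weights.

```python
# Hypothetical sketch of the fusion head: concatenate the three branch
# embeddings, apply three fully connected layers (ReLU between them),
# then a sigmoid. Sizes, ReLU, and random weights are assumptions.
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dense(x: Vector, w: Matrix, b: Vector) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def init_layer(n_in: int, n_out: int, rng: random.Random):
    scale = 1.0 / math.sqrt(n_in)
    w = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class FusionHead:
    def __init__(self, dims=(16, 8, 4, 1), seed: int = 0):
        rng = random.Random(seed)
        # Three fully connected layers: 16 -> 8 -> 4 -> 1.
        self.layers = [init_layer(a, b, rng) for a, b in zip(dims[:-1], dims[1:])]

    def __call__(self, meas: Vector, hrdp: Vector, ecg: Vector) -> float:
        h = meas + hrdp + ecg  # late fusion by concatenation
        for k, (w, b) in enumerate(self.layers):
            h = dense(h, w, b)
            if k < len(self.layers) - 1:
                h = relu(h)
        return 1.0 / (1.0 + math.exp(-h[0]))  # probability of VT
```

Usage: `FusionHead()([0.1] * 4, [0.2] * 6, [0.3] * 6)` returns a value strictly between 0 and 1. The three embeddings must sum to the first layer width (16 here).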

To capture information related to potential triggers, we used clinical data and quantitative measurements derived from the ECG recordings that have previously been associated with VT risk (Figure 1B). Additionally, to obtain a more comprehensive view of the global rhythm profile over an extended period, we introduced a new representation of the 24 h recording in the form of an HRDP. The clinical data consisted of patient age and sex, and the measurements included 19 parameters: PVC burden (%); repetitive character (number of consecutive PVCs, mean and standard deviation); coupling interval between sinus beats and PVCs (mean and standard deviation); QRS duration (mean and standard deviation); non-sustained VT characteristics [count, maximum heart rate (HR), and longest duration]; HR variability (HRV, including SDNN, SDANN, SDNNI, pNN50, RMSSD, and HF power); premature atrial complex (PAC) burden (%); and counts of PVC couplets, bigeminy, and trigeminy. Sinus beats, PACs, PVCs, and VT were identified by the Cardiologs algorithm, as described previously.16 The clinical data and measurements were normalized and passed to two fully connected layers to generate a feature representation.
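Several of the listed measurements have standard definitions that can be sketched directly. The helpers below compute PVC burden, SDNN, RMSSD, and pNN50 from beat labels and NN intervals in milliseconds, followed by a z-score normalization. The label alphabet is an assumption. So is the use of the population standard deviation for SDNN. Per-feature mean/SD normalization is a further assumption, because the exact normalization is not specified.

```python
# Hypothetical sketch of a few listed measurements, computed from beat
# labels ('N' sinus, 'V' PVC, 'A' PAC) and NN intervals in ms, plus
# z-score normalisation. Names and normalisation stats are assumptions.
import math
from statistics import mean, pstdev
from typing import Dict, List


def pvc_burden(labels: List[str]) -> float:
    """PVCs as a percentage of all beats."""
    return 100.0 * labels.count("V") / len(labels)


def hrv_time_domain(nn_ms: List[float]) -> Dict[str, float]:
    """SDNN, RMSSD and pNN50 from consecutive normal-to-normal intervals."""
    diffs = [b - a for a, b in zip(nn_ms, nn_ms[1:])]
    return {
        "sdnn": pstdev(nn_ms),  # population SD of NN intervals
        "rmssd": math.sqrt(mean(d * d for d in diffs)),
        "pnn50": 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs),
    }


def zscore(x: float, mu: float, sigma: float) -> float:
    """Normalise a feature using statistics from the training set (assumed)."""
    return (x - mu) / sigma
```

For NN intervals of 800, 860, 800, and 860 ms, these definitions give SDNN = 30 ms, RMSSD = 60 ms, and pNN50 = 100%.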

An HRDP is a 3D representation of the instantaneous HR during the monitoring period (Figure 1C). The x-axis represents time, the y-axis represents HR, and the z-axis consists of three channels corresponding to each beat’s classification as either: sinus, PVC, or PAC. The HRDP backbone consisted of a convolutional neural network (CNN), a transformer encoder, and an attention-based pooling layer. The CNN is based on the ResNet-50 architecture,17 which takes as input an HRDP to extract spatial feature maps from the recording. An attention-pooling layer18 was used to aggregate spatial feature maps across the HR-axis (y-axis) resulting in temporal feature maps. A transformer encoder, composed of four multi-headed attention layers,19 is used to exploit relationships across the temporal features. A final attention-pooling layer is then used to generate a global feature representation of the HRDP.
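An HRDP can be approximated as a three-channel 2D histogram of instantaneous HR against time, sketched below. In this sketch, instantaneous HR is taken as 60/RR of the preceding interval. The bin sizes (30 min × 5 b.p.m.) and the HR range (20–300 b.p.m.) are illustrative assumptions; the resolution used in the study is not given here.

```python
# Hypothetical sketch of a heart rate density plot: a 3-channel 2D
# histogram of instantaneous HR (y) over time (x), one channel per beat
# class. Bin sizes and HR range are illustrative assumptions.
from typing import List

CLASSES = ("N", "V", "A")  # sinus, PVC, PAC channels


def heart_rate_density_plot(
    beat_times_s: List[float],
    labels: List[str],
    duration_s: float = 24 * 3600,
    time_bin_s: float = 1800.0,
    hr_min: float = 20.0,
    hr_max: float = 300.0,
    hr_bin: float = 5.0,
) -> List[List[List[int]]]:
    """Return beat counts indexed as hrdp[channel][hr_bin][time_bin]."""
    n_t = int(duration_s // time_bin_s)
    n_hr = int((hr_max - hr_min) // hr_bin)
    hrdp = [[[0] * n_t for _ in range(n_hr)] for _ in CLASSES]
    for k in range(1, len(beat_times_s)):
        rr = beat_times_s[k] - beat_times_s[k - 1]
        if rr <= 0:
            continue
        hr = 60.0 / rr  # instantaneous HR from the preceding RR interval
        t_idx = int(beat_times_s[k] // time_bin_s)
        hr_idx = int((hr - hr_min) // hr_bin)
        if labels[k] in CLASSES and 0 <= t_idx < n_t and 0 <= hr_idx < n_hr:
            hrdp[CLASSES.index(labels[k])][hr_idx][t_idx] += 1
    return hrdp
```

The result can be treated as a 3 × 56 × 48 image and passed to a 2D CNN. Keeping one channel per beat class lets the network tell premature beats apart from sinus beats at the same HR.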

To capture information related to the underlying arrhythmogenic substrate, we applied a deep neural network directly to the ECG waveform data (Figure 1D). The backbone consists of an ECG-strip encoder, a transformer encoder, and an attention-based pooling layer. The ECG-strip encoder is a CNN based on the ResNeSt-50 architecture,20 which takes as input a 10 s single-lead ECG waveform strip and generates a single strip embedding. During training, 30 min of ECG signal is sampled from each recording by randomly selecting 180 strips with their temporal order preserved. Each strip is passed to the CNN backbone to generate 180 strip embeddings. The transformer encoder, composed of two multi-headed attention layers,19 is used to aggregate information from the strip embeddings. An attention-pooling layer then assigns an attention score to each strip and generates a feature representation of the ECG waveform.18 Training details, data augmentation methods, and oversampling strategies are described in the Supplementary data online, Appendix.
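Strip sampling and attention pooling can be sketched as follows. The sketch makes four assumptions. Strips are drawn from non-overlapping 10 s slots of a 250 Hz signal. Attention scores come from a single linear projection (`score_w`, hypothetical), whereas attention-pooling layers in the literature often use a small learned network. Embeddings are plain lists rather than CNN outputs. The transformer step is omitted.

```python
# Hypothetical sketch: sample 180 non-overlapping 10 s strips (30 min)
# from a recording with temporal order preserved, then pool per-strip
# embeddings by attention (softmax over scores, weighted sum).
import math
import random
from typing import List, Tuple

FS_HZ = 250
STRIP_S = 10
N_STRIPS = 180


def sample_strip_starts(duration_s: int, seed: int = 0) -> List[int]:
    """Start indices (in samples) of N_STRIPS strips, sorted in time."""
    n_slots = duration_s // STRIP_S  # candidate non-overlapping slots
    slots = random.Random(seed).sample(range(n_slots), N_STRIPS)
    return [s * STRIP_S * FS_HZ for s in sorted(slots)]


def attention_pool(
    embeddings: List[List[float]], score_w: List[float]
) -> Tuple[List[float], List[float]]:
    """Return the pooled embedding and the per-strip attention weights."""
    scores = [sum(w * e for w, e in zip(score_w, emb)) for emb in embeddings]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(embeddings[0])
    pooled = [sum(a * emb[d] for a, emb in zip(weights, embeddings)) for d in range(dim)]
    return pooled, weights
```

Because the weights sum to one, they can be read as the relative contribution of each strip. This property is what makes attention scores useful for inspecting which strips drove a prediction.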

Secondary analyses

As a secondary analysis, we evaluated the performance of the model at shorter prediction horizons of 7, 3, 2, and 1 day(s) and 1 h. For each horizon, we used the 24 h window of every positive recording that ended within that horizon before the first sustained VT episode. For example, to evaluate the model's ability to predict 1 h VT risk, we used a 24 h window of the recording ending within 1 h of the first VT onset. The original predictions for all negative recordings were kept unchanged across horizons. Additionally, we evaluated the effect of monitoring duration on VT risk prediction by training models with the same architecture on shorter input durations of 1, 3, 6, and 12 h. Lastly, for comparison, we developed a baseline multivariable logistic regression model using patient age, sex, PVC burden, HR, QRS duration, QTc, and HRV SDNN. Since there is no widely recognized reference for calculating VT risk from Holter ECG data, these metrics were chosen for their known potential association with VT risk, although this list is not exhaustive.
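The horizon-specific input windows can be sketched as shown below. The text states only that the 24 h window ended within the horizon before the first VT onset. Placing the window's end exactly one horizon before onset, which is the earliest admissible end point, is an assumption made here for concreteness.

```python
# Hypothetical sketch: for a positive recording, place the 24 h input
# window so that it ends `horizon_s` before the first sustained VT onset.
# Ending exactly one horizon before onset is an assumption.
from typing import Optional, Tuple

HOUR_S = 3600
DAY_S = 24 * HOUR_S


def horizon_window(
    first_vt_onset_s: float, horizon_s: float
) -> Optional[Tuple[float, float]]:
    end = first_vt_onset_s - horizon_s
    start = end - DAY_S
    if start < 0:
        return None  # not enough recording before the horizon
    return start, end
```

Negative recordings need no new window, since their predictions from the first 24 h are reused at every horizon.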

Interpretability

We generated saliency maps using gradient-weighted class activation mapping and integrated gradients on the HRDP and ECG waveform branches of the model,21,22 respectively. Both are gradient-based methods which highlight regions of the input that have strong influence on the prediction of VT. For qualitative comparison, we randomly selected 20 samples from each classification group (true positive, true negative, false positive, and false negative) from the validation set. We then asked an experienced electrophysiologist to describe differences among the saliency maps of the classification groups.
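Integrated gradients attribute a prediction to the input features by integrating gradients along a straight path from a baseline to the input. The sketch below illustrates the method itself, not the study's network. It applies integrated gradients to a toy logistic model with an analytic gradient and checks the completeness property: the attributions should sum approximately to f(x) − f(baseline). The toy weights, the zero baseline, and the 200-step midpoint rule are assumptions.

```python
# Hypothetical sketch of integrated gradients on a toy differentiable
# model f(x) = sigmoid(w . x + b) with a closed-form gradient. The model,
# zero baseline and step count are illustrative assumptions.
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def model(x: List[float], w: List[float], b: float) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def grad(x: List[float], w: List[float], b: float) -> List[float]:
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]  # d f / d x_i


def integrated_gradients(x, baseline, w, b, steps: int = 200) -> List[float]:
    """Midpoint Riemann approximation of IG along the path baseline -> x."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bl + alpha * (xi - bl) for xi, bl in zip(x, baseline)]
        for i, g in enumerate(grad(point, w, b)):
            total[i] += g
    return [(xi - bl) * t / steps for xi, bl, t in zip(x, baseline, total)]
```

In a real network, the analytic gradient would be replaced by automatic differentiation. The same path-integral computation then yields per-sample attributions over the ECG waveform.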

Statistical analysis

The performance of the model was evaluated using the area under the receiver operating characteristic curve (AUROC), area under the precision–recall curve (AUPRC, also known as average precision), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). To estimate 95% confidence intervals (CIs), we used non-parametric bootstrapping with 1000 samples. Sensitivity, specificity, PPV, and NPV were calculated at a binary decision threshold, which was selected on the tuning set using the receiver operating characteristic curve and the F2 score. Subgroup analyses were performed on the validation sets to evaluate model fairness across patient demographics. We compared AUROCs across age and sex subgroups pairwise using the DeLong method.23 Two-sided P-values <.05 were considered statistically significant. All models and statistics were computed using Python (v.3.8.12). Deep learning models were trained using PyTorch (v.1.7.0). Data analysis was performed using numpy (v.1.19.5), pandas (v.1.2.0), scipy (v.1.6.0), and scikit-learn (v.0.24.0). Data visualization and scientific plotting used matplotlib (v.3.2.2) and seaborn (v.0.12.2).
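Several of these evaluation steps can be sketched in plain Python. AUROC is computed via the Mann–Whitney statistic, its 95% CI by a percentile bootstrap, and the threshold as the one maximizing F2 = 5·TP / (5·TP + 4·FN + FP), which weights sensitivity more heavily than PPV. The function names and the percentile CI variant are assumptions. This sketch selects the threshold on F2 alone, whereas the study also used the ROC curve. The study's library calls are not reported.

```python
# Hypothetical sketch: AUROC via the Mann-Whitney statistic, a percentile
# bootstrap 95% CI, and a decision threshold that maximises the F2 score.
import random
from typing import List, Tuple


def auroc(y: List[int], s: List[float]) -> float:
    pos = [si for yi, si in zip(y, s) if yi == 1]
    neg = [si for yi, si in zip(y, s) if yi == 0]
    # Probability that a random positive outranks a random negative (ties = 0.5).
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y, s, n_boot: int = 1000, seed: int = 0) -> Tuple[float, float]:
    if not 0 < sum(y) < len(y):
        raise ValueError("both classes are required")
    rng = random.Random(seed)
    idx = range(len(y))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]  # resample with replacement
        ys = [y[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # AUROC needs both classes present
            stats.append(auroc(ys, [s[i] for i in sample]))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


def f2_threshold(y: List[int], s: List[float]) -> float:
    best_t, best_f = 0.0, -1.0
    for t in sorted(set(s)):
        tp = sum(1 for yi, si in zip(y, s) if si >= t and yi == 1)
        fp = sum(1 for yi, si in zip(y, s) if si >= t and yi == 0)
        fn = sum(1 for yi, si in zip(y, s) if si < t and yi == 1)
        f2 = 5 * tp / (5 * tp + 4 * fn + fp) if tp else 0.0
        if f2 > best_f:
            best_t, best_f = t, f2
    return best_t
```

The pairwise O(P × N) AUROC is only practical for small examples. On data sets of this size, a rank-based implementation such as scikit-learn's would be used.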

Results

A total of 247 254, 14 day ambulatory ECG recordings collected between 2019 and 2024 from individuals aged ≥18 years were included in the study (Table 1 and Supplementary data online, Figure S1). Of these, 1104 (.5%) had at least one sustained VT episode documented during the monitoring period (beyond the 24 h used as input to the model), of which 43 (3.9%) were polymorphic VT. Notably, 22 recordings showed VT degenerating into VF. Among the 917 (83.1%) patients with sustained VT documented during monitoring, the most common indications for monitoring were palpitations [289 (31.5%)], PVC [153 (16.7%)], atrial fibrillation [98 (10.7%)], VT [76 (8.3%)], and syncope [52 (5.7%)]; 249 (27.1%) were monitored for other reasons. Additional VT characteristics for each data set are detailed in Table 1. The remaining recordings had no documented sustained VT during the 14 day monitoring period.

Table 1

Baseline and indication characteristics

Characteristic | Training, control | Training, VT | P-value | Internal validation, control | Internal validation, VT | P-value | External validation, control | External validation, VT | P-value
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
n | 183 177 | 804 | | 43 580 | 197 | | 20 497 | 103 |
Demographics | | | | | | | | |
Age (years) | 61.1 ± 17.8 | 63.4 ± 15.3 | <.001 | 62.2 ± 18.5 | 63.2 ± 16.4 | .442 | 61.3 ± 15.3 | 66.5 ± 12.7 | <.001
Female, n (%) | 108 951 (59.5) | 244 (30.3) | <.001 | 26 887 (61.7) | 54 (27.4) | <.001 | 12 062 (58.8) | 21 (20.4) | <.001
ECG measurements | | | | | | | | |
QRS (ms) | 100 ± 18 | 107 ± 22 | <.001 | 98 ± 17 | 107 ± 20 | <.001 | 97 ± 16 | 105 ± 21 | <.001
QTc (ms) | 401 ± 38 | 409 ± 41 | <.001 | 400 ± 37 | 410 ± 40 | <.001 | 397 ± 34 | 409 ± 41 | <.001
PVC burden (%), median (IQR) | 0.7 (0.2–3.1) | 8.9 (2.1–17.8) | <.001 | 0.6 (0.2–2.9) | 12.0 (3.4–23.5) | <.001 | 0.8 (0.2–3.5) | 9.0 (2.4–21.5) | .297
PVC coupling interval (ms) | 590 ± 119 | 521 ± 106 | <.001 | 576 ± 113 | 511 ± 73 | <.001 | 585 ± 108 | 525 ± 101 | <.001
HR (b.p.m.) | 74 ± 11 | 84 ± 26 | <.001 | 75 ± 11 | 81 ± 16 | <.001 | 77 ± 15 | 84 ± 50 | <.001
HRV SDNN (ms) | 153 ± 48 | 160 ± 51 | .022 | 147 ± 49 | 152 ± 61 | .257 | 144 ± 51 | 161 ± 59 | .049
Indication, n (%) | | | | | | | | |
Palpitations | 71 648 (39.1) | 216 (26.9) | <.001 | 16 981 (39.0) | 48 (24.4) | <.001 | 7194 (35.1) | 21 (20.4) | .003
Atrial fibrillation | 32 499 (17.7) | 86 (10.7) | <.001 | 6869 (15.8) | 6 (3.0) | <.001 | 3239 (15.8) | 17 (16.5) | .952
Syncope | 23 089 (12.6) | 59 (7.3) | <.001 | 5468 (12.5) | 19 (9.6) | .263 | 3330 (16.2) | 10 (9.7) | .097
Arrhythmia | 8790 (4.8) | 47 (5.8) | .193 | 1897 (4.4) | 12 (6.1) | .309 | 1093 (5.3) | 8 (7.8) | .381
Bradycardia | 6461 (3.5) | 30 (3.7) | .828 | 1721 (3.9) | 11 (5.6) | .322 | 757 (3.7) | 6 (5.8) | .378
PVC | 3679 (2.0) | 114 (14.2) | <.001 | 901 (2.1) | 44 (22.3) | <.001 | 304 (1.5) | 14 (13.6) | <.001
VT | 1165 (0.6) | 57 (7.1) | <.001 | 286 (0.7) | 21 (10.7) | <.001 | 111 (0.5) | 14 (13.6) | <.001
Other | 35 846 (19.6) | 195 (24.3) | .001 | 9457 (21.7) | 36 (18.3) | .281 | 4469 (21.8) | 13 (12.6) | .033

HR, heart rate; HRV, heart rate variability; IQR, interquartile range; PVC, premature ventricular complex; SDNN, standard deviation of NN intervals; VT, ventricular tachycardia.


The development set used to train the model consisted of 183 177 ambulatory ECG recordings. On the internal validation set (n = 43 580), the deep learning–based model achieved an AUROC of .957 (95% CI .943–.971) and an AUPRC of .300 (95% CI .239–.376; Figure 2A and B). At the selected operating point, the sensitivity, specificity, PPV, and NPV were 70.6% (95% CI 64.2%–77.2%), 97.7% (95% CI 97.6%–97.9%), 12.3% (95% CI 10.6%–14.4%), and 99.9% (95% CI 99.8%–99.9%), respectively. On the external validation data set (n = 20 497), the model yielded an AUROC of .948 (95% CI .926–.967) and an AUPRC of .269 (95% CI .189–.362). The sensitivity, specificity, PPV, and NPV were 66.1% (95% CI 57.4%–75.2%), 97.0% (95% CI 96.8%–97.3%), 10.1% (95% CI 7.9%–12.6%), and 99.8% (95% CI 9.8%–99.9%), respectively. For comparison, we evaluated a baseline multivariable logistic regression model using patient age, sex, QRS duration, QTc interval, PVC burden, HR, and HRV SDNN. On the internal and external validation sets, this baseline model yielded AUROCs of .845 (95% CI .806–.879) and .847 (95% CI .817–.876), AUPRCs of .05 (95% CI .026–.094) and .039 (95% CI .031–.056), sensitivities of 49.7% (95% CI 41.7%–57.6%) and 45.6% (95% CI 34.7%–56.4%), and specificities of 88.2% (95% CI 87.8%–88.6%) and 90% (95% CI 89.5%–90.6%), respectively. Performance with different input combinations is provided in Supplementary data online, Table S1, and confusion matrices in Supplementary data online, Tables S2 and S3.
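The modest PPV despite high specificity follows from the low prevalence of sustained VT. A worked check with Bayes' theorem, assuming a prevalence of about 0.45% (roughly 197 VT recordings in the internal validation set), approximately reproduces the reported internal PPV and NPV.

```python
# Worked check (Bayes' theorem): low prevalence limits PPV even at high
# specificity. The prevalence of 0.45% is an assumption derived from
# roughly 197 VT recordings in the internal validation set.
def ppv(sens: float, spec: float, prev: float) -> float:
    """Positive predictive value from sensitivity, specificity, prevalence."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))


def npv(sens: float, spec: float, prev: float) -> float:
    """Negative predictive value from sensitivity, specificity, prevalence."""
    return spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)


prev_internal = 0.0045
print(f"PPV = {ppv(0.706, 0.977, prev_internal):.3f}")  # about 0.122
print(f"NPV = {npv(0.706, 0.977, prev_internal):.4f}")  # about 0.9986
```

At this prevalence, roughly eight false-positive alarms are expected for every true positive. This ratio is the practical cost of the selected operating point.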

Figure 2

Model performance. (A) Receiver operating characteristic curves of the deep learning–based model on the internal and external validation sets. (B) Precision–recall curves of the deep learning–based model on the internal and external validation sets. The curves show the trade-off between sensitivity and positive predictive value. (C) Area under the receiver operating characteristic curve of the model when evaluated at various prediction horizons. Each model was compared with the respective 13 day prediction horizon model on the external validation set using the DeLong method. Significance levels are denoted as ns (not significant) for P ≥ .05; * for P < .05, and ** for P < .001. Using bootstrapping with 1000 samples, 95% confidence intervals were computed. Error bars indicate 95% confidence intervals. (D) Area under the receiver operating characteristic curve of the model when using shorter input durations to predict the risk of ventricular tachycardia. Each model was compared with the respective 24 h model on the external validation set using the DeLong method. Significance levels are denoted as ns (not significant) for P ≥ .05; * for P < .05, and ** for P < .001. Using bootstrapping with 1000 samples, 95% confidence intervals were computed. Error bars indicate 95% confidence intervals.

Compared with the 13 day prediction, model performance improved consistently at prediction horizons shorter than 2 days, with a significant difference in AUROCs (P < .005; Figure 2C). When 24 h of monitoring was used to predict 3 day VT risk, the AUROC improved to .96 (95% CI .948–.972) and .952 (95% CI .933–.974) on the internal and external validation sets, respectively. To predict VT occurrence within the next hour, the model achieved AUROCs of .970 (95% CI .960–.982) and .961 (95% CI .946–.966), respectively. Additionally, performance dropped significantly on both the internal and external validation sets when the input duration was <6 h, compared with the 24 h model (P < .001; Figure 2D).

We evaluated model performance across age and sex subgroups (Table 2). Performance was consistent between sexes (P = .84). Performance was numerically lower in older than in younger patients, although the difference was not statistically significant (P = .15). We also analysed performance according to VT rate: the internal and external validation sets included 57 (28.9%) and 37 (35.9%) recordings with rapid VT (≥180 b.p.m.), respectively (Table 3). The model correctly predicted VT occurrence in 80.7% and 81.1% of these recordings on the internal and external validation sets, respectively (see Supplementary data online, Table S4). Notably, the model identified 9 of the 10 recordings in which VT degenerated into VF across the validation sets.
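
Table 2 reports sensitivity, specificity, PPV, and NPV at a single operating point, and the Abstract reports sensitivity at a specificity fixed at 97.0%. The sketch below shows one way of choosing such a threshold from the scores of negative recordings and deriving the four metrics from the confusion matrix. How the study actually fixed its threshold is not described in this section, so the selection rule and the function names here are assumptions.

```python
import math
from typing import Dict, Sequence


def threshold_for_specificity(
    labels: Sequence[int], scores: Sequence[float], target_spec: float = 0.97
) -> float:
    """Lowest observed negative score t such that calling 'score > t' positive
    keeps specificity >= target_spec on the supplied negatives."""
    neg = sorted(s for s, t in zip(scores, labels) if t == 0)
    if not neg:
        raise ValueError("need at least one negative recording")
    # Number of negatives that must fall at or below the threshold
    # (the small epsilon guards against floating-point overshoot).
    k = max(1, math.ceil(target_spec * len(neg) - 1e-9))
    return neg[k - 1]


def confusion_metrics(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV when 'score > threshold' is positive."""
    tp = fp = tn = fn = 0
    for s, t in zip(scores, labels):
        predicted = s > threshold
        if t == 1 and predicted:
            tp += 1
        elif t == 1:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    labels = [1, 1, 1, 1] + [0] * 100
    scores = [0.995, 0.985, 0.5, 0.1] + [i / 100 for i in range(100)]
    thr = threshold_for_specificity(labels, scores, target_spec=0.97)
    print(thr, confusion_metrics(labels, scores, thr))
```

Choosing the threshold on the same data used for evaluation, as in this toy example, tends to give optimistic estimates; fixing it beforehand on separate data avoids that.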

Table 2

Internal and external validation performance by subgroups

| Subgroup | n | n VT | AUROC (95% CI) | AUPRC (95% CI) | Sens. (95% CI) | Spe. (95% CI) | PPV (95% CI) | NPV (95% CI) |
|---|---|---|---|---|---|---|---|---|
| Internal validation | | | | | | | | |
| Age 18–65 | 20 789 | 97 | 97.1 (95.3–98.6) | 44.0 (34.0–54.8) | 74.2 (64.8–82.5) | 98.6 (98.4–98.7) | 19.4 (15.3–23.3) | 99.9 (99.8–99.9) |
| Age ≥65 | 22 988 | 100 | 94.2 (91.7–96.3) | 18.1 (12.7–27.4) | 67.0 (57.3–75.8) | 97.0 (96.8–97.2) | 8.9 (7.0–11.0) | 99.9 (99.8–99.9) |
| Male | 16 836 | 143 | 95.0 (93.4–96.4) | 32.7 (25.3–41.2) | 71.3 (63.6–78.5) | 96.1 (95.8–96.4) | 13.6 (11.4–16.2) | 99.7 (99.7–99.8) |
| Female | 26 941 | 54 | 94.6 (90.8–97.8) | 26.2 (15.3–39.0) | 68.5 (55.3–80.9) | 98.7 (98.6–98.9) | 9.8 (7.0–13.1) | 99.9 (99.9–100.0) |
| External validation | | | | | | | | |
| Age 18–65 | 10 634 | 41 | 95.1 (91.6–98.2) | 33.1 (19.7–48.0) | 63.4 (48.6–77.6) | 98.8 (98.5–99.0) | 16.5 (10.9–22.3) | 99.9 (99.8–99.9) |
| Age ≥65 | 9966 | 62 | 94.0 (91.4–96.2) | 24.3 (14.7–37.0) | 45.2 (32.3–57.6) | 97.8 (97.5–98.1) | 11.6 (7.8–15.8) | 99.7 (99.5–99.8) |
| Male | 8517 | 82 | 93.2 (90.7–95.5) | 28.8 (19.5–39.4) | 50.0 (39.4–60.5) | 97.2 (96.8–97.5) | 14.7 (10.7–19.2) | 99.5 (99.3–99.7) |
| Female | 12 083 | 21 | 94.5 (88.4–99.1) | 20.9 (9.6–41.3) | 61.9 (40.0–81.2) | 99.1 (98.9–99.3) | 10.7 (5.2–16.7) | 99.9 (99.9–100.0) |

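AUROC, area under the receiver operating characteristic curve; AUPRC, area under the precision–recall curve; n, number of recordings; n VT, number of recordings with sustained ventricular tachycardia; NPV, negative predictive value; PPV, positive predictive value; Sens., sensitivity; Spe., specificity. All performance metrics are expressed as percentages.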

Table 3

Distribution of the longest ventricular tachycardia duration and the maximum ventricular tachycardia rate among positive recordings in each data set

| Number of recordings (%) | Training | Internal validation | External validation |
|---|---|---|---|
| VT duration (s) | | | |
| 30–60 | 328 (40.8) | 78 (39.6) | 42 (40.8) |
| 60–240 | 296 (36.8) | 77 (39.1) | 35 (34.0) |
| 240–600 | 82 (10.2) | 22 (11.2) | 7 (6.8) |
| ≥600 | 98 (12.2) | 20 (10.2) | 19 (18.4) |
| VT rate (b.p.m.) | | | |
| 100–150 | 355 (44.2) | 82 (41.6) | 46 (44.7) |
| 150–180 | 210 (26.1) | 58 (29.4) | 20 (19.4) |
| ≥180 | 239 (29.7) | 57 (28.9) | 37 (35.9) |

VT, ventricular tachycardia.

To understand which features contributed to the model's VT predictions, we generated saliency maps highlighting the regions of the HRDP and ECG signal with a strong impact on the prediction. In the HRDP, the maps confirmed that PVC burden is a key predictor of VT (Figure 3A).14 Among ECGs in sinus rhythm, three signal locations were commonly highlighted: the region preceding QRS onset, the initial slope of the QRS, and the ST segment (Figure 3B). Additional saliency maps are provided in Supplementary data online, Figures S2 and S3.
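
Figure 3B was computed with integrated gradients,22 which attribute a prediction F(x) to each input sample by integrating the model's gradient along a straight path from a baseline x′ to the input: IG_i(x) = (x_i − x′_i) ∫₀¹ ∂F(x′ + α(x − x′))/∂x_i dα. The sketch below approximates this integral with a midpoint Riemann sum for a hypothetical toy logistic model with an analytic gradient. It is not the study's network; the all-zero baseline, the step count, and the function names are assumptions. In practice, the gradient would come from the deep learning framework's automatic differentiation.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def toy_model(x: Vector, w: Vector, b: float) -> float:
    """Hypothetical stand-in for a trained network: logistic regression."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def toy_gradient(x: Vector, w: Vector, b: float) -> List[float]:
    """Analytic gradient of toy_model with respect to its input x."""
    p = toy_model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(
    x: Vector,
    baseline: Vector,
    grad_fn: Callable[[Vector], List[float]],
    steps: int = 64,
) -> List[float]:
    """Midpoint Riemann-sum approximation of integrated gradients."""
    summed = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            summed[i] += g
    # Scale the average gradient by the input's displacement from the baseline.
    return [(xi - bi) * s / steps for xi, bi, s in zip(x, baseline, summed)]


if __name__ == "__main__":
    w, b = [2.0, -1.0, 0.0, 0.5], 0.0
    x = [1.0, 0.5, 3.0, -1.0]  # e.g. four samples of a signal segment
    baseline = [0.0] * len(x)
    attributions = integrated_gradients(x, baseline, lambda p: toy_gradient(p, w, b))
    # Completeness: attributions sum (approximately) to F(x) - F(baseline).
    print(attributions, sum(attributions), toy_model(x, w, b) - toy_model(baseline, w, b))
```

The completeness property (attributions summing to the change in output relative to the baseline) is what makes the highlighted regions comparable across samples, but the result depends on the chosen baseline.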

Figure 3

Saliency maps. (A) Gradient-weighted class activation mapping saliency map overlaid on a heart rate density plot of a true-positive recording. The plot shows sinus (black) and ventricular (red) beats from a 12 h recording. Regions highlighted in red signify higher importance. (B) Saliency map computed using integrated gradients overlaid on an electrocardiogram signal of a true-positive recording in sinus rhythm.

Discussion

In this study, we developed and validated a novel deep learning–based model to predict near-term VT from an ambulatory ECG. Trained on a large volume of ambulatory recordings, the model performed robustly on both the internal and external validation data sets. Its performance on the external validation set suggests that it may generalize to patient populations not encountered during training. Furthermore, saliency mapping confirmed the importance of PVC burden and identified QRS fragmentation, the initial slope of the QRS, and the region preceding QRS onset as potential determinants of VT risk in the model (Structured Graphical Abstract). These findings have important implications for developing a novel ‘near-term’ approach to SCD prevention.1,3,24

Given the limitations of the current mid- and long-term SCD prevention strategy, which relies on risk stratification of patients with underlying heart disease, exploring alternative approaches has become increasingly important. Yet no prediction tool is currently used in clinical practice, especially for short-term horizons. How patients flagged as high risk for VT should be managed remains uncertain, and an interventional approach based on VT prediction has yet to be established and validated in randomized trials. A recent randomized study25 demonstrated a short-term mortality benefit from an AI-ECG alert that identified patients at high risk of mortality and led to more intensive surveillance, diagnostic examinations, and therapeutic actions; in that trial, cardiac mortality, including arrhythmia-related deaths, was also significantly lower in the intervention group. We acknowledge the relatively low PPV reported in Table 2 at a fixed operating point, which partly reflects the low prevalence of VT in our study population. The model's sensitivity and specificity can, however, be adjusted to its intended use, depending on whether a high PPV or a high NPV is preferred (Figure 2). Our model may have numerous applications across clinical settings. In the outpatient setting, patients monitored with mobile cardiac telemetry could benefit from an alert triggered before the onset of a life-threatening arrhythmia, allowing pre-emptive action. During hospitalization, a new AI-based ‘smart-monitoring’ system could help prevent fatal events hours to days before they occur. Finally, the model's performance with a single-lead ECG paves the way for integration into smartwatches or implantable loop recorders, enabling remote patient monitoring and pre-emptive interventions.

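
The dependence of PPV on prevalence follows from Bayes' theorem: PPV = Se·p / (Se·p + (1 − Sp)(1 − p)) and NPV = Sp(1 − p) / (Sp(1 − p) + (1 − Se)p), where Se is sensitivity, Sp is specificity, and p is prevalence. The short sketch below evaluates these formulas at the internal-validation operating point reported in the Abstract (sensitivity 70.6% at a specificity of 97.0%) for a range of hypothetical prevalences, including roughly the 0.5% observed overall. It is an illustrative calculation only; the PPVs in Table 2 come from the observed data and need not coincide with these values.

```python
from typing import Tuple


def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> Tuple[float, float]:
    """PPV and NPV from sensitivity, specificity and prevalence (Bayes' theorem)."""
    tp = sensitivity * prevalence
    fp = (1.0 - specificity) * (1.0 - prevalence)
    tn = specificity * (1.0 - prevalence)
    fn = (1.0 - sensitivity) * prevalence
    return tp / (tp + fp), tn / (tn + fn)


if __name__ == "__main__":
    # Operating point from the Abstract (internal validation); prevalences are illustrative.
    for prevalence in (0.005, 0.02, 0.05, 0.10):
        ppv, npv = predictive_values(0.706, 0.970, prevalence)
        print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}, NPV {npv:.2%}")
```

With sensitivity and specificity held fixed, PPV rises steeply with prevalence while NPV stays close to 100%, which is why the operating threshold may need to be tuned to the expected prevalence and the intended use.
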
Recent work has demonstrated the ability of deep learning to detect numerous cardiovascular conditions from the ECG, including valvular heart disease and hypertrophic cardiomyopathy, and to predict future atrial fibrillation and cardiac arrest.11,12,26–28 The performance obtained in our study may be explained by the ambulatory ECG containing both structural and temporal information that reflects the complex interactions among the autonomic nervous system, the substrate, and triggers. Although this study focuses on near-term prediction, the ECG features captured by the algorithm might also help improve longer term risk stratification and prevention; this concept requires further testing.

Additionally, acknowledging their hypothesis-generating nature, we analysed saliency maps of positive ambulatory recordings to explore which features most influenced the model's predictions of VT risk. In the HRDP, salient regions were generally focused on dense clusters of PVCs (Figure 3A and Supplementary data online, Figure S2). On waveform analysis during sinus rhythm, the beginning of the QRS appeared to have a strong impact on the prediction. A QRS with either a slow, fragmented slope or an early depolarization pattern, identified as a low-voltage fragmented wave occurring within the 40 ms before the apparent onset of the QRS, was commonly encountered in ambulatory recordings with VT occurrence (Figure 3B and Supplementary data online, Figure S3). One hypothesis is that this pattern reflects abnormal myocardial conduction through scar tissue near the Purkinje system. While we recognize the importance of developing mechanistic insights from AI-based prediction of VT, these saliency maps must be interpreted cautiously: at this early stage, premature conclusions about the underlying factors identified by the model should be avoided, and future studies should carefully analyse the specific parameters the model uses to make its predictions. PVC burden and other traditional Holter features are recognized as classical potential predictors of sustained VT.2 However, no widely accepted quantitative VT risk score based on Holter ECG metrics currently exists. In our study, an AI model using the entire raw ECG signal outperformed a logistic regression model built with classical variables.
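
To inspect the pre-QRS segment described above on individual recordings, one could extract the 40 ms preceding the apparent QRS onset. The sketch below assumes that QRS-onset sample indices are already available (for example, from a beat delineator, which is not described in this section) and that the sampling frequency is known. The two descriptors it computes, peak-to-peak amplitude and the number of slope direction changes as a crude proxy for fragmentation, are hypothetical exploratory measures, not features used or validated in this study.

```python
from typing import Dict, List, Sequence


def pre_qrs_window(
    signal: Sequence[float], qrs_onset: int, fs_hz: float, window_ms: float = 40.0
) -> List[float]:
    """Samples in the window_ms immediately preceding a QRS-onset index."""
    n_samples = round(window_ms * fs_hz / 1000.0)
    start = max(0, qrs_onset - n_samples)
    return list(signal[start:qrs_onset])


def describe_segment(segment: Sequence[float]) -> Dict[str, float]:
    """Hypothetical exploratory descriptors of a short ECG segment."""
    if len(segment) < 3:
        raise ValueError("segment too short to describe")
    diffs = [b - a for a, b in zip(segment, segment[1:])]
    # Count sign reversals of the first difference (local peaks and troughs).
    direction_changes = sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)
    return {
        "peak_to_peak": max(segment) - min(segment),
        "direction_changes": direction_changes,
    }


if __name__ == "__main__":
    fs = 250.0  # assumed sampling frequency in Hz; 40 ms is then 10 samples
    # Synthetic trace: flat baseline, a small notched deflection, then QRS onset at sample 50.
    ecg = [0.0] * 42 + [0.02, 0.05, 0.03, 0.06, 0.02, 0.04, 0.01, 0.03] + [0.8] * 10
    window = pre_qrs_window(ecg, qrs_onset=50, fs_hz=fs)
    print(len(window), describe_segment(window))
```

Any such hand-crafted measure would need its own validation; it only offers a way to look at the region that the saliency maps tend to highlight.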

Although this is one of the first evaluations of a deep learning–based approach to predict sustained VT, based on the largest database of ambulatory ECGs, we acknowledge several limitations. Firstly, we lacked associated clinical and race/ethnicity data; however, given the large volume of international data used without filtering or selection, it is likely that patients with a wide variety of pathologies, with and without heart disease, were included. Our objective was to demonstrate the feasibility of an AI model predicting ventricular arrhythmias purely from electrical data (i.e. the intrinsic value of isolated electrical signal analysis, beyond the underlying cardiac substrate), without additional clinical information. Future studies could explore how different AI-based tools align as new models are developed; for now, this study remains a proof of concept that an AI-based tool can effectively predict VT in the short term. Apart from age and sex, the model's predictions are based solely on the raw ECG signal and require no preliminary analysis by humans or external software. Nevertheless, further studies are needed to assess how the model performs across specific patient populations and clinical profiles. Secondly, the number of positive ambulatory recordings in our external validation set was limited (n = 103); however, to the best of our knowledge, this is one of the largest reported databases of sustained VT on ambulatory ECG recordings. Thirdly, this study is a retrospective analysis of previously collected ambulatory recordings. Retrospective analyses can introduce selection bias and may not fully capture temporal relationships, limiting the generalizability of our findings; well-designed prospective studies are therefore essential to confirm the feasibility of this concept before translation to the clinical arena.

Additionally, sustained VT does not necessarily lead to SCD, and we lack complete information on the final clinical outcomes of the patients in our cohort, except for the 10 cases in which VT progressed to VF. The haemodynamic tolerance of VT depends on various factors, such as heart rate and the presence of underlying heart disease; however, even slower VTs can result in fatal heart failure or escalate to VF. Given this potential deterioration, sustained VT is recognized in clinical guidelines14,15 as a serious condition that often warrants ICD implantation to prevent SCD. Notably, the model correctly identified 9 of the 10 recordings in which VT degenerated into VF within the validation sets. Furthermore, excluding from model testing the recordings that were shorter than 13 days and contained no VT increases disease prevalence, which may inflate prevalence-dependent metrics (PPV, AUPRC), whereas sensitivity and specificity should theoretically remain unaffected. For prediction horizons shorter than 13 days (Figure 2C), including these shorter recordings could alter some performance metrics. Finally, although saliency mapping identified well-known and novel ECG determinants of VT risk, this method has limitations: the locations revealed in the ECG waveform should not be considered exhaustive, and the link between the highlighted locations and the exact pathophysiology should be regarded as hypothesis generating.29
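
The effect of this exclusion rule on prevalence can be made concrete with a small labelling sketch. It assumes that the first 24 h form the model input, that a recording is positive if sustained VT starts within the prediction horizon after that window, and that a recording without VT in the horizon is kept as negative only if it covers the whole horizon; otherwise it is excluded because the absence of VT cannot be confirmed. These rules, including the treatment of VT occurring during the input window, are simplifying assumptions and may differ from the study's exact definitions.

```python
from typing import List, Optional, Sequence, Tuple


def label_recording(
    duration_h: float,
    vt_onsets_h: Sequence[float],
    input_h: float = 24.0,
    horizon_h: float = 13 * 24.0,
) -> Optional[int]:
    """1 if sustained VT starts within the horizon after the input window,
    0 if the recording covers the full horizon without VT, None if excluded.
    Assumption: VT occurring only during the input window does not count."""
    window_end = input_h + horizon_h
    if any(input_h < t <= window_end for t in vt_onsets_h):
        return 1
    if duration_h >= window_end:
        return 0
    return None  # too short to confirm the absence of VT


def prevalence(labels: Sequence[Optional[int]]) -> float:
    """Fraction of positives among recordings that were not excluded."""
    kept = [y for y in labels if y is not None]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # Hypothetical cohort: (recording duration in hours, VT onset times in hours)
    cohort: List[Tuple[float, List[float]]] = [
        (336.0, [100.0]),  # VT on day 5: positive
        (336.0, []),       # full 14 days, no VT: negative
        (120.0, []),       # stopped on day 5, no VT: excluded
        (90.0, [60.0]),    # short, but VT observed: positive
    ]
    labels = [label_recording(d, v) for d, v in cohort]
    print(labels, prevalence(labels))
```

In this toy cohort, excluding the short VT-free recording raises prevalence from 2/4 to 2/3, whereas the per-class metrics (sensitivity, specificity) are computed within each class and are not directly affected by that shift.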

Conclusions

Using a large cohort of patients, we developed and validated a novel deep learning–based model to predict near-term risk of sustained ventricular arrhythmias from a single-lead ambulatory ECG. This tool could potentially lay the foundation for a new approach towards SCD risk management and improve patient outcomes.

Supplementary data

Supplementary data are available at European Heart Journal online.

Declarations

Disclosure of Interest

L.F. is a medical expert at Cardiologs, a Philips company. T.C., J.L., and C.H. are employed at Cardiologs. J.P.S. is a consultant for Abbott Inc., Boston Scientific, Biotronik, Biosense Webster, Cardiologs Inc., CVRx Inc., EBR Systems Inc., Implicity Inc., Impulse Dynamics, Rhythm Management Group, Medtronic Inc., Sanofi Inc., and WebMD. K.N. has no competing interests. E.M. is a consultant for Boston Scientific, Medtronic, Abbott, and Zoll, and received research grants from Abbott, Boston Scientific, Medtronic, Microport, and Biotronik.

Data Availability

The data collected for the study cannot be publicly released because of European regulations and the requirement for permission from the original data owners for research purposes. However, for academic inquiries, you can contact the authors (E.M., Université Paris Cité: [email protected]) to obtain access to de-identified data through a Data Transfer Agreement procedure. Please note that the AI algorithm used in the study is patented, and we do not have the rights to share it. If you are interested in using the algorithm for an academic project, you can contact the authors (L.F.: [email protected] or T.C.: [email protected]) to discuss applying for an agreement procedure with Cardiologs/Philips. The Python code used to generate the data and perform the analyses is available at https://github.com/carbonati/predict-vt.

Funding

Cardiologs enabled the collection of the database and enabled the provision of the human resources necessary for data management and the development of the algorithm and its validation.

Ethical Approval

The study protocol was approved by the local Institutional Review Board, and the need for individual informed consent was waived.

Pre-registered Clinical Trial Number

Not applicable. Since this is a retrospective and non-interventional study, there was no pre-registered clinical trial number.

References

1. Marijon E, Narayanan K, Smith K, Barra S, Basso C, Blom MT, et al. The Lancet Commission to reduce the global burden of sudden cardiac death: a call for multidisciplinary action. Lancet 2023;402:883–936.

2. Lown B, Wolf M. Approaches to sudden death from coronary heart disease. Circulation 1971;44:130–42.

3. Marijon E, Garcia R, Narayanan K, Karam N, Jouven X. Fighting against sudden cardiac death: need for a paradigm shift-adding near-term prevention and pre-emptive action to long-term prevention. Eur Heart J 2022;43:1457–64.

4. Sasson C, Rogers MAM, Dahl J, Kellermann AL. Predictors of survival from out-of-hospital cardiac arrest. Circ Cardiovasc Qual Outcomes 2010;3:63–81.

5. Goldberger JJ, Cain ME, Hohnloser SH, Kadish AH, Knight BP, Lauer MS, et al. American Heart Association/American College of Cardiology Foundation/Heart Rhythm Society scientific statement on noninvasive risk stratification techniques for identifying patients at risk for sudden cardiac death. A scientific statement from the American Heart Association Council on Clinical Cardiology Committee on Electrocardiography and Arrhythmias and Council on Epidemiology and Prevention. J Am Coll Cardiol 2008;52:1179–99.

6. Trayanova NA, Topol EJ. Deep learning a person’s risk of sudden cardiac death. Lancet 2022;399:1933.

7. Stecker EC, Vickers C, Waltz J, Socoteanu C, John BT, Mariani R, et al. Population-based analysis of sudden cardiac death with and without left ventricular systolic dysfunction: two-year findings from the Oregon Sudden Unexpected Death Study. J Am Coll Cardiol 2006;47:1161–6.

8. Coumel P. The management of clinical arrhythmias. An overview on invasive versus non-invasive electrophysiology. Eur Heart J 1987;8:92–9.

9. Shen MJ, Zipes DP. Role of the autonomic nervous system in modulating cardiac arrhythmias. Circ Res 2014;114:1004–21.

10. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521:436–44.

11. Attia ZI, Noseworthy PA, Lopez-Jimenez F, Asirvatham SJ, Deshmukh AJ, Gersh BJ, et al. An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction. Lancet 2019;394:861–7.

12. Singh JP, Fontanarava J, de Massé G, Carbonati T, Li J, Henry C, et al. Short-term prediction of atrial fibrillation from ambulatory monitoring ECG using a deep neural network. Eur Heart J Digit Health 2022;3:208–17.

13. Savage N. Breaking into the black box of artificial intelligence. Nature 2022.

14. Zeppenfeld K, Tfelt-Hansen J, de Riva M, Winkel BG, Behr ER, Blom NA, et al. 2022 ESC guidelines for the management of patients with ventricular arrhythmias and the prevention of sudden cardiac death. Eur Heart J 2022;43:3997–4126.

15. Al-Khatib SM, Stevenson WG, Ackerman MJ, Bryant WJ, Callans DJ, Curtis AB, et al. 2017 AHA/ACC/HRS guideline for management of patients with ventricular arrhythmias and the prevention of sudden cardiac death. Circulation 2018;138:e272–391.

16. Fiorina L, Maupain C, Gardella C, Manenti V, Salerno F, Socie P, et al. Evaluation of an ambulatory ECG analysis platform using deep neural networks in routine clinical practice. J Am Heart Assoc 2022;11:e026196.

17. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas: IEEE, 2016, 770–8.

18. Ilse M, Tomczak JM, Welling M. Attention-based deep multiple instance learning. arXiv, 13 February 2018, preprint: not peer reviewed.

19. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S (eds.), Advances in Neural Information Processing Systems. Curran Associates, Inc., 2017, 5998–6008.

20. Zhang H, Wu C, Zhang Z, Zhu Y, Zhang Z, Lin H, et al. ResNeSt: split-attention networks. arXiv, 19 April 2020, preprint: not peer reviewed.

21. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. In: 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE, 2017, 618–26.

22. Sundararajan M, Taly A, Yan Q. Axiomatic attribution for deep networks. arXiv, 4 March 2017, preprint: not peer reviewed.

23. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44:837–45.

24. Marijon E, Uy-Evanado A, Dumas F, Karam N, Reinier K, Teodorescu C, et al. Warning symptoms are associated with survival from sudden cardiac arrest. Ann Intern Med 2016;164:23–9.

25. Lin C-S, Liu W-T, Tsai D-J, Lou Y-S, Chang C-H, Lee C-C, et al. AI-enabled electrocardiography alert intervention and all-cause mortality: a pragmatic randomized clinical trial. Nat Med 2024;30:1461–70.

26. Cohen-Shelly M, Attia ZI, Friedman PA, Ito S, Essayagh BA, Ko WY, et al. Electrocardiogram screening for aortic valve stenosis using artificial intelligence. Eur Heart J 2021;42:2885–96.

27. Ko W-Y, Siontis KC, Attia ZI, Carter RE, Kapa S, Ommen SR, et al. Detection of hypertrophic cardiomyopathy using a convolutional neural network-enabled electrocardiogram. J Am Coll Cardiol 2020;75:722–33.

28. Kwon J-M, Kim K-H, Jeon K-H, Lee SY, Park J, Oh B-H. Artificial intelligence algorithm for predicting cardiac arrest using electrocardiography. Scand J Trauma Resusc Emerg Med 2020;28:98.

29. Arun N, Gaw N, Singh P, Chang K, Aggarwal M, Chen B, et al. Assessing the trustworthiness of saliency maps for localizing abnormalities in medical imaging. Radiol Artif Intell 2021;3:e200267.

Author notes

Laurent Fiorina and Tanner Carbonati contributed equally to this work.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected] for reprints and translation rights for reprints. All other permissions can be obtained through our RightsLink service via the Permissions link on the article page on our site—for further information please contact [email protected].
