Abstract

Background/Introduction

Although a growing range of data sources is available for cardiovascular disease (CVD) prediction models, it remains unclear how much such data improve model performance in data-driven cardiovascular research.

Purpose

To compare the contributions of three sets of input features (basic clinical factors, the variables of the European Society of Cardiology Systematic Coronary Risk Evaluation (ESC SCORE), and multidimensional risk factors) to the CVD prediction performance of artificial neural networks (ANNs), using input features derived from a large-scale medical claims database.

Methods

We extracted data through the National Health Insurance Sharing Service and collected information on 258,896 middle-aged individuals who were free of CVD at baseline (2009–2010) and were followed up for incident CVD until 2013. Multidimensional risk factors identifiable from the database were chosen based on a systematic review of published articles. Input features for the ANN were classified as follows: basic clinical factors (age, sex, and body mass index), ESC SCORE (age, sex, total cholesterol, systolic blood pressure, and cigarette smoking), and multidimensional risk factors (sociodemographic factors, lifestyle behaviors, underlying medical conditions, dental health, medication use, etc.). The data were partitioned into training and test sets in a 7:3 ratio, and the performance of each ANN model was evaluated with the area under the curve (AUC).
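The partitioning and evaluation steps can be illustrated with a minimal sketch. It is not the authors' pipeline: the records are synthetic, the function names `split_7_3` and `auc` are hypothetical, and a made-up risk score stands in for the output of a trained ANN. The AUC is computed here as the probability that a randomly chosen case receives a higher score than a randomly chosen non-case, which is one standard way to compute it.

```python
import random


def split_7_3(records, seed=0):
    """Shuffle the records and partition them into training (70%) and test (30%) sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """AUC as the fraction of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic stand-in data (assumption): every 10th person develops CVD, and
# cases tend to receive a somewhat higher score than non-cases.
rng = random.Random(42)
records = []
for i in range(1000):
    label = 1 if i % 10 == 0 else 0
    score = rng.gauss(0.5 if label else 0.0, 1.0)  # placeholder for an ANN output
    records.append((score, label))

# In a real analysis the model would be fitted on `train` only.
train, test = split_7_3(records)

test_scores = [s for s, _ in test]
test_labels = [y for _, y in test]
test_auc = auc(test_labels, test_scores)
print(f"train={len(train)} test={len(test)} test AUC={test_auc:.3f}")
```

The pairwise computation is quadratic in the number of records, so it suits only small illustrations; for a cohort of this size a rank-based implementation from an established library would be the practical choice.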

Results

The ANN model with multidimensional risk factors achieved higher prediction performance (AUC: 0.692) than the models with basic clinical factors (AUC: 0.671) and ESC SCORE (AUC: 0.684). Among the multidimensional risk factors, atrial fibrillation, family history, chronic kidney disease, retinal vein occlusion, dental caries, and antipsychotic and corticosteroid use were some of the strong predictors. However, adding multidimensional risk factors yielded only a marginal improvement (a 1.17% relative increase in AUC) over the ESC SCORE model.
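The 1.17% figure is consistent with a relative change computed from the rounded AUCs of the multidimensional and ESC SCORE models. A short check, assuming that definition (the helper name `relative_auc_change` is hypothetical):

```python
def relative_auc_change(auc_new, auc_ref):
    """Relative change in AUC, in percent, of a model against a reference model."""
    return (auc_new - auc_ref) / auc_ref * 100.0


# Multidimensional risk factors (0.692) versus ESC SCORE (0.684)
change = relative_auc_change(0.692, 0.684)
print(f"{change:.2f}%")  # prints 1.17%
```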

Conclusions

Adding multidimensional risk factors as input features to the ANN yielded only a marginal improvement in CVD prediction performance. When assessing cardiovascular risk from large-scale healthcare data, the variables included in the ESC SCORE should be considered first for inclusion in the model.

Performance of the ANN models for CVD

ANN model (by input features)    AUC      Change in AUC*
Basic Clinical Factors           0.671    (reference)
ESC SCORE                        0.684    +1.93%
Multidimensional Risk Factors    0.692    +3.31%

*Change from the Model with Basic Clinical Factors.

Funding Acknowledgement

Type of funding source: Public grant(s) – National budget only. Main funding source(s): Kyuwoong Kim received a scholarship from the BK21-plus education program provided by the National Research Foundation of the Republic of Korea. This work is a part of Kyuwoong Kim's PhD dissertation.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)