Key points
  • In a blinded, randomized, non-inferiority clinical trial, artificial intelligence (AI) was tested against sonographer assessment of left ventricular ejection fraction (LVEF), with final cardiologist evaluation as the reference.1

  • A total of 3769 echocardiographic studies from the Cedars-Sinai Medical Center database were initially screened; 274 were excluded because of poor image quality, and the remaining 3495 were analysed using an AI model trained at Stanford Health Care.2 Twenty-five sonographers quantified LVEF using either the monoplane four-chamber (2249 studies) or the biplane (1246 studies) Simpson method. Ten cardiologists were then shown either the sonographer or the AI quantitative LVEF analysis in a strictly blinded fashion and reported their final evaluation.

  • The primary endpoint was the proportion of studies with a change of more than 5% between the AI or sonographer evaluation and the final cardiologist assessment. Subgroup analyses of the primary endpoint were performed by method of LVEF evaluation, race, sex, image quality, patient location, and cardiologist prediction group. The secondary safety outcome was a substantial difference between the final cardiologist evaluation and the initial report.

  • The primary outcome occurred in 292 (16.8%) studies in the AI group and 478 (27.2%) studies in the sonographer group [difference of −10.4%; 95% confidence interval (CI), −13.2% to −7.7%; P < 0.001 for non-inferiority and P < 0.001 for superiority]. This reduction in the primary outcome in the AI group was consistent across all subgroups tested. The mean absolute difference between the AI evaluation and the cardiologist assessment of LVEF was 2.79%, vs. 3.77% between the sonographer and cardiologist evaluations (difference −0.97%; 95% CI, −1.33% to −0.54%; P < 0.001 for superiority).

  • The mean absolute difference between the original report and final cardiologist evaluation was 6.29% in the AI group and 7.23% in the sonographer group (difference −0.94%; 95% CI: −1.34% to −0.54%; P < 0.001 for superiority).

  • Cardiologists needed a median of 54 s [interquartile range (IQR): 31–95 s] to finalize the report in the AI group and 64 s (IQR: 36–108 s) in the sonographer group, with a mean difference of −8 s (95% CI: −12 to −4 s; P < 0.001). Sonographers took a median of 119 s (IQR: 77–173 s) to assess and annotate LVEF.

  • Between the initial and final cardiologist assessments, an LVEF of 35%, a clinically meaningful threshold for the indication for implantable defibrillators, was crossed in 22 of 1740 (1.3%) studies in the AI group and in 54 of 1755 (3.1%) studies in the sonographer group (P = 0.0004).
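As a sanity check on the figures reported above, the confidence interval for the difference in primary-outcome proportions can be approximately reproduced from the event counts with a standard Wald interval for two independent proportions. The sketch below uses the group sizes (1740 and 1755) quoted in the threshold-crossing analysis as denominators; the trial's exact denominators and interval method may differ slightly, which explains the small rounding discrepancy (−10.5% here vs. the reported −10.4%).

```python
import math

# Primary-outcome events as reported: 292 studies in the AI arm,
# 478 in the sonographer arm; denominators assumed from the
# 35%-threshold analysis (1740 AI, 1755 sonographer).
n_ai, x_ai = 1740, 292
n_sono, x_sono = 1755, 478

p_ai = x_ai / n_ai        # ~0.168 (16.8%)
p_sono = x_sono / n_sono  # ~0.272 (27.2%)
diff = p_ai - p_sono      # ~-0.105 (about -10.4 percentage points)

# Unpooled (Wald) standard error for a difference of two proportions
se = math.sqrt(p_ai * (1 - p_ai) / n_ai + p_sono * (1 - p_sono) / n_sono)
lo, hi = diff - 1.96 * se, diff + 1.96 * se

print(f"difference = {diff:.1%}, 95% CI ({lo:.1%}, {hi:.1%})")
# → difference = -10.5%, 95% CI (-13.2%, -7.7%)
```

The interval endpoints match the published 95% CI of −13.2% to −7.7%.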

Comment on ‘Blinded, randomized trial of sonographer vs. AI cardiac function assessment’ published in Nature, doi: 10.1038/s41586-023-05947-3.

Comment

Widely applied in cardiac imaging, artificial intelligence (AI) finds an ideal application in volume and function quantification using echocardiography.3 The current work of He et al.,1 the first in this area to be randomized and blinded, confirms that AI is non-inferior, and even superior, to sonographer quantification of left ventricular ejection fraction (LVEF). The major strength of this paper resides in the possible application of a well-trained and well-tested AI algorithm in settings with limited cardiological expertise, where widening the use of echocardiography through AI might allow the evaluation of cardiac function even in rural areas.4

Furthermore, such a fast and accurate AI algorithm could be invaluable for facilitating research and discovery by standardizing the analysis of the millions of echocardiograms held in large databases, as shown by He et al.,1 provided that the issues surrounding possible breaches of confidentiality are solved. It is of note that this is the largest test–retest study of clinician variability and reproducibility in LVEF assessment, with potentially relevant implications for clinical practice. In this regard, the trial showed that the use of AI decreased variability between independent clinician assessments.

The study has several potential limitations from a clinical standpoint. It is not based on a prospective reading design, since both sonographers and AI evaluated previously recorded cine-loops; thus, the image analysis and interpretation might not reflect the best diagnostic conditions. In real-world practice, sonographers and cardiologists evaluate LVEF directly on the cine-loops they have acquired. Furthermore, only a minority of studies (36%) were quantified by the conventional biplane Simpson method. The authors did not discuss the reasons for this approach (were the images inadequate to analyse both views, considering that the two-chamber view is always more difficult to obtain?), which implies that the assessment of LVEF could have been suboptimal in two-thirds of the sample.4

It is of note that the authors do not provide quantitative LVEF data, only percentages of change between differently obtained LVEF values, nor do they present data on regional wall motion abnormalities.5 Since the intra- and inter-observer variability of LVEF quantification is much lower when assessing hearts with normal function, AI, sonographer, and cardiologist evaluations might have differed across LVEF categories or in the presence of regional wall motion abnormalities.5

The large potential of AI to improve the interpretation of echocardiographic images is not fully explored in this study. For example, it would have been interesting to process with AI the 274 studies deemed inadequate for tracing by sonographers, to verify whether AI could help recognize endocardial borders even in challenging cases. In fact, the results of this study can be extrapolated only to reasonably good-quality images and to patients with normal body mass index.

It is of note that, in this study,1 cardiologists are used as the gold standard, although AI might even outperform cardiologists in predicting ejection fraction (EF) or other abnormalities that are clinically relevant but not visible to the human eye.

The outcomes of this large analysis further support the potential advantages of AI in echocardiography to improve the quality of diagnostic assessment, reduce human inter-observer variability and costs, and minimize disparities in access to medical care. However, this study has not explored all potential applications of AI in echocardiography, such as view identification, image segmentation, quantification of structure and function, and disease detection. These more advanced tasks result from the application of deep learning algorithms.6 In this regard, Zhang and colleagues6 applied a deep learning algorithm that succeeded in view recognition, with 96% accuracy for distinguishing between broad echocardiographic view classes and 84% accuracy overall, even in partially obscured views.

Automated, fast, and reproducible AI solutions could also pave the way towards multimodality integration in medical imaging, with data simultaneously obtained from echocardiography and other imaging techniques, with the aim of developing a new and more effective diagnostic paradigm.7 Furthermore, this and further studies validating AI models in this area might promote a 'democratization' of access to echocardiographic assessment of cardiac function.

Declarations

Disclosure of Interest

All authors declare no conflict of interest for this contribution.

References

1. He B, Kwan AC, Cho JH, Yuan N, Pollick C, Shiota T, et al. Blinded, randomized trial of sonographer versus AI cardiac function assessment. Nature 2023;616:520–524. https://doi.org/10.1038/s41586-023-05947-3

2. Ouyang D, He B, Ghorbani A, Yuan N, Ebinger J, Langlotz CP, et al. Video-based AI for beat-to-beat assessment of cardiac function. Nature 2020;580:252–256. https://doi.org/10.1038/s41586-020-2145-8

3. Zhou J, Du M, Chang S, Chen Z. Artificial intelligence in echocardiography: detection, functional evaluation, and disease diagnosis. Cardiovasc Ultrasound 2021;19:29. https://doi.org/10.1186/s12947-021-00261-2

4. Lang RM, Badano LP, Mor-Avi V, Afilalo J, Armstrong A, Ernande L, et al. Recommendations for cardiac chamber quantification by echocardiography in adults: an update from the American Society of Echocardiography and the European Association of Cardiovascular Imaging. J Am Soc Echocardiogr 2015;28:1–39.e14. https://doi.org/10.1016/j.echo.2014.10.003

5. Cole GD, Dhutia NM, Shun-Shin MJ, Willson K, Harrison J, Raphael CE, et al. Defining the real-world reproducibility of visual grading of left ventricular function and visual estimation of left ventricular ejection fraction: impact of image quality, experience and accreditation. Int J Cardiovasc Imaging 2015;31:1303–1314. https://doi.org/10.1007/s10554-015-0659-1

6. Zhang J, Gajjala S, Agrawal P, Tison GH, Hallock LA, Beussink-Nelson L, et al. Fully automated echocardiogram interpretation in clinical practice. Circulation 2018;138:1623–1635. https://doi.org/10.1161/CIRCULATIONAHA.118.034338

7. van Assen M, Razavi AC, Whelton SP, De Cecco CN. Artificial intelligence in cardiac imaging: where we are and what we want. Eur Heart J 2023;44:541–543. https://doi.org/10.1093/eurheartj/ehac700
