From Bench-to-Bedside: How Artificial Intelligence is Changing Thyroid Nodule Diagnostics, a Systematic Review

Sant, Vivek R; Radhachandran, Ashwath; Ivezic, Vedrana; Lee, Denise T; Livhits, Masha J; Wu, James X; Masamed, Rinat; Arnold, Corey W; Yeh, Michael W; Speier, William

doi:10.1210/clinem/dgae277

Abstract

Context

Use of artificial intelligence (AI) to predict clinical outcomes in thyroid nodule diagnostics has grown exponentially over the past decade. The greatest challenge is in understanding the best model to apply to one's own patient population, and how to operationalize such a model in practice.

Evidence Acquisition

A literature search of PubMed and IEEE Xplore was conducted for English-language publications between January 1, 2015 and January 1, 2023, studying diagnostic tests on suspected thyroid nodules that used AI. We excluded articles without prospective or external validation, nonprimary literature, duplicates, articles focused on nonnodular thyroid conditions, articles not using AI, and those incidentally using AI in support of an experimental diagnostic outside standard clinical practice. Quality was graded by Oxford level of evidence.

Evidence Synthesis

A total of 61 studies were identified; all performed external validation, 16 studies were prospective, and 33 compared a model to physician prediction of ground truth. Statistical validation was reported in 50 papers. A diagnostic pipeline was abstracted, yielding 5 high-level outcomes: (1) nodule localization, (2) ultrasound (US) risk score, (3) molecular status, (4) malignancy, and (5) long-term prognosis. Seven prospective studies validated a single commercial AI; strengths included automating nodule feature assessment from US and assisting the physician in predicting malignancy risk, while weaknesses included automated margin prediction and interobserver variability.

Conclusion

Models predominantly used US images to predict malignancy. Of 4 Food and Drug Administration–approved products, only S-Detect was extensively validated. Implementing an AI model locally requires data sanitization and revalidation to ensure appropriate clinical performance.

Keywords: artificial intelligence, machine learning, thyroid nodules, diagnostics

Thyroid nodules are highly prevalent, affecting more than 50% of people older than 50 years (1, 2). Increasingly, their discovery may be incidental, given the increased use of cross-sectional imaging (3). Nodules may span the diagnostic spectrum from benign and asymptomatic, to symptomatic, to cancerous. Diagnostic workup often involves several steps, associated cost, and resultant patient anxiety (4-6).

Ultrasound (US) and fine-needle aspiration (FNA) biopsy have been the gold standard for thyroid nodule assessment, but performance and interpretation are plagued by variability across physicians interpreting US findings, performing the FNA, and interpreting cytology, yielding frequent inadequate or indeterminate results (7, 8). Assessment of a thyroid nodule represents an area where technology may offer opportunities for improvement (9). With the development of new techniques and greater accessibility of computational power, artificial intelligence (AI) has been applied by thyroid researchers over the past decade to automate nodule localization and improve risk stratification, to decrease operator variability and the need for potentially unnecessary biopsies or diagnostic surgery (10).

Numerous AI models have been proposed in the literature, but the greatest challenge is in understanding a model's applicability to one's own patient population, and how to operationalize such a model in practice. Efforts have been made to identify available models. Toro-Tobon et al (11) systematically identify manuscripts applying AI to various thyroid conditions, reporting on model performance and stage of development. Similarly, Taha et al (12) review a selection of 18 AI models in thyroid diagnostics. Ludwig et al (13) report on a selection of 33 models using AI to diagnose and classify thyroid nodules. However, these reviews do not systematically assess relative model strength and external validity, making assessment of future or external performance difficult. Our aim is to perform a systematic literature review to identify and compare the various AI models developed for thyroid nodule diagnostics, their relative strength and external validity, and the clinical inputs used for each outcome predicted.

Materials and Methods

Systematic Literature Search

An inclusive query was developed and then refined to ensure that known relevant articles appeared in the results. The PubMed and IEEE Xplore databases were queried with the following terms: (artificial intelligence OR machine learning OR deep learning OR neural network) AND (thyroid). The search was restricted to English-language papers published on or after January 1, 2015, through the search date, January 1, 2023. Titles and abstracts were screened for relevance (985 studies), followed by full-text review (448 studies) to determine inclusion. During full-text review, relevant articles identified in citations were reviewed for inclusion. The PRISMA flow diagram (14) denotes the number of studies excluded at each step (Fig. 1).
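
As a rough illustration of the search logic described above, the sketch below assembles the Boolean query from the listed terms and applies the stated date window locally. The function names (`build_query`, `in_window`) are hypothetical, and the text does not specify how the database interfaces were actually driven, so this is only one plausible way to express the same restrictions.

```python
from datetime import date

# Search terms exactly as listed in the text
AI_TERMS = ["artificial intelligence", "machine learning", "deep learning", "neural network"]
TOPIC_TERMS = ["thyroid"]

# Date window stated in the text (treated as inclusive here, which is an assumption)
START = date(2015, 1, 1)
END = date(2023, 1, 1)


def build_query(ai_terms, topic_terms):
    """Join each term group with OR, then combine the two groups with AND."""
    ai_block = " OR ".join(ai_terms)
    topic_block = " OR ".join(topic_terms)
    return f"({ai_block}) AND ({topic_block})"


def in_window(published, start=START, end=END):
    """True when a publication date falls inside the search window."""
    return start <= published <= end


query = build_query(AI_TERMS, TOPIC_TERMS)
```

Keeping the term groups as lists makes it easy to refine the query until known relevant articles appear in the results, as the authors describe.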

Figure 1.

Preferred reporting items for systematic reviews and meta-analyses (PRISMA) flow diagram. Meta-analysis was not performed as part of this review.

Study Selection

We included original research studying diagnostic tests on suspected thyroid nodules that used AI. AI techniques included non–deep learning (eg, regression, decision tree, random forest) and deep-learning (eg, neural network) techniques. We excluded articles without prospective or external validation, nonprimary literature, duplicates, articles focused on nonnodular thyroid conditions, articles not using AI, and those incidentally using AI in support of an experimental diagnostic outside standard clinical practice (eg, spectrometry, spectroscopy).

Data Abstraction and Analysis

Each paper was independently reviewed by 2 of 3 authors (V.R.S., A.R., and V.I.), first for inclusion, then for data extraction, and discrepancies were addressed by discussion. Details including sample size, study design, statistical methods, AI methodology, clinical inputs, and outcome predicted were tabulated and level of evidence was rated according to the Oxford Centre for Evidence-Based Medicine (15).
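
The dual-review step can be sketched as follows. This is a minimal illustration, assuming a simple rotation through the three possible reviewer pairs (the text does not say how pairs were chosen); the function names and the toy decisions are hypothetical and are not data from the review.

```python
from itertools import combinations, cycle

REVIEWERS = ["V.R.S.", "A.R.", "V.I."]  # initials from the text


def assign_pairs(paper_ids, reviewers=REVIEWERS):
    """Rotate through every 2-reviewer pairing so each paper gets 2 of the 3 reviewers."""
    pairs = cycle(combinations(reviewers, 2))
    return {pid: next(pairs) for pid in paper_ids}


def find_discrepancies(decisions):
    """Return paper ids whose two independent decisions differ.

    `decisions` maps paper id -> {reviewer: "include" | "exclude"}.
    """
    return [pid for pid, votes in decisions.items() if len(set(votes.values())) > 1]


# Hypothetical toy data for illustration only
assignments = assign_pairs(["p1", "p2", "p3", "p4"])
decisions = {
    "p1": {"V.R.S.": "include", "A.R.": "include"},
    "p2": {"V.R.S.": "include", "V.I.": "exclude"},
}
to_discuss = find_discrepancies(decisions)  # papers to resolve by discussion
```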

Results

In total, 61 studies met the inclusion criteria, all of which included external validation (Table 1). Oxford level of evidence ranged from 2 to 3 (77). Of these 61 studies, 33 also compared the model's performance to physician assessment (physician comparison), and 16 were prospective in study design. The intersection of study designs is shown in Fig. 2. Statistical validation of model stability was reported in 50 papers (77).

Figure 2.

Intersection of study designs. This Venn diagram quantifies the different study designs captured in this review.

Table 1.

Included articles and outcomes assessed

Source by outcome	Outcome ground truth
Nodule localization
Nie et al, 2022 (16)	Nodule contour
Ultrasound risk score
Chi et al, 2017 (17)	US risk score
Duan et al, 2020 (18)	US risk score
Zhang et al, 2022 (19)	US risk score
Gao et al, 2018 (20)	US risk score
Bai et al, 2020 (21)	US risk score
Molecular status
Anand et al, 2021 (22)	Molecular test
Malignancy
Pankratz et al, 2016 (23)	Surgical pathology
Wang et al, 2018 (24)	Surgical pathology
Song et al, 2020 (25)	Cytology
Zhou et al, 2020 (26)	Cytology
Zhou et al, 2020 (27)	Cytology
Böhland et al, 2021 (28)	Surgical pathology
Lee et al, 2021 (29)	Cytology
Park et al, 2021 (30)	Surgical pathology
Zhang et al, 2021 (31)	Cytology
Jia et al, 2022 (32)	Surgical pathology
Jin et al, 2022 (33)	Cytology, surgical pathology
Keutgen et al, 2022 (34)	Surgical pathology
Liu et al, 2022 (35)	Cytology
Randolph et al, 2022 (36)	Surgical pathology
Li et al, 2019 (37)	Cytology, surgical pathology
Park et al, 2019 (38)	Cytology, surgical pathology
Song et al, 2019 (39)	Cytology, surgical pathology
Koh et al, 2020 (40)	Cytology, surgical pathology
Wei et al, 2020 (41)	Surgical pathology
Zhang et al, 2020 (42)	Surgical pathology
Zhu et al, 2021 (43)	Cytology, surgical pathology
Zhu et al, 2021 (44)	Cytology
Zhu et al, 2021 (45)	Cytology, surgical pathology
Han et al, 2022 (46)	Cytology, surgical pathology
Yang et al, 2022 (47)	Cytology, surgical pathology
Barczyński et al, 2020 (48)	Surgical pathology
Liang et al, 2020 (49)	Cytology, surgical pathology
Zhang et al, 2020 (50)	Surgical pathology
Peng et al, 2021 (51)	Cytology, surgical pathology
Zhu et al, 2021 (52)	Surgical pathology
Kim et al, 2022 (53)	Cytology
Wang et al, 2022 (54)	Surgical pathology
Wang et al, 2022 (55)	Surgical pathology
Xu et al, 2022 (56)	Surgical pathology
Long-term prognosis
Abbasian et al, 2022 (57)	Metastases
Yu et al, 2022 (58)	Metastases
Zou et al, 2022 (59)	Metastases
Multi-outcome
Bhalla et al, 2020 (60)	Metastases, surgical pathology
Yang et al, 2020 (61)	Nodule contour, surgical pathology
Chen et al, 2021 (62)	Surgical pathology, survival/recurrence
Dolezal et al, 2021 (63)	Molecular test, surgical pathology
Swan et al, 2022 (64)	Surgical pathology, US risk score
Wu et al, 2022 (65)	Metastases, surgical pathology
Kim et al, 2019 (66)	Cytology, nodule presence, surgical pathology, US risk score
Han et al, 2021 (67)	Cytology, surgical pathology, US risk score
Liang et al, 2021 (68)	Cytology, nodule contour, surgical pathology, US risk score
Stenman et al, 2022 (69)	Surgical pathology, survival/recurrence
Choi et al, 2017 (70)	Cytology, nodule contour, surgical pathology, US risk score
Yoo et al, 2018 (71)	Cytology, US risk score
Jeong et al, 2019 (72)	Cytology, surgical pathology, US risk score
Xia et al, 2019 (73)	Cytology, surgical pathology, US risk score
Wei et al, 2020 (74)	Cytology, surgical pathology, US risk score
Cui et al, 2022 (75)	Surgical pathology, US risk score
Huang et al, 2022 (76)	Cytology, surgical pathology, US risk score

Abbreviation: US, ultrasound.

Starting with known clinical steps, a diagnostic pipeline was abstracted from the papers, representing the clinical data collected and used in AI models, and the resultant AI-predicted outcomes (Fig. 3). The distribution of data inputs used to predict clinical outcomes is shown in Fig. 4. Additional details on the 4 Food and Drug Administration (FDA)-approved thyroid AI diagnostic solutions are reported in Table 2. The following 5 high-level abstracted outcomes were used to organize reporting of the remaining results.

Figure 3.

Clinical workflow with resultant data inputs and predicted outcomes. A diagnostic pipeline was abstracted from the papers starting with known clinical steps and culminating in 5 high-level predicted outcomes. If a certain input modality was included in a paper for one of the predicted outcomes, the corresponding data collection box had an arrow pointing to that outcome.

Figure 4.

Distribution of data inputs used to predict clinical outcomes. Each paper was categorized by data input and predicted clinical outcome. If a paper was multi-input or multi-outcome, it was counted for each input-outcome combination.
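
The counting rule in this caption can be sketched as below: a paper with several inputs or several outcomes contributes one count to every input-outcome combination. The entries are hypothetical placeholders rather than papers from the review, and the function name is an assumption.

```python
from collections import Counter
from itertools import product


def count_input_outcome(papers):
    """Count every (input, outcome) combination once per paper."""
    counts = Counter()
    for paper in papers:
        # set() guards against a modality or outcome listed twice for one paper
        for pair in product(set(paper["inputs"]), set(paper["outcomes"])):
            counts[pair] += 1
    return counts


# Hypothetical entries for illustration only
papers = [
    {"inputs": ["US image"], "outcomes": ["malignancy"]},
    {"inputs": ["US image", "clinical data"], "outcomes": ["malignancy", "US risk score"]},
]
counts = count_input_outcome(papers)
```

Under this rule the totals can exceed the number of papers, which is worth keeping in mind when reading the distribution.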

Table 2.

Assessment of commercially available computer-aided diagnoses in the United States

Commercially available CAD	Input	Output	Nodule localization	Downstream clinical implications	Limitations	Clinical workflow integration
AmCAD-UT	Static ultrasound image	− Nodule features (echogenicity, echogenic foci, texture, margin, tumor shape, anechoic area percentage) − Ultrasound risk score (TI-RADS, ATA, other international systems) − Structured summary report	Automated	Provides a malignancy risk for several risk stratification systems, along with quantitative and visual data on key nodule features, allowing for faster and more efficient patient triaging	− Lack of extensive external validation in clinical setting − Expects ultrasounds with discrete nodules > 1 cm (78)	Windows-based platform intended for use on a PC or at a workstation (78, 79)
S-Detect	Static ultrasound image	− Nodule features (composition, shape, orientation, margins, echogenicity, and spongiform appearance) − Binary recommendation of possibly malignant/benign − Structured summary report	Semi-automated (user marks middle of nodule)		− Primarily improves diagnostic performance for junior radiologists (74) − Does not incorporate nonimaging patient data − Performance is dependent on choice of input image, which is dependent on operator experience − Potential discrepancy in nodule margin and composition assessment − Works only with vendor-specific ultrasound machines	Developed for a vendor-specific ultrasound machine (79)
Koios DS	2 static ultrasound images (axial and sagittal) (80) Location on thyroid	− Nodule features (echogenicity, echogenic foci, tumor shape, composition, margin) − Thyroid nodule dimensions − Ultrasound risk score (TI-RADS)	Manual		− Lack of extensive external validation of this software in real-world clinical setting − Requires manual input in the form of ROI to localize nodule (81, 82)	Used at PACS workstation (79)
Exo AI (formerly MEDO-Thyroid)	Static ultrasound image(s)	− Thyroid lobe and nodule dimensions − Ultrasound risk score (TI-RADS) (based on user input) − Structured summary report	Semi-automated	Improves ease of reporting an ultrasound study for clinicians	− Lack of extensive external validation of this software in real-world clinical setting − Requires manual input in the form of ROI to localize nodule − Tested only on Philips, GE, and Siemens ultrasound devices (83)	Cloud-based software

Abbreviations: ATA, American Thyroid Association; CAD, computer-aided diagnosis; PC, personal computer; ROI, region of interest; TI-RADS, Thyroid Imaging Reporting & Data System.

Nodule Localization

Localization of nodules in US images was assessed by 5 papers (16, 61, 66, 68, 70). Localization was performed through the tasks of detection and segmentation. Detection involves identifying an image region that contains the object of interest. To create training labels for the US studies, radiologists annotated nodules on US images with rectangular bounding boxes to serve as ground truth for model training (70). Segmentation involves identifying the specific contour of the object of interest. To create training labels, radiologists manually contoured nodules on US images to serve as ground truth for model training (16, 61).
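
To make the distinction between the two label types concrete, the sketch below derives a detection-style bounding box from a segmentation-style binary mask and scores the overlap between two masks with the Dice coefficient. Dice is a commonly used segmentation metric, but the text does not state which metric the included papers used, so its use here is an assumption; the masks are toy examples.

```python
def bounding_box(mask):
    """Smallest rectangle (row_min, col_min, row_max, col_max) enclosing all 1-pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    if not rows:
        return None  # no nodule pixels in this image
    return min(rows), min(cols), max(rows), max(cols)


def dice(mask_a, mask_b):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two same-sized binary masks."""
    inter = sum(a and b for ra, rb in zip(mask_a, mask_b) for a, b in zip(ra, rb))
    total = sum(map(sum, mask_a)) + sum(map(sum, mask_b))
    return 2 * inter / total if total else 1.0


# Toy 4x4 masks: 1 marks a nodule pixel (hypothetical, not study data)
truth = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
pred = [[0, 0, 0, 0],
        [0, 1, 1, 1],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]

box = bounding_box(truth)
overlap = dice(truth, pred)
```

A bounding box discards the contour, which is why a contour label can always be reduced to a box but not the other way around.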

The goals of such models were 2-fold: (1) to improve speed and standardization in determining whether an image had thyroid nodules present, and (2) to focus feature extraction on suspected nodule regions for use in downstream tasks. For instance, Yang et al (61) found that a model trained to predict malignancy from US images performed better when trained with nodule-focused features.

Prospective localization studies used Samsung's commercial computer-aided diagnosis (CAD) software, S-Detect, which first requests users to perform detection (70, 74-76). S-Detect then uses AI to propose segmentation masks for the user to choose from, before generating an AI-based US risk score and malignancy prediction. Despite the effort to improve standardization through semi-automated segmentation, Jeong et al (72) found that when the CAD was used by experienced radiologists, it performed better than when used by less experienced radiologists. They concluded that persistent performance variability was likely due to user variability, even in choosing among segmentation candidates.

Ultrasound Risk Score

Sixteen papers predicted sonographic features associated with risk of malignancy or a composite score such as the American College of Radiology Thyroid Imaging Reporting and Data System (TI-RADS) (66, 68, 70, 64-73, 74). Sonographic risk stratification systems aim to decrease variability in reporting nodule features and estimating malignancy risk, but interreporter and intrareporter variability persist (70). These papers used AI to decrease variability further, reduce the burden on physicians manually assessing a score, and improve US risk score performance.

Six papers prospectively validated the commercial product S-Detect, which predicts sonographic features as well as malignancy (70, 71-73, 74). Wei et al (74) found that less experienced radiologists benefited most from S-Detect assistance, bringing their TI-RADS assessments in line with those of experienced radiologists, which were considered as ground truth. Choi et al (70) and Xia et al (73) found substantial agreement between S-Detect and experienced radiologists for all feature predictions apart from margin. Jeong et al (72) and Yoo et al (71) reported similar results with discrepancy in composition and spongiform features, respectively.
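Agreement between a CAD and a radiologist on a categorical feature such as margin is often summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below assumes that statistic for illustration; whether each cited study used kappa is not stated here, and the feature labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two raters on the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Fraction of cases on which the two raters give the same label.
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    # Agreement expected if each rater labeled independently at their own rates.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical margin calls for four nodules.
    cad = ["smooth", "smooth", "irregular", "irregular"]
    radiologist = ["smooth", "irregular", "irregular", "irregular"]
    print(cohen_kappa(cad, radiologist))
```

On the widely used Landis and Koch scale, kappa values of 0.61 to 0.80 are described as substantial agreement.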

Cui et al (75) prospectively evaluated AI-TIRADS, a machine-learning model that reassigned new values to constituent TI-RADS features in an effort to preserve model explainability. They compared its performance to the American College of Radiology's original (2017) version of TI-RADS, and found a lower rate of unnecessary FNA (41.0% vs 47.8%) and missed cancer diagnosis (22.8% vs 27.5%).

Molecular Status

Testing tumors for genetic alterations and assessing differential RNA expression has proven helpful in predicting malignancy, tumor aggressiveness, and response to targeted therapy, but cost prohibits their routine use (60, 62). Two papers used AI to predict molecular status opportunistically from routinely obtained clinical inputs such as US, to provide an alternative: Anand et al (22) and Dolezal et al (63) analyzed cytology and surgical pathology slides to predict high-risk mutations. Dolezal et al (63) took this one step further, externally validating their model to predict BRAF-RAS gene status (area under the receiver operating characteristic curve [AUROC] 0.97) and noninvasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP), a liminal diagnosis with high interobserver variability.

Malignancy

The majority of papers (26) predicted malignancy, aiming to improve speed and reliability of diagnosis and to avoid unnecessary biopsies for patients (63-65, 60-67, 70-76). Malignancy predictions used one of two sources as ground truth: cytology (FNA) or surgical pathology (resection). Several papers used both, although to our knowledge none included nodules that were not biopsied and were presumed benign sonographically. Most studies excluded cytologically indeterminate (Bethesda III or IV) nodules from their analysis, with only 5 papers including them for evaluation (34, 35, 39, 47, 64).

All 16 prospective studies assessed malignancy, including 7 that evaluated the commercial product S-Detect. Barczyński et al (48) and Wei et al (74) found S-Detect had greater sensitivity and specificity in predicting malignancy from US compared to a junior physician, but less compared to a senior physician. Performance of junior physicians improved significantly when assisted by AI, but senior physician performance did not. Choi et al (70) found S-Detect to have similar sensitivity but lower specificity and AUROC compared to an experienced physician. Xia et al (73) found similar results, and in subgroup analysis noted a shortfall in model performance specifically with malignant nodules (papillary thyroid cancer or follicular thyroid cancer). Jeong et al (72) noted lower specificity compared to senior physicians only, as well as operator dependence of CAD performance as noted earlier. Yoo et al (71) found that a CAD-assisted physician had higher sensitivity (92.0%) than either physician (84.0%) or CAD (80.0%) alone, and Huang et al (76) found a similar result across all physician experience levels. Although there was no universal definition of experience level, “senior physicians” were typically those with more than 9 to 20 years of experience with thyroid US.

AI-SONIC is a commercial CAD that completely automates nodule detection, but is not currently approved for use in the United States. Zhang et al (50) prospectively studied AI-SONIC, finding that compared to a senior physician, the CAD had similar specificity (86.0% for both) but lower sensitivity (71.5% vs 95.2%) and AUROC (0.79 vs 0.91). However, they noted that CAD assistance improved sensitivity for both senior and junior physicians (from 95.2% to 97.8% and from 75.3% to 88.2%, respectively). Wang et al (55) found that AI-SONIC had greater AUROC (0.906) than senior radiologists (0.787) in classifying malignancy. They demonstrated that the AI system could achieve similar or higher sensitivity and specificity compared to each radiologist at an appropriate decision threshold. Xu et al (56) found a similar pattern, with an AI-SONIC AUROC of 0.76 outperforming senior radiologists. However, at the specific decision threshold the authors chose, AI-SONIC had much higher specificity than senior radiologists (0.71 vs 0.56) but lower sensitivity (0.69 vs 0.78).

Of the prospective studies, Kim et al (53) and Yoo et al (71) used cytology alone as malignancy ground truth, seven papers (48, 50, 52, 54-56, 75) used surgical pathology alone, and the rest used a more balanced mix of both, reflecting the respective potential biases as described above. One paper reported receiving funding from Samsung to study S-Detect (70) but the remaining prospective studies reported no industry conflicts of interest.

Long-term Prognosis

Long-term prognostic cancer outcomes were characterized by 7 papers, including metastases (60, 59-65) and recurrence/survival (62, 69). Thyroid cancer prognosis currently involves dynamic risk stratification, whereby initial risk is assessed from operative and histopathologic findings and modified over a subsequent time interval using clinical evidence of tumor response. AI models tried to predict tumor biology from the outset, with the aim of allowing the physician to better counsel the patient before surgery regarding anticipated cancer aggressiveness, lymph node metastases, and need for adjuvant treatment. Most models used imaging or genetic data alone as inputs, while Zou et al (59) additionally used clinical variables.

Bhalla et al (60) identified a 36-gene panel that could predict American Joint Committee on Cancer staging as I/II vs III/IV with a positive predictive value of 84% and sensitivity of 76%. Chen et al (62) trained and externally validated a model using DNA methylation data and predicted overall survival better than tumor, node, metastasis (TNM) staging did.

Discussion

The past decade has seen a major push to deliver on the promise of AI to improve physician workflow, patient outcomes, and health-care delivery. Physicians may benefit from faster, standardized, and less burdensome processes to predict nodule localization, sonographic features, and malignancy risk, and can more appropriately counsel patients regarding extent of surgery and de-escalation of care. Patients may avoid unnecessary biopsies and benefit from improved diagnostic accuracy, decreased need for diagnostic surgery, and clearer expectations for long-term prognosis. Health care may be delivered at lower cost as information later in the diagnostic pipeline is predicted opportunistically from routinely obtained data, and task automation allows health-care providers to practice at the top of their scope.

The Artificial Intelligence–Empowered Physician

For physicians who wish to use a commercial AI solution, 4 CADs have been approved by the FDA: AmCAD-UT, S-Detect, Koios DS, and MEDO-Thyroid, as reviewed in further detail in Table 2. Although none are currently in widespread clinical use, S-Detect has undergone the most extensive validation.

Sonographic risk stratification and malignancy prediction, outcomes with significant clinical effect and the most readily available data, received the most attention in analyzed papers. Specifically with commercial CADs, S-Detect had difficulty with automated margin and composition prediction. AI-SONIC validation studies demonstrated higher AUROC compared to radiologists of all skill levels. Importantly, whether sensitivity, specificity, or both, improved depended on the specific decision threshold chosen for AI-SONIC. Wang et al (55) and Xu et al (56) both selected thresholds by optimizing a combination of sensitivity and specificity. Adjusting the decision threshold may alter the balance between type I and type II errors, improving sensitivity or specificity at the expense of the other. Depending on the intended use in the diagnostic pipeline (eg, “rule-out test”), it may be more clinically useful to optimize a certain metric (eg, maximize sensitivity). Further work in identifying optimal decision thresholds for different uses will be valuable.
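The effect of the decision threshold can be illustrated with a small sketch. It assumes Youden's J (sensitivity + specificity - 1) as one possible "combination of sensitivity and specificity," and a rule-out policy that takes the highest threshold still meeting a minimum sensitivity; neither is confirmed as the method used by the cited studies, and the scores and labels are invented for illustration.

```python
from typing import List, Sequence, Tuple


def sens_spec(scores: Sequence[float], labels: Sequence[int],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= threshold is called malignant."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one malignant and one benign case")
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Threshold maximizing Youden's J = sensitivity + specificity - 1."""
    candidates: List[float] = sorted(set(scores))
    return max(candidates, key=lambda t: sum(sens_spec(scores, labels, t)) - 1)


def rule_out_threshold(scores: Sequence[float], labels: Sequence[int],
                       min_sensitivity: float) -> float:
    """Highest threshold whose sensitivity is still >= min_sensitivity."""
    eligible = [t for t in sorted(set(scores))
                if sens_spec(scores, labels, t)[0] >= min_sensitivity]
    if not eligible:
        raise ValueError("no threshold reaches the requested sensitivity")
    return max(eligible)


if __name__ == "__main__":
    # Hypothetical model outputs; label 1 = malignant, 0 = benign.
    scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
    labels = [0, 0, 1, 0, 0, 1, 1, 1]
    for name, t in [("balanced", youden_threshold(scores, labels)),
                    ("rule-out", rule_out_threshold(scores, labels, 1.0))]:
        sens, spec = sens_spec(scores, labels, t)
        print(f"{name}: threshold={t}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On this toy data the two policies select different thresholds from the same model, which is why comparisons of sensitivity or specificity between a CAD and a physician are only meaningful once the operating point is stated.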

Rather than trying to replace a physician, the greatest current value appeared to be in democratizing care via the concept of the “AI-empowered physician.” Studies found that the AI-empowered physician performed better than physician or AI alone. Several studies found that inexperienced physicians with AI assistance made malignancy predictions with accuracy similar to that of experienced physicians. This use of AI has tremendous potential to enable patients at nonspecialized centers to receive diagnostic care similar to that of patients presenting to tertiary-care centers. The biggest barrier to widespread adoption is the need for pilot testing in each local environment to identify how AI can provide the greatest benefit (eg, time, staffing, or quality improvement).

How a paper chose its data source for malignancy prediction affected potential bias. Some papers used cytology alone, others used surgical pathology alone, and some used both. Using surgical pathology alone limited data to patients who required surgery, typically those with larger nodules or higher suspicion of malignancy. Papers that used cytology alone had more benign cases, but could not account for false-negative biopsies, and typically excluded indeterminate results. Thus, caution should be used when extrapolating the findings of such papers beyond the population evaluated.

Although the literature is inundated with retrospective studies reporting successful nodule localization techniques, few validated these approaches prospectively or externally. In fact, most commercial CADs do not entirely automate nodule localization, reflecting that real-world segmentation is not easy to automate. Additionally, nodule localization is an important technical intermediary step but not a critical clinical end point, which may explain the dearth of prospective or external validation studies.

Molecular prediction was externally validated in few papers, reflecting how recently molecular testing has gained mainstream utility. In most countries outside the United States, molecular testing for indeterminate nodules has not entered routine practice. Although opportunistic molecular prediction has strong clinical potential, lack of data access may hold up practical application. Long-term outcome prediction suffered similar inattention. Although opportunities to develop such data sets are not limited by the same logistical and financial issues as molecular testing, maintaining long-term oncologic follow-up requires considerable effort. This data curation issue likely drove researchers to assess other, more readily available outcomes. Moreover, as a later stage in the clinical pipeline, long-term outcome prediction may have appeared comparatively lower-value in a disease with generally good prognosis.

The Ideal Artificial Intelligence Model

Critical review of an AI model's versatility requires assessment of several criteria including performance on prospective and external data sets, statistical validity, adequate training cohort size, and evaluation of bias in the training data set. Using a standardized methodology for model evaluation allows one to better distinguish clinically effective tools from poorly designed ones.

Models solely trained and evaluated on single retrospective data sets have uncertain external validity. Prospective evaluation and external validation promote exposure to unseen data, which can help uncover overfitting and bias. Beyond robust study design, reporting statistical validity through bootstrapping and cross-validation is important to understanding model stability. Of the reviewed papers, 82% included a CI, SE, or P value with their metrics of model performance. Such reporting helps evaluate variation in model performance on different data subsets, giving greater confidence in a model's performance on varied real-world data.
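As one illustration of how a CI can accompany a performance metric, the sketch below computes a percentile bootstrap interval for sensitivity by resampling cases with replacement. It is a generic example under stated assumptions (percentile method, 95% level, a fixed random seed for reproducibility), not the procedure of any reviewed paper, and the data are invented.

```python
import random
from typing import Callable, List, Sequence, Tuple


def sensitivity(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of malignant cases (label 1) that the model called malignant."""
    positives = sum(1 for y in labels if y == 1)
    true_pos = sum(1 for p, y in zip(preds, labels) if y == 1 and p == 1)
    return true_pos / positives


def bootstrap_ci(preds: Sequence[int], labels: Sequence[int],
                 metric: Callable[[Sequence[int], Sequence[int]], float],
                 n_resamples: int = 2000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for a metric, resampling cases with replacement."""
    if 1 not in labels:
        raise ValueError("at least one malignant case is required")
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_labels = [labels[i] for i in idx]
        if 1 not in sample_labels:
            continue  # sensitivity is undefined without malignant cases
        stats.append(metric([preds[i] for i in idx], sample_labels))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical validation set: 20 malignant (16 detected) and 30 benign nodules.
    preds = [1] * 16 + [0] * 4 + [0] * 30
    labels = [1] * 20 + [0] * 30
    low, high = bootstrap_ci(preds, labels, sensitivity)
    print(f"sensitivity={sensitivity(preds, labels):.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

A wide interval on a small validation set signals that the point estimate alone says little about expected performance on new patients.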

Next, sample size and data set curation must be considered. A minimum sample size is typically necessary to ensure data diversity, although numbers are application-specific. However, even large data sets may yield limited diversity due to poor data set curation and risk developing models that are overfit to idiosyncrasies unique to the training data. Validation of an AI model should be performed on a data set that closely reflects the expected real-world use case. For example, most validation studies excluded cytologically indeterminate nodules, which comprise a substantial portion of real-world thyroid nodule FNA results. Excluding indeterminate nodules in studies is problematic because model evaluation on an idealized data set would be an unrealistic representation of model performance in a clinical setting. A model that is not trained on indeterminate nodules would perform unpredictably when encountering them during prospective or external model validation. Thus, expected performance of existing AI models for such nodules in clinical practice remains unclear. Similarly, given the low overall prevalence of malignancy in thyroid nodules, it is important that a validation data set reflect this pattern as well. Some studies artificially balance the frequency of benign and malignant cases for model training, and use similar proportions during model validation, not recognizing that these validation results may not reflect performance in the expected target population.
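The consequence of validating on an artificially balanced data set follows directly from Bayes' rule: sensitivity and specificity are unchanged by prevalence, but predictive values are not. The sketch below uses illustrative numbers (90% sensitivity and specificity; prevalences of 50% and 10%) that are assumptions for demonstration, not figures from the reviewed studies.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Positive and negative predictive value at a given malignancy prevalence."""
    for value in (sensitivity, specificity, prevalence):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be proportions between 0 and 1")
    # Expected fraction of the population in each confusion-matrix cell.
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("predictive value undefined for these inputs")
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Same hypothetical model evaluated in a balanced set and a low-prevalence set.
    for prevalence in (0.5, 0.1):
        ppv, npv = predictive_values(0.9, 0.9, prevalence)
        print(f"prevalence={prevalence:.0%}: PPV={ppv:.2f}, NPV={npv:.2f}")
```

With these assumed values, the positive predictive value falls from 0.90 in the balanced set to 0.50 at 10% prevalence, so half of the nodules flagged as malignant would be benign even though the model itself is unchanged.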

Rigorous implementation of these methods is intensive but crucial to improving confidence in the real-world validity of an AI model. While we see their utility in thyroid nodule diagnostics, they apply even more broadly in advancing AI-powered diagnostics in other disease domains.

Future Directions

We anticipate the next decade to yield both advancements in AI research applications as well as improvement in integration of AI solutions into mainstream clinical practice.

As seen in Fig. 4, few studies leveraged rich data modalities such as clinical variables and cytopathology slides. Future technical innovations should aim to integrate multiple data sources in a multimodal approach that could emulate a physician's synthesis of all available clinical data to provide the most accurate prediction. In a similar vein, future methods leveraging US should explore the effect of multiple frames (through cine clips or sampling full-image studies) in improving automated diagnosis performance. Given that in clinical practice physicians make use of multiple views of a nodule to make a diagnosis, using multiple frames could address a limitation of most existing CAD systems, which rely on manually selected, single-frame inputs. Several outcomes have also received minimal attention, such as prediction of molecular status and long-term outcomes, although they have high clinical value. Expanding data access through more effective data-sharing and publication of open-source data sets will enhance research replication and refinement of existing methodologies. Lastly, stronger collaboration between computational researchers and clinicians will ensure that the most important clinical problems are addressed, and that models are designed most effectively.

Regarding clinical integration, 4 commercial CADs are FDA approved for use but no reports in the literature mention their routine clinical use, to our knowledge. Each takes slightly different inputs and has slightly different strengths and potential use cases. Building physician trust and adapting to local practice variation will be the 2 most important steps to enabling integration of AI solutions into mainstream clinical use. Deep-learning models in particular can appear to some as a “black-box,” and in the absence of thorough external validation, may engender physician mistrust in the predicted results. Tools such as Gradient-weighted Class Activation Mapping (Grad-CAM) to visualize influential regions in an input image may help overcome this issue (46). Even without obvious explainability, rigorous prospective and external validation may help physicians eventually feel comfortable using a model.

The second important hurdle to overcome is local variation. Institutions use different US machines, serve patients with differing characteristics, and use different workflows. In contrast, the data sets used to train AI models are often “idealized” in that they lack artifacts and noise, furthering the need for effective, real-world validation. Integration of AI will require careful local stakeholder and workflow analysis to select the most impactful value proposition for that site, as well as infrastructure to locally pilot models.

Limitations

The included studies had significant heterogeneity in model technique, data sources, and outcomes predicted. As a result, direct comparison between studies was not possible. While papers were limited to the English language, most models were trained on data sets from China, Korea, or India, where screening and surgical practices are notably different compared to the United States and Europe. This challenges model validity outside their respective environments. Even within a similar clinical environment, implementing a model locally requires data sanitization, revalidation, and potential retraining to ensure performance.

Conclusion

We identified 61 studies using AI to predict thyroid nodule–focused outcomes, all of which performed external validation; 16 were prospective and 33 compared model performance with that of physicians. Models using US images to predict malignancy predominated, and only 1 of the 4 FDA-approved CADs (S-Detect) was extensively validated. Further validation and integration into clinical workflow will help models achieve greater clinical utility.

Funding

This work was supported by the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health (award No. R21EB030691) as well as a UCLA Radiology Exploratory Research Grant.

Disclosures

The authors have nothing to disclose.

Data Availability

Some or all data sets generated during and/or analyzed during the current study are not publicly available but are available from the corresponding author on reasonable request.

References

1

Mortensen

JD

,

Woolner

LB

,

Bennett

WA

.

Gross and microscopic findings in clinically normal thyroid glands

.

J Clin Endocrinol Metab

.

1955

;

15

(

10

):

1270

‐

1280

.

2

Wang

C

,

Crapo

C

.

The epidemiology of thyroid disease and implications for screening

.

Endocrinol Metab Clin North Am

.

1997

;

26

(

1

):

189

‐

218

.

3

Hoang

JK

,

Langer

JE

,

Middleton

WD

, et al.

Managing incidental thyroid nodules detected on imaging: white paper of the ACR Incidental Thyroid Findings Committee

.

J Am Coll Radiol

.

2015

;

12

(

2

):

143

‐

150

.

4

Hamberger

B

,

Gharib

H

,

Melton

LJ

,

Goellner

JR

,

Zinsmeister

AR

.

Fine-needle aspiration biopsy of thyroid nodules. Impact on thyroid practice and cost of care

.

Am J Med

.

1982

;

73

(

3

):

381

‐

384

.

5

Kuo

EJ

,

Wu

JX

,

Zanocco

KA

.

Cost effectiveness of immediate biopsy versus surveillance of intermediate-suspicion thyroid nodules

.

Surgery

.

2018

;

164

(

6

):

1330

‐

1335

.

6

Pitt

SC

,

Saucke

MC

,

Wendt

EM

, et al.

Patients’ reaction to diagnosis with thyroid cancer or an indeterminate thyroid nodule

.

Thyroid

.

2021

;

31

(

4

):

580

‐

588

.

7

Park

CS

,

Kim

SH

,

Jung

SL

, et al.

Observer variability in the sonographic evaluation of thyroid nodules

.

J Clin Ultrasound

.

2010

;

38

(

6

):

287

‐

293

.

8

Cibas

ES

,

Ali

SZ

.

The 2017 Bethesda system for reporting thyroid cytopathology

.

Thyroid

.

2017

;

27

(

11

):

1341

‐

1346

.

9

Bini

F

,

Pica

A

,

Azzimonti

L

, et al.

Artificial intelligence in thyroid field-A comprehensive review

.

Cancers (Basel)

.

2021

;

13

(

19

):

4740

.

10

Tessler

FN

,

Thomas

J

.

Artificial intelligence for evaluation of thyroid nodules: a primer

.

Thyroid

.

2023

;

33

(

2

):

150

‐

158

.

11

Toro-Tobon

D

,

Loor-Torres

R

,

Duran

M

, et al.

Artificial intelligence in thyroidology: a narrative review of the current applications, associated challenges, and future directions

.

Thyroid

.

2023

;

33

(

8

):

903

‐

917

.

12

Taha

A

,

Saad

B

,

Taha-Mehlitz

S

, et al.

Analysis of artificial intelligence in thyroid diagnostics and surgery: a scoping review

.

Am J Surg

.

2023

;

229

:

57

‐

64

.

13

Ludwig

M

,

Ludwig

B

,

Mikuła

A

,

Biernat

S

,

Rudnicki

J

,

Kaliszewski

K

.

The use of artificial intelligence in the diagnosis and classification of thyroid nodules: an update

.

Cancers (Basel)

.

2023

;

15

(

3

):

708

.

14

Page

MJ

,

McKenzie

JE

,

Bossuyt

PM

, et al.

The PRISMA 2020 statement: an updated guideline for reporting systematic reviews

.

BMJ

.

2021

;

372

:

n71

.

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

15

OCEBM Levels of Evidence Working Group

. The Oxford Levels of Evidence 2. Oxford Centre for Evidence-Based Medicine. Available at: https://www.cebm.ox.ac.uk/resources/levels-of-evidence/ocebm-levels-of-evidence. Accessed August 7, 2023.

16

Nie

X

,

Zhou

X

,

Tong

T

, et al.

N-Net: a novel dense fully convolutional neural network for thyroid nodule segmentation

.

Front Neurosci

.

2022

;

16

:

872601

.

17

Chi

J

,

Walia

E

,

Babyn

P

,

Wang

J

,

Groot

G

,

Eramian

M

.

Thyroid nodule classification in ultrasound images by fine-tuning deep convolutional neural network

.

J Digit Imaging

.

2017

;

30

(

4

):

477

‐

486

.

18

Duan

X

,

Duan

S

,

Jiang

P

, et al.

An ensemble deep learning architecture for multilabel classification on TI-RADS

. In:

2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Seoul, Korea (South): IEEE; 2020:576-582

.

19

Zhang

X

,

Lee

VCS

,

Rong

J

,

Liu

F

,

Kong

H

.

Multi-channel convolutional neural network architectures for thyroid cancer detection

.

PLoS One

.

2022

;

17

(

1

):

e0262128

.

20

Gao

L

,

Liu

R

,

Jiang

Y

, et al.

Computer-aided system for diagnosing thyroid nodules on ultrasound: a comparison with radiologist-based clinical assessments

.

Head Neck

.

2018

;

40

(

4

):

778

‐

783

.

21

Bai

Z

,

Chang

L

,

Yu

R

, et al.

Thyroid nodules risk stratification through deep learning based on ultrasound images

.

Med Phys

.

2020

;

47

(

12

):

6355

‐

6365

.

22

Anand

D

,

Yashashwi

K

,

Kumar

N

,

Rane

S

,

Gann

PH

,

Sethi

A

.

Weakly supervised learning on unannotated H&E-stained slides predicts BRAF mutation in thyroid cancer with high accuracy

.

J Pathol

.

2021

;

255

(

3

):

232

‐

242

.

23

Pankratz

DG

,

Hu

Z

,

Kim

SY

, et al.

Analytical performance of a gene expression classifier for medullary thyroid carcinoma

.

Thyroid

.

2016

;

26

(

11

):

1573

‐

1580

.

24

Wang

J

,

Li

S

,

Song

W

,

Qin

H

,

Zhang

B

,

Hao

A.

Learning from weakly-labeled clinical data for automatic thyroid nodule classification in ultrasound images

. In:

2018 25th IEEE International Conference on Image Processing (ICIP). Athens: IEEE; 2018:3114-3118

.

25

Song

R

,

Zhang

L

,

Zhu

C

,

Liu

J

,

Yang

J

,

Zhang

T

.

Thyroid nodule ultrasound image classification through hybrid feature cropping network

.

IEEE Access

.

2020

;

8

:

64064

‐

64074

.

Google Scholar

Crossref

WorldCat

26

Zhou

H

,

Jin

Y

,

Dai

L

, et al.

Differential diagnosis of benign and malignant thyroid nodules using deep learning radiomics of thyroid ultrasound images

.

Eur J Radiol

.

2020

;

127

:

108992

.

27

Zhou

H

,

Wang

K

,

Tian

J

.

Online transfer learning for differential diagnosis of benign and malignant thyroid nodules with ultrasound images

.

IEEE Trans Biomed Eng

.

2020

;

67

(

10

):

2773

‐

2780

.

28

Böhland

M

,

Tharun

L

,

Scherr

T

, et al.

Machine learning methods for automated classification of tumors with papillary thyroid carcinoma-like nuclei: a quantitative analysis

.

PLoS One

.

2021

;

16

(

9

):

e0257635

.

29

Lee

H

,

Chai

YJ

,

Joo

H

, et al.

Federated learning for thyroid ultrasound image analysis to protect personal information: validation study in a real health care environment

.

JMIR Med Inform

.

2021

;

9

(

5

):

e25869

.

30

Park

KS

,

Kim

SH

,

Oh

JH

,

Kim

SY

.

Highly accurate diagnosis of papillary thyroid carcinomas based on personalized pathways coupled with machine learning

.

Brief Bioinformatics

.

2021

;

22

(

4

):

bbaa336

.

31

Zhang

Q

,

Zhang

S

,

Li

J

, et al.

Improved diagnosis of thyroid cancer aided with deep learning applied to sonographic text reports: a retrospective, multi-cohort, diagnostic study

.

Cancer Biol Med

.

2021

;

19

(

5

):

733

‐

741

.

32

Jia

X

,

Ma

Z

,

Kong

D

, et al.

Novel human artificial intelligence hybrid framework pinpoints thyroid nodule malignancy and identifies overlooked second-order ultrasonographic features

.

Cancers (Basel)

.

2022

;

14

(

18

):

4440

.

33

Jin

Z

,

Pei

S

,

Ouyang

L

, et al.

Thy-wise: an interpretable machine learning model for the evaluation of thyroid nodules

.

Intl Journal of Cancer

.

2022

;

151

(

12

):

2229

‐

2243

.

Google Scholar

Crossref

WorldCat

34

Keutgen

XM

,

Li

H

,

Memeh

K

, et al.

A machine-learning algorithm for distinguishing malignant from benign indeterminate thyroid nodules using ultrasound radiomic features

.

J Med Imag

.

2022

;

9

(

03

):

e034501

.

Google Scholar

Crossref

WorldCat

35

Liu

Z

,

Deyer

L

,

Yang

A

, et al.

Automated machine learning-based radiomics analysis versus deep learning-based classification for thyroid nodule on ultrasound images: a multi-center study

. In:

2022 IEEE 22nd International Conference on Bioinformatics and Bioengineering (BIBE). Taichung, Taiwan: IEEE; 2022:23-28

.

36

Randolph

GW

,

Sosa

JA

,

Hao

Y

, et al.

Preoperative identification of medullary thyroid carcinoma (MTC): clinical validation of the afirma MTC RNA-sequencing classifier

.

Thyroid

2022

;

32

(

9

):

1069

‐

1076

.

37

Li

X

,

Zhang

S

,

Zhang

Q

, et al.

Diagnosis of thyroid cancer using deep convolutional neural network models applied to sonographic images: a retrospective, multicohort, diagnostic study

.

Lancet Oncol

.

2019

;

20

(

2

):

193

‐

201

.

38

Park

VY

,

Han

K

,

Seong

YK

, et al.

Diagnosis of thyroid nodules: performance of a deep learning convolutional neural network model vs

.

Radiologists Sci Rep

.

2019

;

9

(

1

):

17843

.

39

Song

J

,

Chai

YJ

,

Masuoka

H

, et al.

Ultrasound image analysis using deep learning algorithm for the diagnosis of thyroid nodules

.

Medicine (Baltimore)

.

2019

;

98

(

15

):

e15133

.

40

Koh

J

,

Lee

E

,

Han

K

, et al.

Diagnosis of thyroid nodules on ultrasonography by a deep convolutional neural network

.

Sci Rep

.

2020

;

10

(

1

):

15245

.

41

Wei

X

,

Gao

M

,

Yu

R

, et al.

Ensemble deep learning model for multicenter classification of thyroid nodules on ultrasound images

.

Med Sci Monit

.

2020

;

26

:

e926096

.

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

42

Zhang

S

,

Du

H

,

Jin

Z

, et al.

A novel interpretable computer-aided diagnosis system of thyroid nodules on ultrasound based on clinical experience

.

IEEE Access

.

2020

;

8

:

53223

‐

53231

.

Google Scholar

Crossref

WorldCat

43

Zhu

Y-C

,

AlZoubi

A

,

Jassim

S

, et al.

A generic deep learning framework to classify thyroid and breast lesions in ultrasound images

.

Ultrasonics

.

2021

;

110

:

106300

.

44

Zhu

Y-C

,

Jin

P-F

,

Bao

J

,

Jiang

Q

,

Wang

X

.

Thyroid ultrasound image classification using a convolutional neural network

.

Ann Transl Med

.

2021

;

9

(

20

):

1526

‐

1526

.

45

Zhu

Y

,

Du

H

,

Jiang

Q

, et al.

Machine learning assisted Doppler features for enhancing thyroid cancer diagnosis: a multi-cohort study

.

J Ultrasound Med

.

2022

;

41

(

8

):

1961

‐

1974

.

46

Han

X

,

Chang

L

,

Song

K

,

Cheng

L

,

Li

M

,

Wei

X

.

Multitask network for thyroid nodule diagnosis based on TI-RADS

.

Med Phys

.

2022

;

49

(

8

):

5064

‐

5080

.

47

Yang

J

,

Page

LC

,

Wagner

L

, et al.

Thyroid nodules on ultrasound in children and young adults: comparison of diagnostic performance of radiologists’ impressions, ACR TI-RADS, and a deep learning algorithm

.

Am J Roentgenol

.

2023

;

220

(

3

):

408

‐

417

.

Google Scholar

Crossref

WorldCat

48

Barczyński

M

,

Stopa-Barczyńska

M

,

Wojtczak

B

,

Czarniecka

A

,

Konturek

A

.

Clinical validation of S-DetectTM mode in semi-automated ultrasound classification of thyroid lesions in surgical office

.

Gland Surg

.

2020

;

9

(

S2

):

S77

‐

S85

.

49

Liang

X

,

Yu

J

,

Liao

J

,

Chen

Z

.

Convolutional neural network for breast and thyroid nodules diagnosis in ultrasound imaging

.

Biomed Res Int

.

2020

;

2020

:

1

‐

9

.

Google Scholar

OpenURL Placeholder Text

WorldCat

50

Zhang

Y

,

Wu

Q

,

Chen

Y

,

Wang

Y

.

A clinical assessment of an ultrasound computer-aided diagnosis system in differentiating thyroid nodules with radiologists of different diagnostic experience

.

Front Oncol

.

2020

;

10

:

557169

.

51

Peng

S

,

Liu

Y

,

Lv

W

, et al.

Deep learning-based artificial intelligence model to assist thyroid nodule diagnosis and management: a multicentre diagnostic study

.

Lancet Digital Health

.

2021

;

3

(

4

):

e250

‐

e259

.

52

Zhu

J

,

Zhang

S

,

Yu

R

, et al.

An efficient deep convolutional neural network model for visual localization and automatic diagnosis of thyroid nodules on ultrasound images

.

Quant Imaging Med Surg

.

2021

;

11

(

4

):

1368

‐

1380

.

53

Kim

Y-J

,

Choi

Y

,

Hur

S-J

, et al.

Deep convolutional neural network for classification of thyroid nodules on ultrasound: comparison of the diagnostic performance with that of radiologists

.

Eur J Radiol

.

2022

;

152

:

110335

.

54

Wang

B

,

Wan

Z

,

Li

C

, et al.

Identification of benign and malignant thyroid nodules based on dynamic AI ultrasound intelligent auxiliary diagnosis system

.

Front Endocrinol

.

2022

;

13

:

1018321

.

Google Scholar

Crossref

WorldCat

55

Wang Y, Xu L, Lu W, et al. Clinical evaluation of malignancy diagnosis of rare thyroid carcinomas by an artificial intelligent automatic diagnosis system. Endocrine. 2022;80(1):93-99.

56

Xu D, Wang Y, Wu H, et al. An artificial intelligence ultrasound system's ability to distinguish benign from malignant follicular-patterned lesions. Front Endocrinol. 2022;13:981403.

57

Abbasian Ardakani A, Mohammadi A, Mirza-Aghazadeh-Attari M, Faeghi F, Vogl TJ, Acharya UR. Diagnosis of metastatic lymph nodes in patients with papillary thyroid cancer: a comparative multi-center study of semantic features and deep learning-based models. J Ultrasound Med. 2023;42(6):1211-1221.

58

Yu P, Wu X, Li J, et al. Extrathyroidal extension prediction of papillary thyroid cancer with computed tomography based radiomics nomogram: a multicenter study. Front Endocrinol. 2022;13:874396.

59

Zou Y, Shi Y, Sun F, et al. Extreme gradient boosting model to assess risk of central cervical lymph node metastasis in patients with papillary thyroid carcinoma: individual prediction using SHapley additive exPlanations. Comput Methods Programs Biomed. 2022;225:107038.

60

Bhalla S, Kaur H, Kaur R, Sharma S, Raghava GPS. Expression based biomarkers and models to classify early and late-stage samples of papillary thyroid carcinoma. PLoS One. 2020;15(4):e0231629.

61

Yang B, Yan M, Yan Z, Zhu C, Xu D, Dong F. Segmentation and classification of thyroid follicular neoplasm using cascaded convolutional neural network. Phys Med Biol. 2020;65(24):245040.

62

Chen W, Yao Y, Zheng P, Malywanga J. Development of a set of DNA methylation markers in the diagnosis and prognosis of papillary thyroid carcinoma by machine learning. In: 2021 7th annual international conference on network and information systems for computers (ICNISC). Guiyang, China: IEEE; 2021:635-639.

63

Dolezal JM, Trzcinska A, Liao C-Y, et al. Deep learning prediction of BRAF-RAS gene expression signature identifies noninvasive follicular thyroid neoplasms with papillary-like nuclear features. Mod Pathol. 2021;34(5):862-874.

64

Swan KZ, Thomas J, Nielsen VE, Jespersen ML, Bonnema SJ. External validation of AIBx, an artificial intelligence model for risk stratification, in thyroid nodules. Eur Thyroid J. 2022;11(2):e210129.

65

Wu X, Yu P, Jia C, et al. Radiomics analysis of computed tomography for prediction of thyroid capsule invasion in papillary thyroid carcinoma: a multi-classifier and two-center study. Front Endocrinol. 2022;13:849065.

66

Kim HL, Ha EJ, Han M. Real-world performance of computer-aided diagnosis system for thyroid nodules using ultrasonography. Ultrasound Med Biol. 2019;45(10):2672-2678.

67

Han M, Ha EJ, Park JH. Computer-aided diagnostic system for thyroid nodules on ultrasonography: diagnostic performance based on the thyroid imaging reporting and data system classification and dichotomous outcomes. AJNR Am J Neuroradiol. 2021;42(3):559-565.

68

Liang X, Huang Y, Cai Y, Liao J, Chen Z. A computer-aided diagnosis system and thyroid imaging reporting and data system for dual validation of ultrasound-guided fine-needle aspiration of indeterminate thyroid nodules. Front Oncol. 2021;11:611436.

69

Stenman S, Linder N, Lundin M, Haglund C, Arola J, Lundin J. A deep learning-based algorithm for tall cell detection in papillary thyroid carcinoma. PLoS One. 2022;17(8):e0272696.

70

Choi YJ, Baek JH, Park HS, et al. A computer-aided diagnosis system using artificial intelligence for the diagnosis and characterization of thyroid nodules on ultrasound: initial clinical assessment. Thyroid. 2017;27(4):546-552.

71

Yoo YJ, Ha EJ, Cho YJ, Kim HL, Han M, Kang SY. Computer-aided diagnosis of thyroid nodules via ultrasonography: initial clinical experience. Korean J Radiol. 2018;19(4):665.

72

Jeong EY, Kim HL, Ha EJ, Park SY, Cho YJ, Han M. Computer-aided diagnosis system for thyroid nodules on ultrasonography: diagnostic performance and reproducibility based on the experience level of operators. Eur Radiol. 2019;29(4):1978-1985.

73

Xia S, Yao J, Zhou W, et al. A computer-aided diagnosing system in the evaluation of thyroid nodules: experience in a specialized thyroid center. World J Surg Onc. 2019;17(1):210.

74

Wei Q, Zeng S-E, Wang L-P, et al. The value of S-detect in improving the diagnostic performance of radiologists for the differential diagnosis of thyroid nodules. Med Ultrason. 2020;22(4):415.

75

Cui Y, Fu C, Si C, et al. Analysis and comparison of the malignant thyroid nodules not recommended for biopsy in ACR TIRADS and AI TIRADS with a large sample of surgical series. J Ultrasound Med. 2023;42(6):1225-1233.

76

Huang P, Zheng B, Li M, et al. The diagnostic value of artificial intelligence ultrasound S-detect technology for thyroid nodules. Comput Intell Neurosci. 2022;2022:1-7.

77

Sant V. Supplementary material for manuscript (From Bench-to-Bedside: How Artificial Intelligence Is Changing Thyroid Nodule Diagnostics). Texas Data Repository, V1. 2024. https://doi.org/10.18738/T8/G0FLTN.

78

FDA 510k: AmCAD-UT Detection 2.2. Available at: https://www.accessdata.fda.gov/cdrh_docs/pdf18/K180006.pdf. Accessed January 29, 2024.

79

Wildman-Tobriner B, Taghi-Zadeh E, Mazurowski MA. Artificial intelligence (AI) tools for thyroid nodules on ultrasound, from the AJR special series on AI applications. Am J Roentgenol. 2022;219(4):547-554.

80

Koios DS, Koios Medical, Inc. Available at: https://grand-challenge.org/aiforradiology/product/koios-ds/. Accessed January 29, 2024.

81

Koios DS™ Thyroid. Koios Medical. Available at: https://koiosmedical.com/products/koios-ds-thyroid/. Accessed January 29, 2024.

82

FDA 510k: Koios DS. Available at: https://www.accessdata.fda.gov/cdrh_docs/pdf21/K212616.pdf. Accessed January 29, 2024.

83

FDA 510k: MEDO Thyroid. Available at: https://www.accessdata.fda.gov/cdrh_docs/pdf20/K203502.pdf. Accessed January 29, 2024.

Abbreviations

AI
artificial intelligence

AUROC
area under the receiver operating characteristic curve

CAD
computer-aided diagnosis

FDA
Food and Drug Administration

FNA
fine-needle aspiration

TI-RADS
Thyroid Imaging Reporting & Data System

US
ultrasound

Author notes

V.R.S. and A.R. contributed equally as co-first authors of this work.

© The Author(s) 2024. Published by Oxford University Press on behalf of the Endocrine Society. All rights reserved. For commercial re-use, please contact [email protected] for reprints and translation rights for reprints. All other permissions can be obtained through our RightsLink service via the Permissions link on the article page on our site—for further information please contact [email protected].

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/pages/standard-publication-reuse-rights)
