AI-luminating Artificial Intelligence in Inflammatory Bowel Diseases: A Narrative Review on the Role of AI in Endoscopy, Histology, and Imaging for IBD

Table 1.

Sixteen studies evaluating the role of AI in endoscopy, histology, and cross-sectional imaging for the detection and diagnosis of IBD.

Author Year	Dataset	Data source	Algorithm type	Task	Performance
Mahapatra 2013¹⁷	CD N = 26	MRE	Radiomics	Semi-automatic detection of bowel segments affected by CD using a novel method to calculate shape asymmetry combined with feature intensity and texture. Performance was measured against dual tree complex wavelet transform (DTCWT) and shape-asymmetry-based method (Asy).	The investigators’ method had higher sensitivity (90.4%), specificity (90.1%), and accuracy (88.9%) for detecting segments of bowel with CD compared with DTCWT and Asy methods.
Hahnemann 2015¹⁸	IBD N = 50	MRE	DL, CNN	Increased detection of inflammatory intestinal lesions using automatically generated maps of bowel motility with static MRI images compared with static MRI images alone.	Additional inflammatory lesions were found in 13 (26%) of 50 patients with automated motility maps + static MRI images vs static MRI images alone (P = .0002).
Mossotto 2017¹⁹	Pediatric IBD N = 287	Endoscopic and histologic images	Supervised and unsupervised ML methods	Develop ML algorithm to classify disease using endoscopic and histologic data from pediatric IBD patients.	ML models using endoscopy alone, histology alone, and endoscopy + histology combined to differentiate between UC, CD, and IBD unclassified yielded AUC of 0.71, 0.77, and 0.83, respectively.
Naziroglu 2017²⁰	CD N = 53	MRE	DL, Active contouring model	Compare manual vs semiautomatic measurements (active contouring segmentation algorithm) for delineating bowel and measurement of wall thickness in CD. Reproducibility of delineating diseased bowel evaluated by comparing area of overlap on 2 independent segmentations for each approach (reflected by overlap coefficient). Reproducibility of measuring bowel wall thickness evaluated by interobserver agreement.	Semiautomatic delineation of diseased regions of active CD was more reproducible than manual (median overlap 0.89 vs 0.72, P = 1.4 × 10⁻⁵). Semiautomatic measurement (ICC 0.88) of bowel wall thickness had higher interobserver agreement than manual (ICC 0.45, P = .005).
Gollifer 2019²¹	CD N = 105	MRE	DL, CNN	Compare software automated vs subjective quantification of intestinal motility and identify the combination of motility metrics most associated with symptom severity (HBI)	On multivariable model, software quantified temporal motility variation (β = -0.23, P = .005) and area of motile bowel (β = 0.16, P = .01) were associated with HBI. Subjective quantification of motility metrics was not associated with HBI.
Klang 2020²²	CD N = 49	VCE images	DL, CNN	Develop and evaluate a DL algorithm for detecting small bowel ulcers in CD.	The algorithm yielded AUC 0.99 for detecting ulcers.
Klang 2020²³	CD N = 27 892 images	VCE images	DL, CNN	Evaluate the accuracy of DL for detecting strictures from VCE images in CD.	The model differentiated strictures from normal mucosa and small-bowel ulcers with AUC 0.99 and 0.94, respectively.
Li 2021²⁴	IBD N = 165 lesions (UC = 66, CD = 99)	Multislice CT	Radiomic	Develop a radiomic nomogram to distinguish between UC vs CD.	Multivariable regression model including only radiomic features yielded AUC 0.81, accuracy 70%, sensitivity 80%, and specificity 54%. Combining radiomic features with 3 significant clinical features (presence of inflammatory mesenteric fat, lesion location, and CT-value of arterial-phase enhancement of bowel wall) improved the performance to AUC 0.88.
Zhu 2021²⁵	CD (n = 93) and intestinal TB (n = 67)	CTE	Radiomic	Develop and validate a clinical radiomics nomogram to differentiate CD from intestinal TB using clinical and radiomic features.	The clinical radiomic nomogram containing 9 radiomic and 2 clinical features had good performance (AUC 0.96) and was superior to the clinical-only and radiomic-only models.
Arkko 2021²⁶	CD and non CD (n = 369; 50% CD)	MRE	DL, CNN	Determine feasibility of detecting CD using automated quantification of intestinal motility on MRE.	After testing with 4 different ROI approaches and 3 motility indices in 3 independent data sets, using full image ROI and motility index 1 (average of all generated motility maps) had the best performance with AUC 0.78.
Klang 2021²⁷	CD N = 19 245 images	VCE images	DL, CNN	Determine if a CNN model trained to detect small bowel CD ulcers can differentiate between CD vs NSAID-induced ulcers.	The CNN model trained on CD ulcers detected NSAID-induced ulcers with AUC 0.97, which is similar to its performance for detecting CD ulcers. Thus, the CNN model was unable to differentiate between NSAID vs CD ulcers.
Jiang 2022²⁸	IBD N = 120	CTE	DL, GIF (gradient image filter) algorithm	Compare accuracy of diagnosing IBD with traditional CTE vs low-dose CTE with an optimized GIF algorithm.	The diagnostic sensitivity (91.5%), specificity (92.3%), accuracy (91.7%), positive predictive value (97.7%), and negative predictive value (75.0%) of the GIF algorithm group were higher than traditional CTE control group (69.1%, 44.4%, 61.7%, 74.4%, 38.1%; P < .05). AI-assisted imaging enhancement improves diagnostic accuracy.
Wang 2022²⁹	IBD N = 496 (217 CD)	Endoscopic images	DL, CNN	Develop a CNN-model to differentiate between CD vs UC vs healthy controls.	The CNN model yielded higher differential diagnosis accuracy than humans for CD (92.4% vs 91.7%), UC (93.4% vs 92.4%), and normal (98.4% vs 97.3%).
Brodersen 2023³⁰	IBD N = 132	VCE	DL	Determine agreement between AI-aided vs standard evaluation of pan-enteric capsule endoscopy assessment for detecting CD.	The AI-aided evaluation reduced the number of output images to 470 and median review time was 3.2 minutes/patient. Observers reviewing AI-selected images had 92-96% sensitivity and 90-93% specificity for diagnosing CD. The negative predictive value for CD was 95%.
Carter 2023³¹	IBD N = 308	Intestinal US	DL, CNN	Develop and validate an automated DL module to distinguish between increased and normal bowel wall thickness.	The module had 90.1% accuracy, 86.4% sensitivity, and 94% specificity for detecting bowel wall thickening with an AUC 0.98.
Gong 2023³²	CD and iTB N = 108	CTE	Radiomics	Develop and test a clinical multiregional radiomic model to differentiate CD from iTB. Radiomic features were extracted from bowel wall, largest lymph node, and region surrounding the ileocecal region.	A multimodal nomogram including 2 radiomic features, involved bowel segment on CTE, and longitudinal ulcer on endoscopy yielded AUC 0.96, which was better than clinical only model (AUC 0.88, P = .004). Combined nomogram had greater accuracy (89.5%) than 2 radiologists (66.7-75.2%). Decision curve analysis showed combined model had highest net benefit.
Zhou 2023³³	IBD N = 316	CTE	DL, CNN, Radiomics	Investigate if volumetric visceral adipose tissue features using radiomics and 3D CNN can differentiate between CD vs UC.	A radiomics model had a higher AUC than the CNN model (AUC 0.71 vs 0.69, P = .750). A nomogram incorporating the radiomics model and clinical factors differentiated UC from CD with AUC 0.78, which was better than radiomics model only (AUC 0.72) or clinical variables only (AUC 0.74).

MRTA features: mean (average value of pixels within ROI), standard deviation, mean of positive pixels, entropy, kurtosis (inversely related to number of objects highlighted and increased by intensity variations in highlighted objects), and skewness (reflects brightness of highlighted object).

Abbreviations: IBD, inflammatory bowel disease; CD, Crohn’s disease; UC, ulcerative colitis; iTB, intestinal TB; Rc, regression coefficient; AUC, area under the curve; ROI, region of interest; MaRIA, magnetic resonance index of activity; CDEIS, Crohn’s disease endoscopic index of severity; SES-CD, Simple Endoscopic Score for Crohn’s disease; HR, hazard ratio.

With regard to endoscopy in the diagnosis of IBD, 2 of the biggest challenges are differentiating between UC and CD, especially when there is no ileal involvement, and the significant time required to review video capsule endoscopy (VCE) studies when evaluating for small bowel CD. Additionally, VCE interpretation suffers from substantial heterogeneity and suboptimal inter- and intra-observer agreement.³⁴ To address this, Klang et al developed DL algorithms for detecting small bowel ulcers and strictures in CD, each of which yielded an area under the receiver operating characteristic curve (AUC) of 0.99.²²,²³ However, in a later study, the same investigators found that the CNN model for detecting CD ulcers could not differentiate CD ulcers from NSAID-induced ulcers.²⁷ Nonetheless, AI-aided VCE interpretation has been shown to substantially reduce review time, with one study observing a median review time of 3.2 minutes per patient.³⁰ While not yet ready for routine clinical use, AI-assisted VCE evaluation is an exciting and much-needed clinical tool in IBD. To date, only 1 study has developed an AI model for differentiating between CD and UC.²⁹ The model developed by Wang et al yielded higher differential diagnosis accuracy than human observers for CD (92.4% vs 91.7%), UC (93.4% vs 92.4%), and normal (98.4% vs 97.3%).
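As an aid to interpreting the AUC values quoted in this section, the following minimal Python sketch computes AUC as the probability that a randomly chosen positive image receives a higher model score than a randomly chosen negative image. The scores are hypothetical and the function name is our own; none of this is taken from the cited studies.

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case is scored higher; ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model outputs (illustrative only, not study data).
ulcer_scores = [0.9, 0.8, 0.7, 0.4]    # images that contain an ulcer
normal_scores = [0.6, 0.3, 0.2, 0.1]   # images of normal mucosa

auc = auc_from_scores(ulcer_scores, normal_scores)
print(f"AUC = {auc:.4f}")
```

An AUC computed this way describes only how well the two groups being compared are separated. A model can therefore have a high AUC for ulcer vs normal mucosa in both CD and NSAID enteropathy and still be unable to tell the two ulcer types apart.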

Regarding the role of AI in the histologic diagnosis of IBD, there was only 1 study. Mossotto et al explored several different machine-learning models using endoscopic and histologic data to classify disease type (UC vs CD) in pediatric patients with IBD.¹⁹ An ML model using both endoscopic and histologic data had superior performance compared with ML models using endoscopic or histologic data alone (AUC 0.83 vs 0.71 and 0.77, respectively). This study also highlights the importance of using endoscopic and histologic data together in the diagnosis of IBD types.

In terms of cross-sectional imaging, 7 studies developed CNN-based algorithms to automate the detection and diagnosis of IBD through enhanced imaging interpretation, as well as through development of a novel biomarker by quantifying small bowel motility. Using MRE, Naziroglu et al developed an active contouring algorithm to perform volumetric segmentation of the inner and outer layers of the bowel wall to semiautomatically measure bowel wall thickness (BWT).²⁰ The algorithm-generated measurements yielded better interobserver agreement than human-generated measurements of BWT (intraclass correlation coefficient [ICC] 0.88 vs 0.45, P = .005). This study highlights the strength of AI to detect inflammatory lesions in IBD more consistently than human observers. Intestinal ultrasound (IUS) is increasingly being used to detect and monitor IBD globally, and Carter et al developed an AI algorithm to automatically detect IBD on IUS.³¹ Increased BWT on IUS is one of the main features that reflect active inflammation and, while BWT has high ICC among expert performers, novice performers often struggle with accurately and consistently detecting inflammatory lesions. To address this challenge, the investigators developed a CNN algorithm to automatically detect bowel wall thickening using over 1000 labeled images. The final CNN algorithm detected thickened bowel wall with an AUC of 0.98, 90.1% accuracy, 86.4% sensitivity, and 94.0% specificity. This study highlights how AI can be used to train inexperienced operators and improve the standardization of imaging interpretation. Moreover, this is the only available study on AI applications with IUS in IBD. Because it is a radiation-sparing, point-of-care exam, IUS is a promising medium to explore other AI approaches for IBD.

In addition to detecting inflamed segments of bowel, DL algorithms can be used to improve image processing and allow for safer imaging protocols with less radiation exposure. Using a gradient image filter (GIF) algorithm on low-dose CTE, Jiang et al found that the diagnostic sensitivity (91.5%), specificity (92.3%), accuracy (91.7%), positive predictive value (97.7%), and negative predictive value (75.0%) of the GIF algorithm group were higher than those of the traditional CTE protocol control group for differentiating CD from UC (69.1%, 44.4%, 61.7%, 74.4%, and 38.1%, respectively; P < .05).²⁸
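The five metrics reported above all derive from the same 2 × 2 table of test results against a reference standard. The following minimal Python sketch shows how they relate; the class name, function name, and counts are hypothetical illustrations and are not taken from Jiang et al or any other cited study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from a 2x2 table of test result vs reference standard."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the standard diagnostic accuracy metrics as fractions."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # diseased cases correctly flagged
        "specificity": c.tn / (c.tn + c.fp),  # non-diseased cases correctly cleared
        "accuracy": (c.tp + c.tn) / total,    # all correct calls
        "ppv": c.tp / (c.tp + c.fp),          # positive calls that are correct
        "npv": c.tn / (c.tn + c.fn),          # negative calls that are correct
    }


# Hypothetical counts for illustration only; not data from any cited study.
example = ConfusionCounts(tp=40, fp=5, tn=45, fn=10)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, the predictive values depend on the case mix of the cohort, so a model can show high sensitivity and specificity alongside a noticeably lower negative predictive value when the target condition is common in the study population.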

Another opportunity for AI to detect IBD is quantifying intestinal motility, which is often impaired secondary to active inflammation and/or fibrosis. Leveraging cine images captured during MRE, 3 studies developed deep learning algorithms to quantify intestinal motility for detecting IBD. Hahnemann et al found that combining automatically generated intestinal motility maps with static MRI had a higher detection rate of inflammatory lesions (66 lesions in 38 subjects) than static MRI alone (51 lesions in 34 subjects, P = .0002).¹⁸ In a larger study with 302 subjects, investigators used a CNN to develop an automated intestinal motility quantification algorithm that was able to differentiate between CD and non-CD with an AUC of 0.78.²⁶ Deep learning algorithms that detect and quantify changes in motility also offer greater granularity than the human eye can detect. Gollifer et al demonstrated that software-quantified intestinal motility parameters, particularly temporal motility variation (β = −0.23, P = .005) and area of motile bowel (β = 0.16, P = .01), were significantly associated with symptom severity as defined by the Harvey-Bradshaw Index (HBI).²¹ Conversely, subjective quantification by humans of the same intestinal motility parameters was not associated with HBI. These studies support intestinal motility as a promising, objective, novel biomarker for detecting IBD and highlight the ability of AI to aid in biomarker discovery. However, the clinical value of intestinal motility as a biomarker in IBD needs further evaluation, with future studies correlating intestinal motility metrics with endoscopic disease severity and clarifying its role in disease monitoring.

Regarding radiomics, 6 studies developed unique multivariate models and nomograms to better detect and diagnose IBD. In one of the earliest IBD radiomics studies, a novel method to quantify shape asymmetry was able to detect CD-affected segments of bowel with high sensitivity (90.4%) and specificity (90.1%).¹⁷ Radiomics may also help differentiate CD from intestinal TB (iTB), which is a frequent challenge in endemic countries. Two studies developed and validated multimodal radiomic nomograms incorporating clinical and/or endoscopic data to differentiate CD from iTB. By extracting radiomic features from intestinal lesions, Zhu et al developed a radiomics model with an AUC of 0.78 for differentiating CD from iTB.²⁵ However, when combined with a clinical model that included demographic, biochemical, and predefined radiographic features (eg, comb sign), the prediction model improved to an AUC of 0.90. The final nomogram contained 9 radiomic and 2 clinical features and yielded good performance (AUC 0.96). Likewise, Gong et al developed a multimodal clinical radiomic model using radiomic features extracted from the diseased segment of bowel as well as the largest lymph node and mesentery surrounding the affected segment of bowel. In addition to clinical variables, the investigators also incorporated endoscopic data into the final nomogram and found the clinical radiomic nomogram had greater accuracy for differentiating CD from iTB than interpretation by human radiologists (89.5% vs 75.22%).³² Finally, radiomics may also help differentiate CD from UC. A multimodal model that incorporated radiomic features of the inflamed bowel wall, clinical features (age and gender), and radiology features (bowel wall thickness, arterial-phase enhancement, increased attenuation of mesenteric fat, vasa recta engorgement, lymphadenopathy, and lesion location) differentiated CD from UC with an AUC of 0.88.²⁴ Leveraging the differences in inflammatory alterations of the mesenteric fat in CD vs UC, another model combining radiomic features of visceral adipose tissue (VAT) with clinical factors helped differentiate CD from UC with good diagnostic performance (AUC 0.78).³³ Because inflammatory alterations of mesenteric fat are difficult to study noninvasively, imaging studies often use VAT as a surrogate marker because mesenteric fat is the largest compartment of VAT.³⁵ While the performance of these radiomic-based models varies from moderate to good, the available studies support the idea that radiomic features can provide valuable information not easily appreciated by the human eye. However, the studies consistently demonstrated that radiomic features alone are not enough to develop prediction models and need to be combined with clinical variables to improve model performance.
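Because most of the models above are compared by AUC, it may help to recall what the number means: the AUC equals the probability that a randomly chosen positive case (eg, CD) receives a higher model score than a randomly chosen negative case (eg, iTB), with ties counted as one half. The sketch below computes it directly from that definition; the function name and the scores are hypothetical and do not come from any cited study.

```python
def auc_from_scores(pos_scores: list[float], neg_scores: list[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5, which matches the Mann-Whitney U formulation.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model scores for illustration only.
cd_scores = [0.9, 0.8, 0.4]    # cases with the target diagnosis
itb_scores = [0.7, 0.3, 0.2]   # cases without it
auc = auc_from_scores(cd_scores, itb_scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why values near 0.78 are usually described as moderate and values near 0.90 or above as good.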

Disease Characterization/Phenotyping

In the treat-to-target era of IBD, endoscopy, histology, and imaging are critical for the tight monitoring needed to prevent disease progression and complications, and studies have demonstrated that AI-based applications have tremendous potential in this arena (Table 2). Endoscopic disease scores such as the Mayo endoscopic score (MES), UC Endoscopic Index of Severity (UCEIS), and Simple Endoscopic Score for CD (SES-CD) are important for monitoring improvement or progression of disease as well as for standardizing communication with other providers. However, studies have found significant intra- and interobserver variability with endoscopy scores.⁶¹ Endoscopy scores also may not fully capture disease severity. For example, MES only accounts for the colonic segment with the most severe disease in UC and does not account for variability in disease severity in other colon segments. This limitation has important implications when assessing therapeutic response. To address these limitations, several studies have developed deep learning algorithms for automating endoscopy scores, which can improve standardization of scoring, although these studies primarily address MES in UC.^{41,42,45,47,48} One of the most innovative studies was by Stidham et al, in which investigators developed a new Cumulative Disease Score (CDS) for UC using computer vision analyses of endoscopic videos from the UNIFI and JAK-UC clinical trials.⁶⁰ The CDS correlated strongly with MES (P < .0001) and was more sensitive for detecting endoscopic changes than MES (Hedges' g = 0.743 vs 0.460). Automated scoring systems such as CDS will not only improve workflow efficiency during endoscopy but also better evaluate therapeutic response in UC.
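Hedges' g, the effect size used above to compare the sensitivity of CDS and MES to change, is a standardized mean difference with a small-sample bias correction. The sketch below assumes the common two-independent-group form with a pooled standard deviation and the usual approximate correction factor; the source does not state which variant Stidham et al used, and the function name and data are hypothetical.

```python
import math
from statistics import mean, variance


def hedges_g(group_a: list[float], group_b: list[float]) -> float:
    """Hedges' g for two independent groups (assumed variant, see lead-in)."""
    n1, n2 = len(group_a), len(group_b)
    # Pooled variance from the two sample variances (n - 1 denominators).
    pooled_var = (
        (n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)
    ) / (n1 + n2 - 2)
    cohens_d = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
    # Approximate small-sample correction factor J.
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return cohens_d * correction


# Hypothetical scores for illustration only; not trial data.
baseline = [2.0, 4.0, 6.0, 8.0]
follow_up = [1.0, 3.0, 5.0, 7.0]
g = hedges_g(baseline, follow_up)
print(f"Hedges' g = {g:.3f}")
```

A larger g for one score than for another on the same patients indicates that the first score separates the two time points or treatment groups by more standard deviations, which is the sense in which CDS was reported as more sensitive to endoscopic change than MES.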

Table 2.

Twenty-five studies evaluating the role of AI in endoscopy, histology, and cross-sectional imaging for the characterization of IBD and phenotyping CD strictures.

Author Year	Dataset	Data source	Algorithm type	Task	Performance
Bhatnagar 2016³⁶	CD N = 7	MRE	Radiomics	To investigate if texture analysis of MRI (MRTA) of CD-afflicted small bowel differs based on presence of histological markers of hypoxia and angiogenesis in CD (VEGF).	Segments of bowel with VEGF present significantly lower mean pixel intensity (P = .004) and mean positive pixels within region of interest (P = .007) than segments of bowel without VEGF.
Makanyanga 2017³⁷	CD N = 16	MRE	Radiomics	Associate MRI textural analysis with MRI and histological CD activity	MRTA features were associated with CD activity: entropy was correlated with MRI activity (rc 1.00, P = .01), whereas kurtosis was negatively associated with MRI activity (rc −0.45, P = .002). Skewness was associated with histologic activity (rc 4.27, P = .02).
Lamash 2018³⁸	Pediatric CD N = 23	MRE	CNN	To develop and test an algorithm to semiautomatically segment bowel wall and lumen to facilitate development of future algorithms to measure luminal diameter and bowel wall thickness in CD.	The algorithm had good performance for segmenting the bowel wall (Dice coefficient 75%) and lumen (Dice coefficient 81%).
Puylaert 2018³⁹	CD N = 106	MRE	DL, Active contouring model	Develop/validate a predictive MRI score (VIGOR score) using qualitative (mural T2 signal) and semiautomatically extracted MRI features (bowel wall thickness, excess volume, and dynamic contrast enhancement). Compare diagnostic accuracy to existing MRI score (MaRIA, London score, and CDMI) with CDEIS as reference.	VIGOR score achieved comparable correlation with CDEIS (r = 0.58 vs 0.59). VIGOR score had improved interobserver agreement vs other scores (ICC 0.81 vs 0.44-0.59) VIGOR score achieved 80-81% diagnostic accuracy, like other scores.
Maeda 2019⁴⁰	UC N = 187	Endocytoscopy images	DL, CNN	Develop a computer-aided diagnostic (CAD) system for predicting histologic inflammation using endocytoscopy.	The CAD system had a diagnostic sensitivity, specificity, and accuracy of 74%, 97%, and 91%, respectively.
Ozawa 2019⁴¹	UC N = 955	Endoscopic images	DL, CNN	Develop a computer-assisted diagnostic (CAD) system using CNN to identify normal mucosa (Mayo 0) and mucosal healing state (Mayo 0-1).	The CNN-based CAD system yielded an AUC of 0.86 and 0.98 for identifying Mayo 0 and Mayo 0-1, respectively. Performance was better in rectum (AUC 0.92) than right and left colon (AUC 0.83 and 0.83, respectively).
Stidham 2019⁴²	UC N = 2778	Endoscopic images and videos	DL, CNN	Develop deep learning models for grading UC endoscopic severity and compare its performance to experienced human reviewers.	For distinguishing between moderate-severe disease and endoscopic remission, CNN model yielded AUC 0.97. The CNN model had good agreement with human reviewers (κ = 0.84), which was similar to agreement between human reviewers (κ = 0.86).
Tabari 2019⁴³	CD N = 25	MRE	Radiomics	Determine if MRTA can determine CD stricture histologic type (degree of mucosal inflammation vs mural fibrosis) using surgical resection specimens.	Multivariable prediction model including mean, skewness, and entropy predicted stricture fibrosis with goodness-of-fit value 0.995. Combination of threshold values for 3 features correctly classified 100% of strictures as inflammatory or fibrotic.
Stidham 2020⁴⁴	CD N = 138	CTE	DL, Active contouring model	Compare the agreement of small bowel damage measurements (max bowel wall thickness, max bowel dilation, min lumen diameter, and presence of stricture) on CTE between semiautomated image analysis techniques vs radiologists	Semiautomated measurements correlated with radiologists for max bowel wall thickness (r = 0.70, P < .0001), max bowel dilation (r = 0.75, P < .0001), and min lumen diameter (r = 0.38, P < .0001). Multivariate model using semiautomatic measurements to detect radiologist-defined intestinal strictures had an accuracy of 88% with AUC 0.86.
Takenaka 2020⁴⁵	UC N = 2012	Endoscopic images	DL, CNN	Develop a deep neural network to analyze endoscopic images of UC patients and predict histologic disease activity.	The algorithm identified endoscopic remission with 90.1% accuracy with good agreement with human reviewers (κ = 0.80). The algorithm identified subjects in histologic remission with 92.9% accuracy.
Barash 2021⁴⁶	CD N = 49	VCE images	DL, CNN	Develop a DL algorithm for automated grading for CD ulcers on VCE (Grades 1-3 mild to severe)	The algorithm had a classification accuracy of 0.91 for grade 1 vs 3, 0.78 for grade 2 vs 3, and 0.62 for grade 1 vs 2.
Gottlieb 2021⁴⁷	UC N = 249	Endoscopic videos	DL, CNN	Develop an automated neural network model to predict UC endoscopic severity determined by Mayo endoscopic score (MES) and UC Endoscopic Index of Severity (UCEIS).	The model had excellent agreement with human reviewers for predicting MES (quadratic weighted κ = 0.84) and UCEIS (quadratic weighted κ = 0.86).
Yao 2021⁴⁸	UC N = 315 videos	Endoscopic videos	DL, CNN	Pilot a fully automated video analysis system for grading endoscopic disease severity in UC.	CNN model correctly predicted MES in 78% of videos. CNN model correctly distinguished MES 0-1 from MES 2-3 in 83.7% of videos.
Li 2021⁴⁹	CD N = 167	CTE	Radiomics	Develop and validate a CTE-based radiomic model to characterize intestinal fibrosis in CD as none/mild or moderate/severe degree of fibrosis based on pathology.	In the training data set, the model yielded AUC 0.89 for distinguishing moderate-severe from none-mild fibrosis, with similar performance in validation data from 3 centers (AUC 0.75-0.82). Radiomic model performed better than visual interpretation by 2 radiologists (AUC 0.55-0.60). Decision curve analysis showed radiomic model always had higher net benefit than radiologists’ visual interpretation.
Ding 2022⁵⁰	CD N = 121	MRE	Radiomics	Compare accuracy for detecting severity of CD activity using radiomics vs MaRIA score evaluated by radiologists.	Using terminal ileal (TI) CDEIS as ground truth, the radiomics model (containing 6 features) and MaRIA performed similarly for detecting TI CDEIS > 7 (AUC 0.87 vs 0.88, P = .85). MaRIA score between radiologists had fair agreement (ICC 0.58) while radiomic features had high reproducibility (ICC 0.93-0.96).
Guez 2022⁵¹	Pediatric CD N = 121	MRE	Machine learning	Develop a multi-modal machine learning fusion model with radiological and biochemical biomarkers to predict ileal endoscopic activity based on SES-CD. Compare performance to MaRIA and biochemical biomarker only prediction model.	A multimodal model containing disease length on MRE, CRP, and fecal calprotectin performed better than MaRIA for classifying SES-CD ≥3 (AUC 0.84 vs 0.80, P < 1e^-9). The multimodal model performed better than the biochemical biomarker only model (AUC 0.84 vs 0.67, P < 1e^-5).
Li 2022⁵²	CD N = 100	CT	Radiomic	Determine value of a CT-based radiomics model to identify active vs inactive CD.	The radiomics model achieved AUC 0.94 with high accuracy (90.3%), sensitivity (91.1%), and specificity (89.2%).
Meng 2022⁵³	CD N = 235	CTE	DL, CNN, radiomics	Develop a CTE-based DL and radiomics model to assess severity of bowel fibrosis and compare accuracy to radiologists. Severity of histologic fibrosis was assessed semi-quantitatively as none/mild vs moderate/severe.	The DL model and radiomics model performed similarly (AUC 0.81 vs 0.83, P = .97), with shorter processing time for the DL model (48.4 vs 599.8 seconds, P < .001). DL model performed better than visual interpretation by 2 radiologists (AUC 0.58-0.64, P < .005). Decision curve analysis showed DL and radiomics models possessed better net benefit than radiologists’ interpretation.
Noguchi 2022⁵⁴	UC N = 12	Histologic images	DL, CNN	Develop a CNN model to predict p53 immunohistochemical staining from histologic images to diagnose UC-associated dysplasia/cancer.	The CNN model had an average accuracy of 86-91% for predicting p53 positivity.
Yuan 2022⁵⁵	CD N = 48	CTE	DL, automated body composition segmentation	Evaluate the diagnostic performance of visceral adiposity to predict degree of inflammation vs fibrosis of CD strictures on surgical resection histopathology	Prediction model containing visceral:subcutaneous fat area ratio and lumen narrowing:prestenotic dilation ratio classified severe fibrosis with AUC 0.80 with 61.5% sensitivity, 91.4% specificity, and 83.3% accuracy.
Najdawi 2023⁵⁶	UC N = 637	Histologic images	DL, CNN	Develop a CNN model to predict Nancy histologic index score and histologic remission.	The CNN model accurately predicted Nancy histologic index score (weighted κ = 0.91) and predicted histologic remission with accuracy of 0.97.
Ruiqing 2023⁵⁷	CD N = 167	CTE	Radiomics	Investigate the feasibility of developing a lumen-based, mesenteric-based, and fusion (lumen + mesenteric features) radiomics model to grade mucosal activity (SES-CD) and risk of surgery.	The fusion model could distinguish multicategorical SES-CD score (0, 1, 2-5, 6-10, >10) by bowel segment with AUC 0.83. The fusion model could distinguish bowel segments with moderate/severe disease (SES-CD > 5) with AUC 0.85. A nomogram including image-based score (eg, mural enhancement, fistula, mesenteric fibrofatty proliferation) and fusion model could accurately predict need for surgery within 12 months from CTE.
Rymarczyk 2023⁵⁸	IBD N = 1189 (302 CD)	Histologic images	DL, CNN	Develop deep learning models for automating histologic assessment in IBD.	The CNN model was able to detect histologic disease in the colon and ileum with accuracies ranging from 87-94% and 76-83%, respectively.
Xie 2023⁵⁹	CD N = 628	Endoscopic images	DL, CNN	Develop a CNN model to detect and grade severity of small bowel CD ulcers on double balloon endoscopy images	The CNN model detected ulcers with 96% accuracy. The model had 87%, 88%, and 85% accuracy for grading ulcerated surface, ulcer size, and ulcer depth, respectively.
Stidham 2024⁶⁰	UC N = 1096	Endoscopic videos	DL	Develop computer vision methods to better quantify mucosal injury in UC and compare to MES	An automated cumulative disease score correlated with MES (P < .0001) and was more sensitive for detecting endoscopic changes compared with MES (Hedges’ g = 0.743 vs 0.460).

MRTA features: mean (average value of pixels within the ROI), standard deviation, mean of positive pixels, entropy, kurtosis (inversely related to the number of objects highlighted and increased by intensity variations in highlighted objects), and skewness (reflects brightness of highlighted objects).

Abbreviations: IBD, inflammatory bowel disease; CD, Crohn’s disease; UC, ulcerative colitis; AUC, area under the curve; VEGF, vascular endothelial growth factor; ROI, region of interest; MaRIA, Magnetic Resonance Index of Activity; CDEIS, Crohn’s Disease Endoscopic Index of Severity; SES-CD, Simple Endoscopic Score for Crohn’s Disease; MES, Mayo endoscopic score.

Table 2.

Twenty-five studies evaluating the role of AI in endoscopy, histology, and cross-sectional imaging for the characterization of IBD and phenotyping CD strictures.

Author Year	Dataset	Data source	Algorithm type	Task	Performance
Bhatnagar 2016³⁶	CD N = 7	MRE	Radiomics	To investigate if texture analysis of MRI (MRTA) of CD-afflicted small bowel differs based on presence of histological markers of hypoxia and angiogenesis in CD (VEGF).	Segments of bowel with VEGF present significantly lower mean pixel intensity (P = .004) and mean positive pixels within region of interest (P = .007) than segments of bowel without VEGF.
Makanyanga 2017³⁷	CD N = 16	MRE	Radiomics	Associate MRI textural analysis with MRI and histological CD activity	MRTA features were associated with CD activity: Entropy was correlated with MRI activity (rc 1.00, P = .01) while Kurtosis was negatively associated MRI activity (Rc -0.45, P = .002) Skewness was associated with histologic activity (Rc 4.27, P = .02)
Lamash 2018³⁸	Ped CD N = 23 pediatric	MRE	CNN	To develop and test an algorithm to semiautomatically segment bowel wall and lumen to facilitate development of future algorithms to measure luminal diameter and bowel wall thickness in CD.	The algorithm had good performance for segmenting the bowel wall (Dice coefficient 75%) and lumen (Dice coefficient 81%)
Puylaert 2018³⁹	CD N = 106	MRE	DL, Active contouring model	Develop/validate a predictive MRI score (VIGOR score) using qualitative (mural T2 signal) and semiautomatically extracted MRI features (bowel wall thickness, excess volume, and dynamic contrast enhancement). Compare diagnostic accuracy to existing MRI score (MaRIA, London score, and CDMI) with CDEIS as reference.	VIGOR score achieved comparable correlation with CDEIS (r = 0.58 vs 0.59). VIGOR score had improved interobserver agreement vs other scores (ICC 0.81 vs 0.44-0.59) VIGOR score achieved 80-81% diagnostic accuracy, like other scores.
Maeda 2019⁴⁰	UC N = 187	Endocytoscopy images	DL, CNN	Develop a computer-aided diagnostic (CAD) system for predicting histologic inflammation using endocytoscopy.	The CAD system had a diagnostic sensitivity, specificity, and accuracy of 74%, 97%, and 91%, respectively.
Ozawa 2019⁴¹	UC N = 955	Endoscopic images	DL, CNN	Develop a computer-assisted diagnostic (CAD) system using CNN to identify normal mucosa (Mayo 0) and mucosal healing state (Mayo 0-1).	The CNN-based CAD system yielded a AUC 8.06 and 0.98 for identifying Mayo 0 and Mayo 0-1, respectively. Performance was better in rectum (AUC 0.92) than right and left colon (AUC 0.83 and 0.83, respectively).
Stidham 2019⁴²	UC N = 2778	Endoscopic images and videos	DL, CNN	Develop deep learning models for grading UC endoscopic severity and compare its performance to experienced human reviewers.	For distinguishing between moderate-severe disease and endoscopic remission, CNN model yielded AUC 0.97. The CNN model had good agreement with human reviewers (κ = 0.84), which was similar to agreement between human reviewers (κ = 0.86).
Tabari 2019⁴³	CD N = 25	MRE	Radiomics	Determine if MRTA can determine CD stricture histologic type (degree of mucosal inflammation vs mural fibrosis) using surgical resection specimens.	Multivariable prediction model including mean, skewness, and entropy predicted stricture fibrosis with goodness-of-fit value 0.995. Combination of threshold values for 3 features correctly classified 100% of strictures as inflammatory or fibrotic.
Stidham 2020⁴⁴	CD N = 138	CTE	DL, Active contouring model	Compare the agreement of small bowel damage measurements (max bowel wall thickness, max bowel dilation, min lumen diameter, and presence of stricture) on CTE between semiautomated image analysis techniques vs radiologists	Semi-automated measurements correlated with radiologists for max bowel wall thickness (r = 0.70, P > .0001), max bowel dilation (r = 0.75, P < .0001), min lumen diameter (r = 0.38, P < .0001). Multivariate model using semiautomatic measurements to detect radiologist-defined intestinal strictures had an accuracy of 88% with AUC 0.86.
Takenaka 2020⁴⁵	UC N = 2012	Endoscopic images	DL, CNN	Develop a deep neural network to analyze endoscopic images of UC patients and predict histologic disease activity.	The algorithm identified endoscopic remission with 90.1% accuracy with good agreement with human reviewers (κ = 0.80). The algorithm identified subjects in histologic remission with 92.9% accuracy.
Barash 2021⁴⁶	CD N = 49	VCE images	DL, CNN	Develop a DL algorithm for automated grading for CD ulcers on VCE (Grades 1-3 mild to severe)	The algorithm had a classification accuracy of 0.91 for grade 1 vs 3, 0.78 for grade 2 vs 3, and 0.62 for grade 1 vs 2.
Gottlieb 2021⁴⁷	UC N = 249	Endoscopic videos	DL, CNN	Develop an automated neural network model to predict UC endoscopic severity determined by Mayo endoscopic score (MES) and UC Endoscopic Index of Severity (UCEIS).	The model had excellent agreement with human reviewers for predicting MES (quadratic weighted κ = 0.84) and UCEIS (quadratic weighted κ = 0.86).
Yao 2021⁴⁸	UC N = 315 videos	Endoscopic videos	DL, CNN	Pilot a fully automated video analysis system for grading endoscopic disease severity in UC.	CNN model correctly predicted MES in 78% of videos. CNN model correctly distinguished MES 0-1 from MES 2-3 in 83.7% of videos.
Li 2021⁴⁹	CD N = 167	CTE	Radiomics	Develop and validate a CTE-based radiomic model to characterize intestinal fibrosis in CD as none/mild or moderate/severe degree of fibrosis based on pathology.	In the training data set, the model yielded AUC 0.89 for distinguishing moderate-severe from none-mild fibrosis, with similar performance in validation data from 3 centers (AUC 0.75-0.82). Radiomic model performed better than visual interpretation by 2 radiologists (AUC_radiologist 0.55-0.60). Decision curve analysis showed radiomic model always had higher net benefit than radiologists’ visual interpretation.
Ding 2022⁵⁰	CD N = 121	MRE	Radiomics	Compare accuracy for detecting severity of CD activity using radiomics vs MaRIA score evaluated by radiologists.	Using terminal ileal (TI) CDEIS as ground truth, the radiomics model (containing 6 features) and MaRIA performed similarly for detecting TI CDEIS > 7 (AUC 0.87 vs 0.88, P = .85). MaRIA score between radiologists had fair agreement (ICC 0.58) while radiomic features had high reproducibility (ICC 0.93-0.96).
Guez 2022⁵¹	Pediatric CD N = 121	MRE	Machine learning	Develop a multi-modal machine learning fusion model with radiological and biochemical biomarkers to predict ileal endoscopic activity based on SES-CD. Compare performance to MaRIA and biochemical biomarker only prediction model.	A multimodal model containing disease length on MRE, CRP, and fecal calprotectin performed better than MaRIA for classifying SES-CD ≥3 (AUC 0.84 vs 0.80, P < 1e^-9). The multimodal model performed better than the biochemical biomarker only model (AUC 0.84 vs 0.67, P < 1e^-5).
Li 2022⁵²	CD N = 100	CT	Radiomic	Determine value of a CT-based radiomics model to identify active vs inactive CD.	The radiomics model achieved AUC 0.94 with high accuracy (90.3%), sensitivity (91.1%), and specificity (89.2%).
Meng 2022⁵³	CD N = 235	CTE	DL, CNN, radiomics	Develop a CTE-based DL and radiomics model to assess severity of bowel fibrosis and compare accuracy to radiologists. Severity of histologic fibrosis was assessed semi-quantitatively as none/mild vs moderate/severe.	DL and radiomics models performed similarly (AUC 0.81 vs 0.83, P = .97), with shorter processing time for the DL model (48.4 vs 599.8 seconds, P < .001). DL model performed better than visual interpretation by 2 radiologists (AUC 0.58-0.64, P < .005). Decision curve analysis showed DL and radiomics models possessed better net benefit than radiologists’ interpretation.
Noguchi 2022⁵⁴	UC N = 12	Histologic images	DL, CNN	Develop a CNN model to predict p53 immunohistochemical staining from histologic images to diagnose UC-associated dysplasia/cancer.	The CNN model had an average accuracy of 86-91% for predicting p53 positivity.
Yuan 2022⁵⁵	CD N = 48	CTE	DL, automated body composition segmentation	Evaluate the diagnostic performance of visceral adiposity to predict degree of inflammation vs fibrosis of CD strictures on surgical resection histopathology	Prediction model containing visceral:subcutaneous fat area ratio and lumen narrowing:prestenotic dilation ratio classified severe fibrosis with AUC 0.80 with 61.5% sensitivity, 91.4% specificity, and 83.3% accuracy.
Najdawi 2023⁵⁶	UC N = 637	Histologic images	DL, CNN	Develop a CNN model to predict Nancy histologic index score and histologic remission.	The CNN model accurately predicted Nancy histologic index score (weighted κ = 0.91) and predicted histologic remission with accuracy of 0.97.
Ruiqing 2023⁵⁷	CD N = 167	CTE	Radiomics	Investigate the feasibility of developing a lumen-based, mesenteric-based, and fusion (lumen + mesenteric features) radiomics model to grade mucosal activity (SES-CD) and risk of surgery.	The fusion model could distinguish multicategorical SES-CD score (0, 1, 2-5, 6-10, >10) by bowel segment with AUC 0.83. The fusion model could distinguish bowel segments with moderate/severe disease (SES-CD > 5) with AUC 0.85. A nomogram including image-based score (eg, mural enhancement, fistula, mesenteric fibrofatty proliferation) and fusion model could accurately predict need for surgery within 12 months from CTE.
Rymarczyk 2023⁵⁸	IBD N = 1189 (302 CD)	Histologic images	DL, CNN	Develop deep learning models for automating histologic assessment in IBD.	The CNN model was able to detect histologic disease in the colon and ileum with accuracies ranging from 87-94% and 76-83%, respectively.
Xie 2023⁵⁹	CD N = 628	Endoscopic images	DL, CNN	Develop a CNN model to detect and grade severity of small bowel CD ulcers on double balloon endoscopy images.	The CNN model detected ulcers with 96% accuracy. The model had 87%, 88%, and 85% accuracy for grading ulcerated surface, ulcer size, and ulcer depth, respectively.
Stidham 2024⁶⁰	UC N = 1096	Endoscopic videos	DL	Develop computer vision methods to better quantify mucosal injury in UC and compare to MES.	An automated cumulative disease score correlated with MES (P < .0001) and was more sensitive for detecting endoscopic changes compared with MES (Hedges’ g = 0.743 vs 0.460).

MRTA features: mean (average value of pixels within ROI), standard deviation, mean of positive pixels, entropy, kurtosis (inversely related to number of objects highlighted and increased by intensity variations in highlighted objects), and skewness (reflects brightness of highlighted objects).

Abbreviations: IBD, inflammatory bowel disease; CD, Crohn’s disease; UC, ulcerative colitis; AUC, area under the curve; VEGF, vascular endothelial growth factor; ROI, region of interest; MaRIA, magnetic resonance index of activity; CDEIS, Crohn’s Disease Endoscopic Index of Severity; SES-CD, Simple Endoscopic Score for Crohn’s disease; MES, Mayo Endoscopic Score.

Similarly, evaluating disease severity for small bowel CD has been an area of unmet need. Several VCE scores have been developed but are not routinely used in practice, potentially due to the added time requirement on top of the time needed to read and interpret VCE at baseline. Barash et al developed a DL algorithm grading small bowel ulcer severity (grade 1-3, mild to severe).⁴⁶ The algorithm had a classification accuracy of 0.91 for grade 1 vs 3, 0.78 for grade 2 vs 3, and 0.62 for grade 1 vs 2. In the same vein, Xie et al developed a CNN model for double balloon endoscopic images that could detect small bowel ulcers with 96% accuracy and grade small bowel ulcerated surface, ulcer size, and ulcer depth with 87%, 88%, and 85% accuracy, respectively.⁵⁹ If validated, an AI-based VCE system for monitoring disease activity in the small bowel will be invaluable for improving efficiency and better monitoring of CD, especially for inexperienced operators/observers.

During endoscopy, biopsies are often taken to evaluate for histologic disease activity. Several histologic disease severity scores have been developed to standardize the assessment of histologic disease activity, such as the PICaSSO Histologic Remission Index⁶² and Nancy histological index,⁶³ but their clinical utility is limited by their time-intensive nature. To address this clinical challenge, Najdawi et al⁵⁶ developed a CNN model for UC that predicted Nancy histologic index score with high agreement with human reviewers (κ = 0.91) and could predict histologic remission with an accuracy of 97%. Unlike the previous studies performed on colonic biopsies, another study developed an automated DL model that could detect histologic disease activity in the colon and ileum with 87% to 94% and 76% to 83% accuracy, respectively.⁵⁸ Using endocytoscopy, Maeda et al developed a computer-aided diagnostic system for predicting histologic inflammation with 91% accuracy.⁴⁰ Finally, histologic evaluation is critical for diagnosing UC-associated dysplasia/cancer. One of the key studies in the workup is p53 immunohistochemistry along with hematoxylin and eosin staining. However, evaluation of p53 immunohistochemistry is expensive and time intensive, so Noguchi et al developed a CNN model that predicted p53 immunohistochemistry staining with 86% to 91% accuracy.⁵⁴ While the role of AI for histology in IBD is relatively understudied compared with endoscopy and imaging, it has tremendous implications not only for improving diagnostic accuracy but also for elevating workflow efficiency and cost effectiveness.
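
Agreement statistics such as the weighted κ quoted above reward near-misses on an ordinal scale more than distant disagreements. The following is a minimal sketch of a quadratic weighted Cohen's κ, assuming integer grades from 0 to `n_categories - 1`; the function name and the example scores are hypothetical illustrations and are not taken from any of the cited studies.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, n_categories):
    """Cohen's kappa with quadratic disagreement weights for ordinal grades."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rater lists must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (grade_a, grade_b)
    marginal_a = Counter(rater_a)
    marginal_b = Counter(rater_b)
    max_dist = (n_categories - 1) ** 2
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = (i - j) ** 2 / max_dist  # 0 on the diagonal, 1 at the extremes
            weighted_observed += weight * observed[(i, j)] / n
            weighted_expected += weight * (marginal_a[i] / n) * (marginal_b[j] / n)
    if weighted_expected == 0.0:
        return 1.0  # both raters used one and the same single grade
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical grades on a 0-4 ordinal scale (made-up numbers for illustration).
model_scores = [0, 1, 2, 2, 3, 4, 4, 1]
pathologist_scores = [0, 1, 2, 3, 3, 4, 3, 1]
print(round(quadratic_weighted_kappa(model_scores, pathologist_scores, 5), 3))
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance given each rater's grade distribution.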

Imaging offers noninvasive options to characterize and monitor IBD to inform therapeutic strategies and assess treatment response. In CD, several imaging scores have been developed to assess disease activity such as the MaRIA,⁶⁴ London,⁶⁵ Nancy,⁶⁶ and Clermont score.⁶⁷ Additionally, the Lemann Index was developed to quantify total gut damage in CD and incorporates clinical, surgical, endoscopic, and imaging findings from all segments of the GI tract into one composite score.⁶⁸ However, these scores are often time-consuming, have variable sensitivity and specificity for detecting intestinal segments with active CD (with the MaRIA score being the best; 81% sensitivity, 89% specificity), have variable correlation with endoscopic disease activity, and have only fair to good interobserver agreement depending on the imaging feature of interest, which limits their clinical utility.⁶⁹,⁷⁰ These limitations present a significant opportunity for AI-based imaging interpretation to improve patient care. Of the available studies, AI has been used to characterize disease activity and phenotype inflammatory vs fibrotic strictures in CD.

Presently, 8 studies have explored the use of AI for characterizing disease activity in CD. Studies using DL approaches are limited. One of the biggest challenges for automating the quantification of disease activity in CD is accurately separating the bowel wall from the lumen to make measurements unique to each compartment. In a pilot study in pediatric patients with CD, Lamash et al developed a semiautomated supervised 3D CNN algorithm that only required placement of seed points by the operator to segment the bowel wall and lumen.³⁸ From this, the algorithm could measure lumen radius and bowel wall thickness. This study could not evaluate the algorithm performance due to lack of training data, and there was no endoscopic disease activity score to validate against. However, it provides an excellent working foundation to develop future DL algorithms. Subsequent studies have developed DL algorithms using endoscopic scores such as the Simple Endoscopic Score for CD (SES-CD) and CD Endoscopic Index of Severity (CDEIS) as ground truth and compared the algorithms to established imaging disease activity scores. Puylaert et al developed the VIGOR score, which included both semiautomatic quantitative measurements (bowel thickness and contrast enhancement parameters) and qualitative measurements (degree of T2 mural signal enhancement determined by a radiologist).³⁹ The novel VIGOR score demonstrated a moderate correlation with CDEIS (r = 0.58, P < .001), which was similar to the MaRIA (r = 0.40, P = .001), London score (r = 0.38, P = .001), and CDMI (r = 0.34, P = .003). The VIGOR score also had similar diagnostic accuracy (80%) to the other scores. However, the VIGOR score had superior inter-rater reliability compared with the other imaging scores (ICC 0.81 vs 0.44-0.59), which emphasizes the strength of AI for imaging interpretation in IBD.
Another study developed a multimodal machine-learning fusion model that included disease length on MRE, CRP, and fecal calprotectin to predict an SES-CD ≥3.⁵¹ The machine-learning model (AUC 0.84) performed better than the MaRIA score (AUC 0.80, P < 1e^-9) and biochemical markers alone (AUC 0.67, P < 1e^-5). Artificial intelligence–based imaging algorithms such as this would not only have important implications for patient care but also for clinical trial recruitment, which often requires endoscopic disease activity score cut-offs for inclusion.
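
The AUC values compared above can be read as the probability that a randomly chosen patient with active disease receives a higher model score than a randomly chosen patient without it. The sketch below computes AUC this way (the Mann-Whitney formulation), counting ties as one half; the function name and the toy scores are hypothetical and are not drawn from the cited study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0   # positive case ranked above negative case
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores; label 1 stands for active disease (for example SES-CD >= 3).
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_labels = [0, 0, 1, 1]
print(auc(toy_scores, toy_labels))
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; rank-based implementations are preferable for large cohorts.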

While the number of imaging studies using deep-learning approaches to assess disease activity is limited, several studies have evaluated radiomic approaches for characterizing disease activity with promising results. Studies have identified several unique associations between MRTA parameters and disease activity. On a macroscopic level, entropy has been positively associated with the MRI Crohn’s disease activity score⁶⁵ (regression coefficient [rc] 1.00, P = .01), while kurtosis has been negatively associated (rc −0.45, P = .002).³⁷ On a histologic level, skewness has been associated with histologic disease activity (rc 4.27, P = .02), and lower mean pixel intensity and mean positive pixels have been associated with segments of bowel with increased neoangiogenesis, a hallmark of active inflammation, defined by presence of vascular endothelial growth factor (VEGF) expression.³⁶ These studies demonstrate how radiomics can provide insight into the underlying biology of CD. In terms of quantifying disease activity, Ding et al developed an MRI radiomic-based model that could detect ileal disease with CDEIS >7 with similar performance to the MaRIA (AUC 0.87 vs 0.88, P = .85) but with superior reproducibility (radiomics ICC 0.93-0.96 vs MaRIA ICC 0.58).⁵⁰ Using CT, 2 groups developed radiomic-based algorithms that could differentiate intestinal segments with active vs inactive CD.⁵²,⁵⁷ Ruiqing et al developed a particularly interesting radiomics model that incorporated luminal and mesenteric radiomic features and could distinguish multicategorical SES-CD scores (ie, 0, 1, 2-5, 6-10, >10) with an AUC of 0.83 and differentiate intestinal segments with moderate/severe disease (SES-CD >5) with an AUC of 0.85.⁵⁷ Including mesenteric radiomics features is unique because it allows for objective quantification of inflammatory alterations in the mesenteric fat, which not only provides an additional data point for quantifying disease activity but also facilitates future imaging-based investigations into the mesenteric fat.
Noninvasive characterization of inflammatory mesenteric fat will become increasingly important, as studies have demonstrated mesenteric fat is intimately involved in the pathogenesis and progression of CD.³⁵

In CD, one of the biggest and most persistent clinical challenges is differentiating between inflammatory-predominant and fibrotic-predominant strictures to decide between medical and surgical intervention. Also, multiple antifibrotic targets are under investigation, so the need to accurately phenotype CD stricture characteristics as potential trial end points is becoming increasingly important.⁷¹,⁷² Multiple imaging parameters in US, CTE, and MRE have been proposed to phenotype CD strictures, but their time intensity and interobserver variation are potential limitations. For this unmet need in CD, AI-powered imaging interpretation is a promising tool to help phenotype CD strictures consistently and efficiently. Using DL approaches, several studies have developed semiautomated and automated algorithms to measure minimal lumen diameter, maximal prestenotic dilation, bowel wall thickness, and/or body composition (VAT and SAT volumetrics) to develop multivariable models to detect CD strictures and predict degree of fibrosis using histologic scores as the ground truth.⁴³,⁴⁴,⁵³,⁵⁵ The performances of these models were generally good (AUC 0.80-0.86) and superior to radiologists’ interpretation (AUC 0.58-0.64). Additionally, compared with a radiomic approach, 1 study reported their DL algorithm had shorter processing time (48.4 vs 599.8 seconds, P < .0001).⁵³ Using radiomics, studies have achieved success comparable to DL approaches.
In the 2 largest radiomic-based studies to date, CTE-based radiomics models classified CD strictures with moderate to severe fibrosis better than radiologists (AUC 0.83-0.89 vs 0.55-0.64), and decision curve analyses supported net benefit with a radiomics prediction model.⁴⁹,⁵³ Considering data demonstrating the subpar human ability to classify fibrotic-predominant strictures, the ability to use AI to differentiate between inflammatory-predominant vs fibrosis-predominant strictures in CD has important and exciting implications for precision medicine and developing antifibrotic therapies. However, prospective clinical trials using AI-powered phenotyping of CD strictures to inform management decisions are needed to fully understand its clinical utility and safety.
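
Decision curve analysis, cited above as supporting the radiomics models, compares strategies by net benefit at a chosen threshold probability. The sketch below uses the standard definition (true positives per patient minus false positives per patient weighted by the threshold odds); it is an illustration under that assumption, not the implementation used in the cited studies, and the function names and toy data are hypothetical.

```python
def net_benefit(probabilities, labels, threshold):
    """Net benefit of acting on patients whose predicted probability meets the threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    if len(probabilities) != len(labels) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels)
    true_pos = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # relative harm of a false positive
    return true_pos / n - (false_pos / n) * odds


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: classify every patient as positive."""
    return net_benefit([1.0] * len(labels), labels, threshold)


# Hypothetical predicted probabilities of moderate/severe fibrosis (made-up data).
toy_probs = [0.9, 0.8, 0.3, 0.2]
toy_truth = [1, 0, 1, 0]
for t in (0.2, 0.5):
    print(t, net_benefit(toy_probs, toy_truth, t), net_benefit_treat_all(toy_truth, t))
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all reference and zero, which corresponds to treating no one.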

Prognosis

The old saying “an ounce of prevention is worth a pound of cure” is a core concept for treating IBD to reduce the risk of complications and surgery, and a tremendous amount of research has been directed toward discovering prognostic biomarkers to improve our ability to position interventions earlier. However, many prognostic biomarkers are variably supported by the literature and have limited predictive value, and AI-powered clinical tools may help advance this area of need in IBD. Several studies have evaluated the role of AI for prognosis in IBD, with studies focusing on histology and imaging yielding the most exciting results (Table 3). In our search, we did not identify any studies that developed an AI-based model to determine prognosis based on endoscopy.

Table 3.

Ten studies evaluating the role of AI in endoscopy, histology and cross-sectional imaging for prognosis in IBD.

Author Year	Dataset	Data source	Algorithm type	Task	Performance
Klein 2017⁷³	CD N = 105	Histologic images	DL	Validate a morphometric histology image analysis of baseline intestinal biopsies for predicting clinical phenotype in CD colitis.	Using B1 as reference, the model was able to predict future B2 and B3 disease within 5 years with AUC 0.74 and 0.78, respectively.
Chen 2021⁷⁴	CD N = 186	CTE	Radiomic	Develop and validate a radiomic nomogram using the intestinal segment with the worst disease to predict secondary loss of response to infliximab.	A multivariate prediction model containing 8 features had significant discrimination (AUC 0.88). Ten-fold cross validation of the model yielded mean AUC 0.82 and 82% accuracy.
Feng 2021⁷⁵	CD N = 322	MRE	Radiomic	Explore the correlation between R2* (an MRI-based radiomic index that detects changes in hepatic iron metabolism) and inflammation. Develop a nomogram based on R2* to identify secondary loss of response to infliximab.	R2* was higher in active vs inactive CD (28.0 vs 26.6, P = .03). Multivariable prediction model including CRP, hemoglobin, and R2* had good discrimination for secondary loss of response with AUC 0.72 for training and validation data set.
Ohara 2022⁷⁶	UC N = 114	Histologic images	DL	Determine if a DL-based system for automatic quantification of goblet cell mucin is useful for predicting future relapse in UC patients in endoscopic remission.	The relapse group had lower goblet cell mucus area calculated by the DL system compared with the non-relapse group.
Chirra 2023⁷⁷	CD N = 80	MRE	Radiomics	Develop and test a prognostic radiomic model for early surgery in CD patients requiring immunomodulators and/or biologics.	A model combining radiomics, simplified MaRIA scoring, and clinical variables yielded the best performance (AUC 0.83) compared with radiomics (AUC 0.74), simplified MaRIA (AUC 0.58) or clinical variables alone (AUC 0.48). The model accurately predicted time to surgery (HR, 4.13, p = 6.90e^-6, C-index 0.71).
Iacucci 2023⁴⁴	UC N = 273	Histologic images	DL, CNN	Develop and validate AI computer-aided diagnosis system to evaluate UC biopsies and predict prognosis.	The system had 89% sensitivity and 85% specificity for distinguishing histologic remission/activity based on PICaSSO Histologic Remission Index (PHRI). The AI-assessed PHRI was associated with flare up in 1 year with hazard ratio 4.64 compared with 3.56 with a human-assessed PHRI.
Li 2023⁵³	CD N = 256	CTE	Radiomics	Develop and validate a visceral fat-based radiomics model for predicting CD progression (development of penetrating/stricturing disease or surgery) and compare prediction accuracy to a subcutaneous fat-based radiomics model and six conventional fat metrics.	The visceral fat-based radiomics model (AUC 0.85) outperformed the subcutaneous fat-based model (AUC 0.79). On multivariate Cox regression analysis, the visceral fat-based radiomics model was the most important independent predictor of CD progression (HR, 9.29, P < .005) followed by the subcutaneous fat-based radiomics model (HR, 3.28, P = .06). Decision curve analysis showed visceral fat-based radiomics model had better net benefit over subcutaneous fat-based model. The conventional fat metrics were not associated with disease progression (P = .089-.996).
Ruiqing 2023⁵⁷	CD N = 167	CTE	Radiomics	Investigate the feasibility of developing a lumen-based, mesenteric-based, and fusion (lumen + mesenteric features) radiomics model to grade mucosal activity (SES-CD) and risk of surgery.	The fusion model could distinguish multicategorical SES-CD score (0, 1, 2-5, 6-10, >10) by bowel segment with AUC 0.83. The fusion model could distinguish bowel segments with moderate/severe disease (SES-CD > 5) with AUC 0.85. A nomogram including image-based score (eg, mural enhancement, fistula, mesenteric fibrofatty proliferation) and fusion model could accurately predict need for surgery within 12 months from CTE.
Shen 2023⁷³	CD N = 186	CTE	Radiomics	Develop and validate a preoperative CTE-based radiomics signature to predict postoperative recurrence (POR). Compare predictive accuracy of a multimodal nomogram (incorporating radiomics signature, clinical, and radiologic features) vs clinical-radiologic only model for POR.	An intestinal lesion only (HR, 2.17, P = .002) and peri-intestinal mesenteric fat only radiomic signature (HR, 2.19, P = .0018) were associated with POR. The multi-modal nomogram performed modestly better than the clinical-radiologic only model (AUC 0.69 vs 0.66). Decision curve analysis showed the multi-modal nomogram had moderately better net benefits than clinical-radiological model.
Yao 2023⁷⁸	CD N = 268	CTE	Radiomics	Develop and validate a clinical-radiomic nomogram to predict 1-year surgical risk after CD diagnosis. The radiomics model will extra features from the inflamed segment of bowel and peri-intestinal mesenteric fat.	The clinical-radiomic model had superior performance (AUC 0.90) compared with the clinical only (AUC 0.77), intestinal radiomic only (AUC 0.88), and peri-intestinal mesenteric fat only (AUC 0.80) models. The clinical-radiomic nomogram was 71% sensitive 90% specific, and 85% accurate for predicting surgery within 1 year of CD diagnosis. Decision curve analysis supported net benefit of the clinical-radiomic nomogram over other models.

MRTA features: mean (average value of pixels within ROI), standard deviation, mean of positive pixels, entropy, kurtosis (inversely related to the number of objects highlighted and increased by intensity variations in highlighted objects), and skewness (reflects brightness of highlighted objects).

Abbreviations: IBD, inflammatory bowel disease; CD, Crohn’s disease; UC, ulcerative colitis; iTB, intestinal TB; Rc, regression coefficient; AUC, area under the curve; ROI, region of interest; MaRIA, magnetic resonance index of activity; CDEIS, Crohn’s Disease Endoscopic Index of Severity; SES-CD, Simple Endoscopic Score for Crohn’s disease; HR, hazard ratio.

Table 3.

Ten studies evaluating the role of AI in endoscopy, histology and cross-sectional imaging for prognosis in IBD.

While histologic remission can be difficult to achieve in the real world, several studies have found that histologic remission may be associated with a lower risk of future flares, even in patients already in endoscopic remission.⁷⁸ Using the PICaSSO Histologic Remission Index (PHRI) for UC, Iacucci et al developed a CNN-based system that could distinguish histologic remission from activity with 89% sensitivity and 85% specificity.⁷⁹ They also found that an AI-assessed PHRI was associated with a UC flare within 1 year, with a hazard ratio (HR) of 4.64 compared with 3.56 for a human-assessed PHRI. Similarly, using computational pathology methods, Klein et al developed a system to analyze baseline histology images from patients with Crohn’s colitis that could predict future development of fibrostenosing and internal penetrating disease behavior within 5 years with AUC 0.74 and 0.78, respectively.⁷³ Likewise, Ohara et al developed a DL-based system to automate quantification of goblet cell mucin to predict risk of relapse within 12 months in UC subjects in endoscopic remission.⁷⁶ The investigators found that the relapse group had a lower goblet cell mucus area, as calculated by the DL system, than the nonrelapse group. These studies highlight how AI can enhance our prognostic abilities using data that were previously not easily obtainable with traditional methods, primarily because of time constraints.
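As a reminder of what the sensitivity and specificity figures quoted above measure, the following is a minimal Python sketch. The function name and the counts are hypothetical: the counts were chosen only so that they reproduce 89% and 85%, and they are not the study's actual confusion matrix.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    # Sensitivity: share of truly active biopsies that the system labels active.
    sensitivity = tp / (tp + fn)
    # Specificity: share of biopsies truly in remission that the system labels remission.
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts for illustration only (100 active, 100 in remission).
sens, spec = sensitivity_specificity(tp=89, fn=11, tn=85, fp=15)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Both measures depend on the reference standard used to label each biopsy, which is why agreement on an index such as PHRI matters before comparing systems.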

For imaging, radiomic-based models may be the best approach for predicting outcomes in IBD. We identified no studies that used DL algorithms to develop prognostic models in IBD. In one study, a VAT-based radiomic signature independently predicted risk of CD progression (HR, 9.29, P = .005) with good performance in 2 independent test cohorts (AUC 0.82-0.87).⁸⁰ Conventional VAT metrics such as BMI, VAT volume, or VAT:SAT volume were not associated with risk of CD progression (P = .089-.996), highlighting the limitations of human-derived prognostic biomarkers. Studies have also developed multimodal radiomic-based nomograms to predict secondary loss of response to infliximab using pretreatment imaging with good performance (AUC 0.72-0.88).⁷⁴,⁷⁵ Interestingly, one of these studies developed a multivariable nomogram using an MRI-based radiomic index that detects changes in hepatic iron metabolism (R2*) to predict secondary loss of response to infliximab with acceptable performance (AUC 0.72).⁷⁵ This study is another example of how radiomics can help uncover additional information about the underlying biology of CD. Similarly, 3 studies have developed radiomic-based nomograms using features from the bowel and/or peri-intestinal mesenteric adipose tissue to predict 1-year risk of surgery in CD.⁵⁷,⁷⁷,⁸¹ As with the other multimodal nomograms mentioned previously, incorporating clinical factors improved prediction of surgery, yielding acceptable to good performance (AUC 0.70-0.90). Finally, predicting postoperative recurrence in CD has remained an unmet challenge despite significant efforts, and many prognostic markers are variably supported by the literature. Using imaging obtained preoperatively, Shen et al identified intestinal only (HR, 2.17, P = .002) and peri-intestinal mesenteric fat only radiomic signatures (HR, 2.19, P = .0018) that were associated with postoperative recurrence.⁸² Unfortunately, the multimodal nomogram incorporating these signatures with clinical factors had poor performance (AUC 0.69). The performance may have been limited by defining postoperative recurrence as a composite end point including endoscopic, radiographic, or surgical recurrence, which can be confounded by patient adherence to postoperative disease monitoring. Overall, the available literature provides promising data supporting the use of AI to better predict outcomes in IBD. As studies in oncology and IBD have shown that radiomics can reflect disease biology, correlating radiomic signatures with histologic or cellular level data (ie, transcriptomics) will not only advance our knowledge about the heterogeneous nature of IBD but also help develop more accurate prediction models.
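Because most of the prognostic models above are compared by AUC, a short Python sketch of how that statistic can be computed may help in reading the numbers. It uses the pairwise (Mann-Whitney) formulation: the AUC is the probability that a randomly chosen patient who had the outcome received a higher model score than a randomly chosen patient who did not. The function name and the risk scores are hypothetical and are not taken from any of the cited studies.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical model risk scores, for illustration only.
progressed = [0.9, 0.8, 0.6, 0.4]   # patients who had the outcome
stable = [0.7, 0.5, 0.3, 0.2]       # patients who did not

print(auc(progressed, stable))  # 13 of 16 pairs ranked correctly -> 0.8125
```

An AUC of 0.5 corresponds to chance ranking and 1.0 to perfect ranking, so a difference such as 0.69 vs 0.66 means the two models order patients almost equally well.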

Limitations and Future Directions

While studies investigating the role of AI in IBD have made significant strides, there are several important limitations. First, the performance of AI algorithms depends on the availability and quality of data. The majority of studies rely on retrospective data. Especially with CNN-based systems for automated endoscopic scoring, acquisition of endoscopic images and videos was not standardized, as most were retrospective analyses from clinical practice. Using standardized acquisition of endoscopic data from randomized controlled trials, as in the studies by Stidham et al and Gottlieb et al, will strengthen future development of AI-based endoscopic scoring systems.⁴⁷,⁶⁰ Second, there are important inherent biases to recognize. There is likely a significant degree of publication bias in the current literature, as investigators are unlikely to report negative results for AI algorithms and journals are unlikely to publish these negative studies. Journals should encourage the submission and publication of negative studies so that the role and value of AI for IBD can be fully understood. There is also potential bias in the data sets used to train the AI models. This is particularly important to recognize considering most IBD data sets comprise Caucasian subjects, so whether these AI-based systems are accurate in non-Caucasian subjects is unclear. Efforts to study AI in underrepresented demographics will be crucial to prevent exacerbation of healthcare disparities. Third, studies of AI-based systems for endoscopy, histology, and imaging tend to favor either UC or CD. For example, endoscopy-based AI studies are primarily conducted in UC subjects, while imaging-based AI studies are primarily in CD subjects. Endoscopic scoring systems for CD are subject to the same limitations as UC scoring systems, so future studies are needed to develop AI-based systems to automate scoring in CD. Additionally, more studies are needed to develop and validate AI-based systems that can differentiate between CD and UC on endoscopy. Furthermore, studies developing AI-based systems to determine prognosis based on endoscopic findings are also needed. Finally, endoscopy, histology, and imaging techniques and settings are not standardized across institutions. Future standardization of these data is needed for AI-based systems for IBD to function appropriately across different institutions.

Conclusion

In conclusion, the transformative potential of AI applications across endoscopy, histology, and imaging in IBD is undeniably promising. Artificial intelligence stands poised to revolutionize the landscape of IBD care by addressing unmet clinical needs, improving workflow efficiency and enhancing patient outcomes through multifaceted approaches. The integration of AI-based clinical tools will play a critical role in advancing precision medicine in IBD. Additionally, AI-powered analytics present opportunities to augment the efficiency of clinical trials, facilitating quicker and more insightful analyses, ultimately expediting the development of novel therapies for IBD.

While the strides made in AI applications for IBD are exciting, inherent limitations and gaps in knowledge in the available literature underscore the need for cautious optimism. Many AI algorithms necessitate rigorous validation in larger prospective studies to ensure their reliability, reproducibility, and robust performance across diverse patient populations and clinical settings. Thus, as we continue to explore the potential for AI-driven healthcare, the translation of these innovative tools and technologies into routine clinical practice requires a comprehensive understanding of their limitations, coupled with a commitment to address these through continual research and development. Embracing a collaborative approach among clinicians, researchers, and technology developers is imperative to realizing the full potential of AI in IBD.

Supplementary Data

Supplementary data is available at Inflammatory Bowel Diseases online.

Author Contributions

P.G. is the guarantor of the article and was involved in concept and design, drafting of article, and final approval of article.

O.M. was involved in the drafting and final approval of the article.

S.D. was involved in the drafting and final approval of the article.

D.C. was involved in the drafting and final approval of the article.

P.W. was involved in the drafting and final approval of the article.

X.H. was involved in the drafting and final approval of the article.

D.L. was involved in the drafting and final approval of the article.

J.H.M. was involved in the drafting and final approval of the article.

D.P.B.M. was involved in the drafting and final approval of the article.

Funding

There are no sources of funding to disclose related to the work for this article.

Conflicts of interest

D.C.: speaker’s fees and/or research support from Takeda, Janssen, AbbVie, Eli Lilly, Reckitt, and Lapidot; consultancy fees from Takeda, AbbVie, and Taro.

D.P.B.M. has received consulting fees from Takeda, Prometheus Biosciences Inc, Prometheus Labs, Palisade Bio, and MERCK.

P.G., O.M., S.D., P.W., X.H., D.L., J.H.M. have no conflicts of interest to disclose.

References

1. Lewis JD, Parlett LE, Jonsson-Funk ML, et al. Incidence, prevalence and racial and ethnic distribution of inflammatory bowel disease in the United States. Gastroenterology. 2023;165(5):1197-1205.

2. Ng SC, Shi HY, Hamidi N, et al. Worldwide incidence and prevalence of inflammatory bowel disease in the 21st century: a systematic review of population-based studies. Lancet. 2017;390(10114):2769-2778.

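3. FDA authorizes marketing of first device that uses artificial intelligence to help detect potential signs of colon cancer. 2021. Accessed 8 March 2023. https://www.fda.gov/news-events/press-announcements/fda-authorizes-marketing-first-device-uses-artificial-intelligence-help-detect-potential-signs-colon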

4. Faghani S, Codipilly DC, David V, et al. Development of a deep learning model for the histologic diagnosis of dysplasia in Barrett’s esophagus. Gastrointest Endosc. 2022;96(6):918-925.e3.

5. Kim M, Yun J, Cho Y, et al. Deep learning in medical imaging. Neurospine. 2019;16(4):657-668.

6. Esteva A, Kuprel B, Novoa RA, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115-118.

7. Chen MM, Terzic A, Becker AS, et al. Artificial intelligence in oncologic imaging. Eur J Radiol Open. 2022;9(1):100441.

8. Aslam MF, Bano S, Khalid M, et al. The effectiveness of real-time computer-aided and quality control systems in colorectal adenoma and polyp detection during colonoscopies: a meta-analysis. Ann Med Surg (Lond). 2023;85(2):80-91.

9. Geis JR, Brady AP, Wu CC, et al. Ethics of artificial intelligence in radiology: summary of the Joint European and North American Multisociety Statement. Radiology. 2019;293(2):436-440.

10. van Timmeren JE, Cester D, Tanadini-Lang S, Alkadhi H, Baessler B. Radiomics in medical imaging—“how-to” guide and critical reflection. Insights Imaging. 2020;11(1):91-107.

11. Ganeshan B, Strukowska O, Skogen K, Young R, Chatwin C, Miles K. Heterogeneity of focal breast lesions and surrounding tissue assessed by mammographic texture analysis: preliminary evidence of an association with tumor invasion and estrogen receptor status. Front Oncol. 2011;1(1):33.

12. Limkin EJ, Sun R, Dercle L, et al. Promises and challenges for the implementation of computational medical imaging (radiomics) in oncology. Ann Oncol. 2017;28(6):1191-1206.

13. Rutman AM, Kuo MD. Radiogenomics: creating a link between molecular diagnostics and diagnostic imaging. Eur J Radiol. 2009;70(2):232-241.

14. Segal E, Sirlin CB, Ooi C, et al. Decoding global gene expression programs in liver cancer by noninvasive imaging. Nat Biotechnol. 2007;25(6):675-680.

15. Qureshi TA, Gaddam S, Wachsman AM, et al. Predicting pancreatic ductal adenocarcinoma using artificial intelligence analysis of prediagnostic computed tomography images. Cancer Biomark. 2022;33(2):211-217.

16. Yip SS, Aerts HJ. Applications and limitations of radiomics. Phys Med Biol. 2016;61(13):R150-R166.

17. Mahapatra D, Schueffler P, Tielbeek JA, Buhmann JM, Vos FM. A supervised learning approach for Crohn’s disease detection using higher-order image statistics and a novel shape asymmetry measure. J Digit Imaging. 2013;26(5):920-931.

18. Hahnemann ML, Nensa F, Kinner S, et al. Improved detection of inflammatory bowel disease by additional automated motility analysis in magnetic resonance imaging. Invest Radiol. 2015;50(2):67-72.

19. Mossotto E, Ashton JJ, Coelho T, et al. Classification of paediatric inflammatory bowel disease using machine learning. Sci Rep. 2017;7(1):2427.

20. Naziroglu RE, Puylaert CAJ, Tielbeek JAW, et al. Semi-automatic bowel wall thickness measurements on MR enterography in patients with Crohn’s disease. Br J Radiol. 2017;90(1074):20160654.

21. Gollifer RM, Menys A, Plumb A, et al. Automated versus subjective assessment of spatial and temporal MRI small bowel motility in Crohn’s disease. Clin Radiol. 2019;74(10):814.e9-814.e19.

22. Klang E, Barash Y, Margalit RY, et al. Deep learning algorithms for automated detection of Crohn’s disease ulcers by video capsule endoscopy. Gastrointest Endosc. 2020;91(3):606-613.e2.

23. Klang E, Grinman A, Soffer S, et al. Automated detection of Crohn’s disease intestinal strictures on capsule endoscopy images using deep neural networks. J Crohns Colitis. 2021;15(5):749-756.

24. Li H, Mo Y, Huang C, et al. An MSCT-based radiomics nomogram combined with clinical factors can identify Crohn’s disease and ulcerative colitis. Ann Transl Med. 2021;9(7):572.

25. Zhu C, Yu Y, Wang S, et al. A novel clinical radiomics nomogram to identify Crohn’s disease from intestinal tuberculosis. J Inflamm Res. 2021;14(1):6511-6521.

26. Arkko A, Kaseva T, Salli E, et al. Automatic detection of Crohn’s disease using quantified motility in magnetic resonance enterography: initial experiences. Clin Radiol. 2022;77(2):96-103.

27. Klang E, Kopylov U, Mortensen B, et al. A convolutional neural network deep learning model trained on CD ulcers images accurately identifies NSAID ulcers. Front Med (Lausanne). 2021;8(1):656493.

28. Jiang F, Fu X, Kuang K, Fan D. Artificial intelligence algorithm-based differential diagnosis of Crohn’s disease and ulcerative colitis by CT image. Comput Math Methods Med. 2022;2022(1):3871994.

29. Wang L, Chen L, Wang X, et al. Development of a convolutional neural network-based colonoscopy image assessment model for differentiating Crohn’s disease and ulcerative colitis. Front Med (Lausanne). 2022;9(1):789862.

30. Brodersen JB, Jensen MD, Leenhardt R, et al. Artificial intelligence-assisted analysis of pan-enteric capsule endoscopy in patients with suspected Crohn’s disease. A study on diagnostic performance. J Crohns Colitis. 2023;18(1):75-81.


31. Carter D, Albshesh A, Shimon C, et al. Automatized detection of Crohn’s disease in intestinal ultrasound using convolutional neural network. Inflamm Bowel Dis. 2023;29(12):16.


32. Gong T, Li M, Pu H, et al. Computed tomography enterography-based multiregional radiomics model for differential diagnosis of Crohn’s disease from intestinal tuberculosis. Abdom Radiol. 2023;48(6):1900-1910.


33. Zhou Z, Xiong Z, Cheng R, et al. Volumetric visceral fat machine learning phenotype on CT for differential diagnosis of inflammatory bowel disease. Eur Radiol. 2023;33(3):1862-1872.

34. Cortegoso Valdivia P, Deding U, Bjorsum-Meyer T, et al.; International CApsule endoscopy REsearch (I-CARE) Group. Inter/intra-observer agreement in video-capsule endoscopy: are we getting it all wrong? A systematic review and meta-analysis. Diagnostics (Basel). 2022;12(10):2400.

35. Gu P, Dube S, McGovern DPB. Medical and surgical implications of mesenteric adipose tissue in Crohn’s disease: a review of the literature. Inflamm Bowel Dis. 2023;29(3):458-469.

36. Bhatnagar G, Makanyanga J, Ganeshan B, et al. MRI texture analysis parameters of contrast-enhanced T1-weighted images of Crohn’s disease differ according to the presence or absence of histological markers of hypoxia and angiogenesis. Abdom Radiol (NY). 2016;41(7):1261-1269.

37. Makanyanga J, Ganeshan B, Rodriguez-Justo M, et al. MRI texture analysis (MRTA) of T2-weighted images in Crohn’s disease may provide information on histological and MRI disease activity in patients undergoing ileal resection. Eur Radiol. 2017;27(2):589-597.

38. Lamash Y, Kurugol S, Warfield SK. Semi-automated extraction of Crohns disease MR imaging markers using a 3D residual CNN with distance prior. Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, held in conjunction with MICCAI 2018, Granada, Spain, September 20, 2018, Proceedings/Danail Stoyanov, Zeike Taylor, Gustavo Carneiro, Tanveer Syeda-Mahmood et al. (eds.) 2018;11045:218-226.

39. Puylaert CAJ, Schuffler PJ, Naziroglu RE, et al. Semiautomatic assessment of the terminal ileum and colon in patients with Crohn disease using MRI (the VIGOR++ Project). Acad Radiol. 2018;25(8):1038-1045.

40. Maeda Y, Kudo SE, Mori Y, et al. Fully automated diagnostic system with artificial intelligence using endocytoscopy to identify the presence of histologic inflammation associated with ulcerative colitis (with video). Gastrointest Endosc. 2019;89(2):408-415.

41. Ozawa T, Ishihara S, Fujishiro M, et al. Novel computer-assisted diagnosis system for endoscopic disease activity in patients with ulcerative colitis. Gastrointest Endosc. 2019;89(2):416-421.e1.

42. Stidham RW, Liu W, Bishu S, et al. Performance of a deep learning model vs human reviewers in grading endoscopic disease severity of patients with ulcerative colitis. JAMA Netw Open. 2019;2(5):e193963.

43. Tabari A, Kilcoyne A, Jeck WR, Mino-Kenudson M, Gee MS. Texture analysis of magnetic resonance enterography contrast enhancement can detect fibrosis in Crohn disease strictures. J Pediatr Gastroenterol Nutr. 2019;69(5):533-538.

44. Stidham RW, Enchakalody B, Waljee AK, et al. Assessing small bowel stricturing and morphology in Crohn’s disease using semiautomated image analysis. Inflamm Bowel Dis. 2020;26(5):734-742.

45. Takenaka K, Ohtsuka K, Fujii T, et al. Development and validation of a deep neural network for accurate evaluation of endoscopic images from patients with ulcerative colitis. Gastroenterology. 2020;158(8):2150-2157.

46. Barash Y, Azaria L, Soffer S, et al. Ulcer severity grading in video capsule images of patients with Crohn’s disease: an ordinal neural network solution. Gastrointest Endosc. 2021;93(1):187-192.

47. Gottlieb K, Requa J, Karnes W, et al. Central reading of ulcerative colitis clinical trial videos using neural networks. Gastroenterology. 2021;160(3):710-719.e2.

48. Yao H, Najarian K, Gryak J, et al. Fully automated endoscopic disease activity assessment in ulcerative colitis. Gastrointest Endosc. 2021;93(3):728-736.e1.

49. Li X, Liang D, Meng J, et al. Development and validation of a novel computed-tomography enterography radiomic approach for characterization of intestinal fibrosis in Crohn’s disease. Gastroenterology. 2021;160(7):2303-2316.e11.

50. Ding H, Li J, Jiang K, et al. Assessing the inflammatory severity of the terminal ileum in Crohn disease using radiomics based on MRI. BMC Med Imaging. 2022;22(1):118.

51. Guez I, Focht G, Greer MC, et al. Development of a multimodal machine-learning fusion model to noninvasively assess ileal Crohn’s disease endoscopic activity. Comput Methods Programs Biomed. 2022;227(1):107207.

52. Li T, Liu Y, Guo J, et al. Prediction of the activity of Crohn’s disease based on CT radiomics combined with machine learning models. J Xray Sci Technol. 2022;30(1):1155-1168.

53. Meng J, Luo Z, Chen Z, et al. Intestinal fibrosis classification in patients with Crohn’s disease using CT enterography-based deep learning: comparisons with radiomics and radiologists. Eur Radiol. 2022;32(12):8692-8705.

54. Noguchi T, Ando T, Emoto S, et al. Artificial intelligence program to predict p53 mutations in ulcerative colitis-associated cancer or dysplasia. Inflamm Bowel Dis. 2022;28(7):1072-1080.

55. Yuan G, He Y, Cao QH, et al. Visceral adipose volume is correlated with surgical tissue fibrosis in Crohn’s disease of the small bowel. Gastroenterol Rep. 2022;10(1):goac044.

56. Najdawi F, Sucipto K, Mistry P, et al. Artificial intelligence enables quantitative assessment of ulcerative colitis histology. Mod Pathol. 2023;36(6):100124.

57. Ruiqing L, Jing Y, Shunli L, et al. A novel radiomics model integrating luminal and mesenteric features to predict mucosal activity and surgery risk in Crohn’s disease patients: a multicenter study. Acad Radiol. 2023;30(1):04.

58. Rymarczyk D, Schultz W, Borowa A, et al. Deep learning models capture histological disease activity in Crohn’s disease and ulcerative colitis with high fidelity. J Crohns Colitis. 2023:jjad171.

59. Xie W, Hu J, Liang P, et al. Deep learning based lesions detection and severity grading of small bowel Crohn’s disease ulcers on double-balloon endoscopy images. Gastrointest Endosc. 2023.

60. Stidham RW, Cai L, Cheng S, et al. Using computer vision to improve endoscopic disease quantification in therapeutic clinical trials of ulcerative colitis. Gastroenterology. 2024;166(1):155-167.e2.

61. Pagnini C, Menasci F, Desideri F, et al. Endoscopic scores for inflammatory bowel disease in the era of “mucosal healing”: old problem, new perspectives. Dig Liver Dis. 2016;48(7):703-708.

62. Gui X, Bazarova A, Del Amor R, et al. PICaSSO histologic remission index (PHRI) in ulcerative colitis: development of a novel simplified histological score for monitoring mucosal healing and predicting clinical outcomes and its applicability in an artificial intelligence system. Gut. 2022;71(5):889-898.

63. Marchal-Bressenot A, Salleron J, Boulagnon-Rombi C, et al. Development and validation of the Nancy histological index for UC. Gut. 2017;66(1):43-49.

64. Rimola J, Rodriguez S, Garcia-Bosch O, et al. Magnetic resonance for assessment of disease activity and severity in ileocolonic Crohn’s disease. Gut. 2009;58(8):1113-1120.

65. Steward MJ, Punwani S, Proctor I, et al. Non-perforating small bowel Crohn’s disease assessed by MRI enterography: derivation and histopathological validation of an MR-based activity index. Eur J Radiol. 2012;81(9):2080-2088.

66. Thierry ML, Rousseau H, Pouillon L, et al. Accuracy of diffusion-weighted magnetic resonance imaging in detecting mucosal healing and treatment response, and in predicting surgery, in Crohn’s disease. J Crohns Colitis. 2018;12(10):1180-1190.

67. Buisson A, Pereira B, Goutte M, et al. Magnetic resonance index of activity (MaRIA) and Clermont score are highly and equally effective MRI indices in detecting mucosal healing in Crohn’s disease. Dig Liver Dis. 2017;49(11):1211-1217.

68. Pariente B, Cosnes J, Danese S, et al. Development of the Crohn’s disease digestive damage score, the Lemann score. Inflamm Bowel Dis. 2011;17(6):1415-1422.

69. Rozendorn N, Amitai MM, Eliakim RA, Kopylov U, Klang E. A review of magnetic resonance enterography-based indices for quantification of Crohn’s disease inflammation. Therap Adv Gastroenterol. 2018;11(1):1756284818765956.

70. Tielbeek JA, Makanyanga JC, Bipat S, et al. Grading Crohn disease activity with MRI: interobserver variability of MRI features, MRI scoring of severity, and correlation with Crohn disease endoscopic index of severity. AJR Am J Roentgenol. 2013;201(6):1220-1228.

71. Rieder F. Toward an antifibrotic therapy for inflammatory bowel disease. United European Gastroenterol J. 2016;4(4):493-495.

72. Lin SN, Mao R, Qian C, et al.; Stenosis Therapy and Antifibrotic Research (STAR) Consortium. Development of antifibrotic therapy for stricturing Crohn’s disease: lessons from randomized trials in other fibrotic diseases. Physiol Rev. 2022;102(2):605-652.

73. Klein A, Mazor Y, Karban A, et al. Early histological findings may predict the clinical phenotype in Crohn’s colitis. United European Gastroenterol J. 2017;5(5):694-701.

74. Chen Y, Li H, Feng J, et al. A novel radiomics nomogram for the prediction of secondary loss of response to infliximab in Crohn’s disease. J Inflamm Res. 2021;14(1):2731-2740.

75. Feng J, Feng Q, Chen Y, et al. MRI-based radiomic signature identifying secondary loss of response to infliximab in Crohn’s disease. Front Nutr. 2021;8(1):773040.

76. Ohara J, Nemoto T, Maeda Y, et al. Deep learning-based automated quantification of goblet cell mucus using histological images as a predictor of clinical relapse of ulcerative colitis with endoscopic remission. J Gastroenterol. 2022;57(12):962-970.

77. Chirra P, Sharma A, Bera K, et al. Integrating radiomics with clinicoradiological scoring can predict high-risk patients who need surgery in Crohn’s disease: a pilot study. Inflamm Bowel Dis. 2023;29(3):349-358.

78. Bryant RV, Winer S, Travis SP, Riddell RH. Systematic review: histological remission in inflammatory bowel disease. Is “complete” remission the new treatment paradigm? An IOIBD initiative. J Crohns Colitis. 2014;8(12):1582-1597.

79. Iacucci M, Parigi TL, Del Amor R, et al. Artificial intelligence enabled histological prediction of remission or activity and clinical outcomes in ulcerative colitis. Gastroenterology. 2023;164(7):1180-1188.e2.

80. Li X, Zhang N, Hu C, et al. CT-based radiomics signature of visceral adipose tissue for prediction of disease progression in patients with Crohn’s disease: a multicentre cohort study. EClinicalMedicine. 2023;56(1):101805.

81. Yao J, Zhou J, Zhong Y, et al. Computed tomography-based radiomics nomogram using machine learning for predicting 1-year surgical risk after diagnosis of Crohn’s disease. Med Phys. 2023;50(6):3862-3872.

82. Shen XD, Zhang RN, Huang SY, et al. Preoperative computed tomography enterography-based radiomics signature: a potential predictor of postoperative anastomotic recurrence in patients with Crohn’s disease. Eur J Radiol. 2023;162(1):110766.