Abstract

Endoscopy, histology, and cross-sectional imaging serve as fundamental pillars in the detection, monitoring, and prognostication of inflammatory bowel disease (IBD). However, interpretation of these studies often relies on subjective human judgment, which can lead to delays, intra- and interobserver variability, and potential diagnostic discrepancies. With the rising incidence of IBD globally coupled with the exponential digitization of these data, there is a growing demand for innovative approaches to streamline diagnosis and elevate clinical decision-making. In this context, artificial intelligence (AI) technologies emerge as a timely solution to address the evolving challenges in IBD. Early studies using deep learning and radiomics approaches for endoscopy, histology, and imaging in IBD have demonstrated promising results for using AI to detect, diagnose, characterize, phenotype, and prognosticate IBD. Nonetheless, the available literature has inherent limitations and knowledge gaps that need to be addressed before AI can transition into a mainstream clinical tool for IBD. To better understand the potential value of integrating AI in IBD, we review the available literature to summarize our current understanding and identify gaps in knowledge to inform future investigations.

Introduction

Inflammatory bowel diseases (IBD), which consist of Crohn’s disease (CD) and ulcerative colitis (UC), are chronic immune-mediated inflammatory diseases (IMID) of the gastrointestinal (GI) tract that affect over 2 million Americans and are associated with significant morbidity.1 Accurate and timely diagnosis, personalized treatment strategies, and tight monitoring are critical for mitigating disease progression and complications. To accomplish these objectives, gastroenterologists frequently rely on endoscopy, histology, and cross-sectional imaging, such as magnetic resonance enterography (MRE), computed tomography enterography (CTE), and intestinal ultrasound (IUS), to gain insight into anatomical and behavioral disease features that inform clinical decisions. However, interpretation of these studies often relies on subjective human judgment, which can lead to delays, interobserver variability, and diagnostic discrepancies. Moreover, the rising global incidence of IBD,2 coupled with the exponential digitization of these data, has intensified the demand for innovative approaches to streamline diagnosis and enhance clinical decision-making, a demand well suited to artificial intelligence (AI). Finally, novel imaging techniques in IBD, such as IUS, have gained interest in recent years, and AI-based operator-support systems could facilitate their use by less experienced operators.

In the last decade, AI technologies have disrupted the field of gastroenterology and have been extensively applied to medical imaging in several medical disciplines that face challenges similar to those in IBD. Artificial intelligence algorithms, particularly those rooted in deep learning and pattern recognition, can process and analyze large volumes of data with remarkable accuracy and efficiency. These AI applications can perform a variety of tasks, such as image classification, lesion segmentation, and object detection, and can even uncover novel biomarkers with biologic and prognostic significance. In gastroenterology, perhaps the most disruptive AI-based technology is the FDA-approved computer-aided detection system for colon polyps during colonoscopy.3 In Barrett’s esophagus, deep learning models have been developed to predict dysplasia grade on histologic images.4 Oncology has arguably witnessed the greatest advances in AI for medical imaging. Several AI-based imaging applications have been developed for the early detection of malignant lesions, predicting tumor biology to inform precision medicine strategies, informing prognosis, and predicting the future development of certain cancers. These applications could not only improve patient outcomes but also have significant downstream effects on disease prevention, healthcare expenditures, and potentially drug discovery.

While our understanding of the role of AI in endoscopy, histology, and imaging in IBD is in its infancy, the available literature contains promising findings that support the potential of AI to improve patient care and to advance our understanding of the heterogeneous nature of IBD. Additionally, providers will need a working knowledge of the strengths and limitations of AI technologies as these become increasingly integrated into routine clinical practice. Investigating the role of AI in endoscopy, histology, and imaging is therefore important for advancing the field of IBD. The purpose of this review is to provide a comprehensive summary of recent advances in the application of AI technologies in endoscopy, histology, and imaging for the diagnosis, phenotyping, and prognosis of IBD.

Artificial Intelligence in Medical Images

Artificial intelligence is an umbrella term that encompasses any machine, system, or software that can perform tasks that typically require human intelligence. AI comprises several subfields, of which natural language processing (NLP) and machine learning (ML) have received the most attention. NLP algorithms enable computers to understand, interpret, and generate human language. Machine learning involves developing algorithms and statistical models that enable computers to learn from large amounts of data and make predictions or decisions. The central motivation of ML is to apply the learned function to new, unseen data. For medical images from endoscopy, histology, and cross-sectional imaging, deep learning (DL), a subset of ML, and radiomics are the most commonly used methods and are the main focus of this review. The limitations specific to each AI subtype are also reviewed.

Deep Learning

Deep learning is loosely inspired by biological neural networks. An artificial neural network (Figure 1) consists of interconnected nodes (neurons) organized into layers: an input layer, one or more hidden layers, and an output layer. The connections between neurons in adjacent layers carry weights that represent the strength of each connection; during training, these weights are adjusted by optimization algorithms so that the network uncovers patterns that allow it to generate a conclusion or complete the intended task.5 There are several types of neural networks, including fully connected neural networks, recurrent neural networks, deep generative networks, and convolutional neural networks (CNNs).
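To make the idea of weighted connections that are adjusted during training more concrete, the following is a minimal sketch in Python with NumPy. It is an assumption for illustration only and is not a model from any study reviewed here. It trains a tiny fully connected network with one hidden layer by full-batch gradient descent on a toy four-point data set. The function names `train_tiny_network` and `predict`, and the hyperparameters, are hypothetical. Models used for medical images are far larger and are trained with dedicated frameworks.

```python
import numpy as np


def train_tiny_network(X, y, hidden=8, lr=0.5, epochs=2000, seed=0):
    """Hypothetical illustration: one-hidden-layer network trained by gradient descent.

    X: (n_samples, n_features) inputs; y: (n_samples, 1) binary labels.
    Returns the learned weights and the loss recorded at each epoch.
    """
    rng = np.random.default_rng(seed)
    # Weights = connection strengths (input -> hidden, hidden -> output)
    W1 = rng.normal(0.0, 1.0, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0, (hidden, 1))
    b2 = np.zeros(1)
    n = X.shape[0]
    eps = 1e-12
    losses = []
    for _ in range(epochs):
        # Forward pass: input layer -> hidden layer (tanh) -> output layer (sigmoid)
        h = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
        # Binary cross-entropy loss
        loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        losses.append(loss)
        # Backward pass: gradient of the loss with respect to each weight
        d_out = (p - y) / n                  # dLoss/dlogit for sigmoid + cross-entropy
        dW2 = h.T @ d_out
        db2 = d_out.sum(axis=0)
        d_h = (d_out @ W2.T) * (1 - h ** 2)  # derivative of tanh
        dW1 = X.T @ d_h
        db1 = d_h.sum(axis=0)
        # Gradient-descent update: nudge each weight to reduce the loss
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


def predict(params, X):
    """Forward pass only: returns predicted probabilities."""
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
```

For example, calling `train_tiny_network` on the four XOR points `[[0, 0], [0, 1], [1, 0], [1, 1]]` with labels `[0, 1, 1, 0]` should produce a loss that falls over the epochs as the weights are adjusted. The hidden layer is needed here because XOR cannot be separated by a single linear boundary.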

Figure 1. Example of a convolutional neural network algorithm.

Among DL algorithms, the CNN is the most commonly used for medical images. A CNN is an artificial neural network that takes images as input and can perform automated tasks such as image classification, object detection, segmentation, and image generation.5 The distinctive feature and core building block of a CNN is the convolutional layer. In this layer, a set of small filters (ie, kernels) is slid across the input image. Through a computation called convolution, these filters highlight specific image features, such as edges and textures. An activation function is then applied to the filter outputs, producing a feature map and introducing the nonlinearity that allows the CNN to learn complex relationships. The activation layer is often followed by a pooling layer, which downsamples the feature map, retaining the strongest responses while discarding the rest. Finally, in the last layers (fully connected and output layers), the learned image features are used to make classifications or predictions from the input image.
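The layer sequence described above (convolution, activation, pooling, fully connected output) can be sketched in a few lines of NumPy. This is a minimal sketch for illustration only: the function names are hypothetical, the "convolution" is implemented as cross-correlation (as most DL libraries do), and the edge-detecting kernel is chosen by hand, whereas a real CNN learns its kernels and fully connected weights during training.

```python
import numpy as np


def conv2d(image, kernel):
    """'Valid' 2D convolution as used in CNNs (cross-correlation, no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Slide the small filter over the image and take a weighted sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    """Activation: keep positive filter responses, set negative ones to zero."""
    return np.maximum(x, 0.0)


def max_pool2d(x, size=2):
    """Downsample by keeping the maximum of each non-overlapping size x size block."""
    h, w = x.shape[0] // size, x.shape[1] // size
    x = x[:h * size, :w * size]  # drop rows/columns that do not fill a block
    return x.reshape(h, size, w, size).max(axis=(1, 3))


def dense_softmax(features, W, b):
    """Fully connected layer followed by softmax, giving class probabilities."""
    logits = features.ravel() @ W + b
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()


# Hypothetical example: a 6 x 6 image with a vertical edge and a hand-set edge filter
image = np.zeros((6, 6))
image[:, 3:] = 1.0
edge_kernel = np.array([[-1.0, 1.0], [-1.0, 1.0]])
pooled = max_pool2d(relu(conv2d(image, edge_kernel)))
```

In this toy case, the feature map responds only where the dark-to-bright edge lies, and pooling reduces the 5 × 5 map to 2 × 2 while keeping that response. `dense_softmax` would then map the flattened pooled features to class probabilities using (normally learned) weights.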

In the literature and the media, CNNs have received a great deal of attention. A team of investigators from Stanford University developed a CNN algorithm that can classify skin cancer as accurately as dermatologists.6 The investigators then developed a mobile phone app with their algorithm to improve healthcare access. Other CNN algorithms for oncologic imaging have been developed to inform follow-up lung cancer screening recommendations, detect brain metastases for radiation planning, inform volumetric reconstruction of renal tumors for surgical planning, and monitor tumor response to therapy, to name a few.7 In gastroenterology, the FDA has approved a CNN-based computer-aided detection system for colon polyps on colonoscopy, which has been shown to improve polyp detection.8 These studies highlight how CNN-based algorithms can transform care, increase accessibility, potentially reduce costs, and consequently address healthcare disparities.

However, CNNs have several limitations that hinder their widespread clinical adoption. First, CNNs depend heavily on the quality and quantity of the training data, which risks overfitting and biased results. Second, CNNs are often considered “black-box” models, meaning that the logic by which the model completes its intended task is often difficult to explain or unknown. In high-stakes situations, this limited interpretability can create distrust among patients and providers and discourage the use of CNNs for clinical decision-making. Finally, there are important ethical and legal considerations regarding patient privacy, data security, sex/gender biases, and adequate human oversight when CNNs are used as clinical decision aids.9 Bias is a particularly important concern when applying AI to medical imaging in IBD because most available data are generated from populations of European ancestry, resulting in inherent biases in AI approaches in IBD.

Radiomics

Radiomics is a subfield of AI that uses advanced computational methods to analyze cross-sectional images and quantify a wide range of imaging features that are not easily appreciated by the human eye.10 Radiomics can quantify differences in image intensity, shape, texture, and spatial relationships and has been applied to MRI, CT, PET, and US (Figure 2). Unlike CNNs, radiomics does not imply automation of the diagnostic process; rather, it uses AI to generate additional quantitative data points. Radiomic features are often combined with other clinical or “-omic” data to develop comprehensive and more accurate prediction models. One example of a radiomic approach used in IBD is MRI textural analysis (MRTA).11 MRTA is a postprocessing procedure that commonly uses the filtration-histogram technique: filters extract features of different sizes within the region of interest (ROI), and a histogram of the resulting gray-level (pixel intensity) distribution is constructed, allowing quantification of parameters that reflect the underlying tissue biology. These parameters include the mean (average pixel value within the ROI), standard deviation (SD; degree of dispersion around the mean), skewness (asymmetry of the histogram distribution), mean of positive pixels (MPP; average value of the positive, ie, bright, pixels), kurtosis (peakedness of the histogram relative to a Gaussian distribution), and entropy (degree of irregularity within the ROI). The use of MRTA in IBD is discussed later in this review.
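The histogram parameters listed above can be computed from the pixels inside an ROI with a few lines of NumPy. This is a minimal sketch under stated assumptions, not the implementation used by any MRTA software. It assumes that the image has already been filtered (the "filtration" step is not implemented). It uses excess kurtosis, so a Gaussian scores 0, and it uses Shannon entropy over a fixed number of histogram bins. Commercial and open-source packages differ in these conventions, so values are not directly comparable across tools. The function name `histogram_features` and the `n_bins` parameter are hypothetical.

```python
import numpy as np


def histogram_features(image, roi_mask, n_bins=64):
    """First-order histogram features of the pixels inside a region of interest (ROI).

    Hypothetical illustration: assumes `image` has already been filtered
    (the filtration step of the filtration-histogram method is not shown).
    `roi_mask` is a boolean array of the same shape marking the ROI.
    """
    x = image[roi_mask].astype(float)
    mean = x.mean()
    sd = x.std()
    z = (x - mean) / sd if sd > 0 else np.zeros_like(x)
    skewness = np.mean(z ** 3)          # asymmetry of the distribution
    kurtosis = np.mean(z ** 4) - 3.0    # excess kurtosis: 0 for a Gaussian
    positive = x[x > 0]
    mpp = positive.mean() if positive.size else 0.0  # mean of positive pixels
    # Shannon entropy (bits) of the gray-level histogram; depends on n_bins
    counts, _ = np.histogram(x, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    entropy = -np.sum(p * np.log2(p))
    return {"mean": mean, "sd": sd, "skewness": skewness,
            "mpp": mpp, "kurtosis": kurtosis, "entropy": entropy}
```

Because entropy is computed from a binned histogram, the choice of bin count (and of any intensity rescaling beforehand) changes its value. This is one reason why feature definitions need to be reported and standardized across studies.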

Figure 2. Example of a radiomics workflow, from image acquisition to decision support tool development.

Although probably less well known to the public than DL-based imaging applications, radiomics shows great promise in medicine. In oncology, radiomic studies have identified signatures that can predict outcomes, risk of distant metastasis, and tumor biology.12 Additionally, radiomic studies have developed novel methods to predict tumor gene expression on a genome-wide scale without tumor biopsies, giving rise to the field of “radiogenomics.”13 For example, in hepatocellular carcinoma, Segal et al found that combinations of 28 radiomic features could account for 78% of the variation in the tumor transcriptome, and the associated genes shared common physiologic functions such as cell proliferation and liver enzyme synthesis.14 Studies have also found that radiomic features can predict the future development of cancer from prediagnostic imaging. Using prediagnostic abdominal CT scans, Touseef et al’s radiomic analyses predicted the future development of pancreatic ductal adenocarcinoma with 86% accuracy.15 Such AI applications could be invaluable for facilitating early intervention in aggressive cancers such as pancreatic cancer.

Although promising, radiomics also has limitations that need to be addressed before it can be integrated into clinical practice.16 First, accurate delineation of the organ(s) of interest is critical for computing radiomic features. Delineation is not only time-intensive but can also be affected by interobserver variability; semiautomatic or automatic image annotation methods are a possible solution. Second, differences in imaging equipment and protocols can affect radiomic feature quantification and the generalizability of findings across institutions. Harmonization and standardization of image acquisition and feature computation can mitigate this variability, but certain technical factors, such as differences in imaging systems and image intensity scales, can still affect radiomic feature computation. Finally, radiomic analysis requires large amounts of data to draw robust conclusions, yet few radiomic studies have been validated in independent data sets.
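As a concrete illustration of feature-level harmonization, the following minimal sketch standardizes each radiomic feature separately within each scanner or site, using z-scores. This is an assumed, deliberately simple technique chosen for illustration. It is not the method of any study in this review, and it is not ComBat, a widely used empirical Bayes harmonization method. The sketch corrects only site-specific shifts in mean and scale. If disease prevalence or severity differs between sites, it can also remove genuine biological differences. The function name `standardize_per_site` is hypothetical.

```python
import numpy as np


def standardize_per_site(features, site_labels):
    """Z-score each radiomic feature within each scanner/site (location/scale only).

    Hypothetical illustration, not ComBat.
    features: (n_patients, n_features) array; site_labels: length-n_patients array.
    """
    features = np.asarray(features, dtype=float)
    site_labels = np.asarray(site_labels)
    out = np.empty_like(features)
    for site in np.unique(site_labels):
        rows = site_labels == site
        mu = features[rows].mean(axis=0)
        sd = features[rows].std(axis=0)
        sd[sd == 0] = 1.0  # constant features: avoid division by zero
        out[rows] = (features[rows] - mu) / sd
    return out
```

For example, if a second scanner systematically reports a feature at twice the scale plus a fixed offset, the per-site z-scores from the 2 scanners coincide after this transformation. A nonlinear scanner effect would not be removed this way.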

Methods

Medline was searched from 1946 to July 2, 2023, to identify studies that described AI approaches applied to endoscopy, histology, or imaging in IBD. Inclusion criteria required that studies were published in a peer-reviewed journal and included unselected adult or pediatric subjects with a possible diagnosis of UC, CD, or IBD-unclassified, with endoscopic, histologic, or cross-sectional images that were evaluated by or incorporated into an AI algorithm (deep learning, convolutional neural network, automated segmentation, machine learning algorithm, radiomics). Scientific conference abstracts were excluded. There were no restrictions on the number of subjects, type of AI approach, or study aims (diagnosis vs phenotyping vs prognosis). Search terms are described in Supplemental Content 1 online. All articles were screened for relevance to the study question, and potentially relevant articles were reviewed in more detail; P.G. assessed article eligibility.

The following terms were used to identify potentially eligible IBD articles: inflammatory bowel diseases, ulcerative colitis, and Crohn disease. These terms were combined with the Boolean operator “OR.” The following terms were used to identify potentially eligible endoscopy articles: endoscopy, colonoscopy, and endoscopic scoring. The following terms were used to identify potentially eligible histopathology articles: histology, histopathology, and immunohistochemistry. The following terms were used to identify potentially eligible imaging articles: diagnostic imaging, magnetic resonance imaging, X-ray computed tomography, ultrasound, and radiology. The endoscopy, histopathology, and imaging terms were combined with the Boolean operator “OR.” The following terms were used to identify potentially eligible articles about artificial intelligence: artificial intelligence, machine learning, radiomic, convolutional neural network, and automat*. These terms were combined with the Boolean operator “OR.” The 3 searches were combined using the Boolean operator “AND” and limited to humans.
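The structure of this strategy is an OR within each concept group, followed by an AND across the 3 groups. The following minimal sketch shows how such a query string can be assembled programmatically. It is an illustration only: it uses generic Boolean syntax, not the exact syntax of any MEDLINE interface (which may require field tags, subject headings, and numbered search lines), and it does not apply the humans limit. The full strategy is given in Supplemental Content 1. The search terms are transcribed from the text above; the variable and function names are hypothetical.

```python
# Hypothetical illustration: generic Boolean syntax, not a specific database's syntax.
# Terms transcribed from the search strategy described in the text.
IBD_TERMS = ["inflammatory bowel diseases", "ulcerative colitis", "Crohn disease"]
MODALITY_TERMS = [
    # endoscopy
    "endoscopy", "colonoscopy", "endoscopic scoring",
    # histopathology
    "histology", "histopathology", "immunohistochemistry",
    # imaging
    "diagnostic imaging", "magnetic resonance imaging",
    "X-ray computed tomography", "ultrasound", "radiology",
]
AI_TERMS = ["artificial intelligence", "machine learning", "radiomic",
            "convolutional neural network", "automat*"]  # * = truncation wildcard


def or_block(terms):
    """Join synonyms with OR, quoting multiword phrases, and wrap in parentheses."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*concept_groups):
    """Intersect concept groups with AND; each group is an OR block of synonyms."""
    return " AND ".join(or_block(group) for group in concept_groups)


query = build_query(IBD_TERMS, MODALITY_TERMS, AI_TERMS)
```

Keeping each concept as a separate list makes it easy to audit which synonyms feed each group and to rerun or extend the search reproducibly.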

Artificial Intelligence in Endoscopy, Histology, and Cross-sectional Imaging in IBD

Overview

The search strategy yielded 52 unique studies (Figure 3), most of which involved cross-sectional imaging (n = 31), followed by endoscopy (n = 12) and histology (n = 7). Imaging studies primarily evaluated CD, whereas endoscopy studies primarily evaluated UC. The studies are reviewed according to the intended task of the AI algorithm: detection/diagnosis (n = 16), characterization/phenotyping of IBD lesions (n = 25), and prognosis (n = 10).

Figure 3. Flow diagram of the search strategy.

Diagnosis/Detection

During the initial diagnostic workup of individuals suspected to have IBD, the index endoscopy, histology, and imaging are arguably more important than later examinations in the disease course. These index data help establish the diagnosis (CD vs UC), anatomical distribution, and disease severity, all of which direct therapeutic and monitoring strategies. In this context, AI has the potential to streamline clinical workflow through automated algorithms, improve detection accuracy, and reduce intra- and interobserver variability. Table 1 summarizes the relevant studies.

Table 1.

Sixteen studies evaluating the role of AI in endoscopy, histology, and cross-sectional imaging for the detection and diagnosis of IBD.

Author
Year (ref)
Dataset
Data source
Algorithm type
Task
Performance
Mahapatra
2013 (ref 17)
CD
N = 26
MRE
Radiomics
  • Semiautomatic detection of bowel segments affected by CD using a novel method that combines shape asymmetry with intensity and texture features.

  • Performance was compared with the dual-tree complex wavelet transform (DTCWT) and a shape-asymmetry-based method (Asy).

  • The investigators’ method had higher sensitivity (90.4%), specificity (90.1%), and accuracy (88.9%) for detecting bowel segments with CD than the DTCWT and Asy methods.

Hahnemann
2015 (ref 18)
IBD
N = 50
MRE
DL, CNN
  • Automatically generated bowel motility maps combined with static MRI images increased detection of inflammatory intestinal lesions compared with static MRI images alone.

  • Additional inflammatory lesions were found in 13 (26%) of 50 patients with automated motility maps plus static MRI images vs static MRI images alone (P = .0002).

Mossotto
2017 (ref 19)
Pediatric IBD
N = 287
Endoscopic and histologic images
Supervised and unsupervised ML methods
  • Develop an ML algorithm to classify disease using endoscopic and histologic data from pediatric IBD patients.

  • ML models using endoscopy alone, histology alone, and endoscopy + histology combined to differentiate between UC, CD, and IBD unclassified yielded AUC of 0.71, 0.77, and 0.83, respectively.

Naziroglu
2017 (ref 20)
CD
N = 53
MRE
DL, active contouring model
  • Compare manual vs semiautomatic (active contouring segmentation algorithm) delineation of diseased bowel and measurement of bowel wall thickness in CD.

  • Reproducibility of delineating diseased bowel was evaluated as the overlap between 2 independent segmentations for each approach (overlap coefficient).

  • Reproducibility of measuring bowel wall thickness was evaluated by interobserver agreement.

  • Semiautomatic delineation of diseased regions of active CD was more reproducible than manual delineation (median overlap 0.89 vs 0.72, P = .000014).

  • Semiautomatic measurement of bowel wall thickness had higher interobserver agreement than manual measurement (ICC 0.88 vs 0.45, P = .005).

Gollifer
2019 (ref 21)
CD
N = 105
MRE
DL, CNN
  • Compare automated (software) vs subjective quantification of intestinal motility and identify the combination of motility metrics most associated with symptom severity (Harvey-Bradshaw Index [HBI]).

  • On multivariable model, software quantified temporal motility variation (β = -0.23, P = .005) and area of motile bowel (β = 0.16, P = .01) were associated with HBI.

  • Subjective quantification of motility metrics was not associated with HBI.

Klang
2020 (ref 22)
CD
N = 49
VCE images
DL, CNN
  • Develop and evaluate a DL algorithm for detecting small bowel ulcers in CD.

  • The algorithm yielded AUC 0.99 for detecting ulcers.

Klang
2020 (ref 23)
CD
N = 27 892 images
VCE images
DL, CNN
  • Evaluate the accuracy of DL for detecting strictures from VCE images in CD.

  • The model differentiated strictures from normal mucosa and small-bowel ulcers with AUC 0.99 and 0.94, respectively.

Li
2021 (ref 24)
IBD
N = 165 lesions (UC = 66, CD = 99)
Multislice CT
Radiomics
  • Develop a radiomic nomogram to distinguish UC from CD.

  • A multivariable regression model including only radiomic features yielded AUC 0.81, accuracy 70%, sensitivity 80%, and specificity 54%.

  • Combining radiomic features with 3 significant clinical features (presence of inflammatory mesenteric fat, lesion location, and CT-value of arterial-phase enhancement of bowel wall) improved the performance to AUC 0.88.

Zhu
2021 (ref 25)
CD (n = 93) and intestinal TB (n = 67)
CTE
Radiomics
  • Develop and validate a clinical radiomics nomogram to differentiate CD from intestinal TB using clinical and radiomic features.

  • The clinical radiomics nomogram containing 9 radiomic and 2 clinical features had good performance (AUC 0.96) and was superior to the clinical-only and radiomics-only models.

Arkko
2021 (ref 26)
CD and non-CD
(n = 369; 50% CD)
MRE
DL, CNN
  • Determine feasibility of detecting CD using automated quantification of intestinal motility on MRE.

  • After testing with 4 different ROI approaches and 3 motility indices in 3 independent data sets, using full image ROI and motility index 1 (average of all generated motility maps) had the best performance with AUC 0.78.

Klang
2021 (ref 27)
CD
N = 19 245 images
VCE
DL, CNN
  • Determine whether a CNN model trained to detect small bowel CD ulcers can differentiate CD ulcers from NSAID-induced ulcers.

  • The CNN model trained on CD ulcers detected NSAID-induced ulcers with AUC 0.97, similar to its performance for detecting CD ulcers.

  • Thus, the CNN model was unable to differentiate NSAID-induced ulcers from CD ulcers.

Jiang
2022 (ref 28)
IBD
N = 120
CTE
DL, GIF (gradient image filter) algorithm
  • Compare the accuracy of diagnosing IBD with traditional CTE vs low-dose CTE with an optimized GIF algorithm.

  • The diagnostic sensitivity (91.5%), specificity (92.3%), accuracy (91.7%), positive predictive value (97.7%), and negative predictive value (75.0%) of the GIF algorithm group were higher than those of the traditional CTE control group (69.1%, 44.4%, 61.7%, 74.4%, and 38.1%, respectively; P < .05).

  • AI-assisted image enhancement improved diagnostic accuracy.

Wang
2022 (ref 29)
IBD
N = 496 (217 CD)
Endoscopic images
DL, CNN
  • Develop a CNN model to differentiate CD vs UC vs healthy controls.

  • The CNN model yielded higher differential diagnosis accuracy than humans for CD (92.4% vs 91.7%), UC (93.4% vs 92.4%), and normal (98.4% vs 97.3%).

Brodersen
2023 (ref 30)
IBD
N = 132
VCE
DL
  • Determine agreement between AI-aided vs standard evaluation of pan-enteric capsule endoscopy for detecting CD.

  • AI-aided evaluation reduced the number of output images to 470, and median review time was 3.2 minutes/patient.

  • Observers reviewing AI-selected images had 92-96% sensitivity and 90-93% specificity for diagnosing CD.

  • The negative predictive value for CD was 95%.

Carter
2023 (ref 31)
IBD
N = 308
Intestinal US
DL, CNN
  • Develop and validate an automated DL module to distinguish between increased and normal bowel wall thickness.

  • The module had 90.1% accuracy, 86.4% sensitivity, and 94% specificity for detecting bowel wall thickening, with an AUC of 0.98.

Gong
2023 (ref 32)
CD and iTB
N = 108
CTE
Radiomics
  • Develop and test a clinical multiregional radiomic model to differentiate CD from iTB.

  • Radiomic features were extracted from the bowel wall, the largest lymph node, and the area surrounding the ileocecal region.

  • A multimodal nomogram including 2 radiomic features, the involved bowel segment on CTE, and longitudinal ulcers on endoscopy yielded AUC 0.96, which was better than the clinical-only model (AUC 0.88, P = .004).

  • Combined nomogram had greater accuracy (89.5%) than 2 radiologists (66.7-75.2%).

  • Decision curve analysis showed combined model had highest net benefit.

Zhou
2023 (ref 33)
IBD
N = 316
CTE
DL, CNN, radiomics
  • Investigate whether volumetric visceral adipose tissue features derived with radiomics and a 3D CNN can differentiate CD from UC.

  • The radiomics model had a numerically higher AUC than the CNN model (AUC 0.71 vs 0.69), but the difference was not significant (P = .750).

  • A nomogram incorporating the radiomics model and clinical factors differentiated UC from CD with AUC 0.78, which was better than radiomics model only (AUC 0.72) or clinical variables only (AUC 0.74).


MRTA features: mean (average value of pixels within the ROI), standard deviation, mean of positive pixels, entropy, kurtosis (inversely related to the number of objects highlighted and increased by intensity variations in highlighted objects), and skewness (reflects the brightness of highlighted objects).

Abbreviations: IBD, inflammatory bowel disease; CD, Crohn’s disease; UC, ulcerative colitis; iTB, intestinal tuberculosis; Rc, regression coefficient; AUC, area under the curve; ROI, region of interest; MaRIA, magnetic resonance index of activity; CDEIS, Crohn’s Disease Endoscopic Index of Severity; SES-CD, Simple Endoscopic Score for Crohn’s disease; HR, hazard ratio.

Table 1.

Sixteen studies evaluating the role of AI in endoscopy, histology, and cross-sectional imaging for the detection and diagnosis of IBD.

Regarding endoscopy in the diagnosis of IBD, 2 of the biggest challenges are differentiating between UC and CD, especially when there is no ileal involvement, and the substantial time required to review video capsule endoscopy (VCE) studies for small bowel CD. Additionally, VCE interpretation suffers from substantial heterogeneity and suboptimal inter- and intraobserver agreement.34 To address this, Klang et al developed DL algorithms for detecting small bowel ulcers and strictures in CD that yielded areas under the receiver operating characteristic curve (AUC) of 0.99 and 0.99, respectively.22,23 However, in a later study, the same investigators found that the CNN model for detecting CD ulcers could not differentiate CD ulcers from NSAID-induced ulcers.27 Nonetheless, AI-aided VCE interpretation has been shown to substantially reduce review time, with one study observing a median review time of 3.2 minutes per patient.30 While not yet ready for prime time, AI-assisted VCE evaluation is an exciting and much-needed clinical tool in IBD. To date, only 1 study has developed an endoscopic image-based AI model for differentiating between CD and UC.29 The model developed by Wang et al yielded higher differential diagnosis accuracy than human observers for CD (92.4% vs 91.7%), UC (93.4% vs 92.4%), and normal findings (98.4% vs 97.3%).
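Because most of the studies above report discrimination as an AUC, a short sketch may help make the metric concrete. The snippet below computes AUC with the rank (Mann-Whitney) formulation: the probability that a randomly chosen positive image receives a higher model score than a randomly chosen negative image, with ties counted as one half. The function name and the example scores are hypothetical and are not taken from any cited study.

```python
from itertools import product


def auc_mann_whitney(scores_pos, scores_neg):
    """AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative score count as 0.5.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p, n in product(scores_pos, scores_neg):  # every positive/negative pair
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical model outputs (probability of "ulcer"), illustration only
ulcer_scores = [0.95, 0.88, 0.80, 0.60]
normal_scores = [0.10, 0.30, 0.65, 0.20]
print(auc_mann_whitney(ulcer_scores, normal_scores))  # 15 of 16 pairs ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking. Because AUC summarizes ranking across all thresholds, it does not by itself give the sensitivity and specificity a deployed tool would operate at.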

Regarding the role of AI in the histologic diagnosis of IBD, only 1 study was identified. Mossotto et al explored several machine learning (ML) models using endoscopic and histologic data to classify disease type (UC, CD, or IBD unclassified) in pediatric patients with IBD.19 An ML model using combined endoscopic and histologic data outperformed ML models using endoscopic or histologic data alone (AUC 0.83 vs 0.71 and 0.77, respectively). This study also highlights the importance of using endoscopic and histologic data together to classify IBD subtypes.

In terms of cross-sectional imaging, 7 studies developed DL-based algorithms to automate the detection and diagnosis of IBD, both by enhancing imaging interpretation and by quantifying small bowel motility as a novel biomarker. Using MRE, Naziroglu et al developed an active contouring algorithm to perform volumetric segmentation of the inner and outer layers of the bowel wall and semiautomatically measure bowel wall thickness (BWT).20 The algorithm-generated measurements yielded better interobserver agreement than human-generated measurements of BWT (intraclass correlation coefficient [ICC] 0.88 vs 0.45, P = .005). This study highlights the ability of AI to measure inflammatory changes in IBD more consistently than human observers. Intestinal ultrasound is increasingly used to detect and monitor IBD globally, and Carter et al developed an AI algorithm to automatically detect IBD on IUS.31 Increased BWT on IUS is one of the main features reflecting active inflammation and, while BWT measurement has a high ICC among expert operators, novice operators often struggle to detect inflammatory lesions accurately and consistently. To address this challenge, the investigators developed a CNN algorithm to automatically detect bowel wall thickening using over 1000 labeled images. The final CNN algorithm detected thickened bowel wall with an AUC of 0.98, 90.1% accuracy, 86.4% sensitivity, and 94.0% specificity. This study highlights how AI can be used to train inexperienced operators and standardize imaging interpretation. Moreover, this is the only available study of AI applications with IUS in IBD. Because it is a radiation-sparing, point-of-care exam, IUS is a promising medium for exploring other AI approaches in IBD.
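The ICC values quoted above (0.88 vs 0.45) summarize how closely independent readers agree on a continuous measurement such as BWT. As an illustration only, the sketch below implements one common form, ICC(2,1) from Shrout and Fleiss (two-way random effects, absolute agreement, single rater). Which ICC form the cited study used is not stated here, and the reader measurements are hypothetical.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: array of shape (n_subjects, k_raters).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()  # between-subject
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()  # between-rater (bias)
    ss_total = ((x - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols                 # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical BWT (mm) for 5 bowel segments; reader B reads 1 mm thicker
reader_a = [3, 5, 7, 4, 6]
reader_b = [4, 6, 8, 5, 7]
print(icc_2_1(np.column_stack([reader_a, reader_b])))
```

Because this form measures absolute agreement, a systematic 1-mm offset between readers lowers the ICC below 1 even though the readers rank the segments identically.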

In addition to detecting inflamed segments of bowel, DL algorithms can be used to improve image processing and allow safer imaging protocols with less radiation exposure. Applying a gradient image filter (GIF) algorithm to low-dose CTE, Jiang et al found that the diagnostic sensitivity (91.5%), specificity (92.3%), accuracy (91.7%), positive predictive value (97.7%), and negative predictive value (75.0%) of the GIF algorithm group were higher than those of the traditional CTE protocol control group for differentiating CD from UC (69.1%, 44.4%, 61.7%, 74.4%, and 38.1%, respectively; P < .05).28
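Sensitivity and specificity are properties of the test, whereas PPV and NPV also depend on how common disease is in the study sample. A minimal sketch, assuming a single 2×2 table of counts (the counts below are hypothetical), computes all five metrics and then shows the prevalence effect with Bayes' rule. The prevalence of 0.78 is an assumption chosen for illustration, not a value reported by the study: with the reported sensitivity (91.5%) and specificity (92.3%), it yields a PPV near 97.7% and an NPV near 75%. The gap between the two predictive values is therefore consistent with a disease-enriched cohort.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # diseased, test positive
    fp: int  # not diseased, test positive
    tn: int  # not diseased, test negative
    fn: int  # diseased, test negative


def diagnostic_metrics(c: Counts) -> dict:
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / total,
    }


def predictive_values(sens: float, spec: float, prevalence: float) -> tuple:
    """PPV and NPV implied by Bayes' rule at a given disease prevalence."""
    ppv = sens * prevalence / (sens * prevalence + (1 - spec) * (1 - prevalence))
    npv = spec * (1 - prevalence) / (spec * (1 - prevalence) + (1 - sens) * prevalence)
    return ppv, npv


# Hypothetical counts for illustration
print(diagnostic_metrics(Counts(tp=45, fp=10, tn=40, fn=5)))

# Reported sensitivity/specificity with an ASSUMED prevalence of 0.78
ppv, npv = predictive_values(0.915, 0.923, 0.78)
print(round(ppv, 3), round(npv, 3))
```

The last line prints roughly 0.977 and 0.754. Lowering the assumed prevalence toward 0.5 pulls the two predictive values much closer together.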

Another opportunity for AI to detect IBD is quantifying intestinal motility, which is often impaired secondary to active inflammation and/or fibrosis. Leveraging cine images captured during MRE, 3 studies developed DL algorithms to quantify intestinal motility for detecting IBD. Hahnemann et al found that combining automatically generated intestinal motility maps with static MRI images yielded a higher detection rate of inflammatory lesions (66 lesions in 38 subjects) than static MRI alone (51 lesions in 34 subjects, P = .0002).18 In a larger study with 302 subjects, investigators used a CNN to develop an automated intestinal motility quantification algorithm that differentiated CD from non-CD with an AUC of 0.78.26 DL algorithms that detect and quantify motility can also capture changes with greater granularity than the human eye. Gollifer et al demonstrated that software-automated quantification of intestinal motility parameters, particularly temporal motility variation (β = −0.23, P = .005) and area of motile bowel (β = 0.16, P = .01), was significantly associated with symptom severity as defined by the Harvey-Bradshaw Index (HBI).21 Conversely, subjective human quantification of the same motility parameters was not associated with HBI. These studies support intestinal motility as a promising objective biomarker for detecting IBD and highlight the ability of AI to aid in biomarker discovery. However, the clinical value of intestinal motility as a biomarker in IBD needs further evaluation, with future studies correlating motility metrics with endoscopic disease severity and defining its role in disease monitoring.
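The motility studies above turn cine MRE frames into per-pixel motility maps and then summarize those maps as a single index. One study summarized in Table 1 describes its best index as the average of all generated motility maps over a full-image ROI. The published pipelines derive the maps from image registration. The sketch below swaps in a much simpler proxy, the per-pixel standard deviation of intensity over time, purely to show the map-then-summarize structure. Both function names are hypothetical.

```python
import numpy as np


def temporal_sd_map(cine: np.ndarray) -> np.ndarray:
    """Proxy motility map: per-pixel SD of intensity across frames.

    cine: array of shape (n_frames, height, width).
    NOTE: simplified stand-in, not the registration-based maps used in the studies.
    """
    return np.asarray(cine, dtype=float).std(axis=0)


def motility_index(maps, roi_mask=None) -> float:
    """Average of all motility maps inside an ROI (full image if no mask)."""
    stack = np.asarray(maps, dtype=float)  # shape (n_maps, height, width)
    if roi_mask is None:
        roi_mask = np.ones(stack.shape[1:], dtype=bool)  # full-image ROI
    return float(stack[:, roi_mask].mean())


# Toy cine loop: one "contracting" pixel alternates between 1 and 0
cine = np.zeros((4, 4, 4))
cine[::2, 1, 1] = 1.0
motion_map = temporal_sd_map(cine)
roi = np.zeros((4, 4), dtype=bool)
roi[1, 1] = True
print(motility_index([motion_map]), motility_index([motion_map], roi))
```

The toy example shows why ROI choice matters: the same map gives a small index over the full image and a larger one over a region confined to the moving bowel.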

Regarding radiomics, 6 studies developed unique multivariable models and nomograms to better detect and diagnose IBD. In one of the earliest IBD radiomics studies, a novel method to quantify shape asymmetry detected CD-affected segments of bowel with high sensitivity (90.4%) and specificity (90.1%).17 Radiomics may also help differentiate CD from intestinal tuberculosis (iTB), a frequent diagnostic challenge in endemic countries. Two studies developed and validated multimodal radiomic nomograms incorporating clinical and/or endoscopic data to differentiate CD from iTB. By extracting radiomic features from intestinal lesions, Zhu et al developed a radiomics model with an AUC of 0.78 for differentiating CD from iTB.25 When combined with a clinical model that included demographic, biochemical, and predefined radiographic features (ie, the comb sign), the prediction model improved to an AUC of 0.90. The final nomogram contained 9 radiomic and 2 clinical features and yielded good performance (AUC 0.96). Likewise, Gong et al developed a multimodal clinical radiomic model using radiomic features extracted from the diseased segment of bowel as well as the largest lymph node and the mesentery surrounding the affected segment. In addition to clinical variables, the investigators incorporated endoscopic data into the final nomogram and found that it differentiated CD from iTB more accurately than interpretation by radiologists (89.5% vs 66.7%-75.2%).32 Finally, radiomics may also help differentiate CD from UC. A multimodal model that incorporated radiomic features of the inflamed bowel wall, clinical features (age and gender), and radiologic features (bowel wall thickness, arterial-phase enhancement, increased attenuation of mesenteric fat, vasa recta engorgement, lymphadenopathy, and lesion location) differentiated CD from UC with an AUC of 0.88.24 Leveraging differences in inflammatory alterations of the mesenteric fat in CD vs UC, another model combining radiomic features of visceral adipose tissue (VAT) with clinical factors differentiated CD from UC with good diagnostic performance (AUC 0.78).33 Because inflammatory alterations of mesenteric fat are difficult to study noninvasively, imaging studies often use VAT as a surrogate marker, since mesenteric fat is the largest compartment of VAT.35 While the performance of these radiomic-based models ranges from moderate to good, the available studies suggest that radiomic features can provide valuable information not easily appreciated by the human eye. However, the studies consistently showed that radiomic features alone were not sufficient for prediction models and needed to be combined with clinical variables to improve model performance.
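Many of the radiomic and MRTA features in Tables 1 and 2 are first-order histogram statistics of the pixels inside a region of interest. The table footnote lists mean, standard deviation, mean of positive pixels, entropy, kurtosis, and skewness. The sketch below computes those six statistics with NumPy and SciPy. Filtration-histogram texture tools commonly apply a Laplacian-of-Gaussian filter before taking statistics, which is why a "mean of positive pixels" is meaningful. The filter scale (`sigma=2.0`) and the 64-bin histogram used for entropy are assumptions. Real radiomics pipelines also standardize many more preprocessing choices (resampling, intensity normalization, discretization).

```python
import numpy as np
from scipy import ndimage, stats


def first_order_features(image, roi_mask, sigma=2.0, bins=64):
    """First-order statistics of LoG-filtered pixel values inside an ROI.

    sigma and bins are illustrative assumptions, not values from the cited studies.
    """
    filtered = ndimage.gaussian_laplace(np.asarray(image, dtype=float), sigma=sigma)
    values = filtered[np.asarray(roi_mask, dtype=bool)]
    positive = values[values > 0]
    counts, _ = np.histogram(values, bins=bins)
    p = counts[counts > 0] / counts.sum()  # empirical bin probabilities
    return {
        "mean": float(values.mean()),
        "sd": float(values.std(ddof=1)),
        "mean_positive": float(positive.mean()) if positive.size else 0.0,
        "entropy_bits": float(-(p * np.log2(p)).sum()),
        "skewness": float(stats.skew(values)),
        "excess_kurtosis": float(stats.kurtosis(values)),  # Fisher definition
    }


# Synthetic image and square ROI, for illustration only
rng = np.random.default_rng(0)
image = rng.normal(100.0, 10.0, size=(64, 64))
roi = np.zeros((64, 64), dtype=bool)
roi[16:48, 16:48] = True
features = first_order_features(image, roi)
print(features)
```

In a nomogram, statistics like these typically become candidate predictors that are then reduced (for example by penalized regression) and combined with clinical variables, matching the pattern the studies above report.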

Disease Characterization/Phenotyping

In the treat-to-target era of IBD, endoscopy, histology, and imaging are critical for tight monitoring to prevent disease progression and complications, and studies suggest that AI-based applications have substantial potential in this arena (Table 2). During endoscopy, disease scores such as the Mayo endoscopic score (MES), the UC Endoscopic Index of Severity, and the Simple Endoscopic Score for CD are important for monitoring improvement or progression of disease and for standardizing communication with other providers. However, studies have found significant intra- and interobserver variability in endoscopy scores.61 Endoscopy scores may also not fully capture disease severity. For example, in UC the MES reflects only the colonic segment with the most severe disease and does not account for variability in disease severity across other segments. This limitation has important implications when assessing therapeutic response. To address this, several studies have developed DL algorithms that automate endoscopy scoring, which can improve standardization, but these studies primarily focus on MES in UC.41,42,45,47,48 One of the most innovative studies was by Stidham et al, who developed a new Cumulative Disease Score (CDS) for UC using computer vision analyses of endoscopic videos from the UNIFI and JAK-UC clinical trials.60 The CDS correlated strongly with MES (P < .0001) and was more sensitive for detecting endoscopic changes than MES (Hedges’ g = 0.743 vs 0.460). Automated scoring systems such as CDS will not only improve workflow efficiency during endoscopy but also better evaluate therapeutic response in UC.
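Two statistics recur in this paragraph and in Table 2. Quadratic weighted κ is used to compare automated and human MES or UCEIS grades, and Hedges' g is used to compare how strongly the CDS and MES respond to change. The sketch below implements the textbook definitions of both. It is not the analysis code of any cited study, and the example grades and score changes are hypothetical.

```python
import numpy as np


def quadratic_weighted_kappa(a, b, n_categories):
    """Agreement between two raters on ordinal grades 0..n_categories-1."""
    observed = np.zeros((n_categories, n_categories))
    for i, j in zip(a, b):
        observed[i, j] += 1
    # Expected counts under independence, from the two raters' marginals
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    idx = np.arange(n_categories)
    # Disagreements are penalized by squared distance between grades
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_categories - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


def hedges_g(x, y):
    """Standardized mean difference with small-sample bias correction."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    nx, ny = len(x), len(y)
    pooled_sd = np.sqrt(
        ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    )
    correction = 1.0 - 3.0 / (4.0 * (nx + ny) - 9.0)
    return correction * (x.mean() - y.mean()) / pooled_sd


# Hypothetical MES grades (0-3) from a model and a central reader
model_mes = [0, 1, 2, 3, 2, 1, 1, 0]
reader_mes = [0, 1, 2, 3, 3, 1, 2, 0]
print(quadratic_weighted_kappa(model_mes, reader_mes, n_categories=4))
print(hedges_g([2.0, 3.0, 4.0], [1.0, 2.0, 3.0]))
```

Quadratic weighting treats a one-grade disagreement (MES 2 vs 3) as far less serious than a three-grade one (MES 0 vs 3). This is one reason it is commonly used for ordinal endoscopic scores.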

Table 2.

Twenty-five studies evaluating the role of AI in endoscopy, histology, and cross-sectional imaging for the characterization of IBD and phenotyping CD strictures.

Author
Year
Dataset
Data source
Algorithm type
Task
Performance
Bhatnagar
2016 (ref. 36)
CD
N = 7
MRE
Radiomics
  • To investigate whether MRI texture analysis (MRTA) of CD-affected small bowel differs based on the presence of histologic markers of hypoxia and angiogenesis in CD (VEGF).

  • Bowel segments with VEGF present had significantly lower mean pixel intensity (P = .004) and mean of positive pixels within the region of interest (P = .007) than segments without VEGF.

Makanyanga
2017 (ref. 37)
CD
N = 16
MRE
Radiomics
  • Associate MRI textural analysis with MRI and histological CD activity

  • MRTA features were associated with CD activity:

  • Entropy was correlated with MRI activity (Rc 1.00, P = .01), while kurtosis was negatively associated with MRI activity (Rc -0.45, P = .002).

  • Skewness was associated with histologic activity (Rc 4.27, P = .02)

Lamash
2018 (ref. 38)
Pediatric CD
N = 23
MRE
CNN
  • To develop and test an algorithm to semiautomatically segment bowel wall and lumen to facilitate development of future algorithms to measure luminal diameter and bowel wall thickness in CD.

  • The algorithm had good performance for segmenting the bowel wall (Dice coefficient 75%) and lumen (Dice coefficient 81%)

Puylaert
2018 (ref. 39)
CD
N = 106
MRE
DL, active contouring model
  • Develop/validate a predictive MRI score (VIGOR score) using qualitative (mural T2 signal) and semiautomatically extracted MRI features (bowel wall thickness, excess volume, and dynamic contrast enhancement).

  • Compare diagnostic accuracy with existing MRI scores (MaRIA, London score, and CDMI), using CDEIS as the reference.

  • VIGOR score achieved correlation with CDEIS comparable to existing scores (r = 0.58 vs 0.59).

  • VIGOR score had improved interobserver agreement vs other scores (ICC 0.81 vs 0.44-0.59).

  • VIGOR score achieved 80-81% diagnostic accuracy, similar to other scores.

Maeda
2019 (ref. 40)
UC
N = 187
Endocytoscopy images
DL, CNN
  • Develop a computer-aided diagnostic (CAD) system for predicting histologic inflammation using endocytoscopy.

  • The CAD system had a diagnostic sensitivity, specificity, and accuracy of 74%, 97%, and 91%, respectively.

Ozawa
2019 (ref. 41)
UC
N = 955
Endoscopic images
DL, CNN
  • Develop a computer-assisted diagnostic (CAD) system using CNN to identify normal mucosa (Mayo 0) and mucosal healing state (Mayo 0-1).

  • The CNN-based CAD system yielded AUCs of 0.86 and 0.98 for identifying Mayo 0 and Mayo 0-1, respectively.

  • Performance was better in the rectum (AUC 0.92) than in the right and left colon (AUC 0.83 and 0.83, respectively).

Stidham
2019 (ref. 42)
UC
N = 2778
Endoscopic images and videos
DL, CNN
  • Develop deep learning models for grading UC endoscopic severity and compare its performance to experienced human reviewers.

  • For distinguishing between moderate-severe disease and endoscopic remission, CNN model yielded AUC 0.97.

  • The CNN model had good agreement with human reviewers (κ = 0.84), which was similar to agreement between human reviewers (κ = 0.86).

Tabari
2019 (ref. 43)
CD
N = 25
MRE
Radiomics
  • Determine whether MRTA can identify CD stricture histologic type (degree of mucosal inflammation vs mural fibrosis) using surgical resection specimens.

  • Multivariable prediction model including mean, skewness, and entropy predicted stricture fibrosis with goodness-of-fit value 0.995.

  • Combination of threshold values for 3 features correctly classified 100% of strictures as inflammatory or fibrotic.

Stidham
2020 (ref. 44)
CD
N = 138
CTE
DL, active contouring model
  • Compare the agreement of small bowel damage measurements (max bowel wall thickness, max bowel dilation, min lumen diameter, and presence of stricture) on CTE between semiautomated image analysis techniques vs radiologists

  • Semiautomated measurements correlated with radiologists' measurements for max bowel wall thickness (r = 0.70, P < .0001), max bowel dilation (r = 0.75, P < .0001), and min lumen diameter (r = 0.38, P < .0001).

  • Multivariate model using semiautomatic measurements to detect radiologist-defined intestinal strictures had an accuracy of 88% with AUC 0.86.

Takenaka
2020 (ref. 45)
UC
N = 2012
Endoscopic images
DL, CNN
  • Develop a deep neural network to analyze endoscopic images of UC patients and predict histologic disease activity.

  • The algorithm identified endoscopic remission with 90.1% accuracy and good agreement with human reviewers (κ = 0.80).

  • The algorithm identified subjects in histologic remission with 92.9% accuracy.

Barash
2021 (ref. 46)
CD
N = 49
VCE images
DL, CNN
  • Develop a DL algorithm for automated grading for CD ulcers on VCE (Grades 1-3 mild to severe)

  • The algorithm had a classification accuracy of 0.91 for grade 1 vs 3, 0.78 for grade 2 vs 3, and 0.62 for grade 1 vs 2.

Gottlieb
2021 (ref. 47)
UC
N = 249
Endoscopic videos
DL, CNN
  • Develop an automated neural network model to predict UC endoscopic severity determined by Mayo endoscopic score (MES) and UC Endoscopic Index of Severity (UCEIS).

  • The model had excellent agreement with human reviewers for predicting MES (quadratic weighted κ = 0.84) and UCEIS (quadratic weighted κ = 0.86).

Yao
2021 (ref. 48)
UC
N = 315 videos
Endoscopic videos
DL, CNN
  • Pilot a fully automated video analysis system for grading endoscopic disease severity in UC.

  • CNN model correctly predicted MES in 78% of videos.

  • CNN model correctly distinguished MES 0-1 from MES 2-3 in 83.7% of videos.

Li
2021 (ref. 49)
CD
N = 167
CTE
Radiomics
  • Develop and validate a CTE-based radiomic model to characterize intestinal fibrosis in CD as none/mild or moderate/severe degree of fibrosis based on pathology.

  • In the training data set, the model performance yielded AUC 0.89 for distinguishing between moderate-severe from no fibrosis with similar performance with validation data from 3 centers (AUC 0.75-0.82),

  • The radiomic model performed better than visual interpretation by 2 radiologists (radiologist AUC 0.55-0.60).

  • Decision curve analysis showed that the radiomic model consistently had a higher net benefit than the radiologists’ visual interpretation.

Ding
2022 (ref. 50)
CD
N = 121
MRE
Radiomics
  • Compare the accuracy of radiomics vs the radiologist-assessed MaRIA score for detecting the severity of CD activity.

  • Using terminal ileal (TI) CDEIS as ground truth, the radiomics model (containing 6 features) and MaRIA performed similarly for detecting TI CDEIS > 7 (AUC 0.87 vs 0.88, P = .85).

  • Interobserver agreement for the MaRIA score was fair (ICC 0.58), whereas radiomic features were highly reproducible (ICC 0.93-0.96).

Guez
2022 (ref. 51)
Pediatric CD
N = 121
MRE
Machine learning
  • Develop a multimodal machine learning fusion model combining radiological and biochemical biomarkers to predict ileal endoscopic activity based on the SES-CD.

  • Compare its performance with that of the MaRIA and a biochemical biomarker-only prediction model.

  • A multimodal model containing disease length on MRE, CRP, and fecal calprotectin outperformed the MaRIA for classifying SES-CD ≥ 3 (AUC 0.84 vs 0.80, P < 1e-9).

  • The multimodal model also outperformed the biochemical biomarker-only model (AUC 0.84 vs 0.67, P < 1e-5).

Li
2022 (ref. 52)
CD
N = 100
CT
Radiomics
  • Determine the value of a CT-based radiomics model for distinguishing active from inactive CD.

  • The radiomics model achieved AUC 0.94 with high accuracy (90.3%), sensitivity (91.1%), and specificity (89.2%).

Meng
2022 (ref. 53)
CD
N = 235
CTE
DL, CNN, radiomics
  • Develop a CTE-based DL and radiomics model to assess severity of bowel fibrosis and compare accuracy to radiologists.

  • Severity of histologic fibrosis was assessed semi-quantitatively as none/mild vs moderate/severe.

  • The DL and radiomics models performed similarly (AUC 0.81 vs 0.83, P = .97), with a shorter processing time for the DL model (48.4 vs 599.8 seconds, P < .001).

  • The DL model outperformed visual interpretation by 2 radiologists (radiologist AUC 0.58-0.64, P < .005).

  • Decision curve analysis showed that the DL and radiomics models provided greater net benefit than the radiologists’ interpretation.

Noguchi
2022 (ref. 54)
UC
N = 12
Histologic images
DL, CNN
  • Develop a CNN model to predict p53 immunohistochemical staining from histologic images to diagnose UC-associated dysplasia/cancer.

  • The CNN model had an average accuracy of 86-91% for predicting p53 positivity.

Yuan
2022 (ref. 55)
CD
N = 48
CTE
DL, automated body composition segmentation
  • Evaluate the diagnostic performance of visceral adiposity for predicting the degree of inflammation vs fibrosis of CD strictures, using surgical resection histopathology as the reference.

  • A prediction model containing the visceral:subcutaneous fat area ratio and the lumen narrowing:prestenotic dilation ratio classified severe fibrosis with an AUC of 0.80 (61.5% sensitivity, 91.4% specificity, and 83.3% accuracy).

Najdawi
2023 (ref. 56)
UC
N = 637
Histologic images
DL, CNN
  • Develop a CNN model to predict Nancy histologic index score and histologic remission.

  • The CNN model accurately predicted the Nancy histologic index score (weighted κ = 0.91) and predicted histologic remission with an accuracy of 0.97.

Ruiqing
2023 (ref. 57)
CD
N = 167
CTE
Radiomics
  • Investigate the feasibility of developing a lumen-based, mesenteric-based, and fusion (lumen + mesenteric features) radiomics model to grade mucosal activity (SES-CD) and risk of surgery.

  • The fusion model could distinguish multicategorical SES-CD score (0, 1, 2-5, 6-10, >10) by bowel segment with AUC 0.83

  • The fusion model could distinguish bowel segments with moderate/severe disease (SES-CD > 5) with AUC 0.85.

  • A nomogram combining an image-based score (eg, mural enhancement, fistula, mesenteric fibrofatty proliferation) with the fusion model could accurately predict the need for surgery within 12 months of CTE.

Rymarczyk
2023 (ref. 58)
IBD
N = 1189 (302 CD)
Histologic images
DL, CNN
  • Develop deep learning models for automating histologic assessment in IBD.

  • The CNN model detected histologic disease in the colon and ileum with accuracies of 87%-94% and 76%-83%, respectively.

Xie
2023 (ref. 59)
CD
N = 628
Endoscopic images
DL, CNN
  • Develop a CNN model to detect and grade severity of small bowel CD ulcers on double balloon endoscopy images

  • The CNN model detected ulcers with 96% accuracy.

  • The model graded ulcerated surface, ulcer size, and ulcer depth with 87%, 88%, and 85% accuracy, respectively.

Stidham
2024 (ref. 60)
UC
N = 1096
Endoscopic videos
DL
  • Develop computer vision methods to better quantify mucosal injury in UC and compare to MES

  • An automated cumulative disease score correlated with the MES (P < .0001) and was more sensitive than the MES for detecting endoscopic changes (Hedges’ g = 0.743 vs 0.460).

MRTA features: mean (average value of pixels within the ROI), standard deviation, mean of positive pixels, entropy, kurtosis (inversely related to the number of objects highlighted and increased by intensity variations in highlighted objects), and skewness (reflects the brightness of highlighted objects).
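The first-order MRTA features listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the cited studies. It takes the ROI pixel values as a flat list and performs no image filtering; some MRTA approaches apply a filter such as a Laplacian-of-Gaussian before computing these statistics. The function name `first_order_texture_features`, the 32-bin histogram used for entropy, and the use of excess kurtosis are all illustrative assumptions.

```python
import math
from collections import Counter


def first_order_texture_features(roi, n_bins=32):
    """Histogram-based (first-order) texture features of pixel values in an ROI.

    `roi` is a flat sequence of pixel intensities inside the region of
    interest. Any pre-filtering (e.g. Laplacian-of-Gaussian) is assumed to
    have been applied already; this function does not filter the image.
    """
    n = len(roi)
    if n == 0:
        raise ValueError("ROI must contain at least one pixel")

    mean = sum(roi) / n
    # Population central moments about the mean.
    m2 = sum((x - mean) ** 2 for x in roi) / n
    m3 = sum((x - mean) ** 3 for x in roi) / n
    m4 = sum((x - mean) ** 4 for x in roi) / n
    sd = math.sqrt(m2)
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    # Excess kurtosis (0 for a normal distribution); some tools omit the -3.
    kurtosis = m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0

    # Mean of positive pixels (MPP): average of values greater than zero.
    positives = [x for x in roi if x > 0]
    mpp = sum(positives) / len(positives) if positives else 0.0

    # Shannon entropy (bits) of an equal-width intensity histogram.
    lo, hi = min(roi), max(roi)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for a constant ROI
    counts = Counter(min(int((x - lo) / width), n_bins - 1) for x in roi)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {
        "mean": mean,
        "sd": sd,
        "mean_positive_pixels": mpp,
        "entropy": entropy,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }
```

For example, `first_order_texture_features([-2, -1, 1, 2], n_bins=4)` gives a mean of 0, zero skewness for this symmetric ROI, and a mean of positive pixels of 1.5. Published MRTA values depend on filtering, binning, and normalization choices, so features computed this way are not directly comparable with those reported in the table.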

Abbreviations: IBD, inflammatory bowel disease; CD, Crohn’s disease; UC, ulcerative colitis; AUC, area under the curve; VEGF, vascular endothelial growth factor; ROI, region of interest; MaRIA, magnetic resonance index of activity; CDEIS, Crohn’s Disease Endoscopic Index of Severity; SES-CD, Simple Endoscopic Score for Crohn’s disease; MES, Mayo Endoscopic Score.
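Several studies in the table report agreement between a model and human reviewers as a weighted κ for ordinal scores such as the MES or UCEIS. The sketch below shows the standard quadratic weighted Cohen's κ, which penalizes each disagreement by the squared distance between categories. It assumes two sets of integer scores on a 0 to k-1 scale. The function name `quadratic_weighted_kappa` is hypothetical, and the cited studies may have computed κ with different software or conventions.

```python
def quadratic_weighted_kappa(rater_a, rater_b, n_categories):
    """Cohen's kappa with quadratic weights for two raters' ordinal scores.

    Scores must be integers in 0..n_categories-1 (e.g. MES 0-3 -> n_categories=4).
    """
    if n_categories < 2:
        raise ValueError("need at least two categories")
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    k, n = n_categories, len(rater_a)
    if not all(0 <= s < k for s in (*rater_a, *rater_b)):
        raise ValueError("scores must lie in 0..n_categories-1")

    # Observed k x k contingency table of (rater A score, rater B score).
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_obs = weighted_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            expected = row_totals[i] * col_totals[j] / n  # chance-expected count
            weighted_obs += w * observed[i][j]
            weighted_exp += w * expected
    if weighted_exp == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - weighted_obs / weighted_exp
```

For instance, scores of [0, 1, 2] vs [0, 2, 2] on a three-level scale give κ = 0.8, because the single disagreement is only one category apart. In practice, scikit-learn's `cohen_kappa_score(y1, y2, weights="quadratic")` computes the same statistic (when all categories appear in the data or are passed via `labels`) and can serve as a cross-check.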

Table 2.

Twenty-five studies evaluating the role of AI in endoscopy, histology, and cross-sectional imaging for the characterization of IBD and phenotyping CD strictures.

Author
Year
DatasetData sourceAlgorithm typeTaskPerformance
Bhatnagar 201636CD
N = 7
MRERadiomics
  • To investigate if texture analysis of MRI (MRTA) of CD-afflicted small bowel differs based on presence of histological markers of hypoxia and angiogenesis in CD (VEGF).

  • Segments of bowel with VEGF present significantly lower mean pixel intensity (P = .004) and mean positive pixels within region of interest (P = .007) than segments of bowel without VEGF.

Makanyanga 201737CD
N = 16
MRERadiomics
  • Associate MRI textural analysis with MRI and histological CD activity

  • MRTA features were associated with CD activity:

  • Entropy was correlated with MRI activity (rc 1.00, P = .01) while Kurtosis was negatively associated MRI activity (Rc -0.45, P = .002)

  • Skewness was associated with histologic activity (Rc 4.27, P = .02)

Lamash
201838
Ped CD
N = 23 pediatric
MRECNN
  • To develop and test an algorithm to semiautomatically segment bowel wall and lumen to facilitate development of future algorithms to measure luminal diameter and bowel wall thickness in CD.

  • The algorithm had good performance for segmenting the bowel wall (Dice coefficient 75%) and lumen (Dice coefficient 81%)

Puylaert 201839CD
N = 106
MREDL, Active contouring model
  • Develop/validate a predictive MRI score (VIGOR score) using qualitative (mural T2 signal) and semiautomatically extracted MRI features (bowel wall thickness, excess volume, and dynamic contrast enhancement).

  • Compare diagnostic accuracy to existing MRI score (MaRIA, London score, and CDMI) with CDEIS as reference.

  • VIGOR score achieved comparable correlation with CDEIS (r = 0.58 vs 0.59).

  • VIGOR score had improved interobserver agreement vs other scores (ICC 0.81 vs 0.44-0.59)

  • VIGOR score achieved 80-81% diagnostic accuracy, like other scores.

Maeda
201940
UC
N = 187
Endocytoscopy imagesDL, CNN
  • Develop a computer-aided diagnostic (CAD) system for predicting histologic inflammation using endocytoscopy.

  • The CAD system had a diagnostic sensitivity, specificity, and accuracy of 74%, 97%, and 91%, respectively.

Ozawa
201941
UC
N = 955
Endoscopic imagesDL, CNN
  • Develop a computer-assisted diagnostic (CAD) system using CNN to identify normal mucosa (Mayo 0) and mucosal healing state (Mayo 0-1).

  • The CNN-based CAD system yielded a AUC 8.06 and 0.98 for identifying Mayo 0 and Mayo 0-1, respectively.

  • Performance was better in rectum (AUC 0.92) than right and left colon (AUC 0.83 and 0.83, respectively).

Stidham
201942
UC
N = 2778
Endoscopic images and videosDL, CNN
  • Develop deep learning models for grading UC endoscopic severity and compare its performance to experienced human reviewers.

  • For distinguishing between moderate-severe disease and endoscopic remission, CNN model yielded AUC 0.97.

  • The CNN model had good agreement with human reviewers (κ = 0.84), which was similar to agreement between human reviewers (κ = 0.86).

Tabari
201943
CD
N = 25
MRERadiomics
  • Determine if MRTA can determine CD stricture histologic type (degree of mucosal inflammation vs mural fibrosis) using surgical resection specimens.

  • Multivariable prediction model including mean, skewness, and entropy predicted stricture fibrosis with goodness-of-fit value 0.995.

  • Combination of threshold values for 3 features correctly classified 100% of strictures as inflammatory or fibrotic.

Stidham 202044CD
N = 138
CTEDL, Active contouring model
  • Compare the agreement of small bowel damage measurements (max bowel wall thickness, max bowel dilation, min lumen diameter, and presence of stricture) on CTE between semiautomated image analysis techniques vs radiologists

  • Semi-automated measurements correlated with radiologists for max bowel wall thickness (r = 0.70, P > .0001), max bowel dilation (r = 0.75, P < .0001), min lumen diameter (r = 0.38, P < .0001).

  • Multivariate model using semiautomatic measurements to detect radiologist-defined intestinal strictures had an accuracy of 88% with AUC 0.86.

Takenaka
202045
UC
N = 2012
Endoscopic imagesDL, CNN
  • Develop a deep neural network to analyze endoscopic images of UC patients and predict histologic disease activity.

  • The algorithm identified endoscopic remission with 90.1% accuracy with good agreement with human reviewers (κ = 0.80).

  • The algorithm identified subjects in histologic remission with 92.9% accuracy.

Barash
202146
CD
N = 49
VCE imagesDL, CNN
  • Develop a DL algorithm for automated grading for CD ulcers on VCE (Grades 1-3 mild to severe)

  • The algorithm had a classification accuracy of 0.91 for grade 1 vs 3, 0.78 for grade 2 vs 3, and 0.62 for grade 1 vs 2.

Gottlieb
202147
UC
N = 249
Endoscopic videosDL, CNN
  • Develop an automated neural network model to predict UC endoscopic severity determined by Mayo endoscopic score (MES) and UC Endoscopic Index of Severity (UCEIS).

  • The model had excellent agreement with human reviewers for predicting MES (quadratic weight κ = 0.84) and UCEIS (quadratic weight κ = 0.86).

Yao
202148
UC
N = 315 videos
Endoscopic VideosDL, CNN
  • Pilot a fully automated video analysis system for grading endoscopic disease severity in UC.

  • CNN model correctly predicted MES 78% of videos.

  • CNN model correctly distinguished MES 0-1 from MES 2-3 in 83.7% of videos.

Li
202149
CD
n = 167
CTERadiomic
  • Develop and validate a CTE-based radiomic model to characterize intestinal fibrosis in CD as none/mild or moderate/severe degree of fibrosis based on pathology.

  • In the training data set, the model distinguished moderate/severe from none/mild fibrosis with AUC 0.89, with similar performance in validation data from 3 centers (AUC 0.75-0.82).

  • Radiomic model performed better than visual interpretation by 2 radiologists (radiologists’ AUC 0.55-0.60).

  • Decision curve analysis showed radiomic model always had higher net benefit than radiologists’ visual interpretation.

Ding
2022 (ref. 50)
CD
N = 121
MRE
Radiomics
  • Compare the accuracy of radiomics vs radiologist-evaluated MaRIA score for detecting the severity of CD activity.

  • Using terminal ileal (TI) CDEIS as ground truth, the radiomics model (containing 6 features) and MaRIA performed similarly for detecting TI CDEIS > 7 (AUC 0.87 vs 0.88, P = .85).

  • MaRIA scores had fair agreement between radiologists (ICC 0.58), whereas radiomic features had high reproducibility (ICC 0.93-0.96).

Guez
2022 (ref. 51)
Pediatric CD
N = 121
MRE
Machine learning
  • Develop a multi-modal machine learning fusion model with radiological and biochemical biomarkers to predict ileal endoscopic activity based on SES-CD.

  • Compare performance to MaRIA and biochemical biomarker only prediction model.

  • A multimodal model containing disease length on MRE, CRP, and fecal calprotectin performed better than MaRIA for classifying SES-CD ≥3 (AUC 0.84 vs 0.80, P < 1e-9).

  • The multimodal model performed better than the biochemical biomarker only model (AUC 0.84 vs 0.67, P < 1e-5).

Li
2022 (ref. 52)
CD
N = 100
CT
Radiomic
  • Determine value of a CT-based radiomics model to identify active vs inactive CD.

  • The radiomics model achieved AUC 0.94 with high accuracy (90.3%), sensitivity (91.1%), and specificity (89.2%).

Meng
2022 (ref. 53)
CD
N = 235
CTE
DL, CNN, radiomics
  • Develop a CTE-based DL and radiomics model to assess severity of bowel fibrosis and compare accuracy to radiologists.

  • Severity of histologic fibrosis was assessed semi-quantitatively as none/mild vs moderate/severe.

  • The DL model performed similarly to the radiomics model (AUC 0.81 vs 0.83, P = .97) with shorter processing time (48.4 vs 599.8 seconds, P < .001).

  • DL model performed better than visual interpretation by 2 radiologists (AUC 0.58-0.64, P < .005).

  • Decision curve analysis showed DL and radiomics model possessed better net benefit than radiologists’ interpretation.

Noguchi
2022 (ref. 54)
UC
N = 12
Histologic images
DL, CNN
  • Develop a CNN model to predict p53 immunohistochemical staining from histologic images to diagnose UC-associated dysplasia/cancer.

  • The CNN model had an average accuracy of 86-91% for predicting p53 positivity.

Yuan
2022 (ref. 55)
CD
N = 48
CTE
DL, automated body composition segmentation
  • Evaluate the diagnostic performance of visceral adiposity to predict degree of inflammation vs fibrosis of CD strictures on surgical resection histopathology

  • Prediction model containing visceral:subcutaneous fat area ratio and lumen narrowing:prestenotic dilation ratio classified severe fibrosis with AUC 0.80 with 61.5% sensitivity, 91.4% specificity, and 83.3% accuracy.

Najdawi
2023 (ref. 56)
UC
N = 637
Histologic images
DL, CNN
  • Develop a CNN model to predict Nancy histologic index score and histologic remission.

  • The CNN model accurately predicted Nancy histologic index score (weighted κ = 0.91) and predicted histologic remission with accuracy of 0.97.

Ruiqing
2023 (ref. 57)
CD
N = 167
CTE
Radiomics
  • Investigate the feasibility of developing a lumen-based, mesenteric-based, and fusion (lumen + mesenteric features) radiomics model to grade mucosal activity (SES-CD) and risk of surgery.

  • The fusion model could distinguish multicategorical SES-CD scores (0, 1, 2-5, 6-10, >10) by bowel segment with AUC 0.83.

  • The fusion model could distinguish bowel segments with moderate/severe disease (SES-CD > 5) with AUC 0.85.

  • A nomogram including an image-based score (eg, mural enhancement, fistula, mesenteric fibrofatty proliferation) and the fusion model could accurately predict need for surgery within 12 months from CTE.

Rymarczyk
2023 (ref. 58)
IBD
N = 1189 (302 CD)
Histologic images
DL, CNN
  • Develop deep learning models for automating histologic assessment in IBD.

  • The CNN model was able to detect histologic disease in the colon and ileum with accuracies ranging from 87-94% and 76-83%, respectively.

Xie
2023 (ref. 59)
CD
N = 628
Endoscopic images
DL, CNN
  • Develop a CNN model to detect and grade severity of small bowel CD ulcers on double balloon endoscopy images

  • The CNN model detected ulcers with 96% accuracy.

  • The model had 87%, 88%, and 85% accuracy for grading ulcerated surface, ulcer size, and ulcer depth, respectively.

Stidham
2024 (ref. 60)
UC
N = 1096
Endoscopic videos
DL
  • Develop computer vision methods to better quantify mucosal injury in UC and compare to MES

  • An automated cumulative disease score correlated with MES (P < .0001) and was more sensitive for detecting endoscopic changes compared with MES (Hedges’ g = 0.743 vs 0.460).

MRTA (magnetic resonance texture analysis) features: mean (average value of pixels within ROI), standard deviation, mean of positive pixels, entropy, kurtosis (inversely related to the number of objects highlighted and increased by intensity variations in highlighted objects), and skewness (reflects brightness of highlighted objects).

Abbreviations: IBD, inflammatory bowel disease; CD, Crohn’s disease; UC, ulcerative colitis; AUC, area under the curve; VEGF, vascular endothelial growth factor; ROI, region of interest; MaRIA, magnetic resonance index of activity; CDEIS, Crohn’s Disease Endoscopic Index of Severity; SES-CD, Simple Endoscopic Score for Crohn’s disease; MES, Mayo Endoscopic Score.

Similarly, evaluating disease severity in small bowel CD has been an area of unmet need. Several VCE scores have been developed but are not routinely used in practice, potentially because scoring adds time on top of the time already needed to read and interpret a VCE study. Barash et al developed a DL algorithm for grading small bowel ulcer severity on VCE (grades 1-3, mild to severe).46 The algorithm had a classification accuracy of 0.91 for grade 1 vs 3, 0.78 for grade 2 vs 3, and 0.62 for grade 1 vs 2. In the same vein, Xie et al developed a CNN model for double balloon endoscopy images that detected small bowel ulcers with 96% accuracy and graded ulcerated surface, ulcer size, and ulcer depth with 87%, 88%, and 85% accuracy, respectively.59 If validated, AI-based systems for monitoring small bowel disease activity on VCE and enteroscopy would be invaluable for improving efficiency and monitoring of CD, especially for inexperienced operators and readers.
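Accuracy, sensitivity, and specificity recur throughout these studies and are easy to conflate. As a minimal sketch (not any cited study's evaluation code), the example below derives them, plus predictive values, from paired binary calls; the class and function names and the example labels are hypothetical. Because accuracy pools both classes, a model can report high accuracy with modest sensitivity when positive cases are scarce, which is why sensitivity and specificity are usually reported alongside it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 counts of model calls against a reference standard (e.g., a central reader)."""
    tp: int  # model positive, reference positive
    fp: int  # model positive, reference negative
    fn: int  # model negative, reference positive
    tn: int  # model negative, reference negative


def count_calls(predicted: list[bool], reference: list[bool]) -> ConfusionCounts:
    """Tally paired binary calls; both lists must have the same length."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    pairs = list(zip(predicted, reference))
    return ConfusionCounts(
        tp=sum(1 for p, r in pairs if p and r),
        fp=sum(1 for p, r in pairs if p and not r),
        fn=sum(1 for p, r in pairs if not p and r),
        tn=sum(1 for p, r in pairs if not p and not r),
    )


def summarize(c: ConfusionCounts) -> dict[str, float]:
    """Return common diagnostic metrics; a zero denominator yields NaN."""
    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(c.tp, c.tp + c.fn),
        "specificity": ratio(c.tn, c.tn + c.fp),
        "ppv": ratio(c.tp, c.tp + c.fp),
        "npv": ratio(c.tn, c.tn + c.fn),
        "accuracy": ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
    }


# Hypothetical example: 10 small bowel frames, True = "ulcer present".
model = [True, True, True, False, False, False, False, True, False, False]
reader = [True, True, False, False, False, False, True, True, False, False]
counts = count_calls(model, reader)
print(counts)
print(summarize(counts))
```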

During endoscopy, biopsies are often taken to evaluate histologic disease activity. Several histologic scores have been developed to standardize the assessment of histologic disease activity, such as the PICaSSO Histologic Remission Index62 and the Nancy histological index,63 but their clinical utility is limited by their time-intensive nature. To address this challenge, Najdawi et al56 developed a CNN model for UC that predicted the Nancy histologic index score with high agreement with human reviewers (κ = 0.91) and predicted histologic remission with an accuracy of 97%. Unlike the previous studies, which were performed on colonic biopsies, another study developed an automated DL model that detected histologic disease activity in the colon and ileum with 87% to 94% and 76% to 83% accuracy, respectively.58 Using endocytoscopy, Maeda et al developed a computer-aided diagnostic system for predicting histologic inflammation with 91% accuracy.40 Finally, histologic evaluation is critical for diagnosing UC-associated dysplasia/cancer, and p53 immunohistochemistry alongside hematoxylin and eosin staining is a key component of the workup. However, p53 immunohistochemistry is expensive and time-intensive, so Noguchi et al developed a CNN model that predicted p53 immunohistochemical staining from histologic images with 86% to 91% accuracy.54 While the role of AI for histology in IBD is relatively understudied compared with endoscopy and imaging, it has tremendous implications not only for improving diagnostic accuracy but also for improving workflow efficiency and cost-effectiveness.
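Agreement with human reviewers on ordinal scores such as the MES, UCEIS, or Nancy index is typically summarized with a weighted Cohen’s κ, and the table above lists quadratic-weighted κ for the Gottlieb study. A minimal sketch of quadratic-weighted κ follows, assuming both raters score the same items on integer categories 0 to k − 1; the function name and example scores are hypothetical and not taken from any cited study. Quadratic weights penalize a 0 vs 3 disagreement nine times more heavily than a 0 vs 1 disagreement, which suits ordinal severity scales.

```python
from __future__ import annotations


def quadratic_weighted_kappa(rater_a: list[int], rater_b: list[int], n_categories: int) -> float:
    """Cohen's kappa with quadratic disagreement weights for ordinal scores 0..n_categories-1."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("ratings must be non-empty and paired")
    k = n_categories
    if k < 2:
        raise ValueError("need at least two categories")

    # Observed joint counts of (score from rater A, score from rater B).
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    # Marginal totals for each rater, used to build chance-expected counts.
    totals_a = [sum(row) for row in observed]
    totals_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # 0 on the diagonal, 1 at the extremes
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * totals_a[i] * totals_b[j] / n

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical Mayo endoscopic subscores (0-3) from a model and a central reader.
model_mes = [0, 1, 2, 3, 3, 2, 1, 0]
reader_mes = [0, 1, 2, 3, 2, 2, 1, 1]
print(round(quadratic_weighted_kappa(model_mes, reader_mes, n_categories=4), 3))  # 0.875
```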

Imaging offers noninvasive options to characterize and monitor IBD, inform therapeutic strategies, and assess treatment response. In CD, several imaging scores have been developed to assess disease activity, such as the MaRIA,64 London,65 Nancy,66 and Clermont scores.67 Additionally, the Lemann Index was developed to quantify total gut damage in CD and incorporates clinical, surgical, endoscopic, and imaging findings from all segments of the GI tract into one composite score.68 However, these scores are often time-consuming, have variable sensitivity and specificity for detecting intestinal segments with active CD (the MaRIA score performs best, with 81% sensitivity and 89% specificity), correlate variably with endoscopic disease activity, and show only fair to good interobserver agreement depending on the imaging feature of interest, all of which limits their clinical utility.69,70 These limitations present a significant opportunity for AI-based imaging interpretation to improve patient care. In the available studies, AI has been used to characterize disease activity and to phenotype inflammatory vs fibrotic strictures in CD.
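To make the reader workload concrete, the segmental MaRIA combines four reader-derived inputs. The sketch below uses the formula as commonly cited from its original derivation (1.5 × wall thickness in mm + 0.02 × relative contrast enhancement + 5 × edema + 10 × ulceration) and the frequently quoted cut-offs of 7 for active and 11 for severe disease; treat both as assumptions to check against the primary MaRIA publication. The relative contrast enhancement (RCE) is taken as a precomputed input, and the class and function names are hypothetical. Each input requires a manual measurement or judgment per segment, which is the step AI-based tools aim to automate.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentReading:
    """Reader-derived MRE findings for one bowel segment (hypothetical structure)."""
    wall_thickness_mm: float
    relative_contrast_enhancement: float  # RCE, assumed precomputed from pre/post-gadolinium signal
    edema: bool  # increased mural T2 signal
    ulceration: bool


def segmental_maria(seg: SegmentReading) -> float:
    """Segmental MaRIA as commonly cited: 1.5*WT + 0.02*RCE + 5*edema + 10*ulceration."""
    return (
        1.5 * seg.wall_thickness_mm
        + 0.02 * seg.relative_contrast_enhancement
        + 5 * int(seg.edema)
        + 10 * int(seg.ulceration)
    )


def categorize(score: float, active_cutoff: float = 7.0, severe_cutoff: float = 11.0) -> str:
    """Map a segmental score to a category using commonly quoted cut-offs (assumed)."""
    if score >= severe_cutoff:
        return "severe"
    if score >= active_cutoff:
        return "active"
    return "inactive"


terminal_ileum = SegmentReading(wall_thickness_mm=6.0, relative_contrast_enhancement=120.0,
                                edema=True, ulceration=False)
score = segmental_maria(terminal_ileum)
print(round(score, 1), categorize(score))  # 16.4 severe
```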

Presently, 8 studies have explored the use of AI for characterizing disease activity in CD, and those using DL approaches are limited. One of the biggest challenges in automating the quantification of disease activity in CD is accurately separating the bowel wall from the lumen so that measurements specific to each compartment can be made. In a pilot study in pediatric patients with CD, Lamash et al developed a semiautomated, supervised 3D CNN algorithm that required only the placement of seed points by the operator to segment the bowel wall and lumen.38 From this segmentation, the algorithm could measure lumen radius and bowel wall thickness. The study could not evaluate algorithm performance because of a lack of training data, and no endoscopic disease activity score was available for validation. However, it provides an excellent foundation for developing future DL algorithms. Subsequent studies have developed algorithms using endoscopic scores such as the Simple Endoscopic Score for CD (SES-CD) and the CD Endoscopic Index of Severity (CDEIS) as ground truth and compared them with established imaging disease activity scores. Puylaert et al developed the VIGOR score, which combines semiautomatic quantitative measurements (bowel wall thickness and contrast enhancement parameters) with a qualitative measurement (degree of T2 mural signal enhancement determined by a radiologist).39 The VIGOR score demonstrated a moderate correlation with CDEIS (r = 0.58, P < .001), similar to that of the MaRIA (r = 0.40, P = .001), London score (r = 0.38, P = .001), and CDMI (r = 0.34, P = .003). The VIGOR score also had diagnostic accuracy (80%) similar to that of the other scores. However, it had superior inter-rater reliability compared with the other imaging scores (ICC 0.81 vs 0.44-0.59), which highlights a key strength of AI for imaging interpretation in IBD.
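Inter-rater reliability figures such as the VIGOR score’s ICC of 0.81 depend on which ICC form is used, and this review does not specify the form each study used. As an illustration only, the sketch below computes the two-way random-effects, absolute-agreement, single-rater form (ICC(2,1) in Shrout and Fleiss notation) from an n-subjects × k-raters table; choosing this form, and the function name, are assumptions. Absolute agreement penalizes a rater who scores every segment systematically higher, which the hypothetical example demonstrates.

```python
from __future__ import annotations


def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is rater j's score for subject i; every subject needs all k ratings.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with at least 2 subjects and 2 raters")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]
    rater_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    # Partition total variability into subjects, raters, and residual error.
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_raters = n * sum((m - grand_mean) ** 2 for m in rater_means)
    ss_error = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


# Hypothetical segment scores: rater B scores exactly 1 point higher on every segment.
readings = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]]
print(round(icc_2_1(readings), 3))  # 0.769 (perfect consistency, imperfect absolute agreement)
```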
Another study in pediatric CD developed a multimodal machine learning fusion model that combined disease length on MRE, CRP, and fecal calprotectin to predict ileal endoscopic activity (SES-CD ≥3).51 The machine learning model (AUC 0.84) performed better than the MaRIA score (AUC 0.80, P < 1e-9) and biochemical markers alone (AUC 0.67, P < 1e-5). AI-based imaging algorithms such as this could have important implications not only for patient care but also for clinical trial recruitment, which often requires an endoscopic disease activity score above a prespecified cut-off for inclusion.
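Most performance figures in this section are AUCs. One way to read an AUC is as the probability that a randomly chosen positive case (here, a patient with SES-CD ≥3) receives a higher model score than a randomly chosen negative case, with ties counted as one half. The sketch below computes the AUC directly from that definition (the Mann-Whitney formulation); the function name and scores are invented for illustration. Comparing two AUCs, as the cited studies do, requires an additional paired test (DeLong’s method is a common choice), which is not shown.

```python
from __future__ import annotations


def auc_mann_whitney(scores: list[float], labels: list[bool]) -> float:
    """AUC as P(score of a random positive > score of a random negative), ties counting 0.5."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must be paired")
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities of SES-CD >= 3 and the endoscopic reference.
probabilities = [0.90, 0.80, 0.35, 0.60, 0.20, 0.10]
active = [True, True, True, False, False, False]
print(round(auc_mann_whitney(probabilities, active), 3))  # 0.889
```

The pairwise loop is quadratic in cohort size, which is fine for a sketch; a rank-based implementation scales better.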

While few imaging studies have used DL approaches to assess disease activity, several have evaluated radiomic approaches with promising results. Studies have identified several associations between MRTA parameters and disease activity. On a macroscopic level, entropy has been positively correlated with the MRI Crohn’s disease activity score65 (Rc 1.00, P = .01), while kurtosis has been negatively correlated (Rc −0.45, P = .002).37 On a histologic level, skewness has been associated with histologic disease activity (Rc 4.27, P = .02), and lower mean pixel intensity and mean of positive pixels have been associated with bowel segments showing increased neoangiogenesis, a hallmark of active inflammation, defined by vascular endothelial growth factor (VEGF) expression.36 These studies demonstrate how radiomics can provide insight into the underlying biology of CD. For quantifying disease activity, Ding et al developed an MRI radiomics-based model that detected ileal disease with CDEIS >7 with performance similar to the MaRIA (AUC 0.87 vs 0.88, P = .85) but with superior reproducibility (radiomics ICC 0.93-0.96 vs MaRIA ICC 0.58).50 Using CT, 2 groups developed radiomics-based algorithms that could differentiate intestinal segments with active vs inactive CD.52,57 Ruiqing et al developed a particularly interesting radiomics model incorporating luminal and mesenteric radiomic features, which distinguished multicategorical SES-CD scores (ie, 0, 1, 2-5, 6-10, >10) with an AUC of 0.83 and identified intestinal segments with moderate/severe disease (SES-CD >5) with an AUC of 0.85.57 Including mesenteric radiomic features is novel because it allows objective quantification of inflammatory alterations in the mesenteric fat, which not only provides an additional data point for quantifying disease activity but also facilitates future imaging-based investigations of the mesenteric fat.
Noninvasive characterization of inflammatory mesenteric fat will become increasingly important, as studies have demonstrated mesenteric fat is intimately involved in the pathogenesis and progression of CD.35
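The first-order texture features defined in the footnote to the preceding table (mean, standard deviation, mean of positive pixels, entropy, skewness, and kurtosis) are summary statistics of the pixel-value histogram inside a region of interest. A minimal sketch follows, assuming the ROI pixels have already been extracted into a list; the function name is hypothetical. Texture-analysis software commonly applies a spatial filter before computing these statistics and may bin the histogram differently, so the 32-bin entropy and the excess-kurtosis convention here are assumptions, and values will not match any specific package.

```python
from __future__ import annotations

import math
from collections import Counter


def first_order_features(pixels: list[float], n_bins: int = 32) -> dict[str, float]:
    """Histogram statistics of ROI pixel values (no spatial filtering applied)."""
    n = len(pixels)
    if n == 0:
        raise ValueError("ROI is empty")
    mean = sum(pixels) / n
    # Central moments (population form).
    m2 = sum((x - mean) ** 2 for x in pixels) / n
    m3 = sum((x - mean) ** 3 for x in pixels) / n
    m4 = sum((x - mean) ** 4 for x in pixels) / n

    positives = [x for x in pixels if x > 0]

    # Shannon entropy (bits) of a fixed-width histogram spanning the ROI's range.
    lo, hi = min(pixels), max(pixels)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant ROI
    bins = Counter(min(int((x - lo) / width), n_bins - 1) for x in pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in bins.values())

    return {
        "mean": mean,
        "sd": math.sqrt(m2),
        "mean_positive_pixels": sum(positives) / len(positives) if positives else 0.0,
        "entropy": entropy,
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0,  # excess kurtosis
    }


roi = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical filtered pixel values
features = first_order_features(roi)
print({name: round(value, 3) for name, value in features.items()})
```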

In CD, one of the biggest and most persistent clinical challenges is differentiating inflammatory-predominant from fibrotic-predominant strictures to decide between medical and surgical intervention. In addition, multiple antifibrotic targets are under investigation, so accurately phenotyping CD strictures as potential trial end points is becoming increasingly important.71,72 Multiple imaging parameters on US, CTE, and MRE have been proposed to phenotype CD strictures, but their time intensity and interobserver variation are potential limitations. For this unmet need, AI-powered imaging interpretation is a promising tool for phenotyping CD strictures consistently and efficiently. Using DL approaches, several studies have developed semiautomated and automated algorithms to measure minimal lumen diameter, maximal prestenotic dilation, bowel wall thickness, and/or body composition (visceral adipose tissue [VAT] and subcutaneous adipose tissue [SAT] volumetrics) and used these measurements in multivariable models to detect CD strictures and predict degree of fibrosis, with histologic scores as the ground truth.43,44,53,55 These models generally performed well (AUC 0.80-0.86) and outperformed radiologists’ interpretation (AUC 0.58-0.64). Additionally, 1 study reported that its DL algorithm had a shorter processing time than a radiomic approach (48.4 vs 599.8 seconds, P < .0001).53 Radiomics-based studies have achieved success comparable to that of DL approaches.
In the 2 largest radiomics-based studies to date, CTE-based radiomics models classified CD strictures with moderate to severe fibrosis better than radiologists did (AUC 0.83-0.89 vs 0.55-0.64), and decision curve analyses supported the net benefit of a radiomics prediction model.49,53 Given data showing that human readers classify fibrotic-predominant strictures poorly, the ability of AI to differentiate inflammatory-predominant from fibrotic-predominant strictures in CD has important and exciting implications for precision medicine and the development of antifibrotic therapies. However, prospective clinical trials using AI-powered phenotyping of CD strictures to inform management decisions are needed to fully understand its clinical utility and safety.
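Decision curve analysis, cited here and in the tables, compares models by net benefit at a threshold probability p_t, the predicted risk at which a clinician would act (for example, refer for surgery). Net benefit is commonly defined as TP/n − (FP/n) × p_t/(1 − p_t) and is plotted across a range of thresholds against the “treat all” and “treat none” (net benefit 0) strategies. The sketch below implements that definition on invented predictions; the function names are hypothetical, and it does not reproduce any cited analysis.

```python
from __future__ import annotations


def net_benefit(probabilities: list[float], outcomes: list[bool], threshold: float) -> float:
    """Net benefit of acting on the model when predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    if len(probabilities) != len(outcomes) or not outcomes:
        raise ValueError("need paired, non-empty inputs")
    n = len(outcomes)
    true_pos = sum(1 for p, y in zip(probabilities, outcomes) if p >= threshold and y)
    false_pos = sum(1 for p, y in zip(probabilities, outcomes) if p >= threshold and not y)
    odds = threshold / (1.0 - threshold)  # weight of a false positive relative to a true positive
    return true_pos / n - (false_pos / n) * odds


def net_benefit_treat_all(outcomes: list[bool], threshold: float) -> float:
    """Reference strategy: act on every patient regardless of the model."""
    return net_benefit([1.0] * len(outcomes), outcomes, threshold)


# Hypothetical predicted probabilities of a fibrotic-predominant stricture.
risk = [0.9, 0.7, 0.4, 0.3, 0.2, 0.1]
fibrotic = [True, True, False, True, False, False]
for pt in (0.25, 0.5):
    print(pt, round(net_benefit(risk, fibrotic, pt), 3), round(net_benefit_treat_all(fibrotic, pt), 3))
```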

Prognosis

The adage “an ounce of prevention is worth a pound of cure” is a core concept in treating IBD to reduce the risk of complications and surgery, and a tremendous amount of research has been directed toward discovering prognostic biomarkers that allow interventions to be positioned earlier. However, many prognostic biomarkers are variably supported by the literature and have limited predictive value, and AI-powered clinical tools may help address this need in IBD. Several studies have evaluated the role of AI in determining prognosis in IBD, with histology- and imaging-based studies yielding promising results (Table 3). In our search, we did not identify any studies that developed an AI-based model to determine prognosis based on endoscopy.

Table 3.

Ten studies evaluating the role of AI in endoscopy, histology and cross-sectional imaging for prognosis in IBD.

Author
Year
Dataset
Data source
Algorithm type
Task
Performance
Klein
2017 (ref. 73)
CD
N = 105
Histologic images
DL
  • Validate a morphometric histology image analysis of baseline intestinal biopsies for predicting clinical phenotype in CD colitis.

  • Using B1 as the reference, the model predicted future B2 and B3 disease within 5 years with AUC 0.74 and 0.78, respectively.

Chen
2021 (ref. 74)
CD
N = 186
CTE
Radiomic
  • Develop and validate a radiomic nomogram using the intestinal segment with the worst disease to predict secondary loss of response to infliximab.

  • A multivariate prediction model containing 8 features had significant discrimination (AUC 0.88).

  • Ten-fold cross validation of the model yielded mean AUC 0.82 and 82% accuracy.

Feng
2021 (ref. 75)
CD
N = 322
MRE
Radiomic
  • Explore the correlation between R2* (an MRI-based radiomic index that detects changes in hepatic iron metabolism) and inflammation.

  • Develop a nomogram based on R2* to identify secondary loss of response to infliximab.

  • R2* was higher in active vs inactive CD (28.0 vs 26.6, P = .03).

  • Multivariable prediction model including CRP, hemoglobin, and R2* had good discrimination for secondary loss of response (AUC 0.72 in the training and validation data sets).

Ohara
2022 (ref. 76)
UC
N = 114
Histologic images
DL
  • Determine if a DL-based system for automatic quantification of goblet cell mucin is useful for predicting future relapse in UC patients in endoscopic remission.

  • The relapse group had lower goblet cell mucus area calculated by the DL system compared with non-relapse group.

Chirra
2023 (ref. 77)
CD
N = 80
MRE
Radiomics
  • Develop and test a prognostic radiomic model for early surgery in CD patients requiring immunomodulators and/or biologics.

  • A model combining radiomics, simplified MaRIA scoring, and clinical variables yielded the best performance (AUC 0.83) compared with radiomics (AUC 0.74), simplified MaRIA (AUC 0.58) or clinical variables alone (AUC 0.48).

  • The model accurately predicted time to surgery (HR, 4.13, P = 6.90e-6, C-index 0.71).

Iacucci
2023 (ref. 44)
UC
N = 273
Histologic images
DL, CNN
  • Develop and validate AI computer-aided diagnosis system to evaluate UC biopsies and predict prognosis.

  • The system had 89% sensitivity and 85% specificity for distinguishing histologic remission/activity based on PICaSSO Histologic Remission Index (PHRI).

  • The AI-assessed PHRI was associated with flare within 1 year (hazard ratio 4.64), compared with a hazard ratio of 3.56 for the human-assessed PHRI.

Li
2023 (ref. 53)
CD
N = 256
CTE
Radiomics
  • Develop and validate a visceral fat-based radiomics model for predicting CD progression (development of penetrating/stricturing disease or surgery) and compare prediction accuracy to a subcutaneous fat-based radiomics model and six conventional fat metrics.

  • The visceral fat-based radiomics model (AUC 0.85) outperformed the subcutaneous fat-based model (AUC 0.79).

  • On multivariate Cox regression analysis, the visceral fat-based radiomics model was the most important independent predictor of CD progression (HR, 9.29, P < .005), followed by the subcutaneous fat-based radiomics model (HR, 3.28, P = .06).

  • Decision curve analysis showed visceral fat-based radiomics model had better net benefit over subcutaneous fat-based model.

  • The conventional fat metrics were not associated with disease progression (P = .089-.996).

Ruiqing
2023 (ref. 39)
CD
N = 167
CTE
Radiomics
  • Investigate the feasibility of developing a lumen-based, mesenteric-based, and fusion (lumen + mesenteric features) radiomics model to grade mucosal activity (SES-CD) and risk of surgery.

  • The fusion model could distinguish multicategorical SES-CD scores (0, 1, 2-5, 6-10, >10) by bowel segment with AUC 0.83.

  • The fusion model could distinguish bowel segments with moderate/severe disease (SES-CD > 5) with AUC 0.85.

  • A nomogram including an image-based score (eg, mural enhancement, fistula, mesenteric fibrofatty proliferation) and the fusion model could accurately predict need for surgery within 12 months from CTE.

Shen
2023 (ref. 73)
CD
N = 186
CTE
Radiomics
  • Develop and validate a preoperative CTE-based radiomics signature to predict postoperative recurrence (POR).

  • Compare predictive accuracy of a multimodal nomogram (incorporating radiomics signature, clinical, and radiologic features) vs clinical-radiologic only model for POR.

  • An intestinal lesion only (HR, 2.17, P = .002) and peri-intestinal mesenteric fat only radiomic signature (HR, 2.19, P = .0018) were associated with POR.

  • The multi-modal nomogram performed modestly better than the clinical-radiologic only model (AUC 0.69 vs 0.66).

  • Decision curve analysis showed the multi-modal nomogram had moderately better net benefits than clinical-radiological model.

Yao
2023 (ref. 78)
CD
N = 268
CTE
Radiomics
  • Develop and validate a clinical-radiomic nomogram to predict 1-year surgical risk after CD diagnosis.

  • The radiomics model extracted features from the inflamed segment of bowel and peri-intestinal mesenteric fat.

  • The clinical-radiomic model had superior performance (AUC 0.90) compared with the clinical only (AUC 0.77), intestinal radiomic only (AUC 0.88), and peri-intestinal mesenteric fat only (AUC 0.80) models.

  • The clinical-radiomic nomogram was 71% sensitive, 90% specific, and 85% accurate for predicting surgery within 1 year of CD diagnosis.

  • Decision curve analysis supported net benefit of the clinical-radiomic nomogram over other models.

MRTA (magnetic resonance texture analysis) features: mean (average value of pixels within ROI), standard deviation, mean of positive pixels, entropy, kurtosis (inversely related to the number of objects highlighted and increased by intensity variations in highlighted objects), and skewness (reflects brightness of highlighted objects).

Abbreviations: IBD, inflammatory bowel disease; CD, Crohn’s disease; UC, ulcerative colitis; iTB, intestinal TB; Rc, regression coefficient; AUC, area under the curve; ROI, region of interest; MaRIA, magnetic resonance index of activity; CDEIS, Crohn’s Disease Endoscopic Index of Severity; SES-CD, Simple Endoscopic Score for Crohn’s disease; HR, hazard ratio.
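Several prognostic studies in Table 3 report time-to-event results, including a C-index of 0.71 for time to surgery. Harrell’s C-index extends the AUC to censored follow-up: among pairs in which the earlier time is an observed event, it is the fraction in which the patient with that event had the higher predicted risk. A simplified sketch follows, assuming higher scores mean higher risk and skipping pairs with tied times; the function name and data are hypothetical, and survival-analysis packages handle ties and censoring in more detail.

```python
from __future__ import annotations


def harrell_c_index(times: list[float], events: list[bool], risk_scores: list[float]) -> float:
    """Simplified Harrell's C: concordance of risk scores with observed event ordering.

    A pair (i, j) is comparable when subject i has an observed event and times[i] < times[j]
    (subject j may be censored or have a later event). Tied risk scores count as 0.5.
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have equal length")
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up (months), surgery indicator (False = censored), and model risk.
months = [2.0, 4.0, 6.0, 8.0]
surgery = [True, True, False, True]
risk = [0.9, 0.5, 0.6, 0.1]
print(harrell_c_index(months, surgery, risk))  # 0.8
```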

Table 3.

Ten studies evaluating the role of AI in endoscopy, histology and cross-sectional imaging for prognosis in IBD.

Author
Year
DatasetData sourceAlgorithm typeTaskPerformance
Klein
201773
CD
N = 105
Histologic imagesDL
  • Validate a morphometric histology image analysis of baseline intestinal biopsies for predicting clinical phenotype in CD colitis.

  • Using B1 as reference the model was able to predict future B2 and B3 disease in 5 years with AUC 0.74 and 0.78, respectively.

Chen
202174
CD
N = 186
CTERadiomic
  • Develop and validate a radiomic nomogram using the intestinal segment with the worst disease to predict secondary loss of response to infliximab.

  • A multivariate prediction model containing 8 features had significant discrimination (AUC 0.88).

  • Ten-fold cross validation of the model yielded mean AUC 0.82 and 82% accuracy.

Feng
202175
CD
N = 322
MRERadiomic
  • Explore the correlation between R2* (an MRI-based radiomic index that detects changes in hepatic iron metabolism) and inflammation.

  • Develop a nomogram based on R2* to identify secondary loss of response to infliximab.

  • R2* was higher in active vs inactive CD (28.0 vs 26.6, P = .03).

  • Multivariable prediction model including CRP, hemoglobin, and R2* had good discrimination for secondary loss of response with AUC 0.72 for training and validation data set.

Ohara
202276
UC
N = 114
Histologic imagesDL
  • Determine if an DL-based system for automatic quantification of goblet cell mucin is useful for predicting future relapse in UC patients in endoscopic remission.

  • The relapse group had lower goblet cell mucus area calculated by the DL system compared with non-relapse group.

Chirra
202377
CD
N = 80
MRERadiomics
  • Develop and test a prognostic radiomic model for early surgery in CD patients requiring immunomodulators and/or biologics.

  • A model combining radiomics, simplified MaRIA scoring, and clinical variables yielded the best performance (AUC 0.83) compared with radiomics (AUC 0.74), simplified MaRIA (AUC 0.58) or clinical variables alone (AUC 0.48).

  • The model accurately predicted time to surgery (HR, 4.13, p = 6.90e-6, C-index 0.71).

Iacucci
202344
UC
N = 273
Histologic imagesDL, CNN
  • Develop and validate AI computer-aided diagnosis system to evaluate UC biopsies and predict prognosis.

  • The system had 89% sensitivity and 85% specificity for distinguishing histologic remission/activity based on PICaSSO Histologic Remission Index (PHRI).

  • The AI-assessed PHRI was associated with flare up in 1 year with hazard ratio 4.64 compared with 3.56 with a human-assessed PHRI.

Li
202353
CD
N = 256
CTE
Radiomics
  • Develop and validate a visceral fat-based radiomics model for predicting CD progression (development of penetrating/stricturing disease or surgery) and compare prediction accuracy to a subcutaneous fat-based radiomics model and six conventional fat metrics.

  • The visceral fat-based radiomics model (AUC 0.85) outperformed the subcutaneous fat-based model (AUC 0.79).

  • On multivariable Cox regression analysis, the visceral fat-based radiomics model was the most important independent predictor of CD progression (HR, 9.29, P < .005), followed by the subcutaneous fat-based radiomics model (HR, 3.28, P = .06).

  • Decision curve analysis showed the visceral fat-based radiomics model had better net benefit than the subcutaneous fat-based model.

  • The conventional fat metrics were not associated with disease progression (P = .089-.996).

Ruiqing
202339
CD
N = 167
CTE
Radiomics
  • Investigate the feasibility of developing lumen-based, mesenteric-based, and fusion (lumen + mesenteric features) radiomics models to grade mucosal activity (SES-CD) and risk of surgery.

  • The fusion model could distinguish multicategorical SES-CD scores (0, 1, 2-5, 6-10, >10) by bowel segment with AUC 0.83.

  • The fusion model could distinguish bowel segments with moderate/severe disease (SES-CD > 5) with AUC 0.85.

  • A nomogram including an image-based score (eg, mural enhancement, fistula, mesenteric fibrofatty proliferation) and the fusion model could accurately predict need for surgery within 12 months of CTE.

Shen
202373
CD
N = 186
CTE
Radiomics
  • Develop and validate a preoperative CTE-based radiomics signature to predict postoperative recurrence (POR).

  • Compare the predictive accuracy of a multimodal nomogram (incorporating radiomics signature, clinical, and radiologic features) vs a clinical-radiologic only model for POR.

  • Intestinal lesion-only (HR, 2.17, P = .002) and peri-intestinal mesenteric fat-only radiomic signatures (HR, 2.19, P = .0018) were associated with POR.

  • The multimodal nomogram performed modestly better than the clinical-radiologic only model (AUC 0.69 vs 0.66).

  • Decision curve analysis showed the multimodal nomogram had moderately better net benefit than the clinical-radiologic model.

Yao
202378
CD
N = 268
CTE
Radiomics
  • Develop and validate a clinical-radiomic nomogram to predict 1-year surgical risk after CD diagnosis.

  • The radiomics model extracted features from the inflamed bowel segment and peri-intestinal mesenteric fat.

  • The clinical-radiomic model had superior performance (AUC 0.90) compared with the clinical only (AUC 0.77), intestinal radiomic only (AUC 0.88), and peri-intestinal mesenteric fat only (AUC 0.80) models.

  • The clinical-radiomic nomogram was 71% sensitive, 90% specific, and 85% accurate for predicting surgery within 1 year of CD diagnosis.

  • Decision curve analysis supported net benefit of the clinical-radiomic nomogram over other models.


MRTA features: mean (average value of pixels within the ROI), standard deviation, mean of positive pixels, entropy, kurtosis (inversely related to the number of objects highlighted and increased by intensity variations in highlighted objects), and skewness (reflects the brightness of highlighted objects).

Abbreviations: IBD, inflammatory bowel disease; CD, Crohn’s disease; UC, ulcerative colitis; iTB, intestinal tuberculosis; Rc, regression coefficient; AUC, area under the curve; ROI, region of interest; MaRIA, magnetic resonance index of activity; CDEIS, Crohn’s disease endoscopic index of severity; SES-CD, Simple Endoscopic Score for Crohn’s disease; HR, hazard ratio; MRTA, magnetic resonance texture analysis.
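The MRTA features listed in the footnote are first-order statistics of the pixel intensities inside a region of interest. A minimal sketch of how such statistics can be computed follows, assuming a plain 2D intensity grid and a binary ROI mask. The helper names, the histogram bin width used for entropy, and the choice of Pearson (rather than excess) kurtosis are assumptions for illustration, and any image-filtering step applied by dedicated texture-analysis software is omitted.

```python
import math
from collections import Counter


def roi_pixels(image, mask):
    """Hypothetical helper: collect intensities (list of rows) where the binary mask is truthy."""
    return [
        value
        for image_row, mask_row in zip(image, mask)
        for value, inside in zip(image_row, mask_row)
        if inside
    ]


def first_order_features(pixels, bin_width=1.0):
    """Hypothetical helper: first-order statistics of ROI intensities (not MRTA software)."""
    n = len(pixels)
    if n == 0:
        raise ValueError("ROI contains no pixels")
    mean = sum(pixels) / n
    # Population variance and standard deviation (divide by n, not n - 1).
    var = sum((p - mean) ** 2 for p in pixels) / n
    sd = math.sqrt(var)
    # Mean of positive pixels: average over pixels with value > 0 only.
    positives = [p for p in pixels if p > 0]
    mpp = sum(positives) / len(positives) if positives else 0.0
    # Shannon entropy (bits) of a fixed-width intensity histogram (bin width is an assumption).
    counts = Counter(math.floor(p / bin_width) for p in pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    if var == 0:
        # Shape statistics are undefined for a constant ROI.
        skewness = kurtosis = float("nan")
    else:
        skewness = sum((p - mean) ** 3 for p in pixels) / (n * var ** 1.5)
        # Pearson kurtosis: a normal distribution gives 3 (subtract 3 for excess kurtosis).
        kurtosis = sum((p - mean) ** 4 for p in pixels) / (n * var ** 2)
    return {
        "mean": mean,
        "sd": sd,
        "mpp": mpp,
        "entropy": entropy,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


# Toy 3x3 image and ROI mask (illustrative values only).
image = [[-2, 0, 1], [3, 4, 10], [5, 5, 5]]
mask = [[0, 1, 1], [1, 1, 0], [0, 0, 0]]
print(first_order_features(roi_pixels(image, mask)))
```

For the toy ROI (pixels 0, 1, 3, and 4) this returns a mean of 2.0, an entropy of 2.0 bits, zero skewness, and a kurtosis of 1.36. Implementations differ in details such as binning and kurtosis convention, which is one reason feature values can be difficult to compare across studies.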

While histologic remission can be difficult to achieve in the real world, several studies have found that, among patients already in endoscopic remission, histologic remission may be associated with a lower risk of future flares.78 Using the PICaSSO Histologic Remission Index (PHRI) for UC, Iacucci et al developed a CNN-based system that could distinguish histologic remission from activity with 89% sensitivity and 85% specificity.79 They also found that an AI-assessed PHRI was associated with a UC flare within 1 year, with a hazard ratio (HR) of 4.64, compared with 3.56 for a human-assessed PHRI. Similarly, using computational pathology methods, Klein et al developed a system to analyze baseline histology images from patients with Crohn’s colitis that could predict the development of fibrostenosing and internal penetrating disease behavior within 5 years with AUC 0.74 and 0.78, respectively.73 Likewise, Ohara et al developed a DL-based system that automated quantification of goblet cell mucin to predict the risk of relapse within 12 months in UC patients in endoscopic remission.76 The investigators found that the relapse group had a lower goblet cell mucus area, as calculated by the DL system, than the nonrelapse group. These studies highlight how AI can enhance prognostication using data that were previously difficult to obtain with traditional methods, primarily because of time constraints.
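The studies above report discrimination in two different ways: sensitivity and specificity at a single operating point (Iacucci et al) and area under the ROC curve (Klein et al). As a minimal sketch on toy data, the helpers below compute both. The AUC uses the standard rank (Mann-Whitney) formulation, and the function names, the 0.5 cut-off, and all data values are hypothetical rather than taken from any cited study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (not from any cited study): 1 = outcome occurred, score = model output.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 cut-off
print(sensitivity_specificity(labels, preds), roc_auc(labels, scores))
```

Here sensitivity and specificity are both 2/3 at the 0.5 cut-off, while the AUC is 8/9 (about 0.89): the AUC summarizes how well the score ranks cases across all thresholds, whereas sensitivity and specificity describe one chosen operating point.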

For imaging, radiomic-based models may be the best approach for predicting outcomes in IBD. We identified no studies that used DL algorithms to develop prognostic models in IBD. In one study, a VAT-based radiomic signature independently predicted risk of CD progression (HR, 9.29, P = .005) with good performance in 2 independent test cohorts (AUC 0.82-0.87).80 Conventional fat metrics such as BMI, VAT volume, or VAT:SAT volume ratio were not associated with risk of CD progression (P = .089-.996), highlighting the limitations of human-derived prognostic biomarkers. Studies have also developed multimodal radiomic-based nomograms to predict secondary loss of response to infliximab from pretreatment imaging, with good performance (AUC 0.72-0.88).74,75 Interestingly, one of these studies developed a multivariable nomogram using R2*, an MRI-based radiomic index that detects changes in hepatic iron metabolism, to predict secondary loss of response to infliximab with acceptable performance (AUC 0.72).75 This study is another example of how radiomics can help uncover additional information about the underlying biology of CD. Similarly, 3 studies have developed radiomic-based nomograms using features from the bowel and/or peri-intestinal mesenteric adipose tissue to predict 1-year risk of surgery in CD.57,77,81 As with the other multimodal nomograms mentioned previously, incorporating clinical factors improved performance for predicting surgery, with acceptable to good discrimination (AUC 0.70-0.90). Finally, predicting postoperative recurrence in CD remains an unmet challenge despite significant efforts, and many proposed prognostic markers have only inconsistent support in the literature.
Using imaging obtained preoperatively, Shen et al identified intestinal lesion-only (HR, 2.17, P = .002) and peri-intestinal mesenteric fat-only radiomic signatures (HR, 2.19, P = .0018) that were associated with postoperative recurrence.82 Unfortunately, the multimodal nomogram incorporating these signatures with clinical factors had poor performance (AUC 0.69). This performance may have been limited by defining postoperative recurrence as a composite end point of endoscopic, radiographic, or surgical recurrence, which can be confounded by patient adherence to postoperative disease monitoring. Overall, the available literature provides promising data supporting the use of AI to better predict outcomes in IBD. Because studies in oncology and IBD have shown that radiomics can reflect disease biology, correlating radiomic signatures with histologic or cellular-level data (eg, transcriptomics) could both advance our knowledge of the heterogeneous nature of IBD and yield more accurate prediction models.
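Several rows of the table above (the visceral fat, postoperative recurrence, and 1-year surgery models) report decision curve analysis, which compares models by clinical net benefit rather than by discrimination alone. In the standard formulation, the net benefit of acting on a model at threshold probability p_t is TP/n - (FP/n) * p_t/(1 - p_t), and a model is useful over the range of p_t where it beats both "treat all" and "treat none" (net benefit 0). The sketch below computes these quantities on toy data; the function names and data are hypothetical and are not taken from the cited studies, which may have used dedicated statistical packages.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 0)
    odds = threshold / (1.0 - threshold)  # weight given to each false positive
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat everyone (the treat-none strategy has net benefit 0)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    n = len(y_true)
    prevalence = sum(y_true) / n
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy cohort of 10 patients (not from any cited study); 1 = outcome such as surgery occurred.
outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
risks = [0.8, 0.6, 0.3, 0.4, 0.2, 0.1, 0.1, 0.05, 0.05, 0.05]
for pt in (0.1, 0.25, 0.5):
    print(pt, round(net_benefit(outcomes, risks, pt), 4),
          round(net_benefit_treat_all(outcomes, pt), 4))
```

At p_t = 0.25, the toy model's net benefit (about 0.267) exceeds both treat-all (about 0.067) and treat-none (0); plotting these values over a range of thresholds gives the decision curve.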

Limitations and Future Directions

While studies investigating the role of AI in IBD have made significant strides, there are several important limitations. First, the performance of AI algorithms depends on the availability and quality of data, and the majority of studies rely on retrospective data. This is especially relevant for CNN-based systems for automated endoscopic scoring: because most were retrospective analyses from clinical practice, endoscopic image and video acquisition was not standardized. Using endoscopic data acquired in a standardized way in randomized controlled trials, as in the studies by Gottlieb et al and Stidham et al,47,60 will strengthen future development of AI-based endoscopic scoring systems. Second, there are important inherent biases to recognize. There is likely a significant degree of publication bias in the current literature, as investigators are unlikely to report poorly performing AI algorithms and journals are unlikely to publish these negative studies. Journals should encourage the submission and publication of negative studies so that the role and value of AI for IBD can be fully understood. There is also potential bias in the data sets used to train AI models. This is particularly important because most IBD data sets comprise predominantly Caucasian subjects, so it is unclear whether these AI-based systems are accurate in non-Caucasian subjects. Efforts to study AI in underrepresented populations will be crucial to avoid exacerbating healthcare disparities. Third, studies of AI-based systems for endoscopy, histology, and imaging tend to focus on either UC or CD. For example, endoscopy-based AI studies are primarily conducted in UC subjects, while imaging-based AI studies are primarily conducted in CD subjects. Endoscopic scoring systems for CD are subject to the same limitations as UC scoring systems, so future studies are needed to develop AI-based systems that automate scoring in CD. Additionally, more studies are needed to develop and validate AI-based systems that can differentiate between CD and UC on endoscopy.
Furthermore, studies developing AI-based systems to determine prognosis from endoscopic findings are needed. Finally, endoscopy, histology, and imaging techniques and settings are not standardized across institutions; standardizing these data will be necessary for AI-based systems for IBD to perform reliably across different institutions.

Conclusion

In conclusion, AI applications across endoscopy, histology, and imaging hold transformative potential for IBD. Artificial intelligence is poised to reshape IBD care by addressing unmet clinical needs, improving workflow efficiency, and enhancing patient outcomes through multifaceted approaches. The integration of AI-based clinical tools will play a critical role in advancing precision medicine in IBD. Additionally, AI-powered analytics present opportunities to improve the efficiency of clinical trials by enabling faster and more insightful analyses, ultimately expediting the development of novel therapies for IBD.

While the strides made in AI applications for IBD are exciting, inherent limitations and gaps in knowledge in the available literature underscore the need for cautious optimism. Many AI algorithms require rigorous validation in larger prospective studies to ensure their reliability, reproducibility, and robust performance across diverse patient populations and clinical settings. Thus, as we continue to explore the potential of AI-driven healthcare, translating these innovative tools and technologies into routine clinical practice requires a comprehensive understanding of their limitations, coupled with a commitment to address them through continual research and development. A collaborative approach among clinicians, researchers, and technology developers is imperative to realizing the full potential of AI in IBD.

Supplementary Data

Supplementary data is available at Inflammatory Bowel Diseases online.

Author Contributions

P.G. is the guarantor of the article and was involved in concept and design, drafting of article, and final approval of article.

O.M. was involved in the drafting and final approval of the article.

S.D. was involved in the drafting and final approval of the article.

D.C. was involved in the drafting and final approval of the article.

P.W. was involved in the drafting and final approval of the article.

X.H. was involved in the drafting and final approval of the article.

D.L. was involved in the drafting and final approval of the article.

J.H.M. was involved in the drafting and final approval of the article.

D.P.B.M. was involved in the drafting and final approval of the article.

Funding

There are no sources of funding to disclose related to the work for this article.

Conflicts of interest

D.C. has received speaker’s fees and/or research support from Takeda, Janssen, AbbVie, Eli Lilly, Reckitt, and Lapidot, and consultancy fees from Takeda, AbbVie, and Taro.

D.P.B.M. has received consulting fees from Takeda, Prometheus Biosciences Inc, Prometheus Labs, Palisade Bio, and MERCK.

P.G., O.M., S.D., P.W., X.H., D.L., J.H.M. have no conflicts of interest to disclose.

References

1. Lewis JD, Parlett LE, Jonsson-Funk ML, et al. Incidence, prevalence and racial and ethnic distribution of inflammatory bowel disease in the United States. Gastroenterology. 2023;165(5):1197-1205.

2. Ng SC, Shi HY, Hamidi N, et al. Worldwide incidence and prevalence of inflammatory bowel disease in the 21st century: a systematic review of population-based studies. Lancet. 2017;390(10114):2769-2778.

3. FDA authorizes marketing of first device that uses artificial intelligence to help detect potential signs of colon cancer. 2021. https://www.fda.gov/news-events/press-announcements/fda-authorizes-marketing-first-device-uses-artificial-intelligence-help-detect-potential-signs-colon. Accessed 8 March 2023.

4. Faghani S, Codipilly DC, David V, et al. Development of a deep learning model for the histologic diagnosis of dysplasia in Barrett’s esophagus. Gastrointest Endosc. 2022;96(6):918-925.e3.

5. Kim M, Yun J, Cho Y, et al. Deep learning in medical imaging. Neurospine. 2019;16(4):657-668.

6. Esteva A, Kuprel B, Novoa RA, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115-118.

7. Chen MM, Terzic A, Becker AS, et al. Artificial intelligence in oncologic imaging. Eur J Radiol Open. 2022;9(1):100441.

8. Aslam MF, Bano S, Khalid M, et al. The effectiveness of real-time computer-aided and quality control systems in colorectal adenoma and polyp detection during colonoscopies: a meta-analysis. Ann Med Surg (Lond). 2023;85(2):80-91.

9. Geis JR, Brady AP, Wu CC, et al. Ethics of artificial intelligence in radiology: summary of the Joint European and North American Multisociety Statement. Radiology. 2019;293(2):436-440.

10. van Timmeren JE, Cester D, Tanadini-Lang S, Alkadhi H, Baessler B. Radiomics in medical imaging—“how-to” guide and critical reflection. Insights Imaging. 2020;11(1):91-107.

11. Ganeshan B, Strukowska O, Skogen K, Young R, Chatwin C, Miles K. Heterogeneity of focal breast lesions and surrounding tissue assessed by mammographic texture analysis: preliminary evidence of an association with tumor invasion and estrogen receptor status. Front Oncol. 2011;1(1):33.

12. Limkin EJ, Sun R, Dercle L, et al. Promises and challenges for the implementation of computational medical imaging (radiomics) in oncology. Ann Oncol. 2017;28(6):1191-1206.

13. Rutman AM, Kuo MD. Radiogenomics: creating a link between molecular diagnostics and diagnostic imaging. Eur J Radiol. 2009;70(2):232-241.

14. Segal E, Sirlin CB, Ooi C, et al. Decoding global gene expression programs in liver cancer by noninvasive imaging. Nat Biotechnol. 2007;25(6):675-680.

15. Qureshi TA, Gaddam S, Wachsman AM, et al. Predicting pancreatic ductal adenocarcinoma using artificial intelligence analysis of prediagnostic computed tomography images. Cancer Biomark. 2022;33(2):211-217.

16. Yip SS, Aerts HJ. Applications and limitations of radiomics. Phys Med Biol. 2016;61(13):R150-R166.

17. Mahapatra D, Schueffler P, Tielbeek JA, Buhmann JM, Vos FM. A supervised learning approach for Crohn’s disease detection using higher-order image statistics and a novel shape asymmetry measure. J Digit Imaging. 2013;26(5):920-931.

18. Hahnemann ML, Nensa F, Kinner S, et al. Improved detection of inflammatory bowel disease by additional automated motility analysis in magnetic resonance imaging. Invest Radiol. 2015;50(2):67-72.

19. Mossotto E, Ashton JJ, Coelho T, et al. Classification of paediatric inflammatory bowel disease using machine learning. Sci Rep. 2017;7(1):2427.

20. Naziroglu RE, Puylaert CAJ, Tielbeek JAW, et al. Semi-automatic bowel wall thickness measurements on MR enterography in patients with Crohn’s disease. Br J Radiol. 2017;90(1074):20160654.

21. Gollifer RM, Menys A, Plumb A, et al. Automated versus subjective assessment of spatial and temporal MRI small bowel motility in Crohn’s disease. Clin Radiol. 2019;74(10):814.e9-814.e19.

22. Klang E, Barash Y, Margalit RY, et al. Deep learning algorithms for automated detection of Crohn’s disease ulcers by video capsule endoscopy. Gastrointest Endosc. 2020;91(3):606-613.e2.

23. Klang E, Grinman A, Soffer S, et al. Automated detection of Crohn’s disease intestinal strictures on capsule endoscopy images using deep neural networks. J Crohns Colitis. 2021;15(5):749-756.

24. Li H, Mo Y, Huang C, et al. An MSCT-based radiomics nomogram combined with clinical factors can identify Crohn’s disease and ulcerative colitis. Ann Transl Med. 2021;9(7):572.

25. Zhu C, Yu Y, Wang S, et al. A novel clinical radiomics nomogram to identify Crohn’s disease from intestinal tuberculosis. J Inflamm Res. 2021;14(1):6511-6521.

26. Arkko A, Kaseva T, Salli E, et al. Automatic detection of Crohn’s disease using quantified motility in magnetic resonance enterography: initial experiences. Clin Radiol. 2022;77(2):96-103.

27. Klang E, Kopylov U, Mortensen B, et al. A convolutional neural network deep learning model trained on CD ulcers images accurately identifies NSAID ulcers. Front Med (Lausanne). 2021;8(1):656493.

28. Jiang F, Fu X, Kuang K, Fan D. Artificial intelligence algorithm-based differential diagnosis of Crohn’s disease and ulcerative colitis by CT image. Comput Math Methods Med. 2022;2022(1):3871994.

29. Wang L, Chen L, Wang X, et al. Development of a convolutional neural network-based colonoscopy image assessment model for differentiating Crohn’s disease and ulcerative colitis. Front Med (Lausanne). 2022;9(1):789862.

30. Brodersen JB, Jensen MD, Leenhardt R, et al. Artificial intelligence-assisted analysis of pan-enteric capsule endoscopy in patients with suspected Crohn’s disease. A study on diagnostic performance. J Crohns Colitis. 2023;18(1):75-81.

31. Carter D, Albshesh A, Shimon C, et al. Automatized detection of Crohn’s disease in intestinal ultrasound using convolutional neural network. Inflamm Bowel Dis. 2023;29(12):16.

32. Gong T, Li M, Pu H, et al. Computed tomography enterography-based multiregional radiomics model for differential diagnosis of Crohn’s disease from intestinal tuberculosis. Abdom Radiol. 2023;48(6):1900-1910.

33. Zhou Z, Xiong Z, Cheng R, et al. Volumetric visceral fat machine learning phenotype on CT for differential diagnosis of inflammatory bowel disease. Eur Radiol. 2023;33(3):1862-1872.

34. Cortegoso Valdivia P, Deding U, Bjorsum-Meyer T, et al.; International CApsule endoscopy REsearch (I-CARE) Group. Inter/intra-observer agreement in video-capsule endoscopy: are we getting it all wrong? A systematic review and meta-analysis. Diagnostics (Basel). 2022;12(10):2400.

35. Gu P, Dube S, McGovern DPB. Medical and surgical implications of mesenteric adipose tissue in Crohn’s disease: a review of the literature. Inflamm Bowel Dis. 2023;29(3):458-469.

36. Bhatnagar G, Makanyanga J, Ganeshan B, et al. MRI texture analysis parameters of contrast-enhanced T1-weighted images of Crohn’s disease differ according to the presence or absence of histological markers of hypoxia and angiogenesis. Abdom Radiol (NY). 2016;41(7):1261-1269.

37. Makanyanga J, Ganeshan B, Rodriguez-Justo M, et al. MRI texture analysis (MRTA) of T2-weighted images in Crohn’s disease may provide information on histological and MRI disease activity in patients undergoing ileal resection. Eur Radiol. 2017;27(2):589-597.

38. Lamash Y, Kurugol S, Warfield SK. Semi-automated extraction of Crohns disease MR imaging markers using a 3D residual CNN with distance prior. In: Stoyanov D, Taylor Z, Carneiro G, Syeda-Mahmood T, et al, eds. Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, held in conjunction with MICCAI 2018, Granada, Spain, September 20, 2018, Proceedings. 2018;11045:218-226. https://pubmed.ncbi.nlm.nih.gov/30450491/

39. Puylaert CAJ, Schuffler PJ, Naziroglu RE, et al. Semiautomatic assessment of the terminal ileum and colon in patients with Crohn disease using MRI (the VIGOR++ Project). Acad Radiol. 2018;25(8):1038-1045.

40. Maeda Y, Kudo SE, Mori Y, et al. Fully automated diagnostic system with artificial intelligence using endocytoscopy to identify the presence of histologic inflammation associated with ulcerative colitis (with video). Gastrointest Endosc. 2019;89(2):408-415.

41. Ozawa T, Ishihara S, Fujishiro M, et al. Novel computer-assisted diagnosis system for endoscopic disease activity in patients with ulcerative colitis. Gastrointest Endosc. 2019;89(2):416-421.e1.

42. Stidham RW, Liu W, Bishu S, et al. Performance of a deep learning model vs human reviewers in grading endoscopic disease severity of patients with ulcerative colitis. JAMA Netw Open. 2019;2(5):e193963.

43. Tabari A, Kilcoyne A, Jeck WR, Mino-Kenudson M, Gee MS. Texture analysis of magnetic resonance enterography contrast enhancement can detect fibrosis in Crohn disease strictures. J Pediatr Gastroenterol Nutr. 2019;69(5):533-538.

44. Stidham RW, Enchakalody B, Waljee AK, et al. Assessing small bowel stricturing and morphology in Crohn’s disease using semiautomated image analysis. Inflamm Bowel Dis. 2020;26(5):734-742.

45. Takenaka K, Ohtsuka K, Fujii T, et al. Development and validation of a deep neural network for accurate evaluation of endoscopic images from patients with ulcerative colitis. Gastroenterology. 2020;158(8):2150-2157.

46. Barash Y, Azaria L, Soffer S, et al. Ulcer severity grading in video capsule images of patients with Crohn’s disease: an ordinal neural network solution. Gastrointest Endosc. 2021;93(1):187-192.

47. Gottlieb K, Requa J, Karnes W, et al. Central reading of ulcerative colitis clinical trial videos using neural networks. Gastroenterology. 2021;160(3):710-719.e2.

48. Yao H, Najarian K, Gryak J, et al. Fully automated endoscopic disease activity assessment in ulcerative colitis. Gastrointest Endosc. 2021;93(3):728-736.e1.

49. Li X, Liang D, Meng J, et al. Development and validation of a novel computed-tomography enterography radiomic approach for characterization of intestinal fibrosis in Crohn’s disease. Gastroenterology. 2021;160(7):2303-2316.e11.

50. Ding H, Li J, Jiang K, et al. Assessing the inflammatory severity of the terminal ileum in Crohn disease using radiomics based on MRI. BMC Med Imaging. 2022;22(1):118.

51. Guez I, Focht G, Greer MC, et al. Development of a multimodal machine-learning fusion model to noninvasively assess ileal Crohn’s disease endoscopic activity. Comput Methods Programs Biomed. 2022;227(1):107207.

52. Li T, Liu Y, Guo J, et al. Prediction of the activity of Crohn’s disease based on CT radiomics combined with machine learning models. J Xray Sci Technol. 2022;30(1):1155-1168.

53. Meng J, Luo Z, Chen Z, et al. Intestinal fibrosis classification in patients with Crohn’s disease using CT enterography-based deep learning: comparisons with radiomics and radiologists. Eur Radiol. 2022;32(12):8692-8705.

54. Noguchi T, Ando T, Emoto S, et al. Artificial intelligence program to predict p53 mutations in ulcerative colitis-associated cancer or dysplasia. Inflamm Bowel Dis. 2022;28(7):1072-1080.

55. Yuan G, He Y, Cao QH, et al. Visceral adipose volume is correlated with surgical tissue fibrosis in Crohn’s disease of the small bowel. Gastroenterol Rep. 2022;10(1):goac044.

56. Najdawi F, Sucipto K, Mistry P, et al. Artificial intelligence enables quantitative assessment of ulcerative colitis histology. Mod Pathol. 2023;36(6):100124.

57. Ruiqing L, Jing Y, Shunli L, et al. A novel radiomics model integrating luminal and mesenteric features to predict mucosal activity and surgery risk in Crohn’s disease patients: a multicenter study. Acad Radiol. 2023;30(1):04.

58. Rymarczyk D, Schultz W, Borowa A, et al. Deep learning models capture histological disease activity in Crohn’s disease and ulcerative colitis with high fidelity. J Crohns Colitis. 2023:jjad171.

59. Xie W, Hu J, Liang P, et al. Deep learning based lesions detection and severity grading of small bowel Crohn’s disease ulcers on double-balloon endoscopy images. Gastrointest Endosc. 2023.

60. Stidham RW, Cai L, Cheng S, et al. Using computer vision to improve endoscopic disease quantification in therapeutic clinical trials of ulcerative colitis. Gastroenterology. 2024;166(1):155-167.e2.

61. Pagnini C, Menasci F, Desideri F, et al. Endoscopic scores for inflammatory bowel disease in the era of “mucosal healing”: old problem, new perspectives. Dig Liver Dis. 2016;48(7):703-708.

62. Gui X, Bazarova A, Del Amor R, et al. PICaSSO histologic remission index (PHRI) in ulcerative colitis: development of a novel simplified histological score for monitoring mucosal healing and predicting clinical outcomes and its applicability in an artificial intelligence system. Gut. 2022;71(5):889-898.

63. Marchal-Bressenot A, Salleron J, Boulagnon-Rombi C, et al. Development and validation of the Nancy histological index for UC. Gut. 2017;66(1):43-49.

64. Rimola J, Rodriguez S, Garcia-Bosch O, et al. Magnetic resonance for assessment of disease activity and severity in ileocolonic Crohn’s disease. Gut. 2009;58(8):1113-1120.

65. Steward MJ, Punwani S, Proctor I, et al. Non-perforating small bowel Crohn’s disease assessed by MRI enterography: derivation and histopathological validation of an MR-based activity index. Eur J Radiol. 2012;81(9):2080-2088.

66. Thierry ML, Rousseau H, Pouillon L, et al. Accuracy of diffusion-weighted magnetic resonance imaging in detecting mucosal healing and treatment response, and in predicting surgery, in Crohn’s disease. J Crohns Colitis. 2018;12(10):1180-1190.

67. Buisson A, Pereira B, Goutte M, et al. Magnetic resonance index of activity (MaRIA) and Clermont score are highly and equally effective MRI indices in detecting mucosal healing in Crohn’s disease. Dig Liver Dis. 2017;49(11):1211-1217.

68. Pariente B, Cosnes J, Danese S, et al. Development of the Crohn’s disease digestive damage score, the Lemann score. Inflamm Bowel Dis. 2011;17(6):1415-1422.

69. Rozendorn N, Amitai MM, Eliakim RA, Kopylov U, Klang E. A review of magnetic resonance enterography-based indices for quantification of Crohn’s disease inflammation. Therap Adv Gastroenterol. 2018;11(1):1756284818765956.

70. Tielbeek JA, Makanyanga JC, Bipat S, et al. Grading Crohn disease activity with MRI: interobserver variability of MRI features, MRI scoring of severity, and correlation with Crohn disease endoscopic index of severity. AJR Am J Roentgenol. 2013;201(6):1220-1228.

71. Rieder F. Toward an antifibrotic therapy for inflammatory bowel disease. United European Gastroenterol J. 2016;4(4):493-495.

72. Lin SN, Mao R, Qian C, et al.; Stenosis Therapy and Antifibrotic Research (STAR) Consortium. Development of antifibrotic therapy for stricturing Crohn’s disease: lessons from randomized trials in other fibrotic diseases. Physiol Rev. 2022;102(2):605-652.

73. Klein A, Mazor Y, Karban A, et al. Early histological findings may predict the clinical phenotype in Crohn’s colitis. United European Gastroenterol J. 2017;5(5):694-701.

74. Chen Y, Li H, Feng J, et al. A novel radiomics nomogram for the prediction of secondary loss of response to infliximab in Crohn’s disease. J Inflamm Res. 2021;14(1):2731-2740.

75. Feng J, Feng Q, Chen Y, et al. MRI-based radiomic signature identifying secondary loss of response to infliximab in Crohn’s disease. Front Nutr. 2021;8(1):773040.

76. Ohara J, Nemoto T, Maeda Y, et al. Deep learning-based automated quantification of goblet cell mucus using histological images as a predictor of clinical relapse of ulcerative colitis with endoscopic remission. J Gastroenterol. 2022;57(12):962-970.

77. Chirra P, Sharma A, Bera K, et al. Integrating radiomics with clinicoradiological scoring can predict high-risk patients who need surgery in Crohn’s disease: a pilot study. Inflamm Bowel Dis. 2023;29(3):349-358.

78. Bryant RV, Winer S, Travis SP, Riddell RH. Systematic review: histological remission in inflammatory bowel disease. Is “complete” remission the new treatment paradigm? An IOIBD initiative. J Crohns Colitis. 2014;8(12):1582-1597.

79. Iacucci M, Parigi TL, Del Amor R, et al. Artificial intelligence enabled histological prediction of remission or activity and clinical outcomes in ulcerative colitis. Gastroenterology. 2023;164(7):1180-1188.e2.

80. Li X, Zhang N, Hu C, et al. CT-based radiomics signature of visceral adipose tissue for prediction of disease progression in patients with Crohn’s disease: a multicentre cohort study. EClinicalMedicine. 2023;56(1):101805.

81. Yao J, Zhou J, Zhong Y, et al. Computed tomography-based radiomics nomogram using machine learning for predicting 1-year surgical risk after diagnosis of Crohn’s disease. Med Phys. 2023;50(6):3862-3872.

82. Shen XD, Zhang RN, Huang SY, et al. Preoperative computed tomography enterography-based radiomics signature: a potential predictor of postoperative anastomotic recurrence in patients with Crohn’s disease. Eur J Radiol. 2023;162(1):110766.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/pages/standard-publication-reuse-rights)