Phillip Gu, Oreen Mendonca, Dan Carter, Shishir Dube, Paul Wang, Xiuzhen Huang, Debiao Li, Jason H Moore, Dermot P B McGovern, AI-luminating Artificial Intelligence in Inflammatory Bowel Diseases: A Narrative Review on the Role of AI in Endoscopy, Histology, and Imaging for IBD, Inflammatory Bowel Diseases, Volume 30, Issue 12, December 2024, Pages 2467–2485, https://doi.org/10.1093/ibd/izae030
Abstract
Endoscopy, histology, and cross-sectional imaging serve as fundamental pillars in the detection, monitoring, and prognostication of inflammatory bowel disease (IBD). However, interpretation of these studies often relies on subjective human judgment, which can lead to delays, intra- and interobserver variability, and potential diagnostic discrepancies. With the rising incidence of IBD globally coupled with the exponential digitization of these data, there is a growing demand for innovative approaches to streamline diagnosis and elevate clinical decision-making. In this context, artificial intelligence (AI) technologies emerge as a timely solution to address the evolving challenges in IBD. Early studies using deep learning and radiomics approaches for endoscopy, histology, and imaging in IBD have demonstrated promising results for using AI to detect, diagnose, characterize, phenotype, and prognosticate IBD. Nonetheless, the available literature has inherent limitations and knowledge gaps that need to be addressed before AI can transition into a mainstream clinical tool for IBD. To better understand the potential value of integrating AI in IBD, we review the available literature to summarize our current understanding and identify gaps in knowledge to inform future investigations.
Introduction
Inflammatory bowel diseases (IBD), which consist of Crohn’s disease (CD) and ulcerative colitis (UC), are chronic immune-mediated inflammatory diseases (IMID) of the gastrointestinal (GI) tract that affect over 2 million Americans and are associated with significant morbidity.1 Accurate and timely diagnosis, personalized treatment strategies, and tight monitoring are critical for mitigating disease progression and complications. To accomplish these objectives, gastroenterologists frequently rely on endoscopy, histology, and cross-sectional imaging, such as magnetic resonance enterography (MRE), computed tomography enterography (CTE), and intestinal ultrasound (IUS), to provide insights into anatomical and behavioral disease features to inform clinical decisions. However, interpretation of these studies often relies on subjective human judgment, which can lead to delays, interobserver variability, and/or potential diagnostic discrepancies. Moreover, the rising incidence of IBD globally,2 coupled with the exponential digitalization of these data, has intensified the demand for innovative approaches to streamline diagnosis and enhance clinical decision-making, a demand well suited to artificial intelligence (AI). Finally, novel imaging techniques in IBD such as IUS have gained interest in recent years, and AI-based operator support systems can facilitate their use by less experienced operators.
In the last decade, AI technologies have disrupted the field of gastroenterology (GI) and have been extensively applied to medical imaging in several medical disciplines that face challenges similar to those in IBD. Artificial intelligence algorithms, particularly those rooted in deep learning and pattern recognition, can process and analyze large volumes of data with remarkable accuracy and efficiency. These AI applications can perform a variety of tasks such as image classification, lesion segmentation, and detection and can even uncover novel biomarkers with biologic and prognostic significance. In GI, perhaps the most disruptive AI-based technology is the FDA-approved computer-aided diagnosis system for detecting colon polyps during colonoscopy.3 In Barrett’s esophagus, deep learning models have been developed to predict dysplasia grade on histologic images.4 Oncology has arguably witnessed the greatest advancements in AI for medical imaging. Several AI-based imaging applications have been developed for the early detection of malignant lesions, predicting tumor biology to inform precision medicine strategies, informing prognosis, and predicting future development of certain cancers. These applications will potentially not only improve patient outcomes but also have significant downstream effects on disease prevention, healthcare expenditures, and potentially drug discovery.
While our understanding of the role of AI for endoscopy, histology, and imaging in IBD is in its infancy, the available literature contains promising findings that support the potential of AI to improve patient care and also advance our understanding of the heterogeneous nature of IBD. Additionally, providers will need a working knowledge of the strengths and limitations of AI technologies as they become increasingly integrated into routine clinical practice. As such, investing in investigations on the role of AI in endoscopy, histology, and imaging has significant importance for advancing the field of IBD. Thus, the purpose of this review is to provide a comprehensive summary of recent advances in the application of AI technologies in endoscopy, histology, and imaging for the diagnosis, phenotyping, and prognosis of IBD.
Artificial Intelligence in Medical Images
Artificial intelligence is an umbrella term that encompasses any machine, system, or software that can perform tasks that typically require human intelligence. Artificial intelligence is composed of several subfields, of which natural language processing (NLP) and machine learning (ML) have received the most attention. Natural language processing algorithms enable computers to understand, interpret, and generate human language. Machine learning involves developing algorithms and statistical models that enable computers to learn from large amounts of data and make predictions or decisions. The central motivation of ML is to allow the learned function to be applied to new data. For medical images from endoscopy, histology, and cross-sectional imaging, deep learning (DL), a subset of machine learning, and radiomics are the most commonly used methods and will be the main focus of this review. The limitations unique to each AI subtype are also reviewed.
Deep Learning
Deep learning is inspired by the neural networks of the human visual cortex, and this artificial neural network (Figure 1) consists of interconnected nodes (neurons) that are organized into different layers, which include an input layer, hidden layer, and output layer. In between these layers are connections with associated weights that represent the strength of the connection between each neuron and are adjusted during training using various optimization algorithms to uncover patterns to generate a conclusion or complete the intended task.5 There are several different types of neural networks including fully connected neural network, recurrent neural network, deep generative networks, and convolutional neural network (CNN).

Among the different DL algorithms, CNN is the most commonly used for medical images. A convolutional neural network is an artificial neural network that uses images as input and can perform automated tasks such as image classification, object detection, segmentation, and image generation.5 The unique feature and core building block of a CNN is the convolutional layer. In this layer, a set of small filters (ie, kernels) is applied to the input image. Through a computational method called convolution, these filters highlight specific features of the image, such as edges and textures. Afterwards, an activation function is applied so that the CNN algorithm can learn complex relationships and generate a feature map. Following the activation layer, a pooling layer is often added to retain the most important imaging features while discarding the rest. Then, in the final 2 layers (fully connected layers and output layer), the learned imaging features are used to make classifications or predictions from the input image.
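The sequence of convolution, activation, and pooling described above can be illustrated with a minimal sketch. It is written in plain Python for readability and is not taken from any of the reviewed studies; the toy image, the edge-detecting kernel, and the function names are assumptions chosen for illustration, and a real CNN would learn its kernel weights during training rather than use a fixed one.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the
    element-wise products at each position, as a convolutional layer does."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Activation function: keep positive responses, set the rest to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the strongest response in each non-overlapping size x size window."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            row.append(max(feature_map[r + i][c + j]
                           for i in range(size) for j in range(size)))
        pooled.append(row)
    return pooled


# Hypothetical 6x6 image: dark left half (0) and bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hypothetical hand-written kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = relu(convolve2d(image, kernel))
pooled = max_pool(feature_map)
print(feature_map[0])  # response is non-zero only where the edge sits
print(pooled)
```

The pooled map is much smaller than the input but still records where the edge was found, which is the information the later fully connected layers would use.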
In the literature and media, CNN has received a lot of attention. A team of investigators from Stanford University developed a CNN algorithm that can classify skin cancer as accurately as dermatologists.6 Afterwards, the investigators developed a mobile phone app with their algorithm to improve healthcare access. Other CNN algorithms for oncologic imaging have been developed to inform follow-up lung cancer screening recommendations, detect brain cancer metastases for radiation planning, inform volumetric reconstruction of renal tumors for surgical planning, and monitor tumor response to therapy, to name a few.7 In GI, the FDA has approved a CNN-based computer-aided diagnostic system for detecting colon polyps on colonoscopy, which has been shown to improve polyp detection.8 These studies highlight how CNN-based algorithms can transform care, increase accessibility, potentially reduce costs, and consequently address healthcare disparities. CNNs, however, have some limitations that restrict their widespread clinical adoption. Firstly, CNNs are heavily dependent on the quality and amount of data in the training data set, which risks overfitting and biased results. Secondly, CNNs are often considered “black-box” models, meaning the logical basis of the internal workings that enable the model to complete the intended task is often difficult to explain or unknown. In high-stakes situations, the ambiguous interpretability of CNN models can introduce distrust from patients and providers and discourage their use for clinical decision-making. Finally, there are important ethical and legal considerations regarding patient privacy, data security, sex/gender biases, and adequate human oversight when CNNs are used as a clinical decision-aid tool.9 Bias is a particularly important limitation when applying AI to medical imaging in IBD because most data are generated from European ancestry populations, resulting in inherent biases in using AI approaches in IBD.
Radiomics
Radiomics is a subfield of AI that uses advanced computational methods to analyze cross-sectional imaging and quantify a wide range of imaging features that are not easily appreciated by the human eye.10 Radiomics can quantify visual differences in image intensity, shape, texture, and spatial relationships and has been used in MRI, CT, PET, and US (Figure 2). Unlike CNN, radiomics does not imply automation of the diagnostic process but rather uses AI to generate additional data points. Radiomics is often combined with other clinical or “-omic” data to develop comprehensive and more accurate prediction models. One example of an IBD radiomic approach is MRI textural analysis (MRTA).11 This approach is a postprocessing procedure that commonly uses the filtration-histogram technique, which extracts features of different sizes in the region of interest (ROI) to construct a representative histogram distribution of gray-scale levels and/or pixel intensity on MRI, allowing quantification of different parameters that reflect the underlying tissue biology. These parameters include mean (average value of the pixels within the ROI), standard deviation (SD, degree of variation/dispersion from the average), skewness (symmetry of the histogram distribution), mean positive pixels (average of the pixel values that are bright), kurtosis (a measure of the peakedness of the histogram relative to a Gaussian distribution), and entropy (degree of irregularity of ROI). The use of MRTA in IBD will be discussed later in this review.
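As a rough illustration of the histogram parameters listed above, the following sketch computes them for a small list of pixel values. It covers only the histogram step, not the filtration step, and it rests on several assumptions that are not specified in this review: the ROI values are treated as already filtered (so they can be negative), "positive" is taken to mean greater than zero, kurtosis is reported as excess kurtosis relative to a Gaussian distribution, and entropy is computed over the raw discrete values without binning. The function name and the toy ROI are hypothetical.

```python
import math
from collections import Counter


def first_order_features(pixels):
    """First-order histogram statistics of the pixel values inside an ROI."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n  # population variance
    sd = math.sqrt(variance)

    # Standardized third and fourth moments; defined as 0 for a flat ROI.
    skewness = sum((p - mean) ** 3 for p in pixels) / n / sd ** 3 if sd else 0.0
    kurtosis = sum((p - mean) ** 4 for p in pixels) / n / sd ** 4 - 3 if sd else 0.0

    # Mean of positive pixels: average over values greater than zero only.
    positives = [p for p in pixels if p > 0]
    mpp = sum(positives) / len(positives) if positives else 0.0

    # Shannon entropy (in bits) of the distribution of pixel values.
    counts = Counter(pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {
        "mean": mean,
        "sd": sd,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "mpp": mpp,
        "entropy": entropy,
    }


# Hypothetical filtered ROI with four pixels.
roi = [-1, 0, 0, 1]
features = first_order_features(roi)
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```

In practice these values would be computed per filter scale and per ROI and then passed to a statistical model alongside clinical variables.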

Example of radiomics workflow from image acquisition to decision support tool development.
While probably lesser known to the public than DL-based imaging applications, radiomics shows great promise in medicine. In oncology, radiomic studies have identified signatures that can predict outcomes, risk of distant metastasis, and tumor biology.12 Additionally, radiomic studies have developed novel methods to predict tumor gene expression on a genome-wide scale without the use of tumor biopsies, giving rise to the field of “radiogenomics.”13 For example, in hepatocellular carcinoma, Segal et al found combinations of 28 radiomic features of a hepatocellular carcinoma could account for 78% of the tumor’s transcriptome variation, and the involved genetic variations shared common physiologic functions such as cell proliferation or liver enzyme synthesis.14 Studies have also found radiomic features can predict the future development of cancer from prediagnostic imaging. Utilizing prediagnostic abdominal CTs, Touseef et al’s radiomic analyses could predict future development of pancreatic ductal adenocarcinoma with 86% accuracy.15 Such AI applications would be invaluable for facilitating early intervention for aggressive cancers like pancreatic cancer.
While promising, radiomics also has limitations that need to be addressed before it is integrated into clinical practice.16 Firstly, accurate delineation of the organ(s) of interest is critical for computation of radiomic features. This is not only time-intensive but can also be affected by interobserver variability. A possible solution is using semiautomatic or automatic methods for image annotation. Secondly, differences in imaging equipment and protocols can affect radiomic feature quantification and generalization of findings across institutions. Harmonization and standardization of imaging acquisition and feature computation can mitigate this variability, but certain technical factors such as differences in imaging systems and image intensity scales can still affect radiomic feature computation. Finally, because radiomic analysis requires large amounts of data to draw robust conclusions, few radiomic studies have been validated in independent data sets.
Methods
Searches of Medline were performed from 1946 to July 2, 2023, to identify any studies that described AI approaches with endoscopy, histology, and imaging in IBD. Inclusion criteria required that studies were published in a peer-reviewed journal and included unselected adult or pediatric subjects with a possible diagnosis of UC, CD, or IBD-unclassified with available endoscopic, histologic, or cross-sectional images evaluated by or incorporated into an AI algorithm (deep learning, convolutional neural network, automated segmentation, machine-learning algorithm, radiomics). Scientific conference abstracts were excluded. There were no restrictions on number of subjects, type of AI approach, or study aims (diagnostics vs phenotyping vs prognosis). Search terms are described in the Supplemental Content 1 online. All articles were screened for relevance to the study question, and potentially relevant articles were reviewed in more detail; P.G. performed assessment of article eligibility.
The following terms were used to identify potentially eligible IBD articles: inflammatory bowel diseases, ulcerative colitis, and Crohn disease. These terms were combined with the Boolean operator “OR.” The following terms were used to identify potentially eligible endoscopy articles: endoscopy, colonoscopy, and endoscopic scoring. The following terms were used to identify potentially eligible histopathology articles: histology, histopathology, and immunohistochemistry. The following terms were used to identify potentially eligible imaging articles: diagnostic imaging, magnetic resonance imaging, X-ray computed tomography, ultrasound, and radiology. These terms were combined with the Boolean operator “OR.” The following terms were used to identify potentially eligible articles about artificial intelligence: artificial intelligence, machine learning, radiomic, convolutional neural network, and automat*. These terms were combined with the Boolean operator “OR.” The 3 searches were combined using the Boolean operator “AND” and limited to humans.
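The way these term lists combine can be sketched as a small query builder. This is only an illustration of the Boolean logic, not the search that was actually run: it assumes the endoscopy, histopathology, and imaging terms were pooled into one set (giving the 3 sets that are joined with "AND"), it uses generic quoting rather than the syntax of any particular Medline interface, and the helper name is hypothetical.

```python
def or_group(terms):
    """Join terms with OR inside parentheses, quoting multiword phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


ibd_terms = ["inflammatory bowel diseases", "ulcerative colitis", "Crohn disease"]

# Assumption: the three modality term lists form a single OR-combined set.
modality_terms = [
    "endoscopy", "colonoscopy", "endoscopic scoring",
    "histology", "histopathology", "immunohistochemistry",
    "diagnostic imaging", "magnetic resonance imaging",
    "X-ray computed tomography", "ultrasound", "radiology",
]

ai_terms = [
    "artificial intelligence", "machine learning", "radiomic",
    "convolutional neural network", "automat*",
]

# The three OR-combined sets are intersected with AND.
query = " AND ".join(or_group(t) for t in (ibd_terms, modality_terms, ai_terms))
print(query)
```

A study is retrieved only if it matches at least one term from every set, which is why broadening any single list widens the search while adding a new set narrows it.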
Artificial Intelligence in Endoscopy, Histology, and Cross-sectional Imaging in IBD
Overview
The search strategy yielded 52 unique studies (Figure 3), the majority of which included cross-sectional imaging (n = 31), followed by endoscopy (n = 12) and histology (n = 7). Imaging studies primarily studied CD, while endoscopy studies primarily evaluated UC. The studies will be reviewed based on the intended task of the AI algorithm: detection/diagnosis (n = 16), characterization/phenotyping of IBD lesions (n = 25), and prognosis (n = 10).

Diagnosis/Detection
During the initial diagnostic workup in individuals suspected to have IBD, the index endoscopy, histology, and imaging are arguably more important than any later exams in the disease course. The index data points help establish the diagnosis (CD vs UC), anatomical distribution, and disease severity, which are all important to direct therapeutic and monitoring strategies. In this context, AI has the potential to streamline clinical workflow through automated algorithms, improve detection accuracy, and reduce intra- and interobserver variability. Table 1 summarizes the relevant studies.
Sixteen studies evaluating the role of AI in endoscopy, histology, and cross-sectional imaging for the detection and diagnosis of IBD.
Author, year [reference] | Dataset | Data source | Algorithm type | Task | Performance |
---|---|---|---|---|---|
Mahapatra 2013 [17] | CD N = 26 | MRE | Radiomics | | |
Hahnemann 2015 [18] | IBD N = 50 | MRE | DL, CNN | | |
Mossotto 2017 [19] | Pediatric IBD N = 287 | Endoscopic and histologic images | Supervised and unsupervised ML methods | | |
Naziroglu 2017 [20] | CD N = 53 | MRE | DL, Active contouring model | | |
Gollifer 2019 [21] | CD N = 105 | MRE | DL, CNN | | |
Klang 2020 [22] | CD N = 49 | VCE images | DL, CNN | | |
Klang 2020 [23] | CD N = 27 892 images | VCE images | DL, CNN | | |
Li 2021 [24] | IBD N = 165 lesions (UC = 66, CD = 99) | Multislice CT | Radiomics | | |
Zhu 2021 [25] | CD (n = 93) and intestinal TB (n = 67) | CTE | Radiomics | | |
Arkko 2021 [26] | CD and non-CD (n = 369; 50% CD) | MRE | DL, CNN | | |
Klang 2021 [27] | CD | VCE N = 19 245 images | DL, CNN | | |
Jiang 2022 [28] | IBD N = 120 | CTE | DL, GIF (gradient image filter) algorithm | | |
Wang 2022 [29] | IBD N = 496 (217 CD) | Endoscopic images | DL, CNN | | |
Brodersen 2023 [30] | IBD N = 132 | VCE | DL | | |
Carter 2023 [31] | IBD N = 308 | Intestinal US | DL, CNN | | |
Gong 2023 [32] | CD and iTB N = 108 | CTE | Radiomics | | |
Zhou 2023 [33] | IBD N = 316 | CTE | DL, CNN, Radiomics | | |
MRTA features: mean (average value of pixels within ROI), standard deviation, mean of positive pixels, entropy, kurtosis (inversely related to number of objects highlighted and increased by intensity variations in highlighted objects), skewness (reflects brightness of highlighted object).
Abbreviations: IBD, inflammatory bowel disease; CD, Crohn’s disease; UC, ulcerative colitis; iTB, intestinal TB; Rc, regression coefficient; AUC, area under the curve; ROI, region of interest; MaRIA, magnetic resonance index of activity; CDEIS, Crohn’s Disease Endoscopic Index of Severity; SES-CD, Simple Endoscopic Score for Crohn’s disease; HR, hazard ratio.
With regard to endoscopy in the diagnosis of IBD, 2 of the biggest challenges are differentiating between UC and CD, especially if there is no ileal involvement, and the significant time required to review video capsule endoscopy (VCE) studies to evaluate for small bowel CD. Additionally, VCE interpretation suffers from substantial heterogeneity and suboptimal agreement in both inter- and intraobserver evaluation.34 To address this, Klang et al developed DL algorithms for detecting small bowel ulcers and strictures in CD that yielded areas under the receiver operating characteristic curve (AUC) of 0.99 and 0.99, respectively.22,23 However, in a later study, the same investigators found the CNN model for detecting CD ulcers could not differentiate CD ulcers from nonsteroidal anti-inflammatory drug (NSAID) ulcers.27 Nonetheless, AI-aided VCE interpretation has been shown to substantially reduce review time, with one study observing a median review time of 3.2 minutes per patient.30 While not ready for primetime, AI-assisted VCE evaluation is an exciting and much-needed clinical tool in IBD. To date, only 1 study has developed an AI model for differentiating between CD and UC.29 The model developed by Wang et al yielded higher differential diagnosis accuracy than human observers for CD (92.4% vs 91.7%), UC (93.4% vs 92.4%), and normal (98.4% vs 97.3%).
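Because AUC is the performance measure quoted most often in these studies, a brief sketch of what it measures may help. The AUC equals the probability that a randomly chosen positive case (for example, an image containing an ulcer) receives a higher model score than a randomly chosen negative case, with ties counted as half. The scores below are invented for illustration and do not come from any of the cited studies; the function name is an assumption.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve computed by pairwise comparison:
    the fraction of positive/negative pairs ranked correctly."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical model outputs (probability that a frame contains an ulcer).
ulcer_frames = [0.9, 0.8, 0.4]
normal_frames = [0.7, 0.3, 0.2]

print(round(auc(ulcer_frames, normal_frames), 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so the values of 0.99 reported for ulcer and stricture detection indicate that almost every positive image was scored above almost every negative one.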
Regarding the role of AI in the histologic diagnosis of IBD, there was only 1 study. Mossotto et al explored several different machine-learning models using endoscopic and histologic data to classify disease type (UC vs CD) in pediatric patients with IBD.19 An ML model using combined endoscopic and histologic data had superior performance compared with an ML model using endoscopic or histologic data alone (AUC 0.83 vs 0.71 and 0.77, respectively). This study also highlights the importance of using endoscopic and histologic data together in the diagnosis of IBD types.
In terms of cross-sectional imaging, 7 studies developed CNN-based algorithms to automate the detection and diagnosis of IBD through enhanced imaging interpretation as well as through development of a novel biomarker by quantifying small bowel motility. Using MRE, Naziroglu et al developed an active contouring algorithm to perform volumetric segmentation of the inner and outer layers of the bowel wall to semiautomatically measure bowel wall thickness (BWT).20 The algorithm-generated measurements yielded better interobserver agreement than human-generated measurements of BWT (intraclass correlation coefficient [ICC] 0.88 vs 0.45, P = .005). This study highlights the strength of AI to detect inflammatory lesions in IBD more consistently than human observers. Intestinal ultrasound is increasingly being used to detect and monitor IBD globally, and Carter et al developed an AI algorithm to automatically detect IBD on IUS.31 Increased BWT on IUS is one of the main features that reflect active inflammation and, while BWT has high ICC among expert performers, novice performers often struggle with accurately and consistently detecting inflammatory lesions. To address this challenge, the investigators developed a CNN algorithm to automatically detect bowel wall thickening using over 1000 labeled images. The final CNN algorithm accurately detected thickened bowel wall with an AUC of 0.98, 90.1% accuracy, 86.4% sensitivity, and 94.0% specificity. This study highlights how AI can be used to train inexperienced operators and improve the standardization of imaging interpretation. Moreover, this is the only available study on AI applications with IUS in IBD. Because it is a radiation-sparing, point-of-care exam, IUS is a promising medium to explore other AI approaches for IBD.
In addition to detecting inflamed segments of bowel, DL algorithms can be used to improve image processing and allow for safer imaging protocols with less radiation exposure. Using a gradient image filter (GIF) algorithm on low-dose CTE, Jiang et al found the diagnostic sensitivity (91.5%), specificity (92.3%), accuracy (91.7%), positive predictive value (97.7%), and negative predictive value (75.0%) of the GIF algorithm group were higher than those of the traditional CTE protocol control group for differentiating CD from UC (69.1%, 44.4%, 61.7%, 74.4%, and 38.1%, respectively; P < .05).28
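The five measures reported above all derive from the same 2 x 2 table of test results against the reference diagnosis, as the sketch below shows. The counts are hypothetical: they were chosen so that the resulting ratios match the percentages quoted for the GIF group, and they are not taken from the study, which may have arrived at those figures from different counts.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic accuracy measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),            # true positives among diseased
        "specificity": tn / (tn + fp),            # true negatives among non-diseased
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),                    # positive predictive value
        "npv": tn / (tn + fn),                    # negative predictive value
    }


# Hypothetical counts for a group of 60 subjects (illustration only).
metrics = diagnostic_metrics(tp=43, fp=1, tn=12, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.1f}%")
```

The example also shows why a high positive predictive value can coexist with a modest negative predictive value: with few true negatives in the sample, even 4 missed cases lower the negative predictive value to 75%.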
Another opportunity for AI to detect IBD is quantifying intestinal motility, which is often impaired secondary to active inflammation and/or fibrosis. Leveraging cine images captured during MRE, 3 studies developed deep learning algorithms to quantify intestinal motility for detecting IBD. Hahnemann et al found that using automatically generated intestinal motility maps with static MRIs had a higher detection rate of inflammatory lesions (66 lesions in 38 subjects) compared with static MRI alone (51 lesions in 34 subjects, P = .0002).18 In a larger study with 302 subjects, investigators used CNN to develop an automated intestinal motility quantification algorithm that was able to differentiate between CD and non-CD with an AUC of 0.78.26 Deep learning algorithms to detect and quantify changes in motility also offer greater granularity than the human eye can detect. Gollifer et al demonstrated that software-automated quantification of intestinal motility parameters, particularly temporal motility variation (β = −0.23, P = .005) and area of motile bowel (β = 0.16, P = .01), was significantly associated with symptom severity defined by the Harvey-Bradshaw Index (HBI).21 Conversely, subjective quantification by humans of the same intestinal motility parameters used in the algorithm was not associated with HBI. These studies support intestinal motility as a promising novel objective biomarker to detect IBD and highlight the ability of AI to aid in the discovery of a new biomarker. However, the clinical value of intestinal motility as a biomarker in IBD needs further evaluation, with future studies correlating intestinal motility metrics with endoscopic disease severity and clarifying its role in disease monitoring.
Regarding radiomics, 6 studies developed unique multivariate models and nomograms to better detect and diagnose IBD. In one of the earliest IBD radiomics studies, a novel method to quantify shape asymmetry was able to detect CD-affected segments of bowel with high sensitivity (90.4%) and specificity (90.1%).17 Radiomics may also help differentiate CD from intestinal TB (iTB), which is a frequent challenge in endemic countries. Two studies developed and validated multimodal radiomic nomograms incorporating clinical and/or endoscopic data to differentiate CD from iTB. By extracting radiomic features from intestinal lesions, Zhu et al developed a radiomics model with an AUC of 0.78 for differentiating CD from iTB.25 However, when combined with a clinical model that included demographic, biochemical, and predefined radiographic features (ie, Comb’s sign), the prediction model improved to an AUC of 0.90. The final nomogram contained 9 radiomic and 2 clinical features and yielded good performance (AUC 0.96). Likewise, Gong et al developed a multimodal clinical radiomic model using radiomic features extracted from the diseased segment of bowel as well as the largest lymph node and mesentery surrounding the affected segment of bowel. In addition to clinical variables, the investigators also incorporated endoscopic data into the final nomogram and found the clinical radiomic nomogram had greater accuracy for differentiating CD from iTB than interpretation by human radiologists (89.5% vs 75.22%).32 Finally, radiomics may also help differentiate CD from UC. A multimodal model that incorporated radiomic features of the inflamed bowel wall, clinical features (age and gender), and radiology features (bowel wall thickness, arterial-phase enhancement, increased attenuation of mesenteric fat, vasa recta engorgement, lymphadenopathy, and lesion location) differentiated CD from UC with an AUC of 0.88.24 Leveraging the differences in inflammatory alterations of the mesenteric fat in CD vs UC, another model combining radiomic features of visceral adipose tissue (VAT) with clinical factors helped differentiate CD from UC with good diagnostic performance (AUC 0.78).33 Because inflammatory alterations of mesenteric fat are difficult to study noninvasively, imaging studies often use VAT, which serves as a surrogate marker because mesenteric fat is the largest compartment of VAT.35 While the performance of these radiomic-based models varies from moderate to good, the available studies suggest that radiomic features can provide valuable information not easily appreciated by the human eye. However, the studies consistently demonstrated that radiomic features alone are not enough to develop prediction models and need to be combined with clinical variables to improve model performance.
Disease Characterization/Phenotyping
In the treat-to-target era of IBD, endoscopy, histology, and imaging are critical for the tight monitoring of IBD to prevent disease progression and complications, and studies have demonstrated AI-based applications have tremendous potential in this arena (Table 2). During endoscopy, endoscopic disease scores such as the Mayo endoscopic score (MES), UC Endoscopic Index of Severity, and Simple Endoscopic Score for CD are important for monitoring improvement/progression of disease as well as standardizing communication with other providers. However, studies have found significant intra- and interobserver variability with endoscopy scores.61 Also, endoscopy scores may not fully capture disease severity. For example, MES only accounts for the colonic segment with the most severe disease in UC and does not account for variability in disease severity in other colon segments. This limitation has important implications when assessing therapeutic response. To address this, several studies have now developed deep learning algorithms for automating endoscopy scores, which can improve standardization of scoring, but these studies are primarily for MES in UC.41,42,45,47,48 One of the most innovative studies was by Stidham et al, in which investigators developed a new Cumulative Disease Score (CDS) for UC using computer vision analyses on endoscopic videos from the UNIFI and JAK-UC clinical trials.60 The CDS correlated strongly with MES (P < .0001) and was more sensitive for detecting endoscopic changes compared with MES (Hedges’ g = 0.743 vs 0.460). Automated scoring systems such as CDS may not only improve workflow efficiency during endoscopy but also better evaluate therapeutic response in UC.
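Hedges’ g, used above to compare the sensitivity of CDS and MES to endoscopic change, is a standardized mean difference with a small-sample bias correction. The sketch below assumes the standard two-group definition (pooled standard deviation and the common approximate correction factor). The cited study's exact computation is not described here, and the function name and input values are hypothetical.

```python
import math
from statistics import mean, variance


def hedges_g(group_a, group_b):
    """Standardized mean difference between two groups with small-sample correction."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance from the two sample variances (n - 1 denominators).
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    cohens_d = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
    correction = 1 - 3 / (4 * (n_a + n_b) - 9)  # approximate bias correction
    return cohens_d * correction


# Hypothetical changes in an endoscopic score (not trial data).
treated = [2, 4, 6, 8]
placebo = [1, 3, 5, 7]
print(round(hedges_g(treated, placebo), 3))
```

A larger g for one scoring system than for another, computed on the same patients, suggests that the first system separates the two groups more clearly relative to its own variability.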
Table 2. Twenty-five studies evaluating the role of AI in endoscopy, histology, and cross-sectional imaging for the characterization of IBD and phenotyping CD strictures.
Author Year | Dataset | Data source | Algorithm type | Task | Performance |
---|---|---|---|---|---|
Bhatnagar 201636 | CD N = 7 | MRE | Radiomics | | |
Makanyanga 201737 | CD N = 16 | MRE | Radiomics | | |
Lamash 201838 | Pediatric CD N = 23 | MRE | CNN | | |
Puylaert 201839 | CD N = 106 | MRE | DL, active contouring model | | |
Maeda 201940 | UC N = 187 | Endocytoscopy images | DL, CNN | | |
Ozawa 201941 | UC N = 955 | Endoscopic images | DL, CNN | | |
Stidham 201942 | UC N = 2778 | Endoscopic images and videos | DL, CNN | | |
Tabari 201943 | CD N = 25 | MRE | Radiomics | | |
Stidham 202044 | CD N = 138 | CTE | DL, active contouring model | | |
Takenaka 202045 | UC N = 2012 | Endoscopic images | DL, CNN | | |
Barash 202146 | CD N = 49 | VCE images | DL, CNN | | |
Gottlieb 202147 | UC N = 249 | Endoscopic videos | DL, CNN | | |
Yao 202148 | UC N = 315 videos | Endoscopic videos | DL, CNN | | |
Li 202149 | CD N = 167 | CTE | Radiomics | | |
Ding 202250 | CD N = 121 | MRE | Radiomics | | |
Guez 202251 | Pediatric CD N = 121 | MRE | Machine learning | | |
Li 202252 | CD N = 100 | CT | Radiomics | | |
Meng 202253 | CD N = 235 | CTE | DL, CNN, radiomics | | |
Noguchi 202254 | UC N = 12 | Histologic images | DL, CNN | | |
Yuan 202255 | CD N = 48 | CTE | DL, automated body composition segmentation | | |
Najdawi 202356 | UC N = 637 | Histologic images | DL, CNN | | |
Ruiqing 202357 | CD N = 167 | CTE | Radiomics | | |
Rymarczyk 202358 | IBD N = 1189 (302 CD) | Histologic images | DL, CNN | | |
Xie 202359 | CD N = 628 | Endoscopic images | DL, CNN | | |
Stidham 202460 | UC N = 1096 | Endoscopic videos | DL | | |
MRTA features: mean (average value of pixels within the ROI), standard deviation, mean of positive pixels, entropy, kurtosis (inversely related to the number of objects highlighted and increased by intensity variations in highlighted objects), and skewness (reflects brightness of highlighted objects).
Abbreviations: IBD, inflammatory bowel disease; CD, Crohn’s disease; UC, ulcerative colitis; AUC, area under the curve; VEGF, vascular endothelial growth factor; ROI, region of interest; MaRIA, magnetic resonance index of activity; CDEIS, Crohn’s Disease Endoscopic Index of Severity; SES-CD, Simple Endoscopic Score for Crohn’s disease; MES, Mayo Endoscopic Score.
Similarly, evaluating disease severity for small bowel CD has been an area of unmet need. Several VCE scores have been developed but are not routinely used in practice, potentially because of the added time requirement on top of the time needed to read and interpret VCE at baseline. Barash et al developed a DL algorithm grading small bowel ulcer severity (grades 1-3, mild to severe).46 The algorithm had a classification accuracy of 0.91 for grade 1 vs 3, 0.78 for grade 2 vs 3, and 0.62 for grade 1 vs 2. In the same vein, Xie et al developed a CNN model for double balloon endoscopic images that could detect small bowel ulcers with 96% accuracy and grade small bowel ulcerated surface, ulcer size, and ulcer depth with 87%, 88%, and 85% accuracy, respectively.59 If validated, an AI-based VCE system for monitoring disease activity in the small bowel will be invaluable for improving efficiency and better monitoring of CD, especially for inexperienced operators/observers.
During endoscopy, biopsies are often taken to evaluate for histologic disease activity. Several histologic disease severity scores have been developed to standardize histologic disease activity, such as the PICaSSO Histologic Remission Index62 and Nancy histological index,63 but their clinical utility is limited by their time-intensive nature. To address this clinical challenge, Najdawi et al56 developed a CNN model for UC that predicted Nancy histologic index score with high agreement with human reviewers (κ = 0.91) and could predict histologic remission with an accuracy of 97%. Unlike the previous studies performed on colonic biopsies, another study developed an automated DL model that could detect histologic disease activity in the colon and ileum with 87% to 94% and 76% to 83% accuracy, respectively.58 Using endocytoscopy, Maeda et al developed a computer-aided diagnostic system for predicting histologic inflammation with 91% accuracy.40 Finally, histologic evaluation is critical for diagnosing UC-associated dysplasia/cancer. One of the key studies in the workup is p53 immunohistochemistry along with hematoxylin and eosin staining. However, evaluation of p53 immunohistochemistry is expensive and time-intensive, so Noguchi et al developed a CNN model that predicted p53 immunohistochemistry staining with 86% to 91% accuracy.54 While the role of AI for histology in IBD is relatively understudied compared with endoscopy and imaging, it has tremendous implications not only for improving diagnostic accuracy but also for elevating workflow efficiency and cost-effectiveness.
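Agreement statistics such as the κ reported above correct raw agreement for the agreement expected by chance. The sketch below assumes unweighted Cohen’s κ between two raters. The cited study may have used a weighted variant, and the function name and grades shown are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters grading the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: both raters independently choose the same grade.
    expected = sum(freq_a[grade] * freq_b[grade] for grade in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical histologic grades (0-4) for 10 biopsies (not study data).
model_grades = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
pathologist_grades = [0, 0, 1, 1, 2, 2, 3, 3, 4, 3]
print(cohens_kappa(model_grades, pathologist_grades))
```

In this made-up example the raters agree on 9 of 10 biopsies, but κ is lower than 0.90 because some of that agreement would be expected by chance alone.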
Imaging offers noninvasive options to characterize and monitor IBD to inform therapeutic strategies and assess treatment response. In CD, several imaging scores have been developed to assess disease activity such as the MaRIA,64 London,65 Nancy,66 and Clermont score.67 Additionally, the Lemann Index was developed to quantify total gut damage in CD and incorporates clinical, surgical, endoscopic, and imaging findings from all segments of the GI tract into one composite score.68 However, these scores are often time-consuming, have variable sensitivity and specificity for detecting intestinal segments with active CD (with the MaRIA score being the best; 81% sensitivity, 89% specificity), have variable correlation with endoscopic disease activity, and have only fair to good interobserver agreement depending on the imaging feature of interest, which limits their clinical utility.69,70 These limitations present a significant opportunity for AI-based imaging interpretation to improve patient care. Of the available studies, AI has been used to characterize disease activity and phenotype inflammatory vs fibrotic strictures in CD.
Presently, 8 studies have explored the use of AI for characterizing disease activity in CD. Studies using DL approaches are limited. One of the biggest challenges for automating the quantification of disease activity in CD is accurately separating the bowel wall from the lumen to make measurements unique to each compartment. In a pilot study in pediatric patients with CD, Lamash et al developed a semiautomated supervised 3D CNN algorithm that only required placement of seed points by the operator to segment the bowel wall and lumen.38 From this, the algorithm could measure lumen radius and bowel wall thickness. This study could not evaluate the algorithm performance due to lack of training data, and there was no endoscopic disease activity score to validate against. However, it provides an excellent working foundation to develop future DL algorithms. Subsequent studies have developed DL algorithms using endoscopic scores such as the Simple Endoscopic Score for CD (SES-CD) and CD Endoscopic Index of Severity (CDEIS) as ground truth and compared the algorithms to established imaging disease activity scores. Puylaert et al developed the VIGOR score, which included both semiautomatic quantitative measurements (bowel thickness and contrast enhancement parameters) and qualitative measurements (degree of T2 mural signal enhancement determined by a radiologist).39 The novel VIGOR score demonstrated a moderate correlation with CDEIS (r = 0.58, P < .001), which was similar to the MaRIA (r = 0.40, P = .001), London score (r = 0.38, P = .001), and CDMI (r = 0.34, P = .003). The VIGOR score also had similar diagnostic accuracy (80%) as the other scores. However, the VIGOR score had superior inter-rater reliability compared with the other imaging scores (ICC 0.81 vs 0.44-0.59), which emphasizes the strength of AI for imaging interpretation in IBD.
Another study developed a multimodal machine-learning fusion model that included disease length on MRE, CRP, and fecal calprotectin to predict an SES-CD ≥3.51 The machine-learning model (AUC 0.84) performed better than the MaRIA score (AUC 0.80, P < 1e-9) and biochemical markers alone (AUC 0.67, P < 1e-5). AI-based imaging algorithms such as this would have important implications not only for patient care but also for clinical trial recruitment, which often requires endoscopic disease activity score cutoffs for inclusion.
While the number of imaging studies using deep-learning approaches to assess disease activity is limited, several studies have evaluated radiomic approaches for characterizing disease activity with promising results. Studies have identified several unique associations between MRTA parameters and disease activity. On a macroscopic level, entropy has been correlated with MRI Crohn’s disease activity score65 (Rc 1.00, P = .01), while kurtosis has been negatively correlated (Rc −0.45, P = .002).37 On a histologic level, skewness has been associated with histologic disease activity (Rc 4.27, P = .02), and lower mean pixel intensity and mean positive pixels have been associated with segments of bowel with increased neoangiogenesis, a hallmark of active inflammation, defined by presence of vascular endothelial growth factor (VEGF) expression.36 These studies demonstrate how radiomics can provide insight into the underlying biology of CD. In terms of quantifying disease activity, Ding et al developed an MRI radiomic-based model that could detect ileal disease with CDEIS >7 with similar performance as the MaRIA (AUC 0.87 vs 0.88, P = .85) but with superior reproducibility (radiomics ICC 0.93-0.96 vs MaRIA ICC 0.58).50 Using CT, 2 groups developed radiomic-based algorithms that could differentiate intestinal segments with active vs inactive CD.52,57 Ruiqing et al developed a particularly interesting radiomics model that incorporated luminal and mesenteric radiomic features and could distinguish multicategorical SES-CD scores (ie, 0, 1, 2-5, 6-10, >10) with an AUC 0.83 and differentiate intestinal segments with moderate/severe disease (SES-CD >5) with AUC 0.85.57 Including mesenteric radiomic features is unique because it allows for objective quantification of inflammatory alterations in the mesenteric fat, which not only provides an additional data point for quantifying disease activity but also facilitates future imaging-based investigations into the mesenteric fat.
Noninvasive characterization of inflammatory mesenteric fat will become increasingly important, as studies have demonstrated mesenteric fat is intimately involved in the pathogenesis and progression of CD.35
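The texture parameters discussed above (mean, standard deviation, mean of positive pixels, entropy, skewness, and kurtosis) are first-order histogram statistics of the pixel values within a region of interest. The sketch below is illustrative only: it assumes the pixel values have already been extracted (and, in texture-analysis pipelines, typically filtered) into a flat list, uses population moments and excess kurtosis, and is not the software used in the cited studies. The function name and ROI values are hypothetical.

```python
import math
from collections import Counter


def first_order_features(pixels):
    """Histogram statistics for pixel values inside a region of interest (ROI)."""
    n = len(pixels)
    mean = sum(pixels) / n
    # Central moments (population form) drive SD, skewness, and kurtosis.
    m2 = sum((p - mean) ** 2 for p in pixels) / n
    m3 = sum((p - mean) ** 3 for p in pixels) / n
    m4 = sum((p - mean) ** 4 for p in pixels) / n
    positives = [p for p in pixels if p > 0]
    # Shannon entropy (bits) of the distribution of distinct pixel values.
    probs = [count / n for count in Counter(pixels).values()]
    return {
        "mean": mean,
        "sd": math.sqrt(m2),
        "mean_positive_pixels": sum(positives) / len(positives) if positives else 0.0,
        "entropy": -sum(p * math.log2(p) for p in probs),
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 - 3 if m2 > 0 else 0.0,  # excess kurtosis
    }


# Hypothetical filtered pixel values from one bowel-wall ROI (not study data).
roi = [-3, -1, 0, 2, 2, 5, 7, 7]
for name, value in first_order_features(roi).items():
    print(name, round(value, 3))
```

Entropy rises as pixel values become more heterogeneous, which is one reason it is a candidate marker of irregular, actively inflamed bowel wall.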
In CD, one of the biggest and most persistent clinical challenges is differentiating between inflammatory-predominant and fibrotic-predominant strictures to decide between medical and surgical intervention. Also, multiple antifibrotic targets are under investigation, so the need to accurately phenotype CD stricture characteristics as potential trial end points is becoming increasingly important.71,72 Multiple imaging parameters in US, CTE, and MRE have been proposed to phenotype CD strictures, but their time intensity and interobserver variation are potential limitations. For this unmet need in CD, AI-powered imaging interpretation is a promising tool to help phenotype CD strictures consistently and efficiently. Using DL approaches, several studies have developed semiautomated and automated algorithms to measure minimal lumen diameter, maximal prestenotic dilation, bowel wall thickness, and/or body composition (VAT and SAT volumetrics) to develop multivariable models to detect CD strictures and predict degree of fibrosis using histologic scores as the ground truth.43,44,53,55 The performances of these models were generally good (AUC 0.80-0.86) and superior to radiologists’ interpretation (AUC 0.58-0.64). Additionally, compared with a radiomic approach, 1 study reported their DL algorithm had shorter processing time (48.4 vs 599.8 seconds, P < .0001).53 Using radiomics, studies have achieved success comparable to that of DL approaches.
In the 2 largest radiomic-based studies to date, CTE-based radiomics models classified CD strictures with moderate to severe fibrosis better than radiologists (AUC 0.83-0.89 vs 0.55-0.64), and decision curve analyses supported net benefit with a radiomics prediction model.49,53 Considering data demonstrating the subpar human ability to classify fibrotic-predominant strictures, the ability to use AI to differentiate between inflammatory-predominant vs fibrosis-predominant strictures in CD has important and exciting implications for precision medicine and developing antifibrotic therapies. However, prospective clinical trials using AI-powered phenotyping of CD strictures to inform management decisions are needed to fully understand its clinical utility and safety.
Prognosis
The old saying “an ounce of prevention is worth a pound of cure” is a core concept for treating IBD to reduce the risk of complications and surgery, and a tremendous amount of research has been directed toward discovering prognostic biomarkers to improve our ability to position interventions earlier. However, many prognostic biomarkers are variably supported by the literature and have limited predictive value, and AI-powered clinical tools may help advance this area of need in IBD. Several studies have evaluated the role of AI for prognosis in IBD, with studies focusing on histology and imaging yielding the most exciting results (Table 3). In our search, we did not identify any studies that developed an AI-based model to determine prognosis based on endoscopy.
Table 3. Ten studies evaluating the role of AI in endoscopy, histology, and cross-sectional imaging for prognosis in IBD.
Author Year | Dataset | Data source | Algorithm type | Task | Performance |
---|---|---|---|---|---|
Klein 201773 | CD N = 105 | Histologic images | DL | | |
Chen 202174 | CD N = 186 | CTE | Radiomics | | |
Feng 202175 | CD N = 322 | MRE | Radiomics | | |
Ohara 202276 | UC N = 114 | Histologic images | DL | | |
Chirra 202377 | CD N = 80 | MRE | Radiomics | | |
Iacucci 202344 | UC N = 273 | Histologic images | DL, CNN | | |
Li 202353 | CD N = 256 | CTE | Radiomics | | |
Ruiqing 202339 | CD N = 167 | CTE | Radiomics | | |
Shen 202373 | CD N = 186 | CTE | Radiomics | | |
Yao 202378 | CD N = 268 | CTE | Radiomics | | |
MRTA features: mean (average value of pixels within the ROI), standard deviation, mean of positive pixels, entropy, kurtosis (inversely related to the number of objects highlighted and increased by intensity variations in highlighted objects), and skewness (reflects brightness of highlighted objects).
Abbreviations: IBD, inflammatory bowel disease; CD, Crohn’s disease; UC, ulcerative colitis; iTB, intestinal TB; Rc, regression coefficient; AUC, area under the curve; ROI, region of interest; MaRIA, magnetic resonance index of activity; CDEIS, Crohn’s Disease Endoscopic Index of Severity; SES-CD, Simple Endoscopic Score for Crohn’s disease; HR, hazard ratio.
While histologic remission can be difficult to achieve in the real world, several studies have found histologic remission may be associated with lower risk of future flares, even among patients already in endoscopic remission.78 Using the PICaSSO Histologic Remission Index (PHRI) for UC, Iacucci et al developed a CNN-based system that could distinguish histologic remission vs activity with 89% sensitivity and 85% specificity.79 They also found an AI-assessed PHRI was associated with a UC flare within 1 year, with a hazard ratio (HR) of 4.64 compared with 3.56 with a human-assessed PHRI. Similarly, using computational pathology methods, Klein et al developed a system to analyze baseline histology images from patients with Crohn’s colitis that could predict future development of fibrostenosing and internal penetrating disease behavior within 5 years with AUC 0.74 and 0.78, respectively.73 Likewise, Ohara et al developed a DL-based system to automate quantification of goblet cell mucin to predict risk of relapse within 12 months in UC subjects in endoscopic remission.76 The investigators found the relapse group had lower goblet cell mucus area calculated by the DL system compared with the nonrelapse group. These studies highlight how AI can enhance our prognostic abilities using data previously not easily obtainable using traditional methods, primarily due to time constraints.
For imaging, radiomic-based models may be the best approach for predicting outcomes in IBD. We identified no studies that used DL algorithms to develop prognostic models in IBD. In one study, a VAT-based radiomic signature independently predicted risk of CD progression (HR, 9.29, P = .005) with good performance in 2 independent test cohorts (AUC 0.82-0.87).80 Conventional VAT metrics such as BMI, VAT volume, or VAT:SAT volume were not associated with risk of CD progression (P = .089-.996), highlighting the limitations of human-derived prognostic biomarkers. Studies have also developed multimodal radiomic-based nomograms to predict secondary loss of response to infliximab using pretreatment imaging with good performance (AUC 0.72-0.88).74,75 Interestingly, one of these studies developed a multivariable nomogram using an MRI-based radiomic index that detects changes in hepatic iron metabolism (R*) to predict secondary loss of response to infliximab with acceptable performance (AUC 0.72).75 This study is another example of how radiomics can help uncover additional information about the underlying biology of CD. Similarly, 3 studies have developed radiomic-based nomograms using features from the bowel and/or peri-intestinal mesenteric adipose tissue to predict 1-year risk of surgery in CD.57,77,81 As with the other multimodal nomograms mentioned previously, incorporating clinical factors improved the prediction of surgery, yielding acceptable to good performance (AUC 0.70-0.90). Finally, predicting postoperative recurrence in CD has remained an unmet challenge despite significant efforts, and many prognostic markers are variably supported by the literature.
Using imaging obtained preoperatively, Shen et al identified intestinal-only (HR, 2.17, P = .002) and peri-intestinal mesenteric fat-only radiomic signatures (HR, 2.19, P = .0018) that were associated with postoperative recurrence.82 Unfortunately, the multimodal nomogram incorporating these signatures with clinical factors had poor performance (AUC 0.69). The performance may have been limited by defining postoperative recurrence as a composite end point including endoscopic, radiographic, or surgical recurrence, which can be confounded by patient adherence to postoperative disease monitoring. Overall, the available literature provides promising data supporting the use of AI to better predict outcomes in IBD. As studies in oncology and IBD have shown radiomics can reflect disease biology, correlating radiomic signatures with histologic or cellular level data (ie, transcriptomics) will not only advance our knowledge about the heterogeneous nature of IBD but also help develop more accurate prediction models.
Limitations and Future Directions
While studies investigating the role of AI in IBD have made significant strides, there are several important limitations. First, the performance of AI algorithms depends on the availability and quality of data. The majority of studies rely on retrospective data. Especially with CNN-based systems for automated endoscopic scoring, acquisition of endoscopic images and videos was not standardized, as most were retrospective analyses from clinical practice. Using standardized acquisition of endoscopic data from randomized controlled trials, as in the studies by Stidham et al and Gottlieb et al, will strengthen future development of AI-based endoscopic scoring systems.47,60 Second, there are important inherent biases to recognize. There is likely a significant degree of publication bias in the current literature, as investigators are unlikely to report AI algorithms with negative results and journals are unlikely to publish these negative studies. Journals should encourage the submission and publication of negative studies to fully comprehend the role and value of AI for IBD. There is also potential bias in the data sets used to train the AI models. This is particularly important to recognize considering most IBD data sets comprise Caucasian subjects, so whether these AI-based systems are accurate in non-Caucasian subjects is unclear. Efforts to study AI in underrepresented demographics will be crucial to prevent exacerbation of healthcare disparities. Third, studies of AI-based systems for endoscopy, histology, and imaging tend to favor either UC or CD. For example, endoscopy-based AI studies are primarily conducted in UC subjects, while imaging-based AI studies are primarily in CD subjects. Endoscopic scoring systems for CD are subject to the same limitations as UC scoring systems, so future studies are needed to develop AI-based systems to automate scoring in CD. Additionally, more studies are needed to develop and validate AI-based systems that can differentiate between CD and UC on endoscopy.
Furthermore, studies developing AI-based systems to determine prognosis based on endoscopic findings are also needed. Finally, endoscopy, histology, and imaging techniques and settings are not standardized across institutions. Future standardization of these data is needed for AI-based systems for IBD to function appropriately across different institutions.
Conclusion
In conclusion, the transformative potential of AI applications across endoscopy, histology, and imaging in IBD is undeniably promising. Artificial intelligence stands poised to revolutionize the landscape of IBD care by addressing unmet clinical needs, improving workflow efficiency and enhancing patient outcomes through multifaceted approaches. The integration of AI-based clinical tools will play a critical role in advancing precision medicine in IBD. Additionally, AI-powered analytics present opportunities to augment the efficiency of clinical trials, facilitating quicker and more insightful analyses, ultimately expediting the development of novel therapies for IBD.
While the strides made in AI applications for IBD are exciting, inherent limitations and gaps in knowledge in the available literature underscore the need for cautious optimism. Many AI algorithms necessitate rigorous validation in larger prospective studies to ensure their reliability, reproducibility, and robust performance across diverse patient populations and clinical settings. Thus, as we continue to explore the potential of AI-driven healthcare, the translation of these innovative tools and technologies into routine clinical practice requires a comprehensive understanding of their limitations, coupled with a commitment to address these through continual research and development. Embracing a collaborative approach among clinicians, researchers, and technology developers is imperative to realizing the full potential of AI in IBD.
Supplementary Data
Supplementary data is available at Inflammatory Bowel Diseases online.
Author Contributions
P.G. is the guarantor of the article and was involved in the concept and design, drafting, and final approval of the article.
O.M. was involved in the drafting and final approval of the article.
S.D. was involved in the drafting and final approval of the article.
D.C. was involved in the drafting and final approval of the article.
P.W. was involved in the drafting and final approval of the article.
X.H. was involved in the drafting and final approval of the article.
D.L. was involved in the drafting and final approval of the article.
J.H.M. was involved in the drafting and final approval of the article.
D.P.B.M. was involved in the drafting and final approval of the article.
Funding
There are no sources of funding to disclose related to the work for this article.
Conflicts of interest
D.C. has received speaker’s fees and/or research support from Takeda, Janssen, AbbVie, Eli Lilly, Reckitt, and Lapidot, and consultancy fees from Takeda, AbbVie, and Taro.
D.P.B.M. has received consulting fees from Takeda, Prometheus Biosciences Inc, Prometheus Labs, Palisade Bio, and MERCK.
P.G., O.M., S.D., P.W., X.H., D.L., and J.H.M. have no conflicts of interest to disclose.