Abstract

Background and Aims

Capsule endoscopy is a central element in the management of patients with suspected or known Crohn’s disease. The PillCam™ Crohn’s Capsule, introduced in 2017, has demonstrated greater accuracy in evaluating the extent of disease in these patients. Artificial intelligence [AI] is expected to enhance the diagnostic accuracy of capsule endoscopy. This study aimed to develop an AI algorithm for the automatic detection of ulcers and erosions of the small intestine and colon in PillCam™ Crohn’s Capsule images.

Methods

A total of 8085 PillCam™ Crohn’s Capsule images extracted between 2017 and 2020 were included, comprising 2855 images of ulcers and 1975 of erosions; the remaining images showed normal enteric and colonic mucosa. This pool of images was split into training and validation datasets, and the performance of the network was then assessed in an independent test set.

Results

The model had an overall sensitivity and specificity of 90.0% and 96.0%, respectively. The precision and accuracy of this model were 97.1% and 92.4%, respectively. In particular, the algorithm detected ulcers with a sensitivity of 83% and specificity of 98%, and erosions with sensitivity and specificity of 91% and 93%, respectively.

Conclusion

A deep learning model capable of automatically detecting ulcers and erosions in PillCam™ Crohn’s Capsule images was developed for the first time. These findings pave the way for the development of automatic systems for detection of clinically significant lesions, optimizing the diagnostic performance and efficiency of monitoring Crohn’s disease activity.

1. Introduction

Capsule endoscopy [CE] is a prime tool for diagnostic investigation of patients with suspected Crohn’s disease [CD] and evaluating the extent of small bowel involvement in those with established CD.1 In addition, its minimally invasive nature allows for serial assessments, by enabling the evaluation of mucosal healing in response to different treatment strategies.

Quantitative assessment of active inflammation has been shown to predict relapse-free survival in patients in clinical remission.2 Therefore, accurate identification of relevant lesions is essential for estimating the prognosis of CD patients. An innovative panenteric CE system (PillCam™ Crohn’s Capsule [PCC]; Medtronic) was recently introduced. This system has two cameras, allowing a wider field of view of the entire gastrointestinal [GI] tract in a single CE procedure.

In recent years, many efforts have been devoted to developing and applying artificial intelligence [AI] tools for automatic image analysis in gastroenterology.3 Convolutional neural networks [CNNs] are a type of deep learning algorithm tailored for image analysis. These systems have demonstrated high performance levels for the detection of enteric ulcers and erosions.4,5 These automated systems may enhance the diagnostic yield of CE systems. Moreover, they may significantly reduce the time required for the review of CE exams, especially panenteric CE exams. For example, CE systems exploring the entire GI tract may produce 50 000 images, reading of which is time-consuming, requiring ~50 min for completion.6 Additionally, abnormal findings may be restricted to a small number of frames, thus contributing to the risk of overlooking significant lesions. Thus far, no AI system has been developed to detect ulcers and erosions in the PCC system. We aimed to develop and validate a CNN to automatically detect ulcers and erosions in PCC images.

2. Material and Methods

2.1. Study design

A multicentre study was performed to develop and validate a CNN for automatic detection of ulcers and erosions in PCC images. Fifty-nine full-length PCC exams from two different institutions [São João University Hospital and ManopH Gastroenterology Clinic, Porto, Portugal] were reviewed. A total of 24 675 frames of enteric or colonic mucosa were ultimately extracted. Inclusion and classification of frames were performed by three gastroenterologists with expertise in CE [M.J.M.S., H.C. and M.M.S., each with >1000 CE exams]. A final decision on frame labelling required the agreement of at least two of the three researchers. This study was approved by the ethics committee of São João University Hospital [No. CE 407/2020].
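The labelling rule described above (a final label requires agreement of at least two of the three reviewers) can be sketched as follows. This is a minimal illustrative implementation; the reviewer labels used in the example are placeholders, not study data.

```python
# Sketch of the two-of-three consensus labelling rule described above.
# Labels are illustrative placeholders, not the study's actual annotations.
from collections import Counter

def consensus_label(labels, min_agreement=2):
    """Return the majority label if at least `min_agreement` reviewers agree, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agreement else None

assert consensus_label(["ulcer", "ulcer", "normal"]) == "ulcer"      # 2/3 agree
assert consensus_label(["ulcer", "erosion", "normal"]) is None       # no consensus
```

Frames without consensus would presumably be re-reviewed or excluded; the paper does not specify how disagreements beyond the two-of-three rule were resolved.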

2.2. Capsule endoscopy procedure

Panenteric CE procedures were conducted using the PCC system [Medtronic]. The images were reviewed using PillCam™ software version 9.0 [Medtronic]. Each frame was processed to remove any information allowing patient identification [name, operating number, date of procedure]. The bowel preparation protocol followed previously published guidelines.7

2.3. Development of the convolutional neural network

A CNN was developed for automatic detection of ulcers or erosions in the enteric or colonic mucosa. From the collected pool of images [n = 24 675], 5300 displayed ulcers and erosions. The remaining [n = 19 190] showed normal small bowel or colonic mucosa. This pool of images was split into training and validation datasets. The first 80% of the consecutively extracted images [n = 19 740] were used as the training dataset. The last 20% were used as the validation dataset [n = 4935]. There was no image overlap between the datasets. The validation dataset was used for assessing the performance of the CNN.
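The chronological 80/20 split described above can be sketched as follows: the first 80% of consecutively extracted frames form the training set and the last 20% the validation set, with no frame shared between the two. The frame names are illustrative placeholders.

```python
# Sketch of the chronological 80/20 split: first 80% of consecutively
# extracted frames for training, last 20% for validation, no overlap.

def chronological_split(frames, train_fraction=0.8):
    """Split an ordered list of frames into training and validation sets."""
    cutoff = int(len(frames) * train_fraction)
    return frames[:cutoff], frames[cutoff:]

# Using the study's totals: 24 675 frames -> 19 740 train / 4935 validation.
frames = [f"frame_{i:05d}.jpg" for i in range(24_675)]
train, valid = chronological_split(frames)

assert len(train) == 19_740 and len(valid) == 4935
assert set(train).isdisjoint(valid)  # no image overlap between datasets
```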

To create the CNN, we used the Xception model with its weights pre-trained on ImageNet. To transfer this learning to our data, we kept the convolutional layers of the model. We used the TensorFlow 2.3 and Keras libraries to prepare the data and run the model. We applied gradient-weighted class activation mapping to the last convolutional layer, in order to highlight the features most important for predicting ulcers and erosions. For each image, the CNN estimated the probability of each category [ulcers or erosions vs normal mucosa]. The category with the highest probability score was output as the CNN’s predicted classification [Figure 1A].
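The final classification step described above (the category with the highest probability wins) can be sketched as follows. The softmax and the logits here stand in for the output of the Xception backbone's classification head; the category names and the example logits are illustrative, not taken from the study's model.

```python
# Minimal sketch of the output rule: the network yields a probability per
# category, and the highest-scoring category is returned as the prediction.
# The softmax stands in for the CNN's final layer; logits are hypothetical.
import math

CATEGORIES = ["ulcers_or_erosions", "normal_mucosa"]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits):
    """Return (predicted category, probability of that category)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CATEGORIES[best], probs[best]

label, prob = predict([2.3, -1.1])  # hypothetical logits for one frame
assert label == "ulcers_or_erosions" and prob > 0.5
```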

Figure 1.

[A] Output obtained from the application of the convolutional neural network. A blue bar represents a correct prediction; red bars represent an incorrect prediction. The category with the highest probability was output as the CNN’s prediction. [B] Evolution of the accuracy of the convolutional neural network during the training and validation phases, as the training and validation datasets were repeatedly inputted into the neural network. [C] Receiver operating characteristic analysis of the network’s performance. Abbreviations: CNN, convolutional neural network; N, normal mucosa; PUE, ulcers and erosions of the enteric and colonic mucosa.

2.4. Model performance and statistical analysis

The primary outcome measures included sensitivity, specificity, positive and negative predictive values, and accuracy. Furthermore, we used receiver operating characteristic [ROC] curve analysis and the area under the ROC curve [AUROC] to measure the overall performance of our model at distinguishing the two categories. The network’s output was compared to the specialists’ labelling [gold standard]. Additionally, we calculated the image processing performance of the network. Sensitivities, specificities, and positive and negative predictive values were obtained using one iteration and are presented as percentages. Statistical analysis was performed using scikit-learn v0.22.2.
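The outcome measures above are all derived from the counts of true/false positives and negatives against the experts' labels. A hedged, self-contained sketch (plain Python rather than the scikit-learn calls the study used; the label vectors are tiny illustrative stand-ins, not study data):

```python
# Sketch of the performance metrics computed against the gold standard.
# 1 = ulcer/erosion, 0 = normal mucosa. Labels below are illustrative only.

def binary_metrics(y_true, y_pred):
    """Return sensitivity, specificity, PPV, NPV and accuracy as fractions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / len(y_true),
    }

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
m = binary_metrics(y_true, y_pred)
assert abs(m["sensitivity"] - 2 / 3) < 1e-9
assert abs(m["specificity"] - 4 / 5) < 1e-9
```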

3. Results

3.1. Construction of the network

Fifty-nine PCC exams from two institutions were analysed for this study. A total of 24 675 frames were extracted, 5300 containing ulcers or erosions and 19 190 showing normal enteric or colonic mucosa. The training dataset comprised 80% of the total image pool. The remaining 20% [n = 4935] were used to test the model. The latter subset of images was composed of 1060 [21.5%] images with evidence of ulcers and erosions and 3875 [78.5%] images with normal enteric or colonic mucosa. The network demonstrated its learning ability, with accuracy increasing as data were repeatedly inputted into the multilayer CNN [Figure 1B].

3.2. Performance of the network

The distribution of results is displayed in Table 1. Overall, the model had a sensitivity of 98.0% and a specificity of 99.0%. The positive and negative predictive values were 96.6% and 99.5%, respectively. The overall accuracy of the network was 98.8%. The AUROC for the detection of ulcers and erosions in PCC images was 1.00 [Figure 1C].
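A quick arithmetic check, assuming the confusion-matrix counts in Table 1 are TP = 1039, FP = 37, FN = 21 and TN = 3838 (i.e. 1060 lesion frames and 3875 normal frames in the test set): these counts reproduce every figure reported above to one decimal place.

```python
# Verifying the reported metrics from the assumed confusion-matrix counts.
tp, fp, fn, tn = 1039, 37, 21, 3838

assert tp + fn == 1060            # frames with ulcers or erosions
assert tn + fp == 3875            # frames with normal mucosa
assert round(100 * tp / (tp + fn), 1) == 98.0   # sensitivity
assert round(100 * tn / (tn + fp), 1) == 99.0   # specificity
assert round(100 * tp / (tp + fp), 1) == 96.6   # positive predictive value
assert round(100 * tn / (tn + fn), 1) == 99.5   # negative predictive value
assert round(100 * (tp + tn) / 4935, 1) == 98.8 # accuracy
```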

Table 1.

Confusion matrix of the automatic detection vs expert classification

                              Expert: ulcers and erosions    Expert: normal mucosa
CNN: ulcers and erosions                 1039                          37
CNN: normal mucosa                         21                        3838

Abbreviation: CNN, convolutional neural network.


3.3. Computational performance of the CNN

The CNN read the validation dataset in 72 s, averaging a rate of 68 frames/s [0.015 s per frame]. At this rate, review of a full-length PCC video containing an estimated 50 000 frames would require ~12 min.
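The reading-time figures above can be reproduced with simple arithmetic: 4935 validation frames in 72 s gives roughly 68 frames/s, and a 50 000-frame exam at that rate takes about 12 minutes.

```python
# Reproducing the throughput figures reported above.
frames_read, elapsed_s = 4935, 72
rate = frames_read / elapsed_s                # ~68.5 frames per second

assert int(rate) == 68                        # ~68 frames/s, as reported
assert round(1 / rate, 3) == 0.015            # seconds per frame
assert round(50_000 / rate / 60) == 12        # minutes for a full-length exam
```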

4. Discussion

In this multicentre study, we have developed a pioneering deep learning system for automatic identification of ulcers and erosions using the novel panenteric PCC system. Our algorithm showed high sensitivity, specificity and accuracy for the detection of enteric or colonic ulcers and erosions in PCC images.

The application of AI for assisted CE reading has been a hot topic in the endoscopic literature in recent years. Although their clinical significance remains uncertain, several groups have developed AI systems with high performance for the detection of different types of lesions in CE. The identification of ulcers and erosions is paramount for the diagnosis and for evaluating the extent of inflammatory activity in patients with CD. Aoki et al. developed a CNN-based system for automatic detection of ulcers and erosions in CE images. Their model had a high overall performance [AUROC 0.958], with a sensitivity of 88% and a specificity of 91%.4 In 2019, Klang and co-workers developed a deep learning system which detected ulcers and erosions with high accuracy [95–97%].5

The introduction of PCC has enabled simultaneous evaluation of the enteric and colonic mucosa. Its wider field of view has been shown to increase the number of detected lesions when compared to standard CE, which may result in disease upstaging and influence follow-up decisions.8 Accurate identification and grading of mucosal lesions are essential for an accurate prognostic prediction in these patients. Nevertheless, significant interobserver variability exists in the detection of relevant lesions. The development of AI technology for application in panenteric CE systems, such as the PCC system, may further increase the diagnostic yield of these systems while tackling interobserver variability.

To our knowledge, this is the first CNN-based algorithm for application in the analysis of PCC images. Our system was able to detect both enteric and colonic ulcers and erosions with high sensitivity, specificity and accuracy. Recently, Barash et al. developed a CNN for grading the severity of enteric mucosal ulceration using a scoring system embedded in the reading software accompanying the PCC system.9 Their model, however, was applied to PillCam™ SB3 CE images. It accurately differentiated mild from severe lesions but demonstrated modest performance for the detection of lesions of intermediate severity. Moreover, these authors used a grading score validated for the PCC but not for the standard SB3 model.

The development of automated AI tools for the PCC system has the potential to improve its diagnostic yield and time efficiency, thus contributing to its wider acceptance in clinical practice. Our system showed a high image processing capacity [68 frames/s], which may translate into future gains regarding the time required for reading panenteric CE exams.

This study has several limitations. First, it is a retrospective proof-of-concept study including a small number of PCC exams. Therefore, subsequent prospective studies with larger numbers of PCC exams are desirable before this model can be applied to clinical practice. Second, this tool was tested in still frames. Thus, assessment of its performance using full-length videos is required before clinical application of these tools.

In conclusion, we have developed a pioneering CNN-based model capable of detecting ulcers and erosions in a novel panenteric CE system. Moreover, AI models may improve the diagnostic yield of PCC and overcome some of the main drawbacks associated with panenteric CE, particularly the rate of missed lesions and the time required for reading. The high performance of our algorithm may lead to future gains regarding the diagnosis and follow-up of patients with suspected or known CD.

Funding

The authors have no funding sources to disclose.

Conflict of Interest

The authors have no conflict of interest to disclose.

Author Contributions

J.F., M.J.M.S., J.A., T.R., T.R., H.C., A.P.A., M.M.S., M.P., R.J., S.L., G.M.: study design; M.J.M.S., H.C., A.P.A.: revision of CE videos; J.F., M.J.M.S., J.A., T.R.: construction and development of the CNN; J.F.: statistical analysis; J.F., M.J.M.S., J.A., T.R.: drafting of the manuscript; J.A., T.R.: bibliographic review. All authors approved the final version of the manuscript.

Data Availability Statement

The data underlying this article will be shared on reasonable request to the corresponding author.

References

1. Maaser C, Sturm A, Vavricka SR, et al.; European Crohn’s and Colitis Organisation [ECCO] and the European Society of Gastrointestinal and Abdominal Radiology [ESGAR]. ECCO-ESGAR Guideline for Diagnostic Assessment in IBD Part 1: initial diagnosis, monitoring of known IBD, detection of complications. J Crohns Colitis 2019;13:144–64.

2. Ben-Horin S, Lahat A, Amitai MM, et al.; Israeli IBD Research Nucleus (IIRN). Assessment of small bowel mucosal healing by video capsule endoscopy for the prediction of short-term and long-term risk of Crohn’s disease flare: a prospective cohort study. Lancet Gastroenterol Hepatol 2019;4:519–28.

3. Le Berre C, Sandborn WJ, Aridhi S, et al. Application of artificial intelligence to gastroenterology and hepatology. Gastroenterology 2020;158:76–94.e2.

4. Aoki T, Yamada A, Aoyama K, et al. Automatic detection of erosions and ulcerations in wireless capsule endoscopy images based on a deep convolutional neural network. Gastrointest Endosc 2019;89:357–63.e2.

5. Klang E, Barash Y, Margalit RY, et al. Deep learning algorithms for automated detection of Crohn’s disease ulcers by video capsule endoscopy. Gastrointest Endosc 2020;91:606–13.e2.

6. Eliakim R, Yassin K, Niv Y, et al. Prospective multicenter performance evaluation of the second-generation colon capsule compared with colonoscopy. Endoscopy 2009;41:1026–31.

7. Spada C, Hassan C, Galmiche JP, et al.; European Society of Gastrointestinal Endoscopy. Colon capsule endoscopy: European Society of Gastrointestinal Endoscopy (ESGE) Guideline. Endoscopy 2012;44:527–36.

8. Tontini GE, Rizzello F, Cavallaro F, et al. Usefulness of panoramic 344°-viewing in Crohn’s disease capsule endoscopy: a proof of concept pilot study with the novel PillCam™ Crohn’s system. BMC Gastroenterol 2020;20:97.

9. Barash Y, Azaria L, Soffer S, et al. Ulcer severity grading in video capsule images of patients with Crohn’s disease: an ordinal neural network solution. Gastrointest Endosc 2021;93:187–92.
