Pavel Brodskiy, PhD¹, Julian Lehrer¹, Mohammad Haft-Javaherian, PhD¹, Daniel Colucci², Darren Thomason, MBA¹, Klaus Gottlieb, MD, PhD, JD³

¹Iterative Health Inc, Cambridge, MA; ²Iterative Health Inc, New York, NY; ³Eli Lilly and Company, Indianapolis, IN

Introduction: Artificial intelligence has been extensively applied to assess endoscopic disease severity in ulcerative colitis; however, its use in Crohn's disease (CD) remains limited. In CD, the presence of ulcers is a key marker of active mucosal inflammation and has emerged as a trial endpoint, with recent therapeutic trials reporting ulcer-free remission rates. We aimed to develop a machine learning (ML) model that classifies the presence or absence of ulcers in both the colon and the ileum to support more standardized endoscopic evaluation in CD.

Methods: A dataset of endoscopic recordings was sampled from the Phase 2 SERENITY trial of mirikizumab in patients with active CD. Videos were randomly split by patient into a training cohort (n=472) and a validation cohort (n=119); the validation cohort was used for model selection. Each video was assigned a Simple Endoscopic Score for Crohn's Disease (SES-CD) by one to three readers, per protocol, and a binary label for the presence or absence of ulcers in the ileum and in the colon was derived from that score. We developed a multi-stage deep learning model, consisting of convolutional image classifiers and a transformer-based video classification model, to classify the presence or absence of ulcers in the colon and ileum. Receiver operating characteristic (ROC) curves and confusion matrices were used to evaluate the sensitivity and specificity of the model's assessments.

Results: Of the 119 videos in the validation cohort, all had at least one colonic segment assessed and 104 had the ileum assessed. Among videos in which the respective location was assessed, readers reached a consensus annotation in 91% of videos for the colon (ulcers present in 51%) and in 98% for the ileum (ulcers present in 31%).
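The abstract does not spell out how the patient-level split, the binary ulcer label, or the reader consensus were computed. The sketch below is one plausible reading, not the authors' pipeline. It assumes that "ulcers present" means a nonzero SES-CD "size of ulcers" subscore in any assessed segment of a location, that the rectum counts toward the colon, and that consensus means every reader who assessed the location gave the same label. The segment keys and the functions `split_by_patient`, `ulcers_present`, and `consensus_label` are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical segment keys. SES-CD scores five segments; counting the rectum
# as part of the "colon" location is an assumption, not stated in the abstract.
COLON_SEGMENTS = ("right_colon", "transverse_colon", "left_colon", "rectum")
ILEUM_SEGMENTS = ("ileum",)


def split_by_patient(video_to_patient: Dict[str, str], val_fraction: float,
                     seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly assign whole patients (not single videos) to train or validation,
    so that no patient contributes videos to both cohorts."""
    patients = sorted(set(video_to_patient.values()))
    rng = random.Random(seed)
    rng.shuffle(patients)
    val_patients = set(patients[:round(len(patients) * val_fraction)])
    train = [v for v, p in video_to_patient.items() if p not in val_patients]
    val = [v for v, p in video_to_patient.items() if p in val_patients]
    return train, val


def ulcers_present(ulcer_size: Dict[str, Optional[int]],
                   segments: Sequence[str]) -> Optional[bool]:
    """Binary label from one reader's SES-CD 'size of ulcers' subscores
    (0 = none, 1-3 = increasing size). None means no segment was assessed."""
    assessed = [ulcer_size[s] for s in segments if ulcer_size.get(s) is not None]
    if not assessed:
        return None
    return any(score > 0 for score in assessed)


def consensus_label(reads: Sequence[Dict[str, Optional[int]]],
                    segments: Sequence[str]) -> Optional[bool]:
    """Label shared by every reader who assessed the location; None if the
    readers disagree or nobody assessed it (assumed definition of consensus)."""
    labels = {ulcers_present(read, segments) for read in reads}
    labels.discard(None)
    return labels.pop() if len(labels) == 1 else None
```

Under this assumed definition, a video with a single read always has a consensus label, so the reported 91% and 98% consensus rates would be driven entirely by the multi-read videos.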
In videos with more than one read, two randomly selected readers agreed on the presence or absence of ulcers in 75% of cases for the colon and in 79% for the ileum. The model classified ulcer presence or absence in the colon with an accuracy of 89% and an area under the ROC curve (AUC) of 0.94, and in the ileum with an accuracy of 83% and an AUC of 0.87.

Discussion: We developed an ML model to assess the presence or absence of ulcers in the colon and ileum in CD. The model performed well in both locations, with higher accuracy and AUC in the colon than in the ileum, and may help standardize the assessment of endoscopic severity. Future research will investigate whether ML can provide a more granular assessment of endoscopic disease severity in patients with CD.
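The inter-reader figures in the Results (75% for the colon, 79% for the ileum) come from comparing two randomly selected readers per multi-read video. A minimal sketch of that statistic follows. It assumes each reader's binary ulcer label per location is already available (None when that reader did not assess the location) and that videos with fewer than two assessing readers are skipped; the function name `random_pair_agreement` is hypothetical.

```python
import random
from typing import Optional, Sequence


def random_pair_agreement(labels_per_video: Sequence[Sequence[Optional[bool]]],
                          seed: int = 0) -> float:
    """Fraction of multi-read videos on which two randomly chosen readers agree.

    Each inner sequence holds one binary ulcer label per reader for a single
    location; None marks a reader who did not assess that location.
    """
    rng = random.Random(seed)
    agree = total = 0
    for labels in labels_per_video:
        assessed = [lab for lab in labels if lab is not None]
        if len(assessed) < 2:
            continue  # single-read videos cannot contribute to agreement
        a, b = rng.sample(assessed, 2)  # two distinct readers, chosen at random
        total += 1
        agree += a == b
    if total == 0:
        raise ValueError("no videos with at least two reads")
    return agree / total
```

This is raw percent agreement, not a chance-corrected statistic such as Cohen's kappa, so with roughly balanced colon labels (51% ulcers present) some agreement is expected by chance alone.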
Figure 1. Receiver operating characteristic (ROC) curves for the deep learning model against the consensus ground truth for ulcer presence or absence in the colon (A) and ileum (B). Confusion matrices for the deep learning model against the consensus ground truth for ulcer presence in the colon (C) and ileum (D), at the threshold that best discriminates ulcer presence.
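The figure pairs ROC curves with confusion matrices taken at "the threshold that best discriminates ulcer presence", but the abstract does not name the selection criterion. The sketch below assumes Youden's J (sensitivity + specificity - 1) over the observed scores, computes AUC through its rank interpretation (the probability that a randomly chosen positive video outscores a randomly chosen negative one), and derives accuracy, sensitivity, and specificity from the resulting 2x2 matrix. All function names are hypothetical, and the per-video model score is assumed to be a single probability of ulcer presence.

```python
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative video")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_at(y_true: Sequence[int], scores: Sequence[float],
                 threshold: float) -> Tuple[int, int, int, int]:
    """(TP, FP, TN, FN) when 'ulcers present' is predicted for score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_threshold(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Observed score maximizing Youden's J (assumed 'best discriminating' rule)."""
    if 1 not in y_true or 0 not in y_true:
        raise ValueError("need both classes to pick a threshold")
    best_t, best_j = min(scores), -1.0
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion_at(y_true, scores, t)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t


def summarize(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """(accuracy, sensitivity, specificity) from a 2x2 confusion matrix."""
    return (tp + tn) / (tp + fp + tn + fn), tp / (tp + fn), tn / (tn + fp)
```

The pairwise AUC is quadratic in the number of videos, which is negligible at the size of a 119-video validation cohort; a sort-based rank computation gives the same value more efficiently on larger sets.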
Disclosures: Pavel Brodskiy: Iterative Health Inc – Employee. Julian Lehrer: Iterative Health Inc – Employee. Mohammad Haft-Javaherian: Iterative Health Inc – Employee. Daniel Colucci: Iterative Health Inc – Employee. Darren Thomason: Iterative Health Inc – Employee. Klaus Gottlieb: Eli Lilly – Employee.
Pavel Brodskiy, PhD, Julian Lehrer, Mohammad Haft-Javaherian, PhD, Daniel Colucci, Darren Thomason, MBA, Klaus Gottlieb, MD, PhD, JD. P3296 - Machine Learning Classification of Ulcers in the Colon and Ileum in Crohn's Disease, ACG 2025 Annual Scientific Meeting Abstracts. Phoenix, AZ: American College of Gastroenterology.