P1067 - Segment Level Validation of a Deep Learning Model to Assess Endoscopic Severity in Ulcerative Colitis Using Regulatory-Grade Consensus

Sunday, October 26, 2025

3:30 PM - 7:00 PM PDT

Location: Exhibit Hall

Presenting Author(s)

David T. Rubin, MD

University of Chicago Medicine Inflammatory Bowel Disease Center, Chicago, IL, USA

Julian Lehrer¹, Pavel Brodskiy, PhD¹, Mohammad Haft-Javaherian, PhD¹, Daniel Colucci², Darren Thomason, MBA¹, Klaus Gottlieb, MD, PhD, JD³, David T. Rubin, MD⁴
¹Iterative Health Inc, Cambridge, MA; ²Iterative Health Inc, New York, NY; ³Eli Lilly and Company, Indianapolis, IN; ⁴University of Chicago Medicine Inflammatory Bowel Disease Center, Chicago, IL, USA

Introduction: Artificial Intelligence assessment of Endoscopic Severity and Extent (AI-ESe) is a deep learning approach to continuous assessment of inflammation on endoscopy in ulcerative colitis (UC). AI-ESe measures the endoscopic subscore at discrete segments from the rectum to the maximum extent, generating a granular heatmap of inflammatory activity. We aim to validate the performance of our segment-level endoscopic subscore assessment model.

Methods: We adapted a previously developed deep learning model for assessing the endoscopic subscore to optimize its performance on segments. A total of 47 endoscopic video recordings from the Phase 3 induction trial of mirikizumab in UC (NCT03518086) and from routine practice were used as a holdout test set. Each video was divided into segments pre-defined every 15 seconds with correction for stalling, aligned with AI-ESe’s design. A panel of seven experienced human readers was trained to evaluate the endoscopic subscore, with a minimum quadratic weighted kappa (QWK) of 0.6 required. Each segment was independently scored by 3 readers, and the median was assigned as the final segment score to mimic the 2+1 workflow, the current regulatory standard. We evaluated the agreement between the model and human readers.
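
As an illustration of the scoring scheme described in the Methods, the following minimal sketch takes the median of three independent reads as the segment score and computes a quadratic weighted kappa between two sets of scores. It is not the study's code: the function names, the assumption that the endoscopic subscore is an integer from 0 to 3, and the toy inputs are all hypothetical.

```python
from statistics import median


def consensus_score(reads):
    """Median of three independent integer reads (assumed scale: 0-3)."""
    if len(reads) != 3:
        raise ValueError("expected exactly three reads per segment")
    return int(median(reads))


def quadratic_weighted_kappa(a, b, n_classes=4):
    """Quadratic weighted kappa between two equal-length lists of integer scores.

    n_classes=4 assumes a 0-3 scale. The result is undefined (division by
    zero) if both raters give every segment the same single score.
    """
    if len(a) != len(b) or not a:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(a)
    # Observed confusion matrix: rows index rater a, columns index rater b.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for x, y in zip(a, b):
        observed[x][y] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Quadratic disagreement weight, 0 on the diagonal and 1 at the extremes.
            w = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += w * observed[i][j]
            # Expected count under independence of the two raters.
            weighted_expected += w * hist_a[i] * hist_b[j] / n
    return 1.0 - weighted_observed / weighted_expected


# Toy data only, not study data: three reads per segment and model scores.
reads_per_segment = [(0, 0, 1), (2, 3, 2), (1, 1, 1), (3, 2, 3)]
reference = [consensus_score(r) for r in reads_per_segment]
model_scores = [0, 2, 1, 3]
kappa = quadratic_weighted_kappa(model_scores, reference)
```

With three reads, the median always equals one of the submitted scores, so the consensus value stays on the original ordinal scale.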

Results: A total of 823 segments were scored. All three reviewers agreed completely on the endoscopic subscore in 57.2% of segments, in line with published data on inter-rater variability among human reviewers. Agreement on segment scores between the model and the final score generated via the 2+1 workflow was very good (QWK 0.82; 95% confidence interval 0.79-0.85). Model performance was similar to that of any individual human reader (Spearman correlation range: human-human 0.57-0.85, model-human 0.70-0.87).
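
The two descriptive statistics reported in the Results, the share of segments with complete three-reader agreement and the Spearman correlation between two raters, could be computed along the following lines. This is a hedged sketch with hypothetical function names and toy inputs, not the analysis code used in the study. Spearman correlation is implemented here as the Pearson correlation of tie-averaged ranks.

```python
from math import sqrt


def full_agreement_rate(reads_per_segment):
    """Fraction of segments where all readers gave the same score."""
    agreed = sum(1 for reads in reads_per_segment if len(set(reads)) == 1)
    return agreed / len(reads_per_segment)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(rx, ry))
    var_x = sum((p - mean_x) ** 2 for p in rx)
    var_y = sum((q - mean_y) ** 2 for q in ry)
    return cov / sqrt(var_x * var_y)


# Toy data only, not study data.
toy_reads = [(1, 1, 1), (0, 1, 1), (2, 2, 2), (3, 2, 3)]
rate = full_agreement_rate(toy_reads)
rho = spearman([0, 0, 1, 1], [0, 0, 1, 1])
```

Averaging tied ranks matters here because an ordinal subscore with few levels produces many ties across segments.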

Discussion: We demonstrate that our deep learning model accurately assesses the endoscopic subscore at the segment level in UC. This supports the use of AI-ESe to assess inflammation severity at a more granular level in UC.

Figure: Table 1. Key model performance metrics for assessment of the endoscopic subscore against the 2+1 reference standard. Abbreviations: Acc, accuracy; QWK, quadratic weighted kappa.

Figure: Figure 1. Spearman correlation matrix for the endoscopy subscore between AI-ESe and all individual human readers.

Disclosures:
Julian Lehrer: Iterative Health Inc – Employee.
Pavel Brodskiy: Iterative Health Inc – Employee.
Mohammad Haft-Javaherian: Iterative Health Inc – Employee.
Daniel Colucci: Iterative Health Inc – Employee.
Darren Thomason: Iterative Health Inc – Employee.
Klaus Gottlieb: Eli Lilly – Employee.
David Rubin: AbbVie – Advisory Committee/Board Member, Consultant, Speaker fees. Abivax SA – Consultant. Altrubio – Advisory Committee/Board Member, Consultant, Speaker fees, Stock Options. Avalo – Advisory Committee/Board Member, Consultant, Speaker fees. Bausch Health – Consultant. Bristol Myers Squibb – Advisory Committee/Board Member, Consultant, Speaker fees. Buhlmann Diagnostics – Advisory Committee/Board Member, Consultant, Speaker fees. Celltrion – Consultant. ClostraBio – Consultant. Connect BioPharma – Consultant. Cornerstones Health, Inc – Board of Directors membership. Douglas Pharmaceuticals – Consultant. Eli Lilly & Co. – Consultant. Foresee, Genentech (Roche) Inc. – Consultant. Image Analysis Group – Consultant. InDex Pharmaceutical – Consultant. Intouch Group – Advisory Committee/Board Member, Consultant, Speaker fees. Iterative Health – Advisory Committee/Board Member, Consultant, Speaker fees. Iterative Health – Stock Options. Janssen Pharmaceuticals – Consultant. Lilly – Advisory Committee/Board Member, Consultant, Speaker fees. Odyssey Therapeutics – Consultant. Pfizer – Advisory Committee/Board Member, Consultant, Speaker fees. Sanofi – Consultant. Takeda – Advisory Committee/Board Member, Consultant, Grant/Research Support, Speaker fees. Throne – Consultant. Vedanta – Consultant.

Julian Lehrer¹, Pavel Brodskiy, PhD¹, Mohammad Haft-Javaherian, PhD¹, Daniel Colucci², Darren Thomason, MBA¹, Klaus Gottlieb, MD, PhD, JD³, David T. Rubin, MD⁴. P1067 - Segment Level Validation of a Deep Learning Model to Assess Endoscopic Severity in Ulcerative Colitis Using Regulatory-Grade Consensus, ACG 2025 Annual Scientific Meeting Abstracts. Phoenix, AZ: American College of Gastroenterology.