University of California San Francisco, San Francisco, CA
Award: ACG Presidential Poster Award
Goktug Onal; Aryan Ayati, MD, MPH; Mao-Yuan Chen, MD; Anshu Mukherjee; Vivek Rudrapatna, MD, PhD. University of California San Francisco, San Francisco, CA
Introduction: Diagnostic delays in inflammatory bowel disease (IBD) are common and associated with worse outcomes and increased healthcare costs. We evaluated whether machine learning models using electronic health record (EHR) data could identify undiagnosed IBD patients earlier than standard care.
Methods: We conducted a single-center retrospective study from 2012 to 2024. Using the GPT-4o model for data extraction and manual chart reviews, we identified 324 patients who had IBD-related symptoms and were later diagnosed with IBD at our center. We sampled 4,000 controls who also had documented IBD-related symptoms but no subsequent IBD diagnosis. Patients entered the risk cohort after their first encounter for an IBD symptom. Cases exited the cohort at the earliest occurrence of one of four events: a fecal calprotectin order, a colonoscopy procedure, a GI referral order, or an IBD diagnosis code. Data collected after the cohort exit date were excluded, so that models would learn to make timely and clinically useful predictions of undiagnosed IBD. Structured EHR inputs were collected for each patient visit and transformed into 7,700 time-resolved variables representing clinical activity before the index date. LASSO regression was used to select 230 predictors for the models. Four models (logistic regression, CatBoost, Transformer, and Mamba) were trained using an 80:10:10 train/test/validation split. Each model generated a risk score at every visit. Patients were classified as positive if their conditional probability of IBD exceeded 90%.
Results: The Mamba model had the highest performance: accuracy 95%, positive predictive value (PPV) 75%, false-positive rate 0.5%, specificity 99%, and sensitivity 28% (Fig. 1). Alerts preceded chart diagnosis by a median of 37 days (IQR 15–191).
The most predictive features were age and patterns of accumulating diagnoses over time (Fig. 2). Manual review of the model predictions found one patient who was flagged 3.8 years before her diagnosis date. She had been referred to GI for hematochezia but was lost to follow-up for several years.
Discussion: A deep-learning model using EHR data identified a clinically meaningful subset of as-yet-undiagnosed IBD patients months to years earlier than standard care. Although sensitivity was modest, the model flagged patients with a low false-positive rate. This supports its use as an early triage tool, particularly in high-volume settings where earlier GI evaluation may improve outcomes.
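The cohort exit rule described in the Methods can be sketched as follows. This is a minimal illustration and not the authors' code: the event names, the function names, and the choice to keep all visits for patients with no exit event are assumptions, since the abstract does not describe the data schema or how controls were handled.

```python
from datetime import date
from typing import Dict, List, Optional

# Hypothetical keys for the four exit events named in the Methods.
EXIT_EVENTS = (
    "fecal_calprotectin_order",
    "colonoscopy_procedure",
    "gi_referral_order",
    "ibd_diagnosis_code",
)


def cohort_exit_date(event_dates: Dict[str, Optional[date]]) -> Optional[date]:
    """Return the earliest date among the four exit events, or None if none occurred."""
    observed = [event_dates.get(name) for name in EXIT_EVENTS]
    observed = [d for d in observed if d is not None]
    return min(observed) if observed else None


def drop_after_exit(visit_dates: List[date], exit_date: Optional[date]) -> List[date]:
    """Exclude visits dated after the exit date.

    Assumption: patients with no exit event keep all of their visits.
    """
    if exit_date is None:
        return list(visit_dates)
    return [d for d in visit_dates if d <= exit_date]


# Toy example with made-up dates.
events = {
    "gi_referral_order": date(2020, 3, 1),
    "colonoscopy_procedure": date(2020, 5, 1),
    "fecal_calprotectin_order": None,
}
visits = [date(2020, 1, 1), date(2020, 3, 1), date(2020, 4, 1)]
exit_date = cohort_exit_date(events)
kept_visits = drop_after_exit(visits, exit_date)
```

Truncating inputs at the first diagnostic action means a model cannot rely on signals, such as a GI referral, that already show a clinician suspects IBD.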
Figure 1: Receiver Operating Characteristic (ROC) curve of the predictive model. The model achieved an area under the curve (AUC) of 0.772, indicating good discriminative performance in distinguishing patients likely to be diagnosed with IBD. The ROC curve plots true positive rate against false positive rate across varying thresholds, with the dashed diagonal representing random classification.
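Each point on an ROC curve corresponds to one decision threshold, and the metrics reported in the Results describe the single point where the predicted probability exceeds 90%. The sketch below shows how those metrics follow from a confusion matrix at a fixed threshold. It is illustrative only: the function name and toy data are hypothetical, and it assumes one probability per patient, whereas the abstract does not say how per-visit risk scores were combined into a patient-level prediction.

```python
from typing import Dict, Sequence


def metrics_at_threshold(
    probs: Sequence[float], labels: Sequence[int], threshold: float = 0.90
) -> Dict[str, float]:
    """Compute confusion-matrix metrics when 'positive' means prob > threshold."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted_positive = p > threshold  # "exceeded 90%" is a strict inequality
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        # Return NaN rather than raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),          # true positive rate
        "specificity": ratio(tn, tn + fp),
        "false_positive_rate": ratio(fp, fp + tn),  # equals 1 - specificity
        "ppv": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Toy data, not study results: two cases and two controls.
toy_probs = [0.95, 0.50, 0.20, 0.10]
toy_labels = [1, 1, 0, 0]
toy_metrics = metrics_at_threshold(toy_probs, toy_labels)
```

A high threshold such as 0.90 sits at the lower-left end of the ROC curve, trading sensitivity for a low false-positive rate.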
Figure 2: Integrated Gradients heatmap showing the top 20 features contributing to model predictions across the first 40 patient visits. Red tones indicate features that positively contributed to the prediction, while blue tones indicate negative contributions.
Disclosures: Goktug Onal indicated no relevant financial relationships. Aryan Ayati indicated no relevant financial relationships. Mao-Yuan Chen indicated no relevant financial relationships. Anshu Mukherjee indicated no relevant financial relationships. Vivek Rudrapatna: Acucare – Advisory Committee/Board Member. Blueprint Medicines – Grant/Research Support. Data Unite – Advisory Committee/Board Member. Genentech – Grant/Research Support. Ironwood – Payment or honoraria for lectures, presentations, speakers bureaus, manuscript writing or educational events. Merck – Grant/Research Support. Microsoft – Grant/Research Support. Mitsubishi Tanabe – Grant/Research Support. Natera – Payment or honoraria for lectures, presentations, speakers bureaus, manuscript writing or educational events. Stryker – Grant/Research Support. Takeda – Grant/Research Support. ZebraMD – Advisory Committee/Board Member.
Goktug Onal; Aryan Ayati, MD, MPH; Mao-Yuan Chen, MD; Anshu Mukherjee; Vivek Rudrapatna, MD, PhD. P1181 - Reducing Diagnostic Delays in Inflammatory Bowel Disease Using Electronic Health Records and Longitudinal Deep Learning, ACG 2025 Annual Scientific Meeting Abstracts. Phoenix, AZ: American College of Gastroenterology.