Sheza Malik, MD1, Umair Rasheed, MS2, Dushyant S. Dahiya, MD3, Umar Hayat, MD4, Jason Gutman, MD5
1Emory University, Atlanta, GA; 2Volkswagen Group of America, Belmont, CA; 3University of Kansas School of Medicine, Kansas City, KS; 4Geisinger Wyoming Valley Medical Center, Wilkes-Barre, PA; 5Rochester General Hospital, Rochester, NY
Introduction: Gastroenterological emergencies such as acute pancreatitis, Crohn’s disease flare-ups, ischemic colitis, and severe Clostridioides difficile colitis demand swift, guideline-adherent decision-making, often in high-pressure clinical environments. Large language models (LLMs) like ChatGPT and DeepSeek have shown promise in synthesizing medical knowledge but struggle with complex, multimorbid scenarios. This study introduces a collaborative, iterative framework in which two LLMs refine each other’s clinical management plans to enhance decision quality and guideline adherence.
Methods: Four representative, high-acuity gastroenterological cases were developed to simulate real-world complexity. Each case was input independently into ChatGPT (GPT-4) and DeepSeek, and each model produced an initial management plan. The plans were then exchanged between the models for critique, improvement, and consensus development over up to five iterative rounds. Outcome metrics were the Guideline Adherence Score (GAS; 0–5 scale), assessed by expert gastroenterologists and fellows, and inter-model agreement, measured with Cohen’s kappa. Thematic analysis using NVivo 12 identified recurring patterns in the refinements.
Results: Iterative critique significantly improved guideline adherence across all cases, with mean GAS increasing from 3.0 ± 0.8 to 5.0 ± 0.0 (p < 0.001). Initial discrepancies in antibiotic selection, imaging timing, and procedural decisions were resolved through consensus, achieving perfect inter-model agreement post-refinement (κ = 1.0). Thematic analysis highlighted improvements in four clinical domains: resuscitation protocols, infection control, procedural decision-making, and long-term management. Notably, AI-generated plans incorporated guideline-consistent strategies such as delayed empiric antibiotic use in pancreatitis, stress-dose steroids for adrenal crisis in Crohn’s flare, and appropriate anticoagulation reversal in ischemic colitis.
Discussion: Collaborative LLM refinement offers a scalable, effective strategy for improving clinical decision-making in complex gastroenterological emergencies. By emulating peer review and multidisciplinary deliberation, this framework enhances accuracy, consistency, and adherence to clinical guidelines. Integration into real-time clinical workflows and further validation in prospective clinical settings could pave the way for AI-augmented precision care.
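For readers unfamiliar with the inter-model agreement measure named in the Methods, the following is a minimal Python sketch of Cohen's kappa for two raters. The abstract does not describe how the models' management decisions were coded, so the function name `cohens_kappa`, the decision labels, and the item-by-item encoding are illustrative assumptions and not the authors' pipeline.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    marginal label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both raters must label the same non-empty set of items")

    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over labels of the product of marginal proportions.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined (0/0).
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical coded decisions for one case (labels invented for illustration).
model_a = ["ct_now", "no_antibiotics", "ercp", "ct_now"]
model_b = ["ct_later", "no_antibiotics", "ercp", "ct_now"]

kappa_before = cohens_kappa(model_a, model_b)  # partial agreement
kappa_after = cohens_kappa(model_a, model_a)   # identical plans
```

In this toy input the two plans differ on one of four decisions, so kappa falls below 1; identical plans that span more than one label give kappa equal to 1, which is the situation the Results describe after refinement.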
Disclosures: Sheza Malik indicated no relevant financial relationships. Umair Rasheed indicated no relevant financial relationships. Dushyant Dahiya indicated no relevant financial relationships. Umar Hayat indicated no relevant financial relationships. Jason Gutman indicated no relevant financial relationships.
Sheza Malik, MD1, Umair Rasheed, MS2, Dushyant S. Dahiya, MD3, Umar Hayat, MD4, Jason Gutman, MD5. P6198 - Integrating Iterative Large Language Model Collaboration Into Complex Gastrointestinal Case Management: A Pilot Evaluation, ACG 2025 Annual Scientific Meeting Abstracts. Phoenix, AZ: American College of Gastroenterology.