Sri Harsha Boppana, MBBS, MD1, Manaswitha Thota, MD2, Gautam Maddineni, MD3, Sachin Sravan Kumar Komati4, Sai Lakshmi Prasanna Komati, MBBS5, Aditya Chandrashekar, MBBS6, C. David Mintz, MD, PhD7

1Nassau University Medical Center, East Meadow, NY; 2Virginia Commonwealth University, Richmond, VA; 3Florida State University, Cape Coral, FL; 4Florida International University, Florida, FL; 5Government Medical College, Ongole, Ongole, Andhra Pradesh, India; 6The Johns Hopkins Hospital, Baltimore, MD; 7Johns Hopkins University School of Medicine, Baltimore, MD

Introduction: Early identification of high-risk gastrointestinal lesions enables timely intervention and improves patient outcomes. Traditional methods struggle with variable image quality and diverse clinical presentations. We developed a multimodal ensemble that integrates image embeddings with structured clinical data to rapidly stratify lesion risk.

Methods: We retrospectively sampled 5,000 lesion frames from KYUCapsule with simulated metadata (age, sex, diagnoses, medications, familial GI history). Images were resized to 224×224 pixels, normalized, and augmented via rotations, brightness shifts, and flips. A Vision Transformer (ViT-Base) pretrained on ImageNet produced 512-dimensional image embeddings. Metadata were encoded by a multilayer perceptron (128-64 neurons, ReLU), with continuous variables standardized and categorical variables one-hot encoded. Image and metadata embeddings were concatenated and fed to two classifiers: a gradient boosting machine (500 trees, max depth 6, learning rate 0.01) and a deep neural network. We trained with stratified 5-fold cross-validation, tuning hyperparameters (learning rate, batch size, dropout) via grid search with early stopping and using cross-entropy loss for the DNN and log-loss for the GBM. Final probabilities were calibrated with isotonic regression, and results on the held-out test set were interpreted using SHAP.
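The fusion and cross-validation steps described in the Methods can be sketched as follows. This is an illustrative reconstruction on synthetic data, not the authors' code: the embedding size, sample count, and tree count are deliberately downsized for the sketch, and the feature names are assumptions.

```python
# Sketch of the multimodal fusion + stratified 5-fold CV pipeline.
# Synthetic stand-ins: real ViT embeddings and patient metadata are assumed.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 300

# Stand-in for ViT image embeddings (abstract: 512-d; 32-d here for speed)
img_emb = rng.normal(size=(n, 32))

# Simulated structured metadata: age (continuous) and sex (categorical)
age = rng.integers(20, 90, size=(n, 1)).astype(float)
sex = rng.choice(["F", "M"], size=(n, 1))

# Synthetic 3-level risk label loosely tied to the features
y = (img_emb[:, 0] + 0.02 * age[:, 0] > 1.2).astype(int) + \
    (img_emb[:, 1] > 0.8).astype(int)

# Encode metadata as in the abstract: standardize continuous variables,
# one-hot encode categoricals
meta = np.hstack([
    StandardScaler().fit_transform(age),
    OneHotEncoder().fit_transform(sex).toarray(),
])

# Fusion step: concatenate image and metadata representations
X = np.hstack([img_emb, meta])

# Stratified 5-fold cross-validation with a (downsized) GBM classifier
accs = []
for tr, te in StratifiedKFold(n_splits=5, shuffle=True,
                              random_state=0).split(X, y):
    gbm = GradientBoostingClassifier(n_estimators=40, max_depth=3,
                                     learning_rate=0.1, random_state=0)
    gbm.fit(X[tr], y[tr])
    accs.append(gbm.score(X[te], y[te]))

print(round(float(np.mean(accs)), 3))
```

In the full pipeline the abstract also describes a DNN branch, grid-searched hyperparameters, and isotonic calibration of the final probabilities (e.g., scikit-learn's `CalibratedClassifierCV` with `method="isotonic"` is one common way to do the latter); those are omitted here to keep the sketch minimal.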
Results: On the test set, the ensemble achieved 94.2% accuracy, 93.7% precision, 94.5% recall, a 94.1% F1-score, and an area under the ROC curve (AUC) of 0.948. Stratified accuracy was 95.0% for low-risk, 93.8% for moderate-risk, and 93.9% for high-risk lesions. Calibration was accurate, with a Brier score of 0.055. The GBM alone attained 88.3% accuracy (AUC = 0.912), whereas the DNN alone reached 90.1% accuracy (AUC = 0.927); their ensemble improved overall robustness. SHAP analysis identified lesion texture (mean |SHAP| = 0.22), patient age (0.19), prior GI conditions (0.17), and lesion type (0.16) as the top contributors; medication use and familial history had lower impact (≤ 0.10). Inference averaged 60 ms per case, supporting near-real-time use. Performance did not differ significantly between patients over and under age 65 (93.5% vs. 94.5%, p = 0.12).

Discussion: This integrated transformer-GBM ensemble combines image morphology and clinical context to predict lesion risk accurately and rapidly. Its consistent performance across subgroups and its transparency via SHAP support clinical adoption for timely management of gastrointestinal lesions.
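The evaluation quantities reported above (accuracy, AUC, Brier score) and the soft-voting combination of the two base models can be computed as in the following sketch. The probabilities and resulting scores here are synthetic illustrations, not the abstract's results.

```python
# Illustrative computation of accuracy, ROC AUC, and Brier score for a
# probability-averaging (soft-voting) ensemble of two base models.
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score, brier_score_loss

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=500)  # synthetic binary risk labels

# Stand-ins for the GBM's and DNN's predicted probabilities of high risk
p_gbm = np.clip(y_true * 0.7 + rng.normal(0.15, 0.2, 500), 0.0, 1.0)
p_dnn = np.clip(y_true * 0.6 + rng.normal(0.20, 0.2, 500), 0.0, 1.0)

# Soft-voting ensemble: average the two models' probabilities
p_ens = (p_gbm + p_dnn) / 2
y_pred = (p_ens >= 0.5).astype(int)

acc = accuracy_score(y_true, y_pred)
auc = roc_auc_score(y_true, p_ens)        # threshold-free discrimination
brier = brier_score_loss(y_true, p_ens)   # calibration: lower is better

print(f"accuracy={acc:.3f}  AUC={auc:.3f}  Brier={brier:.3f}")
```

The Brier score is the mean squared difference between predicted probabilities and outcomes, so a value near 0.055, as reported, indicates well-calibrated probabilities rather than merely accurate hard labels.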
Disclosures: Sri Harsha Boppana indicated no relevant financial relationships. Manaswitha Thota indicated no relevant financial relationships. Gautam Maddineni indicated no relevant financial relationships. Sachin Sravan Kumar Komati indicated no relevant financial relationships. Sai Lakshmi Prasanna Komati indicated no relevant financial relationships. Aditya Chandrashekar indicated no relevant financial relationships. C. David Mintz indicated no relevant financial relationships.
Sri Harsha Boppana, MBBS, MD1, Manaswitha Thota, MD2, Gautam Maddineni, MD3, Sachin Sravan Kumar Komati4, Sai Lakshmi Prasanna Komati, MBBS5, Aditya Chandrashekar, MBBS6, C. David Mintz, MD, PhD7. P5130 - Multimodal AI Model for Predicting High-Risk Patients Requiring Capsule Endoscopy: A Fusion of Medical History and Imaging, ACG 2025 Annual Scientific Meeting Abstracts. Phoenix, AZ: American College of Gastroenterology.