Sri Harsha Boppana, MBBS, MD1, Manaswitha Thota, MD2, Gautam Maddineni, MD3, Sachin Sravan Kumar Komati, 4, Sarath Chandra Ponnada, 5, Sai Lakshmi Prasanna Komati, MBBS6, C. David Mintz, MD, PhD7
1Nassau University Medical Center, East Meadow, NY; 2Virginia Commonwealth University, Richmond, VA; 3Florida State University, Cape Coral, FL; 4Florida International University, Florida, FL; 5Great Eastern Medical School and Hospital, Srikakulam, Srikakulam, Andhra Pradesh, India; 6Government Medical College, Ongole, Ongole, Andhra Pradesh, India; 7Johns Hopkins University School of Medicine, Baltimore, MD
Introduction: Colorectal polyp characterization guides surveillance and treatment but depends heavily on subjective endoscopic judgment. Automated image-based classifiers have improved static lesion assessment, yet they often ignore dynamic morphological changes and patient factors.
Methods: We used the ERC PMP-v5 dataset, which comprises high-resolution endoscopic videos, expert annotations, and clinical data from 217 patients. Video frames were sampled every six seconds; polyp areas were cropped and resized to 224×224 pixels. Each frame was linked to tabular variables (size, dysplasia grade, demographics). Data were split at the patient level: 70% training, 15% validation, 15% test. Our pipeline fine-tuned a Vision Transformer (ViT-B/16) pretrained on ImageNet to extract spatial embeddings. Sequences of eight consecutive frame embeddings were fed into a four-layer Transformer encoder to capture temporal changes in morphology. On top of this representation, we added (a) an LSTM regression head to predict polyp size trajectories and (b) a dense classification head to predict morphological transitions (e.g., tubular → villous → adenocarcinoma). For malignancy risk, ViT features were fused with patient metadata via a fully connected network. All components were optimized end-to-end with AdamW (learning rate 1×10⁻⁴), cosine annealing, and early stopping on validation loss.
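The patient-level split described in the Methods keeps every frame from a given patient in a single subset, so near-identical frames from one video cannot appear in both training and test data. The following is a minimal Python sketch of that idea, not the authors' code: the function name `split_by_patient`, the input format (one patient ID per frame), the fixed seed, and the rounding rule are all illustrative assumptions.

```python
import random


def split_by_patient(frame_patient_ids, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split frame indices into train/val/test without sharing patients.

    frame_patient_ids: one patient ID per frame (hypothetical input format).
    Returns (split, subset_of): frame indices per subset, and the subset
    assigned to each patient.
    """
    patients = sorted(set(frame_patient_ids))
    random.Random(seed).shuffle(patients)  # reproducible shuffle of patients

    n_train = round(fractions[0] * len(patients))
    n_val = round(fractions[1] * len(patients))

    subset_of = {}
    for rank, pid in enumerate(patients):
        if rank < n_train:
            subset_of[pid] = "train"
        elif rank < n_train + n_val:
            subset_of[pid] = "val"
        else:
            subset_of[pid] = "test"  # the remainder goes to the test subset

    # Every frame inherits the subset of the patient it belongs to.
    split = {"train": [], "val": [], "test": []}
    for frame_idx, pid in enumerate(frame_patient_ids):
        split[subset_of[pid]].append(frame_idx)
    return split, subset_of


# Toy usage: 217 synthetic patient IDs with three frames each (not study data).
frame_ids = [pid for pid in range(217) for _ in range(3)]
split, subset_of = split_by_patient(frame_ids)
```

Shuffling patients rather than frames is the design choice that matters here; the subset proportions then hold over patients and only approximately over frames when patients contribute different numbers of frames.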
Performance metrics included mean absolute error (MAE), classification accuracy, area under the ROC curve (AUC), sensitivity, and specificity.
Results: The size-prediction model achieved an MAE of 0.0689 cm on held-out data. Morphology classification reached 85% accuracy with an F1 score of 0.84. The fusion network distinguished malignant from benign lesions with 99.89% accuracy, AUC = 0.997, sensitivity = 98.5%, and specificity = 99.2%. Risk stratification aligned strongly with histopathology: 92% of high-risk predictions corresponded to confirmed malignancy. Recurrence prediction yielded a balanced accuracy of 94%. Inference ran at 18 frames per second on a single GPU, enabling real-time frame-level analysis and decision support.
Discussion: Our pipeline achieved submillimeter accuracy in size forecasting and near-perfect discrimination between benign and malignant lesions. Integrating temporal morphology with patient information outperformed models limited to single-frame analysis. Running at 18 frames per second on standard hardware, the framework can be deployed directly in endoscopy suites, offering immediate decision support and the potential to streamline polyp management.
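The classification and regression metrics reported above follow standard definitions. As a hedged illustration, the Python sketch below computes sensitivity, specificity, accuracy, balanced accuracy, and MAE from predictions; the helper names `binary_metrics` and `mean_absolute_error` are hypothetical, and the labels and sizes are toy values rather than study data.

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = malignant, 0 = benign).

    Assumes both classes are present in y_true, so no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)

    sensitivity = tp / (tp + fn)  # fraction of malignant lesions detected
    specificity = tn / (tn + fp)  # fraction of benign lesions cleared
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(pairs),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def mean_absolute_error(true_sizes, predicted_sizes):
    """Average absolute difference, in the same unit as the inputs (e.g. cm)."""
    errors = [abs(t - p) for t, p in zip(true_sizes, predicted_sizes)]
    return sum(errors) / len(errors)


# Toy usage with made-up labels and sizes (not study data).
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 1])
mae_cm = mean_absolute_error([1.0, 0.5, 0.8], [0.9, 0.6, 0.8])
```

When all four classification metrics are computed on the same evaluation set, accuracy is a prevalence-weighted average of sensitivity and specificity and therefore lies between them, whereas balanced accuracy weights the two equally regardless of class prevalence.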
Disclosures: Sri Harsha Boppana indicated no relevant financial relationships. Manaswitha Thota indicated no relevant financial relationships. Gautam Maddineni indicated no relevant financial relationships. Sachin Sravan Kumar Komati indicated no relevant financial relationships. Sarath Chandra Ponnada indicated no relevant financial relationships. Sai Lakshmi Prasanna Komati indicated no relevant financial relationships. C. David Mintz indicated no relevant financial relationships.
Sri Harsha Boppana, MBBS, MD1, Manaswitha Thota, MD2, Gautam Maddineni, MD3, Sachin Sravan Kumar Komati, 4, Sarath Chandra Ponnada, 5, Sai Lakshmi Prasanna Komati, MBBS6, C. David Mintz, MD, PhD7. P4778 - Real-Time Multimodal Transformer Pipeline for Dynamic Prediction of Colorectal Polyp Growth and Malignancy Risk, ACG 2025 Annual Scientific Meeting Abstracts. Phoenix, AZ: American College of Gastroenterology.