Sri Harsha Boppana, MBBS, MD1, Manaswitha Thota, MD2, Gautam Maddineni, MD3, Sachin Sravan Kumar Komati4, Sarath Chandra Ponnada5, Sai Lakshmi Prasanna Komati, MBBS6, Aditya Chandrashekar, MBBS7, C. David Mintz, MD, PhD8

1Nassau University Medical Center, East Meadow, NY; 2Virginia Commonwealth University, Richmond, VA; 3Florida State University, Cape Coral, FL; 4Florida International University, FL; 5Great Eastern Medical School and Hospital, Srikakulam, Andhra Pradesh, India; 6Government Medical College, Ongole, Andhra Pradesh, India; 7The Johns Hopkins Hospital, Baltimore, MD; 8Johns Hopkins University School of Medicine, Baltimore, MD

Introduction: Accurate endoscopic pathology classification depends on time-consuming expert annotation, which limits large-scale deployment. We pretrained a Vision Transformer with a decorrelation-based self-supervised framework on approximately 100,000 unlabeled images to learn high-fidelity representations without manual labels.

Methods: We assembled 99,417 unlabeled endoscopic images from the HyperKvasir repository. A self-supervised encoder based on the Barlow Twins architecture processed paired augmentations of each image to learn invariant, non-redundant feature representations: a Vision Transformer backbone extracted high-level embeddings, which a projection network refined under a decorrelation loss that drives the cross-correlation matrix of the two augmented views toward the identity (a minimal sketch of this loss appears after the Discussion). Following pretraining, we fine-tuned the encoder with a multilayer perceptron classifier on 10,662 manually labeled images spanning 23 gastrointestinal conditions, setting aside 20% of the labeled data for testing (see the evaluation sketch after the Discussion). Performance metrics included accuracy, precision, recall, weighted F1-score, and area under the receiver operating characteristic curve (AUC).

Results: The fine-tuned encoder achieved 87.6% accuracy and a weighted F1-score of 0.87 on the held-out test set, and maintained a mean AUC of 0.99 across all 23 categories. Esophagitis, Barrett's esophagus, and colonic polyps attained class-level F1-scores above 0.95, and performance remained consistent across both prevalent and rare conditions. t-SNE projections of the learned embeddings revealed distinct, well-separated clusters, and the confusion matrix exhibited minimal misclassification among closely related lesions. These results demonstrate that decorrelation-based self-supervision can deliver clinically meaningful representations while substantially reducing the need for extensive manual annotation.

Discussion: Fine-tuning on 10,662 expert-annotated images yielded 87.6% accuracy, a weighted F1-score of 0.87, and a mean AUC of 0.99 across 23 gastrointestinal conditions, with Barrett's esophagus and colonic polyps exceeding class-specific F1-scores of 0.95. Feature-space visualization and confusion-matrix analysis confirmed clear separation of disease categories and minimal overlap among similar lesions. By sharply reducing annotation demands, decorrelation-based self-supervision lays a scalable, practical foundation for deploying AI-driven diagnostic support in everyday endoscopic practice.
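The abstract does not publish code, but the decorrelation objective it describes follows the standard Barlow Twins formulation (Zbontar et al., 2021). The sketch below, assuming PyTorch, illustrates that loss; the class and parameter names (BarlowTwinsLoss, lambda_offdiag) are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class BarlowTwinsLoss(nn.Module):
    """Decorrelation loss over projections of two augmented views of the same images."""

    def __init__(self, lambda_offdiag: float = 5e-3):
        super().__init__()
        # Weight on the redundancy-reduction (off-diagonal) term; value is an assumption.
        self.lambda_offdiag = lambda_offdiag

    def forward(self, z_a: torch.Tensor, z_b: torch.Tensor) -> torch.Tensor:
        # z_a, z_b: (batch, dim) projection-network outputs for the two views.
        n, d = z_a.shape
        # Standardize each embedding dimension across the batch.
        z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-6)
        z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-6)
        # Cross-correlation matrix between the two views: (dim, dim).
        c = (z_a.T @ z_b) / n
        # Invariance term: pull diagonal entries toward 1 so matched features agree.
        on_diag = (torch.diagonal(c) - 1).pow(2).sum()
        # Redundancy-reduction term: push off-diagonal entries toward 0 to decorrelate features.
        off_mask = ~torch.eye(d, dtype=torch.bool, device=c.device)
        off_diag = c[off_mask].pow(2).sum()
        return on_diag + self.lambda_offdiag * off_diag
```

In the pipeline described in the Methods, z_a and z_b would be the projection-network outputs for paired augmentations of the same unlabeled HyperKvasir batch, with the Vision Transformer backbone supplying the pre-projection embeddings.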
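Likewise, a minimal sketch of the fine-tuning head and the reported evaluation protocol, under stated assumptions: a 768-dimensional pooled ViT feature, an MLP hidden width of 512 (not specified in the abstract), and scikit-learn metrics. All names here (FineTuneHead, evaluate) are hypothetical.

```python
import numpy as np
import torch.nn as nn
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

class FineTuneHead(nn.Module):
    """MLP classifier appended to the pretrained encoder for the 23-class task."""

    def __init__(self, in_dim: int = 768, n_classes: int = 23):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 512),  # hidden width is an assumption
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(512, n_classes),
        )

    def forward(self, features):
        return self.mlp(features)

def evaluate(y_true: np.ndarray, y_prob: np.ndarray) -> dict:
    """Metrics named in the Methods, computed on the 20% held-out test split.

    y_true: integer labels for the 23 classes; y_prob: (n, 23) predicted probabilities.
    """
    y_pred = y_prob.argmax(axis=1)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision_weighted": precision_score(y_true, y_pred, average="weighted", zero_division=0),
        "recall_weighted": recall_score(y_true, y_pred, average="weighted", zero_division=0),
        "f1_weighted": f1_score(y_true, y_pred, average="weighted"),
        # Mean one-vs-rest AUC across classes, as reported in the Results.
        "mean_auc_ovr": roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro"),
    }
```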
Disclosures: Sri Harsha Boppana indicated no relevant financial relationships. Manaswitha Thota indicated no relevant financial relationships. Gautam Maddineni indicated no relevant financial relationships. Sachin Sravan Kumar Komati indicated no relevant financial relationships. Sarath Chandra Ponnada indicated no relevant financial relationships. Sai Lakshmi Prasanna Komati indicated no relevant financial relationships. Aditya Chandrashekar indicated no relevant financial relationships. C. David Mintz indicated no relevant financial relationships.
Sri Harsha Boppana, MBBS, MD1, Manaswitha Thota, MD2, Gautam Maddineni, MD3, Sachin Sravan Kumar Komati4, Sarath Chandra Ponnada5, Sai Lakshmi Prasanna Komati, MBBS6, Aditya Chandrashekar, MBBS7, C. David Mintz, MD, PhD8. P5122 - Barlow Twins Self-Supervised Vision Transformer for Robust Classification of Gastrointestinal Endoscopic Images, ACG 2025 Annual Scientific Meeting Abstracts. Phoenix, AZ: American College of Gastroenterology.