AN ARTIFICIAL INTELLIGENCE SYSTEM FOR POLYP RE-IDENTIFICATION IN COLONOSCOPY PROCEDURES TRAINED WITH UNLABELED DATA

INTRODUCTION
During colonoscopy procedures, it is common to lose visibility of a detected polyp, particularly during tool insertion for polyp management. In such cases, the endoscopist can spend precious clinical time finding and re-identifying the "lost" polyp. Low confidence in polyp re-identification leads to inefficiency, as the endoscopist searches for a polyp that has already been found. False re-identification may result in missing a polyp the endoscopist would have otherwise chosen to manage. The same clinical situation may occur when deciding whether to manage a polyp detected during the insertion phase or to wait until withdrawal. With AI-based polyp re-identification (ReID), we aim to automatically re-identify polyps in real time during the colonoscopy procedure, thereby alleviating the problems outlined above.

CONCLUSIONS
We developed a method of using unlabeled colonoscopy videos to train an AI ReID system, and showed encouraging initial results on a labeled evaluation set, in spite of imperfect interobserver agreement. We believe that this method, building on the combination of multi-view early fusion and self-supervised learning, may be further extended to other types of analysis of video-based medical procedures.

METHOD

RESULTS
The ReID model achieved an ROC AUC of 0.8 when tested on labeled polyp pairs. We find this initial result encouraging, particularly considering that interrater reliability was found to be imperfect: Cohen's kappa coefficient was 0.76.

ACKNOWLEDGEMENTS
M. Fruchter, T. Borreda, C. Foo, J. Widen, S. Plowman, S. Schlachter, Y. Zheng and D. Livovsky.

REFERENCES
[1] Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems 30 (2017).
[2] Jiang, Dongwei, et al. "Speech SimCLR: Combining contrastive and reconstruction objective for self-supervised speech representation learning."
arXiv preprint arXiv:2010.13991 (2020).

MULTI-IMAGE: We propose a machine-learned model trained to represent a polyp, captured in multiple images, as a vector in a latent space. This leverages the video modality, which inherently provides multi-image input for each polyp. The learned representation contains important information inferred from the polyp images, and may be directly compared to vectors representing other polyps via distance calculation.

EARLY FUSION: The multi-image input is fused with a transformer [1] architecture ("early fusion") to directly learn a single representation vector for multiple images, rather than applying "late fusion" heuristics on top of single-image representations.

SELF-SUPERVISED: The lack of labeled data for ReID is addressed with a self-supervised approach called contrastive learning, specifically SimCLR [2], which allows unlabeled data to be used for optimization. With this approach, the same polyp video segments serve as both positive and negative examples without any additional labeling. We trained the model on 11,240 such segments.

EVALUATION: Performance is evaluated on a smaller labeled set containing 444 polyp segment pairs (198 positive and 246 negative). Each pair was extracted from the same procedure and labeled by an experienced endoscopist as either the same polyp or two different polyps. We calculate the area under the curve (AUC) of the receiver operating characteristic (ROC) over the normalized vector distance for each pair. An additional 77 pairs were annotated by two experienced endoscopists to evaluate the interobserver disagreement rate.

CONTACT INFORMATION
Yotam Intrator - yotami@verily.com
Natalie Aizenberg - nataliea@verily.com

Y. Intrator, N. Aizenberg, O. Weinstein, R. Goldenberg
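The multi-view early-fusion idea from the METHOD panel can be sketched as follows. This is a minimal, illustrative stand-in, not the actual model: a single self-attention layer with random (untrained) weights replaces the trained transformer, and the feature dimension and mean pooling are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # per-frame feature dimension (illustrative, not the model's)

# Hypothetical learned query/key/value projections (random here for the sketch).
Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse(frames):
    """Early-fuse per-frame features (N, D) into one L2-normalized polyp vector:
    one self-attention layer mixes information across frames, then mean pooling
    collapses the attended frames into a single representation."""
    q, k, v = frames @ Wq, frames @ Wk, frames @ Wv
    attn = softmax(q @ k.T / np.sqrt(D))  # (N, N) cross-frame attention weights
    fused = (attn @ v).mean(axis=0)       # pool attended frames to one vector
    return fused / np.linalg.norm(fused)

def distance(frames_a, frames_b):
    """Normalized distance between two polyp segments; small = likely same polyp."""
    return float(np.linalg.norm(fuse(frames_a) - fuse(frames_b)))
```

In contrast to this early-fusion scheme, a "late fusion" heuristic would embed each frame independently and then, for example, average the per-frame distances.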
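The contrastive objective used for self-supervised training (SimCLR's NT-Xent loss) can be written compactly. A sketch under stated assumptions: `za` and `zb` hold embeddings of two views (e.g. two frame subsets) of the same batch of unlabeled polyp segments; the batch size, temperature, and view-generation scheme here are illustrative, not the paper's settings.

```python
import numpy as np

def nt_xent(za, zb, tau=0.5):
    """SimCLR NT-Xent loss: za[i] and zb[i] are embeddings of two views of the
    same unlabeled segment (the positive pair); every other segment in the
    batch serves as a negative, so no labels are needed."""
    z = np.concatenate([za, zb])                      # (2B, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarities
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    B = len(za)
    # Row i's positive is row B+i, and vice versa.
    pos = np.concatenate([np.arange(B, 2 * B), np.arange(B)])
    log_prob = sim[np.arange(2 * B), pos] - np.log(np.exp(sim).sum(axis=1))
    return float(-log_prob.mean())
```

Minimizing this loss pulls the two views of each segment together in the latent space while pushing different segments apart, which is exactly the property the distance-based ReID comparison relies on.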
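The two metrics reported in RESULTS (ROC AUC over pair scores, and Cohen's kappa for interobserver agreement) can be computed from first principles. A minimal sketch with toy numbers; the helper names and the conversion of distance to a similarity score are assumptions of this sketch.

```python
def roc_auc(pos_scores, neg_scores):
    """ROC AUC as the probability that a randomly chosen positive pair scores
    above a randomly chosen negative pair (ties count as 0.5). For ReID, a
    natural score is the negated normalized vector distance."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two binary raters (0/1 labels)."""
    n = len(rater_a)
    po = sum(x == y for x, y in zip(rater_a, rater_b)) / n  # observed agreement
    pa, pb = sum(rater_a) / n, sum(rater_b) / n
    pe = pa * pb + (1 - pa) * (1 - pb)                      # agreement expected by chance
    return (po - pe) / (1 - pe)
```

For the evaluation set described above, `pos_scores` and `neg_scores` would come from the 198 same-polyp and 246 different-polyp pairs, and the kappa inputs from the 77 doubly-annotated pairs.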