PUC Chile team at Caption Prediction: ResNet visual encoding and caption classification with Parametric ReLU

Vicente Castro, Pablo Pino, Denis Parra and Hans Lobel

Pontificia Universidad Católica de Chile, Av. Vicuña Mackenna 4860, Macul, 7820244, Chile

Abstract
This article describes the PUC Chile team’s participation in the Caption Prediction task of the ImageCLEFmedical challenge 2021, which resulted in the team winning this task. We first show how a very simple approach based on statistical analysis of captions, without relying on images, results in a competitive baseline score. Then, we describe how to improve the performance of this preliminary submission by encoding the medical images with a ResNet CNN, pre-trained on ImageNet and later fine-tuned with the challenge dataset. Afterwards, we use this visual encoding as the input for a multi-label classification approach for caption prediction. We describe in detail our final approach, and we conclude by discussing some ideas for future work.

Keywords
Image Captioning, Medical Artificial Intelligence, Deep Learning, Perceptual Similarity, Convolutional Neural Networks

1. Introduction

ImageCLEF [1] is an initiative with the aim of advancing the field of image retrieval (IR) as well as enhancing the evaluation of technologies for annotation, indexing and retrieval of visual data. The initiative takes the form of several challenges, and it is especially aware of the changes in the IR field in recent years, which have brought about tasks requiring the use of different types of data, such as text, images and other features, moving towards multi-modality. ImageCLEF has been running annually since 2003, and medical images have been involved in some tasks, such as medical image retrieval, since the second edition (2004). Since those editions, the ImageCLEFmedical challenge group of tasks [2] has integrated new tasks involving medical images, with the medical image captioning task taking place since 2017.
It consists of two subtasks: concept detection and caption prediction. Although the data used has changed in the newest versions of the challenge, the goal of this task remains the same: to help physicians reduce the burden of manually translating visual medical information (such as radiology images) into textual descriptions. In particular, the caption prediction task within the ImageCLEFmedical challenge 2021 aims at supporting clinicians in their responsibility to provide clinical diagnoses by composing coherent captions for the entirety of a medical image.

CLEF 2021 – Conference and Labs of the Evaluation Forum, September 21–24, 2021, Bucharest, Romania
vvcastro@uc.cl (V. Castro)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073