A deep learning approach for detecting and correcting highlights in endoscopic images
Antonio Rodríguez-Sánchez 1, Daly Chea 1, George Azzopardi 2 and Sebastian Stabinger 1
1 Intelligent and Interactive Systems, Department of Computer Science, University of Innsbruck, Innsbruck, Austria. Email: antonio.rodriguez-sanchez@uibk.ac.at
2 Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen, Groningen, the Netherlands
Abstract— The image of an object changes dramatically depending on the lighting conditions surrounding that object. Shadows, reflections and highlights can make the object very difficult for an automatic system to recognize. Additionally, images used in medical applications, such as endoscopic images and videos, contain a large amount of such reflective components. This can make it harder for experts to analyze such videos and images. It can therefore be useful to detect, and possibly correct, the locations where those highlights occur. In this work we designed a Convolutional Neural Network for that task. We trained the network on a dataset that contains ground-truth highlights, showing that those reflective elements can be learnt and thus located and extracted. We then used the trained network to localize and correct the highlights in endoscopic images from the El Salvador Atlas of Gastrointestinal videos, obtaining promising results.
Keywords— Image processing theory, Image processing tools, Image processing applications, Template, Typesetting.
I. INTRODUCTION
Specular and diffuse reflections present in images can be a nuisance for algorithms dealing with stereo matching, segmentation, tracking, object recognition and other applications. The appearance of a surface can vary significantly in the presence of reflected light. For those applications, reflections may cover surface details and appear as additional features that are not intrinsic to the object.
Highlights can have more serious consequences when they are present in medical images. They can interfere with the correct evaluation of an image or video, or at least make that evaluation more difficult for an expert. One example is cervical cancer screening, that is, the detection of precancerous lesions during digital colposcopy. These images contain reflective components that generally appear as bright spots heavily saturated with white light. Another example is endoscopic examination, where pictures from the inside of the human body are displayed on a computer monitor. They often contain large areas with light reflections. Usually the physician can avoid these highlights by changing the perspective, turning the tip of the endoscope. However, this solution is not effective in the case of a camera-in-pill examination because it is not possible to force the pill to move to a better position.
Fig. 1: Endoscopic images obtained from the El Salvador Atlas of Gastrointestinal videos.
There is a large body of work on highlight segmentation and removal in computer vision. Most of it deals with the separation between diffuse and specular reflectance. These approaches can be classified by whether they use a polarization reflectance model [1], color spaces [2], segmentation [3], diffusion [4] or a multiview (stereo, motion) approach [5], [6]. Concerning medical images, some algorithms deal only with specular highlights and thus apply intensity thresholding [7], [8], [9]. These thresholding methods use either a fixed range of intensity values or adaptive thresholds, the latter in order to overcome the problem of having to deal with different thresholds. The main problem of these thresholding methods is the over- or under-estimation of highlight areas.
978-1-5386-1842-4/17/$31.00 © 2017 European Union
Another group of methods relies either on averaging