Retinal Image Registration Based on Keypoint Correspondences, Spherical Eye Modeling and Camera Pose Estimation

Carlos Hernandez-Matas 1,2, Xenophon Zabulis 1 and Antonis A. Argyros 1,2

Abstract— In this work, an image registration method for two retinal images is proposed. The method utilizes keypoint correspondences and assumes a spherical model of the eye. Image registration is treated as a pose estimation problem, which requires the estimation of the rigid transformation that relates the two images. Using this estimate, one image can be warped so that it is registered to the coordinate frame of the other. Experimental evaluation shows improved accuracy over state-of-the-art approaches, as well as robustness to noise and spurious keypoint correspondences. Experiments also indicate the method's applicability to diagnostic image enhancement and to the comparative analysis of images from different examinations.

I. INTRODUCTION

Assessment of small vessel structure and function can lead to more accurate and timely diagnosis of diseases whose common denominator is vasculopathy, e.g. hypertension and diabetes [1]. Small vessels exist in all internal and external organs. Among them, the retina provides an open and accessible window for assessing their condition. Retinal vessels are imaged through fundoscopy, an efficient and non-invasive imaging technique that is suitable for screening. Accurate image registration is of interest for the comparison of images from different examinations [2] and for the combination of multiple images into larger [3] or enhanced [4] ones.

Image registration has frequently been employed on slightly overlapping images from the same examination, to create mosaic images of large tissue areas, e.g. [3]. Small overlap increases examination efficiency, but also increases registration difficulty, as registration is then based on less data. This difficulty is tackled by strong registration cues, such as keypoint correspondences, e.g. [5].
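The keypoint-correspondence cue mentioned above can be illustrated with a minimal sketch of nearest-neighbour descriptor matching with a ratio test to suppress ambiguous (potentially spurious) matches. This is an illustration under our own assumptions, not the paper's implementation: the function names, the ratio threshold and the toy descriptors are all hypothetical, and real descriptors (e.g. SIFT) would be high-dimensional vectors extracted from the images.

```python
# Illustrative sketch (not the authors' code): matching keypoint descriptors
# between a reference and a test retinal image via nearest-neighbour search,
# with a ratio test that rejects ambiguous correspondences.
import math

def descriptor_distance(d1, d2):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d1, d2)))

def match_keypoints(desc_ref, desc_test, ratio=0.8):
    """Return index pairs (i, j) whose best match passes the ratio test."""
    matches = []
    for i, d_ref in enumerate(desc_ref):
        # Distances from this reference descriptor to every test descriptor.
        dists = sorted(
            (descriptor_distance(d_ref, d_test), j)
            for j, d_test in enumerate(desc_test)
        )
        # Accept only if the best match is clearly better than the second best.
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches

# Toy 2D descriptors: the first two have clear counterparts in the test set,
# the third is ambiguous (two near-identical candidates) and is rejected.
ref = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.5)]
test = [(1.02, 0.01), (0.01, 0.02), (0.55, 0.45), (0.56, 0.44)]
print(match_keypoints(ref, test))  # → [(0, 1), (1, 0)]
```

Loosening the ratio threshold admits the ambiguous match as well, at the cost of more spurious correspondences, which is why robustness to such correspondences (as claimed by the method) matters downstream.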
Less frequently, image registration has been employed to register images of (approximately) the same retinal region. The motivation is twofold: first, to combine images from the same examination into an image of higher resolution, facilitating more precise measurements [6], [7], [4]; second, to register images from different examinations and comparatively analyze them [2], [8].

In this work, the image registration problem refers to a pair of images, the reference and the test image. Its solution is the aligning transformation that warps the test image so that the same physical points are imaged at the same pixel coordinates as in the reference image. Henceforth, image registration methods that provide a solution by means of transformation(s) upon the image plane are characterized as "2D", while methods that account for the retina as a surface imaged from different views are characterized as "3D".

This research was made possible by a Marie Curie grant from the European Commission in the framework of the REVAMMAD ITN (Initial Training Research network), Project number 316990.
1 Institute of Computer Science, Foundation for Research and Technology – Hellas (FORTH), Heraklion, Greece.
2 Computer Science Department, University of Crete, Heraklion, Greece.
{carlos, zabulis, argyros} at ics.forth.gr

The proposed method focuses on the registration cue provided by keypoint correspondences. The additional value of other cues, e.g. edge or bifurcation matching, is acknowledged. The proposed framework is open to additional cues and their adoption is left for future work.

II. RELATED WORK

For retinal image registration, overlapping image regions have been matched using similarity of intensities over spatial regions [9] or in the frequency domain [10], keypoint feature correspondences [5], or retinal feature matching, e.g. vessel trees [11] and bifurcations [12]. Feature-based approaches are preferred in 3D approaches, as point correspondences comprise a relatively stronger cue for estimating the motion between two images and are also robust to local image differences.

Retinal image registration has been studied using both 2D and 3D transformation models. 2D models do not explicitly account for perspectivity [11], though some [12] employ non-linear transformations for this purpose. 3D models account for perspectivity, but require the shape of the imaged surface. Consideration of perspectivity improves image registration: even simple surface models, such as a planar patch, were shown to promote registration accuracy [4]. At the other end, in [5], the retinal surface is reconstructed to achieve registration. However, this requires a stereo reconstruction of the retina, which for significantly overlapping images is inaccurate due to the very short baseline.

Fundus imaging has been modeled by the pinhole camera model [5]. Usually, lens distortion has been judged negligible, due to the fine optics of fundus cameras. Visual distortions due to the cornea, the eye lens and the vitreous humor, as well as pulsation, have also been approximated as negligible. We follow these approximations as well, acknowledging that compensating for the pertinent distortions would increase the accuracy of the proposed method.

The proposed method utilizes a 3D cost optimization method that is robust to correspondence errors and copes with local minima. Efficiency is supported by a parallel implementation, and evaluation shows improved performance with respect to the state of the art. The method is open to the addition of more visual cues (e.g. due to edges or intensity).

III. METHOD

The proposed method estimates the rigid transformation {R, t} that relates the reference (F_0) and the test (F_r) image,
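As a rough illustration of this pose-estimation formulation, a candidate rigid transformation {R, t} can be scored by reprojection error under a pinhole camera and a spherical eye model: back-project reference-image keypoints onto the sphere, transform them by {R, t}, reproject into the test image, and compare against the corresponding test keypoints. This is a sketch under assumed values, not the authors' implementation: the focal length, sphere radius, viewing distance and all helper names are illustrative assumptions.

```python
# Illustrative sketch (assumed parameters, not the authors' code):
# reprojection error of a candidate pose {R, t} under a pinhole camera
# and a spherical eye model centred on the optical axis.
import math

F = 500.0       # assumed focal length in pixels
RADIUS = 12.0   # assumed eye radius (mm)
D = 60.0        # assumed distance from camera centre to sphere centre (mm)

def back_project(u, v):
    """Intersect the pixel ray through (u, v) with the sphere (near side)."""
    d = (u / F, v / F, 1.0)                    # ray direction through pinhole
    n = math.sqrt(sum(c * c for c in d))
    d = tuple(c / n for c in d)                # normalise
    # Solve |s*d - C|^2 = RADIUS^2 for ray parameter s, with C = (0, 0, D).
    b = -2.0 * d[2] * D
    c = D * D - RADIUS * RADIUS
    s = (-b - math.sqrt(b * b - 4.0 * c)) / 2.0
    return tuple(s * di for di in d)

def project(p):
    """Pinhole projection of a 3D point into pixel coordinates."""
    return (F * p[0] / p[2], F * p[1] / p[2])

def reprojection_error(R, t, kps_ref, kps_test):
    """Mean pixel distance between transformed reference and test keypoints."""
    err = 0.0
    for (u0, v0), (u1, v1) in zip(kps_ref, kps_test):
        x, y, z = back_project(u0, v0)
        # Rigid transform p' = R p + t, with R a 3x3 nested sequence.
        p = tuple(sum(R[i][j] * (x, y, z)[j] for j in range(3)) + t[i]
                  for i in range(3))
        u, v = project(p)
        err += math.hypot(u - u1, v - v1)
    return err / len(kps_ref)

# Sanity check: under the identity pose, keypoints reproject onto themselves.
I = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
kps = [(10.0, -5.0), (0.0, 0.0), (30.0, 25.0)]
print(reprojection_error(I, (0.0, 0.0, 0.0), kps, kps))  # ≈ 0.0
```

An optimizer of the kind described (robust to correspondence errors and local minima) would search over {R, t} to minimize such a cost, typically combined with an outlier-tolerant error norm rather than the plain mean used here.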