Retinal Image Registration Through Simultaneous Camera Pose and Eye Shape Estimation

Carlos Hernandez-Matas 1,2, Xenophon Zabulis 1 and Antonis A. Argyros 1,2

1 Institute of Computer Science, Foundation for Research and Technology – Hellas (FORTH), Heraklion, Greece.
2 Computer Science Department, University of Crete, Heraklion, Greece.
{carlos, zabulis, argyros} at ics.forth.gr

Abstract—In this paper, a retinal image registration method is proposed. The approach utilizes keypoint correspondences and assumes that the human eye has a spherical or ellipsoidal shape. The image registration problem amounts to solving a camera 3D pose estimation problem and, simultaneously, an eye 3D shape estimation problem. The camera pose estimation problem is solved by estimating the relative pose between the views from which the images were acquired. The eye shape estimation problem parameterizes the shape and orientation of an ellipsoidal model of the eye. Experimental evaluation shows a 17.91% reduction of the registration error and a 47.52% reduction of the error standard deviation over state-of-the-art methods.

I. INTRODUCTION

Assessment of small vessels in vivo can promote the diagnosis and monitor the evolution of diseases that present strong vasculopathy, such as diabetes or hypertension [1]. The eye, and the retina in particular, allows for non-invasive observation of the microvascular circulation via fundoscopy [2]. Image registration can assist greatly in that direction. It aims at warping a test image to the coordinate frame of a reference image, so that corresponding points are imaged at the same locations. For images acquired during the same session, if they present small overlap, registration can be utilized for creating mosaics that image larger areas of the retina [3], [4], [5]. If the overlap is large, the images can be combined into images of higher resolution and definition [6], [7], [8], promoting more accurate measurements. Images acquired at different sessions allow for longitudinal studies of the retina [9], [10], which enable monitoring disease progression.

Besides being a useful clinical tool, retinal image registration is also a challenging problem, as images acquired at different times or from different viewpoints can present illumination, color, and contrast changes, as well as potentially small overlapping areas. Since the support of medical diagnoses requires precise measurements, the requirements on registration accuracy are very high.

II. RELATED WORK

Image registration methods utilize the parts of the observed scene that are commonly visible in the image pair to be registered. This information extraction is performed globally, locally, or using a mixture of both. Global methods are based on similarity of intensities, with retinal registration methods usually relying on mutual information [11], [12]. Local methods extract information from localized features, such as keypoint correspondences [8], [13], [14], [15], [16], [17], vessel trees [18] and bifurcations [4], [19], [20], [21]. Recently, hybrid methods have been gaining traction [22], [23]. The transformation between the images can be estimated on the basis of either 2D or 3D models. 2D methods do not explicitly account for perspective, but compensate for it by utilizing non-linear transformations [11], [13], [14], [23]. These transformations do not account for the shape and size of the eye. 3D models enable metric measurements in 3D that lack perspective distortion. Simple eye models have been shown to provide accurate registration [16], [17]. In this work, we propose an accurate and robust retinal image registration method that is local and utilizes a 3D transformation model.
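To make the 2D transformation models used by local, correspondence-based methods concrete, the following sketch estimates a 3x3 projective homography from point correspondences with the basic (unnormalized) DLT. This is a minimal NumPy illustration, not code from any of the cited methods; the function names are ours, and Hartley-style normalization is omitted since the synthetic data are assumed noise-free.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate H (3x3) such that dst ~ H @ src, from >= 4 point
    correspondences, via the direct linear transform (DLT)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on vec(H).
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map Nx2 points through H with perspective division."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = (H @ pts_h.T).T
    return mapped[:, :2] / mapped[:, 2:3]
```

In practice such a model would be fit to detected keypoint matches; here it only illustrates why purely 2D projective or non-linear warps carry no information about the eye's 3D shape or size.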
The main improvement over [16], [17] is the utilization of an ellipsoidal eye model whose shape parameters are estimated simultaneously with the pose estimate that enables image registration. Other improvements include the utilization of SIFT [24] keypoints instead of SURF [25] and the introduction of a pose estimation initialization step.

III. METHOD

The proposed method (Figure 1) registers the reference (F_0) and test (F_t) images by simultaneously estimating the relative pose of the cameras that acquired the images, as well as the 3D shape and 3D orientation of an ellipsoidal eye model. The eye model has semi-axes [a, b, c] and rotations [r_a, r_b, r_c] about said semi-axes, leading to surface E. If a static camera is assumed, the pose estimate can be calculated as the pose transformation of the retina between the two frames. The eye model is centered at c_s = [0, 0, 0]^T. A calibrated camera for F_0 is located at c_c = [0, 0, -δ]^T. K_c and K_t are the intrinsic camera matrices for F_0 and F_t. Point correspondences between the images are utilized to achieve this registration. An initial pose estimate is calculated utilizing RANSAC and a spherical model. Subsequently, Particle Swarm Optimization (PSO) is utilized to refine this pose, as well as to estimate the lengths of the semi-axes of the ellipsoidal model and their rotations. Three variants of the eye model are formulated and experimentally validated.

A. Eye Models

Three models are utilized in this work. The baseline model is spherical, as utilized in our previous works [16], [17].

978-1-4577-0220-4/16/$31.00 ©2016 IEEE
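The initial pose estimate above is obtained with RANSAC. The sketch below illustrates the RANSAC principle on a simpler stand-in problem, robustly fitting a 2D similarity transform to correspondences contaminated with gross outliers. It is not the paper's spherical-model pose estimator; the minimal sample size (2), threshold, and function names are illustrative assumptions of ours.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares fit of x' = a*x - b*y + tx, y' = b*x + a*y + ty.
    Returns the parameter vector [a, b, tx, ty]."""
    A, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, -y, 1, 0]); rhs.append(u)
        A.append([y, x, 0, 1]); rhs.append(v)
    p, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(rhs, float), rcond=None)
    return p

def apply_similarity(p, pts):
    a, b, tx, ty = p
    x, y = pts[:, 0], pts[:, 1]
    return np.stack([a * x - b * y + tx, b * x + a * y + ty], axis=1)

def ransac_similarity(src, dst, iters=500, thresh=2.0, seed=0):
    """RANSAC: repeatedly fit a minimal sample (2 correspondences),
    keep the hypothesis with the most inliers, refit on all inliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 2, replace=False)
        p = fit_similarity(src[idx], dst[idx])
        err = np.linalg.norm(apply_similarity(p, src) - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return fit_similarity(src[best_inliers], dst[best_inliers]), best_inliers
```

The same hypothesize-and-verify loop applies with a spherical eye model and camera pose in place of the similarity transform; only the minimal solver and the residual change.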
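The PSO refinement stage optimizes continuous model parameters by evaluating a fitness function over a swarm of candidate solutions. As a hedged sketch of this optimization style (not the paper's implementation), the minimal PSO below recovers the semi-axes [a, b, c] of a synthetic ellipsoid from sampled surface points; the swarm size, inertia/acceleration coefficients, search bounds, and objective are illustrative assumptions of ours, not the paper's settings.

```python
import numpy as np

def pso(objective, lb, ub, n_particles=40, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimization over a box [lb, ub]."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    dim = len(lb)
    x = rng.uniform(lb, ub, size=(n_particles, dim))   # particle positions
    v = np.zeros_like(x)                               # particle velocities
    pbest = x.copy()                                   # per-particle best positions
    pbest_f = np.array([objective(p) for p in x])
    g = pbest[np.argmin(pbest_f)].copy()               # global best position
    g_f = pbest_f.min()
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity update: inertia + cognitive pull + social pull.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lb, ub)
        f = np.array([objective(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        if f.min() < g_f:
            g_f = f.min()
            g = x[np.argmin(f)].copy()
    return g, g_f

def ellipsoid_residual(axes, pts):
    """Mean squared residual of the implicit ellipsoid equation
    (x/a)^2 + (y/b)^2 + (z/c)^2 = 1 over the given 3D points."""
    a, b, c = axes
    vals = (pts[:, 0] / a) ** 2 + (pts[:, 1] / b) ** 2 + (pts[:, 2] / c) ** 2
    return float(np.mean((vals - 1.0) ** 2))
```

In the actual method, the fitness would score how well correspondences reproject under a candidate pose and ellipsoid; the toy objective here isolates only the shape-recovery aspect.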