Retinal Image Registration Through
Simultaneous Camera Pose and Eye Shape Estimation

Carlos Hernandez-Matas^{1,2}, Xenophon Zabulis^1 and Antonis A. Argyros^{1,2}
Abstract—In this paper, a retinal image registration method
is proposed. The approach utilizes keypoint correspondences
and assumes that the human eye has a spherical or ellipsoidal
shape. The image registration problem amounts to solving a
camera 3D pose estimation problem and, simultaneously, an
eye 3D shape estimation problem. The camera pose estimation
problem is solved by estimating the relative pose between the
views from which the images were acquired. The eye shape
estimation problem parameterizes the shape and orientation of
an ellipsoidal model for the eye. Experimental evaluation shows a 17.91% reduction of the registration error and a 47.52% reduction of the error standard deviation over state-of-the-art methods.
I. INTRODUCTION
Assessment of small vessels in vivo can support the diagnosis and the monitoring of the evolution of diseases that present strong vasculopathy, such as diabetes or hypertension [1].
The eye, and the retina in particular, allows for non-
invasive observation of the microvascular circulation via
fundoscopy [2].
Image registration can assist greatly in that direction. It
aims at warping a test image to the coordinate frame of a
reference image, so that corresponding points are imaged at
the same locations. For images acquired during the same session, if they present small overlap, registration can be utilized for creating mosaics imaging larger areas of the retina [3], [4], [5]. If the overlap is large, the images can be combined into images of higher resolution and definition [6], [7], [8], promoting more accurate measurements. Images acquired
at different sessions allow for longitudinal studies of the
retina [9], [10], which enable monitoring disease progression.
Besides being a useful clinical tool, retinal image registration is also a challenging problem, as images acquired at different times or from different viewpoints can present illumination, color, and contrast changes as well as potentially
small overlapping areas. The support of medical diagnoses
requires precise measurements. Therefore, the requirements
on registration accuracy are very high.
II. RELATED WORK
Image registration methods utilize the parts of the observed scene that are commonly visible in the image pair
to be registered. This information extraction is performed
either globally or locally or using a mixture of both. Global
methods are based on similarity of intensities, with retinal registration methods usually relying on mutual information [11], [12]. Local methods extract information relying on localized features, such as keypoint correspondences [8], [13], [14], [15], [16], [17], vessel trees [18] and bifurcations [4], [19], [20], [21]. Recently, hybrid methods have been gaining traction [22], [23].

^1 Institute of Computer Science, Foundation for Research and Technology – Hellas (FORTH), Heraklion, Greece.
^2 Computer Science Department, University of Crete, Heraklion, Greece.
{carlos, zabulis, argyros} at ics.forth.gr
The transformation of the images can be estimated on
the basis of either 2D or 3D models. 2D methods do not
explicitly account for perspective, but overcome this by
utilizing non-linear transformations [11], [13], [14], [23].
These transformations do not account for the shape and size
of the eye. 3D models enable metric measurements that are free of perspective distortion. Simple eye models have been shown to provide accurate registration [16], [17].
In this work, we propose an accurate and robust retinal
image registration method that is local and utilizes a 3D
transformation model. The main improvement over [16], [17] is the utilization of an ellipsoidal model whose shape parameters are calculated simultaneously with the pose estimate that enables image registration. Other improvements include the utilization of SIFT [24] keypoints instead of SURF [25] and the introduction of a pose estimation initialization step.
III. METHOD
The proposed method (Figure 1) registers the reference (F_0) and test (F_t) images by simultaneously estimating the relative pose of the cameras that acquired the images, as well as the 3D shape and 3D orientation of an ellipsoidal eye model. The eye model has semi-axes [a, b, c] and rotations along said semi-axes [r_a, r_b, r_c], leading to surface E. If a static camera is assumed, the pose estimate can be calculated as the pose transformation of the retina between the two frames. The eye model is centered at c_s = [0, 0, 0]^T. A calibrated camera for F_0 is located at c_c = [0, 0, -δ]^T. K_c and K_t are the intrinsic camera matrices for F_0 and F_t.
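To make the geometry concrete, the following Python sketch back-projects a pixel through the intrinsic matrix and intersects the resulting ray with the centered ellipsoid; this is the basic operation that maps image points onto the eye surface E. All numeric values (focal length, δ = 24, 12 mm semi-axes) are illustrative assumptions, not the paper's calibration.

```python
import numpy as np

def ray_ellipsoid_intersection(origin, direction, semi_axes, R=np.eye(3)):
    """Intersect the ray origin + t*direction with the ellipsoid x^T M x = 1,
    where M = R diag(1/a^2, 1/b^2, 1/c^2) R^T and the ellipsoid is centered
    at the origin. Returns the nearest intersection point, or None on a miss."""
    a, b, c = semi_axes
    M = R @ np.diag([1.0 / a**2, 1.0 / b**2, 1.0 / c**2]) @ R.T
    # Substituting the ray into the quadric gives a quadratic in t.
    A = direction @ M @ direction
    B = 2.0 * origin @ M @ direction
    C = origin @ M @ origin - 1.0
    disc = B * B - 4.0 * A * C
    if disc < 0:
        return None
    t = (-B - np.sqrt(disc)) / (2.0 * A)  # nearer root first
    if t < 0:
        t = (-B + np.sqrt(disc)) / (2.0 * A)
    return origin + t * direction if t >= 0 else None

def backproject_pixel(u, v, K, cam_center):
    """Ray through pixel (u, v) for a camera at cam_center looking along +z."""
    d = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return cam_center, d / np.linalg.norm(d)

# Toy setup (illustrative values): spherical eye of radius 12 (a = b = c),
# camera at c_c = [0, 0, -delta] with principal point (320, 240).
delta = 24.0
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
o, d = backproject_pixel(320, 240, K, np.array([0.0, 0.0, -delta]))
p = ray_ellipsoid_intersection(o, d, (12.0, 12.0, 12.0))
```

For the principal point, the ray travels along the optical axis and meets the 12 mm sphere at [0, 0, -12]; off-center pixels map to other points of the surface, which is how image measurements are lifted to 3D.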
Point correspondences between the images are utilized to
achieve this registration. An initial pose estimate is calculated
utilizing RANSAC and a spherical model. Subsequently,
Particle Swarm Optimization is utilized to refine this pose,
as well as to estimate the lengths of the semi-axes of the
ellipsoidal model and their rotation. Three variants of the
eye model are formulated and experimentally validated.
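The refinement stage above can be sketched as a generic Particle Swarm Optimization loop, assuming the RANSAC step has already produced a search range around the initial pose. The objective below is a stand-in quadratic over a hypothetical 12-parameter vector (6 relative-pose parameters, 3 semi-axes, 3 axis rotations); the actual method instead scores keypoint correspondences mapped through the candidate pose and eye shape.

```python
import numpy as np

def pso_minimize(objective, lo, hi, n_particles=30, iters=200, seed=0):
    """Minimal PSO: minimize objective over the box [lo, hi]."""
    rng = np.random.default_rng(seed)
    dim = len(lo)
    x = rng.uniform(lo, hi, size=(n_particles, dim))  # particle positions
    v = np.zeros_like(x)                              # particle velocities
    pbest = x.copy()                                  # per-particle best
    pbest_f = np.array([objective(p) for p in x])
    g = pbest[np.argmin(pbest_f)].copy()              # global best
    g_f = pbest_f.min()
    w, c1, c2 = 0.7, 1.5, 1.5                         # common PSO constants
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([objective(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        if f.min() < g_f:
            g, g_f = x[np.argmin(f)].copy(), f.min()
    return g, g_f

# Stand-in objective: distance to a hypothetical "true" parameter vector
# theta = [pose (6), semi-axes a, b, c (3), rotations r_a, r_b, r_c (3)].
theta_true = np.array([0.1, -0.2, 0.05, 1.0, 2.0, 0.5,
                       12.0, 12.5, 11.8, 0.0, 0.1, -0.1])
obj = lambda th: float(np.sum((th - theta_true) ** 2))
lo_b, hi_b = theta_true - 2.0, theta_true + 2.0       # RANSAC-derived box
theta_hat, err = pso_minimize(obj, lo_b, hi_b)
```

A derivative-free optimizer suits this stage because the correspondence-based error is not easily differentiated with respect to the eye-shape parameters.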
A. Eye Models
Three models are utilized in this work. The baseline model is spherical, as in our previous works [16], [17].
978-1-4577-0220-4/16/$31.00 ©2016 IEEE 3247