Affine Covariant Features for Fisheye Distortion Local Modelling

Antonino Furnari, Giovanni Maria Farinella, Member, IEEE, Arcangelo Ranieri Bruna, and Sebastiano Battiato, Senior Member, IEEE

Antonino Furnari, Giovanni Maria Farinella and Sebastiano Battiato are with the University of Catania, Department of Mathematics and Computer Science, Catania, 95125, Italy (e-mail: furnari@dmi.unict.it, gfarinella@dmi.unict.it, battiato@dmi.unict.it). Arcangelo Ranieri Bruna is with STMicroelectronics, Advanced System Technology - Computer Vision, Catania 95121, Italy (e-mail: arcangelo.bruna@st.com). Manuscript received Month day, year; revised Month day, year.

Abstract—Perspective cameras are the most popular imaging sensors used in Computer Vision. However, many application fields, including automotive, surveillance and robotics, require the use of wide angle cameras (e.g., fisheye), which make it possible to acquire a larger portion of the scene with a single device, at the cost of introducing noticeable radial distortion in the images. Affine covariant feature detectors have proven successful in a variety of Computer Vision applications, including object recognition, image registration and visual search. Moreover, their robustness to a series of variabilities related to both the scene and the image acquisition process has been thoroughly studied in the literature. In this paper, we investigate their effectiveness on fisheye images, providing both theoretical and experimental analyses. As a theoretical outcome, we show that the inherently non-linear radial distortion can be locally approximated by linear functions with a reasonably small error. The experimental analysis builds on Mikolajczyk's benchmark to assess the robustness of three popular affine region detectors (i.e., Maximally Stable Extremal Regions (MSER), Harris and Hessian affine region detectors) with respect to different variabilities as well as to radial distortion. To support the evaluations, we rely on the Oxford dataset and introduce a novel benchmark dataset comprising 50 images depicting different scene categories. Experiments are carried out on rectilinear images to which radial distortion is artificially added, and on real-world images acquired using fisheye lenses. Our analysis points out that affine region detectors can be effectively employed directly on fisheye images and that radial distortion can be locally modelled as an additional affine variability.

Index Terms—fisheye distortion, affine region detectors, omnidirectional vision, division model

I. INTRODUCTION AND MOTIVATIONS

COMPUTER Vision algorithms are usually designed to work on images acquired using perspective cameras. The adherence to the perspective camera model ensures that straight lines in the real world are mapped to straight lines in the image, which produces a representation of the scene coherent with our perception [1]. However, many application fields such as automotive, surveillance and robotics [2], [3], [4], [5], [6], [7] require the use of wide angle cameras, which are characterized by a wide Field Of View (FOV) and are able to acquire a large part of the scene using a single device. Fig. 1 shows some examples of wide angle images, along with their perspective counterpart.

(a) rectilinear  (b) full frame  (c) full circle
Fig. 1. Examples of perspective (a), full frame (b) and full circle (c) images. The two fisheye images are obtained by artificially adding different amounts of radial distortion to the rectilinear image (a).
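Below is a minimal sketch of how this kind of synthetic radial distortion can be generated; it is an illustration under stated assumptions, not the exact procedure used to produce Fig. 1. It warps a rectilinear image according to a one-parameter division model, r_u = r_d / (1 + λ·r_d^2), assuming that the distortion centre coincides with the image centre and that radii are normalised by the image half-diagonal; under this convention, negative values of λ yield barrel (fisheye-like) compression, with larger magnitudes giving stronger distortion. The function name and parameter values are purely illustrative.

```python
# Sketch: warp a rectilinear image into a fisheye-like one with the
# one-parameter division model r_u = r_d / (1 + lam * r_d^2).
# Assumptions (illustrative, not taken from the paper): distortion centre at
# the image centre, radii normalised by the half-diagonal, lam < 0 for
# barrel (fisheye-like) compression.
import numpy as np
import cv2


def add_division_model_distortion(img, lam=-0.4):
    h, w = img.shape[:2]
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    norm = np.hypot(cx, cy)  # half-diagonal used to normalise radii

    # Output (distorted) pixel grid, centred and normalised.
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    xd = (xs - cx) / norm
    yd = (ys - cy) / norm
    rd2 = xd ** 2 + yd ** 2

    # Division model: source position of each distorted pixel in the
    # rectilinear (undistorted) image, i.e., an inverse mapping.
    scale = 1.0 / (1.0 + lam * rd2)
    map_x = (xd * scale * norm + cx).astype(np.float32)
    map_y = (yd * scale * norm + cy).astype(np.float32)

    # Resampling step: the same kind of interpolation that, in the opposite
    # (rectification) direction, introduces the artefacts discussed below.
    return cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR)


# Example: two different amounts of distortion, in the spirit of Fig. 1 (b)-(c).
# mild = add_division_model_distortion(rectilinear_img, lam=-0.2)
# strong = add_division_model_distortion(rectilinear_img, lam=-0.5)
```

With negative λ, pixels near the corners of the output may sample outside the original field of view and are filled with the border value (black by default), similarly to the dark regions typical of full circle fisheye images.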
Unfortunately, the perspective camera model cannot be effectively used to model the image formation process of wide angle cameras, which hence require different projection models, with the consequent introduction of inherent radial distortion [1], [2], [8]. Wide angle cameras can be built following two main designs: dioptric [1], [8] and catadioptric [2], [9], [10]. In particular, we consider dioptric systems, which are built by substituting the regular lens of a perspective camera with a fisheye lens able to divert the light rays onto the sensor in order to achieve the desired wide Field Of View. As discussed by Miyamoto [8], the distortion introduced by such cameras should not be considered as an aberration, but as the result of the projection of a hemisphere onto a finite plane.

The most straightforward approach to deal with wide angle images consists of explicitly removing the distortion through a rectification process. However, such a process can be computationally expensive, especially in embedded settings, since it requires interpolation to account for the spatially non-uniform sampling performed by the wide angle sensor. Moreover, the interpolation process introduces artefacts in the image which can affect the feature extraction process [11]. Additionally, in order to perform the rectification, the camera needs to be calibrated so that a mapping between the distorted points and their positions in the ideal (rectilinear) image plane can be established. Some calibration techniques require a special pattern to be present in the scene [12], [13], while others [14], [15] just require a few images of the scene and no other information. However, even when the camera can be easily calibrated, it would be advantageous to work directly on the distorted images, in order to avoid the rectification process and get rid of the artefacts due to the interpolation.

Many efforts in the context of calibrated wide angle cameras already exist: the authors of [16], [17] studied how to compute the scale space of omnidirectional images; in [3], [11], [18] the Scale Invariant Feature Transform (SIFT) pipeline [19] is modified in order to be used directly on wide angle images; in [20] scale invariant features are derived