FPGA-based Real-time Extraction of Visual Features

Merwan BIREM, François BERRY
LAboratoire des Sciences et Matériaux pour l'Electronique et l'Automatique
UMR 6602 CNRS - Université Blaise Pascal
24, Avenue des Landais, 63177 Aubière Cedex, France
firstname.lastname@univ-bpclermont.fr

Abstract: In recent years, a new category of sensor has emerged, known as the intelligent sensor or smart sensor. This paper focuses on smart cameras used in robotics, and more precisely on feature extraction. Object tracking, autonomous navigation and 3D mapping are some example applications. The chosen feature is the interest point, which can be detected by several algorithms. Among these, the Harris & Stephens detector has a simple principle, based on the computation of multiple derivatives, and gives acceptable results. In addition to the image sensor, the smart camera embeds an FPGA, on which the interest point detection algorithm is implemented. Other modules were added to the system to deliver more appropriate results.

Index Terms: Image processing, point of interest, Harris & Stephens, field-programmable gate array (FPGA), VHDL, hardware implementation.

I. INTRODUCTION

The term "feature extraction" describes the combination of interest point detection with a subsequent extraction step. This extraction aims to select the most interesting detections and reject the others. Detectors find interesting points in an image, and the extraction process provides visual features (corner, junction, ...) with a semantic meaning. An overview of the state of the art in feature extractors can be found in [?].

A visual feature is a higher-level interest point that corresponds to a significant local two-dimensional change in the intensity function of the image. Interest points are low-level features. The signal at these points contains more information than at points taken from edges or flat regions [13].
As an example, these points can be corners (in L form with an obtuse angle) and junctions (in T or Y form), as shown in Fig. 1.

Fig. 1. Different types of interest points: (a) junction in "L", (b) junction in "V", (c) junction in "T", (d) junction in "Y", (e) junction in "X" and (f) junction in "draughtboard".

These features are used in many applications, such as stereoscopic vision, 3D reconstruction and object tracking, and offer many advantages:
1) A more reliable source of information than edges,
2) Robust to occlusions,
3) No chaining operation is needed,
4) Present in the vast majority of images, unlike edges,
5) Stable if the image undergoes a simple transformation (rotation, translation).

In this paper, we focus our work on a "Harris and Stephens"-based detector performing real-time visual feature extraction. Our main contribution is a set of efficient modules that select and sort the features in real time.

The literature proposes many detectors to extract features from an image. These detectors differ in the method used to extract the feature, which implies differences in algorithm complexity, processing time and the amount of resources needed. Among them, Harris & Stephens [3] is probably the most widely used method for extracting features of the "interest point" type.

Generally, interest points are detected by estimating an interest value "R" for each pixel. If the interest value is higher than a predefined threshold "T", the pixel is considered an interest point. In Harris & Stephens, the interest value of each pixel is calculated using the following formula:

$$R = \mathrm{Det}(M) - k \cdot \mathrm{Trace}(M)^2 \tag{1}$$

where $k \in [0.04, 0.06]$, $\mathrm{Det}(M) = AB - C^2$ and $\mathrm{Trace}(M) = A + B$, with:

$$A = \left(\frac{\partial I}{\partial x}\right)^2 \otimes w, \qquad B = \left(\frac{\partial I}{\partial y}\right)^2 \otimes w, \qquad C = \left(\frac{\partial I}{\partial x}\,\frac{\partial I}{\partial y}\right) \otimes w \tag{2}$$

where I is a 3 × 3 window containing the pixel being tested as an interest point, and w is a 3 × 3 Gaussian filter.
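As an illustration, the interest value of Eq. (1) and the threshold test can be sketched in software for a single pixel. This is only a minimal sketch under stated assumptions, not the hardware implementation described in this paper: the central-difference derivative, the binomial approximation of the 3 × 3 Gaussian filter, the value k = 0.04, the function names and the synthetic test images are all chosen here for illustration.

```python
# Illustrative software sketch only; not the authors' VHDL implementation.

# Assumed 3 x 3 Gaussian approximation (binomial weights, sum = 16).
GAUSS_3X3 = [[1, 2, 1],
             [2, 4, 2],
             [1, 2, 1]]


def gradients(img, x, y):
    """Central-difference estimates of dI/dx and dI/dy at (x, y).

    The derivative operator is an assumption; the text does not fix one.
    """
    ix = (img[y][x + 1] - img[y][x - 1]) / 2.0
    iy = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return ix, iy


def harris_response(img, x, y, k=0.04):
    """Interest value R = Det(M) - k * Trace(M)^2 at pixel (x, y).

    (x, y) must lie at least 2 pixels away from the image border.
    """
    a = b = c = 0.0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            w = GAUSS_3X3[dy + 1][dx + 1] / 16.0
            ix, iy = gradients(img, x + dx, y + dy)
            a += w * ix * ix   # A: squared x-derivative, Gaussian weighted
            b += w * iy * iy   # B: squared y-derivative, Gaussian weighted
            c += w * ix * iy   # C: product of derivatives, Gaussian weighted
    det = a * b - c * c
    trace = a + b
    return det - k * trace * trace


def is_interest_point(img, x, y, threshold, k=0.04):
    """Threshold test: the pixel is kept when R exceeds T."""
    return harris_response(img, x, y, k) > threshold


# Synthetic 7 x 7 test images (hypothetical data).
flat = [[1.0] * 7 for _ in range(7)]
edge = [[1.0 if col >= 3 else 0.0 for col in range(7)] for _ in range(7)]
corner = [[1.0 if (col >= 3 and row >= 3) else 0.0 for col in range(7)]
          for row in range(7)]
```

On these synthetic images, evaluated at the centre pixel (3, 3), R is zero in the flat region, negative on the straight edge and positive on the corner, so only the corner can pass a positive threshold T.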
As Eq. (2) shows, the Harris & Stephens detector is based on the calculation of multiple first derivatives along "x" and "y". Most of these operations are SIMD and therefore highly parallelizable. A major advantage of this detector is that the result is not affected if the image rotates, translates or undergoes a slight change in light intensity.

Obviously, this processing is costly in terms of computational load when the image size grows, especially when it is implemented on a CPU. To reduce the processing time, a classical approach is to use an FPGA, which is particularly well suited to SIMD processing.

Previous work includes [7], [9], [10] and [11]. In these works, the extraction is based on a non-maximum suppression method. In terms of hardware, this method has several disadvantages, such as the use of more memory and FIFO buffers. In addition, it introduces latency due to the buffering of three lines. Another important aspect is that