Visual servoing from robust direct color image registration

Geraldo Silveira and Ezio Malis

Abstract—To date, only a few works exist on the use of color images for visual servoing. Perhaps this is due to the difficulties usually encountered in coping with illumination changes in these images. This paper presents new parametric models and optimization methods for robustly and directly registering color images. Direct methods refer to those that exploit the pixel intensities, without resorting to image features. We then show how a robust and generic visual servoing scheme can be constructed using the obtained optimal parameters. The proposed models ensure robustness to arbitrary illumination changes in color images, do not require prior knowledge (including spectral knowledge) of the object, illuminants, or camera, and naturally encompass gray-level images. Furthermore, the exploitation of all information within the images, even from areas where no features exist, allows the algorithm to achieve high levels of accuracy. Various results are reported to show that visual servoing can indeed be highly accurate and robust despite unknown objects and unknown imaging conditions.

I. INTRODUCTION

Visual tracking of an object of interest can be formulated as an image registration problem. Image registration consists of estimating the transformations that best align a reference image to a second one. Registration techniques can generally be classified into feature-based methods and direct methods [1]. Feature-based methods require extracting and matching a set of features (e.g., points, lines) from the two images. Since they can tolerate relatively large displacements of the object in the field of view, feature-based methods are suitable when the two images are taken from disparate viewpoints. In turn, direct methods exploit the pixel intensities without having to rely on image features.
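To make the notion of a direct method concrete, the sketch below estimates a pure integer translation between two gray-level images by exhaustively minimizing the sum of squared intensity differences over the overlapping region. This is a hypothetical toy example for illustration only — the function name is ours, and actual direct methods (including the one in this paper) use iterative optimization over much richer warp and illumination models.

```python
import numpy as np

def register_translation(ref, cur, max_shift=5):
    """Estimate the integer translation (dy, dx) that best aligns `cur`
    to `ref` by minimizing the sum of squared differences (SSD) of the
    raw pixel intensities over the overlapping region.  A brute-force
    stand-in for iterative direct registration methods."""
    best, best_ssd = (0, 0), np.inf
    h, w = ref.shape
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Overlapping region of the two images under this shift,
            # where cur[y, x] is compared against ref[y + dy, x + dx]
            r = ref[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
            c = cur[max(0, -dy):h + min(0, -dy), max(0, -dx):w + min(0, -dx)]
            ssd = np.sum((r.astype(float) - c.astype(float)) ** 2)
            if ssd < best_ssd:
                best, best_ssd = (dy, dx), ssd
    return best
```

Even this crude version uses every pixel of the overlap rather than a sparse feature set, which is where direct methods draw their accuracy; they can then be very precise, at the cost of assuming the two images overlap sufficiently.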
They can then be highly accurate, mainly owing to the exploitation of all available image information, even from areas where no features exist. On the other hand, direct methods assume that the two images of the object have sufficient overlap [2]. Since this paper considers real-time vision-based robot control [3], we can suppose that the frame rate is sufficiently high that only relatively small inter-frame displacements of the object are observed. Moreover, high accuracy is often needed for robot positioning applications. Thus, we focus in this article on direct registration methods for color images and their integration into visual servoing schemes, e.g., [4]. Note, however, that the parameters estimated by image registration methods can in fact be used in a variety of visual servoing techniques, e.g., [5].

Geraldo Silveira is with CTI Renato Archer – Division DRVC, Rod. Dom Pedro I, km 143,6, Amarais, CEP 13069-901, Campinas/SP, Brazil, Geraldo.Silveira@cti.gov.br

Ezio Malis is with INRIA Sophia-Antipolis – Project ARobAS, 2004 Route des Lucioles, BP 93, 06902 Sophia-Antipolis Cedex, France, Ezio.Malis@sophia.inria.fr

Fig. 1. (a) Original color image and (b) after its conversion to gray-scale. Almost all information has been lost in this example. This illustrates the need to work with the color image directly. Please print in color so as to see how rich the original image is!

To our knowledge, only a few techniques using color images in a visual servoing scheme have been proposed to date. Perhaps this is due to the difficulties usually found in adequately coping with illumination changes in color images. Another possible reason is that one may think that the use of color images does not contribute much to the final precision of the servoing. This is not always true, and extreme cases exist where all visual information is lost when gray-scale cameras are used (see Fig. 1).
Even if this is an unlikely situation in practice, we can conjecture that in many cases color cameras provide much richer information than their gray-scale counterparts. Therefore, their application should be studied in more depth. Color cameras, like the human eye, are generally (but not always) trichromatic. In this case, each pixel of a color image is a three-vector, with one component per sensor channel. An active research topic concerns color constancy, which seeks illuminant-invariant color descriptors. A closely related problem is to find illuminant-invariant relationships between color vectors. Given two images of a Mondrian world 1 under specific conditions, 2 the results presented in [6] claim that a multiplication of each tristimulus value (in an appropriate basis) by a scale factor is sufficient to support color constancy in practice. This framework has been exploited in color-based point tracking, e.g., [7], and has also been applied in [8] to the control of a pan-tilt unit (i.e., 2 dofs) by finding the centroid of a red object. An effective technique also to find the centroid of an object in color images is mean-shift [9]. However, these methods are not sufficient for our purposes, since we are interested in accurately and robustly controlling all 6 dofs of a robot end-effector. In this paper, we propose new models and methods to overcome the limitations of both the Mondrian world 1 and

1 A Mondrian is a planar surface composed of only Lambertian patches, and is named after Piet Mondrian (1872-1944), whose paintings it resembles.

2 For example, the light that strikes the surface has to be of uniform intensity and spectrally unchanging, with no inter-reflections, etc.

The 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, October 11-15, 2009, St. Louis, USA
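The scale-factor model of [6] mentioned above — one multiplicative gain per tristimulus channel, often called a diagonal or von Kries-type model — can be illustrated with a small sketch. This is a toy fit under the idealized Mondrian assumptions, not the more general models proposed in this paper; the function name and setup are ours.

```python
import numpy as np

def fit_channel_gains(img_ref, img_cur):
    """Least-squares estimate of one multiplicative gain per color
    channel: the diagonal model in which each tristimulus value is
    scaled to compensate for an illuminant change.  `img_ref` and
    `img_cur` are (H, W, 3) arrays of the same scene under two lights."""
    gains = []
    for ch in range(3):
        x = img_ref[..., ch].ravel().astype(float)
        y = img_cur[..., ch].ravel().astype(float)
        # Closed-form 1-D least squares: minimize ||g * x - y||^2
        gains.append(np.dot(x, y) / np.dot(x, x))
    return np.array(gains)
```

Under the Mondrian-world conditions of footnote 2, applying the three recovered gains to the reference image predicts its appearance under the second illuminant; the limitations of this model for real scenes are precisely what motivates the models developed in this paper.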