Learn to Track Edges

Yanghai Tsin    Yakup Genc    Ying Zhu    Visvanathan Ramesh
Real-Time Vision and Modeling Department
Siemens Corporate Research, Princeton, NJ 08540, USA

Abstract

The reliability of a model-based edge tracker critically depends on its ability to establish correct correspondences between points on the model edges and edge pixels in an image. This is a non-trivial problem, especially in the presence of large inter-frame motions and in cluttered environments. We propose an online learning approach to solving this problem. An edge pixel is represented by a descriptor composed of a small segment of intensity patterns. From training examples, the algorithm uses the randomized forest model to learn the a posteriori distribution of correspondence given the descriptor. In a new frame, the edge pixels are classified using maximum a posteriori (MAP) estimation. The proposed method enables us to apply the tracker to many previously impossible scenarios with unprecedented robustness.

1. Introduction

The goal of this work is to design a precise and robust model-based edge tracker. Typical applications of the tracker include robot localization, robotic assembly and augmented reality. Edge tracking is an important technique in these areas largely because it is view-independent and resilient to moderate appearance changes. In some cases it is the only viable choice, namely when a scene contains no strong corner features. Edge tracking has been studied intensively since the inception of computer vision research. Nevertheless, a reliable tracker is still unavailable, and researchers still deem edge trackers "brittle" [24]. Such trackers usually cannot survive the harsh conditions of real applications, where clutter, large appearance changes and fast motions are ubiquitous.

The main difficulty in edge tracking is to establish correspondences between points on the model edges and the edge pixels in an image.
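To make the classification step described in the abstract concrete, the following is a minimal, self-contained sketch of the idea rather than the paper's implementation: randomized trees whose leaves store empirical class frequencies, so that averaging the leaf frequencies over the forest approximates the posterior P(class | descriptor), and the MAP label is its argmax. The six-sample "intensity segment" descriptor and all names, parameters and synthetic data below are illustrative assumptions.

```python
import random

def _leaf(labels):
    # Leaf node: empirical class frequencies approximate P(class | leaf).
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    n = float(len(labels))
    return {"posterior": {c: k / n for c, k in counts.items()}}

def build_tree(X, y, rng, depth=0, max_depth=5, min_leaf=2):
    """Grow one randomized tree: each split uses a random descriptor
    dimension and a random threshold, in the spirit of randomized forests."""
    if depth == max_depth or len(y) <= min_leaf or len(set(y)) == 1:
        return _leaf(y)
    dim = rng.randrange(len(X[0]))
    lo = min(x[dim] for x in X)
    hi = max(x[dim] for x in X)
    if lo == hi:
        return _leaf(y)
    thresh = rng.uniform(lo, hi)
    left = [i for i, x in enumerate(X) if x[dim] <= thresh]
    right = [i for i, x in enumerate(X) if x[dim] > thresh]
    if not left or not right:  # degenerate split: stop here
        return _leaf(y)
    return {
        "dim": dim, "thresh": thresh,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           rng, depth + 1, max_depth, min_leaf),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            rng, depth + 1, max_depth, min_leaf),
    }

def forest_posterior(forest, x):
    """Average the leaf posteriors of all trees: P(class | descriptor)."""
    acc = {}
    for tree in forest:
        node = tree
        while "posterior" not in node:
            node = node["left"] if x[node["dim"]] <= node["thresh"] else node["right"]
        for c, p in node["posterior"].items():
            acc[c] = acc.get(c, 0.0) + p
    return {c: v / len(forest) for c, v in acc.items()}

def classify_map(forest, x):
    """MAP estimate: the class maximizing the averaged posterior."""
    post = forest_posterior(forest, x)
    return max(post, key=post.get)

# Tiny synthetic demo (illustrative data, not from the paper):
# "edge" descriptors are dark-to-bright step profiles with small noise;
# "clutter" descriptors are uniform random intensity segments.
rng = random.Random(0)
edge = [[rng.gauss(0.0, 0.05) for _ in range(3)] +
        [rng.gauss(1.0, 0.05) for _ in range(3)] for _ in range(60)]
clutter = [[rng.uniform(0.0, 1.0) for _ in range(6)] for _ in range(60)]
X = edge + clutter
y = ["edge"] * 60 + ["clutter"] * 60
forest = [build_tree(X, y, random.Random(seed)) for seed in range(25)]

label = classify_map(forest, [0.0, 0.0, 0.0, 1.0, 1.0, 1.0])  # a clean step profile
print(label)
```

The key design point mirrored here is that the forest is queried once per edge pixel at tracking time, so correspondence becomes a cheap classification rather than an expensive search over candidates.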
To fully appreciate the difficulty involved, we give an example in Figure 1. We direct the reader's attention to regions A and B marked on the top image and their corresponding areas in the center one. Region A contains complex structures: many irrelevant edge pixels are detected by an edge detector [6], and identifying true correspondences becomes error-prone simply because there are too many possible candidates. The case is the opposite for the line marked by B: the edge pixels corresponding to the model are missing. Almost all tracking failures can be traced back to false correspondences caused by these two cases, clutter or missing edge pixels. In dynamic environments the problem becomes even harder due to many factors, e.g., lighting change, motion blur, scale variation and camera gain control.

Figure 1. Top: input image with the model projection superimposed as red lines. Middle: Canny edge detector results (Matlab implementation). Bottom: the model-associated edge pixels classified using the proposed approach.

The difficulties involved in binary edge detection can be partially alleviated by using the intensity gradient [11], which incorporates more context than a binary edge map. However, in cluttered environments these methods face similar complexity and usually converge to a local minimum.

978-1-4244-1631-8/07/$25.00 ©2007 IEEE