Shape Context and Chamfer Matching in Cluttered Scenes A. Thayananthan B. Stenger P. H. S. Torr R. Cipolla University of Cambridge Microsoft Research Ltd. Department of Engineering 7 JJ Thompson Avenue Cambridge, CB2 1PZ, UK Cambridge, CB3 OFB, UK at315|bdrs2|cipolla @eng.cam.ac.uk philtorr@microsoft.com Abstract This paper compares two methods for object localization from contours: shape context and chamfer matching of tem- plates. In the light of our experiments, we suggest improve- ments to the shape context: Shape contexts are used to find corresponding features between model and image. In real images it is shown that the shape context is highly influ- enced by clutter, furthermore even when the object is cor- rectly localized, the feature correspondence may be poor. We show that the robustness of shape matching can be in- creased by including a figural continuity constraint. The combined shape and continuity cost is minimized using the Viterbi algorithm on features sequentially around the contour, resulting in improved localization and correspon- dence. Our algorithm can be generally applied to any fea- ture based shape matching method. Chamfer matching correlates model templates with the distance transform of the edge image. This can be done efficiently using a coarse-to-fine search over the transfor- mation parameters. The method is robust in clutter, how- ever multiple templates are needed to handle scale, rotation and shape variation. We compare both methods for locat- ing hand shapes in cluttered images, and applied to word recognition in EZ-Gimpy images. 1. Introduction People use multiple visual cues to recognize objects, such as object color, texture and shape. In the absence of color and texture information, we can mostly still recognize ob- jects by their geometry alone, for example in line drawings. Grouping low level features to segment the object is by itself a hard problem. A common approach is, therefore, to use a prototype shape, and search for it in the image. This leads to the task of shape matching, which has numerous appli- cations, such as object localization, image retrieval, model registration, and tracking. One way to represent a shape is by a set number of feature points, for example Canny edges. In order to match two shapes, point correspondences on the two shapes have to be established. Subsequently a transformation which aligns the two shapes can be found. The type of transformation depends on the particular set- ting. Two examples are 2D affine transforms, and non- rigid thin-plate spline transformations. The two problems of finding correspondences and estimating the transforma- tion are tightly coupled: The better the correspondences are known, the better the transformation can be estimated, and vice versa. Therefore, many methods are based on an it- erated two-step algorithm, alternating estimation of corre- spondence and transformation. In the next section, we review existing work on shape based and chamfer matching. The two methods are ex- plained briefly in section 2, and we outline some of the problems that arise when applied to scenes with cluttered background in section 3. In section 4 we show how shape context matching can be significantly improved by using a continuity constraint. The dynamic programming algo- rithm used for optimization readily generalizes to any other type of feature. Section 5 shows experimental results on two types of data, images of hands, and words on textured background. 1.1. Previous Work Belongie et al. [3] have introduced the shape context de- scriptor, which characterizes a particular point location on the shape. This descriptor is the histogram of the relative polar coordinates of all other points. Corresponding points on two different shapes have a similar relative position in each shape, and will ideally have a similar shape context. Shape context matching has been applied to a variety of ob- ject recognition problems [3, 13]. The background clutter in these applications was usually limited. Sullivan and Carlsson [17] use a topology-based shape descriptor to find correspondences. The topological type of all combinations of four points is recorded in a voting matrix, and one-to-one correspondences are found using a greedy algorithm. The examples shown did not contain significant clutter. While their topological descriptor has 1