A Hybrid Object-Level/Pixel-Level Framework For Shape-based Recognition

Owen Carmichael and Martial Hebert
The Robotics Institute
Carnegie Mellon University
Pittsburgh, PA 15213

Abstract

This paper presents a technique for shape-based recognition that fuses pixel-level and object-level approaches into a unified framework. A pixel-level algorithm classifies individual pixels as belonging to a target object or clutter based on automatically-selected shape features computed in a spatial arrangement around them; an object-level algorithm classifies object-sized rectangular image regions as objects or clutter by aggregating pixel classifier scores in the regions. We train a cascade of interleaved pixel-level and object-level modules to quickly localize complex-shaped objects in highly cluttered scenes under arbitrary out-of-image-plane rotation. Experimental results on a large set of real, highly-cluttered images of a common object under arbitrary out-of-image-plane rotation demonstrate improvements over cascades of strictly pixel-level modules.

1 Introduction

Object recognition algorithms have made great strides in recent years, leading to techniques capable of robust, real-time recognition of certain types of objects such as faces, cars, and buildings [23][19]. However, recognizing objects with complex shape characteristics such as holes and networks of thin linear structures (e.g. the legs and supports on the stool and ladder in Figure 1(a)) remains challenging. In this paper, we present an efficient technique for using example images of a particular complex-shaped object in typical environments to automatically select shape features and train a classifier cascade to localize that object in highly cluttered novel views under arbitrary out-of-image-plane rotation¹. Figure 1(a) shows two typical results.
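The object-level step summarized in the abstract (classifying object-sized rectangular regions by aggregating pixel classifier scores) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that aggregation is a plain sum of per-pixel scores and that a region is accepted when the sum reaches a fixed threshold, neither of which is specified in this section. The names `window_score_sums`, `classify_windows`, and `threshold` are hypothetical.

```python
import numpy as np


def window_score_sums(pixel_scores, win_h, win_w):
    """Sum pixel-classifier scores inside every win_h x win_w window.

    Assumption: aggregation is a plain sum; the aggregation rule is
    not given in this part of the paper.
    """
    h, w = pixel_scores.shape
    # Integral image with a zero row and column prepended, so that any
    # rectangle sum needs only four lookups.
    integral = np.zeros((h + 1, w + 1), dtype=np.float64)
    integral[1:, 1:] = pixel_scores.cumsum(axis=0).cumsum(axis=1)
    # Entry [r, c] is the sum over rows r..r+win_h-1, cols c..c+win_w-1.
    return (integral[win_h:, win_w:]
            - integral[:-win_h, win_w:]
            - integral[win_h:, :-win_w]
            + integral[:-win_h, :-win_w])


def classify_windows(pixel_scores, win_h, win_w, threshold):
    """Label each window as object (True) or clutter (False).

    Assumption: a single fixed threshold on the summed score.
    """
    return window_score_sums(pixel_scores, win_h, win_w) >= threshold


# Toy score map: a 2 x 2 block of high-scoring pixels on a zero background.
scores = np.zeros((4, 5))
scores[1:3, 2:4] = 1.0
sums = window_score_sums(scores, 2, 2)
labels = classify_windows(scores, 2, 2, threshold=4.0)
```

The integral image keeps the cost per window constant regardless of window size, which is one common way to make exhaustive region scoring cheap; whether the paper uses this device is not stated here.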
¹ We note that it is possible to extend our technique to handle object scale variations either by processing the same image repeatedly at a variety of scales [19] or by rectifying image features to a canonical scale [12][8]. We also note that we focus on detecting a single, individual instance of a wiry object across a broad variety of viewing conditions because this problem is extremely challenging and largely unsolved; we are confident that solutions to the more general problem of detecting entire classes of wiry objects will build on advances made toward detecting individual wiry objects.

BMVC 2004 doi:10.5244/C.18.99

Recently, several pixel-level algorithms have successfully addressed the problem of using local shape features to estimate whether image pixels correspond to an instance of a target object, or to clutter [4][14][1]. Like local patch-based recognition techniques (e.g., [17]), they need to apply a separate object-level