IEEE Conference on Computer Vision and Pattern Recognition, Hilton Head, SC, June 2000

Object Recognition for an Intelligent Room

Richard Campbell
Department of Electrical Engineering
The Ohio State University
Columbus, OH 43210
campbelr@ee.eng.ohio-state.edu

John Krumm
Microsoft Research
Microsoft Corporation
Redmond, WA 98052
jckrumm@microsoft.com

Abstract

Intelligent rooms equipped with video cameras can exhibit compelling behaviors, many of which depend on object recognition. Unfortunately, object recognition algorithms are rarely written with a normal consumer in mind, leading to programs that would be impractical to use for a typical person. These impracticalities include speed of execution, elaborate training rituals, and setting adjustable parameters. We present an algorithm that can be trained with only a few images of the object, that requires only two parameters to be set, and that runs at 0.7 Hz on a normal PC with a normal color camera. The algorithm represents an object’s features as small, quantized edge templates, and it represents the object’s geometry with “Hough kernels”. The Hough kernels implement a variant of the generalized Hough transform using simple, 2D image correlation. The algorithm also uses color information to eliminate parts of the image from consideration. We give our results in terms of ROC curves for recognizing a computer keyboard with partial occlusion and background clutter. Even with two hands occluding the keyboard, the detection rate is 0.885 with a false alarm rate of 0.03.

1. Object Recognition for an Intelligent Room

This paper introduces a new object recognition algorithm that is especially suited for finding everyday objects in an intelligent environment monitored by color video cameras. For a typical room in a home or office building, we could base the following applications on object recognition:

• Customize a device’s behavior based on location.
  A keyboard near a computer monitor should direct its input to the applications on that monitor. A keyboard in the hands of a particular user should direct its input to that user’s applications, and it should invoke that user’s preferences (e.g. repeat rate on keys).
• Find lost objects in a room, such as a television remote control.
• Infer actions and intents from which objects are being used. A user picking up a book probably wants to read, and the lights and music should be adjusted appropriately.

Besides the usual requirements for being robust to background clutter and partial occlusion, these applications share a need for moderate speed on cheap hardware if they are ever to be included in a consumer product. The main elements of the algorithm we develop – color lookup, edge detection, vector quantization, and image correlation – are standard image processing tasks that can run quickly on a normal PC. For recognizing a single object like a computer keyboard, our program runs at 0.7 Hz on a 500 MHz PC.

We are also sensitive to the need for a simple training procedure if this algorithm is ever to become a consumer product. While many object recognition routines require several carefully controlled training views of the object (e.g. on a motorized turntable) or detailed, manual feature selection, our algorithm only requires the user to outline the object of interest in one image for every unique face of the object. The training images for different poses are generated synthetically from the actual images.

In addition, we cannot expect a normal user to set several parameters for the program. Our program requires the setting of only two parameters: a sensitivity level for eliminating certain pixels from consideration based on their color, and a detection threshold.

Figure 1: Our algorithm locates common objects that can be useful for interacting with intelligent rooms.
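
As an illustration of the idea stated in the abstract, that Hough-style voting for an object's location can be carried out with plain 2D image correlation, the following is a minimal sketch in pure Python. It is not the implementation described in this paper. The function names (`hough_vote`, `find_peak`), the toy 3x3 kernel, the 5x5 feature map, and the threshold value are all hypothetical and chosen only for illustration. The sketch assumes a binary map marking where one feature type was detected, and a kernel whose nonzero entries mark the offsets, relative to the object's reference point, at which that feature is expected.

```python
def hough_vote(feature_map, kernel):
    """Correlate a binary feature map with a vote kernel.

    feature_map: 2D list of 0/1 values, 1 where the feature was detected.
    kernel: 2D list with odd height and width. The entry at offset (dr, dc)
            from the kernel center is the weight given to a feature found
            at offset (dr, dc) from a candidate reference point.
    Returns a vote image with the same size as feature_map.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    krows, kcols = len(kernel), len(kernel[0])
    cr, cc = krows // 2, kcols // 2  # kernel center = zero displacement
    votes = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            for kr in range(krows):
                for kc in range(kcols):
                    fr, fc = r + kr - cr, c + kc - cc
                    # Pixels outside the image contribute no votes.
                    if 0 <= fr < rows and 0 <= fc < cols:
                        total += kernel[kr][kc] * feature_map[fr][fc]
            votes[r][c] = total
    return votes


def find_peak(votes):
    """Return (row, col, vote_count) of the strongest candidate location."""
    best = max((v, r, c) for r, row in enumerate(votes) for c, v in enumerate(row))
    return best[1], best[2], best[0]


# Hypothetical toy data: the feature is expected one pixel to the left and
# one pixel to the right of the object's reference point.
kernel = [
    [0, 0, 0],
    [1, 0, 1],
    [0, 0, 0],
]
feature_map = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

votes = hough_vote(feature_map, kernel)
row, col, count = find_peak(votes)

# Hypothetical detection threshold; the paper's actual value is not given here.
DETECTION_THRESHOLD = 2
detected = count >= DETECTION_THRESHOLD
```

In this toy case both detected features agree on the reference point at row 2, column 2, so that location collects the most votes. Comparing the peak against a detection threshold corresponds to the second of the two user-set parameters mentioned above.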