EMERGING SENSOR NETWORK APPLICATIONS, NOVEMBER 2010

Embedded Imagers: Detecting, Localizing and Recognizing Objects and Events in Natural Habitats

Teresa Ko, Josh Hyman, Eric Graham, Mark Hansen, Stefano Soatto, Deborah Estrin

Abstract—Imaging sensors, or “imagers,” embedded in the natural environment enable remote collection of large quantities of data, thus easing the design and deployment of sensing systems in a variety of application domains. Yet, the data collected from such imagers are difficult to interpret due to a variety of “nuisance factors” in the data formation process, such as illumination, vantage point, and partial occlusions. These are especially severe in natural environments, where the objects of interest (e.g., plants, animals) have evolved to blend with their habitat, exhibit complex variability in shape and appearance, and perform rapid motions against dynamic backgrounds with rapid illumination changes. We describe three applications that exemplify these problems and the solutions we developed. First, we show how temporal over-sampling can simplify the analysis of a slow process such as the avian nesting cycle. Then, we show how to overcome temporal under-sampling in order to detect birds at a feeder station. Finally, we show how to exploit temporal consistency to reliably detect pollinators as they visit flowers in the field.

Index Terms—

I. INTRODUCTION

Imaging sensors, whether in the visible, infrared, or other spectra (“imagers”), are a natural choice of sensor for monitoring natural habitats. They are cheap, both in cost and in energy, yet data-rich. They are remote, requiring no contact, and passive, requiring no signals to be broadcast. At the same time, they may be tuned to be sensitive to different bands, most commonly the visible and near-infrared spectra.
However, natural habitats present unique challenges to image analysis as commonly performed with algorithms developed in the computer vision community, owing to the variability that objects or events of interest can manifest under “nuisance factors” such as uncontrolled changes in illumination or poor vantage points resulting in partial occlusions. Events of interest exhibit complex dynamics, such as the motion of birds or the configuration of a swarm of pollinators, but so do nuisances such as complex illumination changes or moving foliage in the background. As a result, sensing the environment with imagers requires modeling the complex spatio-temporal statistics of the objects and events of interest as well as of the nuisances, for the two often overlap due to the natural adaptation of species to their habitats (e.g., cryptic coloration matching a background). Unlike indoor or urban environments, where one can assume a static background, monitoring natural environments requires modeling the distributional properties of portions of images (natural textures) and their temporal evolution, and learning these natural statistics from training data.

For instance, detecting the presence of a bird at a feeder station from an image collected by an embedded imaging sensor can be difficult even for a trained expert. However, extended temporal observation reveals the characteristic variabilities of the object and enables successful detection, localization, and species recognition. Different species can exhibit different appearances depending on their pose and patterns of typical motion.

T. Ko, J. Hyman, E. Graham, M. Hansen, S. Soatto, and D. Estrin are with the Center for Embedded Networked Sensing, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90032 USA (e-mail: {tko,jhyman}@cs.ucla.edu, egraham@ucla.edu, cocteau@stat.ucla.edu, {soatto,destrin}@cs.ucla.edu).
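One common way to model the temporal statistics of each image location is a per-pixel running Gaussian background model. The following is a minimal illustrative sketch of that idea, not the system described in this paper; the class name, learning rate, and threshold are our own assumptions:

```python
# Illustrative sketch: per-pixel running Gaussian background model.
# Each pixel keeps a running mean and variance over time; a pixel is
# flagged as foreground when its value deviates from the learned
# background by more than k standard deviations.

class GaussianBackground:
    def __init__(self, width, height, alpha=0.05, k=2.5):
        self.alpha = alpha  # learning rate for the running statistics
        self.k = k          # detection threshold, in standard deviations
        self.mean = [[0.0] * width for _ in range(height)]
        self.var = [[225.0] * width for _ in range(height)]  # initial variance
        self._seen_first = False

    def apply(self, frame):
        """Update the model with `frame` (a 2-D list of gray values)
        and return a boolean foreground mask of the same shape."""
        if not self._seen_first:
            # Bootstrap the mean from the first frame.
            self.mean = [row[:] for row in frame]
            self._seen_first = True
            return [[False] * len(row) for row in frame]
        mask = []
        for y, row in enumerate(frame):
            out = []
            for x, v in enumerate(row):
                m, s2 = self.mean[y][x], self.var[y][x]
                d = v - m
                fg = d * d > (self.k * self.k) * s2
                # Foreground pixels adapt more slowly, so a briefly
                # occluding object does not get absorbed into the model.
                a = self.alpha * (0.1 if fg else 1.0)
                self.mean[y][x] = m + a * d
                self.var[y][x] = max(1.0, (1.0 - a) * s2 + a * d * d)
                out.append(fg)
            mask.append(out)
        return mask
```

Because the variance is learned per pixel, regions with persistent background motion (e.g., waving foliage) acquire a wide tolerance band, while stable regions remain sensitive to small deviations.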
These in turn differ from the characteristic background motion (e.g., foliage moving in the wind).

Embedded sensing system technologies are readily applicable to the visual monitoring of the natural environment. Data can now be collected, processed, stored, and transmitted from remote locations with little setup, thanks to advances in low-power microprocessors, wireless communication, battery form factors, and the software abstractions that support these devices.

The design constraints of environmental monitoring present a tradeoff between spatio-temporal coverage and image quality: increasing sensor coverage, either in space (i.e., the physical area visible from the camera) or in time (i.e., the frame rate of image capture), degrades data quality, and the images collected in monitoring applications tend to be of the poorest quality that still accomplishes the task at hand. We can leverage the specifics of a deployed system to achieve a better design point by tailoring the computer vision techniques used. Deployed sensing systems are not general-purpose, and therefore need not solve the general vision problems of detection, recognition, and tracking; the vision problem can be simplified if the system is engineered accordingly. Finally, to aggregate the information abstracted from the raw data at each imaging sensor, parsimonious representations of these processes are needed. Depending on the task, this could include storage, transmission, relaying to human observers, or in-situ decisions such as triggering other sensing or communication assets.

We explore these issues in three application domains. First, we describe algorithms for recognizing objects and events based on extended observations of spatial and temporal statistics, starting with the automated inference of the nesting cycle of birds in Section II.
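The idea of exploiting temporal consistency to separate true detections from transient nuisances can be sketched as a simple persistence filter. This is an illustrative assumption, not the algorithm deployed in the applications below; the class name, window size, and distance threshold are hypothetical:

```python
# Illustrative sketch: report a detection only if something appears near
# the same location in at least m of the last n frames (including the
# current one). One-frame flickers from moving foliage or illumination
# changes rarely recur at the same place and are suppressed.

from collections import deque

class TemporalConsistencyFilter:
    def __init__(self, n=5, m=3, radius=10.0):
        self.history = deque(maxlen=n - 1)  # detections from previous frames
        self.m = m                          # required number of sightings
        self.r2 = radius * radius           # squared association radius

    def _near(self, a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 <= self.r2

    def confirm(self, detections):
        """detections: list of (x, y) centers from the current frame.
        Returns the subset supported by recent history."""
        confirmed = []
        for d in detections:
            support = 1 + sum(  # count the current frame itself
                1 for past in self.history
                if any(self._near(d, p) for p in past)
            )
            if support >= self.m:
                confirmed.append(d)
        self.history.append(list(detections))
        return confirmed
```

A detection that recurs for three consecutive frames is confirmed, while an isolated false positive never accumulates enough support to be reported.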
Tens of thousands of images were collected over the course of the avian nesting season. Applying established vision techniques and tailoring them to take advantage of the conditions of deployment resulted in a sufficiently accurate