SenseTK: A Multimodal, Multimedia Sensor Networking Toolkit

Phillip Sitbon, Wu-Chi Feng, Nirupama Bulusu, Thanh Dang
{sitbon, wuchi, nbulusu, dangtx}@cs.pdx.edu

ABSTRACT

This paper describes the design and implementation of a multi-modal, multimedia-capable sensor networking framework called SenseTK. SenseTK allows application writers to easily construct multi-modal, multimedia sensor networks that include both traditional scalar-based sensors and sensors capable of recording sound and video. The distinguishing features of such systems include the need to push application processing deep within the sensor network, the need to bridge extremely low-power and low-computation devices, and the need to distribute and manage such systems. This paper describes the design and implementation of SenseTK and provides several diverse examples to show its flexibility and unique aspects. Finally, we experimentally measure several aspects of SenseTK.

Keywords: Video sensing, video applications, video adaptation.

1. INTRODUCTION

Over the last half decade, sensor networking technologies have been deployed and demonstrated for a range of scientific, industrial, and military applications. For example, researchers at Intel have demonstrated a sensor system for predictive maintenance on shipping vessels as well as in semiconductor plants [12]. As small computing devices continue to be developed, integrating more complex data types such as audio, images, and video is becoming possible. Multimodal and multimedia-based sensor networks are defined by a diversity of sensing modes and data types. Multimodal sensing can involve both traditional scalar-based sensor technologies and more complex computing devices such as video sensors. Multimodality can also imply the integration of mobile components, such as robots, into a static scalar-based sensor network. Multimedia sensor networks can combine audio, images, and video to provide sensing.
Several such systems have been demonstrated in the past, including the Great Duck Island experiment (which used images and scalar sensors), cane toad monitoring in Australia (which used audio signal processing to distinguish types of toads), and Panoptes (which used video on low-power sensor platforms).

There are a number of unique aspects of such multimodal and multimedia-based (MM) networks. First, as the data types become more complex, the processing of the data necessarily becomes more application-specific. For example, in the cane toad monitoring system, the signal processing is based upon the cane toad's vocal signature. For video and image processing in such networks, the processing is even more specific to the application. For example, an application using a video camera pointed at a highway may want to convert the video into the speed of passing cars, or to detect cars stopped on the side of the road; another application might process the same video completely differently. Second, sensor networks pose a many-to-one information implosion problem. Unlike video streaming systems, which can deliver a single stream to many hosts through multicast, MM networks can potentially deliver many streams to a single client. Thus, processing and adaptation of the data needs to occur deep within the sensor network for optimal power management and scalability. Third, the diversity of computing devices means that, both programmatically and operationally, the system needs to bridge a variety of hardware and software configurations. Finally, given the scale of such networks, the ability to easily program and retask a large number of sensors is necessary.

While some primitive MM networks exist today, they are characterized by extremely brittle software infrastructures. That is, changing the functionality of the system requires significant intervention on the user's part, potentially including significant modifications to the software itself.
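To make the many-to-one implosion problem concrete, the following is a minimal, self-contained sketch (this is illustrative only and does not use SenseTK's actual API; all names are hypothetical) of why pushing processing into the network helps: an aggregation node near the sensors forwards only a per-node summary plus anomalous readings, rather than every raw sample, so the traffic converging on the sink shrinks dramatically.

```python
# Hypothetical illustration of in-network processing for the
# many-to-one implosion problem; not SenseTK's actual API.

def sensor_samples(node_id, n=100):
    """Simulate raw scalar readings from one sensor node."""
    return [(node_id, i, (node_id * 31 + i * 7) % 50) for i in range(n)]

def in_network_summarize(samples, threshold=45):
    """Aggregation done near the sensors: forward only anomalous
    readings plus a per-node average, instead of every sample."""
    values = [v for (_, _, v) in samples]
    anomalies = [s for s in samples if s[2] >= threshold]
    return {"node": samples[0][0],
            "avg": sum(values) / len(values),
            "anomalies": anomalies}

def sink_collect(num_nodes=10):
    """The sink receives one summary per node, not every raw sample."""
    return [in_network_summarize(sensor_samples(n)) for n in range(num_nodes)]

summaries = sink_collect()
raw_count = 10 * 100                                   # samples generated
forwarded = sum(1 + len(s["anomalies"]) for s in summaries)  # records sent
print(f"raw samples: {raw_count}, forwarded to sink: {forwarded}")
```

The same principle applies with far greater force to audio and video, where a raw stream per camera would overwhelm both the radio links and the sink.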
Second, such applications are typically written and optimized by computer scientists who know the low-level hardware and networking issues, rather than by the application writer, who may not be a computer scientist at all. The key limitation of today's MM networks is that there is no bridge between application writers and the diverse low-level sensor hardware. While multimedia toolkits have been proposed in the past, they are not well suited to the unique challenges posed by sensor networks. In this paper, we describe the design and implementation of SenseTK, a toolkit we have built to help bridge applications and the diversity of low-level hardware in sensor networking applications. With MM networks in mind, the toolkit was designed with a number of goals, including deployability, programmability, manageability, and retaskability. We will describe the basic system architecture of SenseTK and provide several examples that integrate audio, images, video,