D-TOUCH: A CONSUMER-GRADE TANGIBLE INTERFACE MODULE AND MUSICAL APPLICATIONS

E. Costanza
Media Engineering
Department of Electronics
University of York, York, UK
e.costanza@ieee.org

S. B. Shelley
Media Engineering
Department of Electronics
University of York, York, UK
sbs102@york.ac.uk

J. A. Robinson
Media Engineering
Department of Electronics
University of York, York, UK
jar11@ohm.york.ac.uk

ABSTRACT

We define a class of tangible media applications that can be implemented on consumer-grade personal computers. These applications interpret user manipulation of physical objects in a restricted space and produce unlocalized outputs. We propose a generic approach to the implementation of such interfaces using flexible fiducial markers, which identify objects to a robust and fast video-processing algorithm, so they can be recognized and tracked in real time. We describe an implementation of the technology, then report two new, flexible music performance applications that demonstrate and validate it.

Keywords
Tangible Media, Physical Objects Interface, Video Analysis, Music User Interface.

1. INTRODUCTION

The power of physical or "tangible" interfaces has been particularly well demonstrated in Video-Augmented Environments (VAEs) ([14], [13], [12], [9]). But VAEs require expensive equipment such as data projectors and specially designed interaction objects. In low-budget environments like schools and homes, interfaces must be implemented with conventionally-equipped personal computers and everyday objects. We are interested in extending the paradigm of tangible interfaces to educational and recreational applications, and so seek ways of realising them on consumer-grade equipment. We confine our attention to tangible media applications that do not generate localized outputs – or, at least, for which the localization of the outputs at the interaction objects is not required – such as those that output only audio.
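The marker-recognition idea underlying this approach can be made concrete with a small sketch: a fiducial is treated as a pattern of nested black and white regions, the nesting relationships form a tree, and the shape of that tree identifies the marker independently of rotation, scale, and moderate distortion. The following Python code is a hypothetical, minimal illustration of this idea, not the d-touch implementation; the function names (`label_regions`, `adjacency_tree`, `signature`) are introduced here for the sketch only.

```python
from collections import deque

def label_regions(img):
    """4-connected component labelling of a binary image (lists of 0/1).
    Returns a label grid and the colour of each labelled region."""
    h, w = len(img), len(img[0])
    labels = [[-1] * w for _ in range(h)]
    colour = []
    for y in range(h):
        for x in range(w):
            if labels[y][x] != -1:
                continue
            lab = len(colour)
            colour.append(img[y][x])
            labels[y][x] = lab
            q = deque([(y, x)])
            while q:  # flood fill one region
                cy, cx = q.popleft()
                for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and labels[ny][nx] == -1
                            and img[ny][nx] == img[y][x]):
                        labels[ny][nx] = lab
                        q.append((ny, nx))
    return labels, colour

def adjacency_tree(img):
    """Build region adjacencies and orient them as a tree rooted at the
    region touching the top-left corner (assumed to be background)."""
    labels, _ = label_regions(img)
    h, w = len(img), len(img[0])
    adj = {i: set() for row in labels for i in row}
    for y in range(h):
        for x in range(w):
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w and labels[y][x] != labels[ny][nx]:
                    adj[labels[y][x]].add(labels[ny][nx])
                    adj[labels[ny][nx]].add(labels[y][x])
    # For nested binary patterns the adjacency graph is a tree;
    # BFS from the background orients it parent -> children.
    root = labels[0][0]
    children = {i: [] for i in adj}
    seen, q = {root}, deque([root])
    while q:
        cur = q.popleft()
        for nb in adj[cur]:
            if nb not in seen:
                seen.add(nb)
                children[cur].append(nb)
                q.append(nb)
    return root, children

def signature(root, children):
    """Canonical string for the tree shape; sorting the child signatures
    makes it invariant to rotation and moderate geometric distortion."""
    return "(" + "".join(sorted(signature(c, children)
                                for c in children[root])) + ")"
```

For example, a marker drawn as a black ring enclosing a white area that contains a single black dot yields a four-level chain of regions (background, ring, inner white, dot), whose signature is the string `(((())))`; any rotated or rescaled rendering of the same marker yields the same signature.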
The physical configuration of objects provides all the visual feedback needed. The main requirement for implementing tangible media input is therefore robust and fast analysis of the movement of physical objects.

Our tangible media environment consists of a small area, usually on a table top, viewed by a web cam, near which the computer speakers are placed. It is assumed that the computer screen will not be used in tangible media applications. For interaction we expect different applications to use different simple moveable objects, so we need a way for the system to recognize them. Our solution is to use a library of generic fiducial symbols that the system has a high probability of recognizing and tracking accurately and quickly, even under adverse illumination and with partial occlusion.

2. CONTEXT AND PRIOR ART

The use of physical objects as tangible interactors offers several benefits noted by previous workers. These include enhanced multiuser interaction ([5], [14], [3]), and enhanced spatial awareness through 3D vision, kinaesthetic memory, and improved use of spatial reasoning skills ([8], [14]).

A fiducial-based approach to both wearable and projected augmented reality has been advocated by Rekimoto and others ([10], [7], [13], [11]), with barcode, matrix code and character recognition methods demonstrated. However, whereas these methods are based on geometrical feature extraction, generally followed by template matching, our approach relies on the topological structure of the markers. This provides a number of advantages, discussed in section 4, and prompts careful co-design of the fiducial and the image processing method.

We use topological image processing to achieve real-time fiducial identification and localization. Inspired by a region adjacency graph approach by Clarke and Johnston [6], we have developed a novel region adjacency tree algorithm. This is a simplification of previous topological