PresentPostures: A Wrist and Body Capture Approach for Augmenting Presentations

Jochen Kempfle, Ubiquitous Computing, University of Siegen, Siegen, Germany, jochen.kempfle@uni-siegen.de
Kristof van Laerhoven, Ubiquitous Computing, University of Siegen, Siegen, Germany, kvl@eti.uni-siegen.de

Abstract—Capturing and digitizing all nuances during presentations is notoriously difficult. At best, digital slides tend to be combined with audio, while video footage of the presenter's body language often turns out to be either too sensitive, occluded, or hard to obtain under common lighting conditions. If presentations require capturing what is written on the whiteboard, more expensive setups are usually needed. In this paper, we present an approach that complements the data from a wrist-worn inertial sensor with depth camera footage to obtain an accurate posture representation of the presenter. A wearable inertial measurement unit complements the depth footage by providing more accurate arm rotations and wrist postures when the depth images are occluded, whereas the depth images provide an accurate full-body posture for indoor environments. In an experiment with 10 volunteers, we show that posture estimates from depth images and inertial sensors complement each other well, resulting in far fewer occlusions and in tracking of the wrist with an accuracy that supports capturing sketches.

Index Terms—motion capture, inertial measurement, kinect

I. INTRODUCTION

Tracking a person's wrist position and orientation is a key feature in many applications such as virtual reality, medical applications, computer games, or manual task analysis [1]. In this paper, we present a novel approach that combines a wrist-worn inertial measurement unit (IMU) with depth images of the entire person to robustly track the human posture in real time, for capturing a presenter's body language and writing. We argue that the dominant wrist needs to be tracked very accurately for this purpose, and that the two modalities combined lead to a more accurate system that can cope with common problems the individual sensors suffer from, in particular occlusions and inertial sensor drift. To this end, we focus here on a study that measures how accurately depth imaging and inertial sensing can track the hand's position while writing on a whiteboard.

The contributions of this paper are threefold:
• A software framework is presented that acquires and combines, in real time, the measurements of body-worn inertial sensors and depth images.
• We present custom methods to calibrate and synchronize smartwatch data with the depth data for a body model.
• A study evaluates the tracking performance of both body and wrist for the special case of writing on a whiteboard.

In the following, we highlight our approach in relation to related research, before presenting the study and its results.

Fig. 1. Our approach combines depth camera data with wrist-worn 9D inertial (IMU) readings in real time, to robustly capture a presenter's postures.

II. RELATED WORK

IMU-based posture estimation is successfully applied in many applications, and IMU-based full-body tracking systems are already deployed industrially [2]. Integrating the IMU sensor data into a biomechanical model and modelling the sensor-to-bone offset, such as in [3] or [4], increases the overall accuracy [2]. Accessing the various calibration parameters is therefore a vital requirement.
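As a rough illustration of the sensor-to-bone offset handling referred to above (and of the kind of calibration parameters a framework like ours needs to expose), the following minimal sketch shows one way such a constant offset could be estimated from a single calibration pose and applied to subsequent IMU readings. The function names, variable names, and quaternion conventions are our own assumptions for illustration; this is not the implementation of [3], [4], or of our framework.

```python
# Minimal sketch (hypothetical names, scipy's (x, y, z, w) quaternion order):
# estimate a constant sensor-to-bone orientation offset from one calibration
# pose, then apply it to live IMU readings to obtain bone orientations.
from scipy.spatial.transform import Rotation as R


def calibrate_offset(q_imu_cal, q_bone_cal):
    """Constant bone-to-sensor offset, assuming both quaternions map their
    local frame into the same global frame during the calibration pose."""
    return R.from_quat(q_imu_cal).inv() * R.from_quat(q_bone_cal)


def bone_orientation(q_imu, offset):
    """Apply the calibrated offset to a live IMU reading to estimate the
    bone (e.g., forearm) orientation in the global frame."""
    return (R.from_quat(q_imu) * offset).as_quat()


if __name__ == "__main__":
    # Calibration pose: the bone reference could, for instance, come from a
    # depth camera's skeleton estimate, the sensor reading from the wrist IMU.
    q_imu_cal = R.from_euler("z", 90, degrees=True).as_quat()
    q_bone_cal = R.from_euler("z", 60, degrees=True).as_quat()
    offset = calibrate_offset(q_imu_cal, q_bone_cal)

    # A later IMU reading is mapped into a bone orientation estimate.
    q_imu_live = R.from_euler("z", 120, degrees=True).as_quat()
    print(bone_orientation(q_imu_live, offset))
```

In a combined setup such as the one proposed here, the reference bone orientation during calibration could plausibly be taken from the depth camera's skeleton, which is one way the two modalities can be tied to a common body model.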
For camera-based systems, extensive frameworks exist for so-called RGB-D sensors that use depth information, such as [5], and for highly accurate commercial systems that rely on fiducial markers. Vision-based motion capture systems are known to have their specific weaknesses as well. Self-occlusion by the person under observation and occlusion by nearby structures, as well as adverse lighting conditions, tend to hamper accurate body posture recognition [6]. Additionally, these systems tend to be less flexible to relocate, and their setup effort and costs tend to be higher than those of wearable inertial measurement solutions. In recent years, some examples have shown how these weaknesses in one modality can be addressed by another. In [7], for instance, cameras in the environment