Hand Posture Recognition: IR, IMU and sEMG

Richard Polfreman
Music, University of Southampton
University Road, Southampton, SO17 1BJ, UK
r.polfreman@soton.ac.uk

ABSTRACT
Hands are important anatomical structures for musical performance, and recent developments in input device technology have allowed rather detailed capture of hand gestures using consumer-level products. While some musical contexts require detailed hand and finger movements, in others it is sufficient to communicate discrete hand postures to indicate selection or other state changes. This research compared three approaches to capturing hand gestures where the shape of the hand, i.e. the relative positions and angles of the finger joints, is an important part of the gesture. A number of sensor types can capture information about hand posture, each with practical advantages and disadvantages for music applications. This study compared optical, inertial and muscular sensing, applying gesture recognition algorithms to the device data for three sets of five hand postures (i.e. static gestures), aiming to determine which methods are most effective.

Author Keywords
Hand posture, Gesture recognition, Motion capture

CCS Concepts
• Human-centered computing~Gestural input • Applied computing~Performing arts • Social and professional topics~Hardware selection

1. INTRODUCTION
Hand gesture recognition is of interest to a number of fields, such as sign-language interpretation, robotics, prosthetics, virtual reality, health applications and video games, as well as computer music. Recent devices, such as the Leap Motion [19] and Kinect [24], have brought much of the necessary technology to consumer-level systems, expanding the potential user base beyond scientific and research environments and into everyday use.
Real-time hand gesture recognition systems typically comprise a hardware sensor arrangement to supply a stream of data about the user's hand, signal processing algorithms to extract features from the data, and machine-learning tools to then determine the current gesture. A number of different sensor types can be used at the input stage, and in this work we wanted to compare the performance of optical (IR depth camera), surface electromyography (sEMG) and inertial measurement unit (IMU) sensing, although other sensor technologies can also be used for hand tracking, such as electromagnetic sensing (e.g. Polhemus) and bend sensors (e.g. Cyberglove).

1.1 Optical Systems
Optical approaches can include the use of standard video cameras, IR depth cameras and multi-camera motion capture systems, with both marker-less and marker-based techniques. Each type of system has a number of practical difficulties and advantages, but generally optical systems are subject to lighting issues, occlusion/line-of-sight problems and often quite severe spatial/orientation constraints. High-end multi-camera mo-cap systems can provide highly accurate spatial information and fast response, but can be difficult to set up in a concert environment and can be prohibitively expensive. Video cameras can be cheap and simple to set up, at the cost of spatial constraints and occlusion problems, while IR depth-camera hardware is now readily available and affordable, but again with some line-of-sight and lighting/noise problems. Microsoft's Kinect provides full-body skeletal information but currently little direct hand information: the second-generation device provides thumb orientation and some basic hand-posture detection (lasso, fist, open).
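The three-stage pipeline outlined at the start of this section (sensor stream, feature extraction, classification) can be sketched minimally as follows. This is an illustrative sketch only: the per-finger flexion features, posture templates and nearest-centroid classifier are assumptions for the example, not the methods used in this study.

```python
import math

# Hypothetical posture templates: mean feature vectors for each static
# posture, e.g. averaged from training recordings (values are invented).
TEMPLATES = {
    "fist":  [0.9, 0.9, 0.9, 0.9, 0.9],
    "open":  [0.1, 0.1, 0.1, 0.1, 0.1],
    "point": [0.9, 0.1, 0.9, 0.9, 0.9],
}

def extract_features(joint_angles):
    """Map one frame of raw sensor output (here: per-finger flexion
    angles in degrees) to a normalised feature vector in [0, 1]."""
    return [max(0.0, min(1.0, a / 90.0)) for a in joint_angles]

def classify(features, templates=TEMPLATES):
    """Nearest-centroid classifier: return the posture whose template
    is closest to the feature vector in Euclidean distance."""
    def dist(template):
        return math.sqrt(sum((f - t) ** 2 for f, t in zip(features, template)))
    return min(templates, key=lambda name: dist(templates[name]))

# One frame of hypothetical sensor data: index finger extended, rest flexed.
frame = [85.0, 5.0, 88.0, 90.0, 82.0]
print(classify(extract_features(frame)))  # → point
```

In a real system, the feature extraction stage would differ per sensor (joint angles from an optical skeletal model, orientations from IMUs, amplitude envelopes from sEMG channels), while the classification stage could remain largely the same.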
Others have used the Kinect or other depth cameras to track fully articulated hand gestures, including Microsoft (although not released), Intel (RealSense) and other research teams (FORTH), although often at low frame rates and/or very high computational cost requiring GPU processing [38]. Both generations of Kinect have been used in a number of installations and music applications, including [21], [39], [33], [15]. The Leap Motion controller currently provides a detailed skeletal model of both hands and fingers and has again been explored in a number of music applications, including [11], [12], [4]. It has also been adopted commercially by some music technology companies, including Steinberg for controlling their Cubase software [37] and Fairlight (now Blackmagic Design), who embedded a Leap Motion controller into their 3D Audio Workstation for "Air panning" of sounds [2]. Camera frame rates are an important consideration with optical systems, as they set an upper limit on the speed of tracking and gesture recognition. Low-cost industrial USB cameras can achieve 120fps (at the cost of resolution), high-end multi-camera mo-cap systems typically 100-500fps, while consumer depth-camera systems range from ~30-100fps. In our experience, frame rates of ~50fps and higher provide systems that feel responsive in music applications, depending on the subsequent processing latency.

1.2 sEMG Systems
sEMG devices need skin contact near the muscles of interest, which for hand control are in the upper forearm. They measure the electrical activity in the muscle fibres, which correlates with the exertion being made. Holding the hand in different finger postures requires different patterns of muscle activation in the forearm, and the sEMG data can therefore be interpreted to say something about the shape of the hand, although of course the sEMG data will vary with the pressure being exerted by the fingers (e.g. whether a fist is being squeezed tightly or held more relaxed).
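Before classification, raw sEMG is commonly reduced to a per-channel amplitude envelope that tracks muscle exertion; one standard measure is the root-mean-square (RMS) over short windows. The sketch below illustrates this idea only: the window length and the synthetic signal are assumptions, not parameters or data from this study.

```python
import math

def rms_envelope(samples, window=100):
    """Root-mean-square amplitude of one sEMG channel over consecutive
    non-overlapping windows: a crude but standard exertion measure."""
    env = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        env.append(math.sqrt(sum(x * x for x in chunk) / window))
    return env

# Synthetic channel: a relaxed stretch followed by a tensed stretch.
relaxed = [0.01 * ((-1) ** i) for i in range(100)]  # low-amplitude noise
tensed = [0.5 * ((-1) ** i) for i in range(100)]    # high-amplitude activity
envelope = rms_envelope(relaxed + tensed, window=100)
print(envelope)  # envelope rises sharply when the muscle tenses
```

Because such envelopes reflect exertion rather than geometry directly, the caveats below (pressure-dependent readings, voluntary tensing) apply to whatever features are derived from them.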
Indeed, it is possible to tense the forearm muscles voluntarily without changing hand posture, which will affect the sEMG signals being generated. Nymoen [29] explored a wireless sEMG device, the Myo Armband, as a digital musical instrument controller, and identified misattribution of hand postures and jitter in the posture detection response as factors limiting the device's utility in music performance. Despite this, wireless sEMG devices offer a potential solution to hand posture capture without the lighting, spatial and line-of-sight limitations of optical systems, or the intrusion of gloves.

Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Copyright remains with the author(s). NIME'18, June 3-6, 2018, Blacksburg, Virginia, USA.