A Sliding Window Approach to Natural Hand Gesture Recognition using a Custom Data Glove

Granit Luzhnica* Know-Center GmbH
Jörg Simon† Know-Center GmbH
Elisabeth Lex‡ Knowledge Technologies Institute, Graz Univ. of Technology & Know-Center GmbH
Viktoria Pammer§ Knowledge Technologies Institute, Graz Univ. of Technology & Know-Center GmbH

ABSTRACT

This paper explores the recognition of hand gestures based on a data glove equipped with motion, bending and pressure sensors. We selected 31 natural and interaction-oriented hand gestures that can be adopted for general-purpose control of and communication with computing systems. The data glove is custom-built and contains 13 bend sensors, 7 motion sensors, 5 pressure sensors and a magnetometer. We present the data collection experiment, as well as the design, selection and evaluation of a classification algorithm. As we use a sliding window approach to data processing, our algorithm is suitable for stream data processing. Algorithm selection and feature engineering resulted in a combination of linear discriminant analysis and logistic regression, with which we achieve an accuracy of over 98.5% in a continuous data stream scenario. When removing the computationally expensive FFT-based features, we still achieve an accuracy of 98.2%.

Index Terms: C.3 [Special-Purpose and Application-Based Systems]: Signal processing systems; I.5.2 [Design Methodology]: Classifier design and evaluation; I.5.2 [Design Methodology]: Feature evaluation and selection; I.5.2 [Design Methodology]: Pattern analysis; I.5.4 [Applications]: Signal processing; H.5.2 [User Interfaces]: Input devices and strategies; H.5.2 [User Interfaces]: Interaction styles

1 INTRODUCTION

Gesture recognition has been an active field of research in human-computer interaction for more than two decades. Initially, the motivation was to detect and recognise sign language [1, 14, 33, 41].
The goal was mostly to develop computing systems that could understand and translate sign language. More recently, gesture recognition has gained interest as the basis for gesture-based interaction in a wide range of use cases, such as crisis management [45], TV remote controlling [34], interacting with computers [18, 24], gaming interfaces [23, 26, 45, 52], augmented reality applications [17, 43, 48, 50], hands-free interaction in car driving [27], providing virtual training for car driving [50] or detecting a driver's fatigue [25]. In the medical area, robot nurses are envisioned to detect a surgeon's hand gestures and to assist with the necessary surgical instruments [45]. In another type of use case, computer systems detect gestures in order to understand user activities. For instance, robots have been envisioned to analyse gestures in order to track which tasks have already been completed, so that they can seamlessly take over the next steps [6, 35]. Sometimes it is useful to only observe and document the gestures, as in the case of assembly lines, where the work is documented for quality assurance [40].

*e-mail: gluzhnica@know-center.at
†e-mail: jsimon@know-center.at
‡e-mail: elex@know-center.at
§e-mail: viktoria.pammer@tugraz.at

More generally, detecting assembly line tasks is an area of active research [20, 32, 46, 51]. Gesture recognition has also been explored in the context of logging activities of daily life: In [38], the authors explore the possibility of detecting eating habits via recognising the gestures for eating and drinking (bringing the hand to the mouth). In [39], activity logging based on both smartwatch and smartphone sensing is used to detect drinking too much coffee or not eating. With this work we contribute to the field of gesture recognition by exploring the recognition of natural and interaction-oriented hand gestures based on sensors worn on users' hands.
To that end, we designed a custom data glove equipped with sensors that capture both motion and state of the hand and fingers. We concentrated on gestures that are widely known and that can reasonably be adopted to control and communicate with computing systems. Our envisioned use case is that of mapping out a general-purpose gesture alphabet. It should be easy for users to learn, and should be able to replace some of the interactions with computing systems (selecting, browsing, etc.) that are currently performed via mouse or smartphone gestures.

We approached this goal by conducting a data collection experiment in which multiple users performed such gestures. In parallel to sensing, the gestures were manually annotated with gesture names. This resulted in a labelled set of hand gestures, which we used to extract representative features and to train a supervised learning algorithm. Then, we evaluated the performance of our algorithm "online", i.e. on a continuous sensor data stream. The contributions of this work are three-fold:

• A data set of natural hand gestures, which were gathered in a data collection experiment with 18 adults, and are manually annotated with gesture names.

• Feature selection - we identified characteristic features for gestures and investigated similarities between gestures.

• Algorithm selection - we identified a performant algorithm for classifying gestures in a continuous sensor data stream.

2 RELATED WORK

We identified two strands of research that are relevant for our work: firstly, research that deals with vision-based systems for gesture recognition and, secondly, research that deals with wearable sensors for gesture recognition. In the first case, the gesture recognition relies on an infrastructure built into the environment (e.g., using Kinect or a webcam), whereas in the second case, the gesture recognition relies on wearable sensor technologies like data gloves, armbands or smartwatches.
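The classification pipeline summarised in the abstract (sliding windows over the sensor stream, per-window features, linear discriminant analysis followed by logistic regression) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the window length, step size, channel count, synthetic data and the mean/std/min/max features are assumptions chosen for clarity, and the paper's actual feature set (including FFT-based features) is richer.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def window_features(stream, length=32, step=8):
    """Slide a fixed-length window over a multichannel stream and
    compute simple per-channel statistics (mean, std, min, max)."""
    feats = []
    for start in range(0, stream.shape[0] - length + 1, step):
        w = stream[start:start + length]
        feats.append(np.concatenate([w.mean(0), w.std(0), w.min(0), w.max(0)]))
    return np.array(feats)

rng = np.random.default_rng(0)
# Two synthetic "gestures" recorded on 4 sensor channels, 400 samples each
# (stand-ins for the glove's bend/motion/pressure readings).
g0 = rng.normal(0.0, 1.0, size=(400, 4))
g1 = rng.normal(2.0, 1.0, size=(400, 4))

X0, X1 = window_features(g0), window_features(g1)
X = np.vstack([X0, X1])
y = np.array([0] * len(X0) + [1] * len(X1))

# LDA projects the window features onto discriminative directions;
# logistic regression then classifies in that reduced space.
clf = make_pipeline(LinearDiscriminantAnalysis(), LogisticRegression())
clf.fit(X, y)
print(clf.score(X, y))
```

In a streaming setting, the same `window_features` computation would run on each newly completed window of live glove data, and `clf.predict` would emit one label per window.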
Vision-based systems for gesture recognition. Typically, a camera mounted in the environment records human hands, and the system extracts features from the individual frames of the recording. Sometimes a filtering process is involved which removes unwanted objects, e.g. heads, from the image or video [7]. Typically, postures are predicted [5, 7] and then a grammar is con-