Towards a Professional Gesture Recognition with RGB-D from Smartphone

Pablo Vicente Moñivar 1,2, Sotiris Manitsaris 1,3, and Alina Glushkova 1,4

1 Centre for Robotics, MINES ParisTech, PSL Université Paris
2 pablo.vicente monivar@mines-paristech.fr
3 sotiris.manitsaris@mines-paristech.fr
4 alina.glushkova@mines-paristech.fr

Abstract. The goal of this work is to build the basis for a smartphone application that provides functionalities for recording human motion data, training machine learning algorithms and recognizing professional gestures. First, we take advantage of the new mobile phone cameras, either infrared or stereoscopic, to record RGB-D data. Then, a bottom-up pose estimation algorithm based on Deep Learning extracts the 2D human skeleton and recovers the third dimension from the depth data. Finally, we use a gesture recognition engine based on K-means and Hidden Markov Models (HMMs). The performance of the machine learning algorithm has been tested with professional gestures using silk-weaving and TV-assembly datasets.

Keywords: pose estimation, depth map, gesture recognition, Hidden Markov Models, smartphone

1 Introduction

Professional actions, activities and gestures are of high importance in most industries. Motion sensing and machine learning have actively contributed to the capturing of gestures and the recognition of meaningful movement patterns by machines. As a result, interesting applications have emerged that differ from one industry to another. For example, in the factories of the future, the capabilities of workers will be augmented by machines that can continuously recognize their gestures and collaborate accordingly, whereas in the creative and cultural industries it is still a challenge to recognize and identify the motor skills of a given expert. Capturing the motion of workers or craftsmen using off-the-shelf devices, such as smartphones, is therefore of great value.
New smartphones are equipped with depth sensors and high-power processors, which allow us to record data without very sophisticated devices. In this work, we aim to create a smartphone application that allows for recording gestures using RGB or RGB-D images, estimating human poses, training machine learning models using only a few shots and recognizing meaningful patterns. The motivation of this work is to let users easily record, annotate, train on and recognize human actions, activities and gestures in professional environments.