Generic System for Human-Computer Gesture Interaction: Applications on Sign Language Recognition and Robotic Soccer Refereeing

Paulo Trigueiros 1,2,4,5, Fernando Ribeiro 1,4, Luis Paulo Reis 3,4,5

1 Escola de Engenharia da Universidade do Minho, Dep. Eletrónica Industrial (EEUM/DEI)
2 Instituto Politécnico do Porto, Departamento de Informática (ISCAP/IPP)
3 Escola de Engenharia da Universidade do Minho, Dep. Sistemas de Informação (EEUM/DSI)
4 Centro ALGORITMI, Universidade do Minho, Portugal
5 Laboratório de Inteligência Artificial e Ciência de Computadores (LIACC), Portugal

Emails: pjt@iscap.ipp.pt, fernando@dei.uminho.pt, lpreis@dsi.uminho.pt

Abstract - Hand gestures are a powerful means of human communication, with many potential applications in the area of human-computer interaction. Vision-based hand gesture recognition techniques have many advantages over traditional input devices, giving users a simpler and more natural way to communicate with electronic equipment. This work proposes a generic system architecture, based on computer vision and machine learning, that can be used with any interface for real-time human-machine interaction. Its novelty is the integration of different tools for gesture spotting; the proposed solution is composed of three main modules: a pre-processing and hand segmentation module, a static gesture interface module and a dynamic gesture interface module. The experiments showed that the core of vision-based interaction systems can be the same for all applications, which facilitates implementation. For hand posture recognition, an SVM (Support Vector Machine) model was trained on a centroid distance dataset of 2170 records, achieving a final accuracy of 99.4%. For dynamic gestures, an HMM (Hidden Markov Model) was trained for each of the gestures the system should recognize, with a final average accuracy of 93.7%.
The datasets were built from four different users with a total of 25 gestures per user, totalling 1100 records for model construction. The proposed solution has the advantage of being generic, with trained models that work in real-time, allowing its application in a wide range of human-machine interaction scenarios. To validate the proposed framework, two applications were implemented. The first is a real-time system able to interpret Portuguese Sign Language. The second is an online system that helps a robotic soccer referee judge a game in real-time.

Keywords - Human-machine interaction; Gesture Recognition; Computer Vision; Machine Learning

1 Introduction

Hand gestures are a powerful means of human communication, with many potential applications in the area of human-computer interaction. Vision-based hand gesture recognition techniques have many advantages over traditional input devices, giving users a simpler and more intuitive way to communicate with a computer. Using visual input in this context makes it possible to communicate remotely with computerized equipment, without the need for physical contact. Gesture-based applications must model gestures in both the spatial and temporal domains, where a hand posture is the static structure of the hand and a gesture is the dynamic movement of the hand. Since the hand is one of the most important communication tools in humans' daily life, and given the continuous advances in image and video processing techniques, research on human-machine interaction through gesture recognition has led to the use of this technology in a very broad range of applications [1, 2], some of which are highlighted here:

• Virtual reality: enables realistic manipulation of virtual objects using one's hands [3, 4].