Proceedings of the eNTERFACE’07 Workshop on Multimodal Interfaces, İstanbul, Turkey, July 16 - August 10, 2007

A MULTIMODAL FRAMEWORK FOR THE COMMUNICATION OF THE DISABLED

Savvas Argyropoulos 1, Konstantinos Moustakas 1, Alexey A. Karpov 2, Oya Aran 3, Dimitrios Tzovaras 1, Thanos Tsakiris 1, Giovanna Varni 4, Byungjun Kwon 5

1 Informatics and Telematics Institute (ITI), Aristotle University of Thessaloniki, Greece
2 Saint-Petersburg Institute for Informatics and Automation, Russian Academy of Sciences, Russian Federation
3 Perceptual Intelligence Lab, Boğaziçi University, İstanbul, Turkey
4 InfoMus Lab - Casa Paganini, DIST, University of Genoa, Italy
5 Koninklijk Conservatorium, The Hague, Netherlands

ABSTRACT

In this paper, a novel system is presented that aims to provide alternative tools and interfaces to blind and deaf-and-mute people and to enable their communication and interaction with the computer. All the involved technologies are integrated into a treasure hunting game application that is jointly played by the blind and the deaf-and-mute user. The integration of the multimodal interfaces into a game application serves both as an entertaining and a pleasant educational tool for the users. The proposed application integrates haptics, audio and visual output, computer vision, sign language analysis and synthesis, and speech recognition and synthesis, in order to provide an interactive environment where the blind and deaf-and-mute users can collaborate to play the treasure hunting game.

KEYWORDS

Multimodal interfaces – Multimodal fusion – Sign language analysis – Sign language synthesis – Speech recognition

1. INTRODUCTION

The widespread deployment of novel human-computer interaction methods has changed the way individuals communicate with computers. Since Sutherland’s Sketchpad in 1961 and the Xerox Alto in 1973, computer users have long been acquainted with more than the traditional keyboard to interact with a system. More recently, with the desire for increased productivity, seamless interaction and immersion, and the e-inclusion of people with disabilities, and with the progress in fields such as multimedia/multimodal signal analysis and human-computer interaction, multimodal interaction has emerged as a very active field of research [1].

Multimodal interfaces are those encompassing more than the traditional keyboard and mouse. Natural input modes are employed [2], [3], such as voice, gestures and body movement, haptic interaction, facial expressions, and physiological signals. As described in [4], multimodal interfaces should follow several guiding principles. Multiple modalities that operate in different spaces need to share a common interaction space and to be synchronized. Also, multimodal interaction should be predictable and not unnecessarily complex, and it should degrade gracefully, for instance by providing for modality switching. Finally, multimodal interfaces should adapt to the user’s needs, abilities, and environment.

A key aspect of multimodal interfaces is also the integration of information from several different modalities in order to extract high-level information conveyed non-verbally by users. Such high-level information can be related to the expressive, emotional content the user wants to communicate. In this framework, gesture has a prominent role as a primary non-verbal conveyor of expressive, emotional information.
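To make the notion of fusion concrete, the sketch below illustrates one common approach, decision-level (late) fusion, in which each modality classifies the input independently and a reliability-weighted combination of the per-modality scores yields the final decision. The modality names, class labels, weights, and scores are illustrative assumptions for this sketch and do not describe the actual fusion scheme of the presented system.

```python
import numpy as np

# Hypothetical per-modality posterior scores over the same candidate classes
# (e.g., commands in the game). All values here are illustrative only.
CLASSES = ["go_left", "go_right", "pick_up"]

modality_scores = {
    "speech":  np.array([0.70, 0.20, 0.10]),  # speech recognizer posteriors
    "gesture": np.array([0.30, 0.55, 0.15]),  # sign/gesture classifier posteriors
}

# Assumed per-modality reliability weights; in practice these would be
# estimated, e.g., from validation accuracy or signal-quality measures.
reliability = {"speech": 0.6, "gesture": 0.4}

def late_fusion(scores, weights):
    """Weighted sum of per-modality posteriors (decision-level fusion)."""
    fused = sum(weights[m] * scores[m] for m in scores)
    return fused / fused.sum()  # renormalize to a proper distribution

fused = late_fusion(modality_scores, reliability)
print("fused posteriors:", np.round(fused, 3))
print("decision:", CLASSES[int(np.argmax(fused))])
```

A product rule or a trained fusion classifier could replace the weighted sum; the appropriate choice typically depends on how correlated the errors of the individual modalities are.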
Research on gesture analysis, processing, and synthesis has received growing interest from the scientific community in recent years and has demonstrated its paramount importance for human-machine interaction.

The present work aims to make the first step in the development of efficient tools and interfaces for the generation of an integrated platform for the intercommunication of blind and deaf-and-mute persons. While multimodal signal processing is clearly essential in such applications, specific issues such as modality replacement and enhancement must also be addressed in detail.

In the blind user’s terminal, the major modality for perceiving the virtual environment is haptics, while audio is provided as supplementary side information. Force feedback interfaces allow blind and visually impaired users to access not only two-dimensional graphic information, but also information presented in 3D virtual reality environments (VEs) [5]. The greatest potential benefits of virtual environments can be found in applications concerning areas such as education, training, and the communication of general ideas and concepts [6].

Several research projects have been conducted to help visually impaired users understand 3D objects, scientific data, and mathematical functions by using force feedback devices [7]. PHANToM™ is one of the most commonly used force feedback devices. Due to its hardware design, only one point of contact at a time is supported. This is very different from the way people usually interact with their surroundings, and thus the amount of information that can be transmitted through this haptic channel at a given time is very limited. However, research has shown that this form of exploration, although time-consuming, allows users to recognize simple 3D objects. The PHANToM™ device has the advantage of providing the sense of touch along with force feedback at the fingertip.

Deaf and mute users have visual access to 3D virtual environments; however, their immersion is significantly reduced by the lack of audio feedback. Furthermore, considerable effort has been devoted to applications for the training of the hearing impaired. Such applications include the visualization of the hand and body movements performed to produce words in sign language, as well as applications based on computer vision techniques that aim to recognize such gestures in order to allow natural human-machine interaction for the hearing impaired. In the context of the presented framework, the deaf-and-mute terminal incorporates sign language analysis and synthesis tools so as to