An On-Line, Cloud-Based Spanish-Spanish Sign Language Translation System

Javier Tejedor, Fernando López-Colino, Jordi Porta, José Colás

Human Computer Technology Laboratory, Departamento de Tecnología Electrónica y de las Comunicaciones, Universidad Autónoma de Madrid, Madrid, Spain
{javier.tejedor, fj.lopez, jordi.porta, jose.colas}@uam.es

Abstract

An on-line Spanish-Spanish Sign Language (LSE) translation system is presented in which Spanish speech content is translated into LSE to give Spanish deaf people access to spoken information. The system is cloud-based and built on a speech recognition module, a transfer-based machine translation module and a Sign Language synthesis module that employs an avatar to present the signed content.

Index Terms: Spanish Sign Language, machine translation, speech recognition, deaf people

1. Introduction

Spanish Sign Language (LSE) is a visual-gestural language used by the majority of the Spanish Deaf community. It has been considered an official language in Spain since 2007. Data provided by INE¹ and MEC² report 1,064,000 deaf people in Spain, half of whom are over 65 years old. 47% of deaf people have only basic-level studies or are illiterate, and only between 1% and 3% of Spanish deaf people are university graduates (INE, 1999; MEC, 2000). Moreover, about 92% of the Spanish deaf population have significant difficulty using written Spanish (INE, 2008). This bleak panorama encourages researchers to contribute to the integration of Deaf people into society.

This demonstration presents an on-line, cloud-based Spanish-LSE translation system that translates Spanish speech content into LSE and presents it by means of an avatar. The system can be applied in several scenarios: schools for deaf children, information services (bank services, etc.), TV news-based services, etc.

2. Spanish-LSE translation system

The Spanish-LSE machine translation system is composed of three main modules: the speech recognition module, which transcribes speech into text; the machine translation module, which translates the speech transcriptions into a sequence of glosses; and the Sign Language synthesis module, which takes this sequence of glosses and generates an animation using an avatar. These components, along with the user interface for accessing the system and the gateway that controls the communications between the modules, are described next.

¹ Instituto Nacional de Estadística (Spanish National Statistics Institute)
² Ministerio de Educación y Cultura (Spanish Ministry of Education)

2.1. Speech recognition module

The ATK software (an HTK-based API for on-line recognition) [1] produces the most likely sequence of words for the input speech content, which is then passed to the machine translation module. The acoustic models (AMs) are 3-state Hidden Markov Models with a left-to-right topology; they are context-independent because of the limited training data. Each state is modeled with 60 Gaussian Mixture Model components. The AMs were trained on the ALBAYZIN database [2] for a set of 47 Spanish phones [3], plus beginning and end silences, using 39-dimensional Mel Frequency Cepstral Coefficient features. An additional short pause model contains a single emitting state and a skip transition. This module also provides a bigram language model, trained on a corpus related to the disability domain (the domain chosen for the INTERSPEECH Show&Tell demonstration), and a vocabulary of 2000 words.

2.2. Machine translation module

Spanish and LSE are typologically different languages. For example, Spanish is a Subject-Verb-Object language, whereas LSE is a topic-oriented language. Because of these differences, direct translation is not an appropriate approach.
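As an aside on the acoustic model topology of Section 2.1, the following is a minimal sketch of the transition structure of a 3-state left-to-right phone model and of a single-state short pause model with a skip transition. It assumes the HTK convention of non-emitting entry and exit states. The function names and the probability values (0.6 and 0.3) are illustrative placeholders, not values reported for this system; in practice the transition probabilities are estimated during training.

```python
def left_to_right_transitions(n_emitting, self_loop=0.6):
    """Transition matrix for a left-to-right HMM.

    Assumption: HTK-style layout with a non-emitting entry state (row 0)
    and a non-emitting exit state (last row). `self_loop` is a placeholder
    initial value, not a trained probability.
    """
    n = n_emitting + 2
    a = [[0.0] * n for _ in range(n)]
    a[0][1] = 1.0  # entry state always moves to the first emitting state
    for i in range(1, n - 1):
        a[i][i] = self_loop            # stay in the same state
        a[i][i + 1] = 1.0 - self_loop  # advance to the next state
    return a  # last row stays all zeros: the exit state has no successors


def short_pause_transitions(self_loop=0.6, skip=0.3):
    """Single emitting state plus a skip transition from entry to exit.

    The skip lets the model be traversed without consuming any frame.
    Both probabilities are hypothetical placeholders.
    """
    a = left_to_right_transitions(1, self_loop)
    a[0][1] = 1.0 - skip  # enter the emitting state
    a[0][2] = skip        # or bypass it entirely
    return a


phone = left_to_right_transitions(3)  # 3 emitting states, as in Section 2.1
sp = short_pause_transitions()

for row in phone:
    print(row)
```

Every entry below the diagonal is zero, which is what makes the topology left-to-right: a state can only repeat or advance, never go back.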
The machine translation module therefore follows a transfer-based approach with analysis, transfer and generation stages. In the analysis stage, the input text is analyzed morphologically and syntactically to derive a constituency tree. A wide-coverage unification-based grammar which includes core linguistic phenomena in Spanish such as complementation, adjunction, pronominalization, etc.,

ISCA Archive, http://www.isca-speech.org/archive
INTERSPEECH 2012, ISCA's 13th Annual Conference, Portland, OR, USA, September 9-13, 2012