Glenn, et al., “An Image Processing Technique for the Translation of ASL Finger-Spelling to Digital Audio or Text”, NTID International Instructional Technology and Education of the Deaf Symposium, June 2005.

An Image Processing Technique for the Translation of ASL Finger-Spelling to Digital Audio or Text

Chance M. Glenn, Divya Mandloi, Kanthi Sarella, and Muhammed Lonon
The Laboratory for Advanced Communications Technology/CASCI
ECTET Department/CAST
Rochester Institute of Technology
Rochester, New York 14623
cmgiee@rit.edu

Abstract – This paper describes the ongoing development of an image processing technique for the translation of American Sign Language (ASL) finger-spelling to text. This work is phase one of a broader effort, the Sign2 Project, which is focused on a complete technological approach to the translation of ASL to digital audio and/or text. We describe our approach to the phase-one problem and present results in which several words are distinguished with a fairly high degree of reliability. We also discuss our approach to the next phase of development for the project.

Keywords – image processing, sign language, ASL, finger-spelling, linguistics, communication

Introduction

The Sign2 Project is a focused research and development effort whose three-fold goal is to (a) further establish and enhance the body of knowledge in translating physical movement and position to language, (b) conceptualize and engineer a prototype device that closes the communication gap between the deaf and the hearing, and (c) build a statistical database from the prototype results that is useful to the research and development community. The first phase of this project is the development of a purely image-processing approach to the translation of ASL finger-spelling.
The image-processing approach was chosen over other techniques, such as data gloves [1] and more exotic methods [2,3], because it is a more natural fit for the problem, it is less intrusive to the signer, and data-reduction techniques are readily available in the form of image compression [4] and feature extraction [5,6]. Image-processing techniques can also be integrated with existing and emerging technologies such as PDAs, smart phones, video phones, and high-tech kiosks. The computational burden of the technique falls on data processing and memory storage. The key to the present success of our approach is the imaging system and the adaptive statistical database that we build for comparison.

Background

A great deal of work has already been done, both in the US and abroad, on text-to-sign-language conversion [7-9]. The area of sign-language-to-text (or audio) conversion is less mature, although there have been some recent breakthroughs incorporating data gloves for positional extraction. We are attempting to bridge cultural barriers, with technology as the medium. The application of image processing to this challenge is in itself unique; moreover, incorporating new breakthroughs in feature extraction promises to further enhance the research.
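The comparison against a statistical database of hand-shape features, as described in the Introduction, can be illustrated with a minimal nearest-template classifier. This is only a sketch under simplifying assumptions: the letters, three-element feature vectors, and function names below are hypothetical placeholders, not the authors' actual feature set or implementation.

```python
import math

# Hypothetical per-letter "statistical database": mean feature vectors
# (e.g., normalized fingertip positions) averaged over training images.
# The values here are illustrative, not real measurements.
TEMPLATE_DB = {
    "A": [0.10, 0.20, 0.15],
    "B": [0.80, 0.75, 0.90],
    "C": [0.50, 0.45, 0.55],
}

def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def classify(features, db=TEMPLATE_DB):
    """Return the letter whose stored template is nearest to the input."""
    return min(db, key=lambda letter: euclidean(features, db[letter]))

def spell(frames):
    """Translate a sequence of per-frame feature vectors into text."""
    return "".join(classify(f) for f in frames)
```

In this toy form, `spell([[0.11, 0.19, 0.16], [0.50, 0.45, 0.55]])` yields "AC". An adaptive database, as mentioned above, could be realized by updating each template's mean vector as new confirmed samples arrive.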