Modeling Signs Using Functional Data Analysis

Sunita Nayak, Sudeep Sarkar
Department of Computer Science & Engineering
University of South Florida
Tampa, FL 33620, USA
{snayak,sarkar}@csee.usf.edu

Kuntal Sengupta
AuthenTec Inc
709 S. Harbor City Blvd
Melbourne, FL 32901, USA
kuntal.sengupta@authentec.com

Abstract

We present a functional data analysis (FDA) based method to statistically model continuous signs of American Sign Language (ASL) for use in recognizing signs in continuous sentences. We build models in the Space of Probability Functions (SoPF), which captures the evolution of the relationships among low-level features (e.g., edge pixels) across frames. The distribution (histogram) of the horizontal and vertical displacements between all pairs of edge pixels in an image frame forms the relational distribution for that frame. We represent the sequence of relational distributions, corresponding to the sequence of image frames in a sign, as a sequence of points in a multi-dimensional space that captures the salient variations in these relational distributions over time; we call this space the SoPF. Each sign model consists of a mean sign function and covariance functions, capturing the variability of each sign in the training set. We use functional data analysis to arrive at this model. Recognition and sign localization are performed by correlating this statistical model with any given sentence. We also present a method to infer and learn sign models, in an unsupervised manner, from sentence samples containing the sign; no manual intervention is needed.

1. Introduction

While speech recognition has made rapid advances, sign language recognition lags behind. With the gradual shift to speech-based I/O devices, there is a real danger that persons who rely solely on sign languages for communication will be deprived of access to state-of-the-art technology unless there are significant advances in the automated recognition of sign languages.
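The relational distribution described in the abstract, a normalized 2D histogram of the horizontal and vertical displacements between all pairs of edge pixels in a frame, can be sketched as follows. The bin count and displacement range here are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def relational_distribution(edge_pixels, n_bins=32, max_disp=240):
    """Build the relational distribution for one frame: a normalized 2D
    histogram of (dx, dy) displacements over all pairs of edge pixels.
    n_bins and max_disp are illustrative choices, not from the paper."""
    pts = np.asarray(edge_pixels, dtype=float)  # shape (N, 2): (x, y)
    # All N*N pairwise displacements (self-pairs included for simplicity;
    # they only add mass to the central bin).
    dx = pts[:, 0][:, None] - pts[:, 0][None, :]
    dy = pts[:, 1][:, None] - pts[:, 1][None, :]
    hist, _, _ = np.histogram2d(
        dx.ravel(), dy.ravel(),
        bins=n_bins,
        range=[[-max_disp, max_disp], [-max_disp, max_disp]])
    return hist / hist.sum()  # normalize to a probability distribution
```

Because the histogram is built from relative displacements only, it is invariant to translation of the signer in the frame, which is one motivation for relation-based representations.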
Previous work in sign language recognition has mostly addressed static gestures, e.g. [2, 21, 13], and isolated signs, e.g. [19]. Yeasin and Chaudhuri [20] worked on dynamic hand gestures. Bobick and Wilson [1] proposed a state-based approach to model gestures. Starner and Pentland [11] were the first to seriously consider continuous sign recognition. Using Hidden Markov Model (HMM) based representations, they achieved near-perfect recognition with sentences of fixed structure, i.e., containing personal pronoun, verb, noun, adjective, and personal pronoun, in that order. Vogler and Metaxas [15, 16, 17] have been instrumental in significantly pushing the state of the art in automated ASL recognition using HMMs. Within the basic HMM formalism, they have explored many variations, such as context-dependent HMMs, HMMs coupled with partially segmented sign streams, and parallel HMMs. HMMs are also widely used in other sign language recognizers. Most work in continuous sign language recognition has avoided the basic problem of segmenting and tracking the hands by using wearable devices, such as colored gloves or magnetic markers, to obtain the location features directly. For example, Vogler and Metaxas [15, 16, 17] used a 3D magnetic tracking system; Starner and Pentland [11] used colored gloves, while Ma et al. [5, 18] used CyberGloves. In this paper, we restrict ourselves to plain color images, without the use of any augmenting wearable devices. Two kinds of information can be used for recognition: manual and non-manual. Manual information relates to hand motion or shape, while non-manual information relates to facial expressions, head movement, or torso movement. Here we use the manual information from hand motion. The hand motion is first modeled using relational distributions, which are efficiently represented as points in the Space of Probability Functions (SoPF).
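One way to realize the mapping from relational distributions to points in the SoPF is a PCA-style subspace learned over the vectorized histograms, so that each frame becomes a low-dimensional point and each sign a trajectory of such points. This is a minimal sketch under that assumption; the function name, the SVD-based construction, and the choice of dimensionality are ours, not specified in the text above.

```python
import numpy as np

def build_sopf(training_histograms, n_dims=10):
    """Learn a low-dimensional Space of Probability Functions (SoPF)
    from a set of relational distributions (2D histograms), assuming a
    PCA-style construction. Returns a function projecting any histogram
    to an n_dims-dimensional point."""
    # Stack vectorized histograms: one row per training frame.
    X = np.stack([h.ravel() for h in training_histograms])
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data; top right-singular vectors span the
    # directions of salient variation among the distributions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    basis = Vt[:n_dims]

    def project(hist):
        """Map one relational distribution to its SoPF coordinates."""
        return basis @ (hist.ravel() - mean)

    return project
```

A sign then corresponds to the sequence of projected points, one per frame, which is the trajectory later smoothed and registered by the FDA machinery.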
The points are then transformed into smooth curves that are registered and trained to form a unique model for each sign using functional data analysis.

2. Data Set

A vital component in ASL recognition research is the data set used in the study. The largest corpus used in ASL recognition contains a vocabulary of around 50 signs, em-