Computer Vision and Image Understanding 141 (2015) 138–151
Contents lists available at ScienceDirect
Computer Vision and Image Understanding
journal homepage: www.elsevier.com/locate/cviu

A framework for live and cross platform fingerspelling recognition using modified shape matrix variants on depth silhouettes

Lalit Kane, Pritee Khanna
Computer Science and Engineering discipline, PDPM Indian Institute of Information Technology Design and Manufacturing, Dumna Airport Road, Jabalpur - 482005, India

Article info
Article history: Received 25 October 2014; Accepted 3 August 2015
Keywords: Depth sensing; Shape matrix; Line of reference; Principal axis; Fingerspelling recognition

Abstract
Automatic recognition of fingerspelling postures in a live environment is challenging, primarily because popular moment-based and spectral descriptors are computationally expensive. The shape matrix offers a time-efficient alternative that samples the shape region at the intersection points of adjacent log-polar sections. However, this sparse sampling at discrete log-polar intersection points cannot capture the salient details of the shape. This manuscript proposes modified forms of the shape matrix that capture the salience of fingerspelling postures through precise sampling of contours and regions. For effective segmentation and subsequent description, hand postures are acquired with a depth sensor. The proposed shape matrix variants are evaluated for fingerspelling recognition on both one-handed and two-handed postures. Experiments are performed on three datasets: one-handed signs of American Sign Language (ASL), NTU hand digits, and both one-handed and two-handed signs of Indian Sign Language (ISL). The proposed shape matrix variants outperform the benchmark shape context and Gabor features, obtaining 94.15% accuracy on the ISL dataset with a minimum mean running time of 0.029 s.
On the ASL and NTU datasets, accuracies of 91.86% and 95.11% are obtained with mean running times of 0.0172 s and 0.0483 s, respectively.
© 2015 Elsevier Inc. All rights reserved.

Corresponding author. Tel.: +917612632618. E-mail addresses: pkhanna@iiitdmj.ac.in, lalit.kane@iiitdmj.ac.in, priteekh@gmail.com (P. Khanna).

1. Introduction

Fingerspelling signs are the basic communication tool for persons with hearing and speech disabilities; they comprise the hand postures for the alphabets and digits of the respective sign language. Vision-based fingersign recognition is a daunting task in itself. Even with one-handed postures, self-occlusions and out-of-plane rotations pose persistent challenges. When one hand touches or overlaps the other to articulate a combined posture, the task becomes many times more complex. A general hand posture recognizer is very hard to implement because the many degrees of freedom of each hand combine to produce a multitude of new shapes. Nevertheless, context-specific systems can be realized with appropriate acquisition mechanisms, features, and tolerable assumptions. The context might be universal, e.g., fingersign recognition, or it can be defined to suit a specific need such as identifying postures for virtual 3D interfaces, robotic control, and hand grasp analysis. A solution to automatic fingerspelling recognition is important: it not only serves a social cause but also motivates solutions for smaller, context-specific posture subsets.

Systems working offline [1–6], on prerecorded videos [7–9], and in constrained acquisition setups [10–14] are readily available. What is still required are live, deployable hand posture recognition systems that recognize both one-handed and two-handed postures and generalize across contexts. Features such as the shape matrix [15] can be efficient in live recognition environments.
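To make the efficiency argument concrete, the following minimal Python sketch illustrates the classic shape-matrix idea; it is an illustration under stated assumptions, not the authors' implementation. A binary silhouette is sampled at the intersections of log-spaced circles and equally spaced angular lines around the shape centroid, with angles measured from the maximum-radius line; every sample that lands on a shape pixel sets one bit, and two silhouettes are compared by the fraction of bits that differ under XOR. The function names `shape_matrix` and `xor_dissimilarity`, the grid size (`m` radii by `n` angles), the radius ratio of 2, and nearest-pixel rounding are all hypothetical choices.

```python
import math
from typing import List

Matrix = List[List[int]]  # rows of 0/1 values; mask[y][x] == 1 marks a shape pixel


def shape_matrix(mask: Matrix, m: int = 5, n: int = 16, ratio: float = 2.0) -> Matrix:
    """Sample a binary silhouette on an m x n log-polar grid (illustrative sketch).

    Row i uses radius r_max / ratio**(m - 1 - i), so radii are evenly spaced in
    log scale and proportional to r_max. Column j uses angle theta0 + 2*pi*j/n,
    where theta0 points from the centroid to the farthest shape pixel (the
    maximum-radius line). Cell (i, j) is 1 if the grid point, rounded to the
    nearest pixel, lies on the shape.
    """
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("mask contains no shape pixels")
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)

    # Reference axis: centroid -> farthest shape pixel (first one wins on ties).
    fx, fy = max(pts, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
    r_max = math.hypot(fx - cx, fy - cy)
    theta0 = math.atan2(fy - cy, fx - cx)

    height, width = len(mask), len(mask[0])
    out = [[0] * n for _ in range(m)]
    for i in range(m):
        r = r_max / ratio ** (m - 1 - i)  # log-spaced radius, outermost = r_max
        for j in range(n):
            a = theta0 + 2.0 * math.pi * j / n
            x = round(cx + r * math.cos(a))
            y = round(cy + r * math.sin(a))
            # Grid points outside the image count as background (bit stays 0).
            if 0 <= x < width and 0 <= y < height and mask[y][x]:
                out[i][j] = 1
    return out


def xor_dissimilarity(a: Matrix, b: Matrix) -> float:
    """Fraction of cells that differ (XOR) between two equally sized shape matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("shape matrices must have the same dimensions")
    cells = sum(len(row) for row in a)
    differing = sum(u ^ v for ra, rb in zip(a, b) for u, v in zip(ra, rb))
    return differing / cells


if __name__ == "__main__":
    disk = [[1 if (x - 12) ** 2 + (y - 12) ** 2 <= 100 else 0 for x in range(25)]
            for y in range(25)]
    bar = [[1 if 11 <= y <= 13 else 0 for x in range(25)] for y in range(25)]
    print(xor_dissimilarity(shape_matrix(disk), shape_matrix(disk)))  # 0.0
    print(xor_dissimilarity(shape_matrix(disk), shape_matrix(bar)) > 0)  # True
```

Because every shape maps to a fixed m x n bit matrix, matching costs m·n XORs regardless of image resolution, which is what makes the descriptor attractive for live use. The same sketch also exposes the weaknesses noted for the original feature: only the grid points are inspected, so thin structures such as fingers can fall between samples, and the reference angle depends on a single farthest pixel.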
The shape matrix aligns the shape with a circular grid and samples it at the intersection points of adjacent log-polar sections. Each cell whose sampled point lies on the shape is set to 1 in the binary shape matrix. The matrices of two shapes are matched using an XOR operation to determine the extent of their similarity. Although the feature is compact and fast, its posture recognition performance suffers for two reasons: the discrete sampling of the region misses fine shape details, and the maximum radius line is a less stable axis of reference. The proposed work aims to develop a framework for live fingerspelling recognition that targets both one-handed and two-handed postures of Indian Sign Language (ISL) using modified interpretations of the shape matrix. The ISL dataset, customized for this task, comprises 19 two-handed and 7 one-handed postures, as shown in Fig. 1. The contributions of the presented work are:

1. Modifications to the shape matrix are introduced to capture contour and region salience by selecting the principal axis as a more stable axis of reference, applying selective radial subdivisions, and including the distance and the depth in the log-polar sections.

http://dx.doi.org/10.1016/j.cviu.2015.08.001
1077-3142/© 2015 Elsevier Inc. All rights reserved.