Computer Vision and Image Understanding 141 (2015) 138–151

journal homepage: www.elsevier.com/locate/cviu

A framework for live and cross platform fingerspelling recognition using modified shape matrix variants on depth silhouettes

Lalit Kane, Pritee Khanna ∗

Computer Science and Engineering Discipline, PDPM Indian Institute of Information Technology Design and Manufacturing, Dumna Airport Road, Jabalpur - 482005, India

article info

Article history:
Received 25 October 2014
Accepted 3 August 2015

Keywords:
Depth sensing
Shape matrix
Line of reference
Principal axis
Fingerspelling recognition

abstract

Automatic recognition of fingerspelling postures in a live environment is a challenging task, primarily because popular moment-based and spectral descriptors are computationally expensive. The shape matrix offers a time-efficient alternative that samples the shape region through the intersection points of adjacent log-polar sections. However, sparse sampling of the region by discrete log-polar intersection points cannot capture the salience of the shape. This manuscript proposes modified forms of the shape matrix that capture the salience of fingerspelling postures through precise sampling of contours and regions. For effective segmentation and subsequent description, hand postures are acquired through a depth sensor. The proposed shape matrix variants are evaluated for fingerspelling recognition with one-handed and two-handed postures. Experiments are performed on three datasets: one-handed signs of American Sign Language (ASL), NTU hand digits, and both one-handed and two-handed signs of Indian Sign Language (ISL). The proposed shape matrix variants outperform the benchmark shape context and Gabor features, obtaining 94.15% accuracy on the ISL dataset with a minimum mean running time of 0.029 s.
On the ASL and NTU datasets, accuracies of 91.86% and 95.11% are obtained with mean running times of 0.0172 s and 0.0483 s, respectively.

© 2015 Elsevier Inc. All rights reserved.

∗ Corresponding author. Tel.: +917612632618. E-mail address: pkhanna@iiitdmj.ac.in, lalit.kane@iiitdmj.ac.in, priteekh@gmail.com (P. Khanna).

1. Introduction

Fingerspelling signs are the basic communication tool for persons with hearing and speech disabilities. They comprise hand postures for the alphabet and digits of the respective sign language. The task of vision-based fingersign recognition is daunting in itself. Even with one-handed postures, self-occlusions and out-of-plane rotations always pose challenges. When one hand touches or overlaps the other to articulate a combined posture, the task becomes many times more complex. A general hand posture recognizer is very difficult to implement because the multiplied degrees of freedom of each hand result in a plethora of new shapes. Nevertheless, it is possible to realize context-specific systems with appropriate acquisition mechanisms, features, and tolerable assumptions. The context might be universal, e.g., fingersign recognition, or it can be defined to suit a specific need such as identification of postures for virtual 3D interfaces, robotic control, and hand grasp analysis. A solution to automatic fingerspelling recognition is important because it not only serves a social cause but also motivates solutions for smaller context-specific posture subsets. Systems working offline [1–6], on prerecorded videos [7–9], and in constrained acquisition setups [10–14] are readily available. What is required are live and deployable hand posture recognition systems that can recognize both one-handed and two-handed postures and perform across contexts in a generalized manner. Features such as the shape matrix [15] can prove efficient for live recognition environments.
The shape matrix aligns the shape with a circular grid and samples it through the intersection points of adjacent log-polar sections. Each cell corresponding to a sampled point is set in the binary shape matrix. The matrices of two shapes are matched using the XOR operation to determine the extent of their similarity. Although the feature is quite compact and fast, its posture recognition performance suffers from two weaknesses: discrete sampling of the region, which misses fine shape details, and reliance on the maximum radius line as the axis of reference.

The proposed work aims to develop a framework for live fingerspelling recognition targeting both one-handed and two-handed postures of Indian Sign Language (ISL) with modified interpretations of the shape matrix. The ISL dataset, customized for this task, comprises 19 two-handed and 7 one-handed postures, as shown in Fig. 1. The contributions of the presented work are:

1. Modifications to the shape matrix are introduced to capture contour and region salience by: selecting the principal axis as a more stable axis of reference, applying selective radial subdivisions, and including the distance and the depth in log-polar sections.

http://dx.doi.org/10.1016/j.cviu.2015.08.001
1077-3142/© 2015 Elsevier Inc. All rights reserved.