Computer Vision and Image Understanding 141 (2015) 138–151
A framework for live and cross platform fingerspelling recognition using
modified shape matrix variants on depth silhouettes
Lalit Kane, Pritee Khanna∗
Computer Science and Engineering Discipline, PDPM Indian Institute of Information Technology Design and Manufacturing, Dumna Airport Road,
Jabalpur 482005, India
article info
Article history:
Received 25 October 2014
Accepted 3 August 2015
Keywords:
Depth sensing
Shape matrix
Line of reference
Principal axis
Fingerspelling recognition
abstract
Automatic recognition of fingerspelling postures in a live environment is a challenging task, primarily due to
the complex computation of popular moment-based and spectral descriptors. The shape matrix offers a time-efficient
alternative that samples the shape region through the intersection points of adjacent log-polar
sections. However, sparse sampling of the region at discrete log-polar intersection points cannot capture
the salience of the shape. This manuscript proposes modified forms of the shape matrix that capture the
salience of fingerspelling postures through precise sampling of contours and regions. For effective segmentation
and subsequent description, hand postures are acquired through a depth sensor. The proposed shape
matrix variants are evaluated for fingerspelling recognition with one-handed and two-handed postures.
Experiments are performed on three datasets: one-handed signs of American Sign Language
(ASL), NTU hand digits, and both one-handed and two-handed signs of Indian Sign Language (ISL). The proposed
shape matrix variants outperform the benchmark shape context and Gabor features, obtaining 94.15% accuracy
on the ISL dataset with a minimum mean running time of 0.029 s. On the ASL and NTU datasets, 91.86% and 95.11%
accuracies are obtained with mean running times of 0.0172 s and 0.0483 s, respectively.
© 2015 Elsevier Inc. All rights reserved.
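As an illustration of the basic shape matrix idea summarized above (sampling a silhouette at the intersection points of a log-polar grid), together with the XOR matching used with the original shape matrix [15], the following Python sketch may help. It is not the authors' implementation: the function names `shape_matrix` and `xor_dissimilarity`, the grid of 8 rings by 16 sectors, the inner-ring fraction `r_min_frac`, and the nearest-pixel lookup are all assumptions made for this example. The grid is centred on the silhouette centroid and oriented along the maximum radius line.

```python
import numpy as np


def shape_matrix(mask, n_rings=8, n_sectors=16, r_min_frac=0.1):
    """Binary shape matrix of a silhouette, sampled on a log-polar grid.

    mask       : 2-D array, non-zero where the shape is present.
    n_rings    : number of radial rings (assumed value).
    n_sectors  : number of angular sectors (assumed value).
    r_min_frac : innermost ring radius as a fraction of the maximum radius
                 (assumed value, chosen only for illustration).
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask contains no shape pixels")

    # Grid centre: centroid of the shape.
    cy, cx = ys.mean(), xs.mean()

    # Reference axis: line from the centroid to the farthest shape pixel.
    dist = np.hypot(ys - cy, xs - cx)
    far = dist.argmax()
    r_max = dist[far]
    theta0 = np.arctan2(ys[far] - cy, xs[far] - cx)

    # Log-spaced ring radii and evenly spaced sector angles.
    radii = r_max * np.geomspace(r_min_frac, 1.0, n_rings)
    angles = theta0 + 2.0 * np.pi * np.arange(n_sectors) / n_sectors

    # Image coordinates of every ring/sector intersection point.
    rr, aa = np.meshgrid(radii, angles, indexing="ij")
    py = np.rint(cy + rr * np.sin(aa)).astype(int)
    px = np.rint(cx + rr * np.cos(aa)).astype(int)

    # A cell is set when its sample point falls on a shape pixel.
    inside = (py >= 0) & (py < mask.shape[0]) & (px >= 0) & (px < mask.shape[1])
    sm = np.zeros((n_rings, n_sectors), dtype=bool)
    sm[inside] = mask[py[inside], px[inside]] > 0
    return sm


def xor_dissimilarity(a, b):
    """Fraction of cells that differ between two shape matrices (0 = identical)."""
    return float(np.logical_xor(a, b).mean())
```

Two silhouettes would then be compared with `xor_dissimilarity(shape_matrix(mask_a), shape_matrix(mask_b))`. Because only `n_rings * n_sectors` points are inspected, fine contour detail between grid intersections goes unseen, which is the sparse-sampling limitation the abstract points to.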
1. Introduction
Fingerspelling signs, which include hand postures for the alphabets
and digits of the pertaining sign language, are the basic communication
tool for persons with hearing and speech impairments. The task of
vision-based fingersign recognition is daunting in itself. Even with
one-handed postures, self-occlusions and out-of-plane rotations
pose challenges. When one of the hands touches or overlaps the other
to articulate a combined posture, the task becomes many times more
complex. It is very difficult to implement a general hand posture
recognizer because the combined degrees of freedom of the hands
give rise to a multitude of new shapes. Nevertheless, it is possible to realize
context-specific systems with appropriate acquisition mechanisms,
features, and tolerable assumptions. The context might be universal,
e.g., fingersign recognition, or it can be defined to suit a specific need
such as identification of postures for virtual 3D interfaces, robotic
control, and hand grasp analysis. A solution to automatic fingerspelling
recognition is important as it not only serves a social cause
but also motivates solutions for small context-specific posture subsets.
Systems working offline [1–6], on prerecorded videos [7–9], and in
constrained acquisition setups [10–14] are readily available. Live and
deployable hand posture recognition systems that recognize
both one-handed and two-handed postures and generalize across
contexts are required.
∗ Corresponding author. Tel.: +917612632618.
E-mail addresses: pkhanna@iiitdmj.ac.in, lalit.kane@iiitdmj.ac.in, priteekh@gmail.com (P. Khanna).
Features such as the shape matrix [15] can prove efficient in
live recognition environments. The shape matrix aligns the shape
with a circular grid and samples it through the intersection points
of adjacent log-polar sections. Each cell whose sampled
point lies within the shape is set in the binary shape matrix. The matrices of two shapes
are matched using the XOR operation to determine the extent of their
similarity. Although the feature is quite compact and fast, its posture
recognition performance suffers from the discrete sampling of
the region, which misses fine shape details, and from reliance on the
maximum radius line as the axis of reference. The proposed work aims to
develop a framework for live fingerspelling recognition targeting
both one-handed and two-handed postures of Indian Sign Language
(ISL) with modified interpretations of the shape matrix. The ISL dataset,
customized to accomplish the task, comprises 19 two-handed and
7 one-handed postures, as shown in Fig. 1. The contributions of the
presented work are:
1. Modifications to the shape matrix are introduced to capture contour
and region salience by selecting the principal axis as a more
stable axis of reference, applying selective radial subdivisions, and
including the distance and the depth in log-polar sections.
http://dx.doi.org/10.1016/j.cviu.2015.08.001