978-1-4673-5637-4/13/$31.00 ©2013 IEEE
Abstract. The results of the design and investigation of a human gesture recognition system based on a Kinect sensor are presented in this paper. In the presented research, we use the Kinect device as a 3D data scanner; the 3D coordinates are therefore calculated directly from depth images. The system's hardware description and the computation method for 3D human gesture identification are presented in this study. Ten specific single-hand motion gestures, repeated several times by seven different people, were recorded and used in the experimentation. Gesture recognition and interpretation are performed by a trained neural classifier in two ways: in the first, single-hand motion gestures are captured in free 3D space, while in the second, the 3D coordinates of people's heads are used as reference points for the recorded hand gestures. Such an approach provides easy adaptation and flexibility in gesture interpretation. The structure of the classifier was estimated through a trial-and-error approach.
Keywords: 3D gesture recognition, Kinect, neural network.
I. INTRODUCTION
The traditional interface between humans and electronic devices is not sufficiently effective in exploiting all the advantages of nonverbal information. These days
keyboards, manipulators and touch screens prevail as control devices, but they can transfer only small amounts of data in the exchange of information between humans and devices. Usually, all these devices work only as long as humans have direct contact with them. In order to process
the ever increasing amounts of information efficiently,
users of computers with 3D application software need more
natural and effective interaction methods.
Humans’ physical abilities and the development of
modern electronics technology make it possible to design
and develop new interaction methods between humans and
any device or system. The use of human gestures is a
noteworthy alternative to current interface devices for human-computer interaction (HCI) or robot control. In particular, visual recognition and interpretation of human gestures provide the absence of physical contact and the naturalness desirable in a system's interface. According to [1], numerous approaches to gesture recognition have been developed. A large variety of techniques have been used to track the hand in 2D pictures, with recognition typically performed using Hidden Markov Models (HMM) and Kalman filters. However, it is rather difficult to recognize the hand in color images when the background color matches the hand's skin color or when the lighting is changeable [16].
In the past two years low-cost depth-sensing cameras
have also become commercially available, including the
very well-known Microsoft Kinect 3D scanner [2, 3]; the
latter has made it possible to sense not only the hand, but
also the whole body without using any markers or
hand-held devices [4, 5, 6].
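With such a depth camera, each pixel of the depth image can be back-projected into a 3D coordinate using the pinhole camera model, which is the principle behind computing 3D coordinates directly from depth images. A minimal sketch follows; the focal lengths and principal point are illustrative values, not the calibration of any specific device:

```python
import numpy as np

# Illustrative pinhole intrinsics (assumed values, roughly in the range
# reported for Kinect v1; a real system would use per-device calibration).
FX, FY = 594.21, 591.04   # focal lengths in pixels
CX, CY = 339.5, 242.7     # principal point in pixels

def depth_to_3d(u, v, depth_mm):
    """Back-project a depth pixel (u, v) with depth in millimetres
    to a 3D point (x, y, z) in metres using the pinhole model."""
    z = depth_mm / 1000.0
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    return np.array([x, y, z])
```

A pixel at the principal point maps straight onto the optical axis, so only its depth contributes to the resulting 3D point.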
The Kinect devices do not work reliably in areas lit by
direct sunlight. The OpenNI™ organization has emerged to promote the standardization of these natural interaction devices, and has made available an open source framework for developers.
The study in [13] has revealed that aircraft marshalling
hand motion gestures used in the military air force can be
recognised with a ~99% recognition rate on the testing data set and with an 83% recognition rate on a data set with unintentional (unseen) gestures.
Wang et al. in [14], [15] have used Hidden Markov Models for the recognition of seven 2D hand motion gestures and have obtained a ~95% recognition rate.
The research presented here is focused on the design of a
system in which hand gestures are identified as commands
for robot control. The experiment was split into two parts: first, 10 different gestures (commands) made by a single hand were captured in free 3D space; following this, human head coordinates were used as reference points in space to bound hand movements at different distances. The coordinates of human body parts in 3D space are obtained with the Kinect sensor as described in [4-6]. Such a system can be adapted to different applications as the human-system interface. For gesture interpretation we have applied a
Neural Network (NN) classifier with tapped delay lines (TDL), which proved sufficiently effective and robust in gesture classification. The NN itself is a static data-mapping structure; by adding TDLs to the NN, the dynamics of the input data are captured. The series of delays breaks the input up in time, and the delayed values are fed into the network. Based on these inputs, the NN generates an estimated output value, i.e. a predefined class label. The artificial neural network registers the sequence of hand gestures (e.g. 10 reference points) in real time and then generates the outputs.
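The two ideas above, head-relative coordinates and a tapped delay line feeding the classifier, can be sketched as follows. The window length of three delays and the helper names are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def head_relative(hand_xyz, head_xyz):
    """Express the hand position relative to the head, so gesture
    features are invariant to where the person stands in the scene."""
    return np.asarray(hand_xyz, dtype=float) - np.asarray(head_xyz, dtype=float)

def tdl_input(samples, n_delays):
    """Tapped delay line: concatenate the current sample with the
    n_delays most recent past samples into one flat NN input vector."""
    window = samples[-(n_delays + 1):]
    return np.concatenate(window)

# Toy usage: four consecutive head-relative hand positions (3 delays)
# form a 12-element input vector for the neural classifier.
head = [0.0, 0.0, 2.0]
track = [head_relative([0.1 * t, 0.0, 1.5], head) for t in range(4)]
x = tdl_input(track, n_delays=3)   # 12-element feature vector
```

The tapped delay line is what turns the otherwise static NN mapping into one that can discriminate motion: the same hand position yields different input vectors depending on the preceding trajectory.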
The paper contains five main sections. In the second
section the experimental setup is surveyed. The third
section gives more information about the experiment and
gesture data. Experimental results are presented in section
four. Finally, the conclusions are presented in section five.
3D Human Hand Motion Recognition System
Kestas Rimkus¹, Audrius Bukis², Arūnas Lipnickas³, Saulius Sinkevičius¹
¹Department of Control Technology and ²Department of Process Control, ³The Mechatronics Centre for Studies, Information and Research, Kaunas University of Technology, Lithuania
kestas.rimkus@gmail.com, arunas.lipnickas@ktu.lt, saulius.sinkevicius@stud.ktu.lt