Citation: Watanabe, T.; Maniruzzaman, M.; Hasan, M.A.M.; Lee, H.-S.; Jang, S.-W.; Shin, J. 2D Camera-Based Air-Writing Recognition Using Hand Pose Estimation and Hybrid Deep Learning Model. Electronics 2023, 12, 995. https://doi.org/10.3390/electronics12040995
Academic Editors: Juan M. Corchado, In Lee, Fuji Ren, Rashid Mehmood, Byung-Gyu Kim and Carlos A. Iglesias
Received: 13 January 2023
Revised: 8 February 2023
Accepted: 14 February 2023
Published: 16 February 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
2D Camera-Based Air-Writing Recognition Using Hand Pose
Estimation and Hybrid Deep Learning Model
Taiki Watanabe 1, Md. Maniruzzaman 1, Md. Al Mehedi Hasan 2, Hyoun-Sup Lee 3, Si-Woong Jang 4 and Jungpil Shin 1,*
1 School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu 965-8580, Fukushima, Japan
2 Department of Computer Science & Engineering, Rajshahi University of Engineering and Technology, Rajshahi 6204, Bangladesh
3 Department of Applied Software Engineering, Dongeui University, Busanjin-Gu, Busan 47340, Republic of Korea
4 Department of Computer Engineering, Dongeui University, Busanjin-Gu, Busan 47340, Republic of Korea
* Correspondence: jpshin@u-aizu.ac.jp
Abstract: Air-writing is a modern human–computer interaction technology that allows participants
to write words or letters with finger or hand movements in free space in a simple and intuitive
manner. Air-writing recognition is a particular case of gesture recognition in which gestures can be
matched to write characters and digits in the air. Air-written characters show extensive variations
depending on the various writing styles of participants and their speed of articulation, which presents
a difficult task for effective character recognition. To address these difficulties, this
work proposes an air-writing system using a web camera. The proposed system consists of two
parts: alphabetic recognition and digit recognition. In order to assess our proposed system, two
character datasets were used: an alphabetic dataset and a numeric dataset. We collected samples
from 17 participants and asked each participant to write alphabetic characters (A to Z) and numeric
digits (0 to 9) about 5–10 times. At the same time, we recorded the position of the fingertips using
MediaPipe. As a result, we collected 3166 samples for the alphabetic dataset and 1212 samples for
the digit dataset. First, we preprocessed the dataset and then created two datasets: image data and
padded sequential data. The image data were fed into a convolutional neural network (CNN)
model, whereas the sequential data were fed into a bidirectional long short-term memory (BiLSTM) model.
After that, we combined these two models and trained again with 5-fold cross-validation in order
to increase the character recognition accuracy. In this work, this combined model is referred to as
a hybrid deep learning model. Finally, the experimental results showed that our proposed system
achieved an alphabet recognition accuracy of 99.3% and a digit recognition accuracy of 99.5%. We
also validated our proposed system using the publicly available 6DMG dataset. Our proposed
system provided better recognition accuracy than the existing system.
Keywords: air-writing; hand pose estimation; deep learning; character recognition
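
The preprocessing step summarized in the abstract, in which a recorded fingertip trajectory is turned into both an image (for the CNN) and a padded sequence (for the BiLSTM), can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names `trajectory_to_image` and `pad_sequences`, the 28 x 28 image size, the min-max normalization, and zero padding at the end of each sequence are all hypothetical choices that the text above does not specify.

```python
import numpy as np


def trajectory_to_image(points, size=28):
    """Rasterize a fingertip trajectory (N x 2 array of x, y) into a binary image.

    Assumption: coordinates are min-max normalized with a single scale factor,
    so the aspect ratio of the written character is preserved.
    """
    pts = np.asarray(points, dtype=float)
    mins = pts.min(axis=0)
    span = (pts.max(axis=0) - mins).max()
    scale = span if span > 0 else 1.0
    norm = (pts - mins) / scale  # every coordinate now lies in [0, 1]

    img = np.zeros((size, size), dtype=np.uint8)

    def mark(x, y):
        col = int(round(x * (size - 1)))
        row = int(round(y * (size - 1)))
        img[row, col] = 1

    mark(*norm[0])
    # Connect consecutive samples by dense linear interpolation.
    for a, b in zip(norm[:-1], norm[1:]):
        for t in np.linspace(0.0, 1.0, 2 * size):
            x, y = a + t * (b - a)
            mark(x, y)
    return img


def pad_sequences(seqs, max_len=None, pad_value=0.0):
    """Pad (or truncate) variable-length (x, y) sequences to a common length.

    Assumption: padding is appended at the end with `pad_value`.
    """
    if max_len is None:
        max_len = max(len(s) for s in seqs)
    out = np.full((len(seqs), max_len, 2), pad_value, dtype=float)
    for i, s in enumerate(seqs):
        s = np.asarray(s, dtype=float)[:max_len]
        out[i, : len(s)] = s
    return out


# Two toy trajectories of different lengths (made-up values, not dataset samples).
stroke_a = [(0.0, 0.0), (1.0, 1.0)]
stroke_b = [(2.0, 2.0)]

image = trajectory_to_image(stroke_a, size=8)
padded = pad_sequences([stroke_a, stroke_b])
```

Here `image` would be the input to an image branch and `padded` the input to a sequence branch; how the two branches are combined and trained is described in the paper itself and is not reproduced in this sketch.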
1. Introduction
Over the last few decades, we have become habituated to interacting with digital
environments using touchscreens and other electronic devices for various purposes. The
next wave of technology for interacting with the digital world is expected to eliminate
the need for intermediary physical devices such as smartphones, which impose the extra
burden of carrying them with us and taking them out of our pockets [1]. Virtual reality
(VR) and augmented reality (AR), in which the output is frequently projected directly into
the user's eyes, appear to be leading the next era of such technology [2]. Speech recognition
is one of the well-defined methods that has received great attention because it is considered