electronics
Article
2D Camera-Based Air-Writing Recognition Using Hand Pose Estimation and Hybrid Deep Learning Model
Taiki Watanabe 1, Md. Maniruzzaman 1, Md. Al Mehedi Hasan 2, Hyoun-Sup Lee 3, Si-Woong Jang 4 and Jungpil Shin 1,*
1 School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu 965-8580, Fukushima, Japan
2 Department of Computer Science & Engineering, Rajshahi University of Engineering and Technology, Rajshahi 6204, Bangladesh
3 Department of Applied Software Engineering, Dongeui University, Busanjin-Gu, Busan 47340, Republic of Korea
4 Department of Computer Engineering, Dongeui University, Busanjin-Gu, Busan 47340, Republic of Korea
* Correspondence: jpshin@u-aizu.ac.jp

Citation: Watanabe, T.; Maniruzzaman, M.; Hasan, M.A.M.; Lee, H.-S.; Jang, S.-W.; Shin, J. 2D Camera-Based Air-Writing Recognition Using Hand Pose Estimation and Hybrid Deep Learning Model. Electronics 2023, 12, 995. https://doi.org/10.3390/electronics12040995
Academic Editors: Juan M. Corchado, In Lee, Fuji Ren, Rashid Mehmood, Byung-Gyu Kim and Carlos A. Iglesias
Received: 13 January 2023
Revised: 8 February 2023
Accepted: 14 February 2023
Published: 16 February 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract: Air-writing is a modern human–computer interaction technology that lets users write words or letters in free space with finger or hand movements, simply and intuitively. Air-writing recognition is a particular case of gesture recognition in which gestures are mapped to characters and digits written in the air.
Air-written characters vary widely with participants' writing styles and writing speeds, which makes effective character recognition difficult. To address these difficulties, this work proposes an air-writing recognition system that uses a web camera. The proposed system consists of two parts: alphabetic recognition and digit recognition. To evaluate it, we used two character datasets: an alphabetic dataset and a numeric dataset. We collected samples from 17 participants, asking each participant to write every alphabetic character (A to Z) and numeric digit (0 to 9) about 5–10 times while we recorded their fingertip positions using MediaPipe. In total, we collected 3166 samples for the alphabetic dataset and 1212 samples for the digit dataset. We first preprocessed the recordings and derived two representations from them: image data and padded sequential data. The image data were fed into a convolutional neural network (CNN) model, whereas the sequential data were fed into a bidirectional long short-term memory (BiLSTM) model. To increase recognition accuracy, we then combined these two models into a single model, which we refer to as the hybrid deep learning model, and retrained it with 5-fold cross-validation. The experimental results showed that the proposed system achieved an alphabet recognition accuracy of 99.3% and a digit recognition accuracy of 99.5%. We also validated the system on the publicly available 6DMG dataset. The proposed system achieved higher recognition accuracy than the existing system.

Keywords: air-writing; hand pose estimation; deep learning; character recognition

1. Introduction

Over the last few decades, we have grown accustomed to interacting with digital environments through touchscreens and other electronic devices for various purposes.
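The preprocessing step summarized in the abstract, which turns recorded fingertip positions into image data for the CNN and padded sequential data for the BiLSTM, can be illustrated with a minimal sketch. The abstract gives no image size, sequence length, or normalization, so the 28 × 28 canvas, the 100-step length, bounding-box scaling, and zero post-padding used here are assumptions, and the helper names `trajectory_to_image` and `pad_sequence` are hypothetical. This is not the authors' implementation.

```python
import numpy as np

# Hypothetical helpers: a sketch of the two input representations, not the
# authors' code. Image size, sequence length, and scaling are assumptions.

def trajectory_to_image(points, size=28):
    """Rasterize a fingertip trajectory into a size x size binary image (CNN input)."""
    pts = np.asarray(points, dtype=np.float64)
    # Scale by the larger bounding-box side so the character fills the canvas
    # while keeping its aspect ratio, regardless of where it was written.
    mins = pts.min(axis=0)
    span = (pts.max(axis=0) - mins).max()
    span = span if span > 0 else 1.0
    scaled = (pts - mins) / span * (size - 1)
    img = np.zeros((size, size), dtype=np.uint8)
    if len(scaled) == 1:
        x, y = np.rint(scaled[0]).astype(int)
        img[y, x] = 1
    # Draw a straight segment between consecutive samples so the stroke has no gaps.
    for (x0, y0), (x1, y1) in zip(scaled[:-1], scaled[1:]):
        n = int(max(abs(x1 - x0), abs(y1 - y0))) + 2
        xs = np.rint(np.linspace(x0, x1, n)).astype(int)
        ys = np.rint(np.linspace(y0, y1, n)).astype(int)
        img[ys, xs] = 1  # row = y, assuming image-style coordinates (y grows downward)
    return img


def pad_sequence(points, max_len=100):
    """Zero-pad (or truncate) a trajectory to max_len steps (BiLSTM input)."""
    pts = np.asarray(points, dtype=np.float32)[:max_len]
    out = np.zeros((max_len, pts.shape[1]), dtype=np.float32)
    out[: len(pts)] = pts  # real samples first, zeros after (post-padding)
    return out


stroke = [(0.2, 0.8), (0.5, 0.2), (0.8, 0.8)]  # a rough inverted-V stroke
img = trajectory_to_image(stroke)
seq = pad_sequence(stroke)
print(img.shape, seq.shape)  # (28, 28) (100, 2)
```

The input points are assumed to be normalized (x, y) fingertip coordinates of the kind MediaPipe's hand landmarks provide. Rasterizing discards timing and stroke order, while the padded sequence keeps them, which is one plausible reason to feed both views to separate CNN and BiLSTM branches.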
The next wave of technology is expected to let us interact with the digital world without intermediary physical devices such as smartphones, which we must carry with us and take out of our pockets [1]. Virtual reality (VR) and augmented reality (AR), in which the output is often projected directly into the user's eyes, appear to be leading the next era of such technology [2]. Speech recognition is a well-established method that has received great attention because it is considered