Posted on 5 Sep 2024. The copyright holder is the author/funder. All rights reserved. No reuse without permission. https://doi.org/10.22541/au.172556898.80465282/v1. This is a preprint and has not been peer-reviewed. Data may be preliminary.

Emerging Techniques in Vision-Based Human Posture Detection: Machine Learning Methods and Applications

Benji Peng 1,2, Ziqian Bi 3, Pohsun Feng 4, Qian Niu 5, Junyu Liu 5, and Keyu Chen 2

1 AppCubic
2 Georgia Institute of Technology
3 Indiana University
4 National Taiwan Normal University
5 Kyoto University

September 05, 2024

Abstract

Human posture detection is a rapidly evolving field with significant implications for applications including healthcare, surveillance, and human-computer interaction. Continuous advancements in vision-based machine learning approaches have substantially improved the accuracy and efficiency of human posture estimation. This review article focuses on key aspects of this domain, namely open datasets for training and validation, 2D and 3D detection methods, and novel image/video-based approaches, thereby offering an overview of the latest innovations.

Introduction

Human pose estimation (HPE) is a well-established and extensively studied field in computer vision (CV) (Andriluka et al., 2009; Chéron et al., 2015; Yao et al., 2012; Iqbal et al., 2017; Wang et al., 2013; Bilal et al., 2011; Tian et al., 2012; Yang & Ramanan, 2011). HPE determines the spatial configuration of human body parts from sensor data, particularly images and videos, providing valuable geometric and motion insights. Its applications range from human-computer interaction (Liu et al., 2022; Liu et al., 2021; Ehlers & Brama, 2016) and motion analysis (Bao et al., 2023; Xu et al., 2020; Ota et al., 2020) to extended reality (XR) (Tome et al., 2020; Obdržálek et al., 2012) and healthcare (Jalal et al., 2017; Divya & Peter, 2022; Zhang et al., 2022).

Recent advancements in deep learning (Vaswani, 2017; Li et al., 2023; Kim et al., 2021) have significantly enhanced HPE, outperforming previous methods on various metrics. While novel model structures have led to remarkable progress in HPE, challenges persist, such as object occlusion, deformation, and insufficient high-quality training data. Many past articles have surveyed and compared different machine learning models for 2D and 3D HPE, albeit with some inaccuracies and over-generalization (Zheng et al., 2023; Liu et al., 2022; Chung et al., 2022; Desmarais et al., 2021; Wang et al., 2021; Dubey & Dixit, 2023). Tremendous advancements in computing power and model design over the past two years have enabled increasingly in-depth studies of new tools such as vision transformers (Han et al., 2022), large language models (Zhang et al., 2024), and multi-modality fusion (Sapp & Taskar, 2013). This article introduces these breakthroughs, focusing on the improvements researchers have made to existing models and on their explorations of novel architectures.