Deep Learning for EMG-based Human-Machine Interaction: A Review

Dezhen Xiong, Daohui Zhang, Member, IEEE, Xingang Zhao, Member, IEEE, and Yiwen Zhao

Abstract—Electromyography (EMG) has been broadly used in human-machine interaction (HMI) applications. Determining how to decode the information inside EMG signals robustly and accurately is a key problem that urgently needs a solution. Recently, many EMG pattern recognition tasks have been addressed using deep learning methods. In this paper, we analyze recent papers and present a literature review describing the role that deep learning plays in EMG-based HMI. An overview of typical network structures and processing schemes is provided. Recent progress in typical tasks such as movement classification, joint angle prediction, and force/torque estimation is introduced. New issues, including multimodal sensing, inter-subject/inter-session variability, and robustness toward disturbances, are discussed. We attempt to provide a comprehensive analysis of current research by discussing the advantages, challenges, and opportunities brought by deep learning. We hope that deep learning can help eliminate factors that hinder the development of EMG-based HMI systems. Furthermore, possible future directions are presented to pave the way for future research.

Index Terms—Accuracy, deep learning, electromyography (EMG), human-machine interaction (HMI), robustness.

I. Introduction

Electromyography (EMG) is the recording of electric signals generated during muscle contraction. EMG contains a large amount of information and reflects the movement intentions of a subject. EMG can be viewed as the summation of motor unit action potentials (MUAPs) with noise, and can be decomposed into motor units (MUs), which are the minimal entities of human muscle [1]. It can be classified into two classes, i.e., surface EMG (sEMG) and intramuscular EMG (iEMG), according to the electrodes' location.
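The additive signal model above (sEMG as a noisy sum of MUAP trains) can be sketched numerically. The pulse shape, number of motor units, firing rates, and noise level below are illustrative assumptions for a synthetic example, not values taken from the literature:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 2000                      # sampling rate in Hz (assumed)
t = np.arange(0, 1.0, 1 / fs)  # one second of signal

def muap_shape(fs, width_ms=8.0):
    """Biphasic MUAP template: first derivative of a Gaussian (illustrative)."""
    n = int(width_ms / 1000 * fs)
    x = np.linspace(-3, 3, n)
    return -x * np.exp(-x**2)  # zero-mean, biphasic pulse

def mu_spike_train(n_samples, rate_hz, fs, rng):
    """Bernoulli approximation of a motor unit firing at rate_hz."""
    return (rng.random(n_samples) < rate_hz / fs).astype(float)

# Superimpose several motor units: each contributes its MUAP train.
template = muap_shape(fs)
semg = np.zeros_like(t)
for _ in range(5):                                    # 5 active MUs (assumed)
    amp = rng.uniform(0.5, 1.5)                       # per-unit amplitude
    train = mu_spike_train(t.size, rng.uniform(8, 20), fs, rng)
    semg += amp * np.convolve(train, template, mode="same")

semg += 0.05 * rng.standard_normal(t.size)            # additive measurement noise
```

The resulting `semg` array mimics the structure of a single-channel recording: individual MUAP waveforms are no longer separable by eye, which is precisely why decomposition into MUs [1] is a nontrivial problem.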
The former is collected from the surface of human skin, while the latter is collected from needle electrodes planted inside the human muscle. sEMG has been widely used for hand gesture classification [2], [3], silent speech recognition [4], [5], stroke rehabilitation [6], [7], robot control [8], [9], and other applications, mainly because it is cheap and easy to collect and it provides a method for more natural human-machine collaboration. Many approaches, such as video, inertial measurement units (IMUs), and EMG, can be used to decode the movement intention of humans. The video-based method requires relatively high computational resources, and it can be easily affected by environmental factors such as lighting changes, background noise, and camera position. The IMU-based method can estimate joint angles during movement with high precision. For example, the Noraxon motion capture system¹ estimates human joint angles using IMUs attached to the body. However, it has a larger time delay than EMG signals, which occur approximately 50–100 ms before the action happens [10]. Moreover, it is invalid under some conditions, such as rehabilitation training of patients after stroke or prosthetic hand control for amputees, because it cannot predict actions when the limbs do not move. In contrast, EMG provides a method for more natural and fluent human-machine interaction (HMI), as it reflects human intent physiologically. In [11], a review of EMG pattern recognition algorithms was presented. According to this paper, the typical EMG pattern recognition pipeline can be divided into three substages:
1) Preprocessing. The EMG data are filtered to remove noise while keeping the useful information unchanged.
2) Feature extraction. Time-, frequency-, or time-frequency-domain features are extracted for intention recognition.
3) Classification or regression.
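The three-stage pipeline above can be sketched in a minimal numpy example. The window length, the time-domain feature set (mean absolute value, RMS, zero crossings, waveform length), and the nearest-centroid classifier are illustrative choices, not a prescription from [11]; preprocessing is reduced to nothing here because the toy data are already clean:

```python
import numpy as np

def window(signal, size, step):
    """Split a 1-D EMG signal into overlapping analysis windows."""
    starts = range(0, len(signal) - size + 1, step)
    return np.stack([signal[s:s + size] for s in starts])

def td_features(win):
    """Classic time-domain feature vector for one window."""
    mav = np.mean(np.abs(win))                 # mean absolute value
    rms = np.sqrt(np.mean(win**2))             # root mean square
    zc = np.sum(np.diff(np.sign(win)) != 0)    # zero-crossing count
    wl = np.sum(np.abs(np.diff(win)))          # waveform length
    return np.array([mav, rms, zc, wl])

def fit_centroids(X, y):
    """Stage-3 stand-in: nearest-centroid classifier."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, x):
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

# Toy data: two "gestures" that differ only in activation level.
rng = np.random.default_rng(1)
rest = 0.1 * rng.standard_normal(4000)
grip = 1.0 * rng.standard_normal(4000)

X = np.vstack([np.array([td_features(w) for w in window(s, 200, 100)])
               for s in (rest, grip)])
y = np.repeat([0, 1], len(X) // 2)

model = fit_centroids(X, y)
```

Any stage can be swapped out independently, e.g. replacing the hand-crafted `td_features` with a learned representation, which is exactly the substitution deep learning makes.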
Feature extraction is of vital importance because it determines the ceiling of recognition performance. This has led to the rise of feature engineering, which aims to find a feature set that optimally represents the information in EMG and thereby achieves better performance. Nevertheless, finding the optimal feature set is a very time-consuming task that requires professional knowledge, which has in turn promoted great interest in deep learning. Deep learning belongs to representation learning, which aims to create a better representation from the input data.

Manuscript received August 4, 2020; revised October 29, 2020; accepted November 19, 2020. This work was supported in part by the National Natural Science Foundation of China (U1813214, 61773369, 61903360), the Self-planned Project of the State Key Laboratory of Robotics (2020-Z12), and a China Postdoctoral Science Foundation funded project (2019M661155). Recommended by Associate Editor Hui Yu. (Corresponding authors: Daohui Zhang and Xingang Zhao.)

Citation: D. Z. Xiong, D. H. Zhang, X. G. Zhao, and Y. W. Zhao, "Deep learning for EMG-based human-machine interaction: a review," IEEE/CAA J. Autom. Sinica, vol. 8, no. 3, pp. 512–533, Mar. 2021.

D. Z. Xiong is with the State Key Laboratory of Robotics, Shenyang Institute of Automation, Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110016, and also with the University of Chinese Academy of Sciences, Beijing 100049, China (e-mail: xiongdezhen@sia.cn).

D. H. Zhang, X. G. Zhao, and Y. W. Zhao are with the State Key Laboratory of Robotics, Shenyang Institute of Automation, Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110016, China (e-mail: zhangdaohui@sia.cn; zhaoxingang@sia.cn; zhaoyw@sia.cn).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JAS.2021.1003865

¹ https://www.noraxon.com/