robotics Article

On the Impact of Gravity Compensation on Reinforcement Learning in Goal-Reaching Tasks for Robotic Manipulators

Jonathan Fugal 1, Jihye Bae 1,* and Hasan A. Poonawala 2,*

1 Department of Electrical and Computer Engineering, University of Kentucky, Lexington, KY 40506, USA; jnephi12@gmail.com
2 Department of Mechanical Engineering, University of Kentucky, Lexington, KY 40506, USA
* Correspondence: jihye.bae@uky.edu (J.B.); hasan.poonawala@uky.edu (H.A.P.); Tel.: +1-859-257-8043 (J.B.); +1-859-323-7436 (H.A.P.)

Citation: Fugal, J.; Bae, J.; Poonawala, H.A. On the Impact of Gravity Compensation on Reinforcement Learning in Goal-Reaching Tasks for Robotic Manipulators. Robotics 2021, 10, 46. https://dx.doi.org/10.3390/robotics10010046

Received: 24 December 2020; Accepted: 1 March 2021; Published: 9 March 2021

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract: Advances in machine learning technologies in recent years have facilitated developments in autonomous robotic systems. Designing these autonomous systems typically requires manually specified models of the robotic system and world when using classical control-based strategies, or time-consuming and computationally expensive data-driven training when using learning-based strategies. Combining classical control and learning-based strategies may mitigate both requirements. However, the performance of the combined control system is not obvious, given that there are two separate controllers. This paper focuses on one such combination, which uses gravity compensation together with reinforcement learning (RL).
We present a study of the effects of gravity compensation on the performance of two reinforcement learning algorithms when solving reaching tasks using a simulated seven-degree-of-freedom robotic arm. The results of our study demonstrate that gravity compensation coupled with RL can reduce the training required in reaching tasks involving elevated target locations, but not all target locations.

Keywords: robotics; control; reinforcement learning; physics-based machine learning

1. Introduction

Autonomous robotic systems are widely recognized as a worthwhile technological goal for humanity to achieve. Autonomy requires solving a multitude of decision problems, from high-level semantic reasoning [1] to low-level continuous control input selection [2]. In this paper, we focus on continuous controller design for autonomous robot motion control. A typical controller takes in a feedback signal, containing state information, as well as a reference point, and computes an actuation command. There are various ways to develop autonomous robotic control systems, such as fuzzy logic [3,4], adaptive control [5,6], behavioral control theory [7,8], traditional robot control theory [2], inverse reinforcement learning [9,10], and reinforcement learning [11–13].

Control theory provides a methodology for creating controllers for dynamical systems in order to accomplish a specified task [14]. These methods are model-based, with the advantage that the performance of such controllers may be characterized and even guaranteed before deployment. The use of models may be thought of as control based on indirect experience or knowledge. The limitation of model-based control approaches in robotic autonomy is the difficulty of obtaining accurate system models. Reinforcement Learning (RL), in contrast to traditional robot control, aims to learn controllers from direct experience, and any knowledge gained thereof. Therefore, with RL, robots can learn novel behaviors even in changing environments.
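To make the control-plus-learning combination concrete, the following minimal sketch shows how a model-based gravity-compensation term can be added to an RL policy's torque output, so the policy only has to learn the residual torques for the task. The two-link gravity model and function names below are hypothetical illustrations (using point masses at the link tips), not the paper's actual implementation or simulation setup.

```python
import numpy as np

def gravity_torque(q, masses, lengths, g=9.81):
    """Hypothetical gravity model for a planar two-link arm: the joint
    torques needed to hold the arm static against gravity. For simplicity,
    each link's mass acts at its tip (a real model uses centers of mass)."""
    m1, m2 = masses
    l1, l2 = lengths
    # Torque about joint 2 from link 2's weight.
    tau2 = m2 * g * l2 * np.cos(q[0] + q[1])
    # Joint 1 supports link 1's weight at l1 plus link 2's weight
    # at l1 + l2 (the tau2 term accounts for the extra moment arm).
    tau1 = (m1 + m2) * g * l1 * np.cos(q[0]) + tau2
    return np.array([tau1, tau2])

def combined_controller(q, policy_action, masses, lengths):
    """Gravity compensation cancels the known statics; the RL policy's
    action is applied as a residual torque on top of it."""
    return gravity_torque(q, masses, lengths) + policy_action
```

With the arm held horizontal (q = [0, 0]) and a zero policy action, the controller output is exactly the gravity-compensation torque, illustrating that the learned component starts from a statically balanced arm rather than fighting gravity.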
This is of great benefit for real-world implementations. One example we can consider is brain-machine interfaces (BMIs), which require real-time adaptation of robot behaviors based on the user’s intention and a changing environment [15,16]. However, approaches that use RL are, often intentionally, ignorant of the system dynamics and task. They learn controllers