doi:10.3311/BMEZalaZONE2022-014

Double Lane Change Path Planning Using Reinforcement Learning with Field Tests

Árpád Fehér 1, Szilárd Aradi 1,a, Tamás Bécsi 1
1 Department of Control for Transportation and Vehicle Systems, Faculty of Transportation Engineering and Vehicle Engineering, Budapest University of Technology and Economics
a Corresponding author: aradi.szilard@kjk.bme.hu

Abstract

Performing dynamic double lane-change maneuvers can be a challenge for highly automated vehicles. The algorithm must meet safety requirements while keeping the vehicle stable and controllable. The path planning problem is numerically complex and must run at a high refresh rate. This article presents a new obstacle-avoidance approach for autonomous vehicles. Geometric path generation is provided by a single-step continuous Reinforcement Learning (RL) agent, while a model predictive controller (MPC) handles the lateral control to perform the double lane-change maneuver. The task of the learning agent in this architecture is optimization: it is trained on different scenarios to provide geometric route planning parameters at the output of a neural network. During training, the quality of the generated path is evaluated using the MPC controller. A hardware architecture was developed to test the local planner on a test track. The real-time operation of the planner was demonstrated, and its performance was compared to that of human drivers.

Keywords: Local path planning, Model predictive control, Reinforcement learning, Vehicle dynamics

1 Introduction

Since the beginning of the 2010s, machine learning, deep learning, and artificial intelligence have undergone rapid development. Alongside classical control design and decision algorithms, these methods are increasingly applied to control tasks, particularly in vehicle control.
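The architecture described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: all names (plan_parameters, simulate_tracking, episode_reward) and the dummy cost model are hypothetical, standing in for the policy network, the MPC-in-the-loop rollout, and the reward assignment of a single-step episode.

```python
# Illustrative sketch only (hypothetical names, not the paper's code):
# the agent maps a scenario description to geometric path parameters in
# one shot, and the reward is computed from a single closed-loop rollout
# of a tracking controller on the generated path.

def plan_parameters(scenario, policy):
    """Policy-network stand-in: returns geometric path parameters."""
    return [policy(s) for s in scenario]

def simulate_tracking(params):
    """Stand-in for the MPC-in-the-loop rollout; returns a tracking
    error and a stability flag for the generated geometric path."""
    error = sum(p * p for p in params)   # dummy tracking-error measure
    stable = error < 1.0                 # dummy stability check
    return error, stable

def episode_reward(scenario, policy):
    params = plan_parameters(scenario, policy)
    error, stable = simulate_tracking(params)
    # Single-step episode: the reward is assigned once, after the
    # rollout; unstable paths receive a large penalty.
    return -error if stable else -100.0

reward = episode_reward([0.2, -0.1], lambda s: 0.5 * s)
```

The single-step structure is what distinguishes this setup from end-to-end RL control: the agent's action is the whole path parameterization, not a per-timestep steering command.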
Combining artificial-intelligence-based developments with control techniques appears to be an effective approach for autonomous vehicle control. Every vehicle needs to follow an optimal route when performing a critical maneuver such as a double lane change, and several optimization criteria can be specified, such as minimizing jerk and lateral acceleration. The moose test defined by ISO 3888-2 is a good tool for testing the stability of a vehicle in a dynamic limit situation. Path and trajectory planning algorithms for fully autonomous vehicle functions are surveyed in [1], [2], [3]. A path is a sequence of waypoints that the vehicle must follow; with velocity information attached, it is referred to as a trajectory. Several solvers can handle the entire constrained optimization problem, although real-time feasibility remains an issue [4]. One approach is to use deep learning to train a neural network on many optimization results and use it as a practical solution or as an initial guess for the solver [5]; this can work for simple setups, but covering the entire state space is difficult in more complex scenarios.

Another method is Reinforcement Learning (RL). The agent interacts with its environment based on trial and error and previous experiences, and learns the best behavior using performance measurements called rewards [6]. These techniques often use end-to-end solutions, meaning the agent directly outputs steering and acceleration demands, and the trajectory planning is embedded implicitly in the knowledge acquired through training. Many sensor models, such as grid-based representations [7], beam sensors [8], cameras [9], or ground-truth position information [10], can be utilized, since the agent can cope with unstructured data.
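The optimization criteria mentioned above can be made concrete with a small sketch. The function names, the weights, and the finite-difference scheme are assumptions for illustration: given the curvature and speed along a discretized trajectory, the lateral acceleration is a_lat = v² · κ, and a combined cost penalizes squared lateral acceleration and squared jerk of that acceleration.

```python
# Illustrative only: a path is a sequence of waypoints; attaching a
# velocity to each waypoint turns it into a trajectory. Weights and
# function names are assumptions, not the paper's cost function.

def lateral_accelerations(curvatures, speeds):
    """a_lat = v^2 * kappa at each sample of the trajectory."""
    return [v * v * k for k, v in zip(curvatures, speeds)]

def cost(curvatures, speeds, dt, w_acc=1.0, w_jerk=0.1):
    """Weighted sum of squared lateral acceleration and squared jerk
    (finite-difference derivative of lateral acceleration)."""
    a = lateral_accelerations(curvatures, speeds)
    jerks = [(a2 - a1) / dt for a1, a2 in zip(a, a[1:])]
    return (w_acc * sum(x * x for x in a)
            + w_jerk * sum(j * j for j in jerks))
```

A planner that scores candidate double lane-change paths with such a cost prefers geometries that keep the maneuver smooth at the given speed, which is exactly what the stability requirements above demand.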
The other end of the RL-based research spectrum focuses on strategic decisions, defining high-level actions and delegating task execution to an underlying controller; this is beyond the scope of this study. Only a few solutions exist in the literature where the path planning itself is done with RL [11], [12]. A survey of RL-based motion planning can be found in [13].