BATCH REINFORCEMENT LEARNING
An Application to a Controllable Semi-active Suspension System

Simone Tognetti, Marcello Restelli, Sergio M. Savaresi
Dipartimento di elettronica e informazione, Politecnico di Milano, via Ponzio 34/5, 20133 Milano, Italy

Cristiano Spelta
Dipartimento di Ingegneria dell'Informazione e Metodi Matematici, Università degli Studi di Bergamo, viale Marconi 5, 24044 Dalmine (BG), Italy

Keywords: Batch-reinforcement learning, Control theory, Non linear optimal control, Semi-active suspension.

Abstract: The design problem of optimal comfort-oriented semi-active suspension has been addressed with different standard techniques, which failed to produce an optimal strategy because the system is strongly non-linear and the solution is too complex to be found analytically. In this work, we aim at solving this complex problem by applying Batch Reinforcement Learning (BRL), an artificial intelligence technique that approximates the solution of optimal control problems without knowing the system dynamics. Recently, a quasi-optimal strategy for semi-active suspension has been designed and proposed: the Mixed SH-ADD algorithm, against which the strategy designed in this paper is compared. We show that an accurately tuned BRL provides a policy able to guarantee the overall best performance.

1 INTRODUCTION

Among the many different types of controlled suspension systems (see e.g., (Sammier et al., 2003; Savaresi et al., 2005; Silani et al., 2002)), semi-active suspensions have received a lot of attention since they provide the best compromise between cost (energy consumption and actuators/sensors hardware) and performance.
The research activity on controllable suspensions develops along two mainstreams: the development of reliable, high-performance, and cost-effective semi-active controllable shock-absorbers (Electro-Hydraulic or Magneto-Rheological; see e.g., (Ahmadian et al., 2001; Guardabassi and Savaresi, 2001; Valasek et al., 1998; Williams, 1997)), and the development of control strategies and algorithms that can fully exploit the potential advantages of controllable shock-absorbers. This work focuses on the control-design issue for road vehicles.

The design problem of optimal comfort-oriented semi-active suspension has been addressed with different standard techniques, which failed to produce an optimal strategy because the system is strongly non-linear and the solution is too complex to be found analytically. The literature offers many contributions that provide approximate solutions to the non-linear problem or, alternatively, partially remove the non-linearity in order to exploit linear techniques (see e.g., (Karnopp and Crosby, 1974; Sammier et al., 2003; Savaresi and Spelta, 2008; Valasek et al., 1998)).

In this work, we aimed at solving the optimal control problem of comfort-oriented semi-active suspension by using Batch Reinforcement Learning (BRL). Developed in the artificial intelligence research field, BRL provides numerical algorithms able to approximate the solution of an optimal-control problem without knowing the system dynamics (see (Kaelbling et al., 1996) and (Sutton and Barto, 1998)). The algorithm is independent of the model complexity and can be trained on the real system without knowing its dynamics. We compared the strategy obtained by BRL with the ones given by the state-of-the-art semi-active control algorithms. We showed that an accurately tuned BRL provides a policy able to guarantee the overall best performance.

The outline of the paper is as follows. In Section 2 the control problem is stated.
Section 3 recalls the BRL technique. Section 4 summarizes the design of the BRL-based control rule. Section 5 motivates the choice of algorithm parameters, Section 6 presents

Tognetti S., Restelli M., M. Savaresi S. and Spelta C. (2009). BATCH REINFORCEMENT LEARNING - An Application to a Controllable Semi-active Suspension System. In Proceedings of the 6th International Conference on Informatics in Control, Automation and Robotics - Intelligent Control Systems and Optimization, pages 228-233. DOI: 10.5220/0002210302280233. Copyright © SciTePress