A Machine Learning System for Controlling a Rescue Robot

Timothy Wiley 1, Ivan Bratko 2, and Claude Sammut 1

1 School of Computer Science and Engineering, The University of New South Wales, Sydney, Australia
{t.wiley,c.sammut}@unsw.edu.au
2 Faculty of Computer and Information Science, The University of Ljubljana, Slovenia
bratko@fri.uni-lj.si

Abstract. Many rescue robots are reconfigurable, having subtracks (or flippers) that can be adjusted to help the robot traverse different types of terrain. Knowing how to adjust them requires skill on the part of the operator. If the robot is intended to run autonomously, the control system must have an understanding of how the flippers affect the robot's interaction with the ground. We describe a system that first learns the effects of a robot's actions and then uses this knowledge to plan how to reconfigure the robot's tracks so that it can overcome different types of obstacles. The system is a hybrid of qualitative symbolic learning and reinforcement learning.

Keywords: Machine Learning, Qualitative Models, Rescue Robots

1 Introduction

Driving a wheeled robot on flat ground is relatively straightforward once the robot has a map of its environment. The operator only needs to control the steering and speed. Driving a tracked vehicle over rough terrain is much more difficult, especially if the vehicle has subtracks, or flippers, because the operator must make decisions about the configuration of the flippers, as well as steering and speed. Thus, subtracks give the robot greater terrain traversal capabilities at the expense of greater control complexity. Reconfigurable robots are commonly used in urban search and rescue, but are mostly tele-operated. However, remote control is impossible when there is a loss of communication. Therefore, rescue robots need at least enough autonomy to be able to navigate out of a radio dropout zone. The goal of this research is to develop an autonomous driving system for reconfigurable robots.
Since the interactions of the robot with the terrain in a disaster site are extremely difficult to predict, we use a learning system to build a model of how control actions, including changing flipper angles, affect the robot's state. Once we have the model, the driving system can plan a sequence of actions to achieve the desired goal state.
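The learn-a-model-then-plan loop described above can be sketched in miniature. The following is an illustrative sketch only, not the paper's system: it learns a discrete transition table from observed (state, action, next state) triples and then searches it for an action sequence reaching a goal state. All state and action names here are hypothetical examples invented for illustration.

```python
# Hypothetical sketch of "learn a model, then plan with it".
# States and actions are invented for illustration, not from the paper.
from collections import deque

def learn_model(observations):
    """Build a transition table: (state, action) -> next_state."""
    model = {}
    for state, action, next_state in observations:
        model[(state, action)] = next_state
    return model

def plan(model, start, goal):
    """Breadth-first search over the learned model for an action
    sequence that takes the robot from start to goal."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, actions = frontier.popleft()
        if state == goal:
            return actions
        for (s, a), nxt in model.items():
            if s == state and nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, actions + [a]))
    return None  # no plan found

# Toy observations: the robot can only climb a step after raising
# its flippers; driving forward without them leaves it stuck.
obs = [
    ("at_step", "raise_flippers", "flippers_up"),
    ("flippers_up", "drive_forward", "on_step"),
    ("at_step", "drive_forward", "at_step"),  # stuck
]
model = learn_model(obs)
print(plan(model, "at_step", "on_step"))
# -> ['raise_flippers', 'drive_forward']
```

A real system would of course face continuous states, noisy observations, and actions with uncertain effects, which is why the paper combines qualitative symbolic learning with reinforcement learning rather than relying on an exact transition table like this one.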