New Generation Computing, 24 (2006) 325-350
Ohmsha, Ltd. and Springer

Tutorial Series on Brain-Inspired Computing
Part 4: Reinforcement Learning: Machine Learning and Natural Learning

Shin ISHII and Wako YOSHIDA
Nara Institute of Science and Technology
8915-5 Takayama, Ikoma, Nara 630-0192, Japan
{ishii, wako-y}@naist.jp

Received 29 October 2005
Revised manuscript received 28 February 2006

Abstract
The theory of reinforcement learning (RL) was originally motivated by animal learning of sequential behavior, but it has since been developed and extended in the field of machine learning as an approach to Markov decision processes. Recently, a number of neuroscience studies have suggested a relationship between reward-related activities in the brain and the functions necessary for RL. Tracing the history of RL, this article introduces the theory of RL and presents two engineering applications. We then discuss possible implementations of RL in the brain.

Keywords: Reinforcement Learning, Temporal Difference, Actor-critic, Reward System, Dopamine.

§1 Introduction
When a rodent is placed in a box called a Skinner box, it receives food whenever it happens to press a lever attached to the box. By repeatedly receiving food after pressing the lever, the rodent associates a cause, the lever press, with an effect, the food, and becomes motivated to press the lever; that is, reinforcement occurs. The principle that an animal's behavior is modified according to its outcome is called 'the law of effect', and it is considered the most primitive aspect of behavioral learning. Although this example involves only a simple association between a cause and an effect, general motor control is more complicated. Consider, for example, learning to snowboard. On one hand, when a learner performs well, the outcome feels cool, so the behavior is, in effect, rewarded.
On the other hand, tumbling on the snow can be painful, so this outcome can be viewed as a punishment. Intuitively, learning complicated motor controls