Adding Prediction Risk to the Theory of Reward Learning

KERSTIN PREUSCHOFF AND PETER BOSSAERTS
Computation and Neural Systems, California Institute of Technology, Pasadena, California, USA

ABSTRACT: This article analyzes the simple Rescorla–Wagner learning rule from the vantage point of least squares learning theory. In particular, it suggests how measures of risk, such as prediction risk, can be used to adjust the learning constant in reinforcement learning. It argues that prediction risk is most effectively incorporated by scaling the prediction errors. This way, the learning rate needs adjusting only when the covariance between optimal predictions and past (scaled) prediction errors changes. Evidence is discussed that suggests that the dopaminergic system in the (human and nonhuman) primate brain encodes prediction risk, and that prediction errors are indeed scaled with prediction risk (adaptive encoding).

KEYWORDS: reinforcement learning; learning rate; least squares learning; dopaminergic system; reward anticipation; prediction risk; uncertainty; adaptive encoding

INTRODUCTION

Major progress has been made in understanding the way the primate brain learns to anticipate uncertain rewards and the crucial role of the dopaminergic system in such learning. Much of this work has been driven by reinforcement learning (RL), whereby prediction errors in trial $t$, $e_t$, lead to updates of the prediction $x_t$ of the reward (payoff) $p_t$ using the Rescorla–Wagner (RW) learning rule:

$$x_{t+1} = x_t + \alpha + \eta e_t$$

where $\eta$ is the learning constant and $e_t = p_t - x_t$ (for generality, we added a constant $\alpha$ to the usual formulation). The activation patterns of dopaminergic neurons in the nonhuman primate brain¹ and of subcortical dopaminoceptive areas of the human brain²,³ have recently been formalized in terms of such RL models.

Address for correspondence: Peter Bossaerts, m/c 228-77 California Institute of Technology, Pasadena, CA 91125, USA.
Voice: +1-626-395-4028; fax: +1-626-405-9841. pbs@rioja.caltech.edu
Ann. N.Y. Acad. Sci. 1104: 135–146 (2007). © 2007 New York Academy of Sciences. doi: 10.1196/annals.1390.005
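
To make the Rescorla–Wagner rule from the Introduction concrete, here is a minimal Python sketch of the update: the next prediction equals the current prediction, plus the added constant, plus the learning constant times the prediction error. The function name `rescorla_wagner`, the parameter names `eta` (learning constant), `alpha` (added constant), and `x0` (initial prediction), and the constant payoff sequence are illustrative assumptions, not taken from the article.

```python
def rescorla_wagner(payoffs, eta=0.1, alpha=0.0, x0=0.0):
    """Apply the Rescorla-Wagner update once per trial.

    All names and default values here are hypothetical choices made
    for illustration only.
    Returns (predictions, errors): predictions has one more entry than
    payoffs because it includes the initial prediction x0.
    """
    predictions = [x0]
    errors = []
    for p in payoffs:
        x = predictions[-1]
        e = p - x                                # prediction error: payoff minus prediction
        errors.append(e)
        predictions.append(x + alpha + eta * e)  # updated prediction for the next trial
    return predictions, errors


# Assumed toy input: the same payoff of 1.0 on each of 50 trials.
predictions, errors = rescorla_wagner([1.0] * 50)
```

With a fixed payoff and `alpha = 0`, the prediction closes a constant fraction `eta` of the remaining gap on every trial, so a larger learning constant adapts faster but would also track noise more closely when payoffs are uncertain.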