304 nature neuroscience volume 1 no 4 august 1998 articles

Current theories view learning as the acquisition of specific predictions 1–4. Humans and animals learn to predict the outcomes of their behavior, including rewards. Learning depends on the extent to which these outcomes differ from what is predicted, being governed by the discrepancy or ‘error’ between outcome and prediction. Outcomes that affect learning in this way are termed ‘reinforcers’. Learning proceeds when outcomes occur that are not fully predicted, slows as outcomes become increasingly predicted and ends when outcomes are fully predicted. By contrast, behavior undergoes extinction when a predicted outcome fails to occur. (In the laboratory, predictions may fail either because the subject made an error or because the experimenter withholds the reward for correct behavior.) Recent learning algorithms employ errors in the prediction of outcome as teaching signals for changing synaptic weights in neuronal networks 5. In these models, an unpredicted outcome leads to a positive signal, a predicted outcome to zero signal and the absence of a predicted outcome to a negative signal. The most efficient models capitalize on the observation that a key component of predictions concerns the exact time of reinforcement 6,7. Their teaching signals use errors in the temporal prediction of reinforcement and compute the prediction error over consecutive time steps in individual trials (‘temporal difference’ algorithm 8). Teaching signals thus come to report progressively earlier reinforcement-related events, predicting the outcome rather than simply reporting that it has occurred. They are particularly efficient for learning, as they can influence the behavioral reaction before it is executed. Reinforcement models that use predictive teaching signals can learn a wide variety of behavioral tasks, from balancing a pole on a cart 9 to playing world-class backgammon 10.
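The temporal difference scheme sketched above can be illustrated in a few lines of code. The following is a minimal tabular TD(0) simulation, not the model used by the authors: trial length, learning rate and discount factor are assumed values chosen for illustration. It shows the three signal regimes described in the text: a large positive prediction error δ = r + γV(t+1) − V(t) when the reward is unpredicted (first trial), an error near zero once the reward is fully predicted, and a negative error when a predicted reward is omitted.

```python
def run_td(n_trials, trial_len=5, alpha=0.1, gamma=1.0):
    """Tabular TD(0) over repeated identical trials: stimulus at t = 0,
    reward of 1.0 at the final time step.  V[t] is the learned reward
    prediction at step t; deltas collects the prediction error per step."""
    V = [0.0] * trial_len
    deltas = []
    for _ in range(n_trials):
        trial = []
        for t in range(trial_len):
            r = 1.0 if t == trial_len - 1 else 0.0        # reward at trial end
            v_next = gamma * V[t + 1] if t + 1 < trial_len else 0.0
            delta = r + v_next - V[t]                     # temporal-difference error
            V[t] += alpha * delta                         # 'teaching signal' update
            trial.append(delta)
        deltas.append(trial)
    return V, deltas

V, deltas = run_td(2000)
# First trial: reward is unpredicted, so the error at reward time is +1.
# After training: the error at reward time approaches zero.
# Probe trial with the reward omitted: error is the negative of the
# learned prediction, i.e. about -1.
omission_error = 0.0 - V[-1]
```

Note how, because the update propagates value backward one step per trial, the error signal migrates toward the earliest predictive event, mirroring the statement that teaching signals come to report progressively earlier reinforcement-related events.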
It is therefore of interest to determine whether real nervous systems process rewards in a similar manner during learning. Results from lesioning and psychopharmacological experiments indicate a role of dopamine systems in behavior driven by rewards and in reward-based learning 12–14. We have studied the neural mechanisms underlying this role of dopamine in monkeys and have previously reported that midbrain dopamine neurons show responses to food and liquid rewards that depend on their predictability 15,16. The present study investigated whether these responses could have the formal characteristics of teaching signals. We found that the magnitude of dopamine responses to a juice reward reflected the degree of reward predictability during individual learning episodes. An unexpected reward evoked a strong response in dopamine neurons. As the monkeys’ performance improved (i.e. as they learned to predict which response would trigger a reward), the neuronal response to the reward progressively decreased. Moreover, by varying the timing of reward, we found that dopamine neurons signal not only its occurrence but also its timing relative to expectations. Thus dopamine neurons seem to track the reward prediction error and emit a signal that has all the typical characteristics of a positive reinforcing signal for learning.

Results
Dopamine neurons in the pars compacta of the substantia nigra and the ventral tegmental area were studied while monkeys learned to associate visual stimuli with liquid reward. Dopamine neurons in these two different midbrain groups showed similar responses

Dopamine neurons report an error in the temporal prediction of reward during learning
Jeffrey R. Hollerman 1,2 and Wolfram Schultz 1
1 Institute of Physiology, University of Fribourg, CH-1700 Fribourg, Switzerland
2 Present address: Department of Psychology, Allegheny College, Meadville, Pennsylvania 16335, USA
Correspondence should be addressed to W.S.
(Wolfram.Schultz@unifr.ch)

Many behaviors are affected by rewards, undergoing long-term changes when rewards are different from predicted but remaining unchanged when rewards occur exactly as predicted. The discrepancy between reward occurrence and reward prediction is termed an ‘error in reward prediction’. Dopamine neurons in the substantia nigra and the ventral tegmental area are believed to be involved in reward-dependent behaviors. Consistent with this role, they are activated by rewards, and because they are activated more strongly by unpredicted than by predicted rewards, they may play a role in learning. The present study investigated whether monkey dopamine neurons code an error in reward prediction during the course of learning. Dopamine neuron responses reflected the changes in reward prediction during individual learning episodes; dopamine neurons were activated by rewards during early trials, when errors were frequent and rewards unpredictable, but activation was progressively reduced as performance was consolidated and rewards became more predictable. These neurons were also activated when rewards occurred at unpredicted times and were depressed when rewards were omitted at the predicted times. Thus, dopamine neurons code errors in the prediction of both the occurrence and the time of rewards. In this respect, their responses resemble the teaching signals that have been employed in particularly efficient computational learning models.

© 1998 Nature America Inc. • http://neurosci.nature.com