304 nature neuroscience • volume 1 no 4 • august 1998
Current theories view learning as the acquisition of specific predictions [1-4]. Humans and animals learn to predict the outcomes of their behavior, including rewards. Learning depends on the extent to which these outcomes differ from what was predicted, being governed by the discrepancy or 'error' between outcome and prediction. Outcomes that affect learning in this way are termed 'reinforcers'. Learning proceeds when outcomes occur that are not fully predicted, then slows down as outcomes become increasingly predicted and ends when outcomes are fully predicted. By contrast, behavior undergoes extinction when a predicted outcome fails to occur. (In the laboratory, predictions may fail either because the subject made an error or because the experimenter withholds the reward for correct behavior.) Recent learning algorithms employ errors in the prediction of outcome as teaching signals for changing synaptic weights in neuronal networks [5]. In these models, an unpredicted outcome leads to a positive signal, a predicted outcome to zero signal and the absence of a predicted outcome to a negative signal. The most efficient models capitalize on the observation that a key component of predictions concerns the exact time of reinforcement [6,7]. Their teaching signals use errors in the temporal prediction of reinforcement and compute the prediction error over consecutive time steps in individual trials ('temporal difference' algorithm [8]). Thus, teaching signals come to report progressively earlier reinforcement-related events, thereby predicting the outcome rather than simply reporting that it has occurred. They are particularly efficient for learning, as they can influence the behavioral reaction before it is executed. Reinforcement models that use predictive teaching signals can learn a wide variety of behavioral tasks, from balancing a pole on a cart [9] to playing world-class backgammon [10]. It is therefore of interest to determine whether real nervous systems might process rewards in a similar manner during learning.
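The temporal-difference scheme described above can be sketched in a few lines of code. This is a generic illustration, not the specific model used in the cited work: the time-step grid, cue and reward positions, and learning rate are all illustrative assumptions. The TD error at step t is r(t) + γV(t+1) − V(t); with training, the error shrinks at the time of reward and appears at earlier, reward-predicting events, and omitting a predicted reward yields a negative error.

```python
# Minimal temporal-difference (TD(0)) sketch: a trial is a sequence of
# time steps; a cue appears at step 2 and a reward arrives at step 8.
# The TD error delta(t) = r(t) + gamma * V(t+1) - V(t) is large at the
# time of an unpredicted reward and shrinks there as V is learned.

T = 10                  # time steps per trial (illustrative)
CUE, REWARD = 2, 8      # cue and reward positions (illustrative)
ALPHA, GAMMA = 0.2, 1.0 # learning rate and discount (illustrative)

V = [0.0] * (T + 1)     # predicted future reward at each time step

def run_trial(reward_delivered=True):
    """Run one trial, update V in place, return the TD errors."""
    errors = []
    for t in range(T):
        r = 1.0 if (reward_delivered and t == REWARD) else 0.0
        delta = r + GAMMA * V[t + 1] - V[t]
        if t >= CUE:                 # predictions only form after the cue
            V[t] += ALPHA * delta
        errors.append(delta)
    return errors

early = run_trial()                  # first, fully unpredicted reward
for _ in range(200):                 # training trials
    run_trial()
late = run_trial()                   # reward now predicted

# Early in training the error is maximal at the reward; after training
# the reward is predicted and the error there is near zero.
print(early[REWARD], late[REWARD])

# Omitting a fully predicted reward yields a negative error at the
# predicted reward time -- the depression described below.
omission = run_trial(reward_delivered=False)
print(omission[REWARD])
```

The same update also makes positive error appear at the earliest reward-predicting event, which is the sense in which the teaching signal comes to report progressively earlier events.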
Results from lesioning and psychopharmacological experiments indicate a role of dopamine systems in behavior driven by rewards and in reward-based learning [12-14]. We have studied the neural mechanisms underlying this role of dopamine in monkeys and have previously reported that midbrain dopamine neurons show responses to food and liquid rewards that depend on their predictability [15,16]. The present study investigated whether these responses could have the formal characteristics of teaching signals. We found that the magnitude of dopamine responses to a juice reward reflected the degree of reward predictability during individual learning episodes. An unexpected reward evoked a strong response in dopamine neurons. As the monkeys' performance improved (i.e., as they learned to predict which response would trigger a reward), the neuronal response to the reward progressively decreased. Moreover, by varying the timing of reward, we found that dopamine neurons signal not only its occurrence but also its timing relative to expectations. Thus dopamine neurons seem to track the reward prediction error and emit a signal that has all the typical characteristics of a positive reinforcing signal for learning.
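The progressive decrease in the reward response across a learning episode is exactly what a simple delta-rule (Rescorla-Wagner-style) learner produces. The sketch below is a hypothetical illustration, not a fit to the recordings; the learning rate and trial count are arbitrary assumptions.

```python
# Delta-rule sketch of how a reward-prediction error shrinks across the
# trials of a learning episode: error = reward - prediction, and the
# prediction grows toward the reward as performance is consolidated.
# Learning rate and trial count are illustrative only.

ALPHA = 0.3        # learning rate (hypothetical)
REWARD = 1.0       # reward magnitude per correct trial

prediction = 0.0
errors = []
for trial in range(15):
    error = REWARD - prediction      # prediction error on this trial
    prediction += ALPHA * error      # delta-rule update
    errors.append(error)

# The error (a stand-in for the phasic dopamine response) is maximal on
# the first, unpredicted reward and decays geometrically as (1-ALPHA)^n.
print([round(e, 3) for e in errors[:5]])
```

On this account, the neuronal response should be largest in early trials, when rewards are unpredictable, and approach zero once the reward is fully predicted, which is the pattern reported in the Results below.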
Results
Dopamine neurons in pars compacta of the substantia nigra and
the ventral tegmental area were studied while monkeys learned to
associate visual stimuli with liquid reward. Dopamine neurons
in these two different midbrain groups showed similar respons-
Dopamine neurons report an error in the temporal prediction of reward during learning

Jeffrey R. Hollerman(1,2) and Wolfram Schultz(1)

(1) Institute of Physiology, University of Fribourg, CH-1700 Fribourg, Switzerland
(2) Present address: Department of Psychology, Allegheny College, Meadville, Pennsylvania 16335, USA

Correspondence should be addressed to W.S. (Wolfram.Schultz@unifr.ch)
Many behaviors are affected by rewards, undergoing long-term changes when rewards are different than predicted but remaining unchanged when rewards occur exactly as predicted. The discrepancy between reward occurrence and reward prediction is termed an 'error in reward prediction'. Dopamine neurons in the substantia nigra and the ventral tegmental area are believed to be involved in reward-dependent behaviors. Consistent with this role, they are activated by rewards, and because they are activated more strongly by unpredicted than by predicted rewards they may play a role in learning. The present study investigated whether monkey dopamine neurons code an error in reward prediction during the course of learning. Dopamine neuron responses reflected the changes in reward prediction during individual learning episodes; dopamine neurons were activated by rewards during early trials, when errors were frequent and rewards unpredictable, but activation was progressively reduced as performance was consolidated and rewards became more predictable. These neurons were also activated when rewards occurred at unpredicted times and were depressed when rewards were omitted at the predicted times. Thus, dopamine neurons code errors in the prediction of both the occurrence and the time of rewards. In this respect, their responses resemble the teaching signals that have been employed in particularly efficient computational learning models.
© 1998 Nature America Inc. • http://neurosci.nature.com