Behavioural Processes 57 (2002) 211 – 226
Teaching and learning in a probabilistic prisoner’s dilemma
Forest Baker, Howard Rachlin *
Department of Psychology, State University of New York at Stony Brook, Stony Brook, NY 11794, USA
Accepted 16 November 2001
Abstract
The prisoner’s dilemma is much studied in social psychology and decision-making because it models many
real-world conflicts. In everyday terms, the choice to ‘cooperate’ (maximize reward for the group) or ‘defect’
(maximize reward for the individual) is often attributed to altruistic or selfish motives. Alternatively, behavior during
a dilemma may be understood as a function of reinforcement and punishment. Human participants played a
prisoner’s-dilemma-type game (for points exchangeable for money) with a computer that employed either a teaching
strategy (a probabilistic version of tit-for-tat), in which the computer reinforced or punished participants’ cooperation
or defection, or a learning strategy (a probabilistic version of Pavlov), in which the computer’s responses were
reinforced and punished by participants’ cooperation and defection. Participants learned to cooperate against both
computer strategies. However, in a second experiment, which varied the context of the game, they learned to cooperate
against only one strategy or the other: participants did not learn to cooperate against tit-for-tat when they believed that
they were playing against another person, and they did not learn to cooperate against Pavlov when the computer’s
cooperation probability was signaled by a spinner. The results are consistent with the notion that people are biased
not only to cooperate or defect on individual social choices, but also to employ one or the other strategy of interaction
in a pattern across social choices. © 2002 Elsevier Science B.V. All rights reserved.
Keywords: Prisoner’s dilemma; Cooperation; Strategies; Tit-for-tat; Pavlov; Instructions; Context; Humans
1. Introduction
Fig. 1(a) illustrates a two-person prisoner’s
dilemma game. Each of two players (A and B)
chooses between two alternatives (C and D). If
both players ‘cooperate’ (choose C), each obtains
a reward of 5 U; if both ‘defect’ (choose D), each
obtains a reward of 2 U. However, if one player
cooperates while the other defects, the cooperator
obtains only 1 U, while the defector obtains 6 U.
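The payoffs just described can be written out as a small lookup table. The following is a minimal sketch in Python, not part of the original study; the names PAYOFFS, payoff_a, and group_total are hypothetical and chosen only for illustration. It checks two properties of these numbers: how much Player A gains by defecting for each choice Player B might make, and which pair of choices yields the largest combined reward.

```python
# Payoffs in units (U) to (Player A, Player B), keyed by (A's choice, B's choice).
# The values are those stated in the text; the names are illustrative assumptions.
PAYOFFS = {
    ("C", "C"): (5, 5),  # both cooperate
    ("C", "D"): (1, 6),  # A cooperates, B defects
    ("D", "C"): (6, 1),  # A defects, B cooperates
    ("D", "D"): (2, 2),  # both defect
}


def payoff_a(a_choice, b_choice):
    """Reward to Player A for a given pair of choices."""
    return PAYOFFS[(a_choice, b_choice)][0]


def group_total(a_choice, b_choice):
    """Combined reward to both players for a given pair of choices."""
    return sum(PAYOFFS[(a_choice, b_choice)])


# Player A's gain from defecting instead of cooperating, for each choice by B.
defection_gain = {b: payoff_a("D", b) - payoff_a("C", b) for b in ("C", "D")}

# The pair of choices that maximizes the combined reward.
best_joint = max(PAYOFFS, key=lambda pair: group_total(*pair))

print(defection_gain)
print(best_joint, group_total(*best_joint))
```

With these payoffs, defecting gains Player A 1 U whichever choice Player B makes, while the combined reward is largest (10 U) when both players cooperate.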
The dilemma posed by the game is a conflict
between the choice (defect) that maximizes each
individual player’s reward and the choice (cooperate)
that maximizes reward for the group as a
whole. Consider the game of Fig. 1(a) from the
viewpoint of Player A: if Player B cooperates,
Player A’s reward is confined to the upper two
boxes: 5 U for cooperating versus 6 U for defecting.
If Player B defects, Player A’s reward is
confined to the lower two boxes: 1 U for cooperating
versus 2 U for defecting. Regardless of
whether Player B cooperates or defects, Player A’s
reward is 1 U higher for defecting than for cooperating.
Player A therefore maximizes reward by
* Corresponding author. Tel.: +1-631-632-7807; fax: +1-631-632-7876.
E-mail address: howard.rachlin@sunysb.edu (H. Rachlin).
PII:S0376-6357(02)00015-3