Behavioural Processes 57 (2002) 211–226
www.elsevier.com/locate/behavproc

Teaching and learning in a probabilistic prisoner’s dilemma

Forest Baker, Howard Rachlin *

Department of Psychology, State University of New York at Stony Brook, Stony Brook, NY 11794, USA

Accepted 16 November 2001

* Corresponding author. Tel.: +1-631-632-7807; fax: +1-631-632-7876. E-mail address: howard.rachlin@sunysb.edu (H. Rachlin).

0376-6357/02/$ - see front matter. PII: S0376-6357(02)00015-3

Abstract

The prisoner’s dilemma is much studied in social psychology and decision-making because it models many real-world conflicts. In everyday terms, the choice to ‘cooperate’ (maximize reward for the group) or ‘defect’ (maximize reward for the individual) is often attributed to altruistic or selfish motives. Alternatively, behavior during a dilemma may be understood as a function of reinforcement and punishment. Human participants played a prisoner’s-dilemma-type game (for points exchangeable for money) with a computer that employed either a teaching strategy (a probabilistic version of tit-for-tat), in which the computer reinforced or punished participants’ cooperation or defection, or a learning strategy (a probabilistic version of Pavlov), in which the computer’s responses were reinforced and punished by participants’ cooperation and defection. Participants learned to cooperate against both computer strategies. However, in a second experiment, which varied the context of the game, they learned to cooperate against only one or the other strategy: participants did not learn to cooperate against tit-for-tat when they believed that they were playing against another person, and they did not learn to cooperate against Pavlov when the computer’s cooperation probability was signaled by a spinner. The results are consistent with the notion that people are biased not only to cooperate or defect on individual social choices, but also to employ one or the other strategy of interaction in a pattern across social choices. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Prisoner’s dilemma; Cooperation; Strategies; Tit-for-tat; Pavlov; Instructions; Context; Humans

1. Introduction

Fig. 1(a) illustrates a two-person prisoner’s dilemma game. Each of two players (A and B) chooses between two alternatives (C and D). If both players ‘cooperate’ (choose C), each obtains a reward of 5 U; if both ‘defect’ (choose D), each obtains a reward of 2 U. However, if one player cooperates while the other defects, the cooperator obtains only 1 U, while the defector obtains 6 U. The dilemma posed by the game is a conflict between the choice (defect) that maximizes each individual player’s reward and the choice (cooperate) that maximizes reward for the group as a whole.

Consider the game of Fig. 1(a) from the viewpoint of Player A. If Player B cooperates, Player A’s reward is constrained to the upper two boxes: 5 U for cooperating versus 6 U for defecting. If Player B defects, Player A’s reward is confined to the lower two boxes: 1 U for cooperating versus 2 U for defecting. Regardless of whether Player B cooperates or defects, Player A’s reward is 1 U higher for defecting than for cooperating. Player A therefore maximizes reward by