Anatomy of a Decision: Striato-Orbitofrontal Interactions in Reinforcement Learning, Decision Making, and Reversal

Michael J. Frank and Eric D. Claus
University of Colorado at Boulder

The authors explore the division of labor between the basal ganglia–dopamine (BG-DA) system and the orbitofrontal cortex (OFC) in decision making. They show that a primitive neural network model of the BG-DA system slowly learns to make decisions on the basis of the relative probability of rewards but is not as sensitive to (a) recency or (b) the value of specific rewards. An augmented model that explores BG-OFC interactions is more successful at estimating the true expected value of decisions and is faster at switching behavior when reinforcement contingencies change. In the augmented model, OFC areas exert top-down control on the BG and premotor areas by representing reinforcement magnitudes in working memory. The model successfully captures patterns of behavior resulting from OFC damage in decision making, reversal learning, and devaluation paradigms and makes additional predictions for the underlying source of these deficits.

Keywords: decision making, neural network, basal ganglia, orbitofrontal cortex, reinforcement learning

What enables humans to make choices that lead to long-term gains, even when having to incur short-term losses? Such decision-making skills depend on the processes of action selection (choosing between one of several possible responses) and reinforcement learning (modifying the likelihood of selecting a given response on the basis of experienced consequences). Although all mammals can learn to associate their actions with consequences, humans are particularly advanced in their ability to flexibly modify the relative reinforcement values of alternative choices to select the most adaptive behavior in a particular behavioral, spatial, and temporal context.
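The abstract's contrast between learning from the relative probability of rewards and learning from their magnitude can be illustrated with a toy delta-rule simulation. This is a minimal sketch under stated assumptions, not the authors' neural network model: the "frequency" learner, standing in for the primitive BG-DA model, is assumed to register only whether each outcome was positive or negative, whereas the "magnitude" learner, standing in for the OFC-augmented model, registers the actual reward value. The outcome schedules, the learning rate `alpha`, and all names (`SCHEDULES`, `learn_values`, `preferred`) are hypothetical.

```python
# Toy contrast: a learner that tracks only the sign of outcomes versus one that
# tracks their magnitude. Hypothetical simplification, not the authors' model.

# Deterministic, repeating outcome schedules (hypothetical values).
SCHEDULES = {
    "A": [10, 10, 10, -40],    # frequent small gains, rare large loss; mean -2.5
    "B": [-10, -10, -10, 50],  # frequent small losses, rare large gain; mean +5.0
}


def sign(x):
    """Return +1 for gains, -1 for losses, 0 for neutral outcomes."""
    return (x > 0) - (x < 0)


def learn_values(use_magnitude, alpha=0.05, n_trials=400):
    """Learn a value for each option with a delta rule V <- V + alpha * (target - V)."""
    values = {}
    for option, schedule in SCHEDULES.items():
        v = 0.0
        for t in range(n_trials):
            r = schedule[t % len(schedule)]
            # Assumption: the frequency learner sees only the sign of the outcome.
            target = r if use_magnitude else sign(r)
            v += alpha * (target - v)
        values[option] = v
    return values


def preferred(values):
    """Return the option with the highest learned value."""
    return max(values, key=values.get)


freq_values = learn_values(use_magnitude=False)
mag_values = learn_values(use_magnitude=True)
print(preferred(freq_values), preferred(mag_values))  # A B
```

With these hypothetical schedules, option A pays off three times in four but has a negative mean (-2.5), while option B pays off only once in four but has a positive mean (+5). The sign-only learner ends up preferring A and the magnitude learner prefers B, loosely mirroring the claim that the primitive model is less sensitive to the value of specific rewards.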
The behavioral and cognitive neurosciences have identified two neural systems that are involved in such adaptive behavior. On the one hand, the basal ganglia (BG) and the neuromodulator dopamine (DA) are thought to participate in both action selection and reinforcement learning (Beiser & Houk, 1998; Brown, Bullock, & Grossberg, 1999, 2004; Frank, 2005; Frank, Loughry, & O’Reilly, 2001; Gurney, Prescott, & Redgrave, 2001; Mink, 1996; O’Reilly & Frank, 2006). Patients with Parkinson’s disease (PD), who have low levels of DA in the BG, are impaired at making choices that require learning from trial and error (Cools, 2005; Knowlton, Mangels, & Squire, 1996; Shohamy et al., 2004). Biologically based computational models demonstrate how the BG-DA system can learn to make adaptive choices (Brown et al., 2004; Frank, 2005) and provide an account for how this is impaired in PD (Frank, 2005).

On the other hand, various lines of evidence suggest that ventromedial and orbitofrontal cortices are critical for adaptive decision making in humans and that homologous areas support more primitive forms of this behavior in animals (Kringelbach & Rolls, 2004; Rolls, 1996; Schoenbaum, Setlow, Saddoris, & Gallagher, 2003; Tremblay & Schultz, 2000). Patients with orbitofrontal cortex (OFC) damage exhibit decision-making deficits in their everyday lives, which have also been documented in the laboratory (Bechara, Damasio, Tranel, & Anderson, 1998). Drug abusers, who are almost by definition poor decision makers, have reduced OFC metabolism and gray matter volume (Milham et al., 2006; Volkow, Fowler, & Wang, 2003). Finally, OFC lesions impair one’s ability to learn when previous reward associations no longer apply, as in reversal learning (Chudasama & Robbins, 2003; Jones & Mishkin, 1972).
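Reversal learning, and the abstract's claim that the augmented model switches faster when contingencies change, can likewise be caricatured by comparing how quickly two delta-rule learners reverse a preference. In this sketch (an assumption for illustration only, not the authors' model), a small learning rate stands in for slow integration of outcomes over many trials, and a large one stands in for a working-memory trace weighted toward recent outcomes. The reward values, learning rates, trial counts, and the function name `trials_to_switch` are hypothetical.

```python
# Toy reversal-learning sketch (hypothetical; not the authors' neural network model).
# Two options, A and B. Before the reversal A pays +1 and B pays -1; afterwards
# the contingencies swap. For simplicity, the learner observes the outcome of
# both options on every trial (full feedback, an assumption).


def trials_to_switch(alpha, n_pre=100, max_post=1000):
    """Return how many post-reversal trials pass before V_B exceeds V_A."""
    v_a, v_b = 0.0, 0.0
    # Acquisition phase: A is rewarded, B is punished.
    for _ in range(n_pre):
        v_a += alpha * (1.0 - v_a)
        v_b += alpha * (-1.0 - v_b)
    # Reversal phase: contingencies swap.
    for t in range(1, max_post + 1):
        v_a += alpha * (-1.0 - v_a)
        v_b += alpha * (1.0 - v_b)
        if v_b > v_a:
            return t
    return None  # never switched within max_post trials


slow = trials_to_switch(alpha=0.02)  # slow integrator (BG-like caricature)
fast = trials_to_switch(alpha=0.5)   # recency-weighted trace (OFC-like caricature)
print(slow, fast)
```

Under these settings the slow learner needs 31 post-reversal trials before it prefers the newly rewarded option, whereas the recency-weighted learner switches after 2. A realistic reversal task would also need an action-selection rule and feedback only for the chosen option, both of which are omitted here.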
Thus, both the BG-DA and OFC systems have been implicated in decision making and reinforcement and reversal learning, but surprisingly little theoretical work addresses how these systems are related or interact. Given that the OFC is a recent structure phylogenetically, a reasonable question to ask is “What unique function does the OFC contribute to decision making that is not supported by the more primitive BG-DA system?” In this article, we extend a previous neural network model of the BG-DA system (Frank, 2005) to explore additional contributions of the OFC that enable adaptive and flexible decision making. In brief, this account is consistent with the idea that the BG system is specialized to slowly integrate positive and negative outcomes over multiple trials, resulting in the ingraining of motor habits (Jog, Kubota, Connolly, Hillegaart, & Graybiel, 1999). The model accomplishes this by learning “go” signals that facilitate responses that generally lead to positive outcomes while concurrently learning “no-go” signals that suppress inappropriate responses (Frank, 2005). In contrast, the prefrontal cortex (PFC) actively maintains information in working memory via persistent neural firing (Fuster, 1997; Goldman-Rakic, 1995; Miller, Erickson, & Desimone, 1996), and this has a top-

Michael J. Frank and Eric D. Claus, Department of Psychology and Center for Neuroscience, University of Colorado at Boulder.

This research was supported by Office of Naval Research Grant N00014-03-1-0428 and National Institutes of Health Grant MH069597-01. We thank Seth Herd and Randy O’Reilly for helpful discussion of these ideas.

Correspondence concerning this article should be addressed to Michael J. Frank, who is now at the Laboratory for Neural Computation and Cognition, Department of Psychology and Program in Neuroscience, University of Arizona, 1503 East University Boulevard, Building 68, Tucson, AZ 85721.
E-mail: mfrank@u.arizona.edu

Psychological Review
2006, Vol. 113, No. 2, 300–326
Copyright 2006 by the American Psychological Association
0033-295X/06/$12.00 DOI: 10.1037/0033-295X.113.2.300