ORGANIZATIONAL BEHAVIOR AND HUMAN PERFORMANCE 26, 149-171 (1980)

Training for Calibration

SARAH LICHTENSTEIN AND BARUCH FISCHHOFF

Decision Research

Two experiments attempted to improve the quality of people's probability assessments through intensive training. The first involved 11 sessions of 200 assessments each, followed by comprehensive feedback. It produced considerable learning, almost all of which was accomplished after receipt of the first feedback. There was modest generalization to several related probability assessment tasks, but no generalization at all to two others. The second experiment reduced the training to three sessions. It revealed the same pattern of learning and limited generalization. About one-third of all subjects appeared to use probabilities quite appropriately on some tasks before training began. Further research is needed to understand why the training worked as well as it did, why that training did not always generalize, and why some individuals seemed to need no training at all.

According to the subjectivist, or Bayesian, position, all probability assessments are expressions of confidence in the state of one's knowledge (deFinetti, 1937; Phillips, 1973). All may be cast in the form "The probability that proposition A is true is .XX." While probability statements express an internal state, degree of belief, they can be evaluated by external measures of goodness. For example, sets of probabilities must conform with the laws of probability theory (e.g., P(A) + P(Ā) = 1). Another aspect of goodness is the correspondence between the probability assessments and the truth of the propositions. If such assessments are appropriate reflections of how much one knows, there should be a systematic relationship between probability and truth. The formalization of this property is called "calibration."
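To make the calibration idea concrete, the following minimal sketch groups a set of assessments by the probability stated and reports the proportion of propositions in each group that turned out to be true; for a well-calibrated assessor the two numbers agree. The function name `calibration_table` and the toy data are illustrative assumptions and are not taken from the study.

```python
from collections import defaultdict


def calibration_table(assessments):
    """Group (probability, outcome) pairs by the stated probability.

    Returns a dict mapping each stated probability to a tuple
    (number of assessments, proportion of those that were true).
    """
    groups = defaultdict(list)
    for probability, is_true in assessments:
        groups[probability].append(bool(is_true))
    return {
        p: (len(outcomes), sum(outcomes) / len(outcomes))
        for p, outcomes in sorted(groups.items())
    }


# Hypothetical data: ten propositions assigned .7 (seven of them true)
# and five propositions assigned .8 (four of them true).
toy = (
    [(0.7, True)] * 7 + [(0.7, False)] * 3
    + [(0.8, True)] * 4 + [(0.8, False)] * 1
)

table = calibration_table(toy)
for p, (n, proportion_true) in table.items():
    print(f"assigned {p:.1f}: n={n}, proportion true={proportion_true:.2f}")
```

In this toy set the proportion true matches the stated probability in both groups, which is the pattern perfect calibration would show; with real data the groups are finite samples, so some deviation is expected even from a well-calibrated assessor.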
An assessor is considered to be well calibrated if, over the long run, for all propositions assigned a given probability, the proportion true equals the probability assigned. Thus across all the occasions that the assessor assigns the probability .7, 70% should be true; for all propositions to which .8 has been assigned, 80% should be true, and so forth.

Our deepest thanks to Gerry Hanson for conducting this experiment, to Barbara Combs and Peggy Roecker for compiling the enormous item pool needed, to Bernie Corrigan for programming, and to Ruth Phelps, Gordon Pitz, and Paul Slovic for their comments on this project. This research was supported by Contract DAHC 19-77-C-0019 from the Army Research Institute to Perceptronics, Inc. Correspondence may be addressed to either author at Decision Research, A Branch of Perceptronics, 1201 Oak Street, Eugene, OR 97401.

0030-5073/80/050149-23$02.00/0 Copyright © 1980 by Academic Press, Inc. All rights of reproduction in any form reserved.

A great deal of empirical research (reviewed by Lichtenstein, Fisch-