ORGANIZATIONAL BEHAVIOR AND HUMAN PERFORMANCE 26, 149-171 (1980)
Training for Calibration
SARAH LICHTENSTEIN AND BARUCH FISCHHOFF
Decision Research
Two experiments attempted to improve the quality of people's probability
assessments through intensive training. The first involved 11 sessions of 200
assessments each followed by comprehensive feedback. It produced considerable
learning, almost all of which was accomplished after receipt of the first
feedback. There was modest generalization to several related probability as-
sessment tasks, but no generalization at all to two others. The second experi-
ment reduced the training to three sessions. It revealed the same pattern of
learning and limited generalization. About one-third of all subjects appeared to
use probabilities quite appropriately on some tasks before training began.
Further research is needed to understand why the training worked as well as it
did, why that training did not always generalize, and why some individuals
seemed to need no training at all.
According to the subjectivist, or Bayesian, position, all probability as-
sessments are expressions of confidence in the state of one's knowledge
(deFinetti, 1937; Phillips, 1973). All may be cast in the form "The proba-
bility that proposition A is true is .XX." While probability statements
express an internal state, degree of belief, they can be evaluated by external
measures of goodness. For example, sets of probabilities must conform
with the laws of probability theory (e.g., P(A) + P(Ā) = 1). Another
aspect of goodness is the correspondence between the probability as-
sessments and the truth of the propositions. If such assessments are ap-
propriate reflections of how much one knows, there should be a system-
atic relationship between probability and truth. The formalization of this
property is called "calibration." An assessor is considered to be well
calibrated if, over the long run, for all propositions assigned a given prob-
ability, the proportion true equals the probability assigned. Thus across
all the occasions that the assessor assigns the probability .7, 70% should
be true; for all propositions to which .8 has been assigned, 80% should be
true, and so forth.
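The definition can be illustrated with a minimal sketch. It groups a set of assessments by the probability assigned and reports the proportion of propositions in each group that turned out to be true. The function name `calibration_table` and the sample data are hypothetical, introduced here only for illustration; this is not the scoring procedure used in the experiments.

```python
from collections import defaultdict


def calibration_table(probabilities, outcomes):
    """Group assessments by assigned probability.

    Returns a dict mapping each assigned probability to a tuple
    (number of items, proportion of those items that were true).
    """
    if len(probabilities) != len(outcomes):
        raise ValueError("need exactly one outcome per probability")
    groups = defaultdict(list)
    for p, is_true in zip(probabilities, outcomes):
        # Round so that responses such as 0.7 and 0.70 share one group.
        groups[round(p, 2)].append(bool(is_true))
    return {
        p: (len(hits), sum(hits) / len(hits))
        for p, hits in sorted(groups.items())
    }


# Hypothetical data: ten propositions assigned .7 (seven true),
# five propositions assigned .8 (four true).
probs = [0.7] * 10 + [0.8] * 5
truth = [True] * 7 + [False] * 3 + [True] * 4 + [False] * 1

table = calibration_table(probs, truth)
for p, (n, proportion_true) in table.items():
    print(f"assigned {p:.2f}: n={n}, proportion true={proportion_true:.2f}")
```

For these made-up data the proportion true matches the assigned probability in both groups, which is what perfect calibration would look like. With real assessments each group must hold many items before the observed proportion is a stable estimate.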
A great deal of empirical research (reviewed by Lichtenstein, Fisch-
Our deepest thanks to Gerry Hanson for conducting this experiment, to Barbara Combs
and Peggy Roecker for compiling the enormous item pool needed, to Bernie Corrigan for
programming, and to Ruth Phelps, Gordon Pitz, and Paul Slovic for their comments on this
project. This research was supported by Contract DAHC 19-77-C-0019 from the Army
Research Institute to Perceptronics, Inc. Correspondence may be addressed to either author
at Decision Research, A Branch of Perceptronics, 1201 Oak Street, Eugene, OR 97401.
0030-5073/80/050149-23$02.00/0
Copyright © 1980 by Academic Press, Inc.
All rights of reproduction in any form reserved.