Psychological Assessment
2013, Vol. 25, No. 1, 94-104
© 2012 American Psychological Association
1040-3590/13/$12.00 DOI: 10.1037/a0029061
A Generalizability Analysis of Score Consistency for the Balanced
Inventory of Desirable Responding
Walter P. Vispoel and Shuqin Tao
University of Iowa
Our goal in this investigation was to evaluate the reliability of scores from the Balanced Inventory of Desirable
Responding (BIDR) more comprehensively than in prior research using a generalizability-theory framework
based on both dichotomous and polytomous scoring of items. Generalizability coefficients accounting for
specific-factor, transient, and random-response error ranged from .64 to .75 for the BIDR's Self-Deception
Enhancement (SDE) and Impression Management (IM) subscale scores, and these values were systematically
lower than corresponding alpha (.66 to .83) and 1-week test-retest (.78 to .86) coefficients. Polytomous
scoring provided higher reliability than dichotomous scoring on nearly all indexes reported. Random-response
(8%-17%) and specific factor error (11%-17%) exceeded transient error (3%-6%) for both subscales and
scoring methods. Doubling the number of items on a single occasion provided greater improvements in
generalizability (.76-.83) than aggregating scores across 2 administrations (.72-.81). Both scoring methods
provided reasonably high indexes of consistency (Φ coefficients ≥ .91) at cut scores on the IM scale for
detecting faked responses when all sources of error were taken into account. Implications of these results for
common uses of the BIDR are discussed.
Keywords: generalizability theory, socially desirable responding, BIDR, reliability
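The abstract's comparison of generalizability and alpha coefficients rests on a standard result of generalizability theory: in the simplest one-facet persons × items design, the relative generalizability coefficient estimated from ANOVA variance components reduces to coefficient alpha, whereas the multi-facet design used in the study (items crossed with occasions) partitions out additional specific-factor and transient error and therefore yields lower values. The sketch below illustrates only the one-facet case; the function name, the toy data, and the mean-squares approach are illustrative assumptions, not the authors' actual analysis.

```python
import numpy as np

def g_coefficient(scores):
    """Estimate the relative generalizability coefficient E(rho^2) for a
    one-facet persons x items (p x i) G-study design.
    `scores` is a 2-D array: rows = persons, columns = items."""
    n_p, n_i = scores.shape
    grand = scores.mean()
    person_means = scores.mean(axis=1)
    item_means = scores.mean(axis=0)
    # Sums of squares for the two-way persons x items layout
    ss_p = n_i * np.sum((person_means - grand) ** 2)
    ss_i = n_p * np.sum((item_means - grand) ** 2)
    ss_tot = np.sum((scores - grand) ** 2)
    ss_res = ss_tot - ss_p - ss_i  # person x item interaction + residual error
    ms_p = ss_p / (n_p - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))
    # Expected mean squares give the variance-component estimates
    var_res = ms_res
    var_p = max((ms_p - ms_res) / n_i, 0.0)
    # Relative error is the residual component divided by the number of items
    return var_p / (var_p + var_res / n_i)
```

For a single-occasion persons × items design this value equals Cronbach's alpha for the same data; only when an occasions facet is added (as in the study's design) do transient and specific-factor error separate from the residual and pull the coefficient below alpha.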
One of the challenges that users of questionnaire data face is
dealing with response biases (Edwards, 1957; Paulhus, 1991, 2002;
Paulhus & Vazire, 2007). Response biases reflect systematic tenden-
cies to respond inaccurately in a particular situation (i.e., a response
set) or across time and assessments (i.e., a response style). In either
case, these tendencies undermine the validity of inferences made from
scores (Gendreau, Irvine, & Knight, 1973; Holden, Kroner, Fekken,
& Popham, 1992; Paulhus, 1991; Posey & Hess, 1984; Schretlen &
Arkowitz, 1990; Walters, 1988). One of the most serious and heavily
researched response biases is socially desirable responding (SDR).
SDR reflects either an unconscious or willful tendency to respond to
items to make one look good rather than answer truthfully. As a result,
researchers and questionnaire developers have long sought ways to
assess, detect, and, if necessary, control for such distortions in self-
reported responses. This has led to the development of a variety of
measures to assess SDR. Some, such as the Balanced Inventory of
Desirable Responding (BIDR; Paulhus, 1991, 1999, 2002) and
Marlowe-Crowne Social Desirability Scale (MCSDS; Crowne &
Marlowe, 1960) can be used independently as companions to other
questionnaires. Others, such as the L or Lie and K or Defensiveness
subscales from the Minnesota Multiphasic Personality Inventory-2
(Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989) and the
Good Impression Scale from the California Psychological Inventory
(Gough & Bradley, 1996) are embedded with other subscales within
the same questionnaires.

This article was published Online First August 6, 2012.
Walter P. Vispoel and Shuqin Tao, Department of Psychological and
Quantitative Foundations, University of Iowa.
Shuqin Tao is now at the Psychometric Services Department of the Data
Recognition Corporation, Maple Grove, Minnesota.
We thank Hanyi Kim for her help with data analysis and reporting, Linan
Sun and Yi He for their assistance with data collection, and Patricia Martin
for her help in preparing various versions of the manuscript.
Correspondence concerning this article should be addressed to Walter P.
Vispoel, Department of Psychological and Quantitative Foundations, 361
Lindquist Center, University of Iowa, Iowa City, Iowa 52242-1529.
E-mail: walter-vispoel@uiowa.edu
To clarify the nature of constructs assessed by measures of SDR,
researchers have administered various measures collectively to the
same individuals and factor analyzed the results (Cattell & Scheier,
1959; Edwards, Diers, & Walker, 1962; Jackson & Messick, 1962;
Paulhus, 1991; Wiggins, 1964). The pervasive finding from such
research is that SDR instruments as a group measure two basic factors
that Paulhus (1991) has labeled self-deception enhancement (SDE)
and impression management (IM). Individuals exhibiting SDE re-
spond to questionnaire items in a narcissistic way. They believe that
they are answering honestly, but demonstrate low self-knowledge by
systematically overreporting their abilities. IM can be a more calcu-
lated attempt for respondents to tailor their answers to give a desired
impression to others. This form of SDR is sometimes described as
lying or faking, and it can take either positive or negative forms
(Paulhus, 1999). For example, in one situation, the individual may
exhibit a positive bias by endorsing socially desirable responses to
look good or to hide negative personality traits. In a different situation,
the same individual could answer questions in such a way as to
exaggerate negative qualities either to get attention or sympathy or to
qualify for some sort of compensation. IM typically represents a more
serious threat to the validity of questionnaire results than SDE, be-
cause it can represent willful distortion of information. However,
measurement of SDE tendencies is important as well. Such inclina-
tions, for example, may inflate or deflate responses to other subscales
and thereby provide additional insights into interpreting those scores.
A problem illuminated by the factor-analytic research on SDR
instruments is that they do not necessarily measure the same facets of
SDR. Most measures of SDR assess either SDE, IM, or an undefined
combination of the two. The BIDR has the advantage of providing
separate and distinct measurement of SDE and IM. Inclusion of