Psychological Assessment 2013, Vol. 25, No. 1, 94-104 © 2012 American Psychological Association 1040-3590/13/$12.00 DOI: 10.1037/a0029061

A Generalizability Analysis of Score Consistency for the Balanced Inventory of Desirable Responding

Walter P. Vispoel and Shuqin Tao
University of Iowa

Our goal in this investigation was to evaluate the reliability of scores from the Balanced Inventory of Desirable Responding (BIDR) more comprehensively than in prior research, using a generalizability-theory framework based on both dichotomous and polytomous scoring of items. Generalizability coefficients accounting for specific-factor, transient, and random-response error ranged from .64 to .75 for the BIDR's Self-Deception Enhancement (SDE) and Impression Management (IM) subscale scores, and these values were systematically lower than corresponding alpha (.66 to .83) and 1-week test-retest (.78 to .86) coefficients. Polytomous scoring provided higher reliability than dichotomous scoring on nearly all indexes reported. Random-response (8%-17%) and specific-factor (11%-17%) error exceeded transient error (3%-6%) for both subscales and scoring methods. Doubling the number of items on a single occasion provided greater improvements in generalizability (.76-.83) than aggregating scores across 2 administrations (.72-.81). Both scoring methods provided reasonably high indexes of consistency (Φ coefficients ≥ .91) at cut scores on the IM scale for detecting faked responses when all sources of error were taken into account. Implications of these results for common uses of the BIDR are discussed.

Keywords: generalizability theory, socially desirable responding, BIDR, reliability

One of the challenges that users of questionnaire data face is dealing with response biases (Edwards, 1957; Paulhus, 1991, 2002; Paulhus & Vazire, 2007).
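The abstract's comparisons (single-occasion designs, doubled items, two administrations) follow from the standard generalizability coefficient for a random persons × items × occasions design. The sketch below illustrates that formula with hypothetical variance components chosen only to mirror the article's qualitative pattern (specific-factor and random-response error larger than transient error); the numbers are not taken from the article.

```python
# Illustrative generalizability (E rho^2) coefficient for a
# persons x items x occasions (p x i x o) random-effects design.
# All variance components below are hypothetical.

def g_coefficient(var_p, var_pi, var_po, var_pio_e, n_items, n_occasions):
    """Universe-score variance over itself plus relative error;
    each interaction component is divided by the size of its facet(s).
    var_pi  -> specific-factor error, var_po -> transient error,
    var_pio_e -> random-response error (confounded with residual)."""
    rel_error = (var_pi / n_items
                 + var_po / n_occasions
                 + var_pio_e / (n_items * n_occasions))
    return var_p / (var_p + rel_error)

# Hypothetical components with specific-factor > transient error,
# as the article reports for the BIDR subscales.
base = g_coefficient(0.50, 0.60, 0.01, 1.6, n_items=20, n_occasions=1)
double_items = g_coefficient(0.50, 0.60, 0.01, 1.6, n_items=40, n_occasions=1)
two_occasions = g_coefficient(0.50, 0.60, 0.01, 1.6, n_items=20, n_occasions=2)

print(f"{base:.3f} {double_items:.3f} {two_occasions:.3f}")
```

Under these assumed components, doubling the items raises the coefficient more than adding a second occasion does, because the item-linked error terms dominate the occasion-linked (transient) term, which is the same qualitative result the abstract reports for the BIDR.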
Response biases reflect systematic tendencies to respond inaccurately in a particular situation (i.e., a response set) or across time and assessments (i.e., a response style). In either case, these tendencies undermine the validity of inferences made from scores (Gendreau, Irvine, & Knight, 1973; Holden, Kroner, Fekken, & Popham, 1992; Paulhus, 1991; Posey & Hess, 1984; Schretlen & Arkowitz, 1990; Walters, 1988). One of the most serious and heavily researched response biases is socially desirable responding (SDR). SDR reflects either an unconscious or willful tendency to respond to items in a way that makes one look good rather than to answer truthfully. As a result, researchers and questionnaire developers have long sought ways to assess, detect, and, if necessary, control for such distortions in self-reported responses. This has led to the development of a variety of measures to assess SDR. Some, such as the Balanced Inventory of Desirable Responding (BIDR; Paulhus, 1991, 1999, 2002) and the Marlowe-Crowne Social Desirability Scale (MCSDS; Crowne & Marlowe, 1960), can be used independently as companions to other questionnaires. Others, such as the L (Lie) and K (Defensiveness) subscales from the Minnesota Multiphasic Personality Inventory-2 (Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989) and the Good Impression Scale from the California Psychological Inventory (Gough & Bradley, 1996), are embedded with other subscales within the same questionnaires.

This article was published Online First August 6, 2012. Walter P. Vispoel and Shuqin Tao, Department of Psychological and Quantitative Foundations, University of Iowa. Shuqin Tao is now at the Psychometric Services Department of the Data Recognition Corporation, Maple Grove, Minnesota. We thank Hanyi Kim for her help with data analysis and reporting, Linan Sun and Yi He for their assistance with data collection, and Patricia Martin for her help in preparing various versions of the manuscript. Correspondence concerning this article should be addressed to Walter P. Vispoel, Department of Psychological and Quantitative Foundations, 361 Lindquist Center, University of Iowa, Iowa City, Iowa 52242-1529. E-mail: walter-vispoel@uiowa.edu

To clarify the nature of constructs assessed by measures of SDR, researchers have administered various measures collectively to the same individuals and factor analyzed the results (Cattell & Scheier, 1959; Edwards, Diers, & Walker, 1962; Jackson & Messick, 1962; Paulhus, 1991; Wiggins, 1964). The pervasive finding from such research is that SDR instruments as a group measure two basic factors that Paulhus (1991) has labeled self-deception enhancement (SDE) and impression management (IM). Individuals exhibiting SDE respond to questionnaire items in a narcissistic way. They believe that they are answering honestly, but demonstrate low self-knowledge by systematically overreporting their abilities. IM can be a more calculated attempt by respondents to tailor their answers to give a desired impression to others. This form of SDR is sometimes described as lying or faking, and it can take either positive or negative forms (Paulhus, 1999). For example, in one situation, an individual may exhibit a positive bias by endorsing socially desirable responses to look good or to hide negative personality traits. In a different situation, the same individual could answer questions in a way that exaggerates negative qualities, either to get attention or sympathy or to qualify for some sort of compensation. IM typically represents a more serious threat to the validity of questionnaire results than SDE because it can represent willful distortion of information. However, measurement of SDE tendencies is important as well. Such inclinations, for example, may inflate or deflate responses to other subscales and thereby provide additional insights into interpreting those scores.
A problem illuminated by the factor-analytic research on SDR instruments is that they do not necessarily measure the same facets of SDR. Most measures of SDR assess either SDE, IM, or an undefined combination of the two. The BIDR has the advantage of providing separate and distinct measurement of SDE and IM. Inclusion of