When proportion consensus scoring works

Kimberly A. Barchard, Spencer Hensley, Emily Anderson
Department of Psychology, University of Nevada, Las Vegas, 4505 S. Maryland Parkway, P.O. Box 455030, Las Vegas, NV 89154-5030, USA

Article history: Received 14 November 2012; Received in revised form 5 January 2013; Accepted 21 January 2013; Available online 28 February 2013

Keywords: Proportion consensus scoring; Scoring key; Item difficulty; Emotional intelligence; Leadership

Abstract: Most objectively scored tests use items with easily identifiable correct answers. When such veridical scoring keys cannot be constructed, researchers sometimes use proportion consensus scoring (PCS) to identify the best answers. To determine whether PCS identifies the best answers, we scored a test using both PCS and veridical scoring. Among 353 undergraduates, regular PCS, two-stage PCS, and expert PCS all had high correlations for easy items, but no PCS method had high correlations for difficult items. Thus, PCS cannot reliably identify the best answers to individual items. However, PCS worked well for total scores. For easy items, total scores had correlations above .99 for all PCS methods. For difficult items, expert and two-stage PCS had correlations of .92 and .82 for the 60-item test. Thus, expert and two-stage PCS can be justified (even for difficult items) if the scoring key is based upon people who truly possess some degree of expertise and if scores are summed over many items.

© 2013 Elsevier Ltd. All rights reserved.

1. When Proportion Consensus Scoring Works

For most objectively scored test items, there is one and only one correct answer, and experts all agree on what that answer is. For example, questions on math tests have one clear answer, so creating scoring keys is easy. However, for some psychological constructs, such as emotional intelligence and leadership, experts may disagree about the best answer, or the best answer may vary across context or culture.
In those situations, it is impossible to create a veridical scoring key; another method is needed to identify the best answer and create the key. One increasingly popular method is to use responses from the norm group. This is referred to as consensus scoring. Consensus scoring has been used successfully to score tests of emotional intelligence (Legree, Psotka, Tremble, & Bourne, 2005; MacCann, Roberts, Matthews, & Zeidner, 2004; Mayer, Caruso, & Salovey, 2000; Mayer, Salovey, Caruso, & Sitarenios, 2003; Warwick, Nettelbeck, & Ward, 2010; Zeidner, Shani-Zinovich, Matthews, & Roberts, 2005), emotion perception (Geher, Warner, & Brown, 2001; Mayer, DiPaolo, & Salovey, 1990), social insight (Legree, 1995), driving knowledge (Legree, Martin, & Psotka, 2000), supervisory skills (Heffner & Porr, 2000), military leadership (Hedlund et al., 2003), and general cognitive ability (Legree et al., 2000). Thus, consensus scoring is useful for measuring both standard and non-standard psychological constructs.

Several types of consensus scoring exist. In mode consensus scoring, the most common answer in the norm group is designated as the best answer (Geher et al., 2001). Unfortunately, mode consensus scoring is mathematically biased against smaller groups (Barchard & Russell, 2006). In distance consensus scoring, the absolute or squared distance between the respondent's answer and the average answer is calculated (Hedlund et al., 2003; Legree et al., 2000; Legree, Heffner, Psotka, Medsker, & Martin, 2003). However, it can only be used when response options are ordered, and so cannot be used for multiple-choice or forced-choice questions. Finally, in proportion consensus scoring, a person's score is equal to the proportion of the norm group who gave that response. For example, if 35% of respondents selected option C, everyone who selected C receives a score of .35. Proportion consensus scoring (PCS) can be used for any type of response option (categorical, rating scales, etc.)
and has been used successfully for several tests of emotional intelligence (Mayer et al., 2000, 2003; MacCann et al., 2004; Warwick et al., 2010; Zeidner et al., 2005). This paper will therefore focus on PCS.

The most common rationale for PCS is that it can identify the best answers, so that the test measures knowledge or skill (Legree, 1995). Various logical and statistical arguments have been made to support this rationale. The first argument for PCS is that consensus scoring seems logically plausible when measuring human interaction: emotion knowledge evolves within a social context, and so group consensus should be able to identify correct answers (Mayer et al., 2003). However, empirical investigations have not always supported this conclusion. For example, Keele and Bell (2009) found no clear agreement on the Changes and Blends tasks on the Mayer-Salovey-Caruso Emotional Intelligence Test (Mayer et al.,

Author note: Some portions of this paper were presented at the 2012 Western Psychological Association convention in San Francisco, CA.

Corresponding author. Tel.: +1 702 895 0758; fax: +1 702 895 0195. E-mail addresses: barchard@unlv.nevada.edu (K.A. Barchard), spencer.hensley@gmail.com (S. Hensley), ander692@gmail.com (E. Anderson).

Personality and Individual Differences 55 (2013) 14–18. http://dx.doi.org/10.1016/j.paid.2013.01.017
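The three consensus-scoring methods described above can be sketched in code. This is a minimal illustration, not the authors' implementation: the norm-group data and function names below are hypothetical, chosen only to show how each method converts a response into a score.

```python
from collections import Counter

# Hypothetical norm-group responses to one multiple-choice item.
norm_responses = ["C", "C", "A", "B", "C", "A", "C"]

def mode_consensus_score(response, norm):
    """1 if the response matches the most common norm-group answer, else 0."""
    mode_answer = Counter(norm).most_common(1)[0][0]
    return 1 if response == mode_answer else 0

def proportion_consensus_score(response, norm):
    """Proportion of the norm group that gave the same response (PCS)."""
    return norm.count(response) / len(norm)

# Hypothetical rating-scale responses: distance scoring needs ordered options.
norm_ratings = [4, 5, 3, 4, 4]

def distance_consensus_score(rating, norm):
    """Absolute distance between the response and the norm-group mean."""
    mean = sum(norm) / len(norm)
    return abs(rating - mean)

print(mode_consensus_score("C", norm_responses))        # 1 (C is the mode)
print(proportion_consensus_score("C", norm_responses))  # 4/7, about 0.571
print(distance_consensus_score(2, norm_ratings))        # |2 - 4.0| = 2.0
```

Note how the mode method discards information that PCS retains: a respondent choosing A scores 0 under mode consensus but 2/7 under PCS, reflecting partial agreement with the norm group.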