Assessment of Clinical Skills
Moderator: Scott Herrle, MD
Discussant: Eric Holmboe, MD

Contrast Effects in the USMLE Step 2 Clinical Skills Examination

Chaitanya Ramineni, Brian E. Clauser, Polina Harik, and David B. Swanson

Abstract

Background
As with any examination using human raters, it is possible that human subjectivity may introduce measurement error. An examinee’s performance might be scored differently on the basis of the quality of the preceding performance(s) (contrast effects). This research investigated the presence of contrast effects, within and across test sessions, for the communication and interpersonal skills component of the United States Medical Licensing Examination Step 2 Clinical Skills (CS) examination.

Method
Data from Step 2 CS examinees were analyzed using hierarchical and general linear modeling procedures.

Results
Contrast effect was significant for the communication and interpersonal skills score, both within and across test sessions. The effect was found to have a nontrivial impact on the overall score.

Conclusions
The presence of contrast effects suggests that scores for an examinee are influenced by the performance of other examinees. More research is needed to fully understand these effects.

Acad Med. 2008;83(10 Suppl):S45–S48.

The United States Medical Licensing Examination (USMLE) Step 2 Clinical Skills (CS) Examination is designed to evaluate the clinical and communication skills of individuals seeking medical licensure in the United States.1 The examination uses standardized patients, which allow for a high-fidelity simulation of the physician–patient encounter; the patients both portray the case scenario and assign ratings that form the basis of three of the examinee scores. As with any examination using human raters, despite efforts to train raters to perform accurately and consistently, it is possible that human subjectivity may introduce measurement error.
As Guilford cautioned, “raters are human and they are therefore subject to all the errors to which humankind must plead guilty.”2(p 272) Any such impact on examinee scores would represent construct-irrelevant variance. It is essential for test developers to scrutinize evidence regarding potential threats to the validity of score interpretations.3 Previous research has accounted for some rater effects, such as differences in stringency4; others remain less well understood.

One such effect is the contrast effect. Daly and Dickson-Markman5(p 309) defined contrast effects as “influences of previous stimuli on the evaluation or judgment of a new stimulus.” Previous research has examined the contrast effect in various contexts, such as perception of physical attractiveness,6 employment interview ratings,7 and writing assessments.8 The concern that this effect may be present in expert ratings used in assessment was noted by Stalnaker9(p 41) more than 70 years ago: “A C paper may be graded B if it is read after an illiterate theme, but if it follows an A paper, if such can be found, it seems to be of D caliber.” Authors have continued to express concern that an examinee’s work might be scored differently on the basis of the quality of the immediately preceding work sample.10

The purpose of the present paper was to evaluate the presence of contrast effects for standardized patient ratings across the USMLE Step 2 CS Examination test administrations. The presence of these effects was investigated for the communication and interpersonal skills component of the test. The presence of contrast effects was evaluated both within a session and across test sessions. It was hypothesized that standardized patients may overestimate an examinee’s performance if the encounter was preceded by a group of poorly performing examinees and, conversely, underestimate an examinee’s performance when testing with a group of high performers.
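To make the hypothesis concrete, the sketch below shows one way a "preceding performance" predictor could be built and related to scores. It is an illustration only, not the hierarchical model used in the study: the flag list `low_flags`, the window size `window`, and both function names are hypothetical, and a plain least-squares slope stands in for the actual modeling procedure. Under the hypothesis, a positive slope (higher ratings after a larger share of weak preceding performances) would be consistent with a contrast effect.

```python
from typing import List, Optional, Sequence


def preceding_low_proportion(low_flags: Sequence[bool],
                             window: int) -> List[Optional[float]]:
    """For each examinee, in the order one rater saw them, return the share
    of the previous `window` examinees flagged as low performers.

    Hypothetical helper: the flag definition and the window size are
    assumptions for illustration, not values from the paper.
    The first examinee has no predecessors and gets None.
    """
    proportions: List[Optional[float]] = []
    for i in range(len(low_flags)):
        prior = low_flags[max(0, i - window):i]  # up to `window` predecessors
        proportions.append(sum(prior) / len(prior) if prior else None)
    return proportions


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Simple least-squares slope of y on x (a stand-in for the study's
    hierarchical and general linear models, which are not reproduced here)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


# Made-up sequence of six examinees seen by one rater.
flags = [True, True, False, False, True, False]
predictor = preceding_low_proportion(flags, window=2)
```

The predictor is computed per rater and in testing order, so it only ever looks backward; examinees without predecessors would be dropped before fitting any model.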
Previous research has shown a strong relationship between spoken English proficiency and communication and interpersonal skills scores.11 The proportion of examinees with relatively low English proficiency scores was, therefore, used to evaluate the presence of contrast effects in this study. The question of interest was whether an examinee’s expected score on the communication scale is a function of the proportion of individuals with low English proficiency who tested during a defined period before that examinee.

Correspondence: Polina Harik, MA, 3750 Market Street, Philadelphia, PA, 19104; e-mail: (pharik@nbme.org).
Academic Medicine, Vol. 83, No. 10 / October 2008 Supplement S45

Method

Step 2 CS Examination. The USMLE Step 2 CS Examination is designed to evaluate several aspects of physicians’ clinical skills. In each test session, examinees interact with 12 standardized patients. For each of these encounters, examinees have 15 minutes to interact with the standardized patient and then 10 minutes to complete a structured patient note (notes are subsequently scored by trained physician raters). While examinees complete the patient note, standardized patients complete several instruments: (1) a structured checklist consisting of dichotomously scored items indicating whether the examinee inquired about critical aspects of the patient’s history and performed critical physical examination maneuvers, (2) a rating scale