Notes From the Field: Automatic Item Generation, Standard Setting, and Learner Performance in Mastery Multiple-Choice Tests

Eric Shappell¹, Gregory Podolej², James Ahn³, Ara Tekian⁴, and Yoon Soo Park⁴

Abstract

Mastery learning assessments have been described in simulation-based educational interventions; however, studies applying mastery learning to multiple-choice tests (MCTs) are lacking. This study investigates an approach to item generation and standard setting for mastery learning MCTs and evaluates the consistency of learner performance across sequential tests. Item models, variables for question stems, and mastery standards were established using a consensus process. Two test forms were created using the item models, and the tests were administered at two training programs. The primary outcome, the test–retest consistency of pass–fail decisions across versions of the test, was 94% (κ = .54). Decision-consistency classification was .85. Item-level consistency was 90% (κ = .77, SE = .03). These findings support the use of automatic item generation to create mastery MCTs that produce consistent pass–fail decisions. This technique broadens the range of assessment methods available to educators for applications that require serial MCT testing, including mastery learning curricula.

Keywords: mastery learning, multiple-choice tests, automatic item generation, standard setting

An increasing focus on outcome-based education and test-enhanced learning has drawn attention to mastery learning as a promising model for learner assessment (Agrawal et al., 2012; Cook et al., 2013; Frank et al., 2010; Holmboe et al., 2010; Larsen et al., 2008, 2009; McGaghie, 2015). In mastery learning, participants who meet the criteria for mastery advance to the next phase of training; participants who do not must practice and retest until they are able to meet the criteria for mastery (Hodges, 2010).
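The agreement statistics reported above combine raw percent agreement with Cohen's κ, which corrects that agreement for chance. A minimal sketch of the computation, using hypothetical pass–fail decisions (not data from the study):

```python
def cohens_kappa(form_a, form_b):
    """Chance-corrected agreement between pass/fail decisions on two test forms.

    form_a, form_b: equal-length lists of decisions (e.g. "pass"/"fail"),
    one entry per examinee.
    """
    n = len(form_a)
    # Observed proportion of agreement (raw percent agreement)
    p_o = sum(x == y for x, y in zip(form_a, form_b)) / n
    # Expected agreement by chance, from each form's marginal pass/fail rates
    categories = set(form_a) | set(form_b)
    p_e = sum((form_a.count(c) / n) * (form_b.count(c) / n)
              for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical decisions for the same four learners on two test forms
form_1 = ["pass", "pass", "fail", "fail"]
form_2 = ["pass", "fail", "fail", "fail"]
print(cohens_kappa(form_1, form_2))  # 0.5: 75% raw agreement, corrected for chance
```

κ is lower than raw agreement whenever the marginal pass rates alone would produce substantial agreement by chance, which is why the study reports both figures.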
While this model has been described in simulation interventions, studies applying mastery learning to multiple-choice tests (MCTs) are lacking (Bandaranayake, 2008; Cook et al., 2013; McGaghie et al., 2014; Yudkowsky et al., 2014). Several challenges complicate the development and implementation of MCTs for mastery learning:

1. Standard setting involves nuanced differences from traditional techniques (Yudkowsky et al., 2015).
2. Content is selected for its relevance to the mastery standard, as opposed to the selection of content of variable difficulty in traditional MCTs (Yudkowsky et al., 2015).
3. Tests face different threats to validity, including issues with memorization and changes in variance with retesting (Lineberry et al., 2015).
4. The potential for repeated learner retesting warrants the development of large question banks (Holmboe et al., 2010; Lineberry et al., 2015).

Without practical solutions to these challenges, the potential of mastery learning MCTs will remain unrealized. Regarding the challenges of mastery standard setting and content selection, a traditional consensus process can still be utilized; however, the goals that guide this process must be adjusted (Yudkowsky et al., 2015). Standard setting in mastery learning focuses on assessing the learner's preparedness for advancement, defined as a high likelihood of success at the next training level. A consensus process may be used to identify and appropriately weight content based on its relevance to this mastery standard.

Time and expertise requirements make the costs of simply "scaling up" traditional MCT development strategies for mastery MCTs prohibitive (Gierl et al., 2012; Haladyna & Rodriguez, 2013).
Automatic item generation (AIG) offers a solution to this problem by using a systematic approach to efficiently

¹ Department of Emergency Medicine, Harvard Medical School, Massachusetts General Hospital, Boston, MA, USA
² Department of Emergency Medicine, University of Illinois at Peoria, IL, USA
³ Department of Medicine, Section of Emergency Medicine, University of Chicago, IL, USA
⁴ Department of Education, University of Illinois at Chicago, IL, USA

Corresponding Author:
Eric Shappell, Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, 5 Emerson Place, Suite 101, Boston, MA 02114, USA.
Email: eshappell@mgh.harvard.edu

Evaluation & the Health Professions, 1–4. © The Author(s) 2020.
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/0163278720908914
journals.sagepub.com/home/ehp
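The AIG approach named above rests on item models: a fixed question stem with variable slots whose values are enumerated to yield many item variants. A minimal sketch of that expansion, with an entirely hypothetical stem and variable set (not an item model from the study):

```python
from itertools import product

# Hypothetical item model: one stem with three variable slots.
# Stem wording, variable values, and clinical content are illustrative only.
STEM = ("A {age}-year-old patient presents with {complaint}. "
        "The heart rate is {hr} beats per minute. "
        "What is the most appropriate next step?")

VARIABLES = {
    "age": ["25", "60", "80"],
    "complaint": ["chest pain", "shortness of breath"],
    "hr": ["50", "120"],
}

def generate_items(stem, variables):
    """Expand one item model into every combination of its variable values."""
    names = list(variables)
    return [stem.format(**dict(zip(names, combo)))
            for combo in product(*(variables[n] for n in names))]

items = generate_items(STEM, VARIABLES)
print(len(items))  # 3 ages x 2 complaints x 2 heart rates = 12 generated stems
```

A single model thus yields a bank of parallel items, which is how AIG addresses the large-question-bank requirement of repeated retesting; in practice each generated stem would also need answer options keyed to its variable values.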