Measuring Problem Solving Skills in Plants vs. Zombies 2

Valerie J. Shute, Florida State University, 1114 West Call Street, Tallahassee, FL 32306, vshute@fsu.edu
Gregory R. Moore, Florida State University, 1114 West Call Street, Tallahassee, FL 32306, grm13@my.fsu.edu
Lubin Wang, Florida State University, 1114 West Call Street, Tallahassee, FL 32306, lw10e@fsu.edu

ABSTRACT

We are using stealth assessment, embedded in Plants vs. Zombies 2, to measure middle-school students' problem solving skills. The project began with the development of a problem solving competency model based on a thorough review of the literature. Next, we identified relevant in-game indicators that would provide evidence about students' levels on the various problem solving facets. The problem solving model was implemented in the game via Bayesian networks. To validate the stealth assessment, we ran a small pilot study in which students played our game-based assessment and completed an external problem solving measure (MicroDYN). Preliminary results indicate that the problem solving estimates derived from the game correlate significantly with the external measure, suggesting that our stealth assessment is valid. Our next steps include running a larger validation study (in progress) and developing tools to help educators interpret the results of the assessment.

Keywords

Stealth Assessment, Problem Solving, Game-Based Learning, Bayesian Networks

1. INTRODUCTION

In this paper, we describe the design, development, and preliminary validation of an assessment embedded in a video game to measure the problem solving skills of middle school students. After providing brief background on stealth assessment and problem solving skills, we describe the game (Plants vs. Zombies 2) used to implement our stealth assessment and discuss why it is a good vehicle for assessing problem solving skills.
Afterwards, we present the in-game indicators (i.e., gameplay evidence) of problem solving, describing how we decided on these indicators and how they are used to collect data about the in-game actions of players. While discussing the indicators, we show how the evidence is used in a Bayesian network to produce an overall estimate of students' problem solving skills. We then discuss the results of a pilot validation study, which show that our stealth assessment estimate of problem solving correlates significantly with an external measure of problem solving (MicroDYN). We conclude with the next steps in developing the assessment and practical applications of this work.

2. BACKGROUND

2.1 Stealth Assessment

Good games are engaging, and engagement is important for learning. The challenge is to measure learning in games validly and reliably without disrupting engagement, and then to leverage that information to bolster learning. For the past 6-7 years, we have been researching various ways to embed valid assessments directly into games with a technology called stealth assessment (e.g., [15, 16, 20]). Stealth assessment is grounded in an assessment design framework called evidence-centered design (ECD) [10]. In general, the main purpose of any assessment is to collect information that allows the assessor to make valid inferences about what people know, what they can do, and to what degree (collectively referred to as "competencies" in this paper). ECD defines a framework that consists of several conceptual and computational models that work in concert. The framework requires an assessor to: (a) define the claims to be made about learners' competencies, (b) establish what constitutes valid evidence of a claim, and (c) determine the nature and form of tasks or situations that will elicit that evidence.
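The three ECD models can be pictured as linked data structures: claims about competencies, evidence rules tying observable behaviors to those claims, and tasks designed to elicit that evidence. The sketch below illustrates this linkage in Python; all facet, indicator, and level names are invented for illustration and are not taken from the paper's actual models.

```python
# Hypothetical sketch of ECD's three models as plain data structures.
# Facet, indicator, and task names are illustrative only.

# (a) Claims about learners' competencies: facets of problem solving,
#     each holding a probability estimate that is updated during play.
competency_model = {
    "analyze_givens_and_constraints": 0.5,
    "plan_a_solution_pathway": 0.5,
    "use_tools_and_resources_effectively": 0.5,
}

# (b) Evidence model: which observable gameplay behavior (indicator)
#     counts as evidence for which competency claim.
evidence_model = {
    "selected_plants_suited_to_level": "analyze_givens_and_constraints",
    "placed_sun_producers_early": "plan_a_solution_pathway",
    "avoided_wasting_resources": "use_tools_and_resources_effectively",
}

# (c) Task model: situations (game levels) designed to elicit evidence.
task_model = {
    "hypothetical_level_A": ["selected_plants_suited_to_level",
                             "placed_sun_producers_early"],
    "hypothetical_level_B": ["avoided_wasting_resources"],
}
```

The point of the structure is traceability: every task elicits named indicators, and every indicator feeds a named competency claim, so any estimate can be traced back to concrete gameplay evidence.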
Stealth assessment complements ECD by determining specific gameplay behaviors (specified in the evidence model and referred to as indicators) and linking them to the competency model [19]. As students interact with tasks/problems in a game during the solution process (see Figure 1), they provide a continuous stream of data (captured in a log file, arrow 1) that is analyzed by the evidence model (arrow 2). The results of this analysis are data (e.g., scores) that are passed to the competency model, which statistically updates the claims about relevant competencies in the student model (arrow 3).

Figure 1. Stealth assessment cycle.

The ECD approach, combined with stealth assessment, provides a framework for developing assessment tasks that are explicitly

Proceedings of the 8th International Conference on Educational Data Mining
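The three arrows of the cycle can be sketched end to end with a single binary competency node and conditionally independent binary indicators, i.e., a minimal two-layer Bayesian network updated by Bayes' rule. All event names, indicator definitions, and probabilities below are invented for illustration; the paper's actual networks are larger and use calibrated conditional probabilities.

```python
# Illustration of the stealth-assessment cycle: raw log events (arrow 1)
# are scored into indicator observations by the evidence model (arrow 2),
# which then update the competency estimate via Bayes' rule (arrow 3).
# One binary competency node ("high"/"low" problem solving) with
# conditionally independent binary indicators. All values are invented.

# P(indicator fires | problem-solving level) for each indicator.
likelihoods = {
    "efficient_plant_choice": {"high": 0.80, "low": 0.30},
    "wasted_resources":       {"high": 0.15, "low": 0.60},
}

def score_log(events):
    """Evidence model: turn raw log events into indicator observations."""
    return {
        "efficient_plant_choice": "chose_cheap_counter" in events,
        "wasted_resources": events.count("planted") > 10,
    }

def update(prior_high, observations):
    """Competency model: posterior P(level = high) given the indicators."""
    p_high, p_low = prior_high, 1.0 - prior_high
    for name, fired in observations.items():
        p = likelihoods[name]
        p_high *= p["high"] if fired else 1.0 - p["high"]
        p_low  *= p["low"]  if fired else 1.0 - p["low"]
    return p_high / (p_high + p_low)

# One pass through the cycle: efficient play, no wasted resources.
events = ["chose_cheap_counter", "planted", "planted"]
obs = score_log(events)
posterior = update(0.5, obs)   # evidence raises the estimate above 0.5
```

In the full assessment this update runs continuously as new log data arrives, so the student-model estimate is always current without any explicit test items interrupting play.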