Review of Recent Systems for Automatic Assessment of Programming Assignments

Petri Ihantola, Department of Computer Science and Engineering, Aalto University, petri@cs.hut.fi
Tuukka Ahoniemi, Digia Plc, Finland, tuukka.ahoniemi@digia.com
Ville Karavirta, Department of Computer Science and Engineering, Aalto University, vkaravir@cs.hut.fi
Otto Seppälä, Department of Computer Science and Engineering, Aalto University, oseppala@cs.hut.fi

ABSTRACT
This paper presents a systematic literature review of the recent (2006–2010) development of automatic assessment tools for programming exercises. We discuss the major features these tools support and the different approaches they use, from both the pedagogical and the technical point of view. Examples of such features are the ways in which the teacher can define tests, resubmission policies, security issues, and so forth. We have also identified a list of novel features, such as assessing web software, that are likely to receive more research attention in the future. We conclude that too many new systems are being developed, but also acknowledge the current reasons for this phenomenon. As one solution, we encourage opening up the existing systems and joining efforts to develop them further. Selected systems from our survey are briefly described in Appendix A.

1. INTRODUCTION
Assessment provides the teacher with a feedback channel that shows how learning goals are being met. It also assures outside observers that students achieve those learning goals. Assessment thus both guides student learning and gives the learner and the teacher feedback on the learning process – from the level of a whole course down to a single student on a specific topic being assessed. Students often direct their efforts based on what is assessed and how it affects the final course grade [6, Chapter 9].
----------------------------------------------------------------------
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Koli Calling '10, October 28-31, 2010, Koli, Finland
Copyright 2010 ACM 978-1-4503-0520-4/10/10 ...$10.00.
----------------------------------------------------------------------

Continuous assessment during a programming course ensures that students get enough practice as well as feedback on the quality of their solutions. Providing quality assessment manually for even a small class means that feedback cannot be as instant as in one-to-one tutoring. When the class size grows, the amount of assessed work has to be cut down or rationalized in some other way. Automatic assessment (AA), however, allows instant feedback without the need to reduce the number of exercises.

Why do so many automatic assessment systems exist, and why are new ones created every year? Many systems share common features, and it would seem that systems already exist that fulfill most assessment needs.

One clear reason for the variety of tools has to do with their availability and lifespan. Tools are often created as part of a thesis or for a particular course. They are finished enough for studying a research question or for supporting the needs of one particular course, but are not suitable for distribution. It is rather common that the very first version of a tool is something the teacher put together quickly for his or her own purposes. These tools might get publicized if research was the original motivator, but as they never emerge as supported pieces of software, similar systems get implemented again and again. Correspondingly, there are far fewer systems that are widely adopted than there are papers about new tools.
We argue that presenting a big picture of the recently developed and currently available AA systems would help teachers find the tools they are looking for and help developers avoid reinventing the wheel. A literature survey is one way to achieve this. In this survey, our goal is to serve teachers who need to grade large classes. This is where the automatic grading of programming assignments can free up significant teacher time for work that cannot be automated [9].

Related research, with a focus on related surveys, is presented in Section 2. The exact research questions and the methodology used in this survey are described in Section 3. Results are introduced in Section 4. A selection of AA systems, also mentioned in Section 4, is presented in Appendix A. Conclusions, some recommendations based on the data, and our expectations for future trends in the automatic assessment of programming assignments are discussed in Section 5.