Using BERT and XLNET for the Automatic Short Answer Grading Task

Hadi Abdi Ghavidel, Amal Zouaq and Michel C. Desmarais
Department of Computer Engineering and Software Engineering, Polytechnique Montréal, Montreal, Canada

Keywords: Automatic Short Answer Grading, SciEntBank, BERT, XLNET.

Abstract: Over the last decade, there has been a considerable amount of research in automatic short answer grading (ASAG). The majority of previous experiments were based on a feature engineering approach and used manually-engineered statistical, lexical, grammatical and semantic features for ASAG. In this study, we aim for an approach that is free from manually-engineered features and propose a deep learning architecture based on the newly-introduced BERT (Bidirectional Encoder Representations from Transformers) and XLNET (Extra Long Network) classifiers. We report the results achieved on one of the most popular datasets for ASAG, SciEntBank. Compared to past works on the SemEval-2013 2-way, 3-way and 5-way tasks, we obtained better or competitive performance with BERT Base (cased and uncased) and XLNET Base (cased) using a reference-based approach (considering student and model answers) and without any type of hand-crafted features.

1 INTRODUCTION

Automatic grading of natural language answers is a highly desired goal in education. Advances in machine learning bring this goal closer to reality. Large classes and the success of Massive Open Online Courses (MOOCs) in education make this goal even more attractive. Open-ended answers provide teachers with a more accurate and detailed understanding of how a student comprehends domain-specific knowledge (Badger and Thomas, 1992). This contrasts with traditional question types such as multiple-choice or fill-in-the-gap items, in which the student's understanding is restricted to the choices that are presented and thus not examined deeply (Riordan et al., 2017).
For automatic grading, natural language answers can be divided into essays and short answers. According to Burrows et al. (2015), short answers have the following characteristics: the answer cannot be guessed from the words in the question and requires external knowledge; the answer should be given in natural language; the length of the answer should be about one phrase to one paragraph; the content of the answer is domain-related; and the answer should be closed-ended.

In both short answers and essays, each student answer is evaluated on a nominal, ordinal or ratio scale (Roy et al., 2018). On the nominal scale, grades are labels such as correct, incorrect, etc. Ordinal grades are letters such as A+, A-, etc., and ratio grades are numbers such as 1, 1.5, etc.

Besides grades, the student answer is usually associated with the question and one or more model (also called reference) answers. Sakaguchi et al. (2015) defined two general types of grading approaches. When automatic grading is based only on the student answer and its label, this is called a response-based approach. Otherwise, the grading is done using a reference-based approach, in which the whole context is considered (the model answer, the question, or both, along with the student answer and the label). In this case, the system compares the student's answer with the model answer using several types of similarity metrics. Hybrid approaches take both response-based and reference-based techniques into account simultaneously.

In a majority of natural language processing (NLP) tasks such as ASAG, language models (LMs) have proven to be successful. In essence, these models help determine the probability of a sequence of words and can predict words given the previous words within a sequence (Goldberg, 2017). Traditional language models such as n-gram language models use count methods.
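To make the count-based idea concrete, the following is a minimal sketch of a bigram language model estimated by maximum likelihood, with no smoothing. It is an illustration only and not part of the system described in this paper; the function names and the three toy sentences are assumptions made up for the example.

```python
from collections import Counter


def train_bigram_counts(sentences):
    """Count contexts and bigrams over tokenized sentences (hypothetical helper)."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]  # sentence boundary markers
        unigrams.update(padded[:-1])          # every token that has a successor
        bigrams.update(zip(padded[:-1], padded[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Maximum-likelihood estimate P(word | prev) = count(prev, word) / count(prev)."""
    if unigrams[prev] == 0:
        return 0.0  # unseen context; a real model would apply smoothing here
    return bigrams[(prev, word)] / unigrams[prev]


def sequence_prob(tokens, unigrams, bigrams):
    """Probability of a whole sequence as a product of bigram probabilities."""
    padded = ["<s>"] + tokens + ["</s>"]
    prob = 1.0
    for prev, word in zip(padded[:-1], padded[1:]):
        prob *= bigram_prob(prev, word, unigrams, bigrams)
    return prob


# Toy corpus, invented for illustration only.
corpus = [
    "the bulb is on".split(),
    "the bulb is off".split(),
    "the switch is on".split(),
]
unigrams, bigrams = train_bigram_counts(corpus)
p_on = bigram_prob("is", "on", unigrams, bigrams)                     # 2/3
p_seq = sequence_prob("the bulb is on".split(), unigrams, bigrams)    # 4/9
```

Because the estimates are raw relative frequencies, any sequence containing an unseen bigram receives probability zero, which is one well-known limitation of purely count-based models.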
Ghavidel, H., Zouaq, A. and Desmarais, M. Using BERT and XLNET for the Automatic Short Answer Grading Task. DOI: 10.5220/0009422400580067. In Proceedings of the 12th International Conference on Computer Supported Education (CSEDU 2020) - Volume 1, pages 58-67. ISBN: 978-989-758-417-6. Copyright © 2020 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.

Vector-space models based on counting n-grams have often been used for the ASAG task. For example, Mantecon et al. (2018) compared bag of n-grams