Comparing Feature Engineering and Deep Learning Methods for Automated Essay Scoring of the Brazilian National High School Examination

Aluizio Haendchen Filho 1a, Fernando Concatto 1b, Hércules Antonio do Prado 2c and Edilson Ferneda 2d

1 Laboratory of Technological Innovation in Education (LITE), University of Vale do Itajaí (UNIVALI), Itajaí, Brazil
2 Catholic University of Brasilia (UCB), QS 07, Lote 01, Taguatinga, Brasília, DF, Brazil

Keywords: Automated Essay Scoring, Machine Learning, Deep Learning.

Abstract: The National High School Exam (ENEM) in Brazil is a test applied annually to assess students before they enter higher education. On average, over 7.5 million students participate in this test. Likewise, large educational groups need to conduct practice tests for students preparing for the ENEM. Correcting each essay requires at least two evaluators, which makes the process time-consuming and very expensive. One alternative for substantially reducing the cost and speeding up the correction of essays is to replace one human evaluator with an automated process. This paper presents a computational approach to essay correction capable of replacing one human evaluator. Techniques based on feature engineering and deep learning were compared, aiming to determine which achieves the best accuracy. It was found that it is possible to reach accuracy indexes close to 100% in the most frequent classes, which comprise nearly 80% of the essay set.

1 INTRODUCTION

The Brazilian National High School Examination (ENEM) is an evaluation held annually to verify participants' mastery of skills acquired during the high school years, including writing abilities.
During the essay evaluation, two reviewers assign scores ranging from 0 to 2, in intervals of 0.5, for each of the five competencies:

[C1] Formal writing in the Brazilian Portuguese language;
[C2] Understanding the essay proposal within the structural limits of the argumentative essay;
[C3] Selecting, relating, organizing, and interpreting information, facts, and opinions in defence of a point of view;
[C4] Demonstrating knowledge of the linguistic mechanisms necessary to construct the argumentation;
[C5] Proposing an intervention for the problem addressed, based on consistent arguments.

a https://orcid.org/0000-0002-7998-8474
b https://orcid.org/0000-0003-4361-7134
c https://orcid.org/0000-0002-8375-0899
d https://orcid.org/0000-0003-4164-5828

The score for each competence varies from 0 to 2, summing to a maximum of 10 for the essay. A grade of 0 (zero) for a competence means that the author does not demonstrate mastery of the competence in question; in contrast, a score of 2 indicates that the author demonstrates mastery of it. It is important to mention that two reviewers are considered in agreement when the difference between their grades is less than or equal to 20%. Arguably, the evaluation of essays by at least two reviewers makes the process time-consuming and expensive. According to a survey conducted by the Brazilian G1 portal, 6.1 million essays were evaluated in 2019 at a cost of US$ 4.96 per essay, reaching approximately US$ 30.27 million. This value includes the structure, logistics, and personnel needed to evaluate the national exam. On the other hand, large educational groups need to conduct training tests with students for the ENEM. At least two evaluators are needed for each essay, which makes the process slow and expensive.
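The scoring rule described above can be sketched in code. This is a minimal illustration, not part of the paper: the function names are hypothetical, and it assumes that the 20% agreement threshold is taken over the full 10-point scale (i.e., the two final grades may differ by at most 2 points).

```python
def final_grade(competency_scores):
    """Sum the five competency scores (each 0-2 in steps of 0.5) into a 0-10 grade."""
    assert len(competency_scores) == 5
    return sum(competency_scores)

def reviewers_agree(grade_a, grade_b, scale=10.0, tolerance=0.20):
    """True when the two final grades differ by at most 20% of the scale (2 points)."""
    return abs(grade_a - grade_b) <= tolerance * scale

# Two hypothetical reviews of the same essay:
a = final_grade([2.0, 1.5, 1.5, 2.0, 1.0])  # 8.0
b = final_grade([1.5, 1.5, 1.0, 2.0, 1.0])  # 7.0
print(reviewers_agree(a, b))  # True: |8.0 - 7.0| = 1.0 <= 2.0
```

When the two reviewers disagree beyond the threshold, the official procedure calls in an additional evaluator; the sketch only captures the agreement test itself.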
DOI: 10.5220/0010377505750583
In Proceedings of the 23rd International Conference on Enterprise Information Systems (ICEIS 2021) - Volume 1, pages 575-583
ISBN: 978-989-758-509-8
Copyright © 2021 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved