Predicting Item Response Theory Parameters Using Question Statements Texts

Wemerson Marinho, Esteban Walter Clua
wemerson_marinho@id.uff.br, esteban@ic.uff.br
Computing Institute - UFF, Niterói, Rio de Janeiro, Brazil

Luis Martí
Inria Chile Research Center, Santiago, Chile
lmarti@inria.cl

Karla Marinho
Tieta.ai, Niterói, Brazil

ABSTRACT
Recently, new Neural Language Models pre-trained on massive corpora of text have become available. These models encode statistical features of language in their parameters, creating better word-vector representations that allow training neural networks with smaller sample sets. In this context, we investigate the application of these models to predict Item Response Theory parameters of multiple-choice questions. More specifically, we apply our models to questions from the Brazilian National High School Exam (ENEM), using the text of their statements, and propose a novel optimization target for regression: the Item Characteristic Curve. The architecture employed could predict the difficulty parameter b of the ENEM 2020 and 2021 items with a mean absolute error of 70 points. Calculating the IRT score in each knowledge area of the exam for a sample of 100,000 students, we obtained a mean absolute error below 40 points for all knowledge areas. Considering only the top quartile, the exam's main target of interest, the average error was less than 30 points for all areas, with the majority below 15 points. Such performance allows predicting parameters of newly created questions, composing mock tests for student training, and analyzing student performance with excellent precision, dispensing with the need for a costly item-calibration pre-test step.

CCS CONCEPTS
• Computing methodologies → Information extraction; Neural networks; • Applied computing → Education.

KEYWORDS
Item response theory, Question difficulty prediction, Text regression, ENEM

ACM Reference Format:
Wemerson Marinho, Esteban Walter Clua, Luis Martí, and Karla Marinho. 2023. Predicting Item Response Theory Parameters Using Question Statements Texts. In LAK23: 13th International Learning Analytics and Knowledge Conference (LAK 2023), March 13–17, 2023, Arlington, TX, USA. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3576050.3576139

1 INTRODUCTION
Measuring how much a student knows about a subject is a fundamental task for monitoring their progress in mastering a skill, deciding whether they are prepared to assimilate more complex knowledge, and making fairer selections for university admission. However, an individual's cognitive ability is a latent trait, not directly measurable by its very nature.
The Item Response Theory (IRT) [1] is a family of mathematical models that attempt to explain the relationship between latent traits - unobservable attributes - and their manifestations - observed outcomes. IRT is based on the premise that the probability of answering a question/item correctly is a function of the ability of the individual and the characteristics of the item. The statistical procedure of finding the best parameters that characterize each question, given a set of students' responses, is called a pre-test. Pre-tests are laborious and expensive, as they must be applied to a sufficient number of individuals in each ability range. In 2022, for example, the government of Brazil considered repeating items in its annual National High School Exam (ENEM) because its item bank lacked calibrated items due to budget problems [28].

The hypothesis of this work is that, for a newly generated question, we can estimate these parameters using neural networks and data from past official exam applications, exploiting similarities and characteristics present in the wording of the statements. Usually, these exams are applied to thousands or millions of students with varied educational backgrounds, allowing consistent statistical analysis. Security against cheating is a great concern in these exams: inspectors are hired to supervise students, question sheets are printed in different orders, and communication between candidates is strictly forbidden. For this reason, we can assume that the data reflect the actual knowledge of the examinees, without further external interference. This is not the case when collecting data online or even through classroom questionnaires.

We extract text features from the statements using Neural Language Models (NLM), a recent advance in Natural Language Processing (NLP) in which models are pre-trained on massive corpora of text. They allow the creation of representations that transform words into vectors, making them ready to be analyzed by computational optimization algorithms such as neural networks. These approaches have become the state of the art in several NLP tasks, such as similarity comparison, named entity recognition, and question answering, to mention a few [7, 11, 25, 32, 34].

The main contribution of this paper is a new optimization target for applying NLP to predict item difficulty. Instead of a direct regression on the parameter values, we optimize the fit of the Item Characteristic Curve itself.
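To make the modeling concrete: ENEM items are calibrated under the three-parameter logistic (3PL) IRT model, whose Item Characteristic Curve (ICC) gives the probability of a correct answer as a function of the examinee's ability θ and the item's discrimination a, difficulty b, and guessing parameter c. The sketch below illustrates the ICC and one plausible way an ICC-based regression target could be phrased; the loss definition is our illustration under these assumptions, not the paper's exact formulation.

```python
import numpy as np

def icc_3pl(theta, a, b, c):
    """Item Characteristic Curve of the 3PL model:
    P(correct | theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

# A plausible ENEM-like item: moderate discrimination, difficulty
# slightly above average, guessing floor of 0.2 (five alternatives).
thetas = np.linspace(-3.0, 3.0, 7)
print(icc_3pl(thetas, a=1.2, b=0.5, c=0.2))

def icc_mse(pred, true, grid=np.linspace(-4.0, 4.0, 41)):
    """Hypothetical ICC-matching loss (our illustration; the paper's
    exact definition is not given in this excerpt): mean squared
    difference between the curves induced by predicted and calibrated
    parameter triples (a, b, c), evaluated over an ability grid."""
    return np.mean((icc_3pl(grid, *pred) - icc_3pl(grid, *true)) ** 2)

print(icc_mse((1.0, 0.6, 0.2), (1.2, 0.5, 0.2)))
```

Penalizing the distance between curves, rather than between raw parameter values, has the appeal of weighting errors by their actual effect on predicted response probabilities.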
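As for the feature-extraction step, the following sketch shows one common way to turn a question statement into a fixed-size vector with a pre-trained encoder via the Hugging Face transformers library; the specific checkpoint (a Portuguese BERT) and the mean pooling are assumptions for illustration, not details confirmed by this excerpt.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Assumption: a Portuguese BERT checkpoint; the paper's actual model
# choice is not specified in this excerpt.
name = "neuralmind/bert-base-portuguese-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)

statement = "Um corpo de massa 2 kg é lançado verticalmente..."
inputs = tokenizer(statement, return_tensors="pt", truncation=True)
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, 768)

# Mean-pool the token vectors into a single statement embedding.
embedding = hidden.mean(dim=1)  # (1, 768)
```

A small regression head on top of such embeddings can then be trained against the calibrated item parameters or, as proposed here, against the Item Characteristic Curve itself.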