Predicting Item Response Theory Parameters Using Question
Statements Texts
Wemerson Marinho
Esteban Walter Clua
wemerson_marinho@id.uff.br
esteban@ic.uff.br
Computing Institute - UFF
Niterói, Rio de Janeiro, Brazil
Luis Martí
Inria Chile Research Center
Santiago, Chile
lmarti@inria.cl
Karla Marinho
Tieta.ai
Niterói, Brazil
ABSTRACT
Recently, new Neural Language Models pre-trained on massive text corpora have become available. These models encode statistical features of languages in their parameters, creating better word vector representations that allow neural networks to be trained with smaller sample sets. In this context, we investigate the application of these models to predict Item Response Theory parameters of multiple-choice questions. More specifically, we apply our models to Brazilian National High School Exam (ENEM) questions using the text of their statements and propose a novel optimization target for regression: the Item Characteristic Curve. The architecture employed could predict the difficulty parameter b of the ENEM 2020 and 2021 items with a mean absolute error of 70 points. Calculating the IRT score in each knowledge area of the exam for a sample of 100,000 students, we obtained a mean absolute error below 40 points for all knowledge areas. Considering only the top quartile, the exam's main target of interest, the average error was less than 30 points for all areas, with the majority lower than 15 points. Such performance allows predicting parameters of newly created questions, composing mock tests for student training, and analyzing their performance with excellent precision, dispensing with the need for a costly item calibration pre-test step.
CCS CONCEPTS
· Computing methodologies → Information extraction; Neural networks; · Applied computing → Education.
KEYWORDS
Item response theory, Question difficulty prediction, Text regression, ENEM
ACM Reference Format:
Wemerson Marinho, Esteban Walter Clua, Luis Martí, and Karla Marinho. 2023. Predicting Item Response Theory Parameters Using Question Statements Texts. In LAK23: 13th International Learning Analytics and Knowledge Conference (LAK 2023), March 13–17, 2023, Arlington, TX, USA. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3576050.3576139
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
LAK 2023, March 13–17, 2023, Arlington, TX, USA
© 2023 Association for Computing Machinery.
ACM ISBN 978-1-4503-9865-7/23/03. . . $15.00
https://doi.org/10.1145/3576050.3576139
1 INTRODUCTION
Measuring how much a student knows about a subject is a fundamental task for monitoring their progress in mastering a skill, deciding whether they are prepared to assimilate more complex knowledge, and making fairer selections for university admission. However, an individual's cognitive ability is a latent trait, not directly measurable by its very nature.
Item Response Theory (IRT) [1] is a family of mathematical models that attempt to explain the relationship between latent traits (unobservable attributes) and their manifestations (observed outcomes). IRT is based on the premise that the probability of getting a question/item right is a function of the ability of the individual and the characteristics of the item.
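The specific IRT model is not defined at this point in the text; ENEM scoring is commonly described with the three-parameter logistic (3PL) model, which makes this premise concrete. A minimal sketch (parameter values here are illustrative, not taken from the exam):

```python
import math

def icc_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability; a: item discrimination;
    b: item difficulty; c: pseudo-guessing (lower asymptote).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# An examinee whose ability equals the item's difficulty (theta == b)
# answers correctly with probability halfway between c and 1.
p = icc_3pl(theta=0.0, a=1.2, b=0.0, c=0.2)
print(round(p, 2))  # 0.6
```

Plotting this probability as a function of theta yields the Item Characteristic Curve, the object the regression target proposed later in the paper is built around.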
The statistical procedure of finding the best values of the parameters that characterize each question, given a set of students' responses, is called a pre-test. Pre-tests are laborious and expensive, as they must be applied to a sufficient number of individuals in each ability range. In 2022, for example, the government of Brazil considered repeating items in its annual National High School Exam (ENEM) because its item bank was missing calibrated items due to budget problems [28].
The hypothesis of this work is that, for a newly generated question, we can estimate these parameters using neural networks and past official exam applications, exploiting similarities and characteristics present in the wording of the statements.
Usually, these exams are applied to thousands or millions of students with varied educational backgrounds, allowing consistent statistical analysis. Security against cheating is a great concern in these exams: inspectors are hired to supervise students, question sheets come in different orders, and communication between candidates is strictly forbidden. For this reason, we can trust that the data reflects the knowledge of its applicants, without further external interference. This is not the case when collecting data online or even in classroom questionnaires.
We extract text features from statements using Neural Language Models (NLM), a recent advance in Natural Language Processing (NLP) in which models are pre-trained on massive text corpora. They allow the creation of representations that transform words into vectors, making them ready to be analyzed by computational optimization algorithms such as Neural Networks. These approaches have become the state of the art in several NLP tasks, like Similarity Comparison, Named Entity Recognition, and Question Answering, to mention a few [7, 11, 25, 32, 34].
The main contribution of this paper is a new optimization target for applying NLP to predict the difficulty. Instead of a direct regression