Predicting Students’ Marks in Hellenic Open University
Sotiris B. Kotsiantis & Panayiotis E. Pintelas
Educational Software Development Laboratory
Department of Mathematics, University of Patras
{sotos, pintelas}@math.upatras.gr
Abstract
The ability to provide assistance for a student at the
appropriate level is invaluable in the learning process.
It not only aids the student’s learning process but
also prevents problems, such as student frustration
and floundering. Students’ key demographic
characteristics and their marks in a small number of
written assignments can constitute the training set for
a regression method in order to predict the student’s
performance. This work compares some of
the state-of-the-art regression algorithms in the
application domain of predicting students’ marks. A
number of experiments have been conducted with six
algorithms, which were trained using datasets
provided by the Hellenic Open University. Finally, a
prototype version of a software support tool for tutors
has been constructed that implements the M5rules
algorithm, which proved to be the most appropriate
among the tested algorithms.
1. Introduction
The application of machine learning techniques to
predicting students’ performance has proved helpful
for identifying poor performers. It can enable tutors
to take remedial measures at an early stage, even
from the very beginning of an academic year using
only students’ demographic data, in order to provide
additional help to the groups at risk [4]. The accuracy
of the diagnosis of students’ performance increases as
new curriculum data is entered during the academic year,
offering the tutors more effective results. It was
shown in [4] that the most accurate machine learning
algorithm for identifying predicted poor performers is
the Naïve Bayes Classifier. However, that work could
only predict whether a student passes a course module or not.
This paper uses existing regression techniques in
order to predict the students’ marks in a distance
learning system. It compares some of the state-of-the-art
regression algorithms to find out which algorithm is
most appropriate not only to predict students’
performance accurately but also to be used as an
educational support tool for tutors. For the purpose
of our study the ‘informatics’ course of the Hellenic
Open University (HOU) provided the data set.
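The difference between predicting a mark and predicting only pass or fail can be illustrated with a minimal sketch. It is not the method evaluated in this paper: it swaps in ordinary least squares with a single predictor, and the toy data, the 0-10 marking scale, the pass threshold of 5.0 and all function names are assumptions made purely for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x with one predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def predict(model, x):
    """Predicted final mark for a student whose assignment mean is x."""
    a, b = model
    return a + b * x


def mean_absolute_error(model, xs, ys):
    """Average absolute gap between predicted and actual marks."""
    return mean(abs(predict(model, x) - y) for x, y in zip(xs, ys))


# Made-up toy data (NOT from the HOU data set): mean mark in the first
# written assignments and the final mark, both on an assumed 0-10 scale.
assignment_means = [3.0, 5.0, 6.0, 8.0, 9.0]
final_marks = [2.5, 4.5, 6.5, 7.5, 9.0]

model = fit_line(assignment_means, final_marks)
predicted = predict(model, 7.0)   # a numeric mark, not just a class label
passes = predicted >= 5.0         # hypothetical pass threshold
mae = mean_absolute_error(model, assignment_means, final_marks)

print(f"predicted mark: {predicted:.2f}, passes: {passes}, MAE: {mae:.2f}")
```

A predicted mark can always be reduced to a pass or fail decision by thresholding, whereas a pass or fail classifier cannot recover the mark, which is why a regression formulation gives the tutor more information.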
Generally, the use of regression analysis to
classify data can be an extremely useful tool for
researchers and Open University administrators. A
plethora of data can be utilized simultaneously to
classify cases, and the resultant model can be evaluated
for usefulness relatively easily. The ability to develop a
predictive model based on the model produced through
the regression analysis procedure increases its
usefulness substantially. Open Universities can utilize
this procedure to target services and interventions
to the students who need them most, thereby
utilizing their resources more effectively.
The following section describes in brief the
Hellenic Open University (HOU) distance learning
methodology and the data of our study. Some very
basic definitions about regression techniques are given
in section 3. Section 4 presents the experimental results
for all the tested algorithms and
compares them. Section 5 presents the
produced educational decision support tool. Finally,
section 6 discusses the conclusions and some future
research directions.
2. Hellenic Open University and Data
Description
The mission of the Hellenic Open University
(HOU) is to offer university level education using the
distance learning methodology. The basic educational
unit of the HOU is the course module (referred to simply
as module from now on), which covers a specific subject
at graduate and postgraduate level. For the purpose of
our study the ‘informatics’ course provided the
training set. A total of 354 instances (student’s
Proceedings of the Fifth IEEE International Conference on Advanced Learning Technologies (ICALT’05)
0-7695-2338-2/05 $20.00 © 2005 IEEE