International Journal of Engineering Research and Technology. ISSN 0974-3154, Volume 12, Number 6 (2019), pp. 917-928
© International Research Publication House. http://www.irphouse.com
Predictive Model for the Academic Performance of the Engineering
Students Using CHAID and C 5.0 Algorithm
Editha Rivera Jorda
Centro Escolar University,
Technological University of the Philippines.
Avelina R. Raqueno
Centro Escolar University
Abstract
Many engineering students at the Technological University of the
Philippines Manila (TUPM) either dropped out or were dismissed
from the engineering program they enrolled in. These dismissals
and dropouts wasted scarce government resources and deprived
other students of the opportunity. TUPM needs to increase its
retention rate to reduce the number of students who drop out, are
placed on probation, or are dismissed from the College of
Engineering (COE). Predictive modeling could be one way to do so.
It is used to detect student behavior and to predict or understand
students’ educational outcomes. It is one of the currently popular
methods in Educational Data Mining (EDM). EDM is a field of
scientific inquiry concerned with developing methods to explore
the unique kinds of data found in educational settings, and with
using these methods to better understand students and their
learning environment. As such, the study aimed to develop and validate
a predictive model that will serve as a framework in predicting
the academic performance of the engineering students towards
an improved retention rate at TUPM. The research design of
the paper was descriptive-quantitative. The engineering
students’ final grades from school years 2008 to 2015
were gathered from the Electronics Registration System of
TUPM. The dataset was divided into two sets: a training set and
a testing set. The training set was used to build and validate two
decision tree algorithms, namely C5.0 and Chi-squared
Automatic Interaction Detection (CHAID), using IBM SPSS
Modeler Version 18.0, based on their overall accuracy and
ten-fold cross-validation. A t-test was used to determine whether
the two differed significantly. Furthermore, the testing set was used to
evaluate C5.0 and CHAID on their overall accuracy, sensitivity
and specificity. On overall accuracy, C5.0 scored
slightly higher than CHAID, and
both models were valid. However, based on the evaluation of the
two algorithms, CHAID was chosen as the predictive model. Hence, CHAID
was the best early warning system for TUPM to detect the
students who are academically at risk. As such, the study
concludes that the CHAID modeling algorithm is best suited as the
predictive model for identifying students who are likely to be
retained in the COE program and those who are academically
at risk.
Keywords: C5.0, CHAID, Decision Tree, Educational Data
Mining, Prediction Model
I. INTRODUCTION
Many students dropped out or were dismissed from the
engineering program at the Technological University of the
Philippines-Manila (TUPM); hence, the university is urged [9] to use
its resources efficiently to achieve their intended purpose. One
possibility is the use of Data Mining, which determines valid,
useful and understandable patterns in data on the academic
performance of students by applying pattern recognition
(PR) and machine learning principles to different data sets. In
educational settings this is called Educational Data Mining (EDM).
One popular method of EDM is Prediction. The Prediction Model
determines the output value in contexts where it is not desirable
to directly obtain a label for that construct [6].
One of the three types of Prediction is Classification. It predicts
a variable that takes binary or nominal categories. Some
classification methods include Decision Tree, Regression,
Neural Networks, Support Vector Machine and Bayesian
Network. A classification model based on the decision tree
technique was applied by [5]. This technique provided a
guideline that helps students and school management choose
the right track of study for a student. On the other hand, [18]
compared Bayesian network classifiers for predicting
students’ academic performance, to help identify dropouts
and students who need special attention and to allow the
teacher to provide appropriate counselling / advising.
Likewise, [8] investigated the application of Bayes Network to
predict causal relationships in a dataset that captures several
demographic and academic features of a group of students from
a four-year university.
Each technique employs a learning algorithm to identify the
model that best fits the relationship between the attribute set
and class label of the input data. Thus, a key objective of the
learning algorithm is to build models that accurately predict the
class labels of previously unknown records, that is, models with
good generalization capability. A framework proposed by [3]
predicts students’ academic performance using the Decision
Tree, Naïve Bayes, and Rule Based classification techniques.
The experiment revealed that the Rule Based technique is the
best model with a high accuracy value of 71.3%. Another paper
[14] tried to find out if there were patterns in the available data
that could be useful to predict students’ performance using
decision trees (C4.5, J48), Bayesian classifiers (Naïve Bayes
and Bayes Net), a Nearest Neighbour algorithm and two rule
learners (OneR and JRip). The results revealed that the decision
tree classifier (J48) performed best with a high accuracy,
followed by the rule learner (JRip). However, all tested
classifiers had an overall accuracy below 70%, which means
that the error rate was high and the predictions
were not reliable.
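
As an illustration of the measures discussed here, the following is a minimal sketch of how overall accuracy, error rate, sensitivity and specificity can be computed from actual and predicted class labels. It is not the procedure used in the study, which relied on IBM SPSS Modeler. The function names, the label strings "at-risk" and "retained", and the ten example records are hypothetical and serve only to show the arithmetic.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive="at-risk"):
    """Count true/false positives and negatives for a binary label."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a == positive:
            counts["tp" if p == positive else "fn"] += 1
        else:
            counts["fp" if p == positive else "tn"] += 1
    return counts


def evaluate(actual, predicted, positive="at-risk"):
    """Return accuracy, error rate, sensitivity and specificity.

    Assumes both classes occur in `actual`; otherwise one of the
    ratios below would divide by zero.
    """
    c = confusion_counts(actual, predicted, positive)
    total = c["tp"] + c["tn"] + c["fp"] + c["fn"]
    accuracy = (c["tp"] + c["tn"]) / total
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        # Share of truly at-risk students the model flags.
        "sensitivity": c["tp"] / (c["tp"] + c["fn"]),
        # Share of truly retained students the model leaves unflagged.
        "specificity": c["tn"] / (c["tn"] + c["fp"]),
    }


# Hypothetical labels for ten students (not data from the study).
actual = ["at-risk", "at-risk", "retained", "retained", "retained",
          "at-risk", "retained", "retained", "retained", "retained"]
predicted = ["at-risk", "at-risk", "retained", "at-risk", "retained",
             "retained", "retained", "retained", "retained", "retained"]

metrics = evaluate(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Since the error rate is simply one minus the overall accuracy, an accuracy below 70% corresponds to an error rate above 30%. Sensitivity and specificity separate that error into missed at-risk students and wrongly flagged retained students, which matters when a model is meant to serve as an early warning system.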
The Prediction Model was used in the study, because it aimed
to develop and validate a predictive model that will serve as a
framework in predicting the academic performance of the
engineering students towards an improved retention rate at