International Journal of Engineering Research and Technology. ISSN 0974-3154, Volume 12, Number 6 (2019), pp. 917-928 © International Research Publication House. http://www.irphouse.com

Predictive Model for the Academic Performance of the Engineering Students Using CHAID and C5.0 Algorithm

Editha Rivera Jorda, Centro Escolar University, Technological University of the Philippines.
Avelina R. Raqueno, Centro Escolar University

Abstract

Many engineering students at the Technological University of the Philippines Manila (TUPM) either dropped out of, or were dismissed from, the engineering program in which they enrolled. These dismissals and dropouts wasted the government's scarce resources and deprived other students of the opportunity to enroll. TUPM needs to increase its retention rate to reduce the number of students in the College of Engineering (COE) who drop out, are placed on probation, or are dismissed. Predictive modeling could be one possible approach. It is used to detect student behavior and to predict or understand students' educational outcomes, and it is one of the currently popular methods in Educational Data Mining (EDM). EDM is a field of scientific inquiry concerned with developing methods for exploring the unique kinds of data that come from educational settings, and with using those methods to better understand students and the settings in which they learn. Accordingly, the study aimed to develop and validate a predictive model that will serve as a framework for predicting the academic performance of engineering students, with the goal of improving the retention rate at TUPM. The research design was descriptive-quantitative. The engineering students' final grades from school years 2008 to 2015 were gathered from the Electronics Registration System of TUPM. The dataset was divided into two sets: a training set and a testing set.
The training set was used to build and validate two decision tree algorithms, C5.0 and Chi-squared Automatic Interaction Detection (CHAID), in IBM SPSS Modeler Version 18.0, based on their overall accuracy and ten-fold cross-validation. A t-test was used to determine whether the two differed significantly. The testing set was then used to evaluate C5.0 and CHAID on overall accuracy, sensitivity and specificity. The overall accuracy of C5.0 was slightly higher than that of CHAID, and both models were valid. However, based on the evaluation of the two algorithms, CHAID was selected as the predictive model. Hence, CHAID was judged the best early warning system for TUPM to detect students who are academically at risk. The study therefore concludes that the CHAID algorithm is best suited as the predictive model for identifying students who are likely to be retained in the COE program and those who are academically at risk.

Keywords: C5.0, CHAID, Decision Tree, Educational Data Mining, Prediction Model

I. INTRODUCTION

Many students dropped out of or were dismissed from the engineering program at the Technological University of the Philippines-Manila (TUPM); hence, the university is urged to use its resources efficiently to achieve their intended purpose [9]. One possibility is data mining, which discovers valid, useful and understandable patterns in data (here, data on students' academic performance) by applying pattern recognition (PR) and machine learning principles. When applied to educational data sets, this is called Educational Data Mining (EDM). One popular EDM method is prediction. A prediction model infers the value of a construct in contexts where it is not desirable to obtain a label for that construct directly [6]. Classification, one of the three types of prediction, predicts a variable with binary or nominal categories. Classification methods include decision trees, regression, neural networks, support vector machines and Bayesian networks.
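Overall accuracy, sensitivity and specificity are all derived from the confusion matrix of a binary classifier: accuracy = (TP + TN) / N, sensitivity = TP / (TP + FN), and specificity = TN / (TN + FP). The minimal Python sketch below shows that arithmetic. It assumes "at-risk" is treated as the positive class; the function names, label strings and the eight toy predictions are hypothetical and are not data from the study.

```python
def confusion_counts(actual, predicted, positive):
    """Return (TP, FN, TN, FP), treating `positive` as the positive class."""
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def evaluate(actual, predicted, positive):
    """Overall accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    tp, fn, tn, fp = confusion_counts(actual, predicted, positive)
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Hypothetical labels for eight students; "at_risk" is the assumed positive class.
actual = ["at_risk"] * 3 + ["retained"] * 5
predicted = ["at_risk", "at_risk", "retained",
             "retained", "retained", "retained", "at_risk", "retained"]
print(evaluate(actual, predicted, positive="at_risk"))
# accuracy 0.75, sensitivity 0.666..., specificity 0.8
```

Here TP = 2, FN = 1, TN = 4 and FP = 1, so a model can have respectable overall accuracy while still missing one in three at-risk students, which is why sensitivity and specificity are reported alongside accuracy.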
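Ten-fold cross-validation splits the training set into ten folds, trains on nine of them and tests on the remaining one, rotating until every fold has served once as the test fold. The sketch below generates such folds and computes a paired t statistic on the per-fold accuracies of two models. The paper does not state which form of t-test was applied; a paired t-test over fold accuracies is an assumption here, and the helper names are hypothetical. The constant 2.262 is the two-sided 5% critical value of Student's t with 9 degrees of freedom.

```python
import math
import random

# Two-sided 5% critical value of Student's t with 9 degrees of freedom
# (10 folds give 10 paired differences, so df = 9).
T_CRIT_DF9 = 2.262


def k_fold_splits(n, k=10, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for two models scored on the same folds."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    if var == 0:
        raise ValueError("all paired differences are equal; t is undefined")
    return mean / math.sqrt(var / n)


# Hypothetical usage: accuracy_a[i] and accuracy_b[i] would be the accuracies of
# two models trained on `train` and scored on `test` for fold i; then
#   significant = abs(paired_t_statistic(accuracy_a, accuracy_b)) > T_CRIT_DF9
```

Pairing the scores fold by fold means both models are compared on exactly the same records, so the test looks at the differences between them rather than at two independent samples.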
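CHAID grows its tree by testing, at each node, how strongly each candidate predictor is associated with the target using a chi-squared test, and splitting on the predictor with the most significant association. The Python sketch below shows only that core step for categorical data. It omits CHAID's merging of similar predictor categories and its Bonferroni adjustment, and it does not reproduce how IBM SPSS Modeler implements the algorithm. The p-value uses the regularized incomplete gamma function (series and continued-fraction forms); the 0.05 threshold, the helper names and the field names are assumptions for illustration.

```python
import math
from collections import Counter


def _upper_gamma_regularized(a, x):
    """Q(a, x) = Gamma(a, x) / Gamma(a), via a series or a continued fraction."""
    log_prefix = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:
        # Series for the lower function P(a, x); then Q = 1 - P.
        term = total = 1.0 / a
        ap = a
        for _ in range(1000):
            ap += 1
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-14:
                break
        return 1.0 - total * math.exp(log_prefix)
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1 - a
    c = 1 / tiny
    d = 1 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-14:
            break
    return math.exp(log_prefix) * h


def chi2_sf(stat, df):
    """Upper-tail p-value of the chi-squared distribution with df degrees of freedom."""
    if stat <= 0:
        return 1.0
    return _upper_gamma_regularized(df / 2, stat / 2)


def chi_square_test(records, predictor, target):
    """Pearson chi-squared statistic and degrees of freedom for predictor vs target."""
    n = len(records)
    cells = Counter((r[predictor], r[target]) for r in records)
    rows = Counter(r[predictor] for r in records)
    cols = Counter(r[target] for r in records)
    stat = 0.0
    for pv, row_n in rows.items():
        for tv, col_n in cols.items():
            expected = row_n * col_n / n
            stat += (cells[(pv, tv)] - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df


def best_split(records, predictors, target, alpha=0.05):
    """Return (predictor, p_value) with the smallest p-value, or None if none is below alpha."""
    best = None
    for p in predictors:
        stat, df = chi_square_test(records, p, target)
        p_value = chi2_sf(stat, df) if df > 0 else 1.0
        if best is None or p_value < best[1]:
            best = (p, p_value)
    return best if best is not None and best[1] < alpha else None


# Hypothetical toy records: math_grade separates the outcome, gender does not.
records = [
    {"math_grade": g, "gender": s, "outcome": "at_risk" if g == "low" else "retained"}
    for g in ("low", "high") for s in ("F", "M", "F", "M")
]
print(best_split(records, ["gender", "math_grade"], "outcome"))
# selects "math_grade" (chi-squared = 8 with 1 df); gender has chi-squared = 0
```

Comparing p-values rather than raw chi-squared statistics matters because predictors with more categories have more degrees of freedom and therefore larger statistics by chance.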
A classification model based on the decision tree technique was applied by [5]; it provided a guideline that helps students and school management choose the right track of study for a student. On the other hand, [18] compared Bayesian network classifiers for predicting students' academic performance, to help identify dropouts and students who need special attention and to allow teachers to provide appropriate counselling and advising. Likewise, [8] investigated the application of Bayes networks to predict causal relationships in a dataset capturing several demographic and academic features of a group of students from a four-year university. Each technique employs a learning algorithm to identify the model that best fits the relationship between the attribute set and the class label of the input data. Thus, a key objective of the learning algorithm is to build models that accurately predict the class labels of previously unknown records, that is, models with good generalization capability. [3] proposed a framework to predict students' academic performance using decision tree, Naïve Bayes and rule-based classification techniques; the experiment revealed that the rule-based technique was the best model, with an accuracy of 71.3%. Another paper [14] investigated whether the available data contained patterns useful for predicting students' performance, using decision trees (C4.5, J48), Bayesian classifiers (Naïve Bayes and Bayes Net), a nearest neighbour algorithm and two rule learners (OneR and JRip). The decision tree classifier (J48) performed best, with the highest accuracy, followed by the rule learner (JRip). However, all tested classifiers had an overall accuracy below 70%, which means that the error rate was high and the predictions were not reliable.
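The decision tree learners named above choose split attributes with an impurity measure. C4.5 (implemented in Weka as J48), which C5.0 builds on, uses the gain ratio: the information gain of a split divided by its split information (the entropy of the partition sizes), which penalizes attributes with many values. The minimal sketch below covers categorical attributes only; it leaves out numeric thresholds, missing values, pruning and C5.0's boosting, and the function and field names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(records, attribute, target):
    """Information gain of splitting on `attribute`, divided by its split information."""
    n = len(records)
    groups = defaultdict(list)
    for r in records:
        groups[r[attribute]].append(r[target])
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy([r[target] for r in records]) - remainder
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    # A single-valued attribute has zero split information and cannot split the node.
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records: math_grade separates the outcome, gender does not.
records = [
    {"math_grade": g, "gender": s, "outcome": "at_risk" if g == "low" else "retained"}
    for g in ("low", "high") for s in ("F", "M", "F", "M")
]
for attr in ("gender", "math_grade"):
    print(attr, gain_ratio(records, attr, "outcome"))
# gender 0.0
# math_grade 1.0
```

Where CHAID asks whether an association is statistically significant, this criterion asks how much a split reduces class uncertainty, which is one reason the two algorithms can grow different trees from the same data.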
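OneR, one of the two rule learners mentioned above, builds one rule per attribute (each attribute value predicts its most frequent class) and keeps the attribute whose rule makes the fewest errors on the training data. The sketch below handles nominal attributes only (the original OneR also discretizes numeric ones) and reports the training error rate, which is simply 1 minus the accuracy. The function and field names are hypothetical.

```python
from collections import Counter, defaultdict


def one_r(records, attributes, target):
    """Return (attribute, rule, training_error_rate) for the best single-attribute rule."""
    best = None
    for attr in attributes:
        # Class counts for every value of this attribute.
        by_value = defaultdict(Counter)
        for r in records:
            by_value[r[attr]][r[target]] += 1
        # Each value predicts its most frequent class.
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in by_value.items())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    attr, rule, errors = best
    return attr, rule, errors / len(records)


# Hypothetical toy records: math_grade separates the outcome, gender does not.
records = [
    {"math_grade": g, "gender": s, "outcome": "at_risk" if g == "low" else "retained"}
    for g in ("low", "high") for s in ("F", "M", "F", "M")
]
print(one_r(records, ["gender", "math_grade"], "outcome"))
# ('math_grade', {'low': 'at_risk', 'high': 'retained'}, 0.0)
```

Because the error rate here is measured on the same records used to build the rule, it says nothing about generalization; estimating that requires held-out data, as with the testing set described in the abstract.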
The study used a prediction model because it aimed to develop and validate a predictive model that will serve as a framework for predicting the academic performance of engineering students towards an improved retention rate at