International Journal of Electrical and Computer Engineering (IJECE)
Vol. 14, No. 6, December 2024, pp. 7126~7136
ISSN: 2088-8708, DOI: 10.11591/ijece.v14i6.pp7126-7136
Journal homepage: http://ijece.iaescore.com

Development of machine learning algorithms in student performance classification based on online learning activities

Muhammad Aqif Hadi Alias, Mohd Azri Abdul Aziz, Najidah Hambali, Mohd Nasir Taib
School of Electrical Engineering, College of Engineering, Universiti Teknologi MARA (UiTM), Selangor, Malaysia

Article Info
Article history: Received Jun 6, 2024; Revised Jul 19, 2024; Accepted Aug 6, 2024

ABSTRACT
The field of educational data mining has gained significant traction for its pivotal role in assessing students' academic achievements. However, ensuring that an algorithm is compatible with the selected dataset requires a comprehensive analysis of the candidate algorithms. This study developed machine learning algorithms that use students' online learning activities to classify their academic performance. In the data cleaning stage, we employed VarianceThreshold to discard features whose values are all zeros. Feature selection and oversampling were integrated into the data preprocessing, using information gain for feature selection and the synthetic minority oversampling technique (SMOTE) to address class imbalance. In the classification phase, three supervised machine learning algorithms, k-nearest neighbors (KNN), multi-layer perceptron (MLP), and logistic regression (LR), were implemented with 3-fold cross-validation to enhance robustness. Classifier performance was refined through hyperparameter tuning with GridSearchCV. Accuracy, precision, recall, and F1-score were measured for each classifier.
Notably, both MLP and LR achieved scores of 100% across all metrics, while KNN showed a noticeable performance improvement after hyperparameter tuning.

Keywords:
Classification algorithms
Feature selection
K-nearest neighbors
Logistic regression
Multi-layer perceptron
Student performance
Synthetic minority oversampling technique

This is an open access article under the CC BY-SA license.

Corresponding Author:
Mohd Azri Abdul Aziz
School of Electrical Engineering, College of Engineering, Universiti Teknologi MARA (UiTM)
Selangor, Malaysia
Email: azriaziz@uitm.edu.my

1. INTRODUCTION
The performance of students in educational institutions has garnered increasing attention, and a substantial number of institutions now recognize it as a pivotal determinant of both the overall quality of the institution and the educational outcomes of its students [1]–[3]. Identifying at-risk students early in a course makes it possible to implement interventions and initiatives that improve their academic performance [4]–[10]. Consequently, in the pursuit of a deeper understanding of the learning process and the environmental factors influencing it, the field of educational data mining has gained notable momentum. This discipline plays a critical role in the classification of students' academic achievements [11], [12]. The application of artificial intelligence in education, particularly machine learning, has increased, and the technology is expected to provide effective approaches for enhancing education in general in the near future [13]. Intelligent m-learning systems have recently gained traction as a way of offering more effective and flexible education that is tailored to each student's learning ability [14].
Early attempts to enable such systems, which aimed to create tools that support students and learning in conventional or online contexts, used machine learning techniques to predict student achievement in terms of grades attained [15].
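
The abstract names two preprocessing steps: discarding all-zero features and ranking features by information gain. The following is a minimal sketch of both ideas in plain Python, not the authors' implementation. The study reports using VarianceThreshold for the first step, and its feature-selection code is not shown here. The column names, the toy data, and the helper functions `entropy`, `information_gain`, and `drop_all_zero` are hypothetical, and the sketch assumes discrete feature values.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus their weighted entropy within each feature value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def drop_all_zero(columns):
    """Keep only the columns that contain at least one non-zero value."""
    return {name: vals for name, vals in columns.items() if any(v != 0 for v in vals)}


# Hypothetical toy data: activity levels bucketed as 0 (low) or 1 (high).
columns = {
    "forum_posts": [0, 0, 1, 1],
    "quiz_attempts": [0, 1, 0, 1],
    "unused_tool": [0, 0, 0, 0],
}
grades = ["fail", "fail", "pass", "pass"]

kept = drop_all_zero(columns)
# Rank the remaining features from most to least informative about the grade.
ranking = sorted(
    kept,
    key=lambda name: information_gain(kept[name], grades),
    reverse=True,
)
```

On this toy data, `unused_tool` is dropped because it is all zeros, `forum_posts` separates the two classes perfectly (a gain of 1 bit), and `quiz_attempts` carries no information about the grade, so it ranks last.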