IJFANS International Journal of Food and Nutritional Sciences ISSN PRINT 2319 1775 Online 2320 7876 Research paper © 2012 IJFANS. All Rights Reserved, UGC CARE Listed (Group-I) Journal Volume 12, Iss 1, Jan 2023 197 | Page
A Novel Machine Learning Approach for Tracking and Predicting Student Performance in Degree Programs
VEERA SIVA PRASAD 1, VIJAY KUMAR PADALA 2, CHOTAPALLI. KISHORE BABU 3, SURYA VARA PRASAD NEETIPUDI 4
1 ASST PROFESSOR DEPARTMENT OF COMPUTER SCIENCE, SIR C R REDDY COLLEGE, ELURU, INDIA.
2 ASST PROFESSOR DEPARTMENT OF COMPUTER SCIENCE, SIR C R REDDY COLLEGE, ELURU, INDIA.
3 ASST PROFESSOR DEPARTMENT OF COMPUTER SCIENCE, SIR C R REDDY COLLEGE, ELURU, INDIA.
4 ASST PROFESSOR DEPARTMENT OF COMPUTER SCIENCE, SIR C R REDDY COLLEGE, ELURU, INDIA.
spv@sircrreddycollege.ac.in 1, pvk@sircrreddycollege.ac.in 2, ckb@sircrreddycollege.ac.in 3, neethipudisvprasad@gmail.com 4

Abstract
Accurately predicting students' future performance based on their ongoing academic records is crucial for carrying out the pedagogical interventions needed to ensure that students graduate on time and with satisfactory results. Although there is a rich literature on using data-driven approaches to predict student performance when solving problems or studying for courses, predicting student performance in completing degrees (e.g. college programs) is much less studied and faces new challenges: (1) students differ tremendously in their backgrounds and selected courses; (2) courses are not equally informative for making accurate predictions; (3) students' evolving progress needs to be incorporated into the prediction. In this paper, we develop a novel machine learning method for predicting student performance in degree programs that addresses these key challenges. The proposed method has two major features.
First, a bilayered structure comprising multiple base predictors and a cascade of ensemble predictors is developed for making predictions based on students' evolving performance states. Second, a data-driven approach based on latent factor models and probabilistic matrix factorization is proposed to discover course relevance, which is important for constructing efficient base predictors. Through extensive simulations on an undergraduate student dataset collected over three years at UCLA, we show that the proposed method outperforms benchmark approaches.
Keywords: Machine Learning, latent factor models, UCLA

I. INTRODUCTION
Making higher education affordable has a significant impact on the nation's economic prosperity and is a central focus of government education policy. Yet student loan debt in the United States has blown past the trillion-dollar mark, exceeding Americans' combined credit card and auto loan debts. As the cost of college education (tuition, fees and living expenses) has skyrocketed over the past few decades, prolonged graduation time has become a crucial contributing factor to the ever-growing student loan debt. In fact, recent studies show that only 50 of the more than 580 public four-year institutions in the United States have on-time graduation rates at or above 50 percent for their full-time students. To make college more affordable, it is thus crucial to ensure that many more students graduate on time, through early interventions for students whose performance is unlikely to meet the degree program's graduation criteria on time. A critical step towards effective intervention is to build a system that can continuously track students' academic performance and accurately predict their future performance, such as when they are likely to graduate and their estimated final GPAs, given their current progress.
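The second feature named in the abstract, discovering course relevance with latent factor models and probabilistic matrix factorization (PMF), can be illustrated with a small sketch. It is not the authors' implementation: the toy grades, the course names, the helper names (`pmf_sgd`, `cosine`, `course_relevance`), the hyperparameters, the global-mean centering, and the use of cosine similarity between learned course vectors as the relevance score are all assumptions made for illustration. The sketch fits student and course latent vectors by stochastic gradient descent on the observed entries only (missing entries are courses a student never took) and then ranks courses by how closely their latent vectors align with a target course.

```python
import math
import random


def pmf_sgd(ratings, n_students, n_courses, k=2, lr=0.05, lam=0.05, epochs=500, seed=0):
    """Fit student vectors U and course vectors V by SGD on observed entries only.

    Each step descends 0.5 * (r - u.v)^2 + 0.5 * lam * (|u|^2 + |v|^2) for one
    observed (student, course, grade) triple. This is the usual SGD approximation
    to the MAP objective of PMF (Gaussian noise, zero-mean Gaussian priors); the
    prior term is applied once per observed entry rather than once per vector.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_students)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_courses)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for i, j, r in data:
            err = r - sum(U[i][f] * V[j][f] for f in range(k))
            for f in range(k):
                u, v = U[i][f], V[j][f]
                U[i][f] += lr * (err * v - lam * u)
                V[j][f] += lr * (err * u - lam * v)
    return U, V


def cosine(a, b):
    """Cosine similarity of two latent vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def course_relevance(V, target):
    """Rank all other courses by latent-vector similarity to `target`, best first."""
    scores = [(j, cosine(V[target], V[j])) for j in range(len(V)) if j != target]
    return sorted(scores, key=lambda t: t[1], reverse=True)


# Hypothetical toy data: (student, course, grade on a 4.0 scale); absent pairs
# are courses a student never took. Courses 0-1 are "math", 2-3 "humanities".
COURSES = ["Calculus I", "Calculus II", "Poetry", "Drama"]
GRADES = [
    (0, 0, 4.0), (0, 1, 3.7), (0, 2, 2.3),
    (1, 0, 3.7), (1, 2, 2.0), (1, 3, 2.3),
    (2, 0, 4.0), (2, 1, 4.0), (2, 3, 2.0),
    (3, 0, 2.0), (3, 1, 2.3), (3, 2, 4.0), (3, 3, 3.7),
    (4, 1, 2.0), (4, 2, 3.7), (4, 3, 4.0),
    (5, 0, 2.3), (5, 2, 4.0), (5, 3, 3.7),
]

# Center grades on the global mean so the factors model deviations
# (a simplifying assumption: no per-student or per-course bias terms).
mean = sum(g for _, _, g in GRADES) / len(GRADES)
centered = [(i, j, g - mean) for i, j, g in GRADES]
U, V = pmf_sgd(centered, n_students=6, n_courses=len(COURSES))

# Print how relevant each other course is for predicting "Calculus II".
for j, score in course_relevance(V, target=1):
    print(f"{COURSES[j]}: {score:+.2f}")
```

In a scheme like the one the abstract describes, the top-ranked courses for a target course would be natural candidates for the inputs of that course's base predictor; how the paper actually selects and weights related courses is not specified in this section.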
Although predicting student performance has been extensively studied in the literature, it has primarily been studied in the context of solving problems in Intelligent Tutoring Systems (ITSs), completing courses in classroom settings, or completing courses on Massive Open Online Course (MOOC) platforms. However, predicting student performance within a degree program (e.g. a college program) is significantly different and faces new challenges. First, students can differ tremendously in their backgrounds as well as their chosen areas (majors, specializations), resulting in different selected courses and course sequences. Conversely, the same course can be taken by students in different areas. Since predicting student performance in a particular course relies on the student's past performance in other courses, a key challenge for training an