The Application of Gaussian Mixture Models for the Identification of At-Risk Learners in Massive Open Online Courses

Raghad Alshabandar
Department of Computer Science
Liverpool John Moores University
Liverpool, United Kingdom
R.N.AlShabandar@2013.ljmu.ac.uk

Abir Jaafar Hussain
Department of Computer Science
Liverpool John Moores University
Liverpool, United Kingdom
A.Hussain@ljmu.ac.uk

Robert Keight
Department of Computer Science
Liverpool John Moores University
Liverpool, United Kingdom
R.Keight@2015.ljmu.ac.uk

Andy Laws
Department of Computer Science
Liverpool John Moores University
Liverpool, United Kingdom
A.Laws@ljmu.ac.uk

Dr Thar Baker
Department of Computer Science
Liverpool John Moores University
Liverpool, United Kingdom
T.M.Shamsa@ljmu.ac.uk

Abstract— With high learner withdrawal rates on MOOC platforms, the early identification of at-risk student groups has become increasingly important. Although many prior studies treat dropout as a sequence classification problem, such works address only a limited set of behavioral dynamics, typically recorded as a sequence of weekly intervals, and neglect contextual factors such as assignment deadlines that may be important components of latent student engagement. In this paper we therefore investigate the use of Gaussian Mixture Models for the incorporation of such dynamics, providing an analytical assessment of the influence of latent engagement on students and their subsequent risk of leaving the course. Additionally, linear regression and k-nearest neighbors classifiers were used to provide a performance comparison. The features used in the study were constructed from student behavioral records, capturing activity over time, which were subsequently organized into six time intervals corresponding to assignment submission dates.
Results obtained from the classification procedure yielded an F1-Measure of 0.835 for the Gaussian Mixture Model, indicating that such an approach holds promise for the identification of at-risk students within the MOOC setting.

I. INTRODUCTION

With increasing interest in open educational resources, massive open online courses (MOOCs) have become an area of continuing growth within both industry and academic settings [1]. In particular, MOOCs offer access to high-quality learning material for people around the world, for a nominal cost. However, despite the lowering of barriers to high-quality education, the ability of students to enrol in and withdraw from courses freely often results in high rates of attrition [2][3]. For instance, during 2012 Duke University offered a bioelectricity course that attracted around 12,175 registered participants, of whom only 315 went on to take the final exam. At the end of 2012, it was reported that 93% of participants withdrew [2].

Identifying at-risk students with a high probability of premature withdrawal from courses has become of crucial importance, particularly so that adequate information can be fed back to remote instructors and courses adapted to improve engagement [4]. Many studies have proposed dropout prediction models; such studies consider the learning behaviour record across various time intervals as a key factor from which to infer student withdrawal [5][6]. Various behavioural features can be derived from behavioural records, such as watching videos, undertaking assignments, accessing the home page, and reading PDF documents [5][6]. For example, learners who frequently access the home page of a course module in the current week are likely to continue interacting with the course in the following week.
Otherwise, if a student fails to click frequently within the home page of the respective module, the probability of that student entering an at-risk status increases [6].

Categorizing the latent engagement patterns of learners with respect to their impact on continuation within course activities remains a challenge [6][7]. Few studies have investigated the latent engagement state as a sequential classification problem [7]. A notable limitation of these studies is that behavioural features are organized on a weekly basis. As a consequence, the resulting prediction models operate week by week, without accounting for the assignment submission date as a factor of significant influence