Modeling Learner Engagement in MOOCs using Probabilistic Soft Logic

Arti Ramesh, Dan Goldwasser, Bert Huang, Hal Daumé III, Lise Getoor
Department of Computer Science, University of Maryland, MD, USA
{artir, bert, hal, getoor}@cs.umd.edu, goldwas1@umiacs.umd.edu

Abstract

Massive open online courses (MOOCs) attract a large number of student registrations, but recent studies have shown that only a small fraction of these students complete their courses. Student dropouts are thus a major deterrent to the growth and success of MOOCs. We believe that understanding student engagement as a course progresses is essential for minimizing dropout rates. Formally defining student engagement in an online setting is challenging. In this paper, we leverage activity features (such as posting in discussion forums and timely submission of assignments), linguistic features from forum content, and structural features from forum interactions to identify two different forms of student engagement (passive and active) in MOOCs. We use probabilistic soft logic (PSL) to model student engagement by capturing domain knowledge about student interactions and performance. We test our models on MOOC data from Coursera and demonstrate that modeling engagement is helpful in predicting student performance.

1 Introduction

Massive open online courses (MOOCs) often attract up to hundreds of thousands of registrants, but only a small fraction of these successfully complete their courses. Even among students who declare at the start of a course an intent to complete it, 75% do not (according to a recent Coursera study [1]). Maintaining and cultivating student engagement is a prerequisite for MOOCs to have broad educational impact.
Unlike regular courses, in which students engage with class materials in a structured and monitored way and instructors directly observe student behavior and obtain feedback, the distant nature and sheer size of an online course require new approaches for providing student feedback and guiding instructor intervention. MOOCs provide a tantalizing opportunity for analyzing large-scale online interaction and behavioral data to improve student engagement, outcomes, and overall experience. To date, this opportunity is purely speculative: little work has truly exploited content (language), structure (social interactions), and outcome data. One significant technical challenge is that doing so requires the ability to combine language analysis of forum posts with graph analysis over very large networks of entities (students, instructors, topics, assignments, quizzes, etc.) to perform predictive modeling.

We follow the observation that quantifying and measuring engagement is key to understanding learner participation in a course. In MOOCs particularly, there are different notions of student engagement, and learners often engage with different aspects of the course throughout its duration. For example, some students engage in the social aspects of the online community by posting in forums and asking and answering questions, while others only watch lectures and take quizzes without interacting with the community.
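To make the PSL modeling idea concrete, the following is a minimal illustrative sketch (not the authors' actual model or the PSL software's API). PSL relaxes Boolean logic rules over [0, 1]-valued atoms using Łukasiewicz logic, so a rule such as "posts(S) AND answers(S) implies activelyEngaged(S)" incurs a hinge penalty whenever the truth of its body exceeds that of its head. The predicate names and truth values below are hypothetical.

```python
# Hedged sketch of Lukasiewicz rule semantics as used by PSL.
# Predicates and values are hypothetical, for illustration only.

def luk_and(a: float, b: float) -> float:
    """Lukasiewicz conjunction of two soft truth values: max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)

def rule_distance(body: float, head: float) -> float:
    """Distance to satisfaction of the rule body -> head: max(0, body - head)."""
    return max(0.0, body - head)

# A student who posts frequently (0.9) and answers questions (0.8),
# but whose active-engagement atom is currently scored low (0.3):
posts, answers, active = 0.9, 0.8, 0.3
body = luk_and(posts, answers)           # conjunction of the rule body, ~0.7
penalty = rule_distance(body, active)    # hinge penalty the rule contributes, ~0.4
```

Inference in PSL then amounts to choosing truth values for the unobserved atoms (here, `active`) that minimize the weighted sum of such penalties across all grounded rules; the hinge form makes that a convex optimization.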