ORIGINAL ARTICLE

Improved student dropout prediction in Thai University using ensemble of mixed-type data clusterings

Natthakan Iam-On · Tossapon Boongoen

Received: 19 August 2014 / Accepted: 9 February 2015
© Springer-Verlag Berlin Heidelberg 2015

Abstract Increasing student retention has been a common goal of many academic institutions, especially at the university level. The negative effects of student attrition are evident to students, parents, the university and society as a whole. First-year students are at the greatest risk of dropping out or not completing their degree on time. With this insight, a number of data mining methods have been developed for early detection of students at risk of dropout, allowing the immediate application of assistive measures. Compared with western countries, this subject has attracted only a few studies in Thai universities, with educational data mining being limited to the use of conventional classification models. This paper presents the most recent investigation of student dropout at Mae Fah Luang University, Thailand, and the novel reuse of link-based cluster ensemble as a data transformation framework for more accurate prediction. The empirical study on a mixed-type data collection related to students’ demographic details, academic performance and enrollment records suggests that the proposed approach is usually more effective than several benchmark transformation techniques, across different classifiers.

Keywords Ensemble clustering · Classification · Feature transformation · Student dropout · Educational data mining

1 Introduction

Operating in a sophisticated and highly competitive environment, modern universities commonly seek to analyze their performance, to identify their uniqueness and to formulate a strategy from working experience and knowledge [4]. The recent increase in learning resources, educational software, and databases of course and student information has created large repositories of data [28].
This provides a goldmine that can be explored to understand students’ learning behavior, preferences and performance [35]. In response, the application of data mining (DM) methodology in education, also known as educational data mining (EDM), has become a fast-growing interdisciplinary research field [3, 43]. The discovered knowledge is highly useful for understanding how students learn and how different settings affect their achievement. This can help to improve educational outcomes and to gain insights into various educational phenomena.

In the EDM literature, recent research has focused on understanding student categories and targeted marketing [2, 44]. This is accomplished through, for instance, using predictive modeling to maximize student retention [60], developing enrollment prediction models based on admission data [58], and predicting student performance and dropout [30, 40]. More specifically, if accurate predictors of academic performance can be obtained, they can be used to gain an understanding of success and risk factors with respect to the curriculum [54]. Awareness of these issues among educational staff and management will help to identify the risk group and to determine the appropriate measures. In other words, at-risk students would be provided with academic and administrative support to increase their chance of staying on the course.

N. Iam-On (✉)
School of Information Technology, Mae Fah Luang University, Chiang Rai 57100, Thailand
e-mail: nt.iamon@gmail.com

T. Boongoen
Department of Mathematics and Computer Science, Royal Thai Air Force Academy, Bangkok 10220, Thailand
e-mail: tossapon_b@rtaf.mi.th

Int. J. Mach. Learn. & Cyber.
DOI 10.1007/s13042-015-0341-x