Identifying Students with Evasion Risk Using Data Mining
Márcio Aurélio dos Santos Alencar, Institute of Computing, Federal University of Amazonas - UFAM, Manaus, Brazil, marcio.alencar@icomp.ufam.edu.br
Eulanda Miranda dos Santos, Institute of Computing, Federal University of Amazonas - UFAM, Manaus, Brazil, emsantos@icomp.ufam.edu.br
José Francisco de Magalhães Netto, Institute of Computing, Federal University of Amazonas - UFAM, Manaus, Brazil, jnetto@icomp.ufam.edu.br
Abstract: The number of educational institutions offering distance learning courses is increasing. Studies have shown that student dropout rates in this type of educational system have increased as well. Even though Virtual Learning Environments (VLEs) record all the interactions of the students throughout a course, the information a VLE provides is not enough to predict and prevent high student dropout rates. In this context, the objective of this paper is to present a survey of data mining approaches and techniques in order to point out data mining-based solutions that can be employed to predict and prevent high student dropout rates.
Keywords: Moodle, data mining, student dropout rates.
Introduction
Several educational institutions use Virtual Learning Environments (VLEs) for teaching and learning. VLEs offer a range of tools intended to facilitate the monitoring of students in distance courses. However, despite these technological tools for learning, many educational institutions still face the problem of student dropout. Research by Wolff et al (2014) with students of the Open University, one of the largest distance education institutions in Europe, indicates that the main reason for student dropout is lack of tutor monitoring. On the other hand, when an intervention is performed at the right time, based on real-time identification of students at risk of dropout and on appropriate decision-making, a reduction in dropout may be achieved.
Besides lack of tutor monitoring, other factors that increase dropout rates in distance learning courses are financial difficulties, lack of time, lack of interaction with the VLE, lack of motivation, lack of school knowledge, lack of technological knowledge, a sense of isolation, health problems, excessive content in the VLE, and age (Mezzari et al, 2013).
VLEs worldwide store a large amount of data. However, a person is usually unable to interpret, without tool support, the information about students’ participation in distance education courses that VLEs record. As a consequence, it is necessary to develop new tools to extract this data and to generate useful information from it. Data mining techniques have recently been investigated as a way of using such stored data to examine the factors that may contribute to student dropout.
Data mining provides a set of techniques that can help educational systems address several issues, such as identifying students’ needs, personalizing training, and predicting the quality of student interactions, by analyzing students’ trends and behaviors in order to improve their learning experience (Yadav and Pal, 2012). More precisely, Educational Data Mining (EDM) is a research domain that seeks to extract and analyze information recorded by VLEs about the teaching and learning process. According to Gamma et al (2014), EDM may benefit students, teachers and educational institutions.
In this context, this paper proposes an architecture for adding an EDM module to the Moodle VLE used in the School of Distance Education of the Amazon Technical Education Center (CETAM EaD). We plan to run experiments using data collected during 2010 and 2013 from the technical courses of
-773- EdMedia 2015 - Montreal, Quebec, Canada, June 22-24, 2015