Abstract: The purpose of this paper is to develop models that enable the prediction of student success. Such models could improve the allocation of students among colleges and optimize the newly introduced model of government subsidies for higher education. To collect data, an anonymous survey was carried out among final-year undergraduate students using a random sampling method. Of the decision trees created, two were chosen that were most successful in predicting student success based on two criteria: Grade Point Average (GPA) and the time that a student needs to finish the undergraduate program (time-to-degree). Decision trees proved to be a good method for classifying student success, and they could be improved further by increasing the survey sample and developing specialized decision trees for each type of college. Methods of this type have considerable potential for use in decision support systems.
Keywords: Data mining, knowledge discovery in databases, prediction models, student success.
I. INTRODUCTION
CURRENTLY, the Republic of Croatia is undergoing a harmonization process in accordance with the Bologna declaration and a reform of the higher education system, and an increase in student performance is one of the goals of the Bologna reform. A model for predicting student performance would provide useful information for proposing policies that could be implemented in the educational process and environment [1], [2]. A number of factors closely related to studying influence students’ success, such as lecture attendance, whether the course is passed through preliminary exams or regular end-of-term exams, student responsibility, and time spent studying for an exam. Since several studies already focus on these factors, this study shifts the focus to the demographic factors of students.
M. Dragičević is with Zagrebačka banka (part of Unicredit Group), Trg bana Josipa Jelačića 10, 10000 Zagreb (e-mail: mladen.dragicevic@unicreditgroup.zaba.hr). M. Pejić Bach is with Faculty of Economics & Business Zagreb, University of Zagreb, Trg J.F. Kennedyja 6, 10000 Zagreb (e-mail: mpejic@efzg.hr). V. Šimičević is with Hrvatski studiji, Borongajska cesta 83d, Zagreb (e-mail: vsimicevic@hrstud.hr).
Previous research has indicated that demographic factors have a significant influence on student success, and they should therefore be included when constructing models for predicting student success.
The term knowledge discovery in databases, or data mining, was introduced relatively recently, in the nineties, but the scope that it covers has a much longer history [3]. Roughly speaking, data mining covers statistics, artificial intelligence and machine learning. Statistics is the basis of most of the technologies used in the process of knowledge discovery in databases. Its purpose in the process is to study the data itself and the correlations within the data. Artificial intelligence, on the other hand, is based on heuristics and represents an attempt to approach statistical problems in a way similar to human thinking [4]. Machine learning combines both approaches and can therefore be considered a link between these two concepts. Data mining is usually carried out on large quantities of data, and new knowledge is usually extracted from databases, but it is not unusual to conduct data mining on data collected through surveys and by other methods. The process of knowledge discovery in databases is usually carried out in five steps: (1) understanding the problem, (2) understanding the data, (3) preparing the data, (4) modeling the data, and (5) evaluating the model. Knowledge discovery in databases is used in a number of applications for predicting students’ success.
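As an illustration only, the five steps listed above can be sketched in a few lines of Python. The sketch assumes a tiny, invented set of survey-style records and a one-level decision tree (a decision stump) chosen by Gini impurity. The attribute names `works_part_time`, `parent_education` and `high_gpa`, the records, and all helper functions are hypothetical. None of them come from the survey or the models described in this paper.

```python
from collections import Counter

# Steps 1-2 (understanding the problem and the data): predict whether a
# student reaches a high GPA from two survey-style attributes.
# All names and records here are invented for illustration.
FIELDS = ("works_part_time", "parent_education", "high_gpa")
TARGET = "high_gpa"


def make_rows(tuples):
    """Turn plain tuples into dictionaries keyed by attribute name."""
    return [dict(zip(FIELDS, t)) for t in tuples]


TRAIN = make_rows([
    ("no", "higher", "yes"), ("no", "secondary", "yes"),
    ("no", "higher", "yes"), ("no", "secondary", "yes"),
    ("yes", "higher", "no"), ("yes", "secondary", "no"),
    ("yes", "higher", "yes"), ("yes", "secondary", "no"),
    (None, "higher", "no"),  # incomplete answer, dropped in step 3
])
TEST = make_rows([
    ("no", "higher", "yes"), ("yes", "secondary", "no"),
    ("yes", "higher", "yes"), ("no", "secondary", "yes"),
])


def prepare(rows):
    """Step 3 (preparing the data): keep only complete records."""
    return [r for r in rows if all(v is not None for v in r.values())]


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def split_impurity(rows, attribute):
    """Weighted Gini impurity after splitting the rows on one attribute."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[TARGET])
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())


def fit_stump(rows):
    """Step 4 (modeling): pick the attribute with the purest split and
    predict the majority class within each of its values."""
    attributes = [a for a in rows[0] if a != TARGET]
    best = min(attributes, key=lambda a: split_impurity(rows, a))
    votes = {}
    for r in rows:
        votes.setdefault(r[best], Counter())[r[TARGET]] += 1
    rules = {value: c.most_common(1)[0][0] for value, c in votes.items()}
    default = Counter(r[TARGET] for r in rows).most_common(1)[0][0]
    return best, rules, default


def predict(model, row):
    """Apply the stump; unseen attribute values fall back to the default."""
    attribute, rules, default = model
    return rules.get(row[attribute], default)


def accuracy(model, rows):
    """Step 5 (evaluating the model): share of correct predictions."""
    return sum(predict(model, r) == r[TARGET] for r in rows) / len(rows)


model = fit_stump(prepare(TRAIN))
score = accuracy(model, TEST)
print("split attribute:", model[0])
print("rules:", model[1])
print("held-out accuracy:", score)
```

A full decision tree would apply the same split selection recursively to each branch. The stump is used here only to keep the five steps visible in one short listing.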
A number of studies have used intelligent methods for predicting students’ success. Hardgrave and Wilson [5] compare neural networks with traditional statistical methods for predicting students’ success in graduate study. The follow-up research [32] uses additional models such as linear regression, logistic regression and discriminant analysis. Naik and Ragothaman [33] use logit and probit models and compare them with neural networks for predicting students' success. Zaidah and Daliela [6] compare neural networks, linear regression and decision trees, and Oladokun, Adebanjo and Charles-Owaba [7] use a multilayer perceptron network. Matković, Tomić and Vehovec [8] analyze the efficiency of the higher education process on a random sample of recent graduates. They concluded that the chances of graduating successfully are correlated with the socioeconomic status of students. Zekić-Sušac, Frajman-Jakšić and Drvenkar [9] describe models for predicting student success that they created using neural network algorithms and classification decision trees. They also analyze the factors that influence student success. The models were built on students’ demographic data, behavior and attitude toward studying, and success was measured by GPA. Shaw, Marini and Mattern [10] used hierarchical linear modeling to predict first-year grade point average from various Advanced Placement exam variables. Another example of multilevel modeling is the paper by Rienties and Tempelaar [11].
Improving University Operations with Data Mining: Predicting Student Performance
Mladen Dragičević, Mirjana Pejić Bach, Vanja Šimičević
World Academy of Science, Engineering and Technology, International Journal of Economics and Management Engineering, Vol:8, No:4, 2014
International Scholarly and Scientific Research & Innovation 8(4) 2014, scholar.waset.org/1307-6892/9998014
International Science Index, Economics and Management Engineering, Vol:8, No:4, 2014, waset.org/Publication/9998014
They showed that academic adjustment is