EDUCATIONAL DATA CLASSIFICATION USING K-NEAREST NEIGHBOURS: A SUPERVISED DATA MINING TECHNIQUE

Pratiyush Guleria #1, Manu Sood #2
# Department of Computer Science, Himachal Pradesh University, Shimla, Himachal Pradesh, INDIA
1 pratiyushguleria@gmail.com
2 soodm_67@yahoo.com

Abstract— With growing demand for skilled manpower, new trends are being introduced into the educational system, generating large volumes of data. This data is mostly unstructured, and transforming it into a structured form is the need of the hour. Educational Data Mining helps in acquiring useful information about students, such as predicting which students are skilled and identifying new educational trends that match industry standards. In this paper, a training dataset of students is built from their assessments, and a KNN-based classification approach is applied to this training data to predict results; this not only helps identify skilled students who meet industry standards but also supports effective decision making. KNN is a non-parametric method used for classification, in which the prediction for a test record is made on the basis of its neighbours. Using k-nearest neighbours, the class of a test record is predicted from the similarity between the test record and the training-set records around the query point. In KNN, to classify a new point, we find its k nearest neighbours in the training data. KNN classification performs well on large datasets with many variables and substantial noise.

Keywords— Classification, KNN, Nearest Neighbour, Parametric, Prediction.

I. INTRODUCTION

Neighbour-based learning comprises supervised and unsupervised methods. Unsupervised learning includes clustering (e.g. spectral clustering), whereas supervised neighbour-based learning covers classification and regression. Nearest-neighbour methods find a predefined number of training samples that are closest in distance to a new point and predict its label from them. In k-nearest-neighbour learning, the number
of samples taken is a user-defined constant, and Euclidean, Minkowski, and Chebyshev distances are the common metric measures. According to [1], nearest-neighbour classification is instance-based: it does not attempt to construct a general internal model but instead stores instances of the training data. In [2], KNN is described as a lazy learning algorithm. Classification is performed by a majority vote of the nearest neighbours of each point: the query is assigned the class that is most common among its nearest neighbours. KNN uses uniform weights, and the assigned value is computed from the majority of the nearest neighbours [3]. The training data consist of a set of vectors, each associated with a class label; there are positive and negative classes. Nearest-neighbour learning is based on memorizing the training data, and classification is done by considering the memorized examples [4]. K-nearest neighbours uses the local neighbours of a query to make a prediction, with distance functions used to compare the similarity of examples [5]. According to the author in [6], a new observation is placed into the class of the observation from the learning set that is closest to it.

II. DATA DESCRIPTION

The dataset of 150 students shown in Table I consists of attributes such as Innovation, Technical Skills, Logical Reasoning, and Practical Knowledge, which are used to predict the class (group) into which each student falls. The data were collected from assessments performed by faculty members during the semester. In Table III, predictions are made from calculations performed on the dataset of Table I for the students of Group 7, showing how close Group 7 is to the other groups, such as Group 1, Group 2, and the rest.

TABLE I.
Educational Dataset

Name of Group   Innovation   Technical Skills   Logical Reasoning   Practical Knowledge   Class
Group 1         8            9.5                9.5                 8.5                   Good
Group 2         8            9                  7                   7                     Good
Group 3         9            4                  8                   8                     Good
Group 4         6            5                  6                   6.5                   Average
Group 5         6            6                  7                   5                     Average
Group 6         4            2                  3                   4                     Bad
-----           -----        -----              -----               -----                 -----
Group n         3            5                  4                   4.5                   Bad

International Journal of Computer Science and Information Security (IJCSIS), Vol. 14, No. 12, December 2016, ISSN 1947-5500, https://sites.google.com/site/ijcsis/
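The classification approach described above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the training records are taken from Table I, but the Group 7 query vector is an assumed example, since its attribute scores are not listed in this excerpt. Each record is compared to the query with Euclidean distance, and the class is decided by a uniform-weight majority vote among the k nearest neighbours, as described in [3].

```python
from collections import Counter
import math

# Training records from Table I:
# (Innovation, Technical Skills, Logical Reasoning, Practical Knowledge) -> Class
training = [
    ((8, 9.5, 9.5, 8.5), "Good"),     # Group 1
    ((8, 9,   7,   7  ), "Good"),     # Group 2
    ((9, 4,   8,   8  ), "Good"),     # Group 3
    ((6, 5,   6,   6.5), "Average"),  # Group 4
    ((6, 6,   7,   5  ), "Average"),  # Group 5
    ((4, 2,   3,   4  ), "Bad"),      # Group 6
    ((3, 5,   4,   4.5), "Bad"),      # Group n
]

def euclidean(a, b):
    """Euclidean distance between two attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(query, data, k=3):
    """Classify `query` by a uniform-weight majority vote of its k nearest neighbours."""
    # Sort the training records by distance to the query and keep the k closest.
    neighbours = sorted(data, key=lambda rec: euclidean(query, rec[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical scores for the Group 7 query (not given in the paper excerpt).
group7 = (7, 8, 8, 7)
print(knn_predict(group7, training, k=3))  # prints "Good"
```

With these assumed scores, the three nearest neighbours of Group 7 are Groups 2, 1, and 5 (classes Good, Good, Average), so the majority vote yields "Good". The choice of k trades off noise sensitivity (small k) against over-smoothing (large k).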