International Journal of Multidisciplinary Research and Publications ISSN (Online): 2581-6187 6 Ifan Prihandi, “KNN on Iris Data with Python Programming,” International Journal of Multidisciplinary Research and Publications (IJMRAP), Volume 2, Issue 7, pp. 6-8, 2019. KNN on Iris Data with Python Programming Ifan Prihandi Computer Science, Mercubuana University, Kembangan, Jakarta Barat, 11650 Email address: ifan[DOT]prihandi[AT]mercubuana[DOT]ac[DOT]id Abstract— K-Nearest Neighbor (K-NN) is a classification technique that makes explicit predictions on test data based on a comparison of K nearest neighbors. In the process of data mining will extract valuable information by analyzing the existence of certain patterns or relationships of large data. Data mining is related to other fields of science, such as Database Systems, Data Warehousing, Statistics, Machine Learning, Information Retrieval, and High-Level Computing. In addition, data mining is supported by other sciences such as Neural Network, Pattern Recognition, Spatial Data Analysis, Image Database, Signal Processing. Several surveys of the modeling process and methodology state that, "Data mining is used as a guide, where data mining presents the essence of history, description and as a standard guide regarding the future of a data mining model process. Keywords— K-Nearest Neighbor, classification, Data mining, model. I. INTRODUCTION (Kamil, Kemas, Eng, W, & Kom, 2015) Dataset is the embodiment of data in memory that provides a consistent relational program model regardless of the origin of the data source. Used to set the query itself to be run by using DataAdapter in using parameters in report generation. (Yulianton, 2014) The formal definition of data mining is the process of extracting valid, useful, unknown, and understandable information from data and using it to make business decisions. Data mining is also commonly referred to as "Data or knowledge discovery" or discovering hidden patterns in data. Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. Data mining is defined as the process of extracting or mining knowledge needed from large amounts of data. In the process of data mining will extract valuable information by analyzing the existence of certain patterns or relationships of large data. Data mining is related to other fields of science, such as Database Systems, Data Warehousing, Statistics, Machine Learning, Information Retrieval, and High-Level Computing. In addition, data mining is supported by other sciences such as Neural Network, Pattern Recognition, Spatial Data Analysis, Image Database, Signal Processing. Several surveys of the modeling process and methodology state that, "Data mining is used as a guide, where data mining presents the essence of history, description and as a standard guide regarding the future of a data mining model process. (Dwi Retnosari, 2014) Machine Learning is an area in artificial intelligence that is related to the development of techniques that can be programmed and learned from past data. Pattern recognition, data mining and machine learning are often used to refer to the same thing. This field deals with probability and statistics, sometimes optimization. Machine learning becomes an analytical tool in data mining. 1.1 Problem Based on the background described above, then in broad outline, the formulation of the problem is: 1. Find the group in the data, with the number represented by the K value. Variable K itself is the number of clusters desired? 2. What is the Train to Splits Process? 1.2 Writing The purpose of this paper is: 1. The centroid of cluster K, which can be used to label new data. 2. Label training data and testing data. 1.3 Scope / Limitation of Problems Based on the identification of the problems above, the authors limit the problems discussed in this study, namely the application of the K-NN algorithm only to training data and testing data. II. THEORETICAL BASIS 2.1 K-Nearest Neighbor (Prasetyo, 2015) K-Nearest Neighbor (K-NN is a classification technique that makes explicit predictions on test data based on a comparison of K nearest neighbors), KNN is one of the nonparametric machine learning algorithms (models). The discussion on parametric models and nonparametric models can be their article, but in short, the definition of nonparametric models is a model that does not assume anything about the distribution of instances in the dataset. Nonparametric models are usually more difficult to interpret, but one of the advantages is that the class decision