(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 13, No. 8, 2022

An Improved K-Nearest Neighbor Algorithm for Pattern Classification

Zinnia Sultana 1, Ashifatul Ferdousi 2, Farzana Tasnim 3 and Lutfun Nahar 4
Dept. of Computer Science & Engineering, International Islamic University Chittagong, Chittagong, Bangladesh 1, 2, 3, 4

Abstract—This paper proposes a "Locally Adaptive K-Nearest Neighbor (LAKNN) algorithm" for pattern classification that mitigates the curse of dimensionality. To compute neighborhoods, local linear discriminant analysis provides an effective metric that determines local decision boundaries from centroid information. KNN is a widely used approach in classification problems in data mining and machine learning. KNN estimates class conditional probabilities for an unfamiliar pattern; with limited training data in a high-dimensional feature space, this estimate becomes unreliable because of the curse of dimensionality. Standard Euclidean distance, used in KNN to compare feature values measured on dissimilar scales, can be misleading when selecting a proper subset of nearest neighbors of the pattern to be predicted. To overcome the effect of high dimensionality, LAKNN uses a new variant of the standard Euclidean distance metric: a flexible metric for computing neighborhoods is estimated based on a chi-squared distance analysis. The chi-squared metric identifies the most significant features for finding the k closest points among the training patterns. This paper also shows that LAKNN outperforms four other KNN variants and other machine-learning algorithms in both training time and accuracy.

Keywords—LAKNN algorithm; standard Euclidean distance; variance-based Euclidean distance; feature extraction; pattern classification

I. INTRODUCTION

The nearest neighbor classifier is one of the simplest, oldest and most widely used methods for classification.
It classifies an unidentified pattern by choosing the closest example in the training set, as measured by a distance metric. It is one of the most common instance-based learning methods. Simplicity, transparency and fast training time are its advantages. Instances for nearest neighbor classification are represented as points in Euclidean space. It is a conceptually simple method that can approximate real-valued or discrete-valued target functions. The k-nearest neighbor algorithm is best suited to small data sets with few features. The algorithm assumes that similar things lie close together; in other words, an object most likely belongs to the same class as its neighbors. For example, if a mango's appearance is more similar to an apple, orange, and guava (fruits) than to a horse, dog and cat (animals), then most likely a mango is a fruit.

In a pattern recognition problem, a feature vector x = (x_1, ..., x_q) ∈ R^q is considered an object belonging to one of J classes, and the goal is to form a classifier that assigns x to the correct class from a given set of N training samples. The simplest and most appealing approach to this problem is K-Nearest Neighbor (KNN) classification [1][2]. Rather than fixed data partitions, this method works on continuous and overlapping neighborhoods [3]. It uses a different neighborhood for each query, so that all points in the neighborhood are as close to the query as possible [4][5][6]. KNN uses straight Euclidean distance to discover the k closest points to the query [7][8][9][10]. Because features are measured on dissimilar scales, this can give a genuinely less important feature more influence than others and cause the pattern to be misclassified [11][12]. The effect is severe in training sets with a high-dimensional feature space [13], and several biases are introduced in KNN when the input feature space is high-dimensional and the sample size limited [14].
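The baseline KNN classification described above, using straight Euclidean distance and majority voting, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name and toy data are assumptions for demonstration:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest
    training points under straight Euclidean distance."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]              # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters, labeled 0 and 1.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.2, 0.1]), k=3))  # → 0
```

Note that every training point is compared against the query at prediction time, which is why KNN trains quickly but predicts slowly, and why it is best suited to small data sets.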
A modified version of the standard Euclidean distance is proposed here, which uses the variance of each feature to give all dissimilarly scaled features identical influence on the decision [15]. Distances are weighted with a chi-squared metric that identifies the most relevant features for finding the k closest points to the pattern under consideration in the training space [16]. A locally adaptive form of nearest neighbor classification (LAKNN) is proposed here to mitigate the curse of dimensionality [17]. An effective metric is used to compute neighborhoods: it determines the local decision boundaries from centroid information, then shrinks neighborhoods in directions orthogonal to these local decision boundaries and extends them parallel to the boundaries [18][19][20].

To give all features equal influence on the pattern classification, a variance-based Euclidean distance metric is used in the proposed algorithm instead of the straight Euclidean distance metric. The variance of each feature is calculated during training.

Fig. 1. Neighborhood of the Query Point.

Fig. 1 shows an example. There are two classes, and both classes' data are produced from a bivariate standard normal
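The variance-based normalization described above can be sketched as follows. The function name and the use of per-feature variance as the scale factor are illustrative assumptions; they convey the idea of equalizing dissimilarly scaled features rather than reproducing the paper's exact metric:

```python
import numpy as np

def variance_weighted_distance(x, z, feature_var):
    """Euclidean distance in which each squared feature difference
    is divided by that feature's training-set variance, so features
    measured on dissimilar scales contribute equally."""
    eps = 1e-12                              # guard against zero variance
    return np.sqrt(np.sum((x - z) ** 2 / (feature_var + eps)))

# Feature 0 spans a large range, feature 1 a small one; without
# normalization, feature 0 would dominate the distance entirely.
X_train = np.array([[100.0, 0.1], [200.0, 0.2], [300.0, 0.3]])
var = X_train.var(axis=0)                    # per-feature variance from training data
d = variance_weighted_distance(X_train[0], X_train[1], var)
```

In this toy example both features differ by exactly one standard deviation between the two points, so after normalization each contributes equally to the distance, which is the behavior the proposed metric is designed to achieve.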