Received 31 October 2022, accepted 16 November 2022, date of publication 30 November 2022, date of current version 6 December 2022.
Digital Object Identifier 10.1109/ACCESS.2022.3225682

A Weighted k-Nearest Neighbours Ensemble With Added Accuracy and Diversity

NAZ GUL 1, MUHAMMAD AAMIR 1, (Senior Member, IEEE), SAEED ALDAHMANI 2, AND ZARDAD KHAN 2
1 Department of Statistics, Abdul Wali Khan University Mardan, Mardan 23200, Pakistan
2 Department of Analytics in the Digital Era, United Arab Emirates University, Al Ain, United Arab Emirates
Corresponding authors: Zardad Khan (zaar@uaeu.ac.ae) and Saeed Aldahmani (saldahmani@uaeu.ac.ae)

ABSTRACT Ensembles based on kNN models are considered effective in reducing the adverse effect of outliers, primarily by identifying the observations closest to a test point in the given training data. The class label of the test point is then estimated by a majority vote over the class labels of these nearest observations. While identifying the closest observations, certain training patterns might possess higher regulatory power than others; assigning weights to observations and then calculating weighted distances is therefore important in addressing this scenario. This paper proposes a kNN ensemble that identifies nearest observations based on their weighted distance in relation to the response variable via support vectors. This is done by building a large number of kNN models, each on a bootstrap sample from the training data along with a randomly selected subset of features from the given feature space. The estimated class of a test observation is decided via majority voting over the estimates given by all the base kNN models. The ensemble is assessed on 14 benchmark and simulated datasets against other classical methods, including kNN-based models, using the Brier score, classification accuracy and Kappa as performance measures.
On both the benchmark and simulated datasets, the proposed ensemble outperformed the other competing methods in the majority of cases, giving better overall classification performance than the other methods on 8 of the datasets. The analyses on the simulated datasets reveal that the proposed method is effective in classification problems involving noisy features. Furthermore, feature weighting and randomization also make the method robust to the choice of k, i.e., the number of nearest observations in a base model.

INDEX TERMS Classification, feature weighting, k-nearest neighbor ensemble, support vectors.

The associate editor coordinating the review of this manuscript and approving it for publication was Fu-Kwun Wang.

I. INTRODUCTION

A wide range of supervised learning techniques has been introduced to deal with classification problems. Among these, the nearest neighbour (NN) model is one of the top-ranked methods; it classifies an unseen observation on the basis of its neighbourhood in a given feature space. Although it is an efficient method, it suffers from the issue of over-fitting. One of the most fundamental, simple and appealing approaches to overcoming this disadvantage is the k-Nearest Neighbours (kNN) method [1], which can be used for both classification and regression problems in machine learning.

In the context of classification, the kNN approach estimates a class value for a new/unseen instance by finding its k nearest neighbours whose classes are known [2], [3], [4]. Initially, it was developed to perform discriminant analysis in situations where reliable parametric estimates of probability densities are unknown. Nowadays, the kNN method is a preferred method for classifying data when there is little or no prior knowledge about the distribution of the data [5]. It is a widely used classifier because of its simplicity, its robustness to noisy training data and its effectiveness in the case of large training data [5].
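The majority-vote rule just described, together with the bootstrap-plus-random-feature-subset scheme outlined in the abstract, can be sketched as follows. This is a minimal, unweighted illustration only: the function names, the toy data and all parameter defaults are assumptions for the sketch, and it omits the support-vector-based distance weighting that distinguishes the proposed ensemble.

```python
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by a majority vote over the class labels of
    its k nearest training observations (plain Euclidean kNN)."""
    dists = np.linalg.norm(X_train - x_query, axis=1)  # distance to every training point
    nearest = np.argsort(dists)[:k]                    # indices of the k closest
    return Counter(y_train[nearest]).most_common(1)[0][0]

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

def ensemble_predict(X_train, y_train, x_query, B=50, k=3, d=2):
    """Bagged random-subspace kNN: each of B base models is fitted on a
    bootstrap sample of the rows and a random subset of d features; the
    final class is a majority vote over the B base predictions.  (The
    proposed ensemble additionally weights distances via support
    vectors, which this sketch omits.)"""
    n, p = X_train.shape
    votes = []
    for _ in range(B):
        rows = rng.integers(0, n, size=n)            # bootstrap sample of observations
        cols = rng.choice(p, size=d, replace=False)  # random feature subset
        votes.append(knn_predict(X_train[rows][:, cols],
                                 y_train[rows], x_query[cols], k))
    return Counter(votes).most_common(1)[0][0]

# Toy data: class 0 clustered near the origin, class 1 near (1, 1, 1, 1)
X = np.array([[0.0, 0.1, 0.0, 0.1], [0.1, 0.0, 0.2, 0.0], [0.2, 0.1, 0.1, 0.1],
              [1.0, 0.9, 1.1, 1.0], [0.9, 1.1, 1.0, 0.9], [1.1, 1.0, 0.9, 1.1]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([0.1, 0.1, 0.1, 0.1])))       # → 0
print(ensemble_predict(X, y, np.array([0.1, 0.1, 0.1, 0.1])))
```

Because each base model sees a perturbed version of the training data, individual kNN fits disagree on borderline points while the aggregate vote stabilises, which is the source of the robustness to the choice of k noted above.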
In spite of the above advantages, this technique has some disadvantages, such as the high computation cost due to computing the distance of each query point to all the training samples, and the requirement of large memory for implementation of data

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 10, 2022