IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 8, No. 1, March 2019, pp. 77–86
ISSN: 2252-8938, DOI: 10.11591/ijai.v8.i1.pp77-86
Journal homepage: http://iaescore.com/online/index.php/IJAI

An improved hybrid feature selection method for huge dimensional datasets

F. Rosita Kamala 1, P. Ranjit Jeba Thangaiah 2
1 Department of Computer Science, Bharathiar University, India
2 Department of Information Technology, Karunya Institute of Technology and Sciences, India

Article history:
Received Nov 25, 2018
Revised Feb 1, 2019
Accepted Feb 22, 2019

ABSTRACT
High-dimensional data causes overfitting in machine learning models, which can reduce accuracy when classifying instances. Variable selection is an essential step in predictive analytics: it reduces dimensionality without losing relevant information by selecting a few significant features of a machine learning problem. The two major techniques involved in this process are the filter and wrapper methodologies. While filters weight features according to an attribute-weighting criterion, the wrapper approach evaluates candidate feature subsets with the learning algorithm itself, selecting feature subgroups by pruning the feature space during its search. The objective of this paper is to choose the most favourable attribute subset from the original feature set using a combination method that unites the merits of filters and wrappers. To achieve this objective, an Improved Hybrid Feature Selection (IHFS) method is applied to create well-organized learners. The results of this study show that the IHFS algorithm can build competent business applications with better precision than that reported for previous hybrid variable selection algorithms.
Experimentation with UCI (University of California, Irvine) repository datasets affirms that this method achieves better prediction performance, is more robust to input noise and outliers, and scales well with the available features when compared with the existing algorithms in the literature.

Keywords:
Feature selection
Hybrid approach
Machine learning
Overfitting
Predictive analytics
Variable selection

Copyright © 2019 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author:
Rosita Kamala F,
Department of Computer Science, Bharathiar University,
Coimbatore, Tamil Nadu, India.
Email: rositakamala@gmail.com

1. INTRODUCTION
In machine learning, the term "curse of dimensionality" refers to the exponential growth of a mathematical space as the number of feature dimensions increases [1]. High-dimensional data is a major problem in both supervised and unsupervised learning. High dimensionality often entails high variance, leading to unstable learning outcomes; producing stable learning in statistical models of higher dimensions therefore requires a large number of samples. Greater computation is also needed to handle high-dimensional datasets, which is becoming a big challenge for data scientists and business analysts. An increase in features leads to various problems such as noise, error, and overfitting [2]. It also increases computing and storage costs and makes data mining a challenging task in various ways. The reduction in classification performance as the number of features grows is shown in Figure 1. The most effective way to identify relevant features in machine learning is feature selection. To achieve more accurate prediction, the concept of relevant features is used in