IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 8, No. 1, March 2019, pp. 77~86
ISSN: 2252-8938, DOI: 10.11591/ijai.v8.i1.pp77-86
Journal homepage: http://iaescore.com/online/index.php/IJAI
An improved hybrid feature selection method for huge
dimensional datasets
F. Rosita Kamala¹, P. Ranjit Jeba Thangaiah²
¹Department of Computer Science, Bharathiar University, India
²Department of Information Technology, Karunya Institute of Technology and Sciences, India
Article Info

Article history:
Received Nov 25, 2018
Revised Feb 1, 2019
Accepted Feb 22, 2019

ABSTRACT
High-dimensional data cause overfitting in machine learning models, which can reduce
accuracy when classifying instances. Variable selection is an essential function in
predictive analytics: it reduces dimensionality, without losing relevant information, by
selecting a few significant features of a machine learning problem. The major techniques
involved in this process are filter and wrapper methodologies. Filters weight features
according to an attribute-weighting criterion, whereas the wrapper approach evaluates
candidate feature subsets with the learning algorithm itself, pruning the feature space
during the search. The objective of this paper is to choose the most favourable attribute
subset from the original feature set, using a combination method that unites the merits of
filters and wrappers. To achieve this objective, an Improved Hybrid Feature Selection
(IHFS) method is applied to build well-organized learners. The results of this study show
that the IHFS algorithm can build competent business applications with better precision
than that reported for previous hybrid variable selection algorithms. Experimentation
with UCI (University of California, Irvine) repository datasets affirms that this method
achieves better prediction performance, is more robust to input noise and outliers, and
scales well with the available features when compared with existing algorithms in the
literature.
Keywords:
Feature selection
Hybrid approach
Machine learning
Overfitting
Predictive analytics
Variable selection
Copyright © 2019 Institute of Advanced Engineering and Science.
All rights reserved.
Corresponding Author:
Rosita Kamala F,
Department of Computer Science,
Bharathiar University,
Coimbatore, Tamil Nadu, India.
Email: rositakamala@gmail.com
1. INTRODUCTION
Machine learning uses the term curse of dimensionality to refer to the exponential growth
of a mathematical space as the number of feature dimensions increases [1]. High-dimensional
data is a major problem in both supervised and unsupervised learning. High dimensionality
often entails high variance, leading to unstable learning outcomes; producing stable
statistical models in higher dimensions therefore requires a large number of samples.
Greater computation is also needed to handle high-dimensional datasets, which is becoming
a major challenge for data scientists and business analysts. The increase in features leads
to problems such as noise, error and overfitting [2]. It also increases computing and
storage costs and makes data mining a challenging task in various ways. The reduction in
classification performance with the number of features is shown in Figure 1. The most
effective way to identify relevant features in machine learning is feature selection.
To achieve more accurate prediction, the concept of relevant features is used in