International Journal of Engineering and Advanced Technology (IJEAT)
ISSN: 2249 – 8958, Volume-8 Issue-6, August 2019
Published By:
Blue Eyes Intelligence Engineering
& Sciences Publication
Retrieval Number F8255088619/2019©BEIESP
DOI: 10.35940/ijeat.F8255.088619
Abstract: Predicting customer behavior and feedback
remains a challenging task for manufacturing
companies. Companies struggle to
increase their profit and annual turnover because they cannot
accurately predict customer likes and dislikes. This has led to the
adoption of machine learning algorithms for the
prediction of customer demand. This paper attempts to identify
the important features of the wine data set, taken from the UCI
Machine Learning Repository, for the prediction of customer
segment. Feature importances are extracted using several
ensembling methods: AdaBoost regressor, AdaBoost
classifier, Random Forest regressor, Extra Trees regressor and
Gradient Boosting regressor. The feature importance extracted by
each of the ensembling methods is first fitted with logistic
regression to analyze the performance. The same extracted
feature importance of each ensembling method is then
subjected to feature scaling and fitted again with logistic
regression to analyze the performance. The performance
analysis uses the metrics Mean
Squared Error (MSE), Mean Absolute Error (MAE), R2 Score,
Explained Variance Score (EVS) and Mean Squared Log Error
(MSLE). Experimental results show that, after applying feature
scaling, the feature importance extracted from the Extra Trees
Regressor is the most effective, with an MSE of 0.04, MAE of
0.03, R2 Score of 94%, EVS of 0.9 and MSLE of 0.01,
compared to the other ensembling methods.
Index Terms: Machine Learning, Mean Squared Error, Mean
Absolute Error, R2 Score, Explained Variance Score, Mean
Squared Log Error.
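For reference, the five performance metrics named above are commonly defined as shown below. This is a sketch using the standard textbook definitions, not formulas taken from this paper; the symbols (n samples, true values y_i, predictions \hat{y}_i, mean \bar{y}) are notation assumed here for illustration.

```latex
\begin{align*}
\mathrm{MSE}  &= \frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2 \\
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right| \\
R^2           &= 1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}
                        {\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2} \\
\mathrm{EVS}  &= 1-\frac{\operatorname{Var}\left(y-\hat{y}\right)}
                        {\operatorname{Var}\left(y\right)} \\
\mathrm{MSLE} &= \frac{1}{n}\sum_{i=1}^{n}
                 \left(\ln(1+y_i)-\ln(1+\hat{y}_i)\right)^2
\end{align*}
```

Under these definitions, lower MSE, MAE and MSLE are better, while R2 Score and EVS are better when closer to 1.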
I. INTRODUCTION
Generally, the datasets available in the market have many attributes.
The single dependent variable of a dataset is predicted from
one or more independent variables.
However, the dependent variable does not need all of the
independent variables for its prediction. Some of the
independent variables are not involved at all in the prediction
of the target variable. It is therefore essential to find the
important features of the machine learning dataset so as to
predict the value of the dependent variable with high
accuracy. The paper is organized as follows. Section 2
deals with the related work. Section 3 discusses the
proposed work, followed by the implementation and
performance analysis in Section 4. The paper is concluded
in Section 5.
Revised Manuscript Received on August 05, 2019
M. Shyamala Devi, Associate Professor, Computer Science and
Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science
and Technology, Avadi, Chennai, TamilNadu, India.
Rincy Merlin Mathew, Lecturer, Department of Computer Science,
College of Science and Arts, Khamis Mushayt, King Khalid University, Abha,
Asir, Saudi Arabia.
R. Suguna, Professor, Computer Science and Engineering, Vel Tech
Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Avadi,
Chennai, TamilNadu, India.
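As a rough illustration of the idea described in the introduction, namely ranking the independent variables and keeping only the important ones, the sketch below extracts feature importances from the five ensembling methods named in the abstract and fits logistic regression on the top-ranked features, with and without feature scaling. It is a minimal sketch and not the authors' implementation. It assumes scikit-learn, uses scikit-learn's bundled wine data as a stand-in for the UCI wine data set, and the train/test split, the hyperparameters, the cut-off `top_k` and the choice to feed the importances into logistic regression by selecting the top-ranked columns are all assumptions made here.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import (
    AdaBoostClassifier,
    AdaBoostRegressor,
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    explained_variance_score,
    mean_absolute_error,
    mean_squared_error,
    mean_squared_log_error,
    r2_score,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled wine data stands in for the UCI wine data set.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# The five ensembling methods named in the abstract (default hyperparameters).
rankers = {
    "AdaBoostRegressor": AdaBoostRegressor(random_state=0),
    "AdaBoostClassifier": AdaBoostClassifier(random_state=0),
    "RandomForestRegressor": RandomForestRegressor(random_state=0),
    "ExtraTreesRegressor": ExtraTreesRegressor(random_state=0),
    "GradientBoostingRegressor": GradientBoostingRegressor(random_state=0),
}

top_k = 5  # hypothetical cut-off; the paper does not state how many features are kept


def evaluate(model, idx):
    """Fit on the selected columns and report the five metrics on the test split."""
    model.fit(X_train[:, idx], y_train)
    pred = model.predict(X_test[:, idx])
    return {
        "MSE": mean_squared_error(y_test, pred),
        "MAE": mean_absolute_error(y_test, pred),
        "R2": r2_score(y_test, pred),
        "EVS": explained_variance_score(y_test, pred),
        "MSLE": mean_squared_log_error(y_test, pred),
    }


results = {}
for name, ranker in rankers.items():
    # Rank features by the importance this ensemble assigns, keep the top_k.
    ranker.fit(X_train, y_train)
    idx = np.argsort(ranker.feature_importances_)[::-1][:top_k]
    results[name] = {
        # Logistic regression on the raw selected features.
        "unscaled": evaluate(LogisticRegression(max_iter=5000), idx),
        # Same features, standardized before logistic regression.
        "scaled": evaluate(
            make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000)), idx
        ),
    }

for name, res in results.items():
    for variant, metrics in res.items():
        print(name, variant, {k: round(v, 3) for k, v in metrics.items()})
```

The regression-style metrics are computed directly on the predicted class labels, mirroring the metrics listed in the abstract. The numbers this sketch prints depend on the assumed split and settings and should not be expected to match the values reported in the paper.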
II. RELATED WORK
A. Literature Review
The chemical constituents of a wine and their proportions are needed to
predict its quality. Changes in the mixture
of the chemicals and their presence in the wine change the quality
of the wine greatly. Customers prefer a product based on the
quality of the wine. Machine learning models
can be built to find the exact combination of chemicals to
be added based on customer behavior. Machine
learning models such as Linear Regression, Decision Trees and
Artificial Neural Networks are used to predict customer
behavior, which helps in finding the features needed to
understand customer behavior and demand [1].
Data mining techniques are used to predict
customers' needs and their behavior in choosing wine. The
statistics involved in data mining techniques can
find the exact combination of the independent variables
present in the dataset [2].
Customer relationship management is essential
for any business to survive in the current market. The
utilization charge of wine was evaluated using various
factors such as manufactured goods involvement, biased
awareness, delicate qualities and socio-demography [3].
Due to the growth in online shopping, customers
wish to buy high-quality wine through online web portals.
In this scenario, customers judge the quality
of the wine only through the ingredients present in the wine
[4]. The various wine brands have worth in their improvement,
and the current market is highly competitive [5].
Various feature selection and feature
extraction methods, classification methods and
performance parameters for predicting
wine quality are critically reviewed in [6]-[10].
Regressor Fitting Of Feature Importance For
Customer Segment Prediction With Ensembling
Schemes Using Machine Learning
M. Shyamala Devi, Rincy Merlin Mathew, R. Suguna