International Journal of Computer Applications (0975 – 8887) Volume 180 – No.50, June 2018

Fake Review Detection using Classification

Neha S. Chowdhary
Department of Computer Applications
Veermata Jijabai Technological Institute

Anala A. Pandit, PhD
Department of Computer Applications
Veermata Jijabai Technological Institute

ABSTRACT
In today's world, where the Internet has become a household convenience, online reviews have become a critical tool for businesses to control their online reputation. Reviewing has changed the face of marketing in this new era. Most companies now invest in mining reviews to gain insight into customer preferences and competitive intelligence, and some hire individuals to write fake reviews. Such fraudulent activity misleads potential customers and organizations that are reshaping their businesses, and it prevents opinion-mining techniques from reaching accurate conclusions. It has therefore become essential to detect fake reviews in order to surface the true opinion of a product. This paper focuses on product reviews and on detecting fake (spam) reviews among them using supervised learning, with synthetic fake reviews (created to cover all types) serving as the training set. It studies the impact of two features, term frequency and user review frequency, on the classification model, and classifies the reviews to test the model's accuracy. The results are encouraging, with an accuracy of over 98%.

General Terms
Review spam, opinion mining, fake review detection, fake reviews, opinion spam

Keywords
Review spam, Opinion mining, fake reviews, Naïve Bayes classification, Opinion Spamming, Random Forest Classifier, Classification Model Evaluation Measures

1. INTRODUCTION
The Internet has vastly changed not only customers' perspective on buying online but also business processes. One could say there are two worlds: one before e-commerce and one after it.
Nowadays, customers prefer to buy most products and services through e-commerce or online portals. These portals have given rise to a new means of marketing and of influencing customers' decisions: reviews. A review is any view or opinion about a product or service expressed by an individual who is usually not associated with the business. Reviews that appear on a website are specifically referred to as user-generated content (UGC) [2]. Reviews offer a new way to learn about customer preferences, product quality, and a product's shortcomings. A review left online is a permanent record of that customer's experience; it can be found by anyone and can reach a far wider audience than ever before. Today, almost every online portal lets users post reviews and images and express their own views about products or services in blogs, forums, or dedicated review websites such as Zomato and Yelp. This user-generated content can be used to discover customers' preferences and the strengths and weaknesses of a product, to study market conditions, to identify opportunities for new product launches, and to devise strategies for winning against competitors.

The ease of monetizing the intelligence obtained from reviews has led to the problem of opinion spam, that is, the creation of fake reviews. Companies hire spammers to write undeserved positive reviews to promote their products or negative reviews to destroy a competitor's reputation. Unfortunately, driven by the desire for profit or publicity, fraudsters have produced deceptive (spam) reviews [1].

Various reasons motivate people to write a review: the desire to effect a change in the business, product, or service; anger at a poor product or service; delight at a great one; or a product or service that was not as expected. The reason could also be an inherent desire to help the public, for instance when a customer who is an expert in the product wants to share that expertise.
Before making any decision about a product, restaurant, or service, one first checks its reviews [3]. Positive opinions can result in significant financial gains and/or fame for organizations and individuals, which provides a strong incentive for creating review (opinion) spam. Fake reviews can be written by a shop retailer, business personnel, or individuals who maintain their online identity. As reviews have become an important decision-making factor, some businesses hire experts to write spam reviews with the intention of promoting their own image or damaging a competitor's reputation. Two types of fake review are written for this purpose: forged positive reviews, which encourage customers to purchase the product, and undeserved negative reviews, which discourage them.

In this paper, fake review detection is treated as a binary classification problem with two classes: fake and genuine. The paper focuses on detecting fake reviews in a set of product reviews by simulating spam reviews that incorporate various types of opinion-spam features, building a training set from them, and then classifying the reviews using Naïve Bayes classification and an ensemble classification model, random forest, to test the accuracy of the model. Various features have been considered when classifying fake reviews. In addition, the authors introduce two more:
i. The terms (bag of words) of a review, used as features for classifying it as either fake or genuine.
ii. The user review frequency on the same product, whose impact on the classification model is studied.
Classifying with these features improved the accuracy of the Naïve Bayes classifier by 26%. The F-score rose by 23% for Naïve Bayes and by 1% for the Random Forest classifier.

The remainder of this paper is structured as follows: the next section discusses prior work in the fake review detection domain.
Section 3 describes the cleaning and pre-processing done prior to classification. Section 4 presents the proposed technique for identifying spam reviews. Section 5 gives a brief overview of the dataset, the experiment carried out, and the analysis of the results. Finally, Section 6