A NOVEL SUPERVISED AND SEMI-SUPERVISED LEARNING BASED FAKE ONLINE REVIEWS DETECTION SYSTEM

O.V Chakradhar Reddy*, S. Ramesh Babu**

*P.G. Scholar (M. Tech), Dept. of Computer Science Engineering, Srinivasa Institute of Technology and Science, KADAPA - 516002. Email Id: reddychakri1266@gmail.com
**HOD, Dept. of Computer Science Engineering, Srinivasa Institute of Technology and Science, KADAPA - 516002. Email Id: babu.ramesh19@gmail.com

Abstract: Online reviews have a great impact on today's business and commerce. Decisions to purchase products online depend mostly on reviews given by other users. Hence, opportunistic individuals or groups try to manipulate product reviews for their own interests. This paper introduces several semi-supervised and supervised text mining models to detect fake online reviews and compares the efficiency of both techniques on a dataset containing hotel reviews.

1. INTRODUCTION

Technologies are changing rapidly. Old technologies are continuously being replaced by new and more sophisticated ones, which enable people to get their work done efficiently. One such evolution of technology is the online marketplace: we can shop and make reservations through websites. Almost every one of us checks reviews before purchasing a product or service. Hence, online reviews have become a major source of reputation for companies, and they have a large impact on the advertisement and promotion of products and services. With the spread of online marketplaces, fake online reviews are becoming a matter of great concern. People can post false reviews to promote their own products, which harms genuine users. Competing companies can also try to damage each other's reputation by posting fake negative reviews. Researchers have studied many approaches for detecting these fake online reviews. Some approaches are based on review content, and some are based on the behavior of the user who posts the reviews.
Content-based study focuses on what is written in the review, that is, the text of the review, whereas user-behavior-based methods focus on the country, IP address, number of posts of the reviewer, etc. Most of the proposed approaches are supervised classification models. A few researchers have also worked with semi-supervised models. Semi-supervised methods are being introduced because reliable labels for reviews are scarce. In this paper, we present several classification approaches for detecting fake online reviews, some of which are semi-supervised and others supervised. For semi-supervised learning, we use the Expectation-Maximization algorithm. The statistical Naive Bayes classifier and Support Vector Machines (SVM) are used as classifiers in our research work to improve classification performance. We have mainly focused on content-based approaches. As features, we have used word frequency count, sentiment polarity and length of the review.

In the present scenario, customers increasingly depend on reviews when deciding to buy products, whether on ecommerce sites or in offline retail stores. Because these reviews can decide the success or failure of a product's sales, they are being manipulated to convey positive or negative opinions. Manipulated reviews are also referred to as fake or fraudulent reviews, opinion spam, or untruthful reviews. In today's digital world, deceptive opinion spam has become a threat to both customers and companies. Distinguishing these fake reviews is an important and difficult task. Deceptive reviewers are often paid to write such reviews. As a result, it is a herculean task for an ordinary customer to differentiate fraudulent reviews from genuine ones by looking at each review. There have been serious allegations that multinational companies indulge in defaming competitors' products in the same sector.
A recent investigation conducted by Taiwan's Fair Trade Commission revealed that Samsung's Taiwan unit, called OpenTide, had hired people to write online reviews criticizing HTC and recommending Samsung smartphones. The people who wrote the reviews highlighted what they described as flaws in HTC devices and suppressed any negative features of Samsung products. Recently, ecommerce giant amazon.com admitted that it had fake reviews on its site and sued three websites, accusing them of providing fake reviews and demanding that they stop the practice. Fakespot.com has taken a lead in detecting fake reviews of products listed on amazon.com and its subsidiary ecommerce sites by reporting the percentage of fake reviews and a grade.

Reviews and ratings can directly influence customer purchase decisions. They are crucial to the success of businesses. While positive reviews with good ratings can bring financial gains, negative reviews can harm reputation and cause economic loss. Fake reviews and ratings can tarnish a business and affect how others view or purchase a product or service. It is therefore critical to detect fake or fraudulent reviews.

The International Journal of Analytical and Experimental Modal Analysis, Volume XII, Issue VIII, August/2020, ISSN NO: 0886-9367, Page No: 1410

Traditional methods of