IJIRST – International Journal for Innovative Research in Science & Technology | Volume 1 | Issue 12 | May 2015
ISSN (online): 2349-6010
All rights reserved by www.ijirst.org

A Novel Approach to Improve Spam Detection using SDS Algorithm

Jibi G Thanikkal, PG Student, Department of Computer Science & Engineering, Al-Falah University, Faridabad, India
Mohammad Danish, Assistant Professor, Department of Computer Science & Engineering, Al-Falah University, Faridabad, India

Abstract
E-shopping is a form of electronic commerce that allows consumers to buy goods or services directly from a seller over the Internet using a web browser. The popularity of e-shopping has made the web an excellent source of customer opinions about products. Positive opinions bring significant business growth and financial gains; similarly, negative opinions cause sales losses and damage companies' reputations. Although web spam and email spam have been investigated extensively, there is no reported study on assessing the trustworthiness of opinions, which is crucial for all opinion-based applications. Existing research focuses more on the classification and summarization of online opinions. In this work, an attempt has been made to detect whether an opinion or review is spam or non-spam, in order to give customers a trusted view that helps them make decisions. Each review is assessed as spam or non-spam: duplicate and near-duplicate reviews are classified as spam, while partially related and unique reviews are classified as non-spam. The proposed method improves the spam detection system using the SDS algorithm. Experimental results demonstrate the effectiveness of the proposed technique in detecting spam and non-spam reviews.

Keywords: Spam review, similarity, E-shopping, social media, opinion mining

I. INTRODUCTION

Shopping was once primarily a sensory activity, but it has taken on a new dimension with the rapid growth of online shopping (E-shopping) sites. One of the great benefits of online shopping is the ability to read product reviews written either by experts or by fellow online shoppers. User reviews play a critical role in helping shoppers pick one item over another. A survey of online shopping websites, and of the other factors that come into play when consumers decide which product or service to spend their money on, revealed that reviews play an important role in influencing consumers who make purchases online [1]. In recent years, the number of consumer reviews has increased dramatically [2]. Such opinions, which originate from users' experiences with specific products, directly influence future customers' purchase decisions [3]. In other words, opinionated postings can lead prospective purchasers to make or reverse purchase decisions. For instance, a large proportion of favorable reviews attracts more customers to a particular product or brand. Positive reviews bring significant business growth and financial gains; similarly, negative reviews cause sales losses and damage companies' reputations [4, 5]. There is also a growing trend of merchants relying on the general public's opinions to reshape their businesses by improving their products, services, and marketing [6, 7]. Because nothing controls the quality of the reviews that are posted, assessing their trustworthiness is a challenging problem. This lack of control results in many low-quality reviews and in review spam. Spam reviews can mislead readers, and detecting them is an active research topic. Typically, a review consists of an overall product score and free-form text that allows the reviewer to describe their experience with the product or service in question.
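The abstract divides reviews into four groups: duplicate and near-duplicate reviews are treated as spam, and partially related and unique reviews are treated as non-spam. This division depends on measuring how similar a review's free-form text is to the text of other reviews. The SDS algorithm is not defined in this section, so the following is only a minimal sketch of a generic similarity-threshold classifier, not the authors' method. It assumes Jaccard similarity over lower-cased word sets. The threshold values NEAR_DUPLICATE and PARTIAL, and the helper names tokens, jaccard, categorize, and label_reviews, are hypothetical choices made for illustration.

```python
import re

# Hypothetical thresholds, chosen for illustration only (not from the paper).
NEAR_DUPLICATE = 0.8
PARTIAL = 0.3


def tokens(text: str) -> set[str]:
    """Return the set of lower-cased words in a review, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two word sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def categorize(similarity: float) -> str:
    """Map a similarity score to one of the four review categories."""
    if similarity == 1.0:
        return "duplicate"  # identical word sets
    if similarity >= NEAR_DUPLICATE:
        return "near-duplicate"
    if similarity >= PARTIAL:
        return "partially related"
    return "unique"


def label_reviews(reviews: list[str]) -> list[tuple[str, str]]:
    """Label each review using its highest similarity to any *other* review."""
    token_sets = [tokens(r) for r in reviews]
    labels = []
    for i, ti in enumerate(token_sets):
        best = max(
            (jaccard(ti, tj) for j, tj in enumerate(token_sets) if j != i),
            default=0.0,
        )
        category = categorize(best)
        # Duplicates and near-duplicates count as spam; the rest as non-spam.
        verdict = "spam" if category in ("duplicate", "near-duplicate") else "non-spam"
        labels.append((category, verdict))
    return labels


reviews = [
    "Great phone, excellent battery life, highly recommend!",
    "Great phone, excellent battery life, highly recommend!",
    "Great phone, excellent battery life, highly recommend it!",
    "The camera is poor in low light and the screen scratches easily.",
]

for text, (category, verdict) in zip(reviews, label_reviews(reviews)):
    print(f"{verdict:8} {category:17} {text}")
```

With these inputs, the first two reviews are labelled duplicates. The third review adds only the word "it", so its similarity is 7/8 = 0.875 and it is labelled a near-duplicate. The fourth review shares no words with the others and is labelled unique. Each review is scored by its maximum similarity to any other review, so both members of a duplicate pair are flagged. Deciding which copy is the original would require extra information, such as posting time.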
Web users can post product reviews on merchant sites to express their views, and they can interact with other users via blogs and forums. A reviewer typically gives both a written review and a star rating for the product. It is now well recognized that user-generated content contains valuable information that can be exploited for many applications [7]. Spam is a serious and widespread problem, whether in e-commerce reviews or in other fields, and it affects all computer users. It harms not only ordinary Internet users but also companies and organizations, costing them large amounts of money in lost productivity and wasting users' time and network bandwidth. Many studies on spam indicate that spam costs organizations billions of dollars yearly [8]. Recently, all major search engine companies have identified adversarial information retrieval [9] as a top priority, because of the many negative effects of spam and the new challenges that have appeared in this area of research. There are not many published studies on review spam, although web spam and email spam have been investigated extensively. Existing work has mainly focused on extracting and summarizing opinions from reviews using natural language processing and data mining techniques [2, 10]. Web page spam is widespread because of the economic and publicity value of a page's rank position in search engine results. Web page spam boosts the rank positions of some target pages in search engines