International Journal of Computer Applications (0975 – 8887) Volume 158 – No 5, January 2017
Sentiment Classification of Hotel Reviews in Social Media with Decision Tree Learning
Stanimira Yordanova, Ph.D. Student at University of National and World Economy, Studentski grad, Sofia 1000, Bulgaria
Dorina Kabakchieva, Assist. Professor, Ph.D. at University of National and World Economy, Studentski grad, Sofia 1000, Bulgaria
ABSTRACT
The aim of this paper is to present an approach for predicting customer opinion, using a supervised machine learning approach and the decision tree method to classify online hotel reviews as positive or negative. The preliminary extraction and preparation of the data used in the research are described. Three classification models are generated for three different data sets: balanced and unbalanced training sets with two schemes of filtering frequent and infrequent words in the attribute list. The results from the classifier evaluation are compared and discussed. The three classification models are also applied to new unseen data to predict the opinion of hotel guests. The results show that the most accurate prediction is obtained with the model generated from the balanced training set with filtering of rare words.
General Terms
Sentiment classification, Hotel Industry, Online reviews
Keywords
Sentiment classification, supervised machine learning, decision tree
1. INTRODUCTION
At present, one of the main challenges a business organization faces is to gather and use, in a cost-effective and timely manner, all relevant information in order to acquire reliable and meaningful insights that support an effective decision-making process. Business Intelligence (BI) systems provide tools, methods, and technologies, and are a reliable instrument for responding to such challenges; as a result, more businesses recognize their value and the need to use them in decision making.
Traditional BI systems process structured data coming from various sources, apply advanced analytical tools, and visualize the results interactively to help business users discover new and beneficial business knowledge. Advanced BI systems also process unstructured data, which come not only from internal organizational sources (emails, reports, etc.) but from social media as well.
A very popular definition of social media, provided by Kaplan and Haenlein in 2010, is "a group of Internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of User Generated Content" [1]. Reviews, comments, blogs, microblogs, and forum posts are user generated content in the form of unstructured text data, published on social media and expressing opinions on topics, products, services, people, or organizations. Sharing experience of using products or services on social media sites increases the volume of unstructured data from which new business knowledge can be extracted. For most industries that offer products or services, understanding customer experience becomes crucial for improving corporate performance and remaining competitive on the market.
Reviews are very popular among hotel customers and extremely important for the hotel industry. On the one hand, hotel guests share their experience of using hotel services on review sites like TripAdvisor and Booking.com, thus influencing both the booking decisions of future hotel guests and the hotel's online reputation. On the other hand, negative social media feedback is a valuable source for guiding improvements in the provision of hotel services, while maintaining a positive online reputation has a direct impact on decisions to purchase hotel services. Managing online reputation implies monitoring the positive and negative reviews published on different social media sources.
Some review sites, such as Booking.com, contain positive and negative feedback labeled by the review authors, while others, such as TripAdvisor.com, do not provide such an option. The first challenge when analyzing hotel guest responses is to predict the opinion of an author, expressed in the hotel review, by classifying it as positive or negative feedback. This challenge can be addressed by applying sentiment analysis. The second challenge is to visualize the results in order to extract business knowledge, which is achieved by using Business Intelligence tools.
This paper focuses on the implementation of a methodology for sentiment classification and prediction of the opinion of hotel guests, published in the review sections of hotel travel and accommodation sites. The generated models for prediction of online hotel reviews are presented and compared. Conclusions from the experimental cases are provided at the end, together with an outline of the future research activities that will be performed.
2. PROBLEM DEFINITION
Discovering valuable knowledge from reviews requires, as a first step, structuring the unstructured user generated content, and then analyzing the data and visualizing the results in a way that business users can understand and use. Text mining includes methods and tools for structuring and analyzing the unstructured text content generated by hotel reviewers. The process of knowledge discovery from the text content of hotel reviews covers (1) gathering and organizing text documents in a corpus; (2) using different techniques for text preprocessing, aimed at structuring the data and extracting key representative features; and (3) extracting knowledge using data mining algorithms. When classification algorithms are used to predict the sentiment expressed in a document, sentiment analysis is applied.
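The three steps listed above can be illustrated with a minimal Python sketch. It is a toy example under stated assumptions, not the implementation used in this research: the six reviews are invented, the thresholds `min_df` and `max_df_ratio` are hypothetical, all function names are illustrative, and a one-level decision tree (a decision stump chosen by information gain) stands in for a full decision tree learner.

```python
import math
import re
from collections import Counter

# Step (1): gather documents into a corpus.
# Assumption: these labeled reviews are invented for illustration only.
corpus = [
    ("The room was clean and the staff were friendly", "positive"),
    ("Great location and friendly staff", "positive"),
    ("Clean room, friendly reception, great breakfast", "positive"),
    ("The room was dirty and the staff were rude", "negative"),
    ("Dirty bathroom and noisy room", "negative"),
    ("Rude staff, noisy street", "negative"),
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(docs, min_df=2, max_df_ratio=0.6):
    """Step (2): keep words that are neither too rare nor too frequent.

    min_df and max_df_ratio are hypothetical thresholds on the number
    of documents a word occurs in.
    """
    df = Counter(word for doc in docs for word in set(tokenize(doc)))
    limit = max_df_ratio * len(docs)
    return sorted(w for w, n in df.items() if min_df <= n <= limit)

def vectorize(text, vocabulary):
    """Represent a document as binary word-presence attributes."""
    tokens = set(tokenize(text))
    return [1 if word in tokens else 0 for word in vocabulary]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def best_split(rows, labels):
    """Return the attribute index with the highest information gain."""
    base = entropy(labels)
    best_gain, best_idx = -1.0, None
    for idx in range(len(rows[0])):
        gain = base
        for value in (0, 1):
            subset = [l for r, l in zip(rows, labels) if r[idx] == value]
            if subset:
                gain -= len(subset) / len(labels) * entropy(subset)
        if gain > best_gain:
            best_gain, best_idx = gain, idx
    return best_idx, best_gain

def train_stump(rows, labels):
    """Step (3): learn a one-level decision tree (decision stump)."""
    idx, _ = best_split(rows, labels)
    branches = {}
    for value in (0, 1):
        subset = [l for r, l in zip(rows, labels) if r[idx] == value]
        # Fall back to the overall majority if a branch has no rows.
        pool = subset or labels
        branches[value] = Counter(pool).most_common(1)[0][0]
    return idx, branches

texts = [text for text, _ in corpus]
labels = [label for _, label in corpus]
vocabulary = build_vocabulary(texts)
rows = [vectorize(text, vocabulary) for text in texts]
split_index, branches = train_stump(rows, labels)

def predict(text):
    """Classify a new review using the learned stump."""
    return branches[vectorize(text, vocabulary)[split_index]]

print("split word:", vocabulary[split_index])
print(predict("Friendly staff and a clean room"))
```

A full decision tree would apply the same information gain criterion recursively to each branch; the stump is kept here only to make the relationship between the filtered word attributes and the learned rule easy to follow.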