Product Features based Sentiment Analysis from Twitter

Neamat El-Tazi #1, Abla Lotfy #2
Information Systems department, Faculty of Computers and Artificial Intelligence, Cairo University, Giza, Egypt
1 n.eltazi@fci-cu.edu.eg
2 abla@fci-cu.edu.eg

Abstract: People's opinions are considered the most powerful source for market research. Social media has become an easy-to-use tool with a huge number of users, who share their opinions about products or services, their thoughts about current problems of society, and their views on political and religious issues. The knowledge extracted from social media contains sentiment data (which is not included in corporate databases) that can be used to improve marketing campaigns in order to retain customers and better meet their needs. Integrating social media data with corporate data can lead to insights that would not be possible to gain without such integration. In this paper, we use Twitter as the social media source platform to perform feature-level sentiment analysis on tweets that contain opinions about a specific product. The research discusses three ways to extract (feature/opinion) pairs from each text: Normal Tokenization, N-gram Modeling Extraction, and Noun Chunking Extraction. The opinion phrase extracted for each feature is classified using a sentiment classification algorithm. The best of the three ways is chosen according to the resulting measurements. SCDJF was evaluated using multiple techniques. The best results were obtained with Noun Chunking Extraction, with an accuracy of 77%. A summarization of the results shows how this can be used to enhance the decision-making process of the organization.

I. INTRODUCTION

Business Intelligence and Analytics are considered the processes of extracting and predicting critical insights from the different available types of data that are important to the business. User-generated content is considered the major component of social media that defines the characteristics of Web 2.0. To develop information connections, individuals use a variety of technologies to access content and join virtual communities on various social networking sites. Most researchers and organizations are interested in individuals' perception of social networking sites, taking into consideration the dimensions of ease of use, usefulness, feeling, usage intention, and information quality. With the rise of this type of big data, there are many opportunities and challenges for business intelligence, especially with the growing number of data sources.

Collecting social media data is considered the basis of the analysis process, as it has become a necessity for most companies to monitor social media reviews of their products and services. To do so, companies started manually reviewing mentions of their brands on different social media platforms. The manual approach proved not to scale to the huge number of reviews, and it does not enable companies to detect real-time customer insights or to engage with social customers in a relevant and timely manner. Moreover, for companies to gain insights from both social media and their corporate data, data practices need to be adapted and extended to join both types of data. The resulting joined database can then be easily accessed by business intelligence tools for querying and visualization. Organizations spend a lot of money and time surveying their products to get customers' feedback, so that they can learn the defects of the system and plan future enhancements.
As a result, there has been a tremendous need to design methods and algorithms that can effectively process a wide variety of text applications. Given a dataset of texts containing opinions about a specific product, the target is to extract useful opinions about product features, define the polarity of each opinion, and represent the results in a format that is easy to digest.

The objective of this work is to study sentiment analysis at the feature level. Every product feature is first extracted from each opinion text, which may contain more than one feature, each with its related opinion word. The polarity of the opinion attached to each feature, which can be negative or positive, is then defined separately. Consequently, each text may contain more than one sentiment, and these sentiments may contradict each other.

Given a collection of opinionated texts about a specific product, the main aim is to extract the features stated in each text together with the opinion word that describes each feature, filter the feature-opinion pairs based on the frequency of each feature, identify the associated sentiments, join the resulting structured data with the corporate database, and display a summary of the obtained results. In this work, product features are identified as explicitly mentioned features that appear as nouns or noun phrases in the opinion text. Fig. 1 shows the general architecture of the generated business intelligence framework.

International Journal of Computer Science and Information Security (IJCSIS), Vol. 18, No. 8, August 2020 https://doi.org/10.5281/zenodo.4012460 62 https://sites.google.com/site/ijcsis/ ISSN 1947-5500
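
The filtering and summarization steps described in the introduction can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this paper. It assumes that the (feature/opinion) pairs have already been extracted from each text, and it replaces the sentiment classification algorithm with a tiny hand-made word list. The names POSITIVE, NEGATIVE, polarity, summarize, and min_count, the "neutral" fallback label, and the sample data are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon (an assumption of this sketch); a real system
# would use a trained sentiment classifier or a full opinion lexicon.
POSITIVE = {"great", "good", "amazing", "fast"}
NEGATIVE = {"bad", "poor", "slow", "terrible"}


def polarity(opinion_word):
    """Map one opinion word to a polarity label using the toy lexicon."""
    word = opinion_word.lower()
    if word in POSITIVE:
        return "positive"
    if word in NEGATIVE:
        return "negative"
    # Fallback label for unknown words; this is an assumption of the sketch.
    return "neutral"


def summarize(pairs_per_text, min_count=2):
    """Keep frequent features and count the polarities attached to each one.

    pairs_per_text: one list of (feature, opinion) tuples per opinion text.
    min_count: hypothetical frequency threshold for keeping a feature.
    """
    # Count how often each feature occurs across all extracted pairs.
    freq = Counter(
        feature.lower() for pairs in pairs_per_text for feature, _ in pairs
    )
    summary = defaultdict(Counter)
    for pairs in pairs_per_text:
        for feature, opinion in pairs:
            feature = feature.lower()
            if freq[feature] >= min_count:
                summary[feature][polarity(opinion)] += 1
    return {feature: dict(counts) for feature, counts in summary.items()}


# Made-up example: each inner list holds the pairs extracted from one tweet.
tweets = [
    [("camera", "great"), ("battery", "poor")],
    [("camera", "amazing")],
    [("battery", "bad"), ("screen", "good")],
]
result = summarize(tweets)
print(result)
```

In this made-up example the first tweet carries two contradicting sentiments, one per feature, and the feature "screen" is dropped because it occurs only once. The per-feature counts form the kind of structured record that could then be joined with a corporate database.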