(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 9, No. 7, 2018

Detection of Sentiment Polarity of Unstructured Multi-Language Text from Social Media

Saad Ahmed, Saman Hina, Raheela Asif
Department of Computer Science
NED University of Engineering and Technology
Karachi, Pakistan

Abstract—In recent years, Twitter has attracted the attention of many researchers because its user base is growing very rapidly and because all data posted as tweets on Twitter is public, whereas on other social media networks such as Facebook the data is not completely public, since users can restrict their posts to the users in their friend list. In this research study, aspect-based sentiment analysis (ABSA) was performed on data acquired from social media related to the major cellular network companies of Pakistan (Telenor Pakistan, Mobilink Jazz, Zong, Warid and Ufone). For this research, we specifically selected all tweets written in English, in Roman Urdu, or in a mixture of the two languages. We employed natural language processing (NLP) techniques to pre-process the dataset and machine learning (ML) techniques to detect the sentiments present in the data. The results are interesting and informative, especially for policy makers of cellular companies, who can use this information to improve the performance of their services. In comparison with state-of-the-art algorithms, the bagging algorithm with this framework produced an F-score of 92.25 on the acquired dataset, which is a very encouraging outcome of this research work.

Keywords—Social media; sentiment analysis; data mining; cellular networks

I. INTRODUCTION

As science and technology continue to advance, research plays a vital role in every related field. This research was carried out on social media data associated with the telecommunication domain.
Twitter, a micro-blogging website, is one of the mainstream social media websites and has seen tremendous growth in the last few years. In a developing country like Pakistan, ordinary people have now gained access to the Internet and are learning the advantages of social media as a source of information, as well as using it to express their views and ideas about politics, products and services. This makes social media a major source of user-generated information and therefore a valuable source of data for opinion mining and sentiment analysis of the general public. In the last few years, researchers have been working on social media data to extract information and then analyze it using different techniques. Some methods of sentiment analysis have been developed for different domains, but a lot of research still needs to be done.

Social media has become a vital part of everyday life, where users can express their ideas, views or comments about any product or service [1]. These views and comments are very important for the companies that provide those products and services. Information from social media can help these companies refine their strategies for improving their products and services.

In this research, data is extracted from Twitter, a real-time micro-blogging social media network. Twitter generates a huge amount of data, which is extremely valuable for data mining and for analyzing public sentiment. The simplicity of posting tweets makes Twitter a suitable data source for real-time sentiment analysis. Twitter has more than 300M active users, who post about 500M tweets in a single day. This huge volume of user-generated data is public and is easily available through APIs (Application Program Interfaces) to anyone who wants to use it for analysis. That is why Twitter is very popular among researchers.
Twitter has several distinctive features: tweets are limited to a maximum of 140 characters, and users employ mentions (@) and hashtags (#) to refer to a particular event or company in their tweets. These features can be used to collect tweets related to a particular event or company. The short length of tweets and the use of local languages and local terms make it more challenging to analyze them and to find the sentiments and possible aspects present in them. Twitter is an important source for data acquisition, but analyzing its content is very complex, as a large number of tweets use slang or shortened words.

Sentence-level and word-level polarity classification [2] was done using a lexicon-based method, namely SentiCircles, which builds a dynamic depiction of words in order to determine their suitable semantics. Here, semantics refers to the co-occurrence patterns of each word in the dataset. A different method is feature engineering [3], which produces a result of seven dimensions. This feature engineering method was used to analyze aspects: frequency, affinity, valence, shifter, feature sentiment scoring and categorization. Different types of representations can be utilized, based on dictionaries and lexical aspects of sentences [4], word embeddings [5], and word and character n-grams [6], among others.

The extraction and classification of user opinions on diverse topics is known as sentiment analysis, which is also referred to as opinion mining. Mostly, two kinds of methods are used for sentiment analysis: those based on machine learning and those based on vocabulary. The machine learn-