(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 7, 2020

Effective Opinion Words Extraction for Food Reviews Classification

Phuc Quang Tran¹, Department of Foreign Languages and Informatics, People’s Police College II, HCM City, Vietnam
Ngoan Thanh Trieu², College of Information and Communication Technology, Can Tho University, Can Tho, Vietnam
Nguyen Vu Dao³, Department of Physical Education, Can Tho University, Can Tho, Vietnam
Hai Thanh Nguyen⁴, College of Information and Communication Technology, Can Tho University, Can Tho, Vietnam
Hiep Xuan Huynh⁵, College of Information and Communication Technology, Can Tho University, Can Tho, Vietnam

Abstract—Opinion mining (also known as sentiment analysis or emotion artificial intelligence) plays an important role in e-commerce and benefits numerous businesses and organizations. It uses natural language processing, text analysis, computational linguistics, and biometrics to give businesses valuable insights into how people feel about a product, brand, or service. In this study, we investigate the Amazon Fine Food Reviews dataset, which contains about 500,000 reviews, and propose a method to transform reviews into features, including Opinion Words, that machine learning algorithms can then use for review classification. From the obtained results, we evaluate which Opinion Words are informative for identifying whether a review is positive or negative.

Keywords—Review classification; opinion words; machine learning; important features; Amazon

I. INTRODUCTION

Along with the strong development of the Internet and e-commerce applications, social media such as reviews, forums, blogs, and Facebook are increasingly popular.
To effectively exploit the opinion data that users produce when they evaluate products or express their views on issues of interest, and thereby support decisions suited to individuals and organizations, an opinion mining (or sentiment analysis) system is regarded as a decision support tool. The main purpose of opinion mining [19][20] is to analyze and quantify human viewpoints, assessments, attitudes, and emotions about objects such as products, services, organizations, individuals, problems, events, topics, and their various aspects. Opinion mining is divided into three levels as follows:

Document-based opinion mining: this is the simplest level of opinion mining, in which the document is assumed to contain a single point of view, expressed by its author, about one main object. There are two main ways to perform document-based opinion mining: supervised and unsupervised learning.

Sentence-based opinion mining: a single document can contain multiple opinions, even about the same entity. When a more detailed analysis of the various opinions expressed in a document is sought, sentence-level opinion mining is carried out.

Aspect-based or feature-based opinion mining: this research problem focuses on identifying all expressions of sentiment in a given document and the aspects they refer to. The previous two levels work well when the entire document or each sentence refers to a single entity. However, in many cases an entity has many aspects (attributes), and different opinions are expressed about each of them. This usually happens in product reviews or in discussion forums dedicated to specific product categories.

Currently, the main approaches to building an opinion mining system include the lexicon-based approach [21], the machine learning-based approach [22], the hybrid approach [22], and, more recently, the deep learning-based approach [24].
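To make the difference between document-level and sentence-level mining concrete, the following is a minimal sketch that applies a toy lexicon-based scorer at both levels. The lexicon contents and the names `LEXICON`, `score`, `label`, `document_level`, and `sentence_level` are hypothetical, introduced only for illustration; this is not the method proposed in this study.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only);
# a real system would rely on a curated sentiment dictionary.
LEXICON = {
    "delicious": 1, "great": 1, "love": 1, "fresh": 1,
    "stale": -1, "bland": -1, "awful": -1, "broken": -1,
}


def score(text):
    """Sum the polarity of every lexicon word found in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(tok, 0) for tok in tokens)


def label(value):
    """Map a numeric score to a polarity label."""
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "neutral"


def document_level(review):
    """Document-level mining: one label for the whole review."""
    return label(score(review))


def sentence_level(review):
    """Sentence-level mining: one label per sentence (split on . ! ?)."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", review) if s.strip()]
    return [(s, label(score(s))) for s in sentences]


review = "The cookies were delicious and fresh. The packaging was broken."
print(document_level(review))
for sentence, polarity in sentence_level(review):
    print(polarity, "|", sentence)
```

In this example the document-level label hides the negative opinion about the packaging, which the sentence-level pass recovers. Aspect-based mining would go one step further and attach each opinion to the aspect it targets (here, cookies versus packaging).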
For lexicon-based approaches, a sentiment dictionary and sentiment words are used to determine polarity. There are three techniques [20] for building a sentiment lexicon: manual, corpus-based, and dictionary-based. These methods have the advantage that the sentiment lexicon covers broad knowledge. However, the number of words in the lexicon is finite, and the sentiment score is permanently assigned to each word regardless of the text in which it appears [23].

The machine learning-based approach uses classification techniques to classify opinions and relies on two data sets: a training set and a test set. The training set is used to learn the distinguishing characteristics of documents, while the test set is used to evaluate the effectiveness of the classifier. Machine learning methods for opinion classification include probabilistic classifiers such as Naive Bayes, Bayesian Network, and Maximum Entropy [25]; decision tree-based classifiers [26]; linear classifiers such as SVM (Support Vector Machine) [25] or Neural Network; and rule-based classifiers. Machine learning-based approaches are adaptable and create models tailored to a specific context. However, their applicability to new data is low because they require labeled data, which can be expensive to obtain, and the learning ability of the models is weak, so the predictive accuracy is not high.

Hybrid approaches [23] combine machine learning and lexicon-based approaches to improve classification performance. However, the drawback of this method is that assessment documents with a lot of noise (words not related to the entity or aspect of the assessment) are usually