© 2015 Ahmed Alsaffar and Nazlia Omar. This open access article is distributed under a Creative Commons Attribution (CC-BY) 3.0 license.

Journal of Computer Science
Original Research Paper

Integrating a Lexicon Based Approach and K Nearest Neighbour for Malay Sentiment Analysis

Ahmed Alsaffar and Nazlia Omar
Center for AI Technology, FTSM University Kebangsaan Malaysia, UKM 43000 Bangi Selangor, Malaysia

Article history
Received: 06-05-2015
Revised: 10-06-2015
Accepted: 16-06-2015

Corresponding Author: Ahmed Alsaffar
Center for AI Technology, FTSM University Kebangsaan Malaysia, UKM 43000 Bangi Selangor, Malaysia
Email: ahmed_saffar5@yahoo.com

Abstract: Sentiment analysis, or opinion mining, refers to the automatic extraction of sentiments from natural language text. Although many studies of sentiment analysis have been conducted, only a limited number have focused on the Malay language. In this article, a new approach for automatic sentiment analysis of Malay movie reviews is proposed, implemented and evaluated. In contrast to most studies, which focus on either supervised or unsupervised machine learning approaches, this research proposes a new model for Malay sentiment analysis based on a combination of both approaches. We used sentiment lexicons in the new model to generate a new set of features to train a k-Nearest Neighbour (k-NN) classifier. We further show that our hybrid method outperforms the state-of-the-art unigram baseline.

Keywords: Malay Sentiment Analysis, Feature Extraction, Machine Learning, Combination Techniques

Introduction

Opinions play a primary role in decision-making processes. Whenever people need to make a choice, they are naturally inclined to seek others' opinions. In particular, when the decision involves consuming valuable resources, such as time and/or money, people rely strongly on their peers' past experiences.
On the other hand, customers can also learn about the positive or negative aspects of different features of products and services from other users' opinions, allowing them to make an educated purchase. Furthermore, applications such as rating movies based on online movie reviews (Pang et al., 2002) could not have emerged without making use of these data. Sentiment analysis has become extremely popular in the last couple of years, and there has been a tremendous amount of research on the topic, which is also known by several other names, including opinion mining and sentiment classification. Generally, sentiment analysis is a special case of text classification that aims to classify the sentiment of subjective texts, usually customer reviews of a product or service. Organizations are looking for opportunities to analyze the personal opinions gathered online about their services and products in order to improve their business outcomes. However, it is difficult to classify the large volume of online user information in a way that accurately reflects users' opinions. Additionally, users express their opinions in free, unstructured text, which makes it harder to determine the polarity of these opinions (Puteh et al., 2013). The majority of studies focus on analyzing users' opinions in the English language; there has been very limited research on sentiment analysis in the Malay language (Samsudin et al., 2013). The main goal of this work is to identify an optimized set of features that enhances Malay sentiment analysis and classification. We take the bag-of-words (unigram) representation as the baseline for sentiment classification. We train a k-Nearest Neighbour (k-NN) classifier on the unigram feature set and compare it against our newly proposed model, which combines lexicon knowledge with a supervised machine learning approach for Malay sentiment analysis and classification.
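To make the comparison between the unigram baseline and lexicon-derived features concrete, the following is a minimal Python sketch of both feature sets feeding the same k-NN classifier. It is an illustration under stated assumptions, not the authors' implementation: the lexicon (POSITIVE, NEGATIVE), the training reviews (TRAIN), whitespace tokenization, Euclidean distance, the three lexicon features (pos, neg, score) and k=3 are all hypothetical choices made here, since this section does not specify the actual lexicons, features or distance measure.

```python
from collections import Counter
import math

# Hypothetical toy Malay sentiment lexicon (illustration only; not the paper's lexicon).
POSITIVE = {"bagus", "hebat", "menarik"}   # good, great, interesting
NEGATIVE = {"teruk", "bosan", "hampa"}     # terrible, boring, disappointing

# Hypothetical toy training reviews with polarity labels.
TRAIN = [
    ("filem ini bagus dan menarik", "pos"),
    ("lakonan hebat cerita bagus", "pos"),
    ("jalan cerita menarik", "pos"),
    ("filem ini teruk dan bosan", "neg"),
    ("saya hampa cerita bosan", "neg"),
    ("lakonan teruk", "neg"),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def unigram_features(text):
    """Baseline bag-of-words: count of each unigram in the review."""
    return Counter(tokenize(text))


def lexicon_features(text):
    """Lexicon-derived features: positive hits, negative hits and their difference."""
    tokens = tokenize(text)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return {"pos": pos, "neg": neg, "score": pos - neg}


def euclidean(a, b):
    """Euclidean distance between two sparse feature dictionaries."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(key, 0) - b.get(key, 0)) ** 2 for key in keys))


def knn_predict(train, query, k=3):
    """Majority vote over the labels of the k training vectors nearest to the query."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def classify(text, extractor, k=3):
    """Vectorize TRAIN and the query with the same extractor, then run k-NN."""
    train = [(extractor(review), label) for review, label in TRAIN]
    return knn_predict(train, extractor(text), k)


if __name__ == "__main__":
    for review in ("cerita hebat dan menarik", "filem bosan dan hampa"):
        print(review, classify(review, unigram_features), classify(review, lexicon_features))
```

Both feature sets plug into the same classifier, which mirrors the experimental setup described above: only the feature extractor changes between the baseline and the lexicon-based model. An odd k is used so that a binary vote cannot tie.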
There are multiple approaches to sentiment analysis (SA), which may be separated into three main categories. Firstly, the supervised machine learning approach has been implemented in numerous studies (Balahur et al., 2014; Pang et al., 2002; Greaves et al., 2012; Kang et al., 2012; Turney, 2002). Secondly, the unsupervised machine learning approach is also a popular technique for sentiment analysis (Gezici et al., 2013).