Vol.9 (2019) No. 6
ISSN: 2088-5334

Investigating the Relevant Agro Food Keyword in Malaysian Online Newspapers

Mohamad Farhan Mohamad Mohsin#, Siti Sakira Kamaruddin#, Fadzilah Siraj#, Hamirul Aini Hambali#, Mohammed Ahmed Taiye#

# School of Computing, College of Arts & Sciences, Universiti Utara Malaysia, Kedah, Malaysia
E-mail: farhan@uum.edu.my; sakira@uum.edu.my; fad173@uum.edu.my; hamirul@uum.edu.my; tfeatslekan@gmail.com

Abstract— Online newspapers are a valuable resource of information for decision making. Extracting relevant information from them is challenging when their volume is massive and their knowledge is in an unstructured form scattered across every page. The situation becomes more complicated when different news providers have different styles of journalism and use different concepts and terms when reporting a similar event. In this study, we examined three Malaysian English online newspapers in order to identify the most relevant keywords used in daily online news. News articles related to the Agro-food industries were taken from three online news websites: The Star Online, The Sun Daily, and The News Straits Times. During the extraction, about 458 Agro-food industry news articles published within 2014-2017 were scraped from the websites. The keywords were extracted using the RAKE algorithm and were classified into four groups, i.e., agriculture, livestock, fishery, and miscellaneous. The agriculture group contained the most frequent keywords across all newspapers (58%), followed by livestock (23%), fishery (12%), and miscellaneous (7%). The analysis found 146 Agro-related keywords across all newspapers, repeated 720 times; the highest share of Agro terms was found in The Star Online (35.13%), followed by The Sun Daily (33.78%) and The News Straits Times (31.08%).
Twelve Agro keywords were considered the most relevant because they appear in all newspapers: palm oil, rice, fruits, fish, vegetable, livestock, paddy, crop, chicken, animal, meat, and beef. ‘Palm oil’ is the most popular keyword among the three newspapers: it was found 37 times (38.9%) in The Star Online, 26 times (37.9%) in The News Straits Times, and 22 times (23.2%) in The Sun Daily. The identified keywords can be recommended as input for a future Agro inventory.

Keywords— agro-food keywords; news mining; RAKE algorithm; text mining; online newspaper.

I. INTRODUCTION

Across the world, the internet has enabled various tasks to be performed through a computer or smartphone. With fast internet and diverse computer networking technologies, people are using the internet to promote business, communicate, search for information, and keep abreast of the latest news. Journalism through online newspaper publication is one of the areas that has grown rapidly due to the internet explosion. With the widespread practice of online newspapers since 1970 [1], society has been able to receive environmentally friendly, free, and instant interactive news updates that can be produced within a short time. Although online news does not provide a detailed description of an event, it offers a quick synopsis of what happened [2]. Online newspapers also benefit news providers through fewer barriers to entry, more extensive distribution coverage, and lower distribution costs. The online newspaper is a valuable resource of information, not only for updating readers about what has happened but also for providing input for decision making through the news mining approach. Because news is published daily, its volume is massive; thus, processing and analyzing it becomes more challenging.
Beyond volume, each newspaper page often contains many unrelated topics and unstructured knowledge scattered across the page, which makes extraction difficult [3]. The information in newspapers comprises cultural, social, and historical facts of specific regions that bring value to readers and interested parties, but this knowledge is difficult to extract from the newspapers [4]. Another challenge is handling the different words and concepts that arise from different journalism practices: news providers tend to use different words and concepts when reporting a similar event, even though they refer to similar contexts.
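
To make the RAKE keyword-extraction step named in the abstract more concrete, the following is a minimal sketch of the general RAKE idea in Python: text is split into candidate phrases at punctuation and stopwords, each word is scored as degree divided by frequency, and a phrase score is the sum of its word scores. This is not the implementation used in the study, which is not given in this section. The stopword list, the function names `candidate_phrases` and `rake_scores`, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; a real run would use a full list).
STOPWORDS = {
    "a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "are",
    "was", "were", "by", "with", "as", "at", "from", "that", "this", "it",
}


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    # Punctuation (anything that is not a word character, space, or hyphen)
    # ends a fragment; stopwords end a phrase inside a fragment.
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text):
    """Return a dict mapping each candidate phrase to its RAKE-style score."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # how often each word occurs in candidate phrases
    degree = defaultdict(int)  # summed length of the phrases a word occurs in
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    return {" ".join(p): sum(word_score[w] for w in p) for p in set(phrases)}


# Hypothetical sample sentence, not taken from the study's corpus.
sample = ("Palm oil exports rose. The price of palm oil is stable, "
          "and rice farmers welcomed the subsidy.")
for phrase, score in sorted(rake_scores(sample).items(), key=lambda kv: -kv[1]):
    print(f"{score:5.1f}  {phrase}")
```

In a pipeline like the one the abstract describes, the highest-scoring phrases from each article would then be assigned to a group (agriculture, livestock, fishery, or miscellaneous) and counted per newspaper. How that assignment was done is not stated in this section, so it is left out of the sketch.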