Product News Summarization for Competitor Intelligence using Topic Identification and Artificial Bee Colony Optimization

Swapnajit Chakraborti
Indian Institute of Management Indore
Prabandh Shikhar, Indore-453556, India
+91-9811135138
fi13swapnajitc@iimidr.ac.in

Shubhamoy Dey
Indian Institute of Management Indore
Prabandh Shikhar, Indore-453556, India
+91-731-2439526
shubhamoy@iimidr.ac.in

ABSTRACT
With the proliferation of web content, a great deal of information about companies has become publicly available online. This information consists mostly of text documents, such as news items and reports, which can provide useful insight into various aspects of corporations. To extract useful information from this huge and diverse collection of texts, appropriate state-of-the-art text mining techniques are necessary. This paper describes a novel multi-document extractive text summarization technique, based on topic identification and artificial bee colony optimization, which companies can use to extract important facts from product-specific news items about their competitors and subsequently use as one of the inputs for strategic business decision making. The results presented in this paper are based on a corpus created by collecting news items about a specific consumer electronics company from authentic news sites on the internet. The quality of the summaries generated using this approach is found to be better in many respects than that of summaries generated by MEAD, a well-known benchmark summarizer.

CCS Concepts
Information systems → Information systems applications; Information systems → Document representation; Applied computing → Document management and text processing

Keywords
Text Analytics; Text Summarization; Multi Document Summarization; Topic Detection; Clustering; Decision Support Systems; Competitor Intelligence; Artificial Bee Colony Optimization

1. INTRODUCTION
Text summarization aims to take an information source, extract content from it, and present the most important content to a user in a condensed form and in a manner sensitive to the user’s or application’s needs [18]. Although text summarization techniques have matured considerably over the years, few applications of summarization are found in real-life business decision-making scenarios. This is surprising, because many business scenarios could benefit from such summaries. Recently, researchers have begun applying text summarization techniques to real business problems, predominantly using the heterogeneous multi-document corpora that managers rely on for decision making. In particular, this technology has great potential for competitor intelligence gathering, as conventional processes are not always effective. Competitor intelligence is defined as those activities by which a company determines and understands its competitors, their strengths and weaknesses, and their expected actions [23]. Note that competitor intelligence is a subset of competitive intelligence [22]. This paper focuses on gathering competitor intelligence from publicly available news items about a competitor’s products by applying a text summarization technique.

2. LITERATURE REVIEW & MOTIVATION
The definition of text summarization has three primary components [7]: the number of text documents used to generate the summary, the size of the summary, and the retention of important information. Summaries can be either abstractive or extractive and can use a single document or multiple documents as the source. Significant research on text summarization includes statistical methods (word and phrase frequency [17], position in the text [3], and key phrases and cue words [10]), structural relationships of important words and phrases
[19], NLP techniques [9], supervised learning [14], unsupervised machine learning [21], abstractive methods [20], graph-based methods [11], Maximal Marginal Relevance (MMR) based methods [4], genetic algorithms (GA) with single local search [2], adaptive differential evolution (DE) optimization [1], and latent-semantic-based methods [8]. Text summarization has been applied in multiple domains, including generating abstracts of research papers [16], patent mining, and biomedical text summarization. Recently, a text summarization methodology was proposed for extracting competitor intelligence from publicly available text resources on the web [6][5], but no experimental results were reported.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
RACS’15, October 9–12, 2015, Prague, Czech Republic.
© 2015 ACM. ISBN 978-1-4503-3738-0/15/10 …$15.00.
DOI: http://dx.doi.org/10.1145/2811411.2811465

This