Product News Summarization for Competitor Intelligence Using Topic Identification and Artificial Bee Colony Optimization
Swapnajit Chakraborti
Indian Institute of Management Indore
Prabandh Shikhar
Indore-453556, India
+91-9811135138
fi13swapnajitc@iimidr.ac.in
Shubhamoy Dey
Indian Institute of Management Indore
Prabandh Shikhar
Indore-453556, India
+91-731-2439526
shubhamoy@iimidr.ac.in
ABSTRACT
With the proliferation of web content, a wide range of information
about companies is now publicly available online. This information
consists mostly of text documents, such as news items and reports,
which can provide useful insight into various aspects of
corporations. Extracting useful information from this huge and
diverse collection of texts requires appropriate state-of-the-art
text mining techniques. In this paper, a novel multi-document
extractive text summarization technique based on topic
identification and artificial bee colony optimization is described.
Companies can use it to extract important facts from the
product-specific news items of their competitors and subsequently
use those facts as one of the inputs for strategic business decision
making. The results presented in this paper are based on a corpus
created by collecting news items about a specific consumer
electronics company from authentic news sites available on the
internet. The quality of the summaries generated using this approach
is found to be better in many respects than that of summaries
generated by MEAD, a well-known benchmark summarizer.
CCS Concepts
• Information systems➝Information systems
applications; • Information systems➝Document
representation; • Applied computing➝Document management
and text processing;
Keywords
Text Analytics; Text Summarization; Multi-Document
Summarization; Topic Detection; Clustering; Decision Support
Systems; Competitor Intelligence; Artificial Bee Colony
Optimization
1. INTRODUCTION
The goal of text summarization is to take an information source,
extract content from it, and present the most important content to
a user in a condensed form and in a manner sensitive to the user's
or application's needs [18]. Although text summarization
techniques have matured considerably over the years, few
applications of summarization are found in real-life business
decision-making scenarios. This is perplexing, because there are
indeed various business scenarios that can benefit from such
summaries. Of late, researchers have started applying text
summarization techniques to real business problems, predominantly
using the heterogeneous multi-document corpora that managers use
for decision making. In particular, this technology has
considerable potential for the competitor intelligence gathering
process, as conventional processes are not always effective.
Competitor intelligence is defined as the activities by which a
company determines and understands its competitors, their
strengths and weaknesses, and the expectations of their actions
[23]. Note that competitor intelligence is a subset of competitive
intelligence [22]. This paper focuses on gathering competitor
intelligence from publicly available news items about the various
products of a competitor by applying text summarization techniques.
2. LITERATURE REVIEW & MOTIVATION
The definition of text summarization has three primary components
[7]: the number of text documents used to generate the summary, the
size of the summary, and the retention of important information.
Summaries can be either abstractive or extractive and can use a
single document or multiple documents as the source. Significant
research on text summarization includes statistical methods, namely
word and phrase frequency [17], position in the text [3], and key
phrases and cue words [10]; the structural relationship of important
words and phrases [19]; NLP techniques [9]; supervised learning
[14]; unsupervised machine learning [21]; abstractive methods [20];
graph-based methods [11]; Maximal Marginal Relevance (MMR) based
methods [4]; genetic algorithms (GA) with single local search [2];
adaptive differential evolution (DE) optimization [1]; and latent
semantic methods [8].
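The statistical family of methods listed above can be illustrated with a deliberately simple sketch of frequency-based extractive scoring: every sentence in a multi-document collection is scored by the average collection-wide frequency of its content words, and the top-scoring sentences are extracted in their original order. This is a generic illustration under our own assumptions; the regular-expression tokenizer, the small stop-word list, and the helper names `tokenize`, `split_sentences`, and `frequency_summary` are hypothetical, and the sketch is neither the specific method of [17] nor the topic-identification and artificial bee colony approach proposed in this paper.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list (real systems use larger ones).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "for",
              "is", "are", "was", "were", "its", "it", "with", "by", "as", "at"}


def tokenize(text):
    """Return lower-cased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower())
            if w not in STOP_WORDS]


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def frequency_summary(documents, max_sentences=2):
    """Extract the sentences whose content words are most frequent
    across the whole multi-document collection."""
    sentences = [s for doc in documents for s in split_sentences(doc)]
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average (not total) frequency, so long sentences are not automatically favoured.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # sorted() is stable, so tied sentences keep their original relative order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore document order for readability
    return [sentences[i] for i in chosen]


docs = [
    "The company launched a new phone. The phone has a larger battery.",
    "Analysts expect the new phone to sell well. Weather was mild today.",
]
print(frequency_summary(docs, max_sentences=2))
```

On these two news-like snippets the sketch keeps "The company launched a new phone." and "The phone has a larger battery.", both dominated by the recurring term "phone", and drops the off-topic weather sentence. Note that it has no redundancy control, which is precisely what MMR-based methods [4] add when selecting sentences from multiple overlapping documents.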
Text summarization has been applied to multiple domains, including
the formation of abstracts of research papers [16], patent mining,
and biomedical text summarization. Recently, a text summarization
methodology has been proposed for extracting competitor
intelligence from publicly available text resources on the web
[5][6], but no experimental results were reported. This
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. Copyrights
for components of this work owned by others than ACM must be
honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior
specific permission and/or a fee. Request permissions from
Permissions@acm.org.
RACS’15, October 9–12, 2015, Prague, Czech Republic.
© 2015 ACM. ISBN 978-1-4503-3738-0/15/10 …$15.00.
DOI: http://dx.doi.org/10.1145/2811411.2811465