International Journal of Intelligent Information Technologies, 10(1), 16-41, January-March 2014
Copyright © 2014, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
ABSTRACT
A user’s information need, normally represented as a search query, can be satisfied by creating a query-focused,
coherent, and readable summary that fuses the relevant parts of information from multiple documents. When
aggregating information from multiple documents, the quality of the summary is improved by eliminating
redundant information from the document set. In this paper, we focus on removing such redundant information
and identifying the essential components of multiple documents (represented as a single global semantic
graph) with respect to the given query (represented as a query graph). Redundancy elimination is carried out
using various levels of graph matching, which are indicated through canonical labeling of graphs. The
selection of essential components for a query-focused summary is performed through a modified spreading
activation theory, in which the query graph is also integrated during the spreading activation over the
global graph. The proposed system shows significant improvements in generating summaries when
compared to other existing summarization systems.
A Graph Based Query Focused
Multi-Document Summarization
J Balaji, Department of Computer Science and Engineering, Anna University, Chennai, India
T V Geetha, Department of Computer Science and Engineering, Anna University, Chennai,
India
Ranjani Parthasarathi, Department of Information Science and Technology, Anna University,
Chennai, India
Keywords: Multi-Document Summarization, Query Focused Summary, Redundancy Elimination, Semantic
Graphs, Spreading Activation, Universal Networking Language
INTRODUCTION
With the wide variety of documents available on the web, text summarization, which effectively
compresses the information in one or more documents, has become an important task. Multi-document
summarization is the task of identifying the important common themes and/or aspects of
multiple documents.
The primary tasks in multi-document summarization are the identification of similarities and
differences between documents (Wan & Yang, 2008). One of the challenges of multi-document
summarization is that a set of documents might contain diverse information, which may be either
related or unrelated to the particular topic. Therefore, effective methods are needed to analyze
the information stored in different documents and to abstract the globally important information
that reflects the main topic. In single-document summarization, the sentences in a document are
unique and may not contain redundant information.
DOI: 10.4018/ijiit.2014010102