Intelligent Control and Automation, 2010, 1, 105-111
doi:10.4236/ica.2010.12012 Published Online November 2010 (http://www.SciRP.org/journal/ica)
Copyright © 2010 SciRes. ICA
Multi-Document Summarization Model Based on Integer
Linear Programming
Rasim Alguliev, Ramiz Aliguliyev, Makrufa Hajirahimova
Institute of Information Technology of National Academy of Sciences of Azerbaijan
E-mail: a.ramiz@science.az
Received August 28, 2010; revised October 1, 2010; accepted October 3, 2010
Abstract
This paper proposes an extractive generic text summarization model that generates summaries by selecting
sentences according to their scores. Sentence scores are calculated from how extensively each sentence covers
the main content of the text, and summaries are created by extracting the highest-scored sentences from the
original document. The model is formalized as a multiobjective integer programming problem. An advantage of
this model is that it can cover the main content of the source(s) while reducing redundancy in the generated
summaries. To extract sentences that form a summary with extensive coverage of the main content of the
text and low redundancy, the model uses the similarity of sentences to the original document and the
similarity between sentences. Performance evaluation is conducted by comparing summarization outputs
with manual summaries from the DUC2004 dataset. Experiments showed that the proposed approach outperforms
the related methods.
Keywords: Multi-Document Summarization, Content Coverage, Less Redundancy, Integer Linear
Programming
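The trade-off described in the abstract, coverage of the source against redundancy among the selected sentences, can be illustrated with a minimal sketch. This is not the paper's formulation: the bag-of-words cosine similarity, the single weighted objective, the `redundancy_weight` parameter, the word-count budget `max_words`, and the brute-force enumeration (used here in place of an integer programming solver) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
import math


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(sentences, max_words, redundancy_weight=1.0):
    """Pick the sentence subset that maximizes coverage minus redundancy.

    Hypothetical objective (an assumption, not taken from the paper):
      coverage   = sum of similarities between each chosen sentence
                   and the whole document collection
      redundancy = sum of pairwise similarities among chosen sentences
    Each sentence is a 0/1 decision, and the summary length is bounded
    by max_words. Enumeration is exponential, so this only suits tiny inputs.
    """
    vecs = [Counter(s.lower().split()) for s in sentences]
    doc = sum(vecs, Counter())  # term frequencies of the whole collection
    lengths = [len(s.split()) for s in sentences]

    best, best_score = (), float("-inf")
    for k in range(1, len(sentences) + 1):
        for subset in combinations(range(len(sentences)), k):
            if sum(lengths[i] for i in subset) > max_words:
                continue  # violates the length constraint
            coverage = sum(cosine(vecs[i], doc) for i in subset)
            redundancy = sum(
                cosine(vecs[i], vecs[j]) for i, j in combinations(subset, 2)
            )
            score = coverage - redundancy_weight * redundancy
            if score > best_score:
                best, best_score = subset, score
    return [sentences[i] for i in best]


sentences = [
    "the cat sat on the mat",
    "the cat sat on the mat",
    "dogs bark loudly at night",
]
print(summarize(sentences, max_words=12))
```

In this toy input the two identical sentences fit within the budget together, but their pairwise similarity cancels most of their combined coverage, so the sketch prefers one of them plus the dissimilar third sentence.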
1. Introduction
With the rapid growth of the Internet and the resulting
information explosion, automatic document summarization
has drawn increasing attention in recent years. The
explosion of electronic documents has made it difficult
for users to extract useful information from them, and
many relevant and interesting documents go unread
because of the sheer amount of information [1].
The information overload problem can be reduced by
text summarization. Automatic document summarization
aims to condense the original text into its essential
content and to assist in filtering and selecting
necessary information. Current search engines usually
provide a short summary for each retrieved document so
that users can quickly skim the main content of the
page. This saves users time and improves the search
engine’s service quality [2]. Hence the need for tools
that automatically generate summaries. These tools are
useful not only for professionals who need to find
information quickly but also for large search engines
such as Google, Yahoo!, AltaVista, and others, whose
results could benefit considerably from automatically
generated summaries. The user then needs to retrieve
only the documents of interest, which reduces the flow
of information [1,3].
Depending on the number of documents to be summarized,
summarization can be single-document or
multi-document [4-6]. Single-document summarization
condenses one document into a shorter representation,
whereas multi-document summarization condenses a set
of documents into a summary. Multi-document
summarization can be considered an extension of
single-document summarization. It is used to describe
precisely the information contained in a cluster of
documents and to help users understand the document
cluster. Since it combines and integrates information
across documents, it performs knowledge synthesis and
knowledge discovery, and can be used for
knowledge acquisition [5,7].
This paper focuses on multi-document summarization.
It models the text summarization task as an
optimization problem. The model directly discovers key
sentences in the given collection and covers the main
content of the original source(s). The model is
implemented on a multi-document summarization task.
Experiments on
DUC2004 datasets showed that the proposed approach