Soft Comput (2015) 19:3109–3121
DOI 10.1007/s00500-014-1471-x
METHODOLOGIES AND APPLICATION
Exploring actor–object relationships for query-focused
multi-document summarization
Mohammadreza Valizadeh · Pavel Brazdil
Published online: 14 October 2014
© Springer-Verlag Berlin Heidelberg 2014
Abstract Most research on multi-document summariza-
tion explores methods that generate summaries based on
queries regardless of the users’ preferences. We note that
different users can generate somewhat different summaries
on the basis of the same source data and query. This paper
presents our study on how to exploit information regarding
how users summarized their texts. Models of different users
can be used either separately or in an ensemble-like fashion.
Machine learning methods are explored in the construction
of the individual models. We also explore a further
hypothesis. We believe that the sentences selected
into the summary should be coherent and supplement each
other in their meaning. One way to model this relationship
between sentences is to detect the actor–object relationship
(AOR). The sentences that satisfy this relationship
have their importance value enhanced. This paper combines
an ensemble summarizing system with AOR to generate
summaries. We have evaluated this method on DUC 2006 and
DUC 2007 using the ROUGE measure. Experimental results
show that the supervised method that exploits the ensemble
summarizing system combined with AOR outperforms previous
models on query-based multi-document summarization tasks.
Keywords User-based summarization · Actor–object
relationship · Multi-document summarization · Ensemble
summarizing system · Training data construction
Communicated by V. Loia.
M. Valizadeh (✉) · P. Brazdil
LIAAD INESC Tec, University of Porto, Porto, Portugal
e-mail: valizadehmr@gmail.com
P. Brazdil
FEP, University of Porto, Porto, Portugal
e-mail: pbrazdil@inescporto.pt
1 Introduction
Document summarization generates a short text for a single
document or multiple documents. This summary should
be informative and non-redundant, and the process should be
efficient. In other words, the summary should capture the
important concepts of the original documents. Abstractive and
extractive methods are the two main approaches to summarizing
documents automatically. Abstractive methods rely on language
processing and reformulate the original sentences,
while extractive methods concatenate the relevant sentences
into the summary.
The relevant sentences are identified by ranking the sen-
tences based on certain features. Therefore, the features and
the ranking algorithms using those features are the core of
extractive methods.
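The ranking step described above can be illustrated with a minimal sketch. It is not the method of this paper: the three features (query overlap, length, position), the hand-picked weights in `WEIGHTS`, and the function names are all assumptions chosen only to show how feature-based scoring leads to an extractive summary.

```python
import re

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def sentence_features(sentence, position, query_terms):
    """Compute three illustrative features for one sentence (assumed, not from the paper)."""
    tokens = tokenize(sentence)
    overlap = len(set(tokens) & query_terms) / len(query_terms) if query_terms else 0.0
    return {
        "query_overlap": overlap,                # share of query terms in the sentence
        "length": min(len(tokens) / 20.0, 1.0),  # length, capped at 20 tokens
        "position": 1.0 / (position + 1),        # earlier sentences score higher
    }

# Hypothetical hand-picked weights; a supervised system would learn these.
WEIGHTS = {"query_overlap": 0.6, "length": 0.1, "position": 0.3}

def extract_summary(sentences, query, max_sentences=2):
    """Rank sentences by weighted feature score and concatenate the top ones."""
    query_terms = set(tokenize(query))
    scored = []
    for i, sentence in enumerate(sentences):
        feats = sentence_features(sentence, i, query_terms)
        score = sum(WEIGHTS[name] * value for name, value in feats.items())
        scored.append((score, i))
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:max_sentences]
    chosen = sorted(i for _, i in top)  # restore document order
    return " ".join(sentences[i] for i in chosen)
```

Changing either the feature set or the weights changes which sentences are extracted, which is why both are described as the core of extractive methods.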
Recent research has exploited various features (e.g.,
similarity between two sentences, sentence length, and sentence
position) that are based on general information about the data
set, regardless of users. It is possible to learn models of
particular users and combine them into an ensemble summarizing
system to generate the summary. This method (i.e., the ensemble
summarizing system) requires new features that capture
the users’ preferences. In addition, the learning methods
employed should be fast and efficient.
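As a rough illustration of combining per-user models, the sketch below treats each user model as a function that scores a sentence and averages those scores. Plain averaging is an assumption made here for illustration, not necessarily the combination rule used in this paper, and the two toy user models (`prefers_short`, `prefers_numbers`) are hypothetical stand-ins for learned models.

```python
def ensemble_scores(sentences, user_models):
    """Average the scores that each user model assigns to each sentence."""
    scores = []
    for sentence in sentences:
        per_user = [model(sentence) for model in user_models]
        scores.append(sum(per_user) / len(per_user))
    return scores

def rank_sentences(sentences, user_models):
    """Return the sentences ordered from highest to lowest ensemble score."""
    scores = ensemble_scores(sentences, user_models)
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in order]

# Two hypothetical user models standing in for learned per-user models.
def prefers_short(sentence):
    """Toy user who favours short sentences."""
    return 1.0 / (1 + len(sentence.split()))

def prefers_numbers(sentence):
    """Toy user who favours sentences containing figures."""
    return 1.0 if any(ch.isdigit() for ch in sentence) else 0.0
```

Passing a single model to `rank_sentences` corresponds to using one user's model separately, while passing several corresponds to the ensemble setting.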
Graph-based models, which represent an unsupervised approach,
are widely used in extractive multi-document summarization
systems. The document set is represented as a graph in
which sentences are nodes and the similarities between
sentences are edges. This paper introduces some features
based on the graph topology that have not been used in past
research. These features make it possible to derive a model
of how a particular user selects a
sentence for a given summary. Here, we are referring to a
supervised approach. We believe that the nodes correspond-