Soft Comput (2015) 19:3109–3121
DOI 10.1007/s00500-014-1471-x
METHODOLOGIES AND APPLICATION
Exploring actor–object relationships for query-focused multi-document summarization
Mohammadreza Valizadeh · Pavel Brazdil
Published online: 14 October 2014
© Springer-Verlag Berlin Heidelberg 2014

Abstract
Most research on multi-document summarization explores methods that generate summaries based on queries regardless of the users' preferences. We note that different users can generate somewhat different summaries from the same source data and query. This paper presents our study on how to exploit information about how users summarized their texts. Models of different users can be used either separately or in an ensemble-like fashion. Machine learning methods are explored in the construction of the individual models. However, we also explore another hypothesis: we believe that the sentences selected for the summary should be coherent and should supplement each other in meaning. One way to model this relationship between sentences is to detect actor–object relationships (AOR). Sentences that satisfy this relationship have their importance values enhanced. This paper combines an ensemble summarizing system with AOR to generate summaries. We have evaluated this method on DUC 2006 and DUC 2007 using the ROUGE measure. Experimental results show that the supervised method combining the ensemble summarizing system with AOR outperforms previous models on query-based multi-document summarization tasks.

Keywords User-based summarization · Actor–object relationship · Multi-document summarization · Ensemble summarizing system · Training data construction

Communicated by V. Loia.

M. Valizadeh · P. Brazdil
LIAAD INESC Tec, University of Porto, Porto, Portugal
e-mail: valizadehmr@gmail.com

P. Brazdil
FEP, University of Porto, Porto, Portugal
e-mail: pbrazdil@inescporto.pt

1 Introduction

Document summarization generates a short text from a single document or from multiple documents. The summary should be informative and non-redundant, and the process of producing it should be efficient. In other words, the summary should capture the important concepts of the original documents. Abstractive and extractive methods are the two main approaches to summarizing documents automatically. Abstractive methods rely on language processing to reformulate the original sentences, while extractive methods concatenate the relevant sentences into the summary.

The relevant sentences are identified by ranking the sentences according to certain features. The features, and the ranking algorithms that use them, are therefore the core of extractive methods.

Recent research has exploited various features (e.g., the similarity between two sentences, sentence length and sentence position) that are based on general information about the data set, regardless of the users. However, it is also possible to learn models of particular users and combine them into an ensemble summarizing system that generates the summary. This method (i.e., the ensemble summarizing system) requires new features that capture the users' preferences. In addition, the learning methods employed should be fast and efficient.

Graph-based models, which represent an unsupervised approach, are widely used in extractive multi-document summarization systems. The document set is represented as a graph in which sentences are nodes and the similarities between sentences are edges. This paper introduces several features based on the graph topology that have not been used in past research. These features make it possible to derive a model of how a particular user selects sentences for a given summary; that is, we are referring here to a supervised approach. We believe that the nodes correspond-