A Compositional Context Sensitive Multi-document Summarizer: Exploring the Factors That Influence Summarization

Ani Nenkova, Stanford University, anenkova@stanford.edu
Lucy Vanderwende, Microsoft Research, lucyv@microsoft.com
Kathleen McKeown, Columbia University, kathy@cs.columbia.edu

ABSTRACT
The usual approach for automatic summarization is sentence extraction, where key sentences from the input documents are selected based on a suite of features. While word frequency often is used as a feature in summarization, its impact on system performance has not been isolated. In this paper, we study the contribution to summarization of three factors related to frequency: content word frequency, composition functions for estimating sentence importance from word frequency, and adjustment of frequency weights based on context. We carry out our analysis using datasets from the Document Understanding Conferences, studying not only the impact of these features on automatic summarizers, but also their role in human summarization. Our research shows that a frequency based summarizer can achieve performance comparable to that of state-of-the-art systems, but only with a good composition function; context sensitivity improves performance and significantly reduces repetition.

Categories and Subject Descriptors
H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing

General Terms
Measurement, Experimentation, Human Factors

Keywords
multi-document summarization, frequency, compositionality, context-sensitivity

1. INTRODUCTION
Most current automatic summarization systems rely on sentence extraction¹, where key sentences in the input documents are selected to form the summary.
Even systems that go beyond sentence extraction, reformulating or simplifying the text of the original articles, must decide which sentences should be simplified, compressed, fused together or rewritten [10, 11, 28, 2, 6]. Common approaches to identifying the important sentences to include in the summary are training a binary classifier (e.g., [12]), training a Markov model (e.g., [4]), or directly assigning weights to sentences based on a variety of features and heuristically determined feature weights (e.g., [26, 14]). But the question of which components and features of automatic summarizers contribute most to their performance has largely remained unanswered [18].

¹ A description of some of the most recent systems can be found in the online proceedings of the Document Understanding Conference, http://duc.nist.gov

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SIGIR'06, August 6–11, 2006, Seattle, Washington, USA.
Copyright 2006 ACM 1-59593-369-7/06/0008 ...$5.00.

In this paper, we examine several design decisions and the impact they have on the performance of generic multi-document summarizers of news. More specifically, we study the following issues:

Content word frequency. Word frequency is a feature that has been used in many summarization systems and dates back to the earliest summarization research [17]. In this approach, content words such as nouns, verbs and adjectives serve as surrogates for the atomic units of meaning in text. While frequency has been used as a feature in many summarization systems, no study has isolated its impact on system performance. Large test sets that enable such an analysis have become available only recently, as a result of the annual Document Understanding Conference (DUC) run by NIST, and by the time DUC began, most systems were using a combination of features rather than frequency alone. In this paper, we study the contribution of content word frequency in the input to system performance, showing that content word frequency also plays a role in human summarization behavior.

Choice of composition function. The frequency, and thus the importance, of content words can easily be estimated from the input to a summarizer. But is this enough to build a summarization system? Normally, a summarizer produces readable text as a summary, not a list of keywords, and thus it must estimate the importance of larger text units, typically sentences. A composition function needs to be chosen that will estimate the importance of a sentence as a function of the importance of the content words that appear in the sentence. There are many possibilities for the choice of composition function, and in Section 3 we will discuss three of them, showing that the choice can have a significant impact on the performance of the summarizer, ranging from close to baseline performance to overall state-of-the-art performance.
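To make the notion of a composition function concrete, the following is a minimal sketch and not the system evaluated in this paper. It estimates content word probabilities from the input and scores each sentence by composing the probabilities of its content words. The three functions shown (sum, average, product) are illustrative candidates chosen for this sketch and are not claimed to be the three discussed in Section 3. The stopword list, the tokenizer and all function names are likewise assumptions.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); a real system would use a
# fuller list or part-of-speech tags to keep only nouns, verbs and adjectives.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was", "by", "for"}


def content_words(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def word_probabilities(sentences):
    """Estimate p(w) as the relative frequency of w among all content words."""
    counts = Counter(w for s in sentences for w in content_words(s))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def score_sum(words, p):
    """Sum of word probabilities; tends to favor long sentences."""
    return sum(p[w] for w in words)


def score_avg(words, p):
    """Average word probability; normalizes for sentence length."""
    return sum(p[w] for w in words) / len(words) if words else 0.0


def score_product(words, p):
    """Product of word probabilities; tends to favor short sentences."""
    return math.prod(p[w] for w in words) if words else 0.0


def best_sentence(sentences, compose):
    """Return the input sentence with the highest composed score."""
    p = word_probabilities(sentences)
    return max(sentences, key=lambda s: compose(content_words(s), p))


if __name__ == "__main__":
    docs = [
        "Floods hit the city.",
        "Floods damaged roads, bridges, homes, schools and farms in the city.",
        "Officials spoke.",
    ]
    for compose in (score_sum, score_avg, score_product):
        print(compose.__name__, "->", best_sentence(docs, compose))
```

On this toy input the three functions each select a different sentence: the sum picks the longest one, the product picks the shortest one, and the average picks the short sentence made up mostly of repeated words. This is only meant to show why the choice of composition function can matter; it says nothing about which function performs best on real data.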