Update Summarization Based on Latent Semantic Analysis

Josef Steinberger and Karel Ježek

University of West Bohemia, Univerzitni 22, 30614 Plzen

Abstract. This paper deals with our recent research in text summarization. We went from single-document summarization through multi-document summarization to update summarization. We describe the development of our summarizer, which is based on latent semantic analysis (LSA), and propose the update summarization component which determines the redundancy and novelty of each topic discovered by LSA. The final part of this paper presents the results of our participation in the experiment of the Text Analysis Conference 2008.

1 Introduction

Four years ago we started to develop a summarization method whose core was covered by latent semantic analysis (LSA [9]). The proposed single-document method [14] modified the first summarization approach which used an LSA representation of a document [6]. From single-document summarization we went on to produce multi-document summaries [15].

The next step from multi-document summarization is update summarization, which was piloted in DUC2007¹ and represented the main track in TAC2008. The task of update summarization is to generate short (100-word) fluent multi-document summaries of recent documents under the assumption that the user has already read a set of earlier documents. The purpose of each update summary is to inform the reader of new information about a particular topic.

When producing an update summary, the system has to decide which information in the set of new documents is novel and which is redundant. (Redundant information is already contained in the set of earlier documents.) This decision is crucial for producing summaries with a high update value.

In this paper we present our update summarizer, which participated in the TAC2008 evaluation. Our method follows what has been called a term-based approach [7].
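The LSA machinery at the core of the summarizer can be illustrated with a small sketch. This is an illustrative toy, not the authors' implementation: LSA applies singular value decomposition (SVD) to a term-by-sentence matrix, and the resulting factors expose latent "topics", their importance, and how strongly each sentence expresses each topic. The matrix values and the number of retained topics below are arbitrary assumptions for the example.

```python
# Illustrative sketch of the LSA step (not the paper's implementation):
# SVD of a term-by-sentence matrix reveals latent topics.
import numpy as np

# Toy term-by-sentence matrix A: rows = terms, columns = sentences.
# Entries are term weights; real systems use tf-idf or entropy weighting.
A = np.array([
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])

# SVD factors A = U * diag(s) * Vt.  Columns of U are latent topics in
# term space; rows of Vt tell how strongly each sentence expresses each
# topic; the singular values s give the topics' importance.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the r most important topics (dimensionality reduction).
r = 2
topic_importance = s[:r]
sentence_topic_weights = Vt[:r, :]   # shape: (r, number of sentences)

print("topic importances:", topic_importance)
print("sentence-topic weights:\n", sentence_topic_weights)
```

Comparing which topics appear in the earlier document set against those in the new set is one way a system could judge redundancy versus novelty, in the spirit of the update component described above.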
In term-based summarization, the most important information in documents is found by identifying their main terms and then extracting from the documents the most important information about these terms. However,

¹ The National Institute of Standards and Technology (NIST) initiated the Document Understanding Conference (DUC) series [1] to evaluate automatic text summarization. Its goal is to further progress in summarization and enable researchers to participate in large-scale experiments. Since 2008, DUC has moved to TAC (Text Analysis Conference) [2], which follows the summarization evaluation roadmap with new or upgraded tracks.
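The term-based idea described above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions (main terms approximated by raw word frequency, no stop-word removal or stemming), not the method of [7] or of the paper:

```python
# Illustrative term-based extraction (simplified, not the paper's method):
# find the document set's main terms, then pick the sentences that carry
# the most information about them.
from collections import Counter

def term_based_extract(sentences, num_keep=1):
    """Pick the sentences that best cover the collection's main terms."""
    # Identify main terms: here, simply the most frequent words.
    words = [w.lower().strip(".,") for s in sentences for w in s.split()]
    freq = Counter(words)
    # Score each sentence by the summed frequency of its terms.
    scores = [
        sum(freq[w.lower().strip(".,")] for w in s.split())
        for s in sentences
    ]
    # Rank by score, keep the best, restore original sentence order.
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    return [sentences[i] for i in sorted(ranked[:num_keep])]

docs = [
    "The summit discussed climate policy.",
    "Delegates debated climate targets at the summit.",
    "Lunch was served at noon.",
]
print(term_based_extract(docs, num_keep=1))
# → ['Delegates debated climate targets at the summit.']
```

The second sentence wins because it contains the most repeated terms ("climate", "summit"); the off-topic lunch sentence scores lowest.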