Document concept lattice for text understanding and summarization

Shiren Ye, Tat-Seng Chua *, Min-Yen Kan, Long Qiu

Department of Computer Science, School of Computing, National University of Singapore, Singapore 117543, Singapore

Received 18 July 2006; received in revised form 16 February 2007; accepted 8 March 2007
Available online 22 May 2007

Abstract

We argue that the quality of a summary can be evaluated based on how many of the concepts in the original document(s) are preserved after summarization. Here, a concept refers to an abstract or concrete entity or its action, often expressed by diverse terms in text. Summary generation can thus be considered as an optimization problem of selecting a set of sentences with minimal answer loss. In this paper, we propose a document concept lattice that indexes the hierarchy of local topics tied to a set of frequent concepts and the corresponding sentences containing these topics. The local topics specify the promising sub-spaces related to the selected concepts and sentences. Based on this lattice, the summary is an optimized selection of a set of distinct and salient local topics that leads to maximal coverage of concepts with the given number of sentences. Our summarizer based on the concept lattice has demonstrated competitive performance in the Document Understanding Conference 2005 and 2006 evaluations as well as in follow-on tests.

© 2007 Elsevier Ltd. All rights reserved.

Keywords: Text summarization; Document concept lattice; Concept; Semantic

1. Introduction

Text summarization is the process of distilling the most important information from sources to produce an abridged version for particular users and tasks (Mani & Maybury, 1999). It has been applied to news articles, group meeting transcripts, and web pages. In this paper, we review and detail our approach to automatic, multi-document extractive summarization.
Such summarization methods simplify the problem of summarization into the problem of selecting a representative subset of the sentences in the original documents. This is in contrast to abstractive summarization, which may compose novel sentences, unseen in the original sources. However, abstractive approaches require deep natural language processing such as semantic representation, inference and natural language generation, which have yet to reach a mature stage today.

* Corresponding author. E-mail addresses: yesr@comp.nus.edu.sg (S. Ye), chuats@comp.nus.edu.sg (T.-S. Chua), kanym@comp.nus.edu.sg (M.-Y. Kan), qiul@comp.nus.edu.sg (L. Qiu).
Information Processing and Management 43 (2007) 1643–1662. doi:10.1016/j.ipm.2007.03.010
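
To make the framing above concrete, extractive summarization viewed as choosing a fixed number of sentences that together preserve as many concepts as possible, the following is a minimal illustrative sketch. It is not the document concept lattice method proposed in this paper; it is a plain greedy maximum-coverage heuristic. The function name `greedy_concept_coverage` and the toy concept labels are hypothetical, and the sketch assumes that each sentence has already been reduced to a set of concept labels.

```python
from typing import List, Set


def greedy_concept_coverage(sentence_concepts: List[Set[str]], budget: int) -> List[int]:
    """Pick up to `budget` sentence indices, each time taking the sentence
    that adds the most not-yet-covered concepts (greedy max coverage).

    Illustrative only: this is NOT the lattice-based selection of the paper.
    """
    covered: Set[str] = set()
    chosen: List[int] = []
    for _ in range(budget):
        best_idx, best_gain = -1, 0
        for idx, concepts in enumerate(sentence_concepts):
            if idx in chosen:
                continue
            gain = len(concepts - covered)  # concepts this sentence would newly cover
            if gain > best_gain:
                best_idx, best_gain = idx, gain
        if best_idx < 0:  # no remaining sentence adds a new concept
            break
        chosen.append(best_idx)
        covered |= sentence_concepts[best_idx]
    return sorted(chosen)  # restore original sentence order


# Toy input (assumed): each sentence is represented by its set of concept labels.
sentences = [
    {"earthquake", "city", "damage"},
    {"earthquake", "damage"},
    {"rescue", "team", "city"},
    {"weather"},
]
summary = greedy_concept_coverage(sentences, budget=2)
print(summary)
```

In this toy input the second sentence is never selected after the first one, because every concept it contains is already covered. This is the kind of redundancy that a coverage-oriented objective is meant to avoid.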