Automatic Text Summarization with Genetic Algorithm-Based Attribute Selection

Carlos N. Silla Jr.¹, Gisele L. Pappa², Alex A. Freitas², Celso A.A. Kaestner¹

¹ Pontifícia Universidade Católica do Paraná (PUCPR)
Av. Imaculada Conceição 1155, 80215-901. Curitiba, PR, Brazil
{silla; kaestner}@ppgia.pucpr.br

² Computing Laboratory, University of Kent
Canterbury, CT2 7NF, UK
{glp6; A.A.Freitas}@kent.ac.uk

Abstract. The task of automatic text summarization consists of generating a summary of the original text that allows the user to obtain the main pieces of information available in that text, but with a much shorter reading time. This is an increasingly important task in the current era of information overload, given the huge amount of text available in documents. In this paper automatic text summarization is cast as a classification (supervised learning) problem, so that machine learning-oriented classification methods are used to produce summaries for documents based on a set of attributes describing those documents. The goal of the paper is to investigate the effectiveness of Genetic Algorithm (GA)-based attribute selection in improving the performance of classification algorithms solving the automatic text summarization task. Computational results are reported for experiments with a document base formed by news articles extracted from The Wall Street Journal in the TIPSTER collection, which is often used as a benchmark in the text summarization literature.

1 Introduction

We are surely living in an era of information overload. Recent studies published by the University of California, Berkeley [8] indicate that in 2002 about 5 million terabytes of information were produced (in films, printed media or magnetic/optical storage media). This is about twice the corresponding figure for 1999, which indicates a growth rate of about 30% per annum.
The Web alone contains about 170 terabytes, which is roughly 17 times the size of the printed material in the U.S. Library of Congress. However, it is very difficult to make use of all this available information. Many problems, such as the search for information sources, the retrieval/extraction of information and the automatic summarization of texts, have become important research topics in Computer Science. Automatic tools for the treatment of information have become essential to the user, because without those tools it is virtually impossible to exploit all the relevant information available on the Web [22].