International Journal of Computing and ICT Research, Vol. 3, No. 1, June 2009

Text Summarization Based on Genetic Programming

POOYA KHOSRAVIYAN DEHKORDI & DR. FARSHAD KUMARCI *
Islamic Azad University (Shahrekord Branch)
DR. HAMID KHOSRAVI
International Center for Science & High Technology & Environmental Sciences, University of Shahid Bahonar Kerman

Abstract: This work proposes an approach to improving content selection in automatic text summarization using statistical tools. The approach is a trainable summarizer that takes several features of each sentence into account to generate summaries. First, we investigate the effect of each sentence feature on the summarization task. Then we use all features in combination to train a genetic programming (GP) model, a vector approach and a fuzzy approach, constructing a text summarizer for each model. Finally, we use the trained models to test summarization performance. The performance of the proposed approach is measured at several compression rates on a corpus of 17 English scientific articles.

Categories and Subject Descriptors: H.3.1 [INFORMATION STORAGE AND RETRIEVAL]: Content Analysis and Indexing - Abstracting methods; J.3 [Computer Applications]: LIFE AND MEDICAL SCIENCES - Biology and genetics

General Terms: Algorithms, Human Factors, Experimentation

Additional Key Words and Phrases: Automatic Text Summarization, Genetic Programming, Vectorial Model, Fuzzy Model

IJCIR Reference Format: Pooya Khosraviyan Dehkordi, Hamid Khosravi and Farshad Kumarci. Text Summarization Based on Genetic Programming. International Journal of Computing and ICT Research, ISSN 1818-1139 (Print), ISSN 1996-1065 (Online), Vol.3, No.1, pp 57-64. http://www.ijcir.org/volume3-number1/article7.pdf.

1. INTRODUCTION

Automatic text summarization has been an active research area for many years. Evaluating summarization is a hard problem.
Often, considerable manual labour is required, for instance by having humans read the generated summaries and grade their quality with regard to aspects such as information content and text clarity. Manual labour is time consuming and expensive. Summarization is also subjective: the conception of what constitutes a good summary varies widely between individuals, and also depends on the purpose of the summary.

* Pooya Khosraviyan Dehkordi (PKhosravyan@iaushk.ac.ir), Dr. Farshad Kumarci (FKumarci@iaushk.ac.ir), Islamic Azad University (Shahrekord Branch); Dr. Hamid Khosravi (Hkhosravi@mail.uk.ac.ir), International Center for Science & High Technology & Environmental Sciences, University of Shahid Bahonar Kerman

"Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than IJCIR must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee." © International Journal of Computing and ICT Research 2009.

Recently, many experiments have been conducted on the text summarization task. Some addressed the evaluation of summarization using relevance prediction [Hobson et al. 2007] and a voted regression model [Hirao et al. 2007]. Others addressed single- and multiple-sentence compression using a "parse and trim" approach and a statistical noisy-channel approach [Zajic 2007] and conditional