International Journal of Computing and ICT Research, Vol. 3, No. 1, June 2009
Text Summarization Based on Genetic Programming
POOYA KHOSRAVIYAN DEHKORDI & DR. FARSHAD KUMARCI*
Islamic Azad University (Shahrekord Branch)
DR. HAMID KHOSRAVI
International Center for Science & High Technology & Environmental Sciences,
University of Shahid Bahonar Kerman
Abstract:
This work proposes an approach to improving content selection in automatic text summarization
using statistical tools. The approach is a trainable summarizer that takes several features of each
sentence into account when generating summaries. First, we investigate the effect of each sentence
feature on the summarization task. Then we use all features in combination to train a genetic
programming (GP) model, a vector approach, and a fuzzy approach, constructing a text summarizer
for each model. Finally, we use the trained models to test summarization performance. The
performance of the proposed approach is measured at several compression rates on a corpus of
17 English scientific articles.
Categories and Subject Descriptors: H.3.1 [INFORMATION STORAGE AND RETRIEVAL]:
Content Analysis and Indexing - Abstracting methods; J.3 [Computer Applications]: LIFE AND
MEDICAL SCIENCES - Biology and genetics
General Terms: Algorithms, Human Factors, Experimentation
Additional Key Words and Phrases: Automatic Text Summarization, Genetic Programming, Vectorial
Model, Fuzzy Model
IJCIR Reference Format:
Pooya Khosraviyan Dehkordi, Hamid Khosravi and Farshad Kumarci. Text Summarization Based on
Genetic Programming. International Journal of Computing and ICT Research, ISSN 1818-1139 (Print),
ISSN 1996-1065 (Online), Vol.3, No.1, pp 57-64. http://www.ijcir.org/volume3-number1/article7.pdf.
1. INTRODUCTION
Automatic text summarization has been an active research area for many years. Evaluating
summarization is a hard problem. It often requires considerable manual labour, for instance having
humans read generated summaries and grade their quality with regard to aspects such as
information content and text clarity. Manual labour is time-consuming and expensive.
Summarization is also subjective: the conception of what constitutes a good summary varies
considerably between individuals, and also depends on the purpose of the summary.
Recently, many experiments have been conducted on the text summarization task. Some
addressed the evaluation of summarization using relevance prediction [Hobson et al. 2007] and a
voted regression model [Hirao et al. 2007]. Others addressed single- and multiple-sentence compression
using a “parse and trim” approach and a statistical noisy-channel approach [Zajic 2007] and conditional
* Pooya Khosraviyan Dehkordi (PKhosravyan@iaushk.ac.ir), Dr. Farshad Kumarci (FKumarci@iaushk.ac.ir), Islamic Azad
University (Shahrekord Branch); Dr. Hamid Khosravi (Hkhosravi@mail.uk.ac.ir), International Center for Science & High
Technology & Environmental Sciences, University of Shahid Bahonar Kerman
"Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full
citation on the first page. Copyrights for components of this work owned by others than IJCIR must be honored. Abstracting with
credit is permitted. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee."
© International Journal of Computing and ICT Research 2009.