International Journal of Applied Engineering Research
ISSN 0973-4562 Volume 10, Number 12 (2015) pp. 28873-28881
© Research India Publications
http://www.ripublication.com
Text Summarization using Clustering Technique and
SVM Technique
ShivaKumar KM and Soumya R
Dept. of Computer Science, Amrita Vishwavidyapeetham, Mysore campus
MCA Student, Amrita Vishwavidyapeetham, Mysore campus
Sushma K Prasad, Amrita Vishwavidyapeetham, Mysore campus
Abstract
Text summarization is one of the problems studied in Natural Language
Processing (NLP). The system described here produces a single summarized
document from multiple related documents. Given an input query, the summarizer
returns a precise text document built by analyzing the text of clusters of
related documents. Two methodologies, Clustering and Support Vector Machines
(SVM), are used to solve this NLP problem, and existing text summarizers use
either SVM or clustering on its own. In this work we propose a hybrid approach
that cascades both techniques to obtain an improved summary of related
documents. We preprocess the documents into tokens by applying stemming and
stop word removal. The hybrid approach summarizes the documents efficiently by
avoiding redundancy among the words in the documents while keeping the summary
highly relevant to the input query. We assess our results by the ratio of input
sentences to output sentences after summarization.
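The preprocessing step named in the abstract (tokenization, stop word removal and stemming) and the input-to-output sentence ratio used to assess results can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the stop word list, the suffix list and the helper names `stem`, `preprocess`, `split_sentences` and `compression_ratio` are all hypothetical, and the crude suffix-stripping stemmer only stands in for a real stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately small stop-word list (a real system would use a fuller one).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "in",
              "and", "or", "as", "on", "for", "by", "it", "this", "that", "we"}

# Hypothetical suffixes for a crude suffix-stripping stemmer; this is NOT the Porter stemmer.
SUFFIXES = ("ization", "ation", "ing", "ed", "es", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lower-case and tokenize, remove stop words, then stem each remaining token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def compression_ratio(input_sentences: int, output_sentences: int) -> float:
    """Ratio of input sentences to output (summary) sentences."""
    if output_sentences <= 0:
        raise ValueError("the summary must contain at least one sentence")
    return input_sentences / output_sentences
```

For example, `preprocess("The documents are clustering.")` returns `["document", "cluster"]`, and summarizing 20 input sentences into 5 gives `compression_ratio(20, 5) == 4.0`. A suffix stripper this simple will mangle some words, which is why a production system would use an established stemmer.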
Keywords: NLP, Summarization, Sentence Score, Word Count, Cluster, SVM,
Tokens, Stemming, Frequency.
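The keywords name sentence score, word count and frequency. One common way to combine them in extractive summarization is sketched below as a hypothetical Python illustration, not the paper's clustering and SVM pipeline: each sentence is scored by the average frequency of its content words across the input (frequency normalized by word count) plus one point per query term it contains, and sentences are then picked greedily while skipping any whose Jaccard word overlap with an already chosen sentence reaches a threshold. The scoring formula, the 0.5 threshold and all function names are assumptions.

```python
import re
from collections import Counter

# Hypothetical minimal stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "in",
              "and", "or", "as", "on", "for", "by", "it", "this", "that", "we"}


def content_words(sentence: str) -> list[str]:
    """Lower-cased alphabetic tokens with stop words removed (no stemming here)."""
    return [t for t in re.findall(r"[a-z]+", sentence.lower()) if t not in STOP_WORDS]


def score_sentences(sentences: list[str], query: str) -> list[float]:
    """Average word frequency per sentence plus one point per matched query term."""
    freq = Counter(t for s in sentences for t in content_words(s))
    query_terms = set(content_words(query))
    scores = []
    for s in sentences:
        words = content_words(s)
        if not words:
            scores.append(0.0)
            continue
        # Normalize by word count so long sentences are not favoured automatically.
        frequency_score = sum(freq[w] for w in words) / len(words)
        query_bonus = len(query_terms & set(words))
        scores.append(frequency_score + query_bonus)
    return scores


def jaccard(a: set[str], b: set[str]) -> float:
    """Word-overlap similarity between two sentences' word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summarize(sentences: list[str], query: str, max_sentences: int = 2,
              redundancy_threshold: float = 0.5) -> list[str]:
    """Greedily pick high-scoring, non-redundant sentences, kept in original order."""
    scores = score_sentences(sentences, query)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    word_sets = [set(content_words(s)) for s in sentences]
    chosen: list[int] = []
    for i in ranked:
        if len(chosen) == max_sentences:
            break
        # Skip a sentence that overlaps too much with one already selected.
        if all(jaccard(word_sets[i], word_sets[j]) < redundancy_threshold for j in chosen):
            chosen.append(i)
    return [sentences[i] for i in sorted(chosen)]
```

With the four sentences "Clustering groups similar sentences together.", "Clustering groups similar sentences together quickly.", "SVM classifies sentences as summary or not." and "The weather was pleasant today." and the query "SVM sentences", `summarize(..., max_sentences=3)` keeps the first, third and fourth sentences. The second scores higher than the fourth but is dropped because it repeats almost every word of the first, which is the kind of redundancy the abstract aims to avoid.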
I. Introduction
Text summarization has been significant for many years. In the early days, storage
for large data files was expensive, so storing only summarized documents helped to
overcome this disadvantage. Generating a summarized document requires a reader and an
identifier that distinguish redundant words and sentences from important ones across
the document cluster. A summary is content produced by collecting files with similar
information and extracting only their important points. When the user searches for
information by hitting a