International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 02 Issue: 02 | May-2015 www.irjet.net p-ISSN: 2395-0072
© 2015, IRJET.NET- All Rights Reserved Page 113
Test Model for Rich Semantic Graph Representation for Hindi Text
using Abstractive Method.
Manjula Subramaniam¹, Prof. Vipul Dalal²
Computer Engineering, Vidyalankar Institute of Technology, Maharashtra, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - In this paper we present a method for
summarizing a Hindi text document by creating a rich
semantic graph (RSG) of the original document and
identifying substructures of the graph from which
meaningful sentences can be extracted to generate a
document summary. The paper contributes an approach to
summarizing Hindi text documents using the abstractive
method. We extract a set of features from each sentence
that help identify its importance in the document. The
method uses Hindi WordNet to identify the appropriate
position of each word when checking SOV
(Subject-Object-Verb) qualification. To optimize the
summary, we find the similarity among sentences and
merge similar sentences, which are represented as rich
semantic sub-graphs; this in turn produces a summarized
text document.
Key Words: Text Analysis, Text Summarization,
Abstractive Summary, Rich Semantic Graph
Representation.
1. INTRODUCTION
The data on the World Wide Web is growing at an exponential
pace. Nowadays, people use the internet to find
information through information retrieval (IR) tools such
as Google, Yahoo and Bing.
However, with the exponential growth of information on
the internet, information abstraction or summary of the
retrieved results has become necessary for users. In the
current era of information overload, text summarization
has become an important and timely tool for users to
quickly understand large volumes of information.
The goal of summarizing a text document is therefore to
condense the document while preserving its important
contents. A wide range of technologies now falls under
Human Language Technology (HLT), including Natural
Language Processing (NLP), Speech Recognition,
Machine Translation, Text Generation and Text Mining.
In this paper, we focus on two of these areas, NLP and
Text Mining, which together lead to text summarization.
Text summarization is the process of extracting salient
information from the source text and presenting that
information to the user in the form of a summary.
Currently, the need for text summarization has appeared
in many areas, such as news article summaries, email
summaries, short news messages on mobile devices,
information summaries for businesspeople, government
officials and researchers, and summaries of the pages
returned by online search engines [1].
Text summarization approaches are broadly classified into
two types: extractive and abstractive.
Extractive summarization is the procedure of identifying
important sections of the text and reproducing them
verbatim, while abstractive summarization aims to present
the important material in a new, generalized form [1].
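To make the contrast concrete, the following is a minimal sketch of the extractive side only, assuming a simple word-frequency score. It is not the method proposed in this paper; the names `tokenize` and `extractive_summary`, the whitespace tokenizer and the sample text are illustrative assumptions. The point it shows is that an extractive summary copies whole sentences unchanged, whereas an abstractive summary would have to generate new wording.

```python
import re
from collections import Counter

PUNCT = ".,!?;:।"  # edge punctuation to strip, including the Devanagari danda


def tokenize(sentence):
    """Whitespace tokenizer (assumption): lowercase and strip edge punctuation."""
    return [w.strip(PUNCT) for w in sentence.lower().split() if w.strip(PUNCT)]


def extractive_summary(text, k=1):
    """Return the k highest-scoring sentences, verbatim and in document order."""
    # Split after sentence-final punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?।])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # Average document-wide frequency of the words in the sentence.
        tokens = tokenize(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


text = "Cats sleep. Cats sleep all day. Dogs bark loudly at night."
print(extractive_summary(text, k=2))  # ['Cats sleep.', 'Cats sleep all day.']
```

Every returned sentence is a substring of the input, which is exactly the property an abstractive summarizer gives up in exchange for more general wording.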
In this paper, a novel approach is presented to generate an
abstractive summary automatically for the Hindi input text
document using a semantic graph reducing technique. This
approach exploits a new semantic graph called Rich
Semantic Graph (RSG) [3, 7]. RSG is an ontology-based
representation developed to be used as an intermediate
representation for Natural Language Processing (NLP)
applications. The new approach consists of three phases:
creating a rich semantic graph for the source document,
reducing the generated rich semantic graph to a more
abstracted graph, and finally generating the abstractive
summary from the abstracted rich semantic graph.
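The three phases can be pictured with a deliberately small sketch. It assumes each sentence has already been analysed into a (subject, object, verb) triple and that a lookup table of more general concepts is available; the `Triple` type, the function names, the `generalize` table and the English placeholder words are all hypothetical stand-ins, not the RSG construction or the Hindi WordNet interface used in this work.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    obj: str
    verb: str


def build_graph(triples):
    """Phase 1 (toy): link each (subject, verb) pair to the objects it governs."""
    graph = {}
    for t in triples:
        objects = graph.setdefault((t.subject, t.verb), [])
        if t.obj not in objects:
            objects.append(t.obj)
    return graph


def reduce_graph(graph, generalize):
    """Phase 2 (toy): collapse objects that share one more general concept."""
    reduced = {}
    for key, objects in graph.items():
        parents = {generalize.get(o, o) for o in objects}
        # Replace the object list only when every object maps to the same concept.
        reduced[key] = [parents.pop()] if len(parents) == 1 else list(objects)
    return reduced


def generate_text(graph):
    """Phase 3 (toy): emit one sentence per node in Subject-Object-Verb order."""
    return [f"{s} {' and '.join(objs)} {v}." for (s, v), objs in graph.items()]


triples = [
    Triple("Ram", "mango", "eats"),
    Triple("Ram", "banana", "eats"),
    Triple("Sita", "book", "reads"),
]
generalize = {"mango": "fruit", "banana": "fruit"}  # assumed concept table

summary = generate_text(reduce_graph(build_graph(triples), generalize))
print(summary)  # ['Ram fruit eats.', 'Sita book reads.']
```

In this toy run, two source sentences about Ram are merged into one more general sentence. That is the kind of reduction the second phase is meant to achieve, though a real rich semantic graph carries far more linguistic detail than these triples.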
2. BACKGROUND AND RELATED WORK
A text summary is a shorter version of the original
document that still preserves the main content of the
source document. There are various definitions of
text summary in the literature.
According to [8], “The aim of automatic text summarization
is to condense the source text by extracting its most
important content that meets a user’s or application
needs”. According to [9], “A summary is a text that is
produced from one or more texts that contains a
significant portion of the information in the original
text(s), and is no longer than half of the original text(s)”.
There are various effective techniques for generating
extractive summaries, which help find relevant
sentences to add to the summary. These can be
classified as statistical, linguistic and hybrid approaches.