International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 08 Issue: 06 | June 2021 www.irjet.net p-ISSN: 2395-0072
© 2021, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 4276
A Survey of Extractive Text Summarization for Regional Language
Marathi
Deepali Kadam
Asst. Prof, DMCE, Maharashtra, India
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - Automatic text summarization is a data
reduction process that excludes unnecessary details and
presents the important information in a shorter version.
It plays an important role in information retrieval and text
classification, and may offer a solution to the information
overload problem created by today's ocean of data. There are
two major approaches to summarization, abstractive and
extractive; this paper focuses on the latter. Extractive
summarization builds a summary by extracting the important
sentences from the document. Although the technique has been
in use for almost seven decades, comparatively little research
on summarization has been done for the Marathi language.
According to recent surveys (2020 and 2021), Marathi ranks
15th globally by number of native speakers. This paper
therefore surveys the work done so far on extractive
summarization of Marathi text, limited though it is, and on
that basis lists the gaps where improvements could lead to
better results.
Key Words: Marathi Text summarization, Indian
language, Extractive summarization, Regional language,
Automatic summary, feature extraction.
1. INTRODUCTION
Knowledge is wealth and so is time. A huge amount of data is
available on the internet today. People are often too busy to
read wordy documents and prefer concise ones. Text
summarization comes to the rescue by automatically providing
the gist of what they want to read without wasting their
valuable time.
Text summarization provides the user with a condensed
description of a document and a non-redundant presentation
of the facts found in it. Automatic text summarization has
existed for several decades. While a number of problems
remain to be solved, the field has seen considerable progress,
especially in the last two decades, on extraction-based
methods. The rapidly growing popularity of the Internet has
become an important symbol of the information age. In the
face of this flood of electronic literature, reading every
document in full to find the necessary information is clearly
inefficient and infeasible. Reading a summary is therefore an
effective way to obtain useful information and save time.
Automatic text summarization plays an indispensable role in
everyday life. Examples include news headlines, summaries of
technical papers, book reviews and movie previews.
There are two major types of summarization: abstractive and
extractive. The extractive process differs significantly from
human-style summarization, which is abstractive: humans can
capture and relate the deep meanings and themes of text
documents, whereas automating such a skill is very difficult.
One of the obvious questions to ask in doing summarization
is “what are the properties of text that should be represented
or kept in a summary?” Summarization generally happens in
two phases: pre-processing and processing. Pre-processing
is further subdivided into sentence segmentation,
tokenization, stop word removal and stemming. The most
common features that researchers have considered so far for
scoring sentences in the processing phase are average tf-isf,
sentence length, sentence position, numerical data, sentence
to sentence similarity, title word feature, thematic word
feature, proper noun feature etc.
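The two phases described above can be illustrated with a minimal
Python sketch. It is not the method of any surveyed paper: the stop
word list, the suffix list, the equal feature weights and all
function names are hypothetical placeholders chosen for
illustration, and only three of the listed features (average
tf-isf, sentence length, sentence position) are computed. The
tf-isf formula used here, term frequency times log(N / sentence
frequency) averaged over the distinct terms of a sentence, is one
common formulation among several.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny resources; a real system needs curated lists.
STOP_WORDS = {"आणि", "आहे", "हा", "ही", "हे", "तो", "ती", "ते", "व", "या"}
SUFFIXES = ("ांना", "ाने", "ात", "ला", "ने", "चा", "ची", "चे")  # longest first


def split_sentences(text):
    # Sentence segmentation on the Devanagari danda and common end marks.
    # Simplification: a "." inside a number or abbreviation also splits.
    parts = re.split(r"[।.?!]+", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    # Runs of Devanagari code points (U+0900-U+097F) or ASCII alphanumerics.
    return re.findall(r"[\u0900-\u097F]+|[A-Za-z0-9]+", sentence)


def stem(token):
    # Naive suffix stripping; keeps at least two code points of the stem.
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 1:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    # Segmentation -> tokenization -> stop word removal -> stemming.
    sentences = split_sentences(text)
    terms = [[stem(t) for t in tokenize(s) if t not in STOP_WORDS]
             for s in sentences]
    return sentences, terms


def average_tf_isf(terms):
    # isf(t) = log(N / number of sentences containing t).
    n = len(terms)
    sentence_freq = Counter(t for sent in terms for t in set(sent))
    scores = []
    for sent in terms:
        tf = Counter(sent)
        total = sum(tf[t] * math.log(n / sentence_freq[t]) for t in tf)
        scores.append(total / len(tf) if tf else 0.0)
    return scores


def normalize(values):
    # Scale to [0, 1] by the maximum; all zeros if the maximum is zero.
    top = max(values, default=0.0)
    return [v / top if top > 0 else 0.0 for v in values]


def summarize(text, k=2):
    sentences, terms = preprocess(text)
    n = len(sentences)
    tfisf = normalize(average_tf_isf(terms))
    length = normalize([float(len(t)) for t in terms])
    position = [1.0 - i / n for i in range(n)]  # earlier sentences score higher
    # Assumption: equal weights for the three features.
    scores = [a + b + c for a, b, c in zip(tfisf, length, position)]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]  # keep document order
```

Calling `summarize(text, k)` returns the `k` highest-scoring
sentences in their original order. The remaining features (title
words, proper nouns, numerical data and so on) would enter as
additional terms in the score, typically with tuned weights.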
Language is a tool we use to communicate and express
ourselves, a means to an end. One's mother tongue, however,
is far more than a mere tool. Marathi ranks 15th by number
of native speakers, with 83 million native speakers in India
alone and 95 million overall who speak Marathi as their
mother tongue. It is a regional and official language of
Maharashtra. Even so, only nominal work has been done in
Marathi text summarization, and Marathi has therefore been
chosen as the language of study. Marathi is written in the
Devanagari script, which has one of the largest alphabet sets.
2. LITERATURE SURVEY
2.1. A Survey of Automatic Text Summarization
System for Different Regional Language in India
Virat Giri and Dr. M. M. Math [9] examined automatic text
summarization systems for different regional languages in
India. They did most of their work from the ground up,
because very little work had been done in Marathi language
processing before 2016. They studied the techniques used in
various languages and tried to apply them to Maharashtra's
official language. They developed a Marathi stemmer, a
Marathi proper name list, an English-Marathi noun list,
Marathi keyword extraction, Marathi rule-based named entity
recognition etc. These lexical resources are used in the
pre-processing and processing steps. The three major steps