International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 08 Issue: 02 | Feb 2021 www.irjet.net p-ISSN: 2395-0072
© 2021, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 2147
Marathi Text Summarizer Using Deep Learning Model
Shruti Bhoir¹, Tanvi Hule², Deepali Kadam³
¹BE Student, Information Technology, Datta Meghe College of Engineering, Airoli, Maharashtra
²BE Student, Information Technology, Datta Meghe College of Engineering, Airoli, Maharashtra
³Asst. Professor, Information Technology, Datta Meghe College of Engineering, Airoli, Maharashtra
Abstract – Text summarization is a technique that converts an original text into a shorter text by selecting its important sentences without changing its meaning. Summarizing text manually is difficult for humans. We present a technique for extractive summarization of Marathi-language articles that consists of feature extraction, selecting important sentences, paragraphs, etc. from the original document, and concatenating them into a shorter form using a deep learning model. We develop the system in two stages. The first stage summarizes domain-specific Marathi articles. In the second stage, we extend the model to generic articles and test it on various Marathi inputs. Such summarization techniques are well known for English articles; applying them to Marathi news is the novel part of this work.
Key Words: Marathi article, Summarization, Extractive summary, Feature extraction, Deep learning.
1. INTRODUCTION
Automatic Marathi text summarization is the technique of shortening an original text into a shorter form that preserves the meaning of the original. Summarization can be classified into two groups: extractive and abstractive summarization. Most summarization systems are built for English and other languages; few automatic text summarization systems exist for Marathi, and very little work has been done on Marathi summarization.
1.1 Problem Definition
The overall scope of the project is to gain a deeper knowledge of techniques in Machine Learning, Deep Learning and Data Analysis in order to generate concise summaries of long texts, so that a user can read a summary instead of the full text. The scope also involves understanding why Machine Learning models are successful at phrasing sentences, how they treat some input words as more important than others by assigning appropriate weights, and how these models work internally. We chose Marathi over other languages because, to our knowledge, no summarization projects or summarizers have yet been created for the Marathi language. We present a technique for extractive summarization of Marathi-language articles that consists of selecting important sentences, paragraphs, etc. from the original document and creating a summary out of them.
2. PROPOSED SYSTEM
The summary produced by the summarization system allows the user to easily understand the content of the original documents without having to read each document in full.
2.1 Abstractive and Extractive:
A. Abstractive: Abstractive summarization consists of understanding the source text, using linguistic methods to interpret it, and expressing its content in new words.
B. Extractive: Extractive summarization involves extracting relevant sentences from the source text and presenting them in proper order. The important sentences are selected by applying statistical and linguistic features to the input text.
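The selection step of extractive summarization can be sketched as follows. This is a minimal illustration only, assuming the sentences are already split and whitespace-tokenized. It scores each sentence by the average document-level frequency of its words, a simple statistical feature chosen here for illustration; it is not the deep learning model the paper uses, and the function name `extractive_summary` is hypothetical.

```python
from collections import Counter


def extractive_summary(sentences: list[str], k: int) -> list[str]:
    """Pick the k highest-scoring sentences and return them in source order.

    Illustrative stand-in scoring: average document-level frequency of the
    sentence's words (not the paper's deep learning model).
    """
    tokenized = [s.split() for s in sentences]
    # Frequency of every word across the whole document.
    freq = Counter(word for words in tokenized for word in words)

    def score(i: int) -> float:
        words = tokenized[i]
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank indices by descending score; ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(i), i))
    # Restore original document order so the summary reads naturally.
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]
```

Returning the chosen sentences in their original order, rather than in score order, keeps the extracted summary readable as a continuous text.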
2.2 Modules
The proposed Marathi text summarization method is extraction-based.
Fig -1: Design of Module
A. Pre-processing:
In the pre-processing step, stop words are removed, stemming is applied, and the input document is broken into a collection of sentences. Punctuation marks and characters such as ; , ―― : ( ) [ ] { }, as well as space and tab characters, are removed. Stop words are likewise eliminated from the text.
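The sentence-splitting, punctuation-removal and stop-word-removal parts of pre-processing can be sketched as below. This is a minimal sketch under stated assumptions: the stop-word list `STOP_WORDS` is a tiny hypothetical sample rather than the authors' list, sentence boundaries are assumed to be `.`, `?`, `!` and the Devanagari danda `।`, and the function name `preprocess` is hypothetical. Stemming is left to the next module.

```python
import re

# Hypothetical, deliberately tiny Marathi stop-word list for illustration;
# a real system would load a curated list.
STOP_WORDS = {"आणि", "व", "हे", "ही", "आहे", "आहेत", "की", "या"}

# Assumed sentence boundaries: full stop, question mark, exclamation mark,
# and the Devanagari danda (।) used in Marathi text.
SENTENCE_END = re.compile(r"[.?!।]+")

# Punctuation named in the text: ; , : ( ) [ ] { } and dash characters.
PUNCTUATION = re.compile(r"[;,:()\[\]{}―\-]")


def preprocess(text: str) -> list[list[str]]:
    """Split text into sentences, strip punctuation, drop stop words."""
    sentences = []
    for raw in SENTENCE_END.split(text):
        cleaned = PUNCTUATION.sub(" ", raw)
        # str.split() with no argument splits on any run of spaces, tabs,
        # or newlines, which also removes those characters.
        tokens = [t for t in cleaned.split() if t not in STOP_WORDS]
        if tokens:  # skip empty fragments, e.g. after the final full stop
            sentences.append(tokens)
    return sentences
```

Each sentence comes back as a list of tokens, which is the form the stemming module would consume.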
B. Stemming:
In stemming, a word is reduced to its stem (root word) by separating it from its suffix. A stemmer algorithm removes suffixes using a list of frequent suffixes. The output of pre-processing is taken as input to stemming. Stemming is further subdivided into two parts: root verification and suffix removal.
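The two parts of stemming, root verification and suffix removal, can be sketched as a small suffix-stripping stemmer. This is an illustrative sketch only: the paper does not give its suffix list or root dictionary, so `SUFFIXES` and `ROOTS` below are hypothetical samples, and the policy of accepting a stripped form only when it is a verified root is an assumption made for this example.

```python
# Hypothetical suffix list and root dictionary, for illustration only;
# the paper's actual list of frequent suffixes is not given.
SUFFIXES = ["ांना", "ाचा", "ाची", "ाचे", "ाला", "ाने", "ात",
            "चा", "ची", "चे", "ला", "ने", "त"]
ROOTS = {"भारत", "शहर", "घर", "पाणी"}


def stem(word: str) -> str:
    # Root verification: a word already in the root dictionary is kept as-is,
    # so e.g. "भारत" is not wrongly stripped of its final "त".
    if word in ROOTS:
        return word
    # Suffix removal: try longer suffixes first so "ाचा" wins over "चा".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[: -len(suffix)]
            # Accept the stripped form only if it verifies as a known root.
            if candidate in ROOTS:
                return candidate
    # No verified root found: return the word unchanged.
    return word
```

Checking the root dictionary before stripping prevents over-stemming of words that merely end in a suffix-like character sequence.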