International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056 | p-ISSN: 2395-0072
Volume: 08 Issue: 06 | June 2021 | www.irjet.net
© 2021, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 4276

A Survey of Extractive Text Summarization for Regional Language Marathi

Deepali Kadam
Asst. Prof, DMCE, Maharashtra, India

Abstract - Automatic text summarization is a data reduction process that excludes unnecessary details and presents the important information in a shorter form. It plays an important role in information retrieval and text classification, and may provide a solution to the information overload problem caused by the ever-growing volume of data. Although there are two basic approaches to summarization, abstractive and extractive, this paper focuses on the latter. Extractive summarization summarizes a document by extracting its important sentences. Although the technique has been in use for almost seven decades, relatively little research on summarization has been done for the Marathi language. As per the latest surveys (2020 and 2021), Marathi ranks 15th globally by number of native speakers. This paper therefore surveys the work done on extractive summarization of Marathi, limited as it is, and on that basis lists the gaps where improvements could lead to better results.

Key Words: Marathi Text summarization, Indian language, Extractive summarization, Regional language, Automatic summary, feature extraction.

1. INTRODUCTION

Knowledge is wealth and so is time. There is a huge amount of data available on the internet today.
People are too busy to read wordy documents and prefer concise ones. Text summarization comes to the rescue by automatically providing the gist of what they want to read, without wasting their valuable time. It provides the user with a condensed description of a document and a non-redundant presentation of the facts found in it. Automatic text summarization has existed for several decades. While a number of problems remain to be solved, the field has seen considerable progress, especially in the last two decades, on extraction-based methods.

The rapidly growing popularity of the Internet has become an important symbol of the information age. In the face of this flood of electronic literature, reading everything in full to find the necessary information is clearly inefficient and infeasible. Reading a summary is therefore an effective way to obtain useful information and save time. Automatic text summarization plays an important role in everyday life, for example in news headlines, summaries of technical papers, book reviews and movie previews.

There are two basic types of summarization: abstractive and extractive. The extractive process differs significantly from human-style summarization, which is abstractive: humans can capture and relate the deep meanings and themes of a text document, a skill that is very difficult to automate. One of the obvious questions to ask in summarization is “what are the properties of text that should be represented or kept in a summary?”

Summarization generally happens in two phases: pre-processing and processing. Pre-processing is further subdivided into sentence segmentation, tokenization, stop word removal and stemming.
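The four pre-processing steps named above can be illustrated with a minimal Python sketch. It is not taken from any of the surveyed systems: the stop-word list, the suffix list and all function names are hypothetical placeholders chosen for illustration, and a real Marathi system would rely on curated lexical resources and a proper stemmer.

```python
import re

# Hypothetical, deliberately tiny resources (assumptions for illustration only).
STOP_WORDS = {"आणि", "आहे", "आहेत", "हा", "ही", "हे", "तो", "ती", "ते", "व", "या"}
# Suffixes are listed longest first so the longest match is stripped.
SUFFIXES = ("मध्ये", "साठी", "ला", "ने")

# Devanagari block without the danda (U+0964) and double danda (U+0965),
# which are punctuation. An explicit range is used because Python's \w
# does not match Devanagari vowel signs (combining marks).
TOKEN_RE = re.compile(r"[\u0900-\u0963\u0966-\u097F]+")


def split_sentences(text):
    """Sentence segmentation on '.', '?', '!' and the danda."""
    parts = re.split(r"[.?!\u0964]+", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Tokenization: extract runs of Devanagari characters."""
    return TOKEN_RE.findall(sentence)


def remove_stop_words(tokens):
    """Stop word removal against the illustrative list above."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Naive suffix stripping; keeps at least two code points of the stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 1:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run all four steps; returns one list of stems per sentence."""
    return [
        [stem(t) for t in remove_stop_words(tokenize(s))]
        for s in split_sentences(text)
    ]


if __name__ == "__main__":
    print(preprocess("पुणे हे शहर आहे. मुले शाळेमध्ये आहेत!"))
```

Splitting on punctuation and stripping a fixed suffix list are the simplest possible choices; they will mis-handle abbreviations, decimal numbers and irregular inflections, which is one reason dedicated Marathi resources are needed.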
The features most commonly considered by researchers so far are average tf-isf, sentence length, sentence position, numerical data, sentence-to-sentence similarity, title word feature, thematic word feature, proper noun feature, etc.

Language is a tool we use to communicate and express ourselves, a means towards an end. However, one’s mother tongue is far more than a mere tool. Marathi ranks 15th by number of native speakers, with 83 million native speakers in India alone and about 95 million speakers overall. It is a regional and official language of Maharashtra. Even so, only nominal work has been done in Marathi text summarization, and so Marathi has been chosen as the language of study. Marathi is written in the Devanagari script, which has one of the largest alphabet sets.

2. LITERATURE SURVEY

2.1. A Survey of Automatic Text Summarization System for Different Regional Language in India

Virat Giri and Dr. M. M. Math [9] studied automatic text summarization systems for different regional languages in India. They did most of their work from the ground up, because very little work had been done in Marathi language processing before 2016. They studied the techniques used in various languages and tried to apply them to Maharashtra’s official language. They developed a Marathi stemmer, a Marathi proper name list, an English-Marathi noun list, Marathi keyword extraction, Marathi rule-based named entity recognition, etc. These lexical resources are used in the pre-processing and processing steps. The three major steps