Appl. Sci. 2022, 12, 6584. https://doi.org/10.3390/app12136584 www.mdpi.com/journal/applsci

Article

Automatic Text Summarization for Hindi Using Real Coded Genetic Algorithm

Arti Jain 1,*, Anuja Arora 1, Jorge Morato 2, Divakar Yadav 3 and Kumar Vimal Kumar 1

1 Department of CSE, Jaypee Institute of Information Technology, Noida 201309, India; anuja.arora29@gmail.com (A.A.); vimalkumar.k@gmail.com (K.V.K.)
2 Computer Science, Universidad Carlos III de Madrid, 28911 Leganes, Spain; jmorato@inf.uc3m.es
3 Department of CSE, NIT Hamirpur, Hamirpur 177005, India; divakar.yadav0@gmail.com
* Correspondence: ajain.jiit@gmail.com; Tel.: +919313519476

Featured Application: This paper applies the Real Coded Genetic Algorithm to a Natural Language Processing task, namely Text Summarization. The purpose of text summarization is to reduce an extensive document into a concise format such that the essence of the content is retained. Users can then utilize the summarized document for varied applications such as Question Answering, Machine Translation, Fake News Detection, and Named Entity Recognition, to name a few.

Abstract: Automatic Text Summarization (ATS) is in great demand to address the ever-growing volume of text data available online and to discover relevant information faster. In this research, an ATS methodology is proposed for the Hindi language using the Real Coded Genetic Algorithm (RCGA) over a health corpus available as a Kaggle dataset. The methodology comprises five phases: preprocessing, feature extraction, processing, sentence ranking, and summary generation. Rigorous experimentation on varied feature sets is performed, where distinguishing features, namely sentence similarity and named entity features, are combined with others for computing the evaluation metrics. The top 14 feature combinations are evaluated through the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) measure.
RCGA computes appropriate feature weights through strings of features, chromosome selection, and reproduction operators: Simulating Binary Crossover and Polynomial Mutation. To extract the highest-scored sentences as the corpus summary, different compression rates are tested. In comparison with existing summarization tools, the ATS extractive method gives a summary reduction of 65%.

Keywords: automatic text summarization; extractive summary; feature set; Hindi language; Hindi health data; named entity; real coded genetic algorithm; ROUGE metric; summarization tool

1. Introduction

Automatic Text Summarization (ATS) [1,2] is a process that generates a summary preserving the essence of a text by eliminating irrelevant or redundant content. ATS provides vital information in a much shorter version, usually reduced to less than half the length of the input text. It remedies the challenge of information overload and helps in information retrieval tasks. ATS provides concise information with reduced redundancy [3] in an effective manner for news articles [4], emails, official government documents, and many more. In general, ATS produces either an extractive summary [5] or an abstractive summary [6]. An extractive summary is generated by selecting essential sentences from the given textual document. The sentence selection criteria are based on the text's statistical parameters and linguistic features, and the selected sentences are combined into the final summary. On the other hand, an abstractive summary is generated from a more profound understanding of the semantics of the given textual document.

Citation: Jain, A.; Arora, A.; Morato, J.; Yadav, D.; Kumar, K.V. Automatic Text Summarization for Hindi Using Real Coded Genetic Algorithm. Appl. Sci. 2022, 12, 6584.
https://doi.org/10.3390/app12136584
Academic Editors: Julian Szymanski, Higinio Mora, Doina Logofătu and Andrzej Sobecki
Received: 20 April 2022
Accepted: 26 June 2022
Published: 29 June 2022
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
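To make the extractive pipeline outlined in the abstract and introduction more concrete, the following is a minimal sketch and not the authors' implementation. It shows textbook forms of the two reproduction operators named in the abstract, acting on real-valued feature-weight chromosomes, and a weighted-sum sentence ranking that keeps a fraction of the sentences. The function names, the distribution indices `eta`, the mutation probability `p_mut`, the [0, 1] weight bounds, and the reading of the compression rate as a `keep_ratio` are all assumptions made for illustration; the fitness evaluation (for example, a ROUGE score) and the selection loop are omitted.

```python
import random


def sbx_crossover(p1, p2, eta=2.0, rng=random):
    """Textbook simulated binary crossover on two real-valued parents.

    eta is the distribution index (assumed value); larger eta keeps
    children closer to their parents.
    """
    c1, c2 = [], []
    for x1, x2 in zip(p1, p2):
        u = rng.random()  # u is in [0, 1)
        if u <= 0.5:
            beta = (2.0 * u) ** (1.0 / (eta + 1.0))
        else:
            beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
        # The two children are symmetric about the parents' mean.
        c1.append(0.5 * ((1.0 + beta) * x1 + (1.0 - beta) * x2))
        c2.append(0.5 * ((1.0 - beta) * x1 + (1.0 + beta) * x2))
    return c1, c2


def polynomial_mutation(x, low=0.0, high=1.0, eta=20.0, p_mut=0.1, rng=random):
    """Basic polynomial mutation; each gene mutates with probability p_mut.

    The result is clipped to [low, high] (assumed weight bounds).
    """
    out = []
    for gene in x:
        if rng.random() < p_mut:
            u = rng.random()
            if u < 0.5:
                delta = (2.0 * u) ** (1.0 / (eta + 1.0)) - 1.0
            else:
                delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta + 1.0))
            gene = min(high, max(low, gene + delta * (high - low)))
        out.append(gene)
    return out


def summarize(sentences, features, weights, keep_ratio=0.35):
    """Score each sentence as a weighted sum of its feature values and
    return the top keep_ratio fraction, restored to document order."""
    scores = [sum(w * f for w, f in zip(weights, row)) for row in features]
    keep = max(1, int(len(sentences) * keep_ratio))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:keep]
    return [sentences[i] for i in sorted(top)]
```

In a full genetic algorithm, each chromosome would be one candidate weight vector, its fitness would come from evaluating the summary that `summarize` produces with those weights, and the two operators would generate the next population from the selected parents. A `keep_ratio` of 0.35 corresponds to removing 65% of the sentences, which is one possible reading of the reduction figure quoted in the abstract.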