International Journal of Electrical and Computer Engineering (IJECE)
Vol. 13, No. 6, December 2023, pp. 6663~6672
ISSN: 2088-8708, DOI: 10.11591/ijece.v13i6.pp6663-6672
Journal homepage: http://ijece.iaescore.com

A hybrid approach for text summarization using semantic latent Dirichlet allocation and sentence concept mapping with transformer

Bharathi Mohan Gurusamy 1, Prasanna Kumar Rengarajan 1, Parthasarathy Srinivasan 2
1 Department of Computer Science Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai, India
2 Oracle SGSISTS, Lehi, United States

Article Info
Article history: Received Oct 11, 2022; Revised Mar 13, 2023; Accepted Apr 3, 2023

ABSTRACT
Automatic text summarization generates a summary whose sentences reflect the essential and relevant information of the original documents. Extractive summarization requires semantic understanding, while abstractive summarization requires a better intermediate text representation. This paper proposes a hybrid approach to text summarization that combines extractive and abstractive methods. To improve the semantic understanding of the model, we propose two novel extractive methods: semantic latent Dirichlet allocation (semantic LDA) and sentence concept mapping. We then generate an intermediate summary by applying our proposed sentence ranking algorithm over the sentence concept mapping. This intermediate summary is the input to a transformer-based abstractive model fine-tuned with a multi-head attention mechanism. Our experimental results demonstrate that the proposed hybrid model generates coherent summaries from an intermediate extractive summary that captures the semantics of the source. As the number of concepts and the number of words in the summary increase, the ROUGE precision and F1 scores of the proposed model improve.
Keywords: Hybrid model, Semantic latent Dirichlet allocation, Sentence concept mapping, Text summarization, Transformer

This is an open access article under the CC BY-SA license.

Corresponding Author:
Bharathi Mohan Gurusamy
Department of Computer Science Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham
Chennai, India
Email: g_bharathimohan@ch.amrita.edu

1. INTRODUCTION
Today’s world is full of information, most of it from web articles [1]. Users read web articles according to their needs and then have to process the information further, often reading one or more articles several times to comprehend what they require. The main goal of a text summarizer is to apply natural language processing (NLP) methods to condense the content of text documents. A generated summary reduces the content of the original documents without compromising their main concepts [2]. A summary of a large document lets the user skim it, saving time.

Text summarization is a challenging task that has been studied extensively, and the approaches used for it can be broadly classified into three categories: extractive, abstractive, and hybrid summarizers [1]. Extractive summarization techniques select sentences from the original document’s content and arrange them to form a summary. Sentences are ranked using statistical and semantic approaches, which assign each sentence a weight that determines its position in the ordered list. In contrast, abstractive summarization approaches aim to create a semantic and meaningful summary by generating new sentences that convey essential information from the original document(s) using