Digital Humanities 2023

Using text summarization models to improve digital reading of scientific papers

Mastrobattista, Ludovica (l.mastrobattista@usal.es), University of Salamanca, Spain
Alrahabi, Motasem (motasem.alrahabi@gmail.com), Sorbonne Université, France
Fedchenko, Valentina (valentina.fedchenko@sorbonne-universite.fr), Sorbonne Université, France
Jomaa, Oussama (oussama.jomaa@fabriquenumerique.fr), La fabrique numérique, France
Gawley, James (james.gawley@sorbonne-universite.fr), Sorbonne Université, France
Cordova, Johanna (johanna.cordova@sorbonne-universite.fr), Sorbonne Université, France
Roe, Glenn (glenn.roe@sorbonne-universite.fr), Sorbonne Université, France

Abstract: This paper presents an evaluation and comparison of three state-of-the-art models for text summarization, and proposes a new digital reading interface that allows users with little or no programming experience to exploit these models, together with automatic keyword extraction.

Keywords: digital reading, text summarization, keyword extraction, qualitative evaluation.

The scale of peer-reviewed publications in many fields is rapidly becoming unmanageable (Atlasocio.com, 2019). As a result, academic researchers need new solutions to stay abreast of developments in their fields and take advantage of the latest research. Automatic text summarization (El-Kassas et al., 2021) represents an important opportunity to leverage automated reading techniques at scale, providing an overview of a study's object, method and results, and orienting readers in the text even before they read it (Overstreet, 2021). This paper presents an evaluation and comparison of three state-of-the-art models for text summarization, and proposes a web interface designed for neophyte users to exploit these models without resorting to scripting or the command line.
This study is part of a larger research project that explores the potential of digital environments incorporating computational tools, in order to provide alternative approaches to reading and to improve readers' performance and digital practice in an academic context.

Using a corpus of open-access academic articles in the fields of communication and education from the journal Comunicar: Scientific Journal of Media Education, we evaluated the summaries produced by three existing models: BART (Lewis et al., 2019), PEGASUS (Zhang et al., 2020) and T5 (Raffel et al., 2020). For the corpus construction, it was the structure of the text to be summarized, rather than its length, that was taken into account. The three models automatically generated texts of 2-3 sentences. We determined the evaluation criteria according to what we consider a proper text assessment, consisting of three intrinsic measurements (Mani, 2001): quality of content, syntactic and morphological validity, and accuracy of vocabulary. Based on these considerations, we conducted our evaluations using five rating levels developed according to the Mean Opinion Score (MOS) scale (Streijl et al., 2016; Iskender et al., 2021), typically expressed as a number between 1 (poor) and 5 (excellent). A score was assigned for each criterion, and the unweighted average of these scores represented the overall quality of the summary.

Our evaluation of 21 articles shows that the T5 model best summarizes the content of each section of the documents, following the IMRaD structure (Sollaci & Pereira, 2004). T5 generated new sentences from the original text, extracting essential information and coherently rendering the general sense of the source. The BART and PEGASUS models, on the other hand, achieved quite similar scores of around 3.5 for each summarized part of the paper.
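The evaluation set-up described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the specific fine-tuned checkpoints and the generation lengths that yield 2-3 sentence summaries are assumptions, since the paper does not name them. The model calls require the Hugging Face `transformers` library (imported lazily, as it is a heavy optional dependency); the MOS aggregation is plain Python.

```python
"""Sketch: compare three summarizers and score the output on the MOS scale."""
from statistics import mean

# Illustrative checkpoints for the three architectures compared in the paper
# (assumed; the exact fine-tuned models used are not specified).
CHECKPOINTS = {
    "BART": "facebook/bart-large-cnn",
    "PEGASUS": "google/pegasus-cnn_dailymail",
    "T5": "t5-base",
}

def summarize_with_all(text: str, max_length: int = 90, min_length: int = 25) -> dict:
    """Return one short summary of an IMRaD section per model."""
    from transformers import pipeline  # lazy import: needs `pip install transformers`
    summaries = {}
    for name, checkpoint in CHECKPOINTS.items():
        summarizer = pipeline("summarization", model=checkpoint)
        result = summarizer(text, max_length=max_length, min_length=min_length)
        summaries[name] = result[0]["summary_text"]
    return summaries

def mos_overall(content: int, syntax: int, vocabulary: int) -> float:
    """Unweighted average of the three intrinsic criteria, each rated 1-5."""
    scores = (content, syntax, vocabulary)
    if not all(1 <= s <= 5 for s in scores):
        raise ValueError("MOS ratings must lie between 1 (poor) and 5 (excellent)")
    return mean(scores)
```

A summary rated 4 for content, 3 for syntactic validity and 4 for vocabulary would thus receive an overall score of about 3.67, between "fair" and "good".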
These scores place the summaries extracted by both models between "fair" and "good" on the MOS scale. Regarding the accuracy of vocabulary, we noted that the majority of the extracted summaries copied and pasted sentences from the original text.

In order to take advantage of the affordances of a digital reading platform, which allows users to orient themselves in a text before reading it, we thought it would be interesting to cross-reference the summary with another source of knowledge: the keywords associated with each part of the paper. In this way, the combination of two sources, the summary and the related keywords, delivers an overview of the terminology of the text at hand. KeyBERT, a keyword extraction method based on BERT embeddings (Devlin et al., 2018), was used for this purpose. A second evaluation, on seven articles, shows that this option may improve the contextualisation of the article and the reader's orientation on the topic in question by providing relevant information present in the text from the two sources. In some cases, the keywords extracted from each part of the article contain multiple text-related terms that can then enrich the summary content.

To make these models available for testing and use by researchers without a strong technical background, we have created a freely accessible online interface. This tool allows users to upload an article in XML format and receive an automatic summary of each section of the article, with the related keywords (the interface also allows for the summarization of plain text, but in this case article titles are not taken into account). Different parameters can be selected to modify the size of the summary.
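The keyword step can be sketched as follows. The KeyBERT call uses the library's real `extract_keywords` API, but the parameters (n-gram range, number of keywords) are illustrative assumptions, as is the `enrich_summary` helper that pairs the two knowledge sources shown to the reader; KeyBERT is imported lazily since it pulls in a sentence-transformers model.

```python
# Sketch: extract per-section keywords with KeyBERT and attach them
# to the generated summary (parameter choices are assumptions).

def section_keywords(text: str, top_n: int = 5) -> list:
    """Extract keyphrases from one article section with KeyBERT."""
    from keybert import KeyBERT  # lazy import: needs `pip install keybert`
    model = KeyBERT()  # default sentence-transformers embedding model
    pairs = model.extract_keywords(
        text,
        keyphrase_ngram_range=(1, 2),  # single words and two-word phrases
        stop_words="english",
        top_n=top_n,
    )
    return [keyword for keyword, _score in pairs]  # drop similarity scores

def enrich_summary(summary: str, keywords: list) -> str:
    """Combine the two sources delivered to the reader:
    the generated summary and the section's keywords."""
    return f"{summary}\nKeywords: {', '.join(keywords)}"
```

For example, `enrich_summary("The study surveys media literacy.", ["media literacy", "education"])` yields the summary followed by a `Keywords:` line, mirroring what the interface displays per section.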
Our main scientific contribution in this paper is thus twofold: the qualitative evaluation of three important models for automatic summarization, and the development of an easy-to-use interface allowing researchers to experiment with these models through summaries generated, along with keywords, for each section of the analyzed article, which we believe is a crucial step in the case of scientific papers. We aim to evaluate these models on other sources of