Volume 20, 2023

Accepting Editor: Eli Cohen │ Received: January 25, 2023 │ Revised: May 24, 2023 │ Accepted: June 4, 2023

Cite as: Liebeskind, C., & Valūnaitė-Oleškevičienė, G. (2023). Corpus processing of multi-word discourse markers for advanced learners. Issues in Informing Science and Information Technology, 20, 149-169. https://doi.org/10.28945/5144

(CC BY-NC 4.0) This article is licensed to you under a Creative Commons Attribution-NonCommercial 4.0 International License. When you copy and redistribute this paper in full or in part, you need to provide proper attribution to it to ensure that others can later locate this work (and to ensure that others do not accuse you of plagiarism). You may (and we encourage you to) adapt, remix, transform, and build upon the material for any non-commercial purposes. This license does not permit you to use this material for commercial purposes.

CORPUS PROCESSING OF MULTI-WORD DISCOURSE MARKERS FOR ADVANCED LEARNERS

Chaya Liebeskind*
Jerusalem College of Technology, Jerusalem, Israel
liebchaya@gmail.com

Giedrė Valūnaitė-Oleškevičienė
Mykolas Romeris University, Vilnius, Lithuania
gvalunaite@mruni.eu

* Corresponding author

ABSTRACT

Aim/Purpose
The most crucial aspects of teaching a foreign language to more advanced learners are building an awareness of discourse modes, how to regulate discourse, and the pragmatic properties of discourse components. However, in different languages, the connections and structure of discourse are ensured by different linguistic means which makes matters complicated for the learner.

Background
By uncovering regularities in a foreign language and comparing them with patterns in one’s own tongue, the corpus research method offers the student unique opportunities to acquire linguistic knowledge about discourse markers. This paper reports on an investigation of the functions of multi-word discourse markers.

Methodology
In our research, we combine the alignment model of phrase-based statistical machine translation with manual treatment of the data in order to examine English multi-word discourse markers and their equivalents in Lithuanian and Hebrew translations, focusing on how they change in translation. After establishing the full list of multi-word discourse markers in our generated parallel corpus, we research how the multi-word discourse markers are treated in translation.

Contribution
Creating a parallel research corpus to identify multi-word expressions used as discourse markers, analyzing how they are translated into Lithuanian and Hebrew, and attempting to determine why the translators made their choices add value to corpus-driven research and to the understanding of how to manage discourse.

Findings
Our research suggests that context may guide translators to integrate a particle or other lexical item into Lithuanian or Hebrew translations of discourse markers in order to express the rhetorical domain, which could be related to the phenomenon of “over-specification.”