International Journal of Hybrid Intelligent Systems -1 (2020) 1–13
DOI 10.3233/HIS-200284
IOS Press
1448-5869/20/$35.00 © 2020 – IOS Press. All rights reserved

Hybrid plagiarism detection method for French language

Maryam Elamine*, Seifeddine Mechti and Lamia Hadrich Belguith
University of Sfax, Sfax, Tunisia

* Corresponding author: Maryam Elamine, ANLP-RG, MIRACL laboratory, University of Sfax, Sfax, Tunisia. E-mail: mary.elamine@gmail.com.

Abstract. With the growth of content on the Web, any information can be plagiarized. Plagiarism is the use of another person's ideas without naming the source. Plagiarism detection is therefore necessary but complicated: it faces significant challenges given the large amount of material on the World Wide Web and the limited access to a substantial part of it. In this paper, we present a novel plagiarism detection method for French documents. The proposed method combines the intrinsic and extrinsic aspects of plagiarism detection. We achieved good results with both approaches. For the extrinsic method, we achieved an accuracy of 62% in the first tests of the method. For the intrinsic method, we achieved an F-score of 0.328.

Keywords: Intrinsic plagiarism detection, extrinsic plagiarism detection, embeddings, style-breach

1. Introduction

The expansion of the media, including the Internet, has made it feasible to obtain a large amount of data [11]. In fact, researchers around the world have access to a wide range of information via the Internet, as it represents a much easier and faster way to acquire knowledge [16]. However, this ease of access remains a threat to the integrity of information and to copyright. Plagiarism is the reuse of others' ideas or text without giving proper credit [8]. Plagiarism detection approaches have a long history of attempts to improve their performance in detecting text misuse [21]. The plagiarism detection task is one of the active research topics in computational Natural Language Processing (NLP). It has already attracted broad interest, and multiple international competitions have been held since 2009. This task aims to detect reuse, reproduction and/or modification of text from one document to another [36].

Plagiarism is considered a major problem in the modern world since it affects many domains, including education and research [13]. Therefore, plagiarism detection systems are becoming a necessity. Furthermore, plagiarism in academic research is the most critical form and requires more attention to identify [34]. Consequently, detecting plagiarism is a continuous concern within academia, and the last two decades have witnessed remarkable advances in automatic plagiarism detection tools [18].

While the concept of plagiarism is not new, the way that individuals plagiarize has changed [5]. We can distinguish multiple forms of plagiarism [6,15]. First, copy/paste is the act of copying part of a text word for word without proper citation of the author. Second, in paraphrasing, the copied segment is modified but the idea and some words stay the same. Third, in idea plagiarism, the same idea is expressed using different words or a different language. Finally, authorship plagiarism is the case of putting one's own name to someone else's work. The rapid evolution of information content has made the field of scientific research highly vulnerable to plagiarism [38], which has strong negative impacts on academia and the public [37].

We can single out two types of detection: first, extrinsic plagiarism detection, which performs a comparison between a source document and a collection of