Turkish Journal of Computer and Mathematics Education, Vol. 12 No. 13 (2021), 5707-5719
Research Article

Hybrid System for Plagiarism Detection on A Scientific Paper

Farah K. AL-Jibory, Mohammed S. H. Al-Tamimi*
a Postgraduate Student, Dept. of Computer Science, College of Science, University of Baghdad, IRAQ

Abstract: Plagiarism detection systems are critical for identifying instances of plagiarism, particularly in the educational sector, where scientific publications and papers are concerned. Plagiarism occurs when material is copied without the author's consent or attribution. Identifying such acts requires thorough knowledge of the types and classes of plagiarism, several of which can be detected with current tools and methodologies. With the advancement of information and communication technologies (ICT) and the availability of online scientific publications, access to these publications has become more convenient. Together with the availability of numerous text-editing programs, this has made plagiarism detection a crucial concern. Numerous scholarly articles have previously examined plagiarism detection and the two datasets most often used for it, WordNet and the PAN dataset. Researchers have described verbatim plagiarism as a straightforward case of copying and pasting, and have then shed light on clever plagiarism, which is more difficult to detect since it may involve altering the original text or borrowing ideas from other studies. Other scholars have noted that plagiarism can obscure scientific content by substituting terms, deleting or inserting material, or rearranging or changing the original publication. The proposed system incorporates natural language processing (NLP) and machine learning (ML) techniques, together with an external plagiarism detection strategy based on text mining and similarity analysis. The proposed technique employs a combination of Jaccard and cosine similarity.
It was evaluated on the PAN-PC-11 corpus, and the findings show that the proposed system outperforms previous systems on that corpus. The proposed system obtains an accuracy of 0.96, a recall of 0.86, an F-measure of 0.86, and a PlagDet score of 0.86 (0.865). The proposed technique is supported by an application designed to detect plagiarism in scientific publications and to generate notifications in Portable Document Format (PDF).

Keywords: Natural language processing, Machine learning, Text mining technique, External plagiarism detection, Plagiarism detection.

1. Introduction

Plagiarism is a complicated and ethically difficult issue. Put simply, it is the act of taking another author's work and publishing it under one's own name without acknowledging the original author (Miguel.R 2015). Plagiarism is a form of fraud. Authors should properly acknowledge their sources in order to adhere to ethical standards, and plagiarism is a failure to do so. However, student writers occasionally fail to credit the source properly. These problems are primarily the result of a lack of knowledge about correct citation practice. Thus, plagiarism should be avoided in order to maintain ethical standards (Miguel.R. 2006). Perhaps the best definition of plagiarism is "the unacknowledged copying of papers or programs" (Asif.E. et al., 2012). Thus, plagiarism must be firmly resisted. Plagiarism is not only a problem in academia, however; it affects nearly every industry. Plagiarism can happen by mistake, but most of the time it is the result of a deliberate act (Durga & Venu 2014). The problem has lately become more prevalent as a result of the digital era and the materials available on the World Wide Web (WWW). Plagiarism detection (PD) in natural languages using statistical or automated approaches began in the 1990s, with studies of copy detection mechanisms in digital texts as a forerunner (Methieu & Michal 2008).
Since the 1970s, studies have been conducted on detecting plagiarism in computer code written in Pascal and C, with the aim of identifying code clones and software misuse (Xie.R 2018). To prevent plagiarism, a large number of researchers have spent decades developing detection software (Hussaim & Dhrub 2018). Initially, plagiarism was identified manually or through resemblance to previously consulted content. Today, the abundance of material available on the internet makes manual detection more difficult. As a result, the development of automatic plagiarism detectors is critical (Mayank & Dilip 2017) (Efstathio.S. 2011).

2. Review Of Related Studies

In (Parth Gupta et al., 2011), the proposed system focuses on the importance of paraphrases in detecting plagiarism, both monolingually and cross-lingually. To investigate the detection challenges, the authors examined the efficacy of an external plagiarism detection system based on the Vector Space Model (VSM) on the PAN-PC-2011 corpus. The system used only 250 documents as candidate documents and 20 documents as suspect documents. For monolingual simulated plagiarism, the results were a PlagDet score of 0.0524298, a recall of 0.0293390, a precision of 0.3780321, and a granularity of 1.0541872. When used in conjunction with a synonym addition mechanism, such as a thesaurus, dictionary, or WordNet, this strategy may be more effective.

(Asif Ekbal et al., 2012) offer a method for detecting external plagiarism based on the classic VSM and n-gram language model techniques. The methodology of the proposed system comprises four major components. In the first step, all texts are processed to create tokens and lemmas, as well as to identify Part-of-Speech (PoS) classes, character offsets, sentence numbers, and Named-Entity (NE) classes. The documents are then forwarded to the second stage of the pipeline. Select a subset of documents that may be potential sources of plagiarism in the