© 2021 IJRAR August 2021, Volume 8, Issue 3 www.ijrar.org (E-ISSN 2348-1269, P- ISSN 2349-5138) IJRAR21C1604 International Journal of Research and Analytical Reviews (IJRAR) www.ijrar.org 689

AN EXTENSIVE STUDY ON MACHINE LEARNING METHOD BASED CODE CLONE DETECTION TECHNIQUES

S. Karthik 1
1 Research Scholar, Department of Computer Science, PSG College of Arts and Science (Autonomous), Coimbatore, Tamilnadu, India.
Dr. B. Rajdeepa 2
2 Associate Professor and Head, Department of Information Technology, PSG College of Arts and Science (Autonomous), Coimbatore, Tamilnadu, India.

Abstract: Software developers reuse code fragments by copying and pasting them, with or without slight modifications. As a consequence, software systems often contain very similar code sections known as code clones. Code cloning can be harmful to software evolution and maintenance, and duplicated fragments greatly increase the work required to adapt or improve code. Many software engineering tasks, including software evaluation analysis, code quality analysis, plagiarism detection, program understanding, aspect mining, copyright infringement investigation, code compaction and bug detection, require the extraction of code fragments that are syntactically or semantically identical. This makes clone detection an important and valuable software analysis process. Various clone detection methods have been proposed over the last decade. This article reviews text-based, token-based, tree-based, Program Dependency Graph (PDG)-based and machine learning based clone detection techniques and compares their benefits and limitations in tabular form. Based on this analysis, future directions for clone detection are suggested to support better software development.

Index Terms - Software development, clone code, clone code detection, duplicated fragments.

1. INTRODUCTION
Software cloning [1-2] occurs when a segment of code is copied from one location and reused in a new part of the code, with or without modifications; the copied code is referred to as a clone. Different studies have reported code duplication levels of 20-59% in software systems. The problem is that when a bug is found in an original fragment, each of its copies must be examined for the same flaw. In addition, copied code increases the effort needed to modify the code. Code consistency analysis, duplicate identification, aspect mining, virus recognition and bug detection are also software engineering activities that require syntactically or semantically similar code to be mined, so meaningful clone detection is needed in software analytics. Generally, clones are created either intentionally or unintentionally. When clones become a hindrance to software maintenance, they can be removed or refactored. Because software cloning has emerged as an active area of research, clone detection has become one of its main concerns [3].

Clone classifications are used in reengineering and in clone detection methods. Code clones can be classified as exact or approximate (based on the form and volume of duplication), contiguous or non-contiguous (based on the contiguity of the matching program elements), maximal or subsumed (based on the size of the detected clone pair), among others. Clones are commonly grouped into four types: exact (Type 1), renamed or parameterized (Type 2), near-miss (Type 3) and semantic (Type 4) clones. Type 1 (exact) clones are identical to the original code except for differences in comments and whitespace. Type 2 clones arise mainly from differences in variable names, literals and keywords. Type 3 (near-miss) clones are derived from the base code by inserting, modifying or deleting statements.
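The Type 1 to Type 3 definitions can be made concrete with a small token-based comparison. The Python sketch below is illustrative only and is not a method proposed in this article; the function names, the sample fragments and the 0.7 near-miss threshold are hypothetical. It drops comments and layout before comparing (Type 1), replaces identifiers and literals with the placeholders ID and LIT while keeping keywords (a simplification of the Type 2 definition, which also mentions keywords), and accepts a pair as Type 3 when the similarity ratio from the standard difflib module reaches the threshold.

```python
import difflib
import io
import keyword
import tokenize

# Token kinds that only carry comments or layout; ignored for every clone type.
LAYOUT = {tokenize.COMMENT, tokenize.NL, tokenize.ENCODING, tokenize.ENDMARKER}
# Block-structure tokens are kept but compared by kind, not by their text,
# so 2-space and 4-space indentation produce the same sequence.
STRUCTURE = {tokenize.INDENT: "<INDENT>", tokenize.DEDENT: "<DEDENT>",
             tokenize.NEWLINE: "<NEWLINE>"}


def tokens(source, blind=False):
    """Token strings of `source` with comments and layout removed.

    With blind=True, identifiers become "ID" and literals become "LIT",
    so renamed (Type 2) fragments map to the same sequence. Keywords are kept.
    """
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in LAYOUT:
            continue
        if tok.type in STRUCTURE:
            result.append(STRUCTURE[tok.type])
        elif blind and tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            result.append("ID")
        elif blind and tok.type in (tokenize.NUMBER, tokenize.STRING):
            result.append("LIT")
        else:
            result.append(tok.string)
    return result


def clone_type(a, b, near_miss_threshold=0.7):
    """Return 1, 2 or 3 for a Type 1/2/3 clone pair, or None otherwise.

    near_miss_threshold is a hypothetical cut-off, not a published value.
    """
    if tokens(a) == tokens(b):
        return 1
    blind_a, blind_b = tokens(a, blind=True), tokens(b, blind=True)
    if blind_a == blind_b:
        return 2
    ratio = difflib.SequenceMatcher(None, blind_a, blind_b).ratio()
    return 3 if ratio >= near_miss_threshold else None


ORIGINAL = """
def total(prices):
    s = 0
    for p in prices:
        s += p
    return s
"""
EXACT = """
def total(prices):
  s = 0  # running sum
  for p in prices:
    s += p

  return s
"""
RENAMED = """
def add_all(values):
    acc = 0
    for v in values:
        acc += v
    return acc
"""
NEAR_MISS = """
def add_all(values):
    acc = 0
    for v in values:
        if v > 0:
            acc += v
    return acc
"""

# Prints "EXACT 1", "RENAMED 2" and "NEAR_MISS 3", one per line.
for name, other in [("EXACT", EXACT), ("RENAMED", RENAMED), ("NEAR_MISS", NEAR_MISS)]:
    print(name, clone_type(ORIGINAL, other))
```

Semantic (Type 4) clones stay out of reach of this kind of comparison, because two fragments can behave identically while sharing few tokens.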
In Type 4 (semantic) clones, the function or behavior of the cloned fragment remains the same even though its code or syntax is different. Over time, various strategies for detecting code clones have been established to support clone management efforts [4-5]. These techniques are classified according to the data representation and the algorithms they employ; the most common are text-based, token-based, tree-based, PDG-based and machine learning based clone detection. The main goal of this article is to understand current research in the field of clone detection and to identify the research gaps, in terms of merits and demerits, that remain to be addressed. Because the article includes a comparative study of different techniques based on various parameters, it should also help in selecting appropriate techniques for code clone detection. The rest of this article is organized as follows. Section 2 discusses the most recent techniques for detecting code clones. Section 3 summarizes the survey and discusses its future scope.
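Since detection techniques differ mainly in the representation they compare, a tree-based variant shows both what a richer representation gains and where syntax alone stops. In the following sketch (illustrative only, with hypothetical names, built on Python's standard ast module), identifiers and constants are replaced by placeholders, and two fragments count as clones when their normalized syntax trees are identical. Redundant parentheses never appear in the tree, so a renamed fragment with extra parentheses still matches; a semantic clone that computes the same total with the built-in sum does not.

```python
import ast


class Anonymize(ast.NodeTransformer):
    """Hypothetical normalizer: replaces identifiers and constants in place."""

    def visit_FunctionDef(self, node):
        node.name = "ID"
        return self.generic_visit(node)

    def visit_arg(self, node):
        node.arg = "ID"
        return self.generic_visit(node)

    def visit_Name(self, node):
        node.id = "ID"
        return node

    def visit_Constant(self, node):
        node.value = "LIT"
        return node


def tree_key(source):
    """Dump of the normalized syntax tree; parentheses and layout never appear."""
    tree = Anonymize().visit(ast.parse(source))
    return ast.dump(tree)  # line/column attributes are excluded by default


def same_tree(a, b):
    """True when both fragments have identical normalized syntax trees."""
    return tree_key(a) == tree_key(b)


ORIGINAL = """
def total(prices):
    s = 0
    for p in prices:
        s = s + p
    return s
"""
RENAMED = """
def add_all(values):
    acc = (0)
    for v in values:
        acc = (acc + v)
    return acc
"""
SEMANTIC = """
def total(prices):
    return sum(prices)
"""

print(same_tree(ORIGINAL, RENAMED))   # True: Type 2 clone with extra parentheses
print(same_tree(ORIGINAL, SEMANTIC))  # False: same behavior, different tree
```

Recognizing pairs such as ORIGINAL and SEMANTIC requires reasoning about behavior rather than structure, which is the gap that PDG-based and machine learning based techniques aim to narrow.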