DRAFT to ARPN JOURNAL - 2015

CODE COMMENT EXTRACTION FOR SOFTWARE MAINTENANCE PROCESS USING REGULAR EXPRESSION STUDY AT ITATS SURABAYA

Andy Rachman 1, Nanang Fakhrur Rozi 2 and Ari Faradisa 3

1 Department of Informatics Engineering, Adhi Tama Institute of Technology Surabaya, Indonesia, Email: andyrachman248@gmail.com
2 Department of Informatics Engineering, Adhi Tama Institute of Technology Surabaya, Indonesia, Email: nfrozy@gmail.com
3 Department of Informatics Engineering, Adhi Tama Institute of Technology Surabaya, Indonesia, Email: faradisaari91@gmail.com

ABSTRACT

Software engineering is the study and application of engineering to the design, development and maintenance of software. Maintenance is a highly complex activity for developers and for students of the Informatics department. It accounts for 40% to 80% of the cost of the overall engineering process. Most of that effort is spent on programs that are poorly structured or that come with insufficient domain knowledge and documentation. In this study, we apply program slicing to Java and C/C++ code by dividing a program into code and comments. The resulting comments are then used to understand the program. We use regular expressions, a pattern-based technique for text manipulation, to slice the program, because they can accelerate code comment extraction. The outcome of this research is an open source application that Informatics-ITATS students and others can use to help them understand a program. The precision and recall obtained in code comment extraction were both 100%.

Keywords: regular expression, software maintenance, program slicing, program comprehension.

INTRODUCTION

Background

Institut Teknologi Adhi Tama Surabaya (ITATS) is one of the private universities in Surabaya and has an Informatics Department. One of the subjects taught at the department is Software Engineering. This course aims to make students able to design and build software and to handle the errors that occur in it.
To produce good software, students can do two things: read books related to programming, or create programs, whether simple or complex. When creating programs, students again have two options: writing their own program or downloading a program from the Internet. If students choose to download a program, they must be able to read and understand someone else's code.

Software engineering is an activity that consists of the design, development, and maintenance of software. Its focus is achieving software quality. In addition, by carrying out software engineering activities, developers or students will be able to detect errors that occur in software. Therefore, the process of understanding the program, which is part of the maintenance activity, is a very important process in software engineering [1].

Program comprehension is one of the most important parts of software maintenance. It is required whenever developers maintain, reuse, re-engineer, or evolve a program. Much research has been done on providing guidance for and supporting the software development process [2].

Comprehension is closely related to software maintenance. It takes up more than 50% of a programmer's effort during maintenance, and maintenance itself ranges from 40% to 80% of the overall effort. This is due to several factors, one of which is programmers misunderstanding the given task [3].

A regular expression is a notation used to represent a pattern of text. It is available in many programming languages, such as PHP, Java, and Python. A regular expression can be used to capture a specific region of text that follows a pattern [4]. In this research, regular expressions are used to extract the comments within the source code.

Based on the above background, we chose software maintenance as our research topic, because the topic is still new and has not been deeply explored at ITATS.
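
The idea of extracting comments with a regular expression, as described above, can be illustrated with a minimal sketch. It is not the pattern used in this study: the names `COMMENT_OR_LITERAL` and `split_code_and_comments` are hypothetical, and the sketch assumes only `//` line comments and `/* ... */` block comments as found in Java and C/C++. String and character literals are matched first so that comment markers inside them are not mistaken for comments. Constructs such as C++ raw strings, Java text blocks, and backslash line continuations are deliberately not handled.

```python
import re

# One alternation, tried left to right at each position. Literals come first
# so that "//" or "/*" inside a string is consumed as part of the literal.
COMMENT_OR_LITERAL = re.compile(
    r'"(?:\\.|[^"\\\n])*"'       # string literal (kept as code)
    r"|'(?:\\.|[^'\\\n])*'"      # character literal (kept as code)
    r"|(?P<block>/\*.*?\*/)"     # block comment, shortest match
    r"|(?P<line>//[^\n]*)",      # line comment, up to end of line
    re.DOTALL,                   # let "." span newlines inside block comments
)


def split_code_and_comments(source):
    """Return (code, comments) for Java or C/C++ source text."""
    comments = []

    def replace(match):
        text = match.group(0)
        if match.group("block") is None and match.group("line") is None:
            return text  # a literal: leave it in the code unchanged
        comments.append(text)
        # Leave a space so neighbouring tokens do not merge, and keep the
        # newlines so line numbers in the remaining code stay the same.
        return " " + "\n" * text.count("\n")

    code = COMMENT_OR_LITERAL.sub(replace, source)
    return code, comments


if __name__ == "__main__":
    sample = (
        "int main() {\n"
        "    // print a greeting\n"
        '    printf("not // a comment");  /* block\n'
        "    comment */\n"
        "    return 0;\n"
        "}\n"
    )
    code, comments = split_code_and_comments(sample)
    for comment in comments:
        print(repr(comment))
```

Calling `split_code_and_comments` on a source file's text yields the comment list for reading and the remaining code with its line structure intact, which mirrors the split into code and comments described in the abstract.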
This research is expected to help students and the wider community understand programs, by way of an open source application.

The aim of the research

This research aims to:
a. Obtain the comments from open source programs as part of the software maintenance process.
b. Use regular expressions to extract program comments.

Software Engineering

A computer is divided into two main parts, hardware and software [5]. The hardware is divided into three parts: the physical layer, micro-programming, and machine language [6]. The software is divided into two main parts, the operating system and the applications. The operating system is software that manages the system itself, controls the hardware, and provides an environment in which applications run [6]. An application is a program built to solve the user's problems [7][8][9].

Unlike hardware, software is not damaged and does not lose performance because of time, dust, or usage. Software has very different characteristics from hardware, i.e. [10]:
a. Software is developed or engineered, unlike hardware, which is mass-produced.
b. The software is never obsolete while hardware will