Computer Engineering and Intelligent Systems www.iiste.org
ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online)
Vol.6, No.2, 2015

Optimizing Software Clustering using Hybrid Bee Colony Approach

Kawal Jeet (Corresponding author)
Assistant Professor, Department of Computer Science, D.A.V. College, Jalandhar, India
E-mail: kawaljeet80@yahoo.com

Abstract
Maintenance is the most expensive and complicated phase of the software development lifecycle. It becomes more cumbersome when the architecture of the software system is not available. Search-based optimization has been found to be an efficient technique for recovering the architecture of such a system. In this paper, we propose a technique that combines the artificial bee colony swarm intelligence algorithm with a genetic algorithm to recover this architecture. The recovered architecture helps software maintainers carry out maintenance efficiently and effectively. To evaluate the approach, we applied it to a few real-world module clustering problems. The results support our claim that this approach produces significantly better architectures than the existing approaches.

Keywords: Artificial bee colony algorithm, Genetic algorithm, Software clustering, Software Modularization.

1. Introduction
The maintenance and evolution of a software system is a cumbersome, costly and time-consuming task (Schneidewind 1987). The problem is aggravated if the system is poorly documented or not documented at all (Perry and Wolf 1992; Shaw and Garlan 1996). Sometimes a documented architecture becomes outdated because of the regular changes made to the system in response to changing customer requirements (Eick et al. 2001). Clearly, software maintainers need the software architecture in order to maintain the software efficiently and effectively.
So, there must be a way to identify this architecture from the source code of the software system when it is not available. A software system is composed of modules (for example, classes or variables) that are related to each other through procedure calls, inheritance relationships, variable references, etc. The syntactic structure of such a system can be represented as a graph called a Module Dependency Graph (MDG), in which the nodes are the modules and the edges are the relations between the modules. An MDG can be retrieved by parsing the source code to determine the modules of the software system and the relationships between them. A large number of source code analysis tools (http://depfind.sourceforge.net, http://source.valtech.com/display/dpm/Dependometer, https://drewnoakes.com/code/dependency-analyser/) are available for retrieving MDGs.

In order to identify the architecture of a system, researchers in the reverse engineering community have been developing clustering tools. Creating an appropriate cluster partition of an MDG is NP-hard, and exhaustive search is impractical because the number of possible partitions is very large even for a small graph (Mancoridis, 1998). Automated assistance for partitioning MDGs is therefore required to help system maintainers work efficiently in the absence of the original design documentation (Harman, 2007). According to Tzerpos and Holt (1998), it is beneficial for software maintainers to use the available clustering techniques rather than re-engineer the software from scratch.

In this paper, we use a technique that combines the Artificial Bee Colony (ABC) algorithm (Karaboga, 2007; Karaboga, 2012; Karaboga, 2011; Yan, 2012) with the Genetic Algorithm (GA) (Goldberg, 2006); we call it the Genetic Bee Colony algorithm (GBC). It automatically finds a good partition of a system’s MDG. This approach treats software partitioning as a search-based optimization problem in which the aim is to find the best possible partition.
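To make the size of this search space concrete, the following minimal sketch represents an MDG as an adjacency map and counts the ways its modules can be partitioned into non-empty clusters, which is the Bell number of the module count. It is an illustration only and not part of the proposed technique; the module names and the helpers `build_mdg` and `count_partitions` are hypothetical and chosen for this example.

```python
from typing import Dict, List, Set, Tuple

# An MDG as an adjacency map: module -> set of modules it depends on.
MDG = Dict[str, Set[str]]


def build_mdg(edges: List[Tuple[str, str]]) -> MDG:
    """Build an MDG from (source, target) dependency pairs."""
    graph: MDG = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())  # make sure every module is a node
    return graph


def count_partitions(n: int) -> int:
    """Return the Bell number B(n): the number of ways to split
    n modules into non-empty, unordered clusters."""
    if n == 0:
        return 1
    row = [1]  # first row of the Bell triangle
    for _ in range(n - 1):
        nxt = [row[-1]]  # each row starts with the last entry of the previous row
        for value in row:
            nxt.append(nxt[-1] + value)
        row = nxt
    return row[-1]


# Hypothetical five-module system (names are assumptions for illustration).
edges = [
    ("Parser", "Lexer"),
    ("Parser", "SymbolTable"),
    ("CodeGen", "SymbolTable"),
    ("CodeGen", "Emitter"),
]
mdg = build_mdg(edges)
print(len(mdg), "modules,", count_partitions(len(mdg)), "possible partitions")
print("15 modules:", count_partitions(15), "possible partitions")
```

Even this five-module graph admits 52 partitions, and the count grows to more than a billion at 15 modules, which is why a search-based heuristic is used instead of enumeration.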
To the best of the authors’ knowledge, this is the first time that a bee colony algorithm has been applied to software clustering.

2. Related Work
Wiggerts (1997) gave a thorough introduction to clustering techniques that have been successfully applied to system modularization. Like the technique followed in this paper, other clustering tools such as Rigi (Müller et al. 1993) and Arch (Schwanke 1991) work in a bottom-up fashion and produce the architecture of a software system from its source code only. The main shortcoming of these tools is that they require substantial involvement of the user. Various search-based optimization techniques have also been used successfully for partitioning MDGs. A notable example in this field is the BUNCH tool (Mancoridis, 1999). This tool is based on the optimization of an objective function called Modularization Quality (MQ) (Mitchell, 2002). The major goal of MQ is to find a balance between cohesion and coupling. So, the larger the MQ, the better the partition of the MDG and