The use of random graph theory to assess the quality of sequential source code check-ins

Mahir Arzoky¹, Stephen Swift¹, Steve Counsell¹ and James Cain²

¹ Brunel University, Middlesex, UK
{mahir.arzoky, stephen.swift, steve.counsell}@brunel.ac.uk
² Quantel Limited, Newbury, UK
james.cain@quantel.com

Abstract. Software module clustering is the problem of automatically partitioning the structure of a software system using low-level dependencies in the source code to understand and improve the system's architecture. Munch, a clustering tool based on search-based software engineering techniques, was used to modularise a unique dataset of sequential source code software versions. This paper investigates whether the dataset used for the modularisation resembles a random graph by computing the probabilities of observing certain connectivity. Modularisation will not be possible with data that resembles random graphs. Thus, this paper demonstrates that our real-world time-series dataset does not resemble a random graph except for small sections where there were large maintenance activities. Furthermore, the random graph metric can be used as a tool to indicate areas of interest in the dataset, without the need to run the modularisation.

Keywords: software module clustering; modularisation; SBSE; random graph; time-series; fitness function

1 Introduction

Large software systems tend to have complex structures that are often difficult to comprehend due to the large number of modules (classes) and inter-relationships that exist between them. As the modular structure of a software system tends to decay over time, it is important to modularise. Modularisation is the process of partitioning the structure of a software system into subsystems. It makes the problem at hand easier to understand, as it reduces the amount of data needed by developers [7]. Subsystems group together related source-level components to assist with the system's understandability.
Subsystems can be organised hierarchically to allow developers to navigate through the system at various levels of detail; they include resources such as modules, classes and other subsystems [7].

Graphs can be used to make the software structure of complex systems more comprehensible [17]. They can be described as language-independent, whereby compo-