Clustering and Local Regression in Object Oriented Metrics

Gabriel Jarillo, Giancarlo Succi, and Witold Pedrycz
Electrical and Computer Engineering, University of Alberta
Edmonton, Alberta T6G 2G7, Canada

G. Jarillo, G. Succi, W. Pedrycz (July 2001) “Clustering and Local Regression in Object Oriented Metrics.” Proceedings of the 5th World Multi-Conference on Systemics, Cybernetics and Informatics, Orlando, Florida.

ABSTRACT

This paper gives a brief review of clustering and local regression techniques. We focus mainly on their application to software engineering data and present an example of preliminary results obtained with clustering and local regression. Clustering and local regression are part of the data processing in a project called Analysis of Software Engineering Data Using Computational Intelligence Techniques. The project takes advantage of Genetic Algorithms (GA), Genetic Programming (GP), and Local Regression to find models that describe the data. The clustering technique is used to find similar structures in the dataset before it is passed to the GA, GP, and Local Regression algorithms. The final goal of this work is to build mathematical models that determine the number of fixes in software projects. The number of fixes is the dependent variable, while the CK-Metrics and lines of code (LOC) are the independent variables of the system.

Keywords: Genetic Algorithms (GA), Genetic Programming (GP), Clustering, CK-Metrics, and Local Regression.

1 INTRODUCTION

The accurate estimation of software development effort has major implications for the management of software development in industry. Underestimates lead to time pressures that may compromise full functional development and thorough testing of the software product. Overestimates, on the other hand, can result in overallocation of development resources and personnel [1].

Organizations want to know how to determine the quality of their software before it is used. Generally, there are three approaches to doing so [2]:

1. Predicting the number of defects in the system.
2. Estimating the reliability of the system in terms of time and failure.
3. Understanding the impact of the design and testing processes on defect counts and failure densities.

Knowing the quality of the software allows the organization to estimate the amount of resources to be invested in its maintenance. Software maintenance consumes most of the resources in many software organizations [3]; it is therefore worthwhile to characterize, assess, and predict defects in the software at early stages of its development in order to reduce maintenance costs.

Figure 1 gives a general diagram of the computational intelligence techniques most commonly used for this type of software engineering data.

Figure 1: Computational Intelligence Approaches on Software Engineering Data.

In general, statistical models use a predefined model that is adapted to the data at hand by adjusting some parameters; these models will always give the same result if we use the same input