ARIMAmmse: An Improved ARIMA-based Software Productivity Prediction Method

Li Ruan 1,2, Yongji Wang 1,3, Qing Wang 1, Fengdi Shu 1, Haitao Zeng 1,2, Shen Zhang 1,2

1 Laboratory for Internet Software Technologies, Institute of Software, The Chinese Academy of Sciences, Beijing 100080, China
2 Graduate University, The Chinese Academy of Sciences, Beijing 100039, China
3 Key Laboratory for Computer Science, The Chinese Academy of Sciences, Beijing 100080, China
{ruanli,ywang,wq,fdshu,zenghaitao,zhangshen}@itechs.iscas.ac.cn

Abstract

Productivity is a critical performance index of process resources. Because successive historical productivity data tend to be auto-correlated, a time series prediction method based on the Auto-Regressive Integrated Moving Average (ARIMA) model was introduced into software productivity prediction by Humphrey et al. In this paper, a variant of their prediction method, named ARIMAmmse, is proposed. This variant formulates ARIMA parameter estimation as a constrained optimization problem based on minimum mean square error (MMSE): the ARIMA model describes the constraints of the parameter estimation problem, while MMSE serves as the objective function. According to optimization theory, ARIMAmmse is guaranteed to achieve higher prediction precision in the MMSE sense than the method of Humphrey et al., which is based on the Yule-Walker estimation technique. Two comparative experiments are also presented. The experimental results further confirm the theoretical superiority of ARIMAmmse.

1. Introduction

Developing reliable, high-quality software requires a well-coordinated and well-executed software process. Since the 1980s, software process technology [1, 2] has emerged as a new discipline for developing software systems that meet expected quality requirements (e.g. dependability and security). Productivity is a critical performance index of software process resources.
Precise software productivity prediction is the first prerequisite for optimal resource scheduling, task assignment and cost control [2]. However, the current software process is highly evolving, with often-changing technologies and development methods [3]. This requires a productivity prediction method to balance forecasting stability against responsiveness to changing conditions. Furthermore, Humphrey et al. at the Carnegie Mellon University Software Engineering Institute (CMU-SEI) stated in [1] that “Because of the nature of software work, the successive development times for individual programmer or programming teams will also be auto-correlated. This is because of an underlying learning process which tends to improve successive development productivities.” Therefore, the productivity data generated by a software process usually display patterns and time-dependent, successive characteristics [4]. However, most of the current typical productivity prediction methods (e.g. COBRA [5, 6]) do not assume that successive data are auto-correlated. A learning-curve-based productivity prediction method that takes these dynamic characteristics into consideration is proposed in [7], but the structure of its learning curve is pre-defined.

The Time Series Analysis (TSA) method based on the ARIMA model [4], which has already been applied successfully in finance and industrial control, was introduced into the software process field for software productivity prediction by Humphrey et al. in [1].

Parameter estimation is a critical step in establishing an ARIMA model. Inaccurate estimation results in large prediction variance and low prediction precision. The Yule-Walker technique, which was adopted by Humphrey et al. in [1], is one of the most typical parameter estimation techniques [4]. In the Yule-Walker technique, first, a set of auto-correlation function values of the series is calculated.
Next, according to the relationship between the parameters and the auto-correlation function values embedded in the Yule-Walker equations, the parameters are estimated by solving those equations. Although this technique is easy to use, many studies (e.g. [4]) proved that it
Proceedings of the 30th Annual International Computer Software and Applications Conference (COMPSAC'06) 0-7695-2655-1/06 $20.00 © 2006