I.J. Information Technology and Computer Science, 2021, 1, 1-17
Published Online February 2021 in MECS (http://www.mecs-press.org/)
DOI: 10.5815/ijitcs.2021.01.01
Copyright © 2021 MECS

Duration Estimation Models for Open Source Software Projects

Donatien Koulla Moulla 1,2
1 Faculty of Mines and Petroleum Industries, University of Maroua, Maroua, P.O. Box 46, Cameroon
2 LaRI Lab, University of Maroua, Maroua, P.O. Box 814, Cameroon
E-mail: moulladonatien@gmail.com, donatien-koulla.moulla@univ-maroua.cm

Alain Abran
Department of Software Engineering and Information Technology, École de Technologie Supérieure, 1100, rue Notre-Dame Ouest, Montréal, Québec, Canada H3C 1K3
E-mail: alain.abran@etsmtl.ca

Kolyang
The Higher Teachers’ Training College, University of Maroua, Maroua, P.O. Box 46, Cameroon
E-mail: kolyang@cde-saare.de

Received: 28 April 2020; Accepted: 04 July 2020; Published: 08 February 2021

Abstract: For software organizations that rely on Open Source Software (OSS) to develop customer solutions and products, it is essential to accurately estimate how long it will take to deliver the expected functionalities. While OSS is supported by government policies around the world, most of the research on software project estimation has focused on conventional projects with commercial licenses. OSS effort estimation is challenging since OSS participants do not record effort data in OSS repositories. However, OSS data repositories contain the dates of the participants’ contributions, and these can be used for duration estimation. This study analyses historical data on the WordPress and Swift projects to estimate OSS project duration using either commits or lines of code (LOC) as the independent variable. This study first proposes an improved classification of contributors based on the number of active days for each contributor in the development period of a release.
For the WordPress and Swift OSS project environments, the results indicate that duration estimation models using the number of commits as the independent variable perform better than those using LOC. The estimation model for full-time contributors gives an estimate of the total duration, while the models with part-time and occasional contributors lead to better estimates of project duration for both the commits data and the LOC data.

Index Terms: Data Repositories, Duration Estimation, Estimation Models, Open Source Software Project Estimation, Regression Models.

1. Introduction

Open Source Software (OSS) provides significant technical and economic benefits for a multitude of organizations, large and small, that rely on OSS to develop customer solutions and products [1,2,3]. Estimation, in particular, is challenging for OSS projects due to the collaborative nature of OSS, which is developed and maintained by distributed communities of contributors from all over the world. According to Asundi [4], planning and delivering projects based on an Open Source (OS) distributed community is challenging since resource allocation and budgeting lack a rigorous basis: there is generally no formal project management, budget or schedule. For this reason, he argues that using existing effort estimation models for OSS projects has many disadvantages and that new models are therefore needed. In particular, he notes that information about when functionalities will be available is of major interest to organizations that have adopted OSS as their business strategy. There is a large number of studies on software effort estimation, but very few discuss effort and duration in OSS projects. Effort data is not available in OSS repositories, and the categorization of contributors and the size and choice of the datasets also present challenges for research on OSS project estimation.
To date, researchers in OSS effort estimation studies have had to use substitutes for the missing effort variable, and these substitutes had a very weak methodological basis. Duration, however, can be computed directly from project calendar dates. Therefore, this research investigates OSS duration estimation models exclusively. An OSS source code management repository is an online community platform that makes project history data