I.J. Information Technology and Computer Science, 2021, 1, 1-17
Published Online February 2021 in MECS (http://www.mecs-press.org/)
DOI: 10.5815/ijitcs.2021.01.01
Copyright © 2021 MECS
Duration Estimation Models for Open Source Software Projects
Donatien Koulla Moulla¹,²
¹ Faculty of Mines and Petroleum Industries, University of Maroua, Maroua, P.O. Box 46, Cameroon
² LaRI Lab, University of Maroua, Maroua, P.O. Box 814, Cameroon
E-mail: moulladonatien@gmail.com, donatien-koulla.moulla@univ-maroua.cm
Alain Abran
Department of Software Engineering and Information Technology, École de Technologie Supérieure, 1100, rue Notre-Dame Ouest,
Montréal, Québec, Canada H3C 1K3
E-mail: alain.abran@etsmtl.ca
Kolyang
The Higher Teachers’ Training College, University of Maroua, Maroua, P.O. Box 46, Cameroon
E-mail: kolyang@cde-saare.de
Received: 28 April 2020; Accepted: 04 July 2020; Published: 08 February 2021
Abstract: For software organizations that rely on Open Source Software (OSS) to develop customer solutions and
products, it is essential to accurately estimate how long it will take to deliver the expected functionalities. While OSS is
supported by government policies around the world, most research on software project estimation has focused on
conventional projects with commercial licenses. OSS effort estimation is challenging because OSS participants do not
record effort data in OSS repositories. However, OSS data repositories do contain the dates of the participants’
contributions, and these can be used for duration estimation. This study analyses historical data from the WordPress and
Swift projects to estimate OSS project duration, using either the number of commits or lines of code (LOC) as the
independent variable. The study first proposes an improved classification of contributors based on the number of active
days of each contributor during the development period of a release. For the WordPress and Swift OSS project
environments, the results indicate that duration estimation models using the number of commits as the independent
variable perform better than those using LOC. The estimation model for full-time contributors gives an estimate of the
total duration, while the models with part-time and occasional contributors lead to better estimates of project duration
for both the commits data and the LOC data.
Index Terms: Data Repositories, Duration Estimation, Estimation Models, Open Source Software Project Estimation,
Regression Models.
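As a rough illustration of the two steps summarized in the abstract, the sketch below labels a contributor by the share of a release's development days on which they were active, and fits a single-variable least-squares line of duration against commits (or LOC). It is a minimal sketch under stated assumptions: the threshold values, the function names, and the use of R² as a comparison criterion are hypothetical and are not taken from the study, whose actual cut-offs, model form and evaluation criteria may differ.

```python
from typing import Sequence, Tuple

# Hypothetical cut-offs (assumptions, not the study's values): share of the
# release's development days on which the contributor made at least one commit.
FULL_TIME_MIN_SHARE = 0.5
PART_TIME_MIN_SHARE = 0.1


def classify_contributor(active_days: int, release_days: int) -> str:
    """Label a contributor as full-time, part-time or occasional for one release."""
    if release_days <= 0:
        raise ValueError("release_days must be positive")
    share = active_days / release_days
    if share >= FULL_TIME_MIN_SHARE:
        return "full-time"
    if share >= PART_TIME_MIN_SHARE:
        return "part-time"
    return "occasional"


def fit_linear(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x (e.g. duration vs commits)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(x: Sequence[float], y: Sequence[float], intercept: float, slope: float) -> float:
    """Coefficient of determination of the fitted line on the same data (assumed criterion)."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("y values must not all be equal")
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_res / ss_tot
```

For instance, with the synthetic points (10, 30), (20, 50), (30, 70), (40, 90), which lie exactly on duration = 10 + 2 × commits, `fit_linear` recovers an intercept of 10 and a slope of 2, and `r_squared` returns 1. Fitting the same function once with commits and once with LOC as x, and separately for each contributor category, loosely mirrors the comparison described in the abstract.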
1. Introduction
Open Source Software (OSS) provides significant technical and economic benefits for a multitude of organizations,
large and small, that rely on OSS to develop customer solutions and products [1,2,3]. Estimation, in particular, is
challenging for OSS projects due to the collaborative nature of OSS, which is developed and maintained by distributed
communities of contributors from all over the world. According to Asundi [4], planning and delivering projects based
on an Open Source (OS) distributed community is challenging since resource allocation and budgeting lack a rigorous
basis: there is generally no formal project management, budget or schedule. For this reason, he argues that existing effort
estimation models have many disadvantages when applied to OSS projects and that new models are therefore needed. In
particular, he notes that information about when functionalities will be available is of major interest to organizations
that have adopted OSS as their business strategy.
There are many studies on software effort estimation, but very few address effort and duration in OSS projects. Effort
data is not available in OSS repositories, and the categorization of contributors and the size and choice of datasets also
pose challenges for research on OSS project estimation. To date, researchers in OSS effort estimation studies have had to
use substitutes for the missing effort variable, and these substitutes have had a very weak methodological basis. For
duration estimation, however, the duration variable can be computed directly from project calendar dates. This research
therefore focuses exclusively on OSS duration estimation models.
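To show how a duration variable and a contributor-activity measure can be derived from repository dates alone, the sketch below assumes that each commit is available as an (author, timestamp) pair, that a release's development period runs from a given start date to the release date, and that an "active day" is a calendar day with at least one commit by that contributor. These definitions and the function names are illustrative assumptions, not the paper's exact procedure.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Iterable, Set, Tuple


def release_duration_days(start: date, release: date) -> int:
    """Elapsed calendar days from the start of development to the release date.

    Assumption: duration is measured as (release - start) in whole days.
    """
    if release < start:
        raise ValueError("release date precedes start date")
    return (release - start).days


def active_days_per_author(
    commits: Iterable[Tuple[str, datetime]], start: date, release: date
) -> Dict[str, int]:
    """Count distinct calendar days with at least one commit, per author.

    Assumptions: commits dated outside [start, release] are ignored, and each
    timestamp's own date() is used without any timezone normalization.
    """
    days: Dict[str, Set[date]] = defaultdict(set)
    for author, timestamp in commits:
        day = timestamp.date()
        if start <= day <= release:
            days[author].add(day)
    return {author: len(ds) for author, ds in days.items()}
```

Under these assumptions, two commits by the same author on the same calendar day count as a single active day, so the measure reflects how often a contributor participates rather than how much code they push. Real repositories record timestamps with timezone offsets, so a study would need to decide how to normalize them before counting days.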
An OSS source code management repository is an online community platform that makes project history data