Dynamics of Software Development Crowdsourcing

Alpana Dubey*, Kumar Abhinav, Sakshi Taneja, Gurdeep Virdi*, Anurag Dwarakanath*, Alex Kass, Mani Suma Kuriakose*
* Accenture Technology Labs, Bangalore, India
† IIIT-Delhi, Delhi, India
‡ Accenture Technology Labs, San Jose, USA

Abstract—The emergence of online labor markets has focused considerable attention on the prospect of using crowdsourcing for software development, with its potential to reduce costs, improve time-to-market, and provide access to high-quality skills on demand. However, crowdsourcing of software development is still not widely adopted. A key barrier to adoption is a lack of confidence that a task will be completed on time and to the required quality standards. While good managers can develop reliable, intuitive estimates of task completion when assigning work to their team members, they may lack similar intuition for individuals drawn from an online crowd. The phrase "Post and Hope" is thus sometimes used when talking about the crowdsourcing of software development tasks. The objective of this paper is to show the value of replacing the traditional, intuitive assessment of a team's capability with a quantitative assessment of the crowd, derived through analysis of historical performance on similar tasks. This analysis serves to transform "Post and Hope" into "Post and Expect." We demonstrate this by analyzing data about tasks performed on two popular crowdsourcing platforms: Topcoder and Upwork. Analysis of historical data from these platforms indicates that they indeed demonstrate some level of predictability in task completion. We have identified certain factors that consistently contribute to task completion on both platforms. Our findings suggest that data-driven decision processes can play an important role in the successful adoption of crowdsourcing practices for software development.

Keywords—Crowdsourcing; tracking; forecasting; software development; workforce analytics.

I. INTRODUCTION

Crowdsourcing is gaining considerable attention in industry as an alternative workforce model for software engineering efforts [8, 10, 11, 12]. Crowdsourcing has already been successfully applied to micro-tasks, such as image tagging and translation services [2, 3, 4, 5, 9], which do not require specific deep skills. In contrast, software development tasks require sustained effort, complex cognitive skills, and training. Therefore, when posting a software development task to a crowd, there is greater uncertainty about whether someone with the relevant skills will find the task, take it on, and complete it successfully within the required timeframe. Making crowdsourced software engineering a significant, industrial-scale success will require addressing several challenges [2, 6, 7]. One of the key challenges is to reliably estimate whether or not a task posted on a crowdsourcing platform will be completed, and beyond that, whether it will be completed to the required quality standard [35]. We refer to today's situation as "Post and Hope," reflecting the difficulty of estimating how a crowd will handle a task. This does not necessarily imply that crowd workers themselves are less reliable; the crowd may actually be more reliable in many cases, but task posters lack a data-driven analysis that could turn Post-and-Hope into something more deterministic, i.e., Post-and-Expect.

The high-level goal of this paper is to transform the Post-and-Hope model of crowdsourcing into the Post-and-Expect model. To achieve this goal, we investigate whether patterns can be detected in which tasks are successfully completed by a given crowd, and which factors correlate with successful outcomes. Breaking this focus down further gives rise to the following research questions:

RQ1: What are the factors that influence task completion?
RQ2: What are the factors that influence the quality of results?
RQ3: Can the probability of task completion be predicted for future tasks based on historical data?
RQ4: Can the quality of a completed task be predicted from historical data?

To answer these questions, we analyzed multiple years' worth of data from two popular crowdsourcing platforms on which a significant number of software development tasks are posted: Topcoder [11] and Upwork [3]. Topcoder uses a competition-based work model: tasks are posted as contests, multiple people work on the same task and submit their work, and a set of top submitters is selected as winners. Upwork uses a model that is closer to traditional freelancer hiring: people are hired from an online marketplace to perform a task. The results of our data analysis are encouraging. Our experiments with various machine learning algorithms demonstrate that task completion can be predicted with better accuracy compared to a baseline algorithm. This study can serve as a starting point for organizations that are investigating crowdsourcing as an alternative software development workforce. The main contributions of the paper are as follows:

1. An exploratory analysis of various characteristics of the two popular crowdsourcing platforms, which follow entirely different engagement models, based on 2.5 years of task data.
2. An assessment of the importance of various features for predicting task completion and quality of output, and an approach to predictive analytics based on the data from the two platforms.

2016 IEEE 11th International Conference on Global Software Engineering 2329-6313/16 $31.00 © 2016 IEEE DOI 10.1109/ICGSE.2016.13
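To make the prediction setting of RQ3 concrete, the following is a minimal sketch, not the paper's actual pipeline, of how a classifier trained on historical task data could be compared against a simple majority-class baseline. The feature names (reward, duration, number of registrants) and the synthetic "completion" rule are assumptions for illustration only; the paper derives its real features from Topcoder and Upwork data.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Synthetic historical tasks: [reward_usd, duration_days, num_registrants].
# These features are hypothetical stand-ins for platform task attributes.
X = rng.uniform([100, 1, 0], [2000, 30, 50], size=(n, 3))
# Toy labeling rule: well-paid, well-subscribed tasks complete more often.
score = (X[:, 0] / 2000 + X[:, 2] / 50) / 2 + rng.normal(0, 0.1, n)
y = (score > 0.5).astype(int)  # 1 = task completed, 0 = not completed

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Baseline: always predict the most frequent outcome in the training data.
baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
# Candidate model: a simple linear classifier over the task features.
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

print(f"baseline accuracy: {baseline.score(X_te, y_te):.2f}")
print(f"model accuracy:    {model.score(X_te, y_te):.2f}")
```

On data like this, where completion depends on the features, the trained model outperforms the majority-class baseline, which is the kind of comparison the paper's experiments report.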