Using Data Mining and Recommender Systems to Facilitate Large-Scale, Open, and Inclusive Requirements Elicitation Processes

Carlos Castro-Herrera +, Chuan Duan +, Jane Cleland-Huang +, and Bamshad Mobasher *
Systems and Requirements Engineering Center +
Center for Web Intelligence *
DePaul University
{ccastroh, duanchuan, jhuang, mobasher}@cs.depaul.edu

Abstract

Requirements-related problems, especially those originating from inadequacies in the human-intensive task of eliciting stakeholders’ needs and desires, have contributed to many failed and challenged software projects. This is especially true for large and complex projects in which requirements knowledge is distributed across thousands of stakeholders. This short paper introduces a new process and related framework that utilizes data mining and recommender technologies to create an open, scalable, and inclusive requirements elicitation process capable of supporting projects with thousands of stakeholders. The approach is illustrated and evaluated using feature requests mined from an open source software product.

1. Problem statement

Requirements elicitation is a human-intensive task in which analysts proactively identify the stakeholders’ needs, wants, and desires, using a broad array of elicitation techniques such as interviews, surveys, brainstorming sessions, Joint Application Design (JAD), and ethnographic studies [4]. Unfortunately, there are numerous accounts of large projects that have failed primarily because of problems in scaling up the requirements process.

This short paper describes a new requirements framework which utilizes data mining and machine learning techniques to address these problems in large-scale systems. In our framework, stakeholders’ needs are gathered using a web-based collection tool. Unsupervised clustering techniques are then employed to identify dominant and cross-cutting themes around which a set of discussion forums is created.
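The clustering step described above can be illustrated with a minimal sketch. The text here does not name a specific clustering algorithm, so the sketch below assumes a simple single-pass ("leader") clustering over bag-of-words vectors with cosine similarity; the function names, the stop-word list, and the 0.3 threshold are all hypothetical choices made for illustration, not the framework's actual implementation.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "to", "of", "for", "and", "in", "on", "with"}


def vectorize(text):
    """Turn one stakeholder need into a term-frequency vector."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_needs(needs, threshold=0.3):
    """Single-pass clustering: each need joins the most similar existing
    cluster if the similarity reaches the threshold, else it starts a new one.
    Returns a list of clusters, each a list of indices into `needs`."""
    clusters = []  # each entry: {"centroid": Counter, "members": [int]}
    for idx, text in enumerate(needs):
        vec = vectorize(text)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(idx)
            best["centroid"].update(vec)  # centroid kept as summed term counts
        else:
            clusters.append({"centroid": Counter(vec), "members": [idx]})
    return [cluster["members"] for cluster in clusters]


if __name__ == "__main__":
    example_needs = [
        "Export contact list to CSV file",
        "Export contacts as a CSV file",
        "Email reminders for meetings",
        "Send email reminders before meetings",
    ]
    # Each resulting group of indices would seed one candidate discussion forum.
    print(cluster_needs(example_needs))
```

In this toy example the two export-related needs and the two reminder-related needs fall into separate groups, each of which could seed one forum. A single-pass scheme is order-dependent and is used here only because it is short; any unsupervised clustering method could be substituted.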
Stakeholders are assigned to these forums based upon the needs they have contributed. They then work collaboratively with other members of the forum to transform their needs into more formal requirements. Our framework also utilizes a collaborative recommender system to introduce the concept of serendipity by recommending forums based on the interests of similar stakeholders. These additional recommendations increase the likelihood that critical stakeholders will be placed into relevant forums in a timely manner.

The need for this type of recommender system is illustrated by examining the requirements features of open source projects. For example, in SugarCRM, a large open-source customer management system, users create new feature requests by browsing through a list of existing threads and deciding whether to post to an existing thread or to create a new one. An analysis of the resulting threads showed that many users either created a new thread for each feature request or placed their requests into one or two mega-threads. Neither of these approaches is ideal in an online requirements gathering tool, as the resulting threads are either too isolated or too large to effectively support collaborative requirements activities.

2. Forum recommendations

Recommender technologies, such as those adopted in our framework, have traditionally been used in information systems to dynamically target content to one or more users, and in e-commerce domains to recommend products to customers [1]. The recommendation problem is typically formulated as a prediction task in which a predictive model is built from prior training data and then used in conjunction with the dynamic profile of a new user to predict that user's level of interest in a target item. Recommender systems generally fall into three categories: content-based systems which make recommendations based on semantic content of data