Automatic Detection of Nocuous Coordination Ambiguities in Natural Language Requirements

Hui Yang¹, Alistair Willis¹, Anne De Roeck¹, Bashar Nuseibeh¹,²

¹ Department of Computing, The Open University, Milton Keynes, MK7 6AA, UK
{h.yang, a.g.willis, a.deroeck, b.nuseibeh}@open.ac.uk

² Lero, University of Limerick, Limerick, Ireland
Basher.Nuseibeh@iero.ie

ABSTRACT
Natural language is prevalent in requirements documents. However, ambiguity is an intrinsic phenomenon of natural language, and is therefore present in all such documents. Ambiguity occurs when a sentence can be interpreted differently by different readers. In this paper, we describe an automated approach for characterizing and detecting so-called nocuous ambiguities, which carry a high risk of misunderstanding among different readers. Given a natural language requirements document, sentences that contain specific types of ambiguity are first extracted automatically from the text. A machine learning algorithm is then used to determine whether an ambiguous sentence is nocuous or innocuous, based on a set of heuristics that draw on human judgments, which we collected as training data. We implemented a prototype tool for Nocuous Ambiguity Identification (NAI), in order to illustrate and evaluate our approach. The tool focuses on coordination ambiguity. We report on the results of a set of experiments to assess the performance and usefulness of the approach.

Categories and Subject Descriptors
D.2.1 [Requirements/Specification]: Elicitation Methods, Language, Methodologies, Tools

General Terms
Management, Measurement, Performance, Experimentation

Keywords
Natural language requirements, nocuous ambiguity, coordination ambiguity, machine learning, human judgments

1. INTRODUCTION
Natural language (NL) is still prevalent in the vast majority of requirements documents [3].
One important reason for this is that NL can help various stakeholders articulate and communicate requirements throughout the entire software development life cycle. However, NL requirements also suffer from typical NL problems such as ambiguity. Ambiguity occurs when a single linguistic expression can be interpreted differently by different readers. Ambiguous expressions in requirements are potentially dangerous when they result in poor requirements quality [4].

Our research is motivated by the need to reduce the costs of misunderstandings that can occur during requirements engineering, when these misunderstandings are due to ambiguities in the NL requirements. Our practical goal is to provide a tool to assist writers of requirements documents by alerting them to potentially harmful ambiguities, called nocuous ambiguities [6]. Unlike innocuous ambiguities, which tend to be interpreted in the same way by all readers, nocuous ambiguities give rise to different interpretations by different readers, thus contributing to misunderstandings between stakeholders. Such a tool needs to highlight those linguistic expressions in requirements that are recognized as nocuous ambiguities, and allow the writers to return to elicitation or to rephrase, with the aim of improving requirements quality and facilitating effective communication of these requirements among different stakeholders.

In earlier work [25], we proposed a general methodology for automatic identification of nocuous ambiguity, which we use to guide our research on two types of ambiguity: coordination ambiguity [6, 22] and anaphora ambiguity [24]. In contrast to other work, which is intended to resolve ambiguity [5, 18], our research concerns identification of those ambiguities that are likely to lead to misunderstandings between different readers, while discounting those which tend to be interpreted in the same way by different readers despite their surface features.
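
The distinction between nocuous and innocuous ambiguity drawn above can be illustrated with a minimal sketch. It assumes that each reader's judgment of an ambiguous expression is recorded as a label, and that an expression counts as nocuous when no single interpretation is shared by a sufficiently large fraction of readers. The function names, the labels `reading_A` and `reading_B`, and the `agreement_threshold` value of 0.8 are hypothetical choices made for illustration only; they are not taken from the NAI tool.

```python
from collections import Counter


def interpretation_distribution(judgments):
    """Return the fraction of readers that chose each interpretation."""
    counts = Counter(judgments)
    total = len(judgments)
    return {label: n / total for label, n in counts.items()}


def is_nocuous(judgments, agreement_threshold=0.8):
    """Hypothetical rule: an ambiguity is nocuous when no single
    interpretation is held by at least `agreement_threshold` of readers."""
    if not judgments:
        raise ValueError("at least one judgment is required")
    shares = interpretation_distribution(judgments)
    return max(shares.values()) < agreement_threshold


# Nine of ten readers agree: treated as innocuous under this assumed rule.
mostly_agreed = ["reading_A"] * 9 + ["reading_B"]
# Readers split evenly: treated as nocuous under this assumed rule.
evenly_split = ["reading_A"] * 5 + ["reading_B"] * 5

print(is_nocuous(mostly_agreed))
print(is_nocuous(evenly_split))
```

Under this assumed rule the same expression can move between the two categories as the threshold changes, which reflects the idea that the classification depends on the group of readers and not on the text alone.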
In this view, we consider ambiguity as a property of the relationship between a text and a group of interpreters, rather than a property of a text or expression per se. We also add the categorization of nocuous and innocuous ambiguity depending on the likely distribution of interpretations held by a group of readers of that text. We have observed that not all cases of ambiguity are actually dangerous: in fact, most remain unnoticed and are resolved to the same interpretation by all stakeholders. Only nocuous ambiguity cases that have a high risk of misunderstanding between different readers are truly disruptive and deserving of further attention.

Our previous work [6, 22] focused on coordination ambiguity, a particularly common kind of structural ambiguity that is highly prevalent in requirements documents. We investigated a methodology that used a number of heuristics based on corpus-based statistical information together with human judgments to predict whether a