Minimizing Ambiguity in Natural Language Software Requirements Specification

Ashfa Umber
Department of Computer Science & IT, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
ashfaumber@yahoo.com

Imran Sarwar Bajwa
School of Computer Science, University of Birmingham, Birmingham, UK
i.s.bajwa@cs.bham.ac.uk

Abstract—Software requirements are typically captured in natural languages (NL) such as English and then analyzed by software engineers to generate a formal software design or model (such as a UML model). However, English is syntactically ambiguous and semantically inconsistent. Hence, English specifications of software requirements can result in erroneous and absurd software designs and implementations, and the informal nature of English is also a main obstacle to machine processing of such specifications. To address this key challenge, there is a need to introduce a controlled NL representation for software requirements from which accurate and consistent software models can be generated. In this paper, we report an automated approach to generate a Semantics of Business Vocabulary and Rules (SBVR) standard based controlled representation of English software requirement specifications. The SBVR based controlled representation can not only result in accurate and consistent software models but is also machine processable, because SBVR has a pure mathematical foundation. We also introduce a Java based implementation of the presented approach as a proof of concept.

Keywords-Software Requirement Specifications; Natural Language Processing; SBVR

I. INTRODUCTION

It is typical practice to specify software requirements in natural languages (NL). It has been reported that 71.80% of software requirements specifications are captured in NL [1]. However, natural languages are intrinsically ambiguous.
For automated software modeling, unambiguous and explicit software requirements are a primary necessity, as computers cannot accurately process ambiguous requirements. Several researchers have proposed approaches to identify and measure the typical ambiguities in NL based software requirements specifications (SRS). For example, Kiyavitskaya et al. [3] presented a couple of tools to identify ambiguous sentences in an NL SRS document and find the reason for the ambiguity. Similarly, Popescu et al. presented a tool, Dowser [4], to identify ambiguous and inconsistent sentences in an NL SRS. However, a drawback of that approach is that the input must be in a constrained language, which makes the approach impractical. To our knowledge, there is no approach or tool that provides an automatic procedure for minimizing or removing ambiguity in an NL SRS.

In this paper, we present an approach capable of automatically generating an unambiguous and semantically consistent representation of an SRS specified in English. To achieve a semantically controlled representation, we propose the use of Semantics of Business Vocabulary and Rules (SBVR) 1.0 [4]. SBVR is an OMG standard, initially presented to assist business requirements specifiers and analyzers. In [5], we showed that, like business requirements, software requirements can be captured and specified using SBVR syntax. In this paper, we propose the use of SBVR to overcome the typical ambiguities of a natural language. SBVR not only supports generating an accurate and consistent software representation but also enables machine processing, as SBVR is based on mathematical or higher order logic [4]. The presented approach is also implemented in Java. The performance of the tool is evaluated on a case study, presented in Section 4.

The remainder of the paper is structured as follows: Section 2 states preliminaries of the presented research.
Section 3 presents the framework for translation of English to SBVR representation. Section 4 presents a case study. The evaluation of our approach is presented in Section 5. Finally, the paper concludes with a discussion of future work.

II. SEMANTICS OF BUSINESS VOCABULARY AND RULES

In 2008, OMG presented a new standard, Semantics of Business Vocabulary and Rules (SBVR) [4]. SBVR supports capturing requirements in a controlled natural language. There are various controlled natural languages, such as Attempto, but we have used SBVR for the following reasons:

- SBVR is a standard. The latest available version is 1.0.
- SBVR is easy for human beings to read and understand, as it uses the syntax of natural languages, e.g. English.
- SBVR is easy to machine translate, as it is based on formal logic such as First Order Logic (FOL).
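
To illustrate the last point, consider the English requirement "Every customer places an order." This is our own example, not one from the paper's case study, and the predicate names Customer, Order and places are assumptions chosen for the sketch. The sentence admits two first-order readings that differ only in quantifier scope. A controlled sentence in the style of SBVR Structured English, such as "It is obligatory that each customer places at least one order", would typically be read as the first one.

```latex
\begin{align*}
% Reading 1: each customer places some order, possibly a different one per customer.
&\forall x \, \bigl( \mathit{Customer}(x) \rightarrow \exists y \, ( \mathit{Order}(y) \land \mathit{places}(x, y) ) \bigr) \\
% Reading 2: there is one particular order that every customer places.
&\exists y \, \bigl( \mathit{Order}(y) \land \forall x \, ( \mathit{Customer}(x) \rightarrow \mathit{places}(x, y) ) \bigr)
\end{align*}
```

Because the controlled sentence fixes the quantifiers explicitly ("each", "at least one"), a translator does not have to choose between the two readings, which is the sense in which a logic-based controlled representation is easier to machine process than free English.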