2014 International Conference on Recent Trends in Information Technology
978-1-4799-4989-2/14/$31.00 © 2014 IEEE

Application of Natural Language Processing in Object Oriented Software Development

Abinash Tripathy
Department of Computer Science and Engg., NIT, Rourkela
abi.tripathy@gmail.com

Santanu Ku. Rath
Department of Computer Science and Engg., NIT, Rourkela
skrath@nitrkl.ac.in

Abstract—The Software Development Life Cycle (SDLC) starts with eliciting the user's requirements in the form of a Software Requirement Specification (SRS). The SRS document is mostly written in whichever natural language (NL) is convenient for the client. In order to develop the right software based on the user's requirements, the objects, methods, and attributes need to be identified from the SRS document. In this paper, an attempt is made to develop a methodology that applies Natural Language Processing (NLP) to Object Oriented (OO) analysis by finding the class names and their details directly from the SRS.

Keywords: Software Development Life Cycle (SDLC), Software Requirement Specification (SRS), Natural Language Processing (NLP), Natural Language (NL), Parts Of Speech (POS)

I. INTRODUCTION

The Software Requirement Specification (SRS) document forms the basis of problem analysis between client and developer. The SRS needs to be very specific, since it serves as the basis for proceeding towards implementation of the desired software. The SRS is often expressed in a natural language that the client can understand, but such text may be ambiguous, possibly inconsistent, and unmanageably large from the developer's planning point of view. Identifying the major functionalities from the OO analysis point of view plays an important role in project success, because they have to be extracted from an SRS that is written in an informal style.
Formal languages such as the Unified Modeling Language (UML) have been applied to avoid the inherent problems of natural language, such as incompleteness and ambiguity [1]. The problem with an SRS document is that text in natural language may be ambiguous, with many possible interpretations. Vagueness may lead to unwanted changes in the delivered system. Hence, natural language text needs to be processed in order to understand the SRS better and to reduce the likelihood of errors.

Earlier, programmers used an exploratory model known as the build and fix programming style. This style is very informal, with no set rules to establish which approach is superior. Each programmer formulates his or her own software development techniques, guided solely by personal expertise and expressed in a personal language and style [2]. In recent years, the object-oriented style has become the accepted style among developers, as present-day software development languages are object oriented in nature. The class is one of the core concepts of the object-oriented approach. Hence, the first step in object-oriented analysis of software is to find the classes, together with the functions and attributes associated with those classes.

Natural Language Processing (NLP) is a field combining computer science and linguistics that is concerned with the interaction between computers and human languages [3]. Natural language understanding systems, in particular, convert information from human-readable form into a more formal representation such as a database. In this paper, an attempt is made to analyze an SRS document written in a natural language and to identify the candidate classes for a class diagram and the relationships between them. The candidates for the class diagram are identified automatically. A good amount of literature is available in which the candidates for the class diagram are identified manually.
In this paper, by contrast, an attempt is made to replace the manual search for class names with an automated one.

The rest of the paper is organized as follows. Section 2 presents related work in the field of NLP. Section 3 presents the methodology, covering NLP and the different approaches within it. Section 4 describes the proposed approach. Section 5 applies the proposed scheme to a case study. Section 6 compares the present paper with an existing paper. Section 7 concludes the paper and presents the scope for future work.

II. RELATED WORK

R. J. Abbott proposed an approach to program design in Ada based on linguistic analysis of informal strategies written in English [4]. His approach involved developing an informal strategy in natural language and then formalizing the strategy by identifying the data types, objects (variables of those types), and operators (applied to those objects). Congruent with the object-oriented approach, Abbott's work focused on the use of nouns and noun phrases as references in natural language, especially common nouns, proper nouns, mass nouns, and their units of measure. Common nouns suggest data types (i.e., object classes). Proper nouns and references suggest objects. Verbs, attributes, predicates, and descriptive expressions suggest operators. Control structures are suggested by English phrases using if, then, else, for, do, until, when, etc. Abbott's work provided an initial set of heuristics for mapping natural language elements to operands and operators (i.e., objects and their methods).
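
The heuristics attributed to Abbott above can be illustrated with a minimal sketch. It is not Abbott's implementation, nor the method proposed in this paper. It assumes the text has already been tagged with Penn Treebank-style part-of-speech tags by some external tagger, and the function name `classify_tokens`, the role labels, and the sample sentence are hypothetical. As a simplification, only verbs and adjectives are treated as operator candidates, and any occurrence of a control word (including "for" used as a preposition) is reported as a possible control structure.

```python
from collections import defaultdict

# Control words listed in the text above; matching is case-insensitive.
CONTROL_WORDS = {"if", "then", "else", "for", "do", "until", "when"}


def classify_tokens(tagged_tokens):
    """Group (word, tag) pairs by the role the noun/verb heuristics suggest.

    tagged_tokens: iterable of (word, pos_tag) pairs. Tags are assumed to
    follow Penn Treebank conventions (NN, NNS, NNP, NNPS, VB*, JJ*).
    Returns a dict mapping a role name to a list of distinct words.
    """
    roles = defaultdict(list)
    for word, tag in tagged_tokens:
        lower = word.lower()
        if lower in CONTROL_WORDS:
            role, value = "control_structure", lower
        elif tag in ("NNP", "NNPS"):
            # Proper nouns suggest objects (instances).
            role, value = "object", word
        elif tag in ("NN", "NNS"):
            # Common nouns suggest data types, i.e. candidate classes.
            role, value = "data_type", word
        elif tag.startswith("VB") or tag.startswith("JJ"):
            # Verbs and adjectives stand in for "verbs, attributes,
            # predicates, and descriptive expressions" (a simplification).
            role, value = "operator", word
        else:
            continue  # Other tags carry no suggestion in this sketch.
        if value not in roles[role]:
            roles[role].append(value)
    return dict(roles)


if __name__ == "__main__":
    # Hypothetical, hand-tagged sample sentence.
    sample = [("If", "IN"), ("the", "DT"), ("customer", "NN"),
              ("Alice", "NNP"), ("deposits", "VBZ"), ("money", "NN")]
    for role, words in classify_tokens(sample).items():
        print(role, words)
```

Taking pre-tagged input keeps the sketch independent of any particular tagger. In practice, the tags would come from an NLP toolkit, and the resulting candidates would still need manual or rule-based filtering.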