Asian Journal of Convergence in Technology, Volume IV, Issue I, ISSN NO: 2350-1146, I.F-5.11
www.asianssr.org
Mail: asianjournal2015@gmail.com

Survey of Two Different Approaches for Named Entity Recognition Based on Natural Language Processing

Prof. Sarita Rathod, Samriddhi Jain, and Vijeta Shah
Department of Information Technology, K. J. Somaiya Institute of Engineering and Information Technology, Mumbai, India
vijeta.shah@somaiya.edu

Abstract: Named Entity Recognition (NER) is an information extraction task that finds textual content and sorts it into pre-defined categories such as names of persons, organizations, locations, expressions of time, quantities, monetary values, and percentages. Named Entity Recognition can be implemented using two different approaches: the Rule Based Approach and the Statistical Based Approach. This project compares these two approaches on various types of input for the named entities person, organization, and location. It analyzes the outcome on the basis of Recall, Precision, and F-Measure, and determines whether the Rule Based Approach or the Statistical Based Approach should be implemented for better performance and efficiency in Named Entity Recognition.

Keywords: Named Entity Recognition, ANNIE, CRF, Recall, Precision, F-Measure.

I. INTRODUCTION

A. Natural language processing:
Natural Language Processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages and, in particular, with programming computers to fruitfully process large natural language corpora. The ultimate goal of natural language processing is to build software that will analyze, understand, and generate human languages naturally, enabling communication with a computer as if it were a human [5].

B. Named entity recognition:
The term “Named Entity”, which was first introduced by Grishman and Sundheim, is widely used in Natural Language Processing. Named entity recognition is a sub-task of information extraction that seeks to locate named entities (proper nouns) in text and classify them into pre-defined categories such as names of persons, organizations, locations, expressions of time, monetary values, percentages, and quantities. Researchers were focusing on extracting structured information from unstructured text such as newspaper articles. Named entity recognition is not only a sub-task of information extraction; it also plays a vital role in reference resolution, other types of disambiguation, and meaning representation in other natural language processing applications. Semantic parsers, part-of-speech taggers, and thematic meaning representations could all be extended with this type of tagging to provide better results [3].

II. APPLICATIONS

Named Entity Recognition and Extraction is important for solving problems in active research areas such as Question Answering and Summarization Systems, Information Retrieval, Machine Translation, Video Annotation, Ontology Learning, Semantic Web Search, and Bio-Informatics. Named Entity Recognition involves two tasks: first, the identification of proper nouns in text, and second, the classification of these entities into a set of pre-defined categories of interest, such as person names, organizations (companies, government organizations, committees, etc.), locations (cities, countries, rivers, etc.), and date and time expressions [5].

III. NAMED ENTITY

The term “Named Entity” was introduced at the sixth Message Understanding Conference (MUC-6). The MUC conferences contributed decisively to research in this area and provided the benchmark for named entity systems that perform a variety of information extraction tasks.
In MUC-6, Named Entities (NEs) were categorized into three types of labels, each of which uses a specific attribute for a particular entity type. The entities and their labels were defined as follows: