International Journal in Foundations of Computer Science & Technology (IJFCST), Vol. 3, No.4, July 2013
DOI:10.5121/ijfcst.2013.3408

HIDDEN MARKOV MODEL BASED NAMED ENTITY RECOGNITION TOOL

Deepti Chopra, Sudha Morwal and Dr. G.N. Purohit
Department of Computer Engineering, Banasthali Vidyapith, (Raj.), INDIA
deeptichopra11@yahoo.co.in
sudha_morwal@yahoo.co.in
gn_purohitjaipur@yahoo.co.in

ABSTRACT

Named Entity Recognition is the task of recognizing Named Entities, or Proper Nouns, in a document and then classifying them into different Named Entity classes. In this paper we introduce our modified tool, which performs Named Entity Recognition (NER) in any natural language and supports corpus development, i.e. it assists in developing the training and testing documents. The tool also addresses the unknown-words problem in NER, handles spurious words, and automatically computes the performance metrics of an NER-based system, i.e. Recall, Precision and F-Measure.

KEYWORDS

NER, Transliteration, Unknown words, Performance Metrics

1. INTRODUCTION

Named Entity Recognition (NER) is one of the application areas of Natural Language Processing, in which Named Entities are identified and then categorised into different classes of Named Entities. The classes of Named Entities can include the name of a person, location, organization, state, sport, river, city or country, as well as percentage, time, quantity etc. Applications of NER include Information Extraction, Machine Translation, Question Answering Systems, Information Retrieval, Automatic Summarization etc.

For example, consider the following training sentences:

Ram/PER is/OTHER a/OTHER intelligent/OTHER boy/OTHER
Deepa/PER lives/OTHER in/OTHER Nagpur/CITY
Ankit/PER is/OTHER a/OTHER football/SPORT player/OTHER
Aabhas/PER plays/OTHER cricket/SPORT

In the tagged English training text given above, ‘PER’ denotes that ‘Ram’, ‘Deepa’, ‘Ankit’ and ‘Aabhas’ are names of persons.
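As an illustration of how word/TAG training text of this kind can be consumed, the sketch below parses the four sentences into (word, tag) pairs and estimates start, transition and emission probabilities for an HMM by simple relative-frequency counting. The function names (`parse_tagged`, `train_hmm`, `normalize`) and the unsmoothed counting estimate are assumptions made for illustration; they are not taken from the tool itself.

```python
from collections import Counter, defaultdict

# The four tagged training sentences from the example, one per line.
TRAINING_TEXT = """\
Ram/PER is/OTHER a/OTHER intelligent/OTHER boy/OTHER
Deepa/PER lives/OTHER in/OTHER Nagpur/CITY
Ankit/PER is/OTHER a/OTHER football/SPORT player/OTHER
Aabhas/PER plays/OTHER cricket/SPORT
"""


def parse_tagged(text):
    """Turn word/TAG lines into a list of sentences of (word, tag) pairs."""
    sentences = []
    for line in text.splitlines():
        tokens = line.split()
        if tokens:
            # rsplit on the last '/' so a '/' inside the word is preserved.
            sentences.append([tuple(tok.rsplit("/", 1)) for tok in tokens])
    return sentences


def normalize(counter):
    """Convert raw counts into relative frequencies that sum to 1."""
    total = sum(counter.values())
    return {key: count / total for key, count in counter.items()}


def train_hmm(sentences):
    """Estimate HMM parameters by counting (assumed: no smoothing)."""
    start = Counter()                 # tag of the first word of each sentence
    transition = defaultdict(Counter) # previous tag -> next tag
    emission = defaultdict(Counter)   # tag -> word
    for sentence in sentences:
        start[sentence[0][1]] += 1
        for word, tag in sentence:
            emission[tag][word] += 1
        for (_, prev_tag), (_, next_tag) in zip(sentence, sentence[1:]):
            transition[prev_tag][next_tag] += 1
    return (
        normalize(start),
        {tag: normalize(c) for tag, c in transition.items()},
        {tag: normalize(c) for tag, c in emission.items()},
    )


sentences = parse_tagged(TRAINING_TEXT)
start_p, trans_p, emit_p = train_hmm(sentences)
```

On these four sentences every sentence begins with a PER-tagged word, so the estimated start probability of PER is 1.0, and two of the eight transitions out of OTHER lead to SPORT, giving a transition probability of 0.25. A practical system would normally add smoothing so that unseen words and tag pairs do not receive zero probability.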
‘Nagpur’ is tagged with the ‘CITY’ tag since it is the name of a city. Similarly, ‘football’ and ‘cricket’ are names of sports, so they are tagged with the ‘SPORT’ tag. Words tagged with the ‘OTHER’ tag are not Named Entities. The tagged sentences above are input to the HMM Train module, which computes the HMM parameters, i.e. Start Probability, Transition Probability and Emission Probability. The HMM parameters and the testing sentences are then input to the HMM Test module, which uses the Viterbi algorithm to derive the Named Entities. If the testing sentence in NER is given as: