Study of Existing Method of Finding Aliases and Improve Method of Finding Aliases from Web

Ravi H. Gedam, Student, Department of CSE, T.G.P.C.E, Nagpur
Prof. T. Yengantiwar, Faculty, Department of CSE, T.G.P.C.E, Nagpur
Prof. P. Velavan, Faculty, Department of CSE, T.G.P.C.E, Nagpur

Abstract: A person may be referred to by a nickname, and information about that person is commonly sought on the Internet. Extracting this information is not an easy task, because the same name may denote different entities and the same entity may be referred to by different names, such as a name and its nickname. We therefore need to know the nicknames in order to find information about a particular entity. Various web sites contain aliases of popular celebrities in fields such as music, cinema, and sport, but they do not contain alias information about ordinary people. One existing method extracts aliases through lexical-pattern-based retrieval; it was tested on real-world name-alias pairs using training data from a limited set of domains. In this paper, we discuss the problems that arise while retrieving this information and how alias extraction should be corrected for interdisciplinary fields.

Keywords: Retrieval, semantic metadata

1. INTRODUCTION

The World Wide Web (WWW) provides an easy way to retrieve information about a celebrity, and a huge amount of information can be accessed from the web [1]. Retrieval can be tedious, however, when a celebrity is referred to by a nickname. For instance, a popular cinema artiste whose original name is Amitabh Bachchan is referred to by alias names such as “Super Star”, “Kuli”, “Shehenshah”, “KBC”, “BIG B”, “Lambuji”, and many more. We will therefore not be able to retrieve all information about the artiste from the web unless we extract the top-ranked alias names. Two kinds of ambiguity arise here. Different entities can share the same name (lexical ambiguity). On the other hand, a single entity can be designated by multiple names (i.e., referential ambiguity).
A real-world example is the alias name “Mahi”, which refers to Mahendra Singh Dhoni as well as to another actor in the same domain of expertise. This problem can be addressed with semantic metadata for entities, and automatic extraction of metadata [3] can accelerate the process of semantic annotation.

Matsuo and Ishizuka proposed a method to extract aliases from the web for a given personal name. They used a lexical pattern approach to extract candidate aliases. Incorrect aliases were then removed using page counts, anchor text co-occurrence frequency, and lexical pattern frequency. However, this method considered only first-order co-occurrences of aliases when ranking them; it did not consider second-order co-occurrences, which could improve recall and achieve a substantial MRR for the web search engine [1].

For named entities, automatically extracted aliases can serve as a useful source of metadata, thereby providing a means to disambiguate an entity. Identifying the aliases of a name is also important for extracting relations among entities. For example, Matsuo et al. [4] propose a social network extraction algorithm in which they compute the strength of the relation between two individuals X and Y from the number of web hits for the conjunctive query “X” and “Y”. However, both persons X and Y might also appear under their alias names in web contents. Consequently, by expanding the conjunctive query using aliases for the names, a social network extraction algorithm can compute the strength of a relationship between two persons more accurately.

2. RELATED WORK

Our research is directed towards building a web extraction system that extracts efficient patterns for Indian name aliases and that can subsequently be adapted to various fields. Alias extraction is basically an information retrieval (IR) task, which looks for similar, preceding, succeeding, adjacent, lexico-syntactic, supervised co-occurring text in a large cluster of documents.
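The query expansion described at the end of the Introduction can be illustrated with a minimal Python sketch. It is not the algorithm of Matsuo et al. [4]: the function names, the `AND` query syntax, and the choice to sum hits over all name pairs are assumptions made here for illustration, and `hit_count` is a hypothetical stand-in for whatever search-engine lookup is available.

```python
from itertools import product


def expand_conjunctive_queries(names_x, names_y):
    """Build one conjunctive query per (name of X, name of Y) pair.

    names_x and names_y each hold a person's real name plus known aliases.
    The quoted-phrase AND syntax is an assumption, not a specific engine's API.
    """
    return [f'"{x}" AND "{y}"' for x, y in product(names_x, names_y)]


def relation_strength(names_x, names_y, hit_count):
    """Sum web hits over all expanded queries.

    hit_count is a caller-supplied (hypothetical) function that maps a query
    string to a number of hits. Plain summation is a simplification: a page
    matching several name pairs is counted more than once.
    """
    queries = expand_conjunctive_queries(names_x, names_y)
    return sum(hit_count(q) for q in queries)


if __name__ == "__main__":
    # Names and aliases taken from the examples in the text.
    person_x = ["Amitabh Bachchan", "BIG B"]
    person_y = ["Mahendra Singh Dhoni", "Mahi"]
    for query in expand_conjunctive_queries(person_x, person_y):
        print(query)
```

With one alias per person, the single original query grows to four, so pages that mention either person only by nickname can still contribute to the relation strength.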
The main function of information retrieval is to build a term-weighting system [39] that enhances retrieval effectiveness. Two measures are normally used to assess the ability of a system to retrieve the relevant items and reject the non-relevant items of a collection; these are known as recall and precision, respectively. Recall and precision are the significant accuracy measures for any information retrieval task on the web, and they hold for alias extraction too.

Below we discuss the various techniques that have been used in this area: word association norms and lexicography, collocation extraction in natural language processing, cross-document co-reference resolution, duplicate detection using learnable string similarity measures, unsupervised clustering to identify the referents of personal names, the self-annotating web, people-searching strategies on the World Wide Web, disambiguating web appearances of people in a social network, ‘PolyPhonet’ (a social network system), disambiguating namesakes, approximate string matching, mnemonic extraction, approximate name matching using finite state graphs, measuring semantic similarity between words, the WePS-2 evaluation campaign, and web mining for alias extraction.

2.1 Word Association Norms and Lexicography

In linguistics, it is general practice to classify words not only on the basis of their meaning but also on the basis of their co-occurrence with other words. The word ‘bank’, for example, has two meanings, and which one applies depends on the adjacent words and expressions it is associated with. For instance, words such as

Ravi H. Gedam et al, / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 5 (3), 2014, 2963-2973
www.ijcsit.com