International Journal of Future Generation Communication and Networking
Vol. 10, No. 11 (2017), pp.19-36
http://dx.doi.org/10.14257/ijfgcn.2017.10.11.03
ISSN: 2233-7857 IJFGCN
Copyright © 2017 SERSC Australia
Discovery of Entity Synonym Using Anchor Text and URLs
Mamta Kathuria¹, Anurahda Singh², C. K. Nagpal³ and Neelam Duhan⁴
¹Assistant Professor, YMCA University of Science & Technology, Faridabad (India)
²Student (M.Tech), YMCA University of Science & Technology, Faridabad (India)
³Professor, YMCA University of Science & Technology, Faridabad (India)
⁴Assistant Professor, YMCA University of Science & Technology, Faridabad (India)
¹mamtakathuria@ymcaust.ac.in, ²anuradhasngh13@gmail.com, ³nagpalckumar@rediffmail.com, ⁴neelam.duhan@gmail.com
Abstract
Web queries have become increasingly specific, seeking results about a particular entity
in a particular context of time, place, etc.: for example, a movie show, a particular
train, a newspaper of a particular date, or the performance of a particular stock. The
references associated with a particular entity are known as entity references. These
references vary across the heterogeneous contexts of the web, and a user may not get the
required answers to a query because of these varied references, which are known as
entity synonyms. Entity synonyms cannot be handled through lexical resources like
WordNet [1]. Therefore, every search engine has to create its own mechanism for
finding the synonyms of a particular entity in order to answer users' queries properly,
a process known as entity resolution. In the recent past, many researchers have tried to
devise mechanisms to generate entity synonyms. This paper is an effort in the same
direction and creates a rich set of entity synonyms for a given entity using
inbound anchor text and URLs.
Keywords: Entity, Candidate Entity Synonym, Web Query, Inbound Anchor text, Entity
Synonym Extraction
1. Introduction
With the growth of the web, users make diverse forms of queries across a variety of
domains concerned with daily life. Common users search for products, brands, recipes,
weather forecasts, show timings, quotes for various products, etc. to meet their daily
needs. Such entity-related searches can best be resolved from the latest product
catalogs and associated databases if the references are specific. However, if the
references are general and refer to common entities, then product catalogs and
databases may not be available. When the entities are well known, the synonyms can be
found using sources like Wikipedia [2] and FreeBase [3]. For instance, the Bhabha
Atomic Research Center may be referred to as Bhabha Institute, BARC, Atomic Energy
Center, Nuclear Energy Center, etc. However, for common entities these online
resources may not work.
The problem with these generic entities is that the same entity is referred to in
multiple ways, because web pages have different creators. For example, a newspaper like
The Hindustan Times may be referred to as The HT, HT, The Hindustan, The Hindustan Times Today, and
Received (May 15, 2017), Review Result (August 31, 2017), Accepted (September 20, 2017)