AELA: an Adaptive Entity Linking Approach

Bianca Pereira
DERI, NUIG
Lower Dangan
Galway, Ireland
bianca.pereira@deri.org

Nitish Aggarwal
DERI, NUIG
Lower Dangan
Galway, Ireland
nitish.aggarwal@deri.org

Paul Buitelaar
DERI, NUIG
Lower Dangan
Galway, Ireland
paul.buitelaar@deri.org

ABSTRACT
The number of available Linked Data datasets has been increasing over time. Despite this, their use to recognise entities in unstructured plain text (the Entity Linking task) is still limited to a small number of datasets. In this paper we propose a framework adaptable to the structure of generic Linked Data datasets. This adaptability allows a broader use of Linked Data datasets for the Entity Linking task.

Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: Miscellaneous; I.2.7 [Natural Language Processing]: Text analysis

Keywords
Entity Linking; Linked Data; Named Entity

1. INTRODUCTION
The Entity Linking task is concerned with recognising entity mentions in a text, as in Named Entity Recognition, and additionally with linking them to the respective record(s) in an external database. Since its advent, Wikipedia has played a major role in providing background knowledge for this task [2], but attention has recently shifted towards using Linked Data (LD) instead. Wikipedia contains semi-structured information and requires effort to extract the links between entities. LD datasets, on the other hand, are already structured and allow a more straightforward way to find entities and the relationships between them.

Tools such as DBpedia Spotlight [4] and AIDA [6] have been created to benefit as much as possible from LD structure and the amount of data available in the LD cloud. As LD datasets vary in schema, these tools are becoming very specialised for particular datasets. Due to this, only a small number of the LD datasets available on the Web are effectively used for the Entity Linking task.
A broader use of available LD datasets would enable domain-specific identification of entities and would also allow the use of non-public data such as Linked Enterprise Data. Instead of creating a specialised tool for each available dataset, we aim for a general, self-adaptive tool that can perform Entity Linking with different LD datasets under varied schemas. For this we created AELA, an Adaptive Entity Linking Approach.

Copyright is held by the author/owner(s).
WWW 2013 Companion, May 13–17, 2013, Rio de Janeiro, Brazil.
ACM 978-1-4503-2038-2/13/05.

Figure 1: AELA framework

2. OUR APPROACH
AELA is a framework consisting of modules that each perform one step of the Entity Linking task. Each module was designed to be independent and adaptable to the structure of the given LD dataset. All modules are shown in Figure 1.

LD dataset Selector: The first module is responsible for verifying the suitability of the LD dataset for the Entity Linking task. It verifies whether the text and the dataset share the same domain (music, films and so on) and performs a quality assessment of the dataset.

The assessment is based on three criteria given by [1]: Accessibility, Comprehensibility and Validity of Documents. We also created the Data Richness criterion with two indicators: the number of classes with Named Entities as their instances and the number of relationships between entities in the dataset.

PIN Recogniser: The second module adapts the framework to the schema of the selected LD dataset. It detects which classes have Named Entities as their instances and which properties refer to their names. We call these properties PIN (Properties that Identify Names), and each class may have its own set of PIN.

To recognise PIN we assume that Named Entity labels are proper names or acronyms with almost all letters capitalised. For this we applied the method and heuristics presented in [5].
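The capitalisation assumption behind PIN recognition can be illustrated with a minimal sketch. It is not the method of [5]: the function names, the two thresholds and the sample property names are all hypothetical, chosen only to show how a property might be flagged as a PIN when most of its values look like proper names or acronyms.

```python
import re

# Hypothetical thresholds (assumptions, not taken from the paper or from [5]).
WORD_CAPITAL_RATIO = 0.6    # share of words in a label that must be capitalised
NAME_RATIO_THRESHOLD = 0.8  # share of a property's values that must look like names

# Matches runs of letters only (no digits, no underscore).
_WORD = re.compile(r"[^\W\d_]+")


def looks_like_name(label):
    """Return True if most alphabetic words in the label start with a capital."""
    words = _WORD.findall(label)
    if not words:
        return False
    capitalised = sum(1 for w in words if w[0].isupper())
    return capitalised / len(words) >= WORD_CAPITAL_RATIO


def recognise_pin(values_by_property):
    """Return the properties whose values mostly look like Named Entity labels."""
    pin = []
    for prop, values in values_by_property.items():
        if not values:
            continue
        ratio = sum(looks_like_name(v) for v in values) / len(values)
        if ratio >= NAME_RATIO_THRESHOLD:
            pin.append(prop)
    return sorted(pin)


# Illustrative sample of literal values grouped by property (made-up data).
sample = {
    "rdfs:label": ["The Beatles", "NASA", "Rio de Janeiro"],
    "dc:description": [
        "a rock band from Liverpool",
        "space agency of the United States",
    ],
}
pin_properties = recognise_pin(sample)
```

In a real setting the values would be sampled per class from the LD dataset, so that each class can end up with its own set of PIN.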
Gazetteer Generator: To recognise the Named Entities in a text, a dictionary of Named Entities is required for mapping the names to LD resources. Thus the Gazetteer Generator transforms the LD dataset dynamically into a dictionary through a Lookup Service. This service performs a series of SPARQL queries to find all the resources that refer to a Named Entity with the name provided as input.

Named Entity Mention Recogniser: In order to find each piece of text (potential terms) that mentions a Named Entity, this module uses a sliding window over the text. The dictionary is used to identify names and to link them to a set of candidates in LD resources.
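The sliding-window lookup can be sketched as follows. This is a simplified illustration under stated assumptions: the Lookup Service is replaced by an in-memory dictionary rather than SPARQL queries, the window tries the longest term first, and the function name `find_mentions`, the `max_window` parameter and the `ex:` resource identifiers are hypothetical.

```python
def find_mentions(text, gazetteer, max_window=4):
    """Slide a window over the tokens and look each candidate term up.

    `gazetteer` maps a lower-cased name to a list of candidate resources.
    Returns (term, token_index, candidates) tuples, preferring longer terms.
    """
    tokens = text.split()
    mentions = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the widest window first so that longer names win (an assumption).
        for size in range(min(max_window, len(tokens) - i), 0, -1):
            term = " ".join(tokens[i:i + size]).strip(".,;:!?")
            candidates = gazetteer.get(term.lower())
            if candidates:
                mentions.append((term, i, candidates))
                i += size  # skip past the matched tokens
                matched = True
                break
        if not matched:
            i += 1
    return mentions


# Stand-in for the dictionary produced by the Gazetteer Generator (made-up data).
gazetteer = {
    "the beatles": ["ex:The_Beatles"],
    "liverpool": ["ex:Liverpool", "ex:Liverpool_FC"],
}
mentions = find_mentions("The Beatles formed in Liverpool.", gazetteer)
```

Each mention carries all candidate resources for its name, so an ambiguous name such as "Liverpool" is linked to a candidate set and not to a single resource.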