Building a Cross-Language Entity Linking Collection in Twenty-One Languages

James Mayfield¹, Dawn Lawrie¹,², Paul McNamee¹, and Douglas W. Oard¹,³

¹ Johns Hopkins University Human Language Technology Center of Excellence
² Loyola University Maryland
³ University of Maryland, College Park

Abstract. We describe an efficient way to create a test collection for evaluating the accuracy of cross-language entity linking. Queries are created by semi-automatically identifying person names on the English side of a parallel corpus, using judgments obtained through crowdsourcing to identify the entity corresponding to the name, and projecting the English name onto the non-English document using word alignments. We applied the technique to produce the first publicly available multilingual cross-language entity linking collection. The collection includes approximately 55,000 queries, comprising between 875 and 4,329 queries for each of twenty-one non-English languages.

Keywords: Entity Linking, Cross-Language Entity Linking, Multilingual Corpora, Crowdsourcing.

1 Introduction

Given a mention of an entity in a document and a set of known entities, the entity linking task is to find the entity ID of the mentioned entity within a knowledge base (KB), or return NIL if the mentioned entity was previously unknown. In the cross-language entity linking task, the document in which the entity is mentioned is in one language (e.g., Serbian) while the set of known entities is described using another language (in our experiments, English). Entity linking is a crucial requirement for automated knowledge base population, and can be used to generate metadata about entities that can be used to improve search tasks.

Entity linking has been the subject of significant study over the past five years. Pioneering work focused on matching entity mentions to Wikipedia articles [5,7].
Although focused on clustering equivalent names rather than entity linking, the ACE 2008 workshop conducted evaluations of cross-document entity coreference resolution in Arabic and English [4] but not across languages. In 2009, the Text Analysis Conference (TAC) Knowledge Base Population track (TAC KBP) conducted a formal evaluation of English entity linking using a fixed set of documents and Wikipedia articles [11]. Shared tasks with a variety of characteristics have since emerged elsewhere, including CLEF [2], FIRE [15],