Cross-lingual named entity extraction and disambiguation

Tadej Štajner 1,2, Dunja Mladenić 1,2
1 Artificial Intelligence Laboratory, Jožef Stefan Institute, Ljubljana, Slovenia
2 Jožef Stefan International Postgraduate School, Ljubljana, Slovenia
tadej.stajner@ijs.si

Abstract. We propose a method for identifying and disambiguating named entities in a scenario where the language of the input text differs from the language of the knowledge base. We demonstrate this functionality on English and Slovene named entity disambiguation.

Keywords: Natural language processing, knowledge management, multilingual information management, cross-lingual information retrieval

1 Introduction

Much of the world's knowledge is available as text in multiple languages rather than in a more explicit or language-neutral format. An interesting challenge is therefore to automatically integrate texts with structured and semi-structured resources such as knowledge bases: collections of entities with various properties, such as labels and textual descriptions. Recent work addresses the fact that this knowledge can be spread over many languages [6]. While Wikipedia, the free encyclopaedia, is a famous example, the same problem applies to many domains where text is present in multiple languages. Within the domain of cross-lingual text annotation, we focus on the tasks of entity extraction and disambiguation (NED). We present a multilingual named entity extraction and disambiguation pipeline for English and Slovene that demonstrates the capability of reusing language resources across languages within the Enrycher system [8].

1 Motivation

Many machine translation systems are not aware of named entities and the special handling they often require, and instead simply attempt to translate them literally. This often results in errors, for instance in Google Translate