Translation- and projection-based unsupervised coreference resolution for Polish

Maciej Ogrodniczuk
Institute of Computer Science, Polish Academy of Sciences

Abstract. Creating a coreference resolution tool for a new language is a challenging task because of the substantial effort required to develop the associated linguistic data, regardless of whether the approach is rule-based or statistical. In this paper, we test the translation- and projection-based method on an inflectional language, evaluate the results on a corpus annotated with general coreference, and compare them with state-of-the-art solutions of this type for other languages.

1 Introduction

Coreference resolution, the process of “determining which NPs in a text or dialogue refer to the same real-world entity” [1], is a widely known problem, crucial for higher-level NLP applications such as text summarisation, text categorisation and textual entailment, and it has so far been tackled from many perspectives. However, some languages still lack state-of-the-art solutions, most likely because of the substantial effort required to develop language resources and tools, some of them knowledge-intensive, whether in the form of language-specific rules or of training data for statistical approaches.

One solution to this problem is to follow the translation-projection path, i.e.: (1) translate the text to be coreferentially annotated (in the source language) into the target language, for which coreference resolution tools are available; (2) run the target-language coreference resolver; (3) transfer the resulting annotations (mentions, i.e., expressions referring to discourse-world entities, and clusters, i.e., sets of mentions referring to the same entity) from the target language back to the source language. Such a solution has been proposed, e.g., by Rahman and Ng [2] and evaluated for Spanish and Italian with projection from English (see Section 2).
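Step (3) can be illustrated with a minimal sketch. This section does not say how the transfer is performed, so the sketch assumes one common option: projecting each mention through a word alignment between the source text and its translation. The function names (`invert_alignment`, `project_span`, `project_clusters`), the inclusive `(start, end)` token spans and the alignment format (pairs of source and target token indices, of the kind a word aligner outputs) are all hypothetical. Steps (1) and (2) are assumed to have already produced the translation, the alignment and the target-side clusters.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Span = Tuple[int, int]  # hypothetical format: inclusive token indices (start, end)


def invert_alignment(alignment: List[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Map each target-language token index to the source-language tokens aligned to it."""
    tgt_to_src: Dict[int, Set[int]] = defaultdict(set)
    for src, tgt in alignment:
        tgt_to_src[tgt].add(src)
    return tgt_to_src


def project_span(span: Span, tgt_to_src: Dict[int, Set[int]]) -> Optional[Span]:
    """Project a target-side mention onto the smallest source span covering its aligned tokens."""
    src_tokens: Set[int] = set()
    for t in range(span[0], span[1] + 1):
        src_tokens |= tgt_to_src.get(t, set())
    if not src_tokens:
        return None  # no aligned source token: the mention is lost in projection
    return (min(src_tokens), max(src_tokens))


def project_clusters(clusters: List[List[Span]],
                     alignment: List[Tuple[int, int]]) -> List[List[Span]]:
    """Transfer target-language clusters to the source language (assumed design, see lead-in)."""
    tgt_to_src = invert_alignment(alignment)
    projected: List[List[Span]] = []
    for cluster in clusters:
        spans: List[Span] = []
        for mention in cluster:
            p = project_span(mention, tgt_to_src)
            if p is not None and p not in spans:
                spans.append(p)
        if len(spans) >= 2:  # keep only clusters that still link at least two mentions
            projected.append(spans)
    return projected


if __name__ == "__main__":
    # Source (Polish):  Maria powiedziała , że ona przyjdzie .   (tokens 0-6)
    # Target (English): Maria said that she will come .          (tokens 0-6)
    alignment = [(0, 0), (1, 1), (3, 2), (4, 3), (5, 4), (5, 5), (6, 6)]
    english_clusters = [[(0, 0), (3, 3)]]  # {Maria, she}
    print(project_clusters(english_clusters, alignment))  # [[(0, 0), (4, 4)]]
```

In this sketch, mentions with no aligned source token are dropped, as are clusters left with fewer than two mentions; a Polish clause with an omitted subject, for instance, leaves the English pronoun *she* with nothing to project onto. Other choices, such as keeping singletons or projecting only mention heads, would be equally plausible.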
Although the languages paired in Rahman and Ng's experiments come from different language groups (Germanic and Romance), all of them differ markedly from richly inflected languages such as Polish, which makes it worthwhile to test the approach on other language pairs.

The work reported here was carried out within the Computer-based methods for coreference resolution in Polish texts (CORE) project financed by the Polish National Science Centre (contract number 6505/B/T02/2011/40) and by the University Research Program for Google Translate.