Detecting word level metaphors in Polish
Aleksander Wawer, Malgorzata Marciniak, Agnieszka Mykowiecka
Institute of Computer Science PAS, Jana Kazimierza 5, 01-248 Warszawa, Poland
(axw,mm, agn)@ipipan.waw.pl

Abstract
The paper describes an experiment in detecting the metaphorical usage of adjectives and nouns in Polish data. First, we describe the data developed for the experiment. The corpus consists of 1833 excerpts containing adjective-noun phrases that can have both metaphorical and literal senses. Annotators assign literal or metaphorical senses to all adjectives and nouns in the data. Then, we describe a method for literal/metaphorical sense classification. We use a Bi-LSTM neural network architecture with both token-level and character-level word embeddings. We examine the influence of adversarial training and perform an analysis by part of speech. Our approach proved successful, achieving an F1 score above 0.81.

1. Introduction
Understanding natural language utterances requires solving many problems at very different levels. Despite many attempts to solve NLP problems as an end-to-end task, there are still many contexts in which we want to understand individual words, combine their meanings into larger schemes, and add contextual constraints to sentence meaning. At every step, ambiguities, which are an inherent feature of natural language understanding, have to be resolved. Starting at the word level, many words have several different meanings, like bat, which can mean either a kind of solid stick or a flying mammal. To make the process of understanding more complicated, and communication in natural language at the same time more interesting and challenging, people “invent” meanings that resemble but differ from canonical senses, e.g. blue denotes one of the colours but also means sad. Such meanings, once they have become very popular, are listed in dictionaries.
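The abstract mentions adversarial training without detailing it in this section. One common formulation for text models, assumed here and not necessarily the variant used by the authors, perturbs the input embeddings by r = ε·g/‖g‖₂, where g is the gradient of the loss with respect to the embeddings of the whole sequence, and adds the loss on the perturbed input to the training objective. The minimal sketch below shows that step on a toy per-token logistic classifier standing in for the Bi-LSTM, so that the gradient can be written by hand. The embeddings, weights, labels, and epsilon are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_grad(embeddings, labels, w, b):
    """Binary cross-entropy of a toy per-token logistic classifier (a stand-in
    for the Bi-LSTM) and its gradient with respect to every token embedding.
    Tokens labelled None (not scored) add no loss and get a zero gradient."""
    loss = 0.0
    grad = []
    for e, y in zip(embeddings, labels):
        if y is None:
            grad.append([0.0] * len(e))
            continue
        p = sigmoid(sum(wd * ed for wd, ed in zip(w, e)) + b)
        loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
        # dL/dz = p - y and dz/de = w, hence dL/de = (p - y) * w
        grad.append([(p - y) * wd for wd in w])
    return loss, grad

def adversarial_embeddings(embeddings, grad, epsilon):
    """Shift embeddings by r = epsilon * g / ||g||_2, the norm taken over the
    whole sequence, i.e. a step of length epsilon that locally increases the loss."""
    norm = math.sqrt(sum(g * g for row in grad for g in row))
    if norm == 0.0:
        return [list(e) for e in embeddings]
    return [[ed + epsilon * gd / norm for ed, gd in zip(e, g)]
            for e, g in zip(embeddings, grad)]

# Hypothetical 3-dimensional embeddings for a three-token excerpt.
emb = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2], [0.5, 0.1, 0.1]]
labels = [1, 0, None]          # 1 = metaphorical, 0 = literal, None = not scored
w, b = [0.7, -0.4, 0.9], 0.1   # hypothetical classifier weights

clean_loss, grad = loss_and_grad(emb, labels, w, b)
adv_emb = adversarial_embeddings(emb, grad, epsilon=0.5)
adv_loss, _ = loss_and_grad(adv_emb, labels, w, b)
total_loss = clean_loss + adv_loss   # train on clean and perturbed inputs jointly
```

In a real Bi-LSTM the gradient would come from backpropagation through the whole network, and the perturbation is usually treated as a constant when the combined loss is minimised. Because the toy loss is convex in the embeddings, the perturbed loss is guaranteed to exceed the clean one here; with a deep network this only holds approximately, for small ε.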
Nevertheless, we often use non-literal combinations of words whose listing in dictionaries is difficult and unnecessary, as the mechanism used to form such expressions is both predictable and highly productive. For example, the Oxford dictionary lists many meanings of raise, but all of them are somehow connected with changing a position in physical space or in some sort of list. And then we have the phrase raise a question, which transfers raise from a concrete space to an abstract one. Such word usages are generally called non-literal, and in this particular case, metaphorical (Lakoff and Johnson, 2008). An efficient application capable of distinguishing literal from non-literal word occurrences can be very useful in many settings, such as web search engines, information extraction modules and document clustering. Technically, the task can be treated as word sense disambiguation, but as unsupervised methods are still much less efficient than supervised ones, it is usually treated as a classification or sequence labelling task. We adopted a slightly modified approach in this paper: we classify all word occurrences, but only those belonging to the nominal and adjectival classes.

2. Related Work
Over the last decade, a considerable amount of work has been done on metaphor detection, see (Shutova, 2015). Across these many approaches, the metaphor identification task has been defined in various ways. One group of papers concerned the classification of selected types of phrases (taken in isolation) into those that nearly always have a literal meaning, like brown pencil, and those that have only figurative usage, e.g. dark mood. In this type of task, adjective-noun phrases were explored for English (Tsvetkov et al., 2014; Gutierrez et al., 2016) and Polish (Wawer and Mykowiecka, 2017), as were verb constructions for English (Beigman Klebanov et al., 2016). Phrases that can be used both literally and figuratively can be classified only in a wider context.
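Classifying every noun and adjective in context, rather than phrase types in isolation, turns the task into word-level sequence labelling in which only some tokens carry a target. The minimal sketch below assumes hypothetical coarse POS tags ("NOUN", "ADJ"), an "M"/"L" (metaphorical/literal) label set, and an ignore marker for all other tokens; it also computes F1 of the metaphorical class separately per part of speech, which is one plausible reading of the per-POS analysis mentioned in the abstract, not necessarily the authors' exact protocol. The example sentence and its annotation are illustrative only.

```python
from collections import defaultdict

IGNORE = "_"                  # target for tokens that are not scored
SCORED_POS = {"NOUN", "ADJ"}  # hypothetical coarse tags for the two scored classes

def make_targets(pos_tags, annotation):
    """Word-level targets: "M"/"L" for nouns and adjectives, IGNORE elsewhere.

    annotation maps token positions to "M" (metaphorical) or "L" (literal);
    every noun and adjective is expected to be annotated.
    """
    targets = []
    for i, pos in enumerate(pos_tags):
        if pos in SCORED_POS:
            if i not in annotation:
                raise ValueError(f"unannotated {pos} at position {i}")
            targets.append(annotation[i])
        else:
            targets.append(IGNORE)
    return targets

def f1_by_pos(gold, pred, pos_tags, positive="M"):
    """F1 of the positive class, computed separately for each scored part of speech."""
    counts = defaultdict(lambda: [0, 0, 0])  # pos -> [tp, fp, fn]
    for g, p, pos in zip(gold, pred, pos_tags):
        if g == IGNORE:
            continue  # only nouns and adjectives are evaluated
        c = counts[pos]
        if g == positive and p == positive:
            c[0] += 1  # true positive
        elif p == positive:
            c[1] += 1  # false positive
        elif g == positive:
            c[2] += 1  # false negative
    # F1 = 2*tp / (2*tp + fp + fn), equivalent to the harmonic mean of P and R
    return {pos: (2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0)
            for pos, (tp, fp, fn) in counts.items()}

# "Wywiązała się gorąca dyskusja." ("A heated discussion ensued."), illustrative labels
tokens = ["Wywiązała", "się", "gorąca", "dyskusja", "."]
pos_tags = ["VERB", "PART", "ADJ", "NOUN", "PUNCT"]
gold = make_targets(pos_tags, {2: "M", 3: "L"})
pred = ["_", "_", "M", "M", "_"]   # hypothetical model output
scores = f1_by_pos(gold, pred, pos_tags)
```

Here gold is ["_", "_", "M", "L", "_"], and the hypothetical prediction gets the adjective right but wrongly marks the noun as metaphorical, so the adjective F1 is 1.0 and the noun F1 is 0.0. During training, the same ignore marker would be used to exclude unscored tokens from the loss.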
In this field of research, some papers present experiments on identifying the type of a particular phrase occurrence in text, while in other approaches all words in a given text are classified as literal or figurative.
At first, mostly supervised machine learning approaches were used, in which many additional data resources were exploited apart from features derived directly from the data. Among others, these features included concreteness, imageability, WordNet relations, SUMO ontology concepts, selectional preference information, and syntactic patterns. Solutions based on training neural networks were published later. Several new approaches were developed and compared in the shared task on metaphor identification on the VU Amsterdam Metaphor Corpus (Steen et al., 2010), held at the NAACL 2018 Workshop on Figurative Language Processing (Beigman Klebanov et al., 2018). Participants were given two tasks: the ALL_POS task, in which every token in the test data had to be annotated at the word level, and the Verbs task, in which only verb annotation was taken into account. The best performing solution (Wu et al., 2018) used pretrained word2vec embeddings, embedding clusters and POS tags as input to CNN and Bi-LSTM layers. In our approach, we tested adversarial training with Bi-LSTM layers.

3. Data Description
The experiment was performed on a corpus consisting of 1833 short excerpts selected from the NKJP (National Corpus of Polish, (Przepiórkowski et al., 2012)). The corpus contains over 45,000 tokens, including punctuation marks and excerpt delimiters. Each excerpt consists of one to three sentences, and the average excerpt length is 24.5 tokens. The part-of-speech annotation is done with