On the generation of word vectors using dictionaries

Pranav Jindal pranavj@stanford.edu
Ashwin Paranjape ashwinp@stanford.edu
Abhijit Sharang abhisg@stanford.edu

Abstract

Language dictionaries are a high-quality linguistic resource curated by trained professionals and are key to human learning of language. In our work, we attempt to use the definitions in dictionaries to learn word embeddings, by mapping each definition to the embedding of its target word through an RNN-LSTM model. We experiment with two versions of the output embedding: in the first, we keep the output embedding representing the term fixed to pre-trained GloVe word vectors, and in the second, we jointly learn the input embeddings, which represent the definitions, and the output embeddings, which represent the terms. We then evaluate the resulting word vectors on tasks defined and used in the existing literature for word embedding evaluation. Our word vectors trained on dictionaries perform significantly better on certain important tasks, such as distinguishing similarity from relatedness.

1 Introduction

During the process of learning any language, the dictionary is an indispensable tool for understanding words and using them to frame sentences. The definition of each word, written by a human expert, accurately describes the word's meaning in its different contexts. A language learner can understand the meaning of a word through other words, and this process unfolds recursively. Even though humans learn word usage through reading and speaking, the dictionary remains an essential tool for learning word meanings. In our work, we address the question of whether dictionaries can be used to obtain continuous vector representations of words that better capture their meaning.
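The fixed-output variant described above can be sketched with a minimal single-layer LSTM: read the definition word by word, then project the final hidden state into the embedding space and compare it to the term's pre-trained vector under a cosine loss. The dimensions, random toy vectors, and initialization below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 50  # embedding dimension (a GloVe-50-sized space, assumed for illustration)
H = 64  # LSTM hidden size (arbitrary choice for this sketch)

# Toy stand-ins: input embeddings for a 6-word definition, and a fixed
# "pre-trained" vector for the defined term. Real inputs would be GloVe rows.
definition = rng.standard_normal((6, D))
target = rng.standard_normal(D)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Single-layer LSTM parameters, stacked for the four gates (i, f, o, g).
W = rng.standard_normal((4 * H, D)) * 0.1
U = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)
W_out = rng.standard_normal((D, H)) * 0.1  # projects final state back to embedding space

def encode(seq):
    """Run the definition through the LSTM; project the last hidden state."""
    h = np.zeros(H)
    c = np.zeros(H)
    for x in seq:
        z = W @ x + U @ h + b
        i = sigmoid(z[:H])
        f = sigmoid(z[H:2 * H])
        o = sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:])
        c = f * c + i * g
        h = o * np.tanh(c)
    return W_out @ h

pred = encode(definition)
# Cosine loss against the fixed target vector (the "fixed output" variant);
# training would backpropagate this loss into W, U, b, and W_out.
cos = pred @ target / (np.linalg.norm(pred) * np.linalg.norm(target))
loss = 1.0 - cos
```

In the jointly learned variant, `target` (and the rows of `definition`) would themselves be trainable parameters rather than fixed GloVe vectors.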
Recent work in NLP on representing words and phrases as vectors trained on huge corpora does a fine job of capturing semantic and linguistic information, as measured by intrinsic evaluation tasks such as word analogy in single as well as divergent contexts, and by extrinsic tasks such as named-entity recognition and sentiment analysis. We observe that even though large amounts of text help capture word meanings in vectors, supplementing training with the dictionary is a natural extension, since the word-to-meaning map is given explicitly. Moreover, capturing similarity independently of relatedness/association is hard, because most language-based representation-learning models infer connections between concepts from their co-occurrence in corpora, and co-occurrence primarily reflects relatedness, not similarity. The dictionary provides a natural resource for overcoming this problem.

Through this work, we make the following contributions to vector space word representations:

• We supplement the existing state of the art with the definitions made available in the dictionary.

• We propose a new model that leverages the fact that words define other words recursively and can hence be used to enrich the procedure for learning word embeddings.

• We evaluate the learnt embeddings, compare them with existing embeddings, and highlight the salient features of the new embeddings in the comparison.

2 Previous work

Most approaches to training word vectors use large corpora of text to learn embeddings such that words which are close to each other in the semantic and syntactic space are also close to each other in the vector space.