Eurographics Conference on Visualization (EuroVis) 2018
J. Heer, H. Leitte, and T. Ropinski (Guest Editors)
Volume 37 (2018), Number 3

Interactive Analysis of Word Vector Embeddings

F. Heimerl¹ and M. Gleicher¹
¹Department of Computer Sciences, University of Wisconsin–Madison, USA

Abstract
Word vector embeddings are an emerging tool for natural language processing. They have proven beneficial for a wide variety of language processing tasks. Their utility stems from the ability to encode word relationships within the vector space. Applications range from components in natural language processing systems to tools for linguistic analysis in the study of language and literature. In many of these applications, interpreting embeddings and understanding the encoded grammatical and semantic relations between words is useful, but challenging. Visualization can aid in such interpretation of embeddings. In this paper, we examine the role for visualization in working with word vector embeddings. We provide a literature survey to catalogue the range of tasks where the embeddings are employed across a broad range of applications. Based on this survey, we identify key tasks and their characteristics. Then, we present visual interactive designs that address many of these tasks. The designs integrate into an exploration and analysis environment for embeddings. Finally, we provide example use cases for them and discuss domain user feedback.

CCS Concepts
• Visualization → Information Visualization; Visual Analytics; • Artificial Intelligence → Natural Language Processing;

1. Introduction
Word embeddings are mathematical models that encode word relations within a vector space. They are created by an unsupervised training process based on co-occurrence information between words in a large corpus. The encoded relations include semantic and syntactic properties of words.
For example, word embeddings have been shown to reveal semantic analogies [MCCD13] and groups of semantically related words [PSM14].

Due to their ability to capture word meaning, embeddings are valuable in many diverse applications. They are particularly popular in natural language processing (NLP) applications because of their potential to significantly improve the accuracy of language processing methods. Examples include text classification [KSKW15], sentiment analysis [YWLZ17], and natural language parsing [SBM*13]. Embeddings have also sparked the interest of linguists and researchers from the humanities. For them, word vector embeddings can provide valuable insights into the use and structure of language. Examples include etymological studies of word meanings [HLJ16] and assembling dictionaries [FCB16].

These diverse scenarios come with a variety of challenges in understanding and comparing embeddings. They range from learning how to interpret similarity in the vector space to understanding the influence of source corpora on the resulting embeddings. For example, neighborhood relations are often used to probe embeddings for specific information. However, there are many reasons why words may be close in an embedding. They may have close semantic meanings, or similar syntactic roles within sentences in the source data set. In addition, embedding algorithms are non-deterministic and depend on critical input parameters, including the dimensionality of the resulting word vector space. This can lead to quite different embeddings even with identical input data sets, making their interpretation [LG14] and evaluation [LHK*16, BGH*17] challenging.
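To make the idea of probing an embedding through neighborhood relations concrete, the following minimal sketch ranks the words closest to a query word. It is an illustration under stated assumptions, not the method used in this paper: the vectors in TOY_EMBEDDING are invented, the helper names cosine_similarity and nearest_neighbors are hypothetical, and cosine similarity is assumed as the closeness measure because it is a common choice for word vectors.

```python
import math

# Hypothetical 3-dimensional toy vectors, invented for illustration only.
# Real embeddings are learned from a corpus and have many more dimensions.
TOY_EMBEDDING = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
    "pear": [0.2, 0.1, 0.8],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(embedding, word, k=2):
    """Return the k words most similar to `word`, most similar first."""
    query = embedding[word]
    scored = [
        (other, cosine_similarity(query, vector))
        for other, vector in embedding.items()
        if other != word  # a word is trivially closest to itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    for neighbor, score in nearest_neighbors(TOY_EMBEDDING, "king"):
        print(f"{neighbor}: {score:.3f}")
```

Such a ranked list reports only that two words are close, not why. Whether the proximity reflects related meaning or a similar syntactic role still has to be judged by the analyst.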
Interactive visual interfaces are effective for the task of analyzing word vector embeddings, because (1) the problems to be solved are inherently human-centric, and finding solutions involves enabling expert users to gain a deeper understanding of word embedding spaces, for which visualization is a primary tool, and (2) while effective visual encodings can make interesting features readily available, human linguistic and domain knowledge is necessary to drive the process and ultimately to distinguish relevant from non-relevant artifacts of the data, which strongly motivates interactive visualization. However, such exploratory processes require tightly integrated feedback loops that let users navigate, filter, and drill down into aspects relevant to them and their specific analysis goals.

While visualization has been used to analyze word embeddings in the NLP and digital humanities (DH) literature, it is often based on standard dimensionality reduction techniques. Such tools convey a rough impression of similarities but fail to serve a broader range of tasks. Recent work [LBT*17] has identified specific tasks in analyzing word vector embeddings and shown that these tasks

© 2018 The Author(s)
Computer Graphics Forum © 2018 The Eurographics Association and John Wiley & Sons Ltd. Published by John Wiley & Sons Ltd.