Analogical Reasoning with Knowledge-based Embeddings

Douglas Summers-Stay, Dandan Li
U.S. Army Research Laboratory
douglas.a.summers-stay.civ@mail.mil, happydandan2016@gmail.com

Abstract

For robots to interact with natural language and handle real-world situations, some ability to perform analogical and associational reasoning is desirable. Consider commands like "Fetch the ball" vs. "Fetch the wagon": the robot needs to know that carrying a ball is (in the appropriate sense) analogous to dragging a wagon. Without the ability to perform analogical reasoning, robots are incapable of generalizing in the ways that true natural language understanding requires. Inspired by implicit Verlet integration methods for mass-spring systems in physics simulations, we present a novel knowledge-based embedding method in this paper, in which distributional word representations and semantic relations derived from knowledge bases are incorporated. We use SAT-style analogy questions to demonstrate the potential feasibility of our approach within the analogical reasoning framework.

Introduction

Attempts at analogical reasoning using knowledge bases have only been successful in very limited, carefully arranged scenarios. Semantic vector representations, in contrast, have shown a surprisingly sophisticated level of analogy formation. In a semantic vector space, concepts that are related to each other in meaning have vectors that are nearby (under an appropriate distance metric, such as cosine similarity). This property directly gives rise to their ability to form analogies. Consider the analogy sculptor:chisel::painter:paintbrush. The word sculptor is related to other types of artist, and it is also related to terms having to do with stone. The fact that it must be near both of these terms forces it to lie somewhere near the midpoint of the line connecting the vectors for artist and stone: sculptor ≈ (artist + stone)/2.
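This midpoint intuition is easy to check numerically. The sketch below uses randomly generated toy vectors as stand-ins for real distributional embeddings (the dimensionality and noise level are arbitrary choices for illustration); it shows that a vector placed near the midpoint of artist and stone scores a cosine similarity close to 1 with that midpoint, and that dividing by 2 does not change the cosine score.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy stand-ins for distributional vectors; real embeddings would
# come from a trained model such as word2vec.
rng = np.random.default_rng(0)
artist = rng.normal(size=50)
stone = rng.normal(size=50)

# Place "sculptor" near the midpoint of artist and stone,
# plus a little noise-like variation.
sculptor = (artist + stone) / 2 + 0.05 * rng.normal(size=50)

# Cosine similarity ignores scale, so the midpoint (artist + stone)/2
# and the plain sum artist + stone receive the same score.
a = cosine(sculptor, (artist + stone) / 2)
b = cosine(sculptor, artist + stone)
print(a, b)  # two identical scores, both close to 1
```

Because cosine similarity normalizes both arguments, any positive rescaling of a vector leaves its similarity scores unchanged, which is what licenses dropping the division by 2 in the next step.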
Under the cosine similarity metric, division by a constant can be neglected, so we can simply write sculptor ≈ artist + stone. Performing a similar decomposition on the other terms and neglecting some noise-like variation, we get the new analogy artist+stone : tool+stone :: artist+paint : tool+paint. This new representation makes it easy to see that adding the middle two terms and subtracting the first term, [tool+stone] + [artist+paint] - [artist+stone], must equal the fourth term, [tool+paint]. Note that this is not the only decomposition we could have performed. A different analogy involving sculptor might depend on the ability to decompose sculptor into stonecarving+worker, or some other decomposition. Additionally, these analogical relations should hold in any vector representation where semantically similar entities have similar vectors. This means that we could potentially use vectors derived from weights in deep learning networks trained on vision, depth, or other modalities, as well as language.

Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Relational knowledge is commonly stored in knowledge bases such as ConceptNet as triples of the form (head, relation, tail). Knowledge extraction tools are also capable of extracting such triples directly from natural language sources. The semantic arithmetic above can be restated as a geometric constraint: the vector connecting two terms that share a particular relation should be approximately equal to the vector connecting two other terms that share the same relation. This suggests a way of incorporating knowledge base triples into a semantic vector space: match the knowledge base entities to their corresponding vector representations, and ensure that the relation vector connecting the head entity to the tail is similar for all examples of the relation.
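Both the analogy arithmetic and its restatement as a geometric constraint can be illustrated with the idealized decompositions above. In this sketch the four words are built exactly from their components, so the identities hold exactly; with real embeddings they would only hold approximately, up to the noise-like variation mentioned earlier.

```python
import numpy as np

# Toy component vectors; real embeddings would be learned.
rng = np.random.default_rng(1)
artist, tool, stone, paint = (rng.normal(size=50) for _ in range(4))

# Idealized decompositions from the text.
sculptor = artist + stone
chisel = tool + stone
painter = artist + paint
paintbrush = tool + paint

# Analogy arithmetic: chisel + painter - sculptor recovers paintbrush,
# i.e. [tool+stone] + [artist+paint] - [artist+stone] = [tool+paint].
predicted = chisel + painter - sculptor
print(np.allclose(predicted, paintbrush))  # True

# The geometric restatement: the vector connecting the two terms of
# each pair (the relation vector) is the same for both pairs.
print(np.allclose(chisel - sculptor, paintbrush - painter))  # True
```

The second check is the constraint the proposed method tries to enforce: for every (head, relation, tail) triple with the same relation, the offset tail - head should be (approximately) a single shared relation vector.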
The method we explore in this paper begins with a set of distributional semantic vectors and attempts to modify these vectors so that this constraint holds.

Related Work

Vector space models have a long, rich history in the field of natural language processing: each word is represented as a real-valued vector in a continuous vector space, and the relationships between words can be encoded by vector operations. There are three main families of methods for learning word vectors: (1) global matrix factorization methods, such as latent semantic analysis, which generates embeddings from term-document matrices by singular value decomposition (Deerwester et al., 1990); (2) neural network models, such as the skip-gram and continuous bag-of-words models of Mikolov et al. (2013a, 2013b), referred to as word2vec, which learn embeddings by training a network to predict neighboring words. Mikolov et al. (2013c) demonstrate that the embeddings created by a recurrent neural network encode not only attributional similarities between words, but also similarities between pairs of words. (3) knowledge graph embeddings: there are three main types of