2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)

Locally Constructing Product Taxonomies from Scratch Using Representation Learning

Mayank Kejriwal, Ravi Kiran Selvam
Information Sciences Institute, University of Southern California, Marina del Rey, CA
{kejriwal,rselvam}@isi.edu

Chien-Chun Ni, Nicolas Torzec
Verizon Media
{chien-chun.ni,torzecn}@verizonmedia.com

Abstract—Given a domain-specific set of concepts, local taxonomy construction (LTC) is the problem of ‘locally’ inducing the neighborhood of a concept (from the set of target concepts) without being given any example links. The problem, despite having practical importance, has received little research attention due to its difficulty (in contrast with link prediction, a problem that resembles it and has undergone broad study). In this paper, we present a formalism for, and a deep empirical study of, the LTC problem. In particular, we show that representation learning approaches from the natural language community can be adapted, in an innovative application, to tackle the problem, often quite effectively. We also present a detailed information retrieval (IR)-based methodology for evaluating these solutions on three real-world product datasets of varying sizes. To the best of our knowledge, this is the first paper to introduce the LTC problem, especially for e-commerce applications, and to offer effective, nearly unsupervised solutions for addressing it on real-world data.

Index Terms—Taxonomy Induction, Local Taxonomy Construction, Concept Ranking, Information Retrieval, Representation Learning, E-Commerce

I. INTRODUCTION

In many domains, website designers and builders of recommendation systems often start from a set of semantic categories or concepts that needs to be compiled into a proper taxonomy.
For example, as shown in Figure 1, a clothing retailer may start with a catalog of product ‘concepts’ (such as Overalls and Dresses), but needs to impose a structure such as the one shown on the right to better organize and understand her domain. In its most general form, this problem is known as taxonomy induction [1], [2]. In the e-commerce domain, for instance, price shopping and comparison websites pull in product categories (‘concepts’) from multiple websites by the thousands. Some kind of relational ordering between these concepts is necessary, both for developing a deeper understanding of the domain and for building practical products, such as websites and catalogs, that make for an intuitive and satisfying user experience. Such a taxonomy could even serve as a simpler version of, or even the backbone to, a final ontology that is more ‘graph-like’ and contains other ontological components such as constraints. A knowledge engineer would not have to begin from scratch in constructing the ontology, but could instead start from the taxonomy as a baseline domain model.

Fig. 1: An illustration of the taxonomy induction problem, using real data from the Google Product Taxonomy (Section V).

Manually building such taxonomies is difficult since, in real-world problem settings, there could be thousands of concepts to organize. The total number of possible taxonomies is exponential in this number. Unlike the traditional link prediction problem in social networks and knowledge graphs [3], inducing such a taxonomy given only a set C of concepts (hereafter referred to as the concept-set) is difficult because it belongs to a class of machine learning problems that must be solved without any examples. The best-known examples of such problems are clustering-based applications such as community detection. However, taxonomy induction is different, since we have to discover a set of highly localized links for each concept.
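To give a rough sense of the scale mentioned above, the following is a minimal sketch under an assumption that is ours and not stated in the text: a taxonomy over n concepts is modeled as a rooted labeled tree whose nodes are exactly the n concepts. Under that assumption, Cayley's formula gives n^(n-1) distinct trees. The function names are hypothetical and chosen for illustration only.

```python
def rooted_labeled_tree_count(n: int) -> int:
    """Count rooted labeled trees on n nodes via Cayley's formula, n ** (n - 1).

    Assumption (not from the paper): a taxonomy is a rooted tree whose
    nodes are exactly the n concepts in the concept-set.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return n ** (n - 1)


def decimal_digits(value: int) -> int:
    """Number of decimal digits in a positive integer."""
    return len(str(value))


if __name__ == "__main__":
    # Report only the size of each count, since the counts grow very quickly.
    for n in (3, 10, 100, 1000):
        count = rooted_labeled_tree_count(n)
        print(f"{n} concepts: candidate-tree count has {decimal_digits(count)} decimal digits")
```

Even under this simplified model, the number of candidate structures for a few hundred concepts is far too large to enumerate, which is consistent with the difficulty of manual construction described above.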
Another way to understand the difference between clustering and taxonomy induction is that, for the former, the number of clusters is often a small constant (almost never more than 100, and far fewer than the data points, which can sometimes number in the tens of thousands or even millions), while the number of links that have to be inferred in a taxonomy induction setting is a multiple of the number of concepts.

IEEE/ACM ASONAM 2020, December 7-10, 2020. 978-1-7281-1056-1/20/$31.00 © 2020 IEEE

In this paper, we address a simpler, but still important, version of the global taxonomy induction problem called local taxonomy construction (LTC). We state this problem as follows