2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)
Locally Constructing Product Taxonomies from
Scratch Using Representation Learning
Mayank Kejriwal, Ravi Kiran Selvam
Information Sciences Institute
University of Southern California
Marina del Rey, CA
{kejriwal,rselvam}@isi.edu
Chien-Chun Ni, Nicolas Torzec
Verizon Media
{chien-chun.ni,torzecn}@verizonmedia.com
Abstract—Given a domain-specific set of concepts, local taxonomy
construction (LTC) is the problem of ‘locally’ inducing
the neighborhood of a concept (from the set of target concepts)
without being given any example links. Despite its practical
importance, the problem has received little research attention
due to its difficulty (in contrast with link prediction, a problem
that resembles it and has been studied broadly). In this paper,
we present a formalism for, and a deep empirical study of, the LTC
problem. In particular, we show that representation learning
approaches from the natural language community can be adapted
in a novel way to tackle the problem, often quite effectively. We
also present a detailed information retrieval (IR)-based methodology
for evaluating these solutions on three real-world product datasets
of varying sizes. To the best of our knowledge, this is the first
paper to introduce the LTC problem, especially for e-commerce
applications, and to offer effective, nearly unsupervised solutions
for addressing it on real-world data.
Index Terms—Taxonomy Induction, Local Taxonomy Construction,
Concept Ranking, Information Retrieval, Representation Learning,
E-Commerce
I. INTRODUCTION
In many domains, website designers and
builders of recommendation systems start from a set of
semantic categories or concepts that needs to be compiled
into a proper taxonomy. For example, as shown in Figure
1, a clothing retailer may start with a catalog of product
‘concepts’ (such as Overalls and Dresses), but needs to
impose a structure, such as the one on the right of the figure,
to better organize and understand her domain. In its most general
form, this problem is known as taxonomy induction [1], [2]. In the
e-commerce domain, for instance, price shopping and comparison
websites pull in product categories (‘concepts’) from multiple
websites by the thousands. Some kind of relational ordering
between these concepts is necessary, both for developing a
deeper understanding of the domain and for building
practical products, such as websites and catalogs, that make for
an intuitive and satisfying user experience. Such a taxonomy
could serve as a simpler version of, or even the backbone
of, a final ontology that is more ‘graph-like’ and contains other
ontological components such as constraints. A knowledge
engineer would not have to begin from scratch in constructing
the ontology, but could instead start from the taxonomy as a
baseline domain model. Manually building such taxonomies
is difficult since, in real-world problem settings, there could
be thousands of concepts to organize. The total number of
possible taxonomies is exponential in this number.
Fig. 1: An illustration of the taxonomy induction problem,
using real data from the Google Product Taxonomy (Section
V).
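To give a rough sense of how quickly the space of candidate taxonomies grows, the sketch below counts rooted trees over n labeled concepts using Cayley's formula, n^(n-1). Modeling a taxonomy as a single rooted labeled tree is an assumption made here purely for illustration; the paper may count taxonomies differently, and the function name is hypothetical.

```python
def count_rooted_labeled_trees(n: int) -> int:
    """Number of rooted trees on n labeled nodes, by Cayley's formula.

    Assumption for illustration: a taxonomy is one rooted tree whose
    nodes are the n concepts. There are n**(n-2) unrooted labeled trees,
    and each can be rooted at any of its n nodes.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return n ** (n - 1)


# Print the count for a few small concept-set sizes.
for n in (3, 5, 10, 20):
    print(n, count_rooted_labeled_trees(n))
```

Even under this simplified model, the count for a few dozen concepts is far beyond what manual enumeration could cover, which is consistent with the difficulty described above.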
Unlike the traditional link prediction problem in social
networks and knowledge graphs [3], inducing
such a taxonomy given only a set C of concepts (hereafter
referred to as the concept-set) is difficult because it
falls under a class of machine learning problems that have to
work without any examples. The best-known instances of these
problems are clustering-based applications such as community
detection. However, taxonomy induction is different, since
we have to discover a set of highly localized links for each
concept. Another way to understand the difference between
clustering and taxonomy induction is that, for the former, the
number of clusters is often a small constant number (almost
never more than 100, and far fewer than the data points,
which can sometimes number in the tens of thousands or even
millions) while the number of links that have to be inferred
in a taxonomy induction setting is a multiple of the number
of concepts.
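The contrast with clustering can be made concrete with a minimal sketch of per-concept neighbor ranking. It is not the method of this paper: the three-dimensional vectors are made-up toy embeddings, the concepts Clothing and Laptops are hypothetical additions to the Overalls and Dresses examples, and cosine similarity with a fixed cutoff k is an assumed scoring choice. The point is only that proposing k neighbors per concept yields k·|C| candidate links.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_neighbors(target, embeddings, k):
    """Return the k concepts most similar to `target` (hypothetical helper)."""
    scores = [
        (other, cosine(embeddings[target], vec))
        for other, vec in embeddings.items()
        if other != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scores[:k]]


# Toy, hand-made embeddings (assumption: not learned from any dataset).
toy = {
    "Clothing": [1.0, 0.9, 0.1],
    "Dresses": [0.9, 1.0, 0.0],
    "Overalls": [1.0, 0.7, 0.2],
    "Laptops": [0.0, 0.1, 1.0],
}

# One local neighborhood per concept: k * |C| candidate links in total.
candidate_links = {c: rank_neighbors(c, toy, k=2) for c in toy}
total_links = sum(len(v) for v in candidate_links.values())
```

With four concepts and k = 2, the sketch proposes eight candidate links, whereas a clustering of the same data would typically output only a handful of groups.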
In this paper, we address a simpler, but still important,
version of the global taxonomy induction problem called local
taxonomy construction (LTC). We state this problem as follows
IEEE/ACM ASONAM 2020, December 7-10, 2020
978-1-7281-1056-1/20/$31.00 © 2020 IEEE 507