Information Sciences 518 (2020) 56–70
CAGE: Constrained deep Attributed Graph Embedding
Debora Nozza∗, Elisabetta Fersini, Enza Messina
DISCo, University of Milano-Bicocca, Viale Sarca 336, 20126 Milan, Italy
Article info
Article history:
Received 3 June 2019
Revised 15 November 2019
Accepted 30 December 2019
Available online 31 December 2019
Keywords:
Deep learning
Representation learning
Graph embedding
Attributed graph
Abstract
In this paper we address complex attributed graphs, which can exhibit rich connectivity patterns and whose nodes are often associated with attributes such as text or images. The primary challenge in analyzing these graphs is to find an effective representation that preserves both structural properties and node attribute information. To create low-dimensional and meaningful embedded representations of these complex graphs, we propose a fully unsupervised model based on Deep Learning architectures, called Constrained Attributed Graph Embedding model (CAGE). The main contribution of the proposed model is the definition of a novel two-phase optimization problem that explicitly models node attributes to obtain a higher representation expressiveness while preserving the local and the global structural properties of the graph. We validated our approach on two benchmark datasets for node classification. Experimental results demonstrate that this novel representation provides significant improvements over state-of-the-art approaches and is more robust with respect to the size of the training data.
© 2020 Elsevier Inc. All rights reserved.
1. Introduction
Real-world data are often characterized by an underlying relational structure, usually represented by graphs. Social and
communication networks, citation networks, transport and utility networks are only some of the most common examples
where we can observe complex relational interactions among a potentially large number of entities.
Efficient and scalable approaches for handling these large, complex and sparse graphs rely on learning graph representations, or Graph Embeddings [2,12], which aim to create low-dimensional and meaningful representations of nodes by observing different graph properties. This makes it possible to apply off-the-shelf machine learning algorithms designed for vector representations to rich relational data, and thus to solve a wide variety of data analytics problems [6,38].
Most state-of-the-art graph Representation Learning approaches derive graph embeddings by preserving the relational structure [26,31,33]. However, they disregard the fact that in real-world domains the nodes of a graph are often associated with a rich set of features or attributes (e.g. text, image, audio), so that such graphs are more appropriately modeled as so-called attributed graphs [37].
Capturing the attribute information as well can be of paramount importance, especially when nodes are not structurally related but have similar attributes. Starting from the primary source of information given by the relational structure, the creation of graph embeddings can take advantage of attributes to enrich the knowledge about nodes, in particular when the graph is sparse and has noisy connections.
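As a minimal illustration of the situation described above, the following sketch builds a toy attributed graph in which two nodes share no edge yet have nearly identical attributes. The node names, edge set, attribute vectors, and the use of cosine similarity are all assumptions made for this example; they are not taken from the CAGE model or from the datasets used in the paper.

```python
import math

# Hypothetical toy attributed graph; all names and values are illustrative only.
edges = {("a", "b"), ("b", "c")}      # relational structure (undirected)
attributes = {                        # one attribute vector per node
    "a": [1.0, 0.0, 1.0],
    "b": [0.0, 1.0, 0.0],
    "c": [0.0, 1.0, 1.0],
    "d": [1.0, 0.0, 0.9],             # isolated node: it has no edges at all
}

def linked(u, v):
    """Return True if u and v are directly connected in the graph."""
    return (u, v) in edges or (v, u) in edges

def cosine(x, y):
    """Cosine similarity between two attribute vectors (0.0 if either is all zeros)."""
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0

# "a" and "d" are not structurally related, yet their attributes almost coincide.
sim_ad = cosine(attributes["a"], attributes["d"])
# "a" and "b" are linked, yet their attributes do not overlap.
sim_ab = cosine(attributes["a"], attributes["b"])
```

An embedding that looks only at `edges` has no information with which to place `d` near `a`, whereas one that also considers `attributes` can; this is the gap that attribute-aware embeddings are meant to close.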
∗ Corresponding author.
E-mail addresses: debora.nozza@unimib.it (D. Nozza), elisabetta.fersini@unimib.it (E. Fersini), enza.messina@unimib.it (E. Messina).
https://doi.org/10.1016/j.ins.2019.12.082