openCypher: New Directions in Property Graph Querying
Alastair Green
Neo4j
alastair.green@neo4j.com
Martin Junghanns
Neo4j & University of Leipzig
martin.junghanns@neo4j.com
Max Kiessling
Neo4j
max.kiessling@neo4j.com
Tobias Lindaaker
Neo4j
tobias.lindaaker@neo4j.com
Stefan Plantikow
Neo4j
stefan.plantikow@neo4j.com
Petra Selmer
Neo4j
petra.selmer@neo4j.com
ABSTRACT
Cypher is a property graph query language that provides expressive
and efficient querying of graph data. Originally designed
and implemented within the Neo4j graph database, it is now
being used by several industrial database products, as well as
open-source and research projects. Since 2015, Cypher has been
an open, evolving language, with the aim of becoming a fully-specified
standard with many independent implementations.
We introduce Cypher and the property graph model, and then
describe extensions, either actively being developed or under
discussion, which will be incorporated into Cypher in the near
future. These include (i) making Cypher into a fully compositional
language by supporting multiple graphs and allowing graphs to
be returned from queries; (ii) allowing for more complex patterns
(based on regular path queries) to be expressed; and (iii) allowing
for different pattern matching semantics, namely homomorphism,
relationship isomorphism (the current default) or node isomorphism,
to be configured at a query-by-query level.
A subset of the proposed Cypher language extensions has
already been implemented on top of Apache Spark. In the tutorial,
we will present our approach including an in-depth analysis of the
challenges we faced. This includes mapping the property graph
model to the Spark DataFrame abstraction and the translation
of Cypher query operators into relational transformations. The
tutorial will conclude with a demonstration based on a real-world
graph analytical use case.
1 INTRODUCTION
The past few years have seen a marked increase in property graph
databases [12], such as Neo4j [20], Sparksee and JanusGraph,
in both the industrial and research arenas. Property graphs
have become the model of choice for next-generation graph applications¹.
Their use increasingly replaces older approaches to
graph data processing such as cross-linked document stores or
object-oriented database management systems.
Across both research and industry, property graphs have been
used in a wide variety of domains, spanning areas as diverse as
fraud detection, recommendations, geospatial data, master data
management, network and data centre management, authorisa-
tion and access control [23], the analysis of social networks [5],
bioinformatics [1, 14, 28] and pharmaceuticals [18], software
system analysis [9], and investigative journalism [3].
This trend of increased usage of property graphs is grounded
in: (i) their ability to operate on multiple large and highly-connected
data sets as one graph that enables novel pattern matching and
graph analytical queries; (ii) their natural ability to cleanly map
onto object-oriented or document-centric data models in programming
languages; (iii) their visual nature that helps communication
between business, application domain, and technical
experts; and (iv) their historical development based on the pragmatic
needs of real world application developers.
¹ https://db-engines.com/en/ranking/graph+dbms
© 2018 Copyright held by the owner/author(s). Published in Proceedings of the 21st
International Conference on Extending Database Technology (EDBT), March 26-29,
2018, ISBN 978-3-89318-078-3 on OpenProceedings.org.
Distribution of this paper is permitted under the terms of the Creative Commons
license CC-by-nc-nd 4.0.
This trend is evidenced by two major factors. The first is the
emergence of Cypher as the de-facto standard declarative query
language for property graphs, and the second is the growing
number of both industrial and academic software products for
property graphs.
Since 2015, as part of the openCypher project [22], Cypher has
been an open language, and is evolving under the auspices of the
openCypher Implementers Group (oCIG), with the aim of becoming
a fully-specified standard that can be independently implemented.
The recently released Cypher 9 reference [21] along with
accompanying formal grammar definitions (EBNF and ANTLR4)
and conformance test suite (TCK), published under the Apache
2.0 license, already provide implementers with a solid basis for
adopting Cypher. At the time of writing, Cypher is supported by
several commercial systems including SAP HANA Graph [24],
Agens Graph, Redis Graph, and Memgraph, along with research
frameworks including, in varying degrees of completeness,
Gradoop [11], inGraph [15], Cytosm [25], Cypher for Apache
Spark [19] and Cypher over Gremlin.
Current developments that are under way include the ability
to pass multiple graphs and a table as input to a Cypher query.
Moreover, queries will also be able to project and save multiple
graphs, and this, coupled with the ability to chain queries together,
will make Cypher the first graph compositional query
language. Following on from this work, complex pattern matching
and configurable pattern matching semantics will further
increase the utility of Cypher in the very near future.
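To make the idea of configurable pattern matching semantics concrete, the following is a minimal Python sketch, not taken from any Cypher implementation. It enumerates matches of the directed two-hop pattern (a)-[r1]->(b)-[r2]->(c) under homomorphism, relationship isomorphism and node isomorphism. The toy graph, the function name two_hop_matches and the semantics strings are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy graph: relationship id -> (source node, target node).
# e3 is a self-loop on node 1; e1 and e2 form a cycle between nodes 1 and 2.
RELS = {"e1": (1, 2), "e2": (2, 1), "e3": (1, 1), "e4": (2, 3)}

SEMANTICS = ("homomorphism", "relationship-isomorphism", "node-isomorphism")


def two_hop_matches(rels, semantics):
    """Enumerate matches of the directed pattern (a)-[r1]->(b)-[r2]->(c).

    Returns a list of (r1, r2) relationship-id pairs.
    """
    if semantics not in SEMANTICS:
        raise ValueError(f"unknown semantics: {semantics}")
    matches = []
    for r1, r2 in product(sorted(rels), repeat=2):
        a, b = rels[r1]
        b2, c = rels[r2]
        if b != b2:
            continue  # r1 must end where r2 starts
        if semantics == "relationship-isomorphism" and r1 == r2:
            continue  # each relationship may be bound at most once
        if semantics == "node-isomorphism" and len({a, b, c}) < 3:
            continue  # each node may be bound at most once
        matches.append((r1, r2))
    return matches


if __name__ == "__main__":
    for name in SEMANTICS:
        print(name, two_hop_matches(RELS, name))
```

On this toy graph, homomorphism admits the match that traverses the self-loop e3 twice, relationship isomorphism rejects only that match, and node isomorphism keeps only the path through three distinct nodes.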
2 SCOPE OF THE TUTORIAL
2.1 Intended audience
This tutorial is aimed at a wide audience, including
researchers, students, developers, and industrial practitioners
who are interested in the emerging and quickly-evolving area
of graph data, databases and languages. All attendees will gain
a comprehensive idea of what this field comprises, as well as
the future features and challenges that lie ahead for Cypher, the
most-used property graph query language.
It is our hope that owing to the many challenges that exist in
this area, researchers and students will be motivated to consider
this area as a future topic of research.
There are no preliminary requirements for this tutorial, as it
will be self-contained and commence with the property graph
data model and Cypher, thus assuming no prior knowledge of
these.
Tutorial
Series ISSN: 2367-2005, p. 520, DOI: 10.5441/002/edbt.2018.62