openCypher: New Directions in Property Graph Querying

Alastair Green, Neo4j, alastair.green@neo4j.com
Martin Junghanns, Neo4j & University of Leipzig, martin.junghanns@neo4j.com
Max Kiessling, Neo4j, max.kiessling@neo4j.com
Tobias Lindaaker, Neo4j, tobias.lindaaker@neo4j.com
Stefan Plantikow, Neo4j, stefan.plantikow@neo4j.com
Petra Selmer, Neo4j, petra.selmer@neo4j.com

ABSTRACT

Cypher is a property graph query language that provides expressive and efficient querying of graph data. Originally designed and implemented within the Neo4j graph database, it is now being used by several industrial database products, as well as open-source and research projects. Since 2015, Cypher has been an open, evolving language, with the aim of becoming a fully-specified standard with many independent implementations.

We introduce Cypher and the property graph model, and then describe extensions, either actively being developed or under discussion, which will be incorporated into Cypher in the near future. These include (i) making Cypher into a fully compositional language by supporting multiple graphs and allowing graphs to be returned from queries; (ii) allowing for more complex patterns (based on regular path queries) to be expressed; and (iii) allowing for different pattern matching semantics (homomorphism, relationship isomorphism, which is the current default, or node isomorphism) to be configured at a query-by-query level.

A subset of the proposed Cypher language extensions has already been implemented on top of Apache Spark. In the tutorial, we will present our approach, including an in-depth analysis of the challenges we faced. This includes mapping the property graph model to the Spark DataFrame abstraction and the translation of Cypher query operators into relational transformations. The tutorial will conclude with a demonstration based on a real-world graph analytical use case.
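
The three pattern matching semantics named in item (iii) can be illustrated with a minimal sketch. It is not taken from the tutorial or from any Cypher implementation: the toy graph `EDGES`, the function `two_hop_matches` and the semantics labels are hypothetical names chosen for illustration. The sketch assumes the usual reading of the terms: homomorphism places no distinctness constraint on a match, relationship isomorphism requires the relationships bound by a pattern to be pairwise distinct, and node isomorphism requires the bound nodes to be pairwise distinct. It enumerates matches of the fixed pattern (a)-[r1]->(b)-[r2]->(c).

```python
from itertools import product

# Hypothetical toy graph: (relationship id, source node, target node).
EDGES = [(1, "A", "B"), (2, "B", "A"), (3, "A", "A"), (4, "B", "C")]

SEMANTICS = ("homomorphism", "relationship-isomorphism", "node-isomorphism")


def two_hop_matches(edges, semantics):
    """Return (r1, r2) id pairs matching (a)-[r1]->(b)-[r2]->(c)."""
    if semantics not in SEMANTICS:
        raise ValueError(f"unknown semantics: {semantics}")
    matches = []
    for (r1, a, b), (r2, b2, c) in product(edges, repeat=2):
        if b != b2:
            continue  # r1 must end at the node where r2 starts
        if semantics == "relationship-isomorphism" and r1 == r2:
            continue  # the same relationship may not be bound twice
        if semantics == "node-isomorphism" and len({a, b, c}) < 3:
            continue  # the same node may not be bound twice
        matches.append((r1, r2))
    return matches


if __name__ == "__main__":
    for mode in SEMANTICS:
        print(mode, two_hop_matches(EDGES, mode))
```

On this toy graph, homomorphism admits the match that traverses the self-loop twice, relationship isomorphism rejects only that match, and node isomorphism keeps only the path through three different nodes.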

1 INTRODUCTION

The past few years have seen a marked increase in property graph databases [12], such as Neo4j [20], Sparksee and JanusGraph, in both the industrial and research arenas. Property graphs have become the model of choice for next-generation graph applications¹. Their use increasingly replaces older approaches to graph data processing such as cross-linked document stores or object-oriented database management systems.

Across both research and industry, property graphs have been used in a wide variety of domains, spanning areas as diverse as fraud detection, recommendations, geospatial data, master data management, network and data centre management, authorisation and access control [23], the analysis of social networks [5], bioinformatics [1, 14, 28] and pharmaceuticals [18], software system analysis [9], and investigative journalism [3].

¹ https://db-engines.com/en/ranking/graph+dbms

© 2018 Copyright held by the owner/author(s). Published in Proceedings of the 21st International Conference on Extending Database Technology (EDBT), March 26-29, 2018, ISBN 978-3-89318-078-3 on OpenProceedings.org. Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.

This trend of increased usage of property graphs is grounded in: (i) their ability to operate on multiple large and highly-connected data sets as one graph that enables novel pattern matching and graph analytical queries; (ii) their natural ability to cleanly map onto object-oriented or document-centric data models in programming languages; (iii) their visual nature that helps communication between business, application domain, and technical experts; and (iv) their historical development based on the pragmatic needs of real-world application developers. This trend is evidenced by two major factors.
The first is the emergence of Cypher as the de facto standard declarative query language for property graphs, and the second is the growing number of both industrial and academic software products for property graphs.

Since 2015, as part of the openCypher project [22], Cypher has been an open language, and is evolving under the auspices of the openCypher Implementers Group (oCIG), with the aim of becoming a fully-specified standard that can be independently implemented. The recently released Cypher 9 reference [21], along with the accompanying formal grammar definitions (EBNF and ANTLR4) and conformance test suite (TCK), all published under the Apache 2.0 license, already provide implementers with a solid basis for adopting Cypher. At the time of writing, Cypher is supported by several commercial systems including SAP HANA Graph [24], Agens Graph, Redis Graph, and Memgraph, along with research frameworks that support it in varying degrees of completeness, including Gradoop [11], inGraph [15], Cytosm [25], Cypher for Apache Spark [19] and Cypher over Gremlin.

Current developments that are under way include the ability to pass multiple graphs and a table as input to a Cypher query. Moreover, queries will also be able to project and save multiple graphs, and this, coupled with the ability to chain queries together, will make Cypher the first compositional graph query language. Following on from this work, complex pattern matching and configurable pattern matching semantics will further increase the utility of Cypher in the very near future.

2 SCOPE OF THE TUTORIAL

2.1 Intended audience

This tutorial is aimed at a broad audience, including researchers, students, developers, and industrial practitioners who are interested in the emerging and quickly-evolving area of graph data, databases and languages.
All attendees will gain a comprehensive idea of what this field comprises, as well as the future features and challenges that lie ahead for Cypher, the most-used property graph query language. It is our hope that, owing to the many challenges that exist in this area, researchers and students will be motivated to consider it as a future topic of research.

There are no preliminary requirements for this tutorial, as it will be self-contained and commence with the property graph data model and Cypher, thus assuming no prior knowledge of these.

Tutorial Series, ISSN: 2367-2005, DOI: 10.5441/002/edbt.2018.62