Statistics about Data Shape Use in RDF Data
Sven Lieber, Ben De Meester, Anastasia Dimou, and Ruben Verborgh
Ghent University – imec – IDLab,
Department of Electronics and Information Systems,
Technologiepark-Zwijnaarde 122, 9052 Ghent, Belgium
{firstname.lastname}@ugent.be
Abstract. Statistics about constraint use in RDF data bring insights into common practices for addressing data quality. However, such statistics exist only for OWL axioms, not for constraint languages such as SHACL or ShEx, which have recently become more popular. We extended previous work on axiom statistics to provide evidence of constraint type use. In this poster¹, we present preliminary statistics about the use of SHACL core constraints in data shapes found on GitHub. We found that class, datatype and cardinality constraints are predominantly used, similar to the dominant use of domain and range in ontologies. Less-used constraint types need further attention in visualization or modeling tools to address data quality issues. More SHACL constraints, as well as ShEx constraints, need to be included to deepen this understanding. Data quality researchers and tool designers can make informed decisions based on the provided statistics.
Keywords: SHACL · Statistics · RDF · Constraints · Montolo
1 Introduction
Recently, RDF constraint languages, such as SHACL [5] or ShEx [7], have been developed to model restrictions in the form of constraints on data. Statistics for OWL ontologies showed that only a subset of possible axioms is commonly used [6], but such evidence does not yet exist for constraints. This gap leaves users to anticipate possible use cases or to cover entire specifications.
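To make concrete what gathering such evidence involves, the following is a minimal, hypothetical sketch; it is not the Montolo implementation. It assumes the shapes have already been parsed into (subject, predicate, object) triples with full IRIs, and it maps only a few SHACL core constraint parameters (sh:class, sh:datatype, sh:minCount, sh:maxCount, sh:pattern, sh:in) to a constraint type label. The function name `count_constraint_types` and the mapping table are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, Tuple

SH = "http://www.w3.org/ns/shacl#"

# Hypothetical, partial mapping from SHACL core constraint parameters to a
# constraint type label; the full SHACL core vocabulary is much larger.
CONSTRAINT_TYPES = {
    SH + "class": "class",
    SH + "datatype": "datatype",
    SH + "minCount": "cardinality",
    SH + "maxCount": "cardinality",
    SH + "pattern": "pattern",
    SH + "in": "in",
}

Triple = Tuple[str, str, str]


def count_constraint_types(triples: Iterable[Triple]) -> Counter:
    """Count how often each constraint type label occurs in shape triples.

    Every triple whose predicate is a known constraint parameter adds one
    to the count of its type; all other triples are ignored.
    """
    counts: Counter = Counter()
    for _subject, predicate, _obj in triples:
        constraint_type = CONSTRAINT_TYPES.get(predicate)
        if constraint_type is not None:
            counts[constraint_type] += 1
    return counts


if __name__ == "__main__":
    # Illustrative shape: one property shape with a datatype and a
    # cardinality range, and one with a class constraint.
    shape_triples = [
        ("ex:PersonShape", SH + "property", "_:p1"),
        ("_:p1", SH + "path", "ex:name"),
        ("_:p1", SH + "datatype", "http://www.w3.org/2001/XMLSchema#string"),
        ("_:p1", SH + "minCount", "1"),
        ("_:p1", SH + "maxCount", "1"),
        ("ex:PersonShape", SH + "property", "_:p2"),
        ("_:p2", SH + "path", "ex:knows"),
        ("_:p2", SH + "class", "ex:Person"),
    ]
    print(count_constraint_types(shape_triples).most_common())
```

In this sketch, sh:minCount and sh:maxCount each count as one cardinality occurrence, mirroring their separate SHACL core constraint components; grouping them per property shape instead would be an equally valid design choice.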
Insights about used constraint types can be drawn from generated constraints or curated repositories. Astrea [3] and OSLO [4], which generate shapes from existing sources, cover specific subsets of SHACL, but this is due to their limited mappings rather than to evidence of broad use. To the best of our knowledge, only small repositories of SHACL constraints, with fewer than 5 entries, exist²,³.
¹ Copyright ©2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
² https://schreckl.inspirito.de/
³ http://shacl-play.sparna.fr/catalog

In this poster paper, we present preliminary statistics generated by extending our Montolo framework [6], which collects RDF Data Cube compliant statistics about axiom use, to constraint types. Following the same approach, we used the